Attribution gaps in AI content pose legal risk, experts urge regulatory action
A new cross-disciplinary study warns that the lack of legal and technological infrastructure for attributing AI-generated content poses escalating risks for intellectual property rights and public accountability. Published as "Who Owns the Output? Bridging Law and Technology in LLMs Attribution" on arXiv in March 2025, the study offers a comprehensive review of how attribution failures in generative AI systems intersect with unresolved gaps in copyright law, particularly in the context of large language models (LLMs) and multimodal AI.
Conducted by a team of researchers from the Ethikon Institute, the study seeks to answer three critical questions:
- What legal frameworks currently govern the attribution of AI-generated outputs?
- Which technological methods are capable of supporting attribution?
- How can legal and technical tools be harmonized to create enforceable standards for content provenance in an era of machine authorship?
The study investigates these questions through the lens of international copyright law, AI ethics, technical feasibility, and recent litigation involving AI outputs.
Attribution - the capacity to identify both the human initiator and the training data that influence AI-generated content - is essential for protecting moral rights and establishing legal accountability. Yet current laws are insufficient. The study identifies a disconnect between the foundational assumption of human authorship in treaties like the Berne Convention and the practical realities of AI-generated works, where human input may be minimal or entirely absent. The authors find that while attribution is recognized as a moral right in many jurisdictions, including under Article 6bis of the Berne Convention, enforcement is uneven, particularly in jurisdictions like the United States, where moral rights have weaker legal standing.
In both the European Union and the U.S., courts have consistently reinforced the requirement for human authorship in order for copyright protection to apply. The European Court of Justice and U.S. federal courts have both ruled that works generated without meaningful human creativity cannot be considered original works eligible for copyright. The EU’s recently adopted Artificial Intelligence Act reaffirms this stance by requiring human involvement in the generation of legally protected content, while also mandating transparency obligations for AI developers. These include public disclosure of the datasets used in training and compliance with opt-out mechanisms under the Digital Single Market Directive.
The study underscores that even when economic rights are addressed through licensing or text and data mining (TDM) exceptions, moral rights, especially attribution, are often overlooked. For example, in many EU countries, moral rights cannot be waived or transferred, meaning that the use of AI models trained on copyrighted content without acknowledging the original creator may constitute a violation, even when TDM exceptions are legally invoked.
The second major thrust of the research explores the technological capacity to enforce attribution. Attribution techniques are divided into four categories: content identification (watermarking and fingerprinting), explainability tools, data traceability systems, and security-based methods such as blockchain and zero-knowledge proofs. These techniques are designed to trace the lineage of AI-generated content back to its training data, link outputs to the prompts that initiated them, and verify the integrity of digital rights metadata.
Watermarking and fingerprinting are shown to offer promising but limited solutions. While watermarking embeds imperceptible markers into outputs, it remains vulnerable to adversarial modifications such as paraphrasing or token shifts. Fingerprinting, by contrast, identifies unique model behaviors or outputs using covert triggers. Yet the study notes that no single method can provide deterministic attribution. Only a multi-layered approach - combining digital signatures, lineage tracking, and model explainability - can approximate reliable attribution at scale.
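To make the watermark-detection idea concrete, the minimal sketch below, which is not drawn from the study itself, scores a piece of text against a pseudorandom "green list" of tokens in the spirit of statistical text watermarking; the hash seeding, green-list fraction, and z-score interpretation are simplifying assumptions.

```python
# Illustrative sketch of a "green list" statistical watermark check for text.
# The hash scheme, green-list fraction, and threshold are assumptions, not
# the study's method.
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed fraction of the vocabulary marked "green"

def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GREEN_FRACTION

def watermark_z_score(tokens: list[str]) -> float:
    """Z-score of the observed green-token count against the unwatermarked expectation."""
    n = len(tokens) - 1
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

tokens = "the quick brown fox jumps over the lazy dog".split()
print(round(watermark_z_score(tokens), 2))  # high z-scores suggest watermarked text
```

A paraphrase or token substitution changes which bigrams fall on the green list, which is exactly why the study describes watermarking as vulnerable to adversarial modification.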
Explainability methods, such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), also provide insights into how specific data samples influence model outputs. These tools allow for forensic analysis of whether a particular training dataset has materially shaped AI behavior. Similarly, data lineage systems like Spark-Atlas-Connector and business metadata management tools allow enterprise-level tracking of data movement and transformation through LLM training pipelines.
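As an illustration of the explainability approach, the short example below, assuming a toy tabular model rather than an LLM, uses the shap library to compute per-feature contributions to a prediction; tracing the influence of specific training samples on LLM outputs would require additional machinery, so this only sketches the underlying attribution idea.

```python
# Illustrative sketch: SHAP estimates how much each input contributes to a
# model's prediction, a proxy for the influence analysis described above.
# The dataset and model here are toy assumptions, not the study's setup.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # toy feature matrix
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)             # exact Shapley values for tree models
shap_values = explainer.shap_values(X[:5])
print(shap_values)                                # per-feature contributions for 5 samples
```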
However, the authors caution that without regulatory mandates, few AI developers will voluntarily implement these systems. They point to the limited adoption of watermarking and tracing methods among major technology companies and argue that governmental oversight - through obligations like Article 53 of the EU AI Act - is necessary to standardize best practices. This article requires providers of general-purpose AI models to adopt policies that identify and respect opt-outs, and to publish training data summaries that enable scrutiny.
Finally, the paper presents real-world legal use cases to illustrate the feasibility of attribution enforcement. In the Getty Images lawsuit against Stability AI, digital watermark detection in generated images served as compelling evidence of unauthorized training on copyrighted content. Similarly, in Doe v. GitHub, plaintiffs alleged that Copilot, GitHub’s code generation tool, reproduced fingerprinted code sequences from public repositories, effectively bypassing licensing restrictions.
Such cases highlight the legal viability of attribution evidence and the urgent need for AI companies to embed technical compliance into their development pipelines. Licensing models like those adopted by Shutterstock and Getty, where content providers are compensated for training data contributions, are offered as best practices. These systems are predicated on the ability to trace content back to its source and confirm contractual compliance.
The study calls for the convergence of legal reforms and technological innovation. It advocates for the development of standardized, interoperable attribution protocols that can be enforced across jurisdictions. These protocols must integrate legal doctrines, such as the right to attribution and compliance with copyright exceptions, with technological measures such as cryptographic signatures, data lineage tools, and transparent model documentation.
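To illustrate how a cryptographic signature could anchor such a protocol, the sketch below signs a hypothetical provenance record with an Ed25519 key using the Python cryptography library; the record fields and identifiers are assumptions for illustration, not a format proposed in the study.

```python
# A minimal sketch of signing a provenance record for a generated output so
# that downstream parties can verify origin claims. The record fields, model
# identifier, and manifest name are hypothetical.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

output_text = "Example AI-generated paragraph."            # stand-in content
provenance = {
    "output_sha256": hashlib.sha256(output_text.encode()).hexdigest(),
    "model_id": "example-llm-1",                            # hypothetical model ID
    "prompt_author": "user-123",                            # hypothetical human initiator
    "training_data_manifest": "manifest-v1",                # hypothetical lineage reference
}
payload = json.dumps(provenance, sort_keys=True).encode()

private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(payload)                       # sign the canonical record

# Verification raises cryptography.exceptions.InvalidSignature on tampering.
private_key.public_key().verify(signature, payload)
print("provenance record verified")
```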
- FIRST PUBLISHED IN: Devdiscourse

