New research suggests that EU copyright law reaches further than current AI safeguards do, which calls for a focus beyond verbatim memorization. The PSALM framework offers a way forward.

The intersection of artificial intelligence and copyright law is proving to be a complex battlefield, especially in the European Union. Large language models (LLMs), trained on vast web-scale datasets, have sparked concerns about copyright infringement. However, the EU's copyright doctrine, which emphasizes 'substantial similarity,' extends beyond verbatim copying and considers stylistic choices and narrative structures. This leaves a gap between legal requirements and what current technical safeguards can detect. PSALM is a framework designed to bridge this compliance gap.
Why Current Safeguards Fall Short #
In the European Union, copyright law protects not just the words themselves but the creative essence of a work. This includes its style, narrative voice, and thematic depth. Existing technical methods predominantly target literal copying and overlook the broader spectrum of potential infringements. PSALM, a suite of evaluative tools, aims to operationalize the EU's broader copyright doctrine. It assesses both computational overlap and stylistic dimensions, casting a much wider net than methods that check only for literal copying.
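To make the gap concrete, here is a minimal Python sketch that contrasts a verbatim-overlap check with a crude stylistic comparison. It is an illustration only and not PSALM's implementation: the function names, the short function-word list, and the two example passages are all assumptions made up for this sketch, and real stylometric evaluators are far richer.

```python
import math
import re
from collections import Counter

# Hypothetical feature set for illustration; not taken from PSALM.
FUNCTION_WORDS = ("the", "and", "of", "a", "in", "to", "was", "it", "but", "that")


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_overlap(candidate, reference, n=4):
    """Fraction of the candidate's word n-grams that also occur in the reference."""
    def grams(text):
        toks = tokens(text)
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    cand, ref = grams(candidate), grams(reference)
    return len(cand & ref) / len(cand) if cand else 0.0


def style_vector(text):
    """Crude stylometric profile: function-word rates plus mean sentence length."""
    toks = tokens(text)
    counts = Counter(toks)
    total = max(len(toks), 1)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    mean_len = len(toks) / max(len(sentences), 1)
    return [counts[w] / total for w in FUNCTION_WORDS] + [mean_len / 100.0]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def style_similarity(candidate, reference):
    return cosine(style_vector(candidate), style_vector(reference))


# Invented passages: same rhythm and sentence structure, different content words.
reference = ("The rain fell on the town, and the old man walked to the harbour. "
             "It was late, but the lamps were lit.")
candidate = ("The snow lay on the hills, and the young girl ran to the station. "
             "It was early, but the doors were shut.")

print("4-gram overlap:  ", ngram_overlap(candidate, reference))
print("style similarity:", style_similarity(candidate, reference))
```

The two passages share no four-word sequence, so a verbatim filter reports zero overlap, yet their function-word profile and sentence rhythm are essentially identical. A safeguard that looks only at the first number would miss exactly the kind of similarity the article describes.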
The PSALM Approach #
PSALM employs ten evaluators to scrutinize LLM outputs, examining everything from writing style to character and plot development. In recent tests on Llama 3.2 models fine-tuned with translated Dutch literature, instruction-tuned models already displayed significant stylistic similarities before any fine-tuning. After fine-tuning, the models exhibited systematic stylistic appropriation. This suggests that the conventional focus on preventing verbatim memorization is inadequate on its own.
A particularly intriguing finding concerns Negative Preference Optimization (NPO) unlearning, which was applied to reduce similarity. It lowers the similarity scores, but detectable stylistic patterns remain. So, are these residual patterns enough to constitute infringement? The answer remains elusive without further validation from legal experts.
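For readers unfamiliar with NPO, the sketch below computes its loss as it is usually stated in the unlearning literature: for each sequence in the "forget" set, the loss is (2/β) · log(1 + (π_θ(y|x) / π_ref(y|x))^β), averaged over the set, where π_θ is the model being unlearned and π_ref is a frozen reference copy. The article does not describe how PSALM's experiments configured NPO, so the function name, the β value, and the log-probabilities here are assumptions chosen for illustration.

```python
import math


def npo_loss(logp_model, logp_ref, beta=0.1):
    """Mean NPO loss over forget-set sequences.

    logp_model[i] is log pi_theta(y_i | x_i) under the model being unlearned;
    logp_ref[i] is the same quantity under the frozen reference model.
    """
    if len(logp_model) != len(logp_ref) or not logp_model:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for lp, lr in zip(logp_model, logp_ref):
        z = beta * (lp - lr)  # beta times the log-likelihood ratio
        # log(1 + exp(z)), written in a numerically stable form
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return (2.0 / beta) * total / len(logp_model)


# Made-up sequence log-probabilities, not measurements from any model.
before = npo_loss([-5.0, -6.0], [-5.0, -6.0])    # model identical to reference
after = npo_loss([-40.0, -45.0], [-5.0, -6.0])   # model now far less likely to emit the text
print(before, after)
```

The loss falls as the model assigns lower probability to the forget-set text than the reference does, which is what pushes the model away from reproducing it. Nothing in this objective measures style directly, which is consistent with the reported result that stylistic patterns can survive unlearning.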
Implications for EU AI Compliance #
These insights are a wake-up call for regulators and developers alike. The EU's comprehensive copyright framework requires careful consideration and adaptation by AI developers. PSALM could indeed be the key to implementing auditable, legally informed compliance evaluations. Yet, the reliance on automated similarity scores must be scrutinized. The relationship between these scores and legal infringement isn't yet fully established, and it will require a concerted effort by legal and technical experts to reconcile these domains.
What does this mean for the future of AI in the European Union? The stakes are high. Failure to address this compliance gap could stifle innovation or lead to costly legal battles. Brussels moves slowly, but when it moves, it moves everyone. The creation of frameworks like PSALM might just be the impetus needed for a harmonized approach to AI compliance across the EU's 27 member states.
Key Terms Explained #
Artificial Intelligence: The science of creating machines that can perform tasks requiring human-like intelligence, such as reasoning, learning, perception, language understanding, and decision-making.
Fine-Tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
LLaMA: Meta's family of open-weight large language models.
LLM: Large Language Model.