A planned code of practice would reinforce the EU’s artificial intelligence copyright problem
The European Union is continuing to unnecessarily restrict access to copyright-protected content for AI model training

Under the European Union’s Artificial Intelligence Act (Regulation (EU) 2024/1689), transparency and copyright measures apply to all general-purpose AI models, irrespective of size. A code of practice (CoP) for the implementation of the AI Act is under preparation. A draft, issued on 11 March, made progress on some issues but is still unnecessarily restrictive on copyright.
The CoP contains a bundle of draft copyright obligations. Some of these make sense. AI models should not directly reproduce copyright-protected content, such as text, music or video, in response to user queries (CoP Measure I.2.5). Models should also only use lawfully accessible content for model training, not pirated content or content accessed by circumventing technological protections (Measure I.2.2).
Another measure (I.2.3) is more problematic. It requires AI developers to respect machine-readable protocols put in place by copyright holders to signal explicit opt-outs or to prohibit use of their data, in line with the EU Copyright Directive (specifically Article 4(3) of Directive (EU) 2019/790).
There is a wide variety of these protocols, and little standardisation. That complicates detection and interpretation by model developers. Moreover, the number of opt-outs is growing rapidly, currently covering 25%-30% of all online content – a share that is increasing. That implies a very substantial reduction in training data for AI models, and in model quality, unless model developers are willing to pay licence fees to copyright holders.
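One widely used machine-readable opt-out mechanism is the robots.txt protocol, which some rightsholders now use to bar AI crawlers. A minimal sketch of how a model developer's crawler might check such an opt-out before fetching content, using Python's standard library (the bot names and URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder: it opts out of
# crawling by a named AI training bot while remaining open to other crawlers.
robots_txt = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant training crawler checks permission before fetching each page.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # False: opted out
print(parser.can_fetch("OtherBot", "https://example.com/article"))      # True: allowed
```

This only covers one protocol; the standardisation problem the text describes is precisely that rightsholders also signal opt-outs through HTTP headers, HTML meta tags and various proprietary schemes, each of which a crawler would have to detect and interpret separately.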
This is where the CoP falls short. It offers no workable mechanism for large-scale licensing. Model developers do negotiate licences directly with a few large publishers, but reaching out to millions of rightsholders would run into insurmountable transaction costs and unrealistic licensing fee demands. That results in biased training datasets. Moreover, it is likely to reduce competition because AI start-ups cannot afford licence fees. Collective licensing only shifts the identification and negotiation problem to intermediaries.
The CoP leaves open several ways to bypass these copyright measures. It applies only to training data harvested on the web, not to other sources of data. It is more lenient for AI training data collected by third-party intermediaries, including research intermediaries that are not covered by copyright opt-outs. Recent ‘reasoning’ AI models that work off data extracted from larger ‘teacher’ models and allow more time for reasoning when responding to user queries (DeepSeek is a ‘student’ model that does this) may bypass these CoP copyright measures. AI models themselves are not copyright protected. Furthermore, decentralised model-training technologies, such as federated learning, can also avoid reproduction of original copyright-protected content.
There is no economic need for the CoP copyright measures, beyond the measure barring direct reproduction (I.2.5), because rightsholders’ economic interests are not harmed by the reuse of their content for model-training purposes only. Apart from a few very large publishers, copyright holders will not gain because model developers will not go for widespread and costly licensing.
More importantly, implementation of these copyright measures would be harmful for society. They go against calls to increase European Union competitiveness. They will reduce the quality of AI models available in the EU, slow down AI-driven innovation and reduce the attractiveness of the EU as a location for AI model training. Other jurisdictions offer more liberal copyright regimes for AI model training.
The proposed CoP copyright measures illustrate how deeply EU copyright law is biased in favour of private rights for creators at the expense of using copyright as an economic-policy instrument to promote collective innovation and competitiveness. This is perhaps best captured by a CoP measure (I.2.3(5)) that instructs model developers to ensure that copyright opt-outs do not negatively affect the findability of copyright-protected content by search engines.
In plain language, model developers should thus collect opted-out content and feed it into the machine-learning algorithms of their search engines, but should prevent its use in their AI model algorithms. The CoP thus allows individuals to find and learn from copyright-protected content through search engines but prohibits machine learning from that same content through an intermediary AI model.
Human cognitive capacity to search, collect and learn from large volumes of data and documents is limited. AI models are a tool to overcome such capacity constraints, especially when AI models, in the near future, turn into personal assistants that will do document collection and pre-processing for humans. The CoP would block that route to individual and collective benefits, in order to preserve the private rights of authors. That puts a major brake on learning and innovation in society. The time may have come for a fundamental revision of EU copyright law, to put AI-assisted human learning on an equal footing with direct learning.