
AI and copyright: the Text and Data Mining exception with respect to training AI models

Dr. Peter Katko

Digital and AI Law Leader, EY Law

Published: 6 March 2025

In the rapidly evolving world of AI, the training of AI models hinges on vast amounts of content (text, images, video and other data), which may be protected by copyright. The intersection of AI development and copyright law creates an inherent tension between AI developers, who need to use such content to train their AI systems, and the creators of that content, who may wish to object to the use of their copyright-protected works for such purposes.

The EU AI Act brings this issue to the fore, highlighting the need to respect the rights of copyright rightsholders while addressing the practicalities of AI training. Among other obligations, Article 53(1)(c) of the EU AI Act requires providers of general-purpose AI models to put in place a policy to comply with EU copyright law and, in particular, to identify and comply with rightsholders’ reservations of rights under the text and data mining (TDM) exception in Article 4(3) of Directive (EU) 2019/790. The Commission has already published the first draft of the General-Purpose Artificial Intelligence Code of Practice, which includes measures on this copyright obligation.[1]

In this blog post, we examine the TDM exception and its relevance to AI. We start with the nature of the exception under EU law and then compare it with the position under UK law, considering how these legal frameworks affect the development of AI and the delicate balance between innovation and intellectual property rights.

The EU TDM exception

EU copyright law permits reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining, and this permission also extends to the training of AI models.[2] The exception is crucial for AI development: without it, AI developers would find it extremely difficult to gather content to train AI systems, because the permission of each individual rightsholder would always be required.

In the EU, TDM is permitted broadly for the purposes of scientific research carried out by research organisations and cultural heritage institutions (Article 3 of Directive (EU) 2019/790), and more narrowly for other purposes, including commercial purposes (Article 4). In the latter case, the rightsholder has a right to “opt out” of the TDM exception. This right of opt-out was included to address rightsholders’ concerns about undue use of their copyright works.

The effect of the opt-out is that a rightsholder can reserve their rights, provided that this is done in an appropriate manner (such as through machine-readable means in the case of content made publicly available online). If a rightsholder chooses to reserve their rights, an AI developer would need to seek permission to copy the rightsholder’s copyright work.

There is no uniform way for a rightsholder to opt out. This may create uncertainty for rightsholders, who may be unsure whether their opt-out has been stated clearly and explicitly enough to prevent their content from being used to train AI systems. Ways to opt out include metadata, the terms and conditions of a website or service, and other means such as a contractual agreement or a unilateral declaration.
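By way of illustration only, the Python sketch that follows shows how an AI developer might check one commonly discussed machine-readable signal, a website’s robots.txt file, before collecting content. It uses the standard library’s urllib.robotparser; the crawler name ExampleAIBot, the sample robots.txt content and the helper may_crawl are hypothetical. Whether a robots.txt entry on its own amounts to a valid reservation of rights under Article 4(3) is not settled, and a check of this kind does not capture reservations expressed in terms and conditions, contracts or unilateral declarations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. In practice it would be fetched from
# https://<site>/robots.txt before any content is collected.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt does not disallow `url` for `user_agent`.

    Hypothetical helper. True only means that no robots.txt restriction
    was found for this crawler and path; it does not establish lawful
    access or rule out a reservation of rights made by other means.
    """
    parser = RobotFileParser()
    # Parse the supplied lines directly instead of fetching over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/articles/1", "https://example.com/private/x"):
            print(agent, url, may_crawl(SAMPLE_ROBOTS_TXT, agent, url))
```

Run as a script, it reports that ExampleAIBot may fetch neither URL, while other crawlers are excluded only from /private/. A real compliance process would also need to record when each check was made, since an opt-out may be added or changed after content is first collected.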

The TDM exception equally creates uncertainty for AI developers. They must ensure that appropriate checks are carried out at the various stages of training an AI model and that rightsholders’ reservations of rights are always respected. A further complication is that it may not be clear at what point a rightsholder’s opt-out must be taken into account: only at the initial collection of copyright content, or on a regular basis throughout the AI training lifecycle, and, if the latter, whether specific content then needs to be deleted at different stages. Reproductions and extractions of copyright works may only be retained for as long as necessary for the purposes of TDM; once an AI model has been trained, any TDM content that is no longer necessary will need to be deleted. It may also be unclear who bears the burden of proof in a dispute over the use of a copyright work, including whether the TDM exception applies in a given instance: the rightsholder, the AI developer or the AI provider, as the case may be.

Finally, because the TDM exception is set out in an EU Directive, it has been transposed into the national laws of the individual Member States, so each Member State’s implementation and interpretation of the exception will also need to be considered on a country-by-country basis.

The UK TDM exception

By contrast, the UK TDM exception is much narrower than its EU counterpart. A copyright work may only be copied for computational analysis of anything recorded in that work for the sole purpose of research for a non-commercial purpose.[3] Under UK law, any contractual term purporting to prevent or restrict TDM for non-commercial research purposes is unenforceable. However, because the exception is limited to non-commercial research, it offers little scope for training AI models for commercial purposes.

To date, the UK has not adopted a legislative approach to AI,[4] and there is no UK legislation akin to the EU AI Act. The UK favours a more flexible and agile approach to reform, and there are currently no plans to introduce AI-specific regulation.

The UK government had planned to introduce a new copyright and database exception for TDM, which would have expanded the TDM exception to cover any purpose, commercial or non-commercial. The UK Intellectual Property Office (IPO) also announced in May 2023 that it would introduce a Code of Practice on Copyright and AI (AI Code of Practice),[5] to be drafted in consultation with relevant UK industry experts. However, these plans were shelved.

Had the UK government gone ahead with its proposed reforms, the scope of the TDM exception in the UK would have expanded significantly: the exception would have covered copyright works used to train AI models with commercial applications, and rightsholders would not have been able to opt out of it. The reforms and the AI Code of Practice[6] never came into being, mainly because of strong objections from rightsholders, especially those in the creative industries, who believed the changes unfairly benefited licence holders and AI developers. In February 2024, the UK government issued a statement[7] confirming that the IPO-led working group would not be progressing with the AI Code of Practice.

At the time of writing, it is not clear whether, or how, the UK will reform the TDM exception for commercial uses of AI. Those involved in the AI supply chain should therefore familiarise themselves with the current UK regime, keep track of any future changes, and pay particular attention to the divergence between UK and EU law in this area.

Conclusion

As we have seen, permitted TDM activities are far narrower in scope in the UK than in the EU. When collecting content to train AI models, everyone in the AI supply chain must understand the copyright rules in each relevant jurisdiction and how the exceptions might apply to their activities. They also need to be aware of the specific nuances and differences, which can vary both within and outside their own jurisdiction.

Authors: Sarah Reynolds, Emma Cartwright, Kelly Matthyssens, Dr. Peter Katko

This publication contains information in summary form and is therefore intended for general guidance only. It is not intended to be a substitute for detailed research or the exercise of professional judgment. Member firms of the global EY organization cannot accept responsibility for loss to any person relying on this article.

[1] https://digital-strategy.ec.europa.eu/en/news/commission-publishes-first-draft-general-purpose-artificial-intelligence-code-practice.

[2] Articles 3 and 4 of Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.

[3] See s29A of the Copyright, Designs and Patents Act 1988 (CDPA), which permits the making of copies for text and data analysis for non-commercial research; s29A(1) allows a person with lawful access to a copyright work to copy it for computational analysis of anything recorded in the work for the sole purpose of non-commercial research.

[4] In March 2023, the UK government’s Department for Science, Innovation and Technology (DSIT) published an AI white paper, “A pro-innovation approach to AI regulation” (AI white paper), which accords with Sir Patrick Vallance’s recommendations in the policy paper “Pro-innovation Regulation of Technologies Review: Digital Technologies” (Vallance Report).

[5] See the IPO’s proposed Code of Practice: “The government’s code of practice on copyright and AI”, GOV.UK (www.gov.uk).

[6] The stated aims of the AI Code of Practice were to help overcome barriers faced by AI firms and users, ensure protections for rightsholders, promote and reward investment in creativity, and support the UK’s aim to be a world leader in research and AI innovation.

[7] See the government’s written statement “Publication of AI Regulation White Paper Consultation Response”, UK Parliament (Written questions, answers and statements).

EU AI Act in Practice
