AI and copyright: the Text and Data Mining exception with respect to training AI models
In the rapidly evolving world of AI, the training of AI models hinges on vast amounts of content (text, pictures, video and other data), which may be protected by copyright. The intersection of AI development and copyright law creates an inherent tension between AI developers who need to use such content to train their AI systems, and the creators of that content, who may wish to oppose the use of their copyright-protected works for such purposes.
The EU AI Act brings this issue to the fore, highlighting the need to respect the rights of copyright rightsholders while addressing the practicalities of AI training. Among other obligations, Article 53(1)(c) of the EU AI Act requires providers of general-purpose AI models to put in place a policy to comply with EU copyright law and, in particular, to identify and comply with rightsholders’ reservations of rights under the text and data mining (TDM) exception in Article 4(3) of Directive (EU) 2019/790. The Commission has already published the first draft of the General-Purpose Artificial Intelligence Code of Practice, which, among other things, addresses these copyright obligations.[1]
In this blog post, we explore the TDM exception and its relevance to AI. We first examine the nature of the exception under EU law and then compare it with the position under UK law, considering how these legal frameworks affect the development of AI and the delicate balance between innovation and intellectual property rights.
The EU TDM exception
EU copyright law permits reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining, an exception that therefore also applies to the training of AI models.[2] This exception is crucial for AI development: without it, AI developers would find it extremely difficult to gather content to train AI systems, as the permission of each individual rightsholder would always be required.
In the EU, the TDM exception applies broadly for the purposes of scientific research (Article 3) and more narrowly for other purposes, including commercial purposes (Article 4). In the latter case, the rightsholder has a right to “opt out” of the TDM exception. This right was included to address rightsholders’ concerns about undue use of their copyright works.
The effect of the opt-out is that a rightsholder can reserve their rights, provided that this is done in an appropriate manner (such as through machine-readable means in the case of content made publicly available online). If a rightsholder chooses to reserve their rights, an AI developer would need to seek permission to copy the rightsholder’s copyright work.
There is no uniform way for a rightsholder to opt out. This may create uncertainty for rightsholders, who may be concerned about whether their opt-out has been stated clearly and explicitly enough to prevent their content from being used to train AI systems. A rightsholder may opt out, for example, through metadata, through the terms and conditions of a website or service, or by other means such as a contractual agreement or a unilateral declaration.
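For content published online, machine-readable signals are the part of an opt-out that can be checked automatically. The Python sketch below is illustrative only and rests on assumptions that are not in this article: it treats two conventions as candidate opt-out signals, namely a robots.txt rule addressed to the crawler’s user agent (read with the standard library’s urllib.robotparser) and the tdm-reservation property of the W3C community group’s TDM Reservation Protocol (TDMRep), checked as an HTTP header and as an HTML meta tag. The function name detect_opt_out and the crawler name ExampleAIBot are hypothetical. Whether any of these signals amounts to an “appropriate” reservation of rights is precisely the legal uncertainty described above, and an empty result does not mean that no opt-out exists, since a reservation may sit in terms and conditions or a contract that no script can read.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _TdmMetaParser(HTMLParser):
    """Looks for <meta name="tdm-reservation" content="1"> (TDMRep convention, assumed)."""

    def __init__(self):
        super().__init__()
        self.reserved = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").strip().lower() == "tdm-reservation":
            self.reserved = values.get("content", "").strip() == "1"


def detect_opt_out(url, user_agent, robots_txt, headers, html):
    """Return a list of machine-readable opt-out signals found for one URL.

    An empty list only means none of the checked signals was found; it is not
    evidence that the rightsholder has not reserved their rights by other means.
    """
    signals = []

    # 1. robots.txt rules addressed to this crawler (or to all crawlers).
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        signals.append("robots.txt disallows this crawler")

    # 2. TDMRep-style HTTP response header; header names are case-insensitive.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    if normalised.get("tdm-reservation") == "1":
        signals.append("tdm-reservation HTTP header")

    # 3. TDMRep-style HTML meta tag in the page itself.
    meta = _TdmMetaParser()
    meta.feed(html)
    meta.close()
    if meta.reserved:
        signals.append("tdm-reservation meta tag")

    return signals


if __name__ == "__main__":
    robots_txt = "User-agent: ExampleAIBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    url = "https://example.com/articles/1"
    # The hypothetical ExampleAIBot is blocked by robots.txt.
    print(detect_opt_out(url, "ExampleAIBot", robots_txt, {}, ""))
    # Another crawler may fetch the page, but TDMRep signals are present.
    print(detect_opt_out(
        url,
        "OtherBot",
        robots_txt,
        {"TDM-Reservation": "1"},
        '<meta name="tdm-reservation" content="1">',
    ))
```

The sketch reports every signal it finds rather than stopping at the first, so that the record of which reservation was detected (and when) can be kept alongside the collected content.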
The TDM exception equally creates uncertainty for AI developers. They must ensure that appropriate checks are undertaken at various stages of training an AI model and that a rightsholder’s reservation of rights is always observed. A further complication is that it may not be clear at which point a rightsholder’s opt-out must be taken into account: for example, whether only at the initial collection of copyright content, or whether regular opt-out assessments are required throughout the AI training lifecycle and, if so, whether specific content then needs to be deleted at different stages. In addition, reproductions and extractions of copyright works may only be retained for as long as is necessary for the purposes of TDM; once an AI model has been trained, any TDM content that is no longer necessary will need to be deleted. Finally, it may not always be clear who bears the burden of proof in a dispute over the use of a copyright work, including as to whether the TDM exception applies in a given instance: the rightsholder, the AI developer or the AI provider, as the case may be.
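One way an AI developer might operationalise a conservative reading of these open questions is to repeat an opt-out and retention review at each stage of the training lifecycle. The Python sketch below is a hypothetical illustration, not a prescribed method: it assumes that each corpus item records its source URL and a still_needed flag set by a separate (legal, not technical) assessment of whether the copy remains necessary for the TDM purpose. CorpusItem and review_corpus are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusItem:
    item_id: str
    source_url: str
    # Hypothetical flag: set by a separate assessment of whether this copy is
    # still necessary for the TDM purpose.
    still_needed: bool = True


def review_corpus(items, opted_out_urls, training_complete):
    """Split items into (retained, to_delete) at one review point."""
    retained, to_delete = [], []
    for item in items:
        if item.source_url in opted_out_urls:
            # Conservative choice: honour opt-outs recorded since collection,
            # not only those present at the time of collection.
            to_delete.append(item)
        elif training_complete and not item.still_needed:
            # After training, copies no longer necessary for TDM are flagged.
            to_delete.append(item)
        else:
            retained.append(item)
    return retained, to_delete


if __name__ == "__main__":
    corpus = [
        CorpusItem("a", "https://example.com/1"),
        CorpusItem("b", "https://example.com/2"),
        CorpusItem("c", "https://example.com/3", still_needed=False),
    ]
    # e.g. a reservation of rights added after the content was collected
    opted_out = {"https://example.com/2"}
    kept, removed = review_corpus(corpus, opted_out, training_complete=False)
    print([i.item_id for i in kept], [i.item_id for i in removed])
```

Running the review with training_complete=False removes only the opted-out item; running it again once training has finished would also flag item "c" for deletion. Whether such repeated reviews are legally required is one of the open questions noted above.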
It is also worth mentioning that the TDM exception has been transposed into the national laws of the EU Member States, so each Member State’s implementation and interpretation of the exception will also need to be considered on a country-by-country basis.
The UK TDM exception
In contrast, the UK TDM exception is considerably narrower than that of the EU. A copyright work may only be copied for computational analysis of anything recorded in that work for the sole purpose of research for a non-commercial purpose.[3] Under UK law, any contractual term purporting to prevent or restrict the making of such copies for non-commercial research would be unenforceable. However, because the exception is confined to non-commercial research, it offers little scope for training AI models for commercial purposes.
To date, the UK has not adopted a legislative approach to AI,[4] and there is no UK legislation akin to the EU AI Act. The UK favours a more flexible and agile approach to reform, and there are currently no plans to introduce AI-specific regulation.
The UK government was planning to introduce a new copyright and database exception for TDM, which would have expanded the TDM exception to allow it for any purpose (both commercial and non-commercial). The UK Intellectual Property Office also announced in May 2023 that it would introduce a Code of Practice on Copyright and AI (AI Code of Practice),[5] which would have been drafted in consultation with relevant UK industry experts. However, these plans were shelved.
Had the UK government gone ahead with its proposed reforms, the scope of the TDM exception in the UK would have expanded significantly. Not only would the exception have covered copyright works used to train AI models for commercial applications, but rightsholders would also have had no right to opt out of it. Neither the reforms nor the AI Code of Practice[6] came into being, mainly because of strong objections from rightsholders, especially those in the creative industries, who believed the changes unfairly benefited the licence holders and AI developers. In February 2024, the UK government issued a statement[7] confirming that the working group convened to develop the AI Code of Practice would not be progressing with it.
At the time of writing, the UK’s position in terms of any future reform in relation to the TDM exception for commercial uses of AI is not clear. Those involved in the AI supply chain should therefore familiarise themselves with the current UK regime, having regard to any future changes that may arise, and particularly with respect to the divergence between UK and EU law in this area.
Conclusion
As we have seen, permitted TDM activities are far narrower in scope in the UK than in the EU. When collecting content to train AI models, everyone in the AI supply chain must understand the copyright rules in each relevant jurisdiction and how the exceptions might apply to their activities. They also need to be aware of the specific nuances and differences between jurisdictions, including between individual EU Member States and between the EU and the UK.
Authors: Sarah Reynolds, Emma Cartwright, Kelly Matthyssens, Dr. Peter Katko
This publication contains information in summary form and is therefore intended for general guidance only. It is not intended to be a substitute for detailed research or the exercise of professional judgment. Member firms of the global EY organization cannot accept responsibility for loss to any person relying on this article.
[2] Articles 3 and 4 of Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC.
[3] See s29A of the Copyright, Designs and Patents Act 1988 (CDPA), which permits the making of copies for text and data analysis for non-commercial research; s29A(1) allows a copy of a copyright work to be made for computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose.
[4] In March 2023, the UK government’s Department for Science, Innovation and Technology (DSIT) published an AI white paper, “A pro-innovation approach to AI regulation” (AI white paper), which accords with Sir Patrick Vallance’s recommendations in the policy paper “Pro-innovation Regulation of Technologies Review: Digital Technologies” (Vallance Report).
[5] See the IPO’s proposed Code of Practice here: The government’s code of practice on copyright and AI - GOV.UK (www.gov.uk).
[6] The stated aims of the AI Code of Practice were to help overcome barriers faced by AI firms and users, ensure protections for rightsholders, promote and reward investment in creativity, and support the aim of the UK being a world leader in research and AI innovation.
[7] See the government’s “Publication of AI Regulation White Paper Consultation Response” here: Written statements - Written questions, answers and statements - UK Parliament.