Do Microsoft and OpenAI Have a Fair Use Defence against the NYT Copyright Complaint?
Aleksandr Tiulkanov
Upskilling people in the EU AI Act - link in profile | LL.M., CIPP/E, AI Governance Advisor, implementing ISO 42001, promoting AI Literacy
To follow up on yesterday's post about The New York Times vs Microsoft and OpenAI: people are asking me whether the fair use reasoning from Authors Guild v Google (the Google Books decision) applies to a situation where chatbots are allegedly providing users with near-wholesale copies of NYT articles.
What is Fair Use?
To answer this question, we need some background. US copyright law includes the doctrine of fair use, codified in Section 107 of the Copyright Act of 1976, which permits an otherwise copyright-infringing use, provided the court decides that in a particular case, all things considered, that use is "fair".
This is assessed against the following four factors, weighed on balance (that is, none of the four is decisive alone):
1. The purpose and character of the otherwise infringing use, including whether it is commercial;
2. The nature of the copyrighted work, i.e. whether it leans towards purely factual description of the world or towards highly subjective and original creative expression;
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. The effect of the use upon the potential market for, or value of, the copyrighted work.
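To make the "assessed on balance" idea concrete, here is a deliberately crude sketch in Python. This is an illustrative abstraction only: courts weigh the four factors holistically, not arithmetically, and the class name, the three-value scoring scale, and the example assessment below are all my own hypothetical constructs, not anything taken from the statute or the complaint.

```python
from dataclasses import dataclass

@dataclass
class FairUseAssessment:
    """Hypothetical model of the four Section 107 factors.

    Each factor takes -1 (weighs against fair use), 0 (neutral),
    or +1 (weighs in favour of fair use). Real analysis is a
    holistic judicial judgment, not a numeric score.
    """
    purpose_and_character: int
    nature_of_work: int
    amount_and_substantiality: int
    market_effect: int

    def leaning(self) -> str:
        # "On balance" here is modelled as a simple sum;
        # no single factor decides the outcome by itself.
        total = (self.purpose_and_character + self.nature_of_work
                 + self.amount_and_substantiality + self.market_effect)
        if total > 0:
            return "leans towards fair use"
        if total < 0:
            return "leans against fair use"
        return "balanced"

# A hypothetical reading of the positions discussed in this article
# (the -1/+1 values are my illustrative guesses, two of them disputed):
nyt_case = FairUseAssessment(
    purpose_and_character=-1,      # commercial use
    nature_of_work=+1,             # largely factual reporting
    amount_and_substantiality=-1,  # alleged near-wholesale copying (disputed)
    market_effect=-1,              # alleged market substitution (disputed)
)
print(nyt_case.leaning())  # -> leans against fair use
```

The point of the sketch is only the structure: each factor contributes a direction, and the conclusion emerges from their combination rather than from any one factor alone.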
Can the Alleged Use of the NYT Materials in Chatbots Be Deemed Fair?
1. Purpose and character of use: If the complaint is to be trusted, this factor very likely weighs against the defendants, as OpenAI and Microsoft's use of NYT article content for training LLMs is commercial. They do not open-source their models and weights as, for example, Mistral currently does.
2. Nature of copyrighted works: This second criterion, in my estimation, slightly favours OpenAI and Microsoft, as regular newspaper material normally leans towards factuality and descriptiveness. The NYT articles, one would suppose, are not generally fiction, and creativity is only a secondary objective.
3. Amount and substantiality of the used portions: The degree to which this third fair use criterion is met is still to be determined. The NYT has produced a voluminous exhibit full of almost wholesale-copied NYT articles, but many commentators have questioned the NYT's prompting tactics.
That is, an average chatbot user is allegedly very unlikely to obtain the same outputs full of wholesale-copied NYT material. Further evidence and independent assessment may be necessary to establish whether the NYT has a case on this third factor, and OpenAI has both the time and the means to alter its LLM outputs so that no further wholesale copying occurs, if it ever did.
4. Effect on market and value of copyrighted works: Whether this fourth fair use factor is met also remains to be determined. The NYT argues that the defendants' use indeed falls afoul of it.
It is on this fourth (market effect) and also third (amount of the used portions) criterion that this case differs from Google Books, where Google prevailed on fair use grounds.
In Authors Guild v Google, the United States Court of Appeals for the Second Circuit said: "Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals."
Google Books limits the viewable portion of each book to such an extent that using that service cannot substitute for actually buying the book. In the case of newspaper articles, you cannot deliver pages worth of content without infringing on the copyright — if that is in fact what happens.
The defendants' chatbot outputs are also arguably very different from traditional web search results. In the case at hand, the NYT alleges that, unlike traditional search engine-delivered snippets, the outputs of ChatGPT, Bing Chat and the like extensively reproduce the NYT articles and do not provide prominent hyperlinks to them.
This way, the defendants arguably disincentivise users from visiting NYT resources, as chatbot outputs may in fact serve as an adequate substitute for reading the articles themselves. OpenAI and Microsoft may therefore in fact be competing in the very market in which the NYT itself operates.
And by obviating the need for the users to visit the NYT resources, defendants are arguably preventing the NYT from obtaining the profits they would otherwise get from letting the users through the paywall and from ad and referral revenue.
This holds, at least, if we take the evidence in the exhibit to the NYT complaint to be a true representation of how the defendants' chatbots actually operate, which, as I have noted, is already disputed by some commentators.
What's Next?
The fate of this case remains to be seen. In fact, it may never reach the stage at which the court renders a final decision: some commentators suggest that the complaint is merely a negotiating tactic by the NYT to push the defendants towards a royalty scheme acceptable to the claimant.
If the parties settle, that would not be an unexpected outcome. However, from a societal and business perspective, much more clarity would be gained if the case proceeds to trial and a precedent-setting decision is made (and affirmed on appeal) on whether the use of copyrighted material for training AI systems similar to those operated by OpenAI and Microsoft constitutes fair use.
To that end, it would be helpful to have robust research on the extent to which operating generative AI systems at scale affects the market for and value of the original copyrighted works on which they are trained, and on the extent to which consumers consider these systems' outputs adequate substitutes for those original works. Chances are, some researchers will be helping us on this front soon.