Generative AI - Prolific Copyright Infringer?

Jason Flaks

Head of AI and Search @ Simpplr | Executive Leadership | Conversational AI Pioneer

Published: January 22, 2024

So, you might be wondering, "What makes this guy fit to pen an article about generative AI and copyright infringement?" I mean, I'm no copyright lawyer, nor do I moonlight as one on television. But I do bring a unique viewpoint to the table. After all, I've dabbled in the music industry and have been up to my elbows in machine-learning projects for a good chunk of my career. But perhaps my best qualification is my long-standing fascination with copyright law, which started when I was just a kid. That image up top isn't some AI-generated piece from Midjourney; it shows my earnest attempts at copyrighting my original music over three decades ago using the Poor Man’s copyright approach.

So, What’s Copyright, and How Do I Get One?

Before we dive into the meaty debate of whether Generative AI infringes on anyone’s copyright, let's clarify what copyright means. According to the US Copyright Office, copyright is a "type of intellectual property that protects original works of authorship as soon as an author fixes the work in a tangible form of expression." In simpler terms, copyright asserts your ownership of any intangible creations of human intellect (e.g., music, art, writing, etc.). The moment you fix your creation to a physical form (e.g., MP3 file, canvas, video recording, piece of paper, etc.), you have a copyright.

Amazingly, you don't need to register for an official copyright to have one. So, why register a copyright? The Supreme Court has decided that to sue for copyright infringement, you must have registered your copyright with the US Copyright Office. However, they've also clarified that registering to sue is separate from the date of creation. This means I could register a copyright for this blog post years from now but still sue for any earlier infringement, as long as I can prove the date of creation.

How Can Generative AI Infringe on a Copyright?

There's been a lot of talk about generative AI infringing on copyright protections, but many of these discussions oversimplify the issue. There are actually three different ways Generative AI can infringe on your copyright, some favoring artists/creators and some more favorable to the generative AI companies.

Theft (copying) of copyrighted material
Distribution of copyrighted material
Use of copyrighted material in derivative or transformative works

On Theft (Copying) of Copyrighted Material

Let's get real: while there are numerous court cases establishing the legitimate right to duplicate copyrighted material under the fair use doctrine, the default assumption is and should be that it is illegal to do so. Therefore, if we can determine that generative AI companies are using copies of content they did not pay for or get permission to use and using that content in a way that falls outside fair use, then we can assume they are stealing copyrighted material.

There is little or no debate that generative AI companies are using copyrighted material. After all, OpenAI basically admits to lifting its training data from content “that is publicly available on the internet” on its website. And as I discussed earlier, just about anything newly written on the internet has an inherent copyright. But beyond possible scraping of my blog post, there is sufficient evidence that generative AI companies have ingested copyrighted books, images, and more.

And if you need any more proof, look at the image below where I attempted to elicit lyrics from Bob Dylan’s Blowin' in the Wind from ChatGPT. ChatGPT both knew the lyrics I provided were from the song, and ChatGPT was able to quote a portion of the lyrics I did not provide. It can only do that because it has seen the lyrics before in its training data set.

If there is no question that copyrighted material was used in the training process, then we only need to assess whether the copying should be considered fair use. There are multiple justifications for fair use in copyright law. Some are easy to interpret, and others more difficult. Items like research or scholarly use are reasonably easy to assess, and I can find no fair argument that generative AI companies are using copyrighted material in either capacity.

So, that leaves the last question in fair use: does the copying materially impact the monetization of the content? And I think the answer here again is quite simple: YES! The easiest example I can give is the artwork I regularly use in my blog posts. I’ve traditionally paid for the art I use via services like Dreamstime. If Midjourney or Stable Diffusion trained on this type of art and I subsequently generate my blog post art via their services, I may never pay for art via Dreamstime or other similar services again. And in doing so, those artists have lost a way to monetize their art, and they are not equally compensated by the generative AI companies.

On Distribution of Copyrighted Material

If you’re old like me, you may remember those FBI copyright warnings that regularly made an appearance on DVDs and VHS tapes.

The unauthorized reproduction or distribution of this copyrighted work is illegal …

The issue of whether these systems distribute the content in its original form with little transformation is a big one. This distribution can occur in two ways: to end customers and to data annotators.

To end customers

Generative AI models are basically next-word (pixel, etc.) predictors. They aim to provide the most statistically likely next word based on a previous sequence of words. As a result, these models will, without any special adaptations, spit back exact copies of text, images, etc., especially in low-density areas of the training data, where a given sequence has few alternative continuations. As you can see from the image in the previous section, while OpenAI has been proactively trying to adapt the system not to distribute copyrighted material, I was still able to get it to do so with very little effort on my part.
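To make the "most statistically likely next word" idea concrete, here is a minimal sketch in Python. It is not how ChatGPT or any production model works; those are neural networks, while this is a toy lookup table of word counts. The `train` and `generate` helpers and the training sentence are made up for illustration. What it shows is the failure mode itself: when each context appears only once in the training text, the most likely continuation is the training text, word for word.

```python
from collections import Counter, defaultdict


def train(tokens, order=2):
    """Count which token follows each `order`-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        counts[context][tokens[i + order]] += 1
    return counts


def generate(counts, prompt, max_new_tokens, order=2):
    """Greedily append the most frequent next token for the current context."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        context = tuple(out[-order:])
        if context not in counts:
            break  # the toy model has never seen this context
        out.append(counts[context].most_common(1)[0][0])
    return out


# Hypothetical training text, invented for this example.
corpus = "the quick brown fox jumps over the lazy dog near the quiet river bank".split()
model = train(corpus)

# Prompt with the first two words and let the model continue.
completion = generate(model, ["the", "quick"], max_new_tokens=5)
print(" ".join(completion))
```

Given only the first two words as a prompt, the toy model continues with the next five words of its training sentence, because no competing continuation exists for any of those contexts. With more varied training data the counts would spread across alternatives and exact copies would become less likely.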

So while these generative AI systems will continue to try and put mitigations in place to prevent the distribution of copyrighted content, there is little or no debate that they have been doing so all along. And they are likely to continue doing so as it is impossible to close every hole in the system.

To data annotators

OpenAI and others use reinforcement learning from human feedback (RLHF) to improve their models. RLHF requires that outputs from an original model are shown to human annotators to help build a reward model that leads to better outputs from the generative model. If these human annotators were shown copyrighted material, in an effort to reward the model for not reproducing it in the future, OpenAI and other generative AI companies would clearly be distributing copyrighted content.
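A rough sketch of the reward-model step may help show why annotators have to see the outputs. This is an illustration under my own assumptions, not OpenAI's pipeline: it uses a pairwise preference loss (a common choice for reward models), a single hypothetical feature per output (the fraction of the output copied verbatim), and invented annotator judgments. Each training pair only exists because a person looked at two outputs and picked one.

```python
import math


def pairwise_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood that the annotator's preferred output wins."""
    margin = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-margin))


def train_reward_weight(pairs, steps=200, lr=0.5):
    """Fit reward(x) = w * x by gradient descent on the pairwise loss."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for preferred, rejected in pairs:
            diff = preferred - rejected
            margin = w * diff
            # Derivative of log(1 + exp(-margin)) with respect to w.
            grad += -diff / (1.0 + math.exp(margin))
        w -= lr * grad / len(pairs)
    return w


# Each pair is (feature of preferred output, feature of rejected output).
# The feature is a hypothetical "fraction copied verbatim" score in [0, 1];
# the values are invented and stand in for annotator choices.
pairs = [(0.0, 0.9), (0.1, 0.8), (0.05, 1.0)]
w = train_reward_weight(pairs)
print(f"learned weight: {w:.2f}")
```

Because the annotators in this made-up data always prefer the output with less copying, the learned weight comes out negative, so the reward drops as copying goes up. The rejected examples are the point: to label them, someone had to be shown the copied material.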

You might ask, “Shouldn’t copyright holders be happy that OpenAI is trying to train their models not to distribute copyrighted content?” Well, maybe, but if I started traveling the country tomorrow, giving a for-profit seminar on how to detect illegal copies of the Super Bowl, and in these seminars, I played previous Super Bowl recordings to the attendees without the NFL's permission … I think the NFL would have a problem with that.

On Use of Copyrighted Material in Derivative or Transformative Works

The question of whether the output generated by Generative AI models, when not a direct reproduction, counts as copyright infringement is a murky one. There are many examples where courts have determined that "style" is not copyrightable. There are further questions on whether any output created by generative AI based on copyrighted material is derivative or transformative. Truth be told, it can likely be either, depending on how the model is prompted. So it’s actually quite difficult to say for sure if the resulting output from generative AI models is fair use or copyright infringement. We're left then with questions about who is really violating copyright in any of these cases. Is it the model or the company that owns it? Or is it the user who prompted the model to generate the content? And does any of it really matter unless that generated content is published?

The Road Ahead

It seems to me the issue of generative AI and copyright has been complicated more than necessary. Generative AI companies must find a way to pay for the content they use to train their models. If they distribute the content, they may need to find a way to pay royalties. Otherwise, these generative AI companies are profiting off the works of creators without properly compensating them. And that just isn’t fair.

For artists, don't let the thought of generative AI copying your style without compensation scare you. These models can't generate new content and are limited to what they've seen in their training set. So, keep making new art, keep pushing boundaries, and if we solve the first problem of content theft and distribution, you'll continue to be paid for the amazing work you create.

This article first appeared on the Speech Wrecko blog at www.speechwrecko.com.

Dzmitry Shyshko

Lead Software Testing Engineer

8 months ago

The scariest part is what will happen when it becomes more popular in daily use. Easy example is stackoverflow. I'm pretty sure that majority of their visits are happening thanks to old questions. But with AI you don't need to visit it like 80% of the time. You can just get direct answer copy pasted by AI from stackoverflow with 0 browsing. Can such portals survive this? If they can't and we lose original source of information - can AI survive this? How AI will generate answers for some new programming language questions if there are basically no portals discussing this new language anymore. It's as self destructive as it can be

Iroro O.

1 year ago

Jason, have you heard of Nightshade? IIUC it applies adversarial perturbations to images, shifting around the feature space (invisible to the eye) but confounding any attempts to train on that image. https://news.artnet.com/art-world/nightshade-ai-downloaded-250000-times-2426956

Stephen Handley

1 year ago

I'm pretty sure I still have an unopened envelope with my CD of original music postmarked late 1997 ...
