Why ChatGPT Can't Plagiarize
Mike McAulay
AI Integrator & Enthusiast | Senior Subject Matter Expert - .NET, Azure, Cloud | Coding Consultant
It's an easy misunderstanding to have. But it is a misunderstanding, nonetheless. The scope of this article isn't to address the larger questions regarding these technologies and their impact on society. It's merely to show through an example and some technical aspects of Large Language Models (LLMs) that they aren't "plagiarism machines," nor are they capable of being so.
To see why that is, we need to consider what is primarily accomplished during training: generalization. The ChatGPT model doesn't retain any memory of the training material. It draws generalizations from the material by identifying patterns at the word and subword level. It then uses these patterns to provide unique responses.
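To make "patterns at the word and subword level" a little more concrete, here is a minimal sketch using OpenAI's tiktoken library (this assumes you have it installed via pip; "cl100k_base" is the encoding associated with the ChatGPT-era models, and any subword tokenizer would make the same point). Notice that the model's unit of work is a small fragment of text, not a stored sentence or document:

```python
# Minimal subword tokenization demo (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Plagiarism is sometimes unintentional."
token_ids = enc.encode(text)

# Each ID maps back to a small subword fragment, not to any source document.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```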
Let's jump into a scenario to help us define what plagiarism is.
A young person is attending college. Their assignment is to write an essay on the nature of reality. This young person looks online and finds a few great sources. Reviewing the final draft, they reread one phrase they really like and even feel a little proud they came up with it. Unfortunately, they don't remember that they picked up that phrase verbatim from one of the articles they'd read.
A few days after turning in the essay, the student is eager to see their score and maybe a positive note from the professor. Sadly, when the paper is returned it's marked with a 0, along with a note above the phrase they were so thrilled with: "This was plagiarized. Come see me during office hours." Our young student is distressed and confused. They knew they hadn't hand-copied anything from the material they'd read.
Is this plagiarism? I know some would debate that because it wasn't intentional it doesn't really "count." While it wasn't intentional, this would run afoul of most rules regarding plagiarism. The professor may show leniency, but as to the question itself: it is a copy of someone else's work.
When it comes to ChatGPT and other LLMs, there are some who believe that it is intentionally plagiarizing; others, having heard experts declare that it doesn't plagiarize, might conclude the experts don't grasp the subtler forms of plagiarism, such as the scenario I presented above: that it doesn't "mean to," but it's still plagiarism.
But that view rests on a misunderstanding of what the model does with its training material. It took the material and generalized from it about the language. Remember, this is fundamentally about language, not about the facts and figures the material might contain. This is also part of the reason LLMs are said to "hallucinate," which simply means they sometimes make things up that sound right.
Let's reexamine the scenario, this time with ChatGPT standing in for our student.
The teacher grading the essay noted that it seems to contain a number of popular ideas about the nature of existence, and sometimes even "sounds like" some eminent voices on the subject. But it never exhibits traceable connections to the source materials. In fact, if ChatGPT were asked to rewrite the essay, each version would have the same general characteristics but unique text every time.
Is this plagiarism? Certainly not by any definition that's commonly used today. If it were, virtually everything written by anyone would fall into that category.
There is one last aspect of LLMs that can help us put this question to rest. To do so, we have to peek under the hood of ChatGPT a bit. These LLMs operate in a way that could be summed up as: "Based on the context of what's been written so far, select the next 'best' word to write." It's important to note that the model isn't comprehending what it's writing. The algorithm uses probabilities based on the discovered patterns on a per-word (and subword) basis. As you might imagine, it has a lot of words to work with when finding the right one. It uses a kind of mathematical representation of the words and patterns it built up during training to calculate the probabilities for the next word to write.
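If it helps to see that loop spelled out, here's a toy sketch in Python. The vocabulary and the scores are entirely made up for illustration; a real model computes scores like these over tens of thousands of subword tokens using its learned representation of the context:

```python
import math

# Made-up "next word" scores (logits) for a tiny vocabulary, given some
# context such as "The nature of reality is". Purely illustrative numbers.
vocab = ["subjective", "unknown", "fixed", "banana"]
logits = [2.1, 1.3, 0.4, -3.0]

def softmax(xs):
    # Convert raw scores into probabilities that sum to 1.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word:>10}: {p:.3f}")

# Greedy decoding: always pick the single most probable next word.
print("next word:", vocab[probs.index(max(probs))])
```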
Interestingly, researchers found that if they always chose the word with the highest probability, the output tended to sound "flat" and less human-like. They then tried occasionally using words with lower probability scores, and suddenly it sounded more "creative." How often the model uses these less likely words is controlled by a parameter called "temperature" that is set when producing the AI's output. I've written a more detailed description of temperature here.
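Here's how that might look in code, continuing the toy example above. This is my own sketch of the standard technique, not OpenAI's implementation: dividing the scores by the temperature before converting them to probabilities sharpens or flattens the distribution.

```python
import math
import random

def sample_with_temperature(vocab, logits, temperature=1.0):
    # T < 1 sharpens the distribution toward greedy decoding;
    # T > 1 flattens it, letting less likely words through more often.
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(vocab, weights=probs, k=1)[0]

vocab = ["subjective", "unknown", "fixed", "banana"]
logits = [2.1, 1.3, 0.4, -3.0]

for t in (0.2, 1.0, 1.5):
    picks = [sample_with_temperature(vocab, logits, t) for _ in range(8)]
    print(f"temperature {t}: {picks}")
```

At temperature 0.2 you'll see "subjective" almost every time; at 1.5 the less likely words start to appear, which is exactly the "creative" effect described above.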
The method used to build up these strings of words simply doesn't support taking phrases, or really even ideas, and copying them wholesale.
Said another way, even if one had all the generalizations that were produced by its training, it would be impossible to rebuild its training material from those generalizations. You can't squeeze an actual instance of plagiarism from it any more than you could provide a stock's price at a given time on a given day based solely on patterns you identified in the entire market.
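A crude way to see why the inversion fails: word frequencies are a far simpler kind of "generalization" than anything an LLM learns, yet even they can't be traced back to their source, because different texts can produce identical statistics. The two tiny "training sets" below are contrived for illustration:

```python
from collections import Counter

# Two different "training sets" with identical word-frequency statistics.
corpus_a = ["the cat sat", "the dog ran"]
corpus_b = ["the cat ran", "the dog sat"]

def word_stats(corpus):
    return Counter(word for sentence in corpus for word in sentence.split())

# The statistics match, so they can't tell you which corpus produced them.
print(word_stats(corpus_a) == word_stats(corpus_b))  # True
```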
I understand the implications of these technologies are far-reaching and raise real concerns over the fairness of the current distribution of proceeds related to our work. I want to emphasize that the issues being raised are far more fundamental than whether this technology is plagiarizing or not. It comes down to the bigger question of how we try to resolve the disconnect between the full impact of the value we create and the compensation models we've worked under for centuries.
Janet, I'm very curious to hear your take on whether you find my evidence and explanation persuasive. As I mentioned in the article, I do believe there are issues regarding the value we bring versus the proceeds we receive, but I want to ensure people understand specifically what LLMs do, so that they don't rely on arguments that feel right but ultimately miss the mark due to assumptions about the technology.
Staff Product Manager at Twilio
Interesting, Mike McAulay! After reading through this, I'd tend to agree that the challenge here isn't actually plagiarism. While I'm certainly not the expert here, the concern I see most often isn't plagiarism, but rather a concern about the data the model was trained on itself. What are your thoughts on how we can provide proper reference or attribution when it is due? In my small amount of time using Bard and Google's new SGE search, they seem to do a decent job of providing references to the sites that helped generate the answer. I wonder if we will start to see the same for other types of "non-search" queries across these platforms.