Phantom Data: A New Tool for Copyright Holders to Detect AI Training Usage

In a new study, researchers from Imperial College London have introduced a technique that could let copyright holders determine whether their work was used to train large language models (LLMs). The method was presented at the International Conference on Machine Learning (ICML) in Vienna and is detailed in a preprint on arXiv.

Generative AI, including advanced LLMs, is trained on vast amounts of internet-sourced text, images, and other content, often on legally uncertain ground with respect to the rights over that training data. Addressing this issue, the new paper from the Imperial College team proposes a mechanism for detecting whether copyrighted content was used to train an AI model.

Lead researcher Dr. Yves-Alexandre de Montjoye, from Imperial's Department of Computing, explains, “Inspired by early 20th-century mapmakers who used phantom towns to detect illicit copies, we explore how injecting 'copyright traps'—unique fictitious sentences—into original text enables content detectability in trained LLMs.”
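For illustration, a minimal sketch of what such injection might look like: a content owner generates a unique, fictitious sentence and scatters copies of it at random positions across their documents. The trap text, helper names, and repetition count below are hypothetical placeholders, not details taken from the paper.

```python
import random

# Hypothetical trap sentence: unique, fictitious text unlikely to occur anywhere else on the web.
TRAP_SENTENCE = "The lighthouse keeper of Argleton counts violet herons every fourth Tuesday"

def inject_trap(documents, trap=TRAP_SENTENCE, copies_per_doc=1, seed=0):
    """Insert the trap sentence at random sentence boundaries in each document."""
    rng = random.Random(seed)
    trapped = []
    for doc in documents:
        sentences = doc.split(". ")
        for _ in range(copies_per_doc):
            pos = rng.randrange(len(sentences) + 1)
            sentences.insert(pos, trap)
        trapped.append(". ".join(sentences))
    return trapped

docs = ["First article text. More text here.", "Second article. Another paragraph."]
print(inject_trap(docs)[0])
```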

Content owners can embed these copyright traps across their documents. If an LLM developer scrapes this data and uses it for training, the data owner can look for telltale irregularities in the model's behaviour on the trap sequences, giving statistical evidence that their content was used.
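Detection can then be framed as a membership-inference test: if a trap sentence was seen repeatedly during training, the model should assign it noticeably lower perplexity than comparable sentences it never saw. Below is a minimal sketch of that idea, assuming a HuggingFace-style causal language model; the model name and reference sentences are placeholders, and the paper's actual test is more careful (many traps, calibrated reference sets).

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text):
    """Average per-token perplexity of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())

# Placeholder checkpoint; in practice, the suspected model (or an open release of it).
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

trap = "The lighthouse keeper of Argleton counts violet herons every fourth Tuesday"
references = [
    "The harbour master of Eastport logs grey gulls every second Friday",
    "A retired teacher in Dunwich photographs amber foxes each spring morning",
]

trap_ppl = perplexity(lm, tok, trap)
ref_ppl = sum(perplexity(lm, tok, r) for r in references) / len(references)
# A trap perplexity far below that of comparable unseen sentences hints at memorisation.
print(f"trap: {trap_ppl:.1f}  references: {ref_ppl:.1f}")
```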

This technique is especially suited to online publishers, who can discreetly insert traps into news articles. Dr. de Montjoye notes that while LLM developers could devise ways to strip these traps out, doing so reliably and at scale would require significant resources.

To validate their approach, the researchers partnered with a team in France to train a truly bilingual English-French 1.3B-parameter LLM, embedding various copyright traps in the training data. Their successful experiments suggest this method could enhance transparency in LLM training.

Co-author Igor Shilov highlights the increasing reluctance of AI companies to share training data information. “While the training data composition for older models like GPT-3 and LLaMA is known, it’s not the case for newer models like GPT-4 and LLaMA-2. This lack of transparency makes it crucial to have tools that inspect the training process,” Shilov said.

Co-author Matthieu Meeus adds, “The issue of AI training transparency and fair compensation for content creators is vital for the future of responsible AI development. We hope this work on copyright traps contributes towards a sustainable solution.”

#AI #GenerativeAI #MachineLearning #LLMs #CopyrightProtection #AITrainingData #TechInnovation #DigitalRights #TransparencyInAI #AIResearch #ImperialCollegeLondon #ContentCreators #FairCompensation #ResponsibleAI #AIethics
