A Modest Proposal, or the Sound of Inevitability

Simson Garfinkel

Chief Scientist @BasisTech, Lecturer @Harvard

Published: January 13, 2025

For preventing the end of journalism, ending the dependence of AI systems on data “scraped” from the Internet, and providing AI systems with a sustainable source of new training data.

By Dr. Simson L. Garfinkel (with homage to Dr. Jonathan Swift and his Modest Proposal)

It is a melancholy object to those who once paid to read articles in newspapers or magazines, or haunt websites claiming to offer articles and commentary about important events of the day. All of this intellectual output requires a veritable army of reporters and editors, each supported by three, four, or even six individuals devoted to production, advertising sales, marketing, and promotion—all struggling, figuratively in rags, and importuning every reader for alms. These journalists, instead of working for their honest livelihood, are forced to devote all their time to proposing and researching new articles, writing their thoughts, and then laboriously passing the resulting text through grammar checkers and content editors (both human and electronic) until the final product is ready to be scrolled past by a reader on a cell phone screen.

I think it is agreed by all parties that this prodigious number of writers and editors, in the present deplorable state of the Internet, constitutes a very great additional grievance. Therefore, whoever could discover a fair, cheap, and easy method of producing high-quality journalism would be ethically obliged to make it known and to bring it about.

As for my part, in recent years, I have made extensive use of AI systems like ChatGPT to assist with programming, copyediting, and even organizing my day-to-day thoughts. These systems are an absolute marvel, but they have a well-known Achilles’ heel: they are voracious consumers of original, human-generated text.

“Training data” is the creative spark that powers all Large Language Models—a spark that companies like OpenAI “scraped” from the Internet (perhaps with legal authorization, perhaps not) and packaged for their own use, much like stuffing lightning into a bottle. But ever since OpenAI released ChatGPT on November 30, 2022, an increasing amount of text on the Internet has been contaminated with the output of LLMs, much as steel world-wide was contaminated after the explosion of the first atomic bombs.

But my intention is very far from being confined to provide only for the livelihoods of professed journalists. It encompasses a much broader scope, embracing all human creators and influencers, whether on YouTube, TikTok, or Substack, and indeed anyone seeking to monetize their own intellectual output through publishing.

Rather than having journalists and other creators perform the laborious work of researching, writing, publishing, and promoting content only to become the web-scraping targets of those AI companies, AI systems should simply become the journalists. New human-generated content would emerge when these systems ask humans questions, record the answers, and train on the data. (This is what should have been the plot of The Matrix, as the idea of using humans for power makes no thermodynamic sense.) And indeed, now that ChatGPT can send email and dial the phone, what potential source would even know the difference?

The New York Times claims to have 1,700 journalists reporting from more than 160 countries. The International Federation of Journalists claims to represent 600,000 media professionals from 187 trade unions and associations in more than 140 countries. What a waste, and waste simply cannot be tolerated in our modern AI-based economy.

Think of all the jobs saved through digitization and AI transformation.

“Do you hear that, Mr. Anderson? That is the sound of inevitability.”

The economics are indeed inevitable: all of this work will be done by AI. Not today’s AI, of course, but some future AI that combines news gathering, reporting, and presentation. Instead of merely assisting the creative class with better tools, AI will help to eliminate that class entirely. Future systems will use humanity for training data. Like loving parents, we will feed and nourish our brain children. For the technical details, please see this Wikipedia article on Matriphagy.

Some readers may argue with my fundamental thesis, claiming that there are many sources of training data that are inaccessible to LLMs trapped in data centers. After all, my most recent articles in Technology Review Magazine have been based on painstaking review of MIT archives. These archives consist of handwritten notebooks and typed memos on yellowing pages, frequently more than fifty or even a hundred years old. There is no way that an LLM can access these data! Surely archival research is a possible role for at least a handful of journalists in the not-too-distant future.

To these readers, I have just one word in reply: robots.

Jason Pramas

Executive Director, Boston Institute for Nonprofit Journalism; Editor-in-Chief, HorizonMass; Vice President of Logistics, Talking Joints Memo ...(marketers, do not attempt to friend me)

1 month ago

I shall share this with my fellow mendicant colleagues lol ...

David Churbuck

Writer

1 month ago

My vocabulary is grateful for "matriphagy." Thank you.

Jeffrey Goldberg

Geek who doesn't know how to present himself professionally on social media

1 month ago

I was hoping that your proposal would include eating people!

Mei Lin Fung

Chair,Sustainability IEEE SSIT|Co-Founder Vint Cerf PeopleCentered.net | Co-Chair UN Commission on the Status of Women: Digital Innovation 2023 Africa Asia Europe Middle East |

1 month ago

Very clever repurposing of the noted Wit Jonathan Swift’s Modest Proposal- the Matrix quote a crowning touch! Is resistance futile?
