A Modest Proposal, or the Sound of Inevitability

For preventing the end of journalism, ending the dependence of AI systems on data “scraped” from the Internet, and providing AI systems with a sustainable source of new training data.

By Dr. Simson L. Garfinkel (with homage to Dr. Jonathan Swift and his Modest Proposal)

It is a melancholy object to those who once paid to read articles in newspapers or magazines, or who haunt websites claiming to offer articles and commentary about the important events of the day. All of this intellectual output requires a veritable army of reporters and editors, each supported by three, four, or even six individuals devoted to production, advertising sales, marketing, and promotion. All of them are struggling, figuratively in rags, and importuning every reader for alms. These journalists, instead of working for their honest livelihood, are forced to devote all their time to proposing and researching new articles, writing down their thoughts, and then laboriously passing the resulting text through grammar checkers and content editors (both human and electronic) until the final product is ready to be scrolled past by a reader on a cell phone screen.

I think it is agreed by all parties that this prodigious number of writers and editors, in the present deplorable state of the Internet, constitutes a very great additional grievance. Therefore, whoever could discover a fair, cheap, and easy method of producing high-quality journalism would be ethically obliged to make it known and to bring it about.

For my part, in recent years I have made extensive use of AI systems like ChatGPT to assist with programming, copyediting, and even organizing my day-to-day thoughts. These systems are an absolute marvel, but they have a well-known Achilles’ heel: they are voracious consumers of original, human-generated text.

“Training data” is the creative spark that powers all Large Language Models, a spark that companies like OpenAI “scraped” from the Internet (perhaps with legal authorization, perhaps not) and packaged for their own use, much like stuffing lightning into a bottle. But ever since OpenAI released ChatGPT on November 30, 2022, an increasing amount of text on the Internet has been contaminated with the output of LLMs, much as steel worldwide was contaminated after the detonation of the first atomic bombs.

But my intention is very far from being confined to provide only for the livelihoods of professed beggars (or rather, journalists). It encompasses a much broader scope, embracing all human creators and influencers, whether on YouTube, TikTok, or Substack: indeed, anyone seeking to monetize their own intellectual output through publishing.

Rather than having journalists and other creators perform the laborious work of researching, writing, publishing, and promoting content only to become the web-scraping targets of those AI companies, AI systems should simply become the journalists. New human-generated content would emerge when these systems ask humans questions, record the answers, and train on the data. (This is what should have been the plot of The Matrix, as the idea of using humans for power makes no thermodynamic sense.) And indeed, now that ChatGPT can send email and dial the phone, what potential source would even know the difference?

The New York Times claims to have 1,700 journalists reporting from more than 160 countries. The International Federation of Journalists claims to represent 600,000 media professionals from 187 trade unions and associations in more than 140 countries. What a waste, and waste simply cannot be tolerated in our modern AI-based economy.

Think of all the jobs saved through digitization and AI transformation.

“Do you hear that, Mr. Anderson? That is the sound of inevitability.”

The economics are indeed inevitable: all of this work will be done by AI. Not today’s AI, of course, but some future AI that combines news gathering, reporting, and presentation. Instead of merely assisting the creative class with better tools, AI will help to eliminate that class entirely. Future systems will use humanity for training data. Like loving parents, we will feed and nourish our brain children. For the technical details, please see the Wikipedia article on matriphagy.

Some readers may argue with my fundamental thesis, claiming that there are many sources of training data that are inaccessible to LLMs trapped in data centers. After all, my most recent articles in Technology Review magazine have been based on painstaking review of MIT archives. These archives consist of handwritten notebooks and typed memos on yellowing pages, frequently fifty or even a hundred years old. There is no way that an LLM can access these data! Surely archival research is a possible role for at least a handful of journalists in the not-too-distant future.

To these readers, I have just one word in reply: robots.

Jason Pramas

Executive Director, Boston Institute for Nonprofit Journalism; Editor-in-Chief, HorizonMass; Vice President of Logistics, Talking Joints Memo ...(marketers, do not attempt to friend me)

1 month ago

I shall share this with my fellow mendicant colleagues lol ...

My vocabulary is grateful for "matriphagy." Thank you.

Jeffrey Goldberg

Geek who doesn't know how to present himself professionally on social media

1 month ago

I was hoping that your proposal would include eating people!

Mei Lin Fung

Chair, Sustainability IEEE SSIT | Co-Founder Vint Cerf PeopleCentered.net | Co-Chair UN Commission on the Status of Women: Digital Innovation 2023 Africa Asia Europe Middle East

1 month ago

Very clever repurposing of the noted wit Jonathan Swift’s Modest Proposal; the Matrix quote is a crowning touch! Is resistance futile?
