A Modest Proposal, or the Sound of Inevitability
For preventing the end of journalism, ending the dependence of AI systems on data “scraped” from the Internet, and providing AI systems with a sustainable source of new training data.
By Dr. Simson L. Garfinkel (with homage to Dr. Jonathan Swift and his Modest Proposal)
It is a melancholy object to those who once paid to read articles in newspapers or magazines, or who haunt websites claiming to offer articles and commentary about important events of the day. All of this intellectual output requires a veritable army of reporters and editors, each supported by three, four, or even six individuals devoted to production, advertising sales, marketing, and promotion, all struggling, figuratively in rags, and importuning every reader for alms. These journalists, instead of working for their honest livelihood, are forced to devote all their time to proposing and researching new articles, writing their thoughts, and then laboriously passing the resulting text through grammar checkers and content editors (both human and electronic) until the final product is ready to be scrolled past by a reader on a cell phone screen.
I think it is agreed by all parties that this prodigious number of writers and editors, in the present deplorable state of the Internet, constitutes a very great additional grievance. Therefore, whoever could discover a fair, cheap, and easy method of producing high-quality journalism would be ethically obliged to make it known and to bring it about.
As for my part, in recent years, I have made extensive use of AI systems like ChatGPT to assist with programming, copyediting, and even organizing my day-to-day thoughts. These systems are an absolute marvel, but they have a well-known Achilles’ heel: they are voracious consumers of original, human-generated text.
“Training data” is the creative spark that powers all Large Language Models, a spark that companies like OpenAI “scraped” from the Internet (perhaps with legal authorization, perhaps not) and packaged for their own use, much like stuffing lightning into a bottle. But ever since OpenAI released ChatGPT on November 30, 2022, an increasing amount of text on the Internet has been contaminated with the output of LLMs, much as steel worldwide was contaminated after the explosion of the first atomic bombs.
But my intention is very far from being confined to provide only for the livelihoods of professed beggars (I beg your pardon: journalists). It encompasses a much broader scope, embracing all human creators and influencers, whether on YouTube, TikTok, or Substack; indeed, anyone seeking to monetize their own intellectual output through publishing.
Rather than having journalists and other creators perform the laborious work of researching, writing, publishing, and promoting content only to become the web-scraping targets of those AI companies, AI systems should simply become the journalists. New human-generated content would emerge when these systems ask humans questions, record the answers, and train on the data. (This is what should have been the plot of The Matrix, as the idea of using humans for power makes no thermodynamic sense.) And indeed, now that ChatGPT can send email and dial the phone, what potential source would even know the difference?
The New York Times claims to have 1,700 journalists reporting from more than 160 countries. The International Federation of Journalists claims to represent 600,000 media professionals from 187 trade unions and associations in more than 140 countries. What a waste, and waste simply cannot be tolerated in our modern AI-based economy.
Think of all the jobs saved through digitization and AI transformation.
The economics are indeed inevitable: all of this work will be done by AI. Not today’s AI, of course, but some future AI that combines news gathering, reporting, and presentation. Instead of merely assisting the creative class with better tools, AI will help to eliminate that class entirely. Future systems will use humanity for training data. Like loving parents, we will feed and nourish our brain children. For the technical details, please see the Wikipedia article on matriphagy.
Some readers may argue with my fundamental thesis, claiming that there are many sources of training data that are inaccessible to LLMs trapped in data centers. After all, my most recent articles in Technology Review Magazine have been based on painstaking review of MIT archives. These archives consist of handwritten notebooks and typed memos on yellowing pages, frequently more than fifty or a hundred years old. There is no way that an LLM can access these data! Surely archival research is a possible role for at least a handful of journalists in the not-too-distant future.
To these readers, I have just one word in reply: robots.
Executive Director, Boston Institute for Nonprofit Journalism; Editor-in-Chief, HorizonMass; Vice President of Logistics, Talking Joints Memo ...(marketers, do not attempt to friend me)
1 month ago
I shall share this with my fellow mendicant colleagues lol ...
Writer
1 month ago
My vocabulary is grateful for "matriphagy." Thank you.
Geek who doesn't know how to present himself professionally on social media
1 month ago
I was hoping that your proposal would include eating people!
Chair, Sustainability, IEEE SSIT | Co-Founder Vint Cerf PeopleCentered.net | Co-Chair UN Commission on the Status of Women: Digital Innovation 2023 Africa Asia Europe Middle East
1 month ago
Very clever repurposing of the noted wit Jonathan Swift’s Modest Proposal; the Matrix quote a crowning touch! Is resistance futile?