In the past week, the number of signatories to the *Statement on AI Training* (https://lnkd.in/dREw7M_N) has surged past 30,000, underscoring the urgent need for fair attribution to original content owners. At Openlicense, we’re developing a scalable solution to address this issue—stay tuned! #AI #ArtificialIntelligence #EthicalAI #ContentCreators #innovation #startups #technology
Professor of Advanced Media in Residence at S.I. Newhouse School of Public Communications at Syracuse University
More than 13,000 creatives (including some famous authors, musicians, and actors) have signed a statement expressing their growing concern over the unauthorized use of copyrighted works to train generative AI models. The one-sentence statement, published by Fairly Trained, an advocacy group founded by former Stability AI executive Ed Newton-Rex, reads: “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.”

Newton-Rex told The Guardian, “There are three key resources that generative AI companies need to build AI models: people, compute, and data. They spend vast sums on the first two – sometimes a million dollars per engineer, and up to a billion dollars per model. But they expect to take the third – training data – for free.”

Like I said yesterday, this is the fight of the year (maybe the next five years). Fairly Trained’s idea is to have a universally enforceable method to prevent creative content from being used for AI training without permission and compensation. This might be possible, but anything ingested prior to the most recent cutoff date is already part of the collective body of knowledge of every foundational model (no matter the AI company).

You can push back and say there are ways to force the issue: methods to extract a specific artist’s works, or to fine-tune a model to ignore requests to create new work product “inspired” by a particular artist’s work. You might even be able to prove that a particular model was trained on something specific and, if it can’t be extracted, force some kind of royalty payment. In practice, where there’s a will, there’s a way. First, we need to agree that the content (data) required for training should not be free. Then, as we fight the fight, we have to hope that new technologies and synthetic data don’t obviate the need for organic, real data.
It’s hard to look toward the future through the lens of the present. It’s even harder to look through the lens of the past. Here’s my question (challenge) to each of you: What is the right way for AI platforms to compensate content creators for the use of their works? There are two areas to consider: 1) training and 2) output. Please share your thoughts below. -s