Reflecting on the latest New York Times v. Microsoft and OpenAI court case, Barry Scannell suggests that "As AI continues to evolve, so too must our legal understanding and regulations". In my opinion, however, law is and should remain largely technology-agnostic, adapting only to the extent needed to deal with the societal impact of technology and, more importantly, with the conduct of those behind its development and deployment (mostly to address negative impact). This is, in particular, why it was a good move to remove any mention of concrete AI technologies from the definition of "AI system" in the EU AI Act: the main policy objective behind that regulation was to address risks from automated decision-making in high-stakes scenarios, not to engage in a futile attempt to capture the ever-evolving taxonomy of AI development methods.

Barry and I already debated the permissibility of training GenAI models on copyrighted content without specific permission under one of my posts earlier this year. I remember that he and many of you disagreed when I suggested that the transient copying that takes place during AI training is inherent to the process but not economically (and hence legally) significant. That is not to say that copyright does not matter, or that it needs to "evolve" just because a group of smart people has come up with a business idea whose implementation requires ever-increasing use of copyrighted material at scale. I merely point to what is, for me at least, one of the most important prongs of the fair use test: whether the alleged infringer is making it harder for the author and copyright holder to earn their living. In other words, whether there is an unfair competition element and a negative effect of GenAI products on the market for original works.

In my post earlier this year, I proceeded on the _assumption_ that GenAI developers' products do NOT compete with those of the creators whose works were ingested into the model training datasets. Even so, I received understandably strong pushback from the part of the community that felt that uncompensated use of copyrighted works for ML model training is at least morally, if not legally, wrong. And I can totally relate. But now even my initial assumption, that GenAI products do not compete with the work of original content creators, is being called into question. If the opposite is proven, that they do compete, at least with some works, that would be, for me, the strongest argument against letting GenAI developers train their models on copyrighted works without permission and without compensation. In the grand scheme of things, the technological details of the GenAI model development process DO NOT matter; only the societal impact of deploying the resulting GenAI products does.
As always, a well-reasoned post based on law rather than subjective claims of harm, morality, and self-serving objectives. I am not the first to say that media companies copy original work from others all the time. As an independent publisher, we conduct original research and have domain expertise around technology that is not resident in traditional media organizations, which largely exist to parrot back to the market what they are told by "sources" or what appears in company marketing materials. Giant media organizations republish our original work regularly. They occasionally give us credit for breaking a story or as the source of original research, but not always.

So, you might say (and I think you mentioned this earlier) that a win in court could subject the media organizations to the same liability they hope to impose on deep-pocketed tech giants. However, I think they know they are insulated: the publishers that media organizations copy from are smaller and unlikely to be able to afford a lawsuit. And the big media companies have the most reach, so shaming them doesn't generate much traction if they ignore it, and most people do not want to run afoul of them. 🤷‍♂️