AI and the Unsettled Law Doctrine
David Atkinson
In-House Legal Counsel for AI Institute | A.I. Ethics and Law | University Lecturer | Veteran
This newsletter shares only some of my favorite posts from Intersecting AI, the Substack that Jacob Morrison and I write. If you’d like to see all posts, you can find it here: https://intersectingai.substack.com/
As a lawyer, my role, by and large, is to make and examine the legal arguments for my employer’s actions. Lawyers are not taught ethics in law school beyond professional responsibility. Professional responsibility, in turn, focuses on matters like due diligence and not commingling client funds with personal funds. It does not address whether one’s actions are “right.” We spend little time thinking of our conscience aside from how it might lead to sanctions or disbarment.
Indeed, lawyers are infamous for being non-ethical. Non-ethical is not the same as unethical; non-ethical means being ethics agnostic. This is how you can find lawyers who will dutifully defend repeat child molesters, election deniers, climate change deniers, vaping companies that want to market to children, and more. There are two primary reasons for this: (1) We can be hired guns. If you have the money to pay a large law firm to take on your case, there is a good chance it will. And (2) the US has a deliberately adversarial system. We believe the accused should get a fair hearing in a court of law, that they are innocent until proven guilty, and that they should have adequate and powerful advocates to make their arguments. We believe in this so much that we provide free representation to criminal defendants who cannot afford counsel, regardless of the alleged wrongdoing.
There is a difference in my mind, however, between providing counsel to ensure the fundamentals of a fair judicial system, on the one hand, and arguing for outcomes that clearly benefit one party at the expense of others. When a company pours forever chemicals into a nearby stream and is sued, the lawyers representing it aren’t doing so because the company can’t afford adequate representation. The lawyers are doing it because they will be paid handsomely to argue that pouring the chemicals is the only alternative, or it was an accident, or the chemicals aren’t that bad, or, maybe, the chemicals are actually good for the environment. The willingness to subjugate one’s personal values to those of a paying client is, in part, why law firms can afford to pay brand new lawyers with zero experience $180,000 salaries.
As strong as my opinion may appear to be, I tend to believe the current system is the best of all realistic alternatives. Yet, there is still value in pointing out the shortcomings of our legal system and how it may color ethics. And this is where I’d like to discuss the role of law and ethics in artificial intelligence.
Many ethical issues in AI at first appear to be predominantly legal issues. A hot topic today, for instance, is how fair use may apply to copyrighted materials used for AI training. OpenAI, the maker of GPT-4 and ChatGPT, is on the receiving end of multiple lawsuits accusing it of violating copyright law.
Copyright law grants the owners of works (artwork, books, essays, screenplays, music, choreography, etc.) some exclusive rights, including the exclusive right to reproduce the work, to distribute the work, to display the work, and so on. Some copyright owners (particularly visual artists, software engineers, and authors of books) claim that large AI companies violated copyright law on an astounding scale. Book authors, for instance, claim that OpenAI used nearly 200,000 illegally downloaded books to train its large language models (LLMs).[1]
That is, one website illegally took the books from their original site, uploaded them to a different server, and then AI companies downloaded them from that server and trained their LLMs on the books. At no time were any authors compensated, and they did not give consent.
These facts are basically undisputed. While the largest AI companies have not admitted to using the books, they also have not denied it because then they’d probably be lying, which would be bad news for them in court. Instead, their lawyers have tried to make a legal argument that none of the AI’s outputs have infringed anyone’s rights, so there is no need to consider what materials were used to train the models.
In other words, they want to sidestep the question of the allegedly ill-gotten books completely. This makes sense, because the only argument would be that the use of the books is “fair use” under copyright law. But fair use is an affirmative defense, which means the companies must effectively concede that they copied the works and then argue that the copying was nonetheless permitted. If they can avoid admitting they used hundreds of thousands of illegally downloaded books, they will.
Big tech companies have hinted that a fair use argument is on the table even though they haven’t made the argument in court yet. OpenAI made the case in a comment to the US Copyright Office, and Meta briefly referred to fair use in a court filing before switching the focus back to generated outputs and away from whatever training materials were used.
Meta’s legal filing notes, incredibly, that “Plaintiffs’ claim for direct copyright infringement is based on two theories,” one of which is that “Meta created unauthorized copies of Plaintiffs’ books in the process of training Llama,” and that such a claim is “without merit,” but Meta then immediately shifted its focus to outputs.[2]
It’s unclear whether Meta is saying it did, in fact, have authorization to use the 200,000 books despite what the authors say, or whether it is saying it didn’t make any copies, despite the copies being in its alleged training dataset. Or perhaps Meta is saying it has developed a new technique to turn content on the web directly into vectors (numerical representations of the words) for machine processing without first making a copy of the original work. That would be an astounding breakthrough, freeing up huge amounts of data storage and probably computational resources, and improving the efficiency of model training by tremendous margins.
However, I don’t want to go too far down the fair use rabbit hole. Suffice it to say that using such copyrighted works is unsettled territory, and nobody knows how the lawsuits will turn out. The same is true of several other critical issues. While what’s illegal is also generally accepted as unethical, the inverse is not true: what’s legal isn’t necessarily ethical.
Importantly, and improperly in my view, in-house legal teams have become the default deciders of what’s ethically right by making legal determinations on unsettled law, including questions like the fair use issue above.
What I’d like to focus on are the ethical undertones which, to my ears, are more like overtones. To me, when the legal issue is unsettled and there are massive ethical questions, such as whether using pirated material to train AI is permissible, the discussion should be driven by ethical arguments, not legal arguments.
Moreover, legal arguments should directly address the concerns of the aggrieved rather than attempt to deflect or avoid them. If a company can’t make a strong ethical argument for taking a controversial action in an area of unsettled law, then its counsel should not make a legal argument in support of that action.
If a company’s weapon of choice when confronted with unsettled law is a technical argument that relies primarily on obscure, nuanced, and highly complex legal doctrines, without a strong supplemental policy or equitable argument, it suggests to me that the company is grasping to justify an action that is highly beneficial to it (i.e., makes it wealthier or boosts its cachet in the AI community) at the expense of at least certain people and entities, and probably society at large.
Herein lies the rub: legal teams (especially in-house legal teams) are the de facto ethical arbiters for many (most? all?) companies. When a legal team settles on a response to an issue, it is not only saying, “We believe we have a strong legal position,” but also, “and we believe this position is the one we should take.”
For example, the legal teams of big AI companies could just as easily hold the position that fair use does not support using pirated material. It’s not an impossible argument. And then the companies could just not use such material. But that is not how in-house counsel typically works. Instead, their role is usually to support whatever position is in the best interests of their employer, not society.
While it seems logical to decide that whatever the judicial system determines is fair or legal should be society’s standard as well, it’s worth noting that the outcome of many cases is less a function of the strength of the arguments than of access to resources.
This is particularly problematic when the tech company is extremely well-funded. For example, any of the MAMAA companies (Meta, Apple, Microsoft, Amazon, Alphabet) can easily throw more money into building a legal position (via research, lawyers, investigators, forensic analysts, etc.) than virtually any individual plaintiff, or even the entire Federal Trade Commission (FTC), because each MAMAA company generates more revenue in a few days than the FTC’s entire annual budget. This also means MAMAA companies can hire more experienced lawyers who can use more advanced tools to seek out every minute legal advantage.
For companies racing to push out large language models as quickly as possible, like OpenAI, Meta, Google, and Microsoft, this could mean throwing up roadblock after roadblock in the judicial and administrative systems to slow the ability of the FTC and others to take any meaningful action on a timescale that would have a significant impact. By the time the meat of the issues is adjudicated, the LLMs will be deeply ensconced in society, difficult or impossible to uproot. The tech companies, in other words, will determine what is ethical, not society, administrative agencies, or the judicial system. In all meaningful ways, might usually makes right in big tech.
In short, resolving these debates in the courtroom is inadequate. Lawsuits largely ask what you can do, not what you should do. Suppose a well-funded tech company trounces its opponent, whether the FTC, artists, or authors, in the courtroom. What does that prove, really?
If any entity can step up and address ethical issues in a timeframe that will make a difference, it’s the voice of society: the legislature. Not just the federal one (though federal action would be preferable to a patchwork of 50 state laws), but also the state legislatures.
Society and legislatures should know that lawyers, generally, are not trained ethicists. For in-house lawyers, their loyalty lies with the provider of their paychecks, not with what’s best for society. You will not hear a big AI company argue that training on any copyrighted material found on the public web (even if it knows or should know the content, like books, was stolen from elsewhere and moved to the public web) is not fair use. Big AI companies will not disgorge the data used to train their LLMs, and they will not voluntarily delete their LLMs so that future models are trained only on data that copyright holders opt in to share.
Importantly, it’s still not clear they should. Perhaps it would be unethical not to train on the data. The data, after all, makes the LLMs more useful, more well-rounded, and, hallucinations aside, more accurate than they would be without it. As many will no doubt note, LLMs have proven helpful at a myriad of tasks, and their potential upside seems limitless. Would it be unethical to deprive society of such advances for the sake of copyright law? Well…perhaps.
Again, this is not a place for lawyers to take the wheel. Companies should form ethics committees made up at least partly of impartial third-party experts with diverse backgrounds. Those committees should review and publish their ethical positions so they are transparent to society, and the companies should be able to defend those positions. If the positions are indefensible, the companies should rightly face the fury of society. A small number of people in a single location should not determine what’s best for everyone everywhere.
[1] Including Sarah Silverman, David Baldacci, Mary Bly, Michael Connelly, Sylvia Day, Jonathan Franzen, John Grisham, Elin Hilderbrand, George R.R. Martin, Jodi Picoult, and others.
[2] From the filing: “Plaintiffs’ claim for direct copyright infringement is based on two theories: (1) Meta created unauthorized copies of Plaintiffs’ books in the process of training Llama; and (2) “[b]ecause the Llama language models cannot function without the expressive information extracted from Plaintiffs’ Infringed Works and retained inside [Llama],” the models “are themselves infringing derivative works”. Both theories are without merit, but this Motion addresses only the latter theory, which rests on a fundamental misunderstanding of copyright law.” (emphasis added)