My Take on ChatGPT and LLMs

A legal technology colleague asked for my opinion of ChatGPT today. Below is a slightly edited version of my email to them.

ChatGPT itself is an interactive wrapper around a large language model (LLM). There are some cool things around training for dialogues that were done to build that wrapper, and those are perhaps more important in the long run. But the current excitement is mostly around a wide public being exposed to LLMs for the first time.

What AI people have known for the past decade, and everyone is seeing now, is that you can build a very good statistical model of what coherent text looks like at the surface level. That lets you generate superficially plausible text on pretty much any topic. I think of it as super-autocomplete.
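
To make the "super-autocomplete" framing concrete, here is a minimal sketch of the idea, using a toy word-level bigram (Markov chain) model rather than a real LLM; the corpus, function names, and seed word are all illustrative assumptions, not anything taken from ChatGPT. The model only records which words tend to follow which in its training text, yet it can still emit locally fluent continuations.

```python
import random
from collections import defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Count, for each word, which words follow it in the training text.
    This surface co-occurrence table is the model's entire 'knowledge'."""
    model = defaultdict(list)
    words = corpus.split()
    for current_word, next_word in zip(words, words[1:]):
        model[current_word].append(next_word)
    return model

def generate(model: dict, seed: str, length: int = 20) -> str:
    """Autocomplete: repeatedly sample a plausible next word.
    Nothing here checks whether the output is true or meaningful."""
    word = seed
    output = [word]
    for _ in range(length):
        followers = model.get(word)
        if not followers:  # dead end: no continuation ever observed
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

if __name__ == "__main__":
    # Tiny made-up corpus purely for illustration.
    corpus = (
        "the court found that the contract was valid and the contract "
        "was enforceable and the court found that the claim failed"
    )
    model = train_bigram_model(corpus)
    print(generate(model, seed="the"))
```

A real LLM replaces the bigram table with a neural network trained on vastly more text, so the analogy is loose, but the point of the sketch stands: fluent-looking continuations can come from pattern statistics alone.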

LLMs don't understand anything, so the only way they create true / meaningful / whatever text is, roughly speaking, if enough of the text they were trained on said true / meaningful / whatever things related to the topic you're "autocompleting" text on. One analogy is the half-drunk guy at the party who is an expert on every topic, but is only sort of accidentally correct in what he says. But even that person knows some things and intends some effect. LLMs know nothing and intend nothing.

The Implications of Infinite Fluent Nonsense

But having machines generate fluent nonsense on any topic has exposed a zero-day bug in human society: there are big areas of life where we make the implicit assumption that fluent language can only be generated by human beings. There are at least two big consequences of that assumption:

(1) We've assumed it's not possible for bad actors to generate large amounts of language, because human beings are limited. Now bad actors will be able to generate an effectively infinite amount of bad content in any system they have access to. We need to figure out how to run a society where there are a million evil language robots for every human being.

This has mostly been discussed in the context of social media, but that's just the beginning. It will soon be possible, for instance, to create software that does nothing but attempt to imitate every person in the world one at a time, call up their financial institutions, and try to talk those institutions into withdrawing their money. Or email every person in your recently hacked address book, do a plausible imitation of your personal style, and destroy your relationships with every one of them. Etc., etc. If you thought passwords, identity verification, and the like were obtrusive now, you haven't seen anything.

(2) We've used the ability to generate language as a surrogate for understanding and competence, in educational testing and professional certification. That's very quickly going to be untenable unless you've got the person in a Faraday cage. We're going to have to come up with new ways of evaluating people, and it's going to be very expensive.

Some Mundane Implications for Legal Tech

Those are the downsides. On the plus side, what is super-autocomplete good for? Well, there are a lot of formulaic documents that need to be generated in the world, and of course in the law. LLMs will be hugely useful in checking over, and eventually generating, those documents.

However, there's a lot of task-specific engineering work to do there, and a lot of careful looks at costs and benefits. It's a common misconception, going back to the early days of machine translation, that cleaning up bad or incorrect machine-generated language is much cheaper than creating text from scratch. It can actually be more expensive, or only marginally cheaper. Further, many legal documents in particular are not just documents; they are signals that the attorney has developed a deep understanding of the client's situation, and it's that understanding that is being paid for.

Dave

Thanks for a fair and measured take on this topic, Dave!


Dave, Thanks for this. I've already been sharing :) I suspect we are backing ourselves into a "dataverse" that will require the introduction of some kind of public provenance blockchain to root every graph back to a real individual human, asserting something at a particular moment in real space. My assumption is that until we anchor the fundamental structure of the Internet in real human terms, on purpose, we will continue to experience "medium muddying the message" problems.


I enjoy being creative with ChatGPT, and wonder if it wants to write a counter-argument to this article :) Well done, Dave, another excellent insight.

Srini Pagidyala

Mission: To bring AGI Benefits to Humanity | Scaling Aigo.ai to AGI to boost Human Flourishing | Going Beyond LLMs using Cognitive AI | Speaking with asymmetric ‘Aligned’ Lead Investors - Series A

2y

Must read. Thanks for sharing, Dave Lewis. #ChatGPT is Super-Autocomplete. #LLMs don't 'Understand' anything. #ChatGPT creates infinite 'Fluent Nonsense'. These three lines capture it all.

William "Bill" Hamilton

Senior Legal Skills Professor and Director, UF Law International Center for Automated Information Retrieval

2y

Brilliant article.
