Why large language models like ChatGPT are bullshit artists, and how to use them effectively anyway

ChatGPT outstanding in the field? Photo by Markus Spiske on Unsplash

ChatGPT has sparked the imagination, and fears, of many people, but I find that many misunderstand what ChatGPT actually is. Instead, they get caught up in what it seems to be. My goal in this article is to get you to the point where you can come up with potential applications of the technology without getting waylaid by misconceptions.

So, what is ChatGPT? ChatGPT is a large language model (LLM) that has been trained to carry on dialogue. What does that mean?

Language Model

Language models are machine learning models that are trained to continue a sentence. So, if I were to say:

At the stroke

the model tells me probable ways in which that sentence continues. Perhaps, it is:

At the stroke of midnight [70%]

At the stroke of a pen [30%]

This is because the language model has been shown enough text — speeches, articles, books, reviews, etc. — and it has learned what words are likely to follow others.
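To make this concrete, here is a minimal sketch of what “predicting the continuation” looks like in code. It uses the small, openly downloadable GPT-2 model from the Hugging Face transformers library as a stand-in (ChatGPT’s own model is not available for download), so the probabilities it prints are illustrative rather than the ones quoted above.

```python
# Minimal sketch: ask a small open language model (GPT-2) for the most
# probable next tokens after a prompt. Numbers are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "At the stroke"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]        # scores for the next token
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p:.1%}")
```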

Large Language Model

But what if the sentence before "At the stroke" contained the word “Jordan”, “Nehru”, or “princess”? How does that change the likely continuation? If it contains the word “Jordan”, the continuation is more likely to be:

At the stroke of a pen one Sunday afternoon in Cairo in 1921, Churchill created the British mandate of Transjordan

If it contains the word “Nehru”, the continuation is more likely to be:

At the stroke of the midnight hour, when the world sleeps, India will awake to life and freedom.

while if it contains the word “princess”, the continuation is more likely to be:

At the Stroke of Midnight is a hilarious, empowering story where princesses can save themselves while slaying in stilettos.

LLMs are language models (the first L stands for large) that have enough parameters to learn word continuations in many different contexts. They learn which words or phrases to pay attention to, and how those change the likely continuation of sentences.
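Continuing the earlier sketch (with the same GPT-2 stand-in, so the specific outputs will differ from the ChatGPT-scale behavior described above), you can see this context-dependence directly by prepending different sentences and asking for the next-word probabilities again:

```python
# Same toy model as before: the preceding context changes which
# continuations the model considers likely. Prompts are illustrative.
def top_next_words(prompt, k=3):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k=k)
    return [(tokenizer.decode(int(i)), round(float(p), 3))
            for p, i in zip(top.values, top.indices)]

print(top_next_words("Churchill redrew the map of Transjordan. At the stroke"))
print(top_next_words("Nehru addressed the new nation at midnight. At the stroke"))
```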

Memorization and generalization

LLMs learn word continuations in different contexts. In fact, they are so large that it may seem that they have essentially memorized the entire corpus of text that they have been shown. However, this is not the case — a simple information-theoretic argument shows that an LLM that had memorized all the text on the Internet would need more storage than the entire Internet takes up in compressed form. So, mathematically, it is impossible for a practical language model to actually memorize everything that has ever been published.
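A rough back-of-envelope check makes the same point. The figures below are publicly reported approximations, used here purely for illustration:

```python
# Back-of-envelope: size of model weights vs. size of the source text.
# Approximate, publicly reported figures; used only for illustration.
params = 175e9                 # GPT-3 parameter count
bytes_per_param = 2            # 16-bit weights
weights_tb = params * bytes_per_param / 1e12
crawl_tb = 45                  # compressed web text OpenAI reportedly started from
print(f"weights ~ {weights_tb:.2f} TB, source text ~ {crawl_tb} TB")
# ~0.35 TB of weights cannot losslessly store ~45 TB of text (let alone
# the whole Internet), so the model must generalize rather than memorize.
```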

Even with billions of parameters, the LLM is not large enough to memorize all possible word continuations in context. So, the LLM interpolates between likely words and contexts, choosing similar continuations in similar contexts.

For example, if the context contains “Indian princess”, the model might switch back and forth between the Nehru speech and the Tara Sivec blurb (the princess quote above), producing something like:

At the stroke of the midnight hour, when the world sleeps, princesses can save themselves while slaying in stilettos.

It appears that the model has generalized beyond the text it has been trained on, “understanding” that the world sleeps at midnight and explaining the context behind the title of the novel.

But has it?

Hallucination

Interpolation can happen in multiple ways. If the context contains “British empire”, the model might switch back and forth between the Jordan quote and the Nehru quote, producing something like:

At the stroke of a pen one Sunday afternoon, when the world sleeps,

which is still quite reasonable as an English-language fragment. But the meaning of the sentence is nonsensical, like something out of Alice in Wonderland. We say that the model has “hallucinated”.

You can get at the nonsensical nature of the sentence from multiple directions: it’s odd if you know the creation stories of both Jordan and India, if you look deeply into whether Sunday afternoon siestas are actually common all over the world, or if you notice the tense change between the first part of the sentence and the second.

But this is hard: most readers skim, and the sentence looks correct. Therein lies the danger of LLMs: looks right, doesn't work.

Fooled ya! Looks right, doesn't work. https://twitter.com/lak_luster/status/1598375995694014464

The human standard

What OpenAI did with ChatGPT was to add an extra step: training the LLM (called GPT-3) to produce less nonsense. They did this by training a dialogue system (the Chat in ChatGPT) to choose between possible word continuations based on which continuation was more convincing to a human.

So, basically, ChatGPT is a large language model that has been trained to produce text that meets human standards. The chat model was trained to deviate as little as possible from the trained LLM to limit the kinds of hallucinations possible. It was also fitted with filters to reject biased questions, although these filters are easy to circumvent. Finally, it incorporates an earlier model called InstructGPT that allows it to take directions from humans and do what’s asked of it.
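As a rough illustration of “choose the continuation a human would prefer”, here is a deliberately simplified sketch. The reward_model and its score method are hypothetical placeholders for a model trained on human rankings; OpenAI’s actual pipeline uses reinforcement learning rather than simple reranking.

```python
# Deliberately simplified sketch of preference-based selection.
# `reward_model` is a hypothetical stand-in for a model trained on human
# rankings of candidate answers; this is NOT OpenAI's actual training code.
def pick_preferred(prompt, candidates, reward_model):
    """Return the candidate continuation the preference model scores highest."""
    return max(candidates, key=lambda text: reward_model.score(prompt, text))

# Hypothetical usage:
# best = pick_preferred("Why does Kansas get so many tornadoes?",
#                       candidates=sampled_answers, reward_model=rm)
```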

Strengths and weaknesses

Once you understand how ChatGPT was built, you can quickly see where its strengths and weaknesses come from:

  1. The model does much better in areas well represented as digital text (computer code, politics, science) than in areas that are not. The more sentences there are, the better the interpolation. The less likely the model is to be out to lunch.
  2. ChatGPT will reproduce misinformation from any of its input sources — it is not an intelligent system that tries to balance or weight different perspectives.
  3. The human raters are not experts in the topic, and so they tend to choose text that looks convincing. They’d pick up on many symptoms of hallucination, but not all. Accuracy errors that creep in are difficult to catch.
  4. Because the choice of words captures tone and reveals biases, LLMs will tend to reproduce the tone and bias of the articles in their input corpus. Confident, scientific, or racist, they will reproduce anything.
  5. The sources of individual fragments are lost. This is not information retrieval.
  6. ChatGPT has no reasoning capability. It can not do math. It can not solve logic puzzles. It can not invent new knowledge. It’s definitely not sentient.

What do you call a system that sounds confident about what it says regardless of whether what it’s saying is true? I’ll give you a hint. Harry Frankfurt wrote an entire book cautioning against a person who “does not reject the authority of the truth, as the liar does, and oppose himself to it. He pays no attention to it at all.” He calls such a person a bullshitter.

Best case illustration

Let me illustrate the above points by taking a subject on which a lot of information exists, and very little misinformation. This ought to be a pretty good case for ChatGPT.

A pretty good case: lots of information, very little misinformation, but the answer is still bullshit.

The answer is certainly relevant and touches on the key point of Kansas being proximate to the Rocky Mountains and the Gulf of Mexico. But there is no real “why”. The answer is confident, the language is grammatical and erudite, but it talks around the true answer. This is what Harry Frankfurt terms bullshit.

What would a better answer have looked like? We can get at this by giving ChatGPT better words for it to pay attention to:

Using the word “tornadogenesis” reduces the bullshit.

Notice how much more scientific this answer is. No more bullshit about unique geography, but the actual answer of warm, moist air colliding with cold, dry air. That’s because the scientific term “tornadogenesis” has drawn on a corpus of text that answers the “why” question much better. That this collision of air masses is likely to happen over Kansas doesn’t show up in the scientific texts (only in the map illustrations!), which is why the answer to the question about Kansas was unable to reproduce this information.

Again, it should be emphasized that ChatGPT can not do reasoning. It can not understand the intent behind questions. There is a persistent myth that certain towns (Moore, OK) are more prone to tornados than even towns that are nearby. Ask GPT this and it will not correct the mistaken intent behind the question:

ChatGPT does not understand the subtext behind a question, and returns factual answers to a similar question.

Instead, it pulls on texts that are similar but not the same — this answer is about the difference in tornado likelihood across the Midwest (between Kansas and Minnesota, for example). A person who asked the question about nearby towns will, instead, get an answer about towns far apart. The answer is right, but it is not the answer to the question that was intended. In fact, the person who asked the question will walk away believing that Moore, OK is near the intersection of air masses whereas Norman, OK isn’t.

What should you not use it for?

Given these limitations, do not use GPT to:

  1. Generate full articles. First, the article will be inaccurate and full of bullshit. Second, your website will get penalized in SEO because Google said they would and showed how they can. Generating tightly constrained, 1:1 content might be fine (see next section).
  2. Do anything that involves reasoning (including math). If you are a teacher, and you want to ensure that your students don’t simply use a bot to do their homework, make sure to avoid questions that are a single Google search away. Instead, ask questions that require putting together multiple pieces of information, or that involve missing context, even if it is as simple as “Kansas is in the geographic area mentioned”.
  3. Do anything that involves recent context. ChatGPT was trained before the 2022 football World Cup started. It has not seen any articles about Morocco’s excellent run in the tournament.

What is it good for?

These limitations do not mean LLMs are not useful. It’s early days, and I’m very confident in how innovative people are. Already, we see various good uses of ChatGPT:

  1. To create poetry. When you want the model to be creative, hallucination is good! The joke is that we used to think AI would do the manual work while humans would do the creative work. Yet, here we are, still cooking and cleaning and driving cars while the robots draw pictures and write poetry.
  2. Outlining and form letters. The structure and wording of outlines are very similar and LLMs have seen lots and lots of outlines. So, ask an LLM to create a structure of an article, or a book, or a thesis, or a proposal, or a letter of intent, and it will do quite well. This also works for the complete content of things like form letters.
  3. Code snippets. Ask an LLM to write code that corresponds to the type of code that exists in code documentation and it will do quite well at reproducing it. The actual code may not do what you want it to do (indeed, Stack Overflow banned generated code for this reason), so you should test/edit the generated code in small pieces (see the sketch after this list).
  4. Domain-specific assistants. Training on high-quality medical sources yields an LLM that creates text that does quite well on medical exams. Again, because it doesn’t reason and can not understand the intent behind a question, you don’t want to use the bot to replace a doctor, but such models will do very well as doctors’ assistants.
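For the code-snippet use case, one pragmatic workflow is to wrap each generated piece in a small test before trusting it. The word_count function below is a hypothetical stand-in for whatever the LLM produced, not output from any particular model:

```python
# Hypothetical workflow: a small test around a model-generated snippet.
# `word_count` stands in for whatever function the LLM wrote for you.
def word_count(text: str) -> int:
    """(Pretend this body came from an LLM.) Count whitespace-separated words."""
    return len(text.split())

def test_word_count():
    assert word_count("At the stroke of midnight") == 5
    assert word_count("") == 0              # edge case the generated code might miss
    assert word_count("  spaced   out ") == 2

test_word_count()
print("generated snippet passes these checks")
```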

Can you think of (or have you seen) any other neat applications of this technology?

Summary

ChatGPT is a large language model (LLM) that has been trained to carry on dialogue. It does this by predicting word continuations based on context. Because of fundamental limitations, LLMs interpolate between the text sequences that they have seen. They do not do actual reasoning or balancing of viewpoints. Unlike with humans, you can not use tone of voice or general credibility to guard against bullshit — an LLM is not going to show up with slicked-back hair and extra-shiny teeth.

Even with these limitations, ChatGPT is a promising technology. There will, no doubt, be very interesting applications that take advantage of ChatGPT and of models like Google’s LaMDA that are based on similar principles. The models will keep improving, and the killer application is not far off!

Postscript: science moves on

The tornadogenesis explanation that came out of ChatGPT was good enough for me, but Robin Tanamachi, a Purdue meteorology professor and one of the world’s experts on tornados, pointed out in a comment on my LinkedIn post that the science has since moved on. As proof, she linked to an article by a veritable who’s who of tornado research in the highest impact meteorological journal that called the “clash of air masses” characterization “oversimplified, outdated, and incorrect.” The authors instead suggest a more nuanced explanation that has to do with horizontal and vertical gradients in temperature and wind shear (and indeed, those are the features that machine learning models for tornado prediction have long used).

Searching Google for “tornadogenesis” reflects explanations that involve phenomena at the right scale:

Google's results consist of recent scientific papers that reflect the latest understanding.

However, the Google links are all to scientific papers. The “Markowski Richardson 2009” link is to an earlier paper by two of the authors that Robin linked to.

I had asked for a high-school explanation, though, and so ChatGPT seems to have picked up wording from this presentation, which was at the desired high-school level. ChatGPT is not going to weigh evidence, credibility, etc., which are things that Google uses when deciding what information to surface.
