I Know What You Mean
In my first article, I mentioned natural language pragmatics, the field that studies how speakers resolve context dependency and how we manage to understand each other despite how messy language is. In this post, we will look at some specific examples and see how accurately ChatGPT can actually handle them. This serves a double purpose: to give you some idea of the challenges speakers face when interpreting natural language, and to find out how well LLM technology can deal with them.
Before we start, a quick note: today these examples might seem almost trivial, but I assure you that this achievement is built on top of decades of failed attempts, owing to the sheer number of parameters involved and the complexity of the task. The goal here is to appreciate some examples of exactly what modern LLMs achieve that wasn’t possible until recently.
Let’s kick it off with one of the most commonly used examples in the pragmatics literature: the versatility of the phrase do the/a X. Depending on what noun phrase X is, the phrase can take on any number of meanings, even ones you haven’t considered before (called ad-hoc or nonce senses), yet speakers don’t seem to pause at all when interpreting it. Let’s see what ChatGPT has to say about it. (Forgive the imperfect formatting; this is the best I can do with LinkedIn’s interface.)
The important thing to notice is that in order to interpret these phrases one needs to have some knowledge about the world, and also about how language behaves, and tie the two together to form a coherent meaning. No one is surprised that C (ChatGPT for short) is able to do it, but I do encourage you to take note of how specific and accurate its explanation of the Napoleon pose is, including its social and cultural connotations.
But this is too easy, right? To crank the challenge level higher, I went to one of my all-time favorite papers, “Making Sense of Nonce Sense” (1983) by Herbert Clark, where he cites many examples of ad-hoc uses of language that break with standard usage, and asked C to explain them to me. Here is what I got:
Again, notice how specific the explanations are. They require simultaneous familiarity with both the abstract properties of the concepts being used (gossip, plunging, speed, etc.) and the real-world context of the situation: the fact that generating buzz (or hype, in more contemporary language) may take busywork or create noise, that gossip might involve guesswork, that gold is strongly characterized by its price and that prices can go up and down, and that crab is often consumed as food. It’s much like combinatorial mix-and-match: you have to project two concepts in exactly the right magnitude and direction so that they can combine with each other in a way that matches what we know about the world. For example, in the case of quick crab, “quick” has to be projected along the dimension of “preparation speed” and “crab” along the dimension of “a dish”.
If this sounds confusing, here is a small primer: in LLMs, the vector that represents a piece of text encodes information about its value along many dimensions, measured separately. The unique meaning of the text results from the combination of these values, but we can isolate aspects of the meaning by looking at a subset of these dimensions. This is an oversimplification, but you can imagine three groups of dimensions: one responsible for encoding concepts that have to do with altitude, another that encodes chemical concepts, and yet another that encodes financial ones. When the model realizes that the string “gold plunges to new lows” is unlikely to describe altitude, it is as if it isolates the dimensions responsible for encoding altitude and projects them into the space defined by the vectors encoding financial concepts, and this is how we get the idea that the price went down.
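To make the geometry slightly more concrete, here is a toy sketch in Python. To be clear, the dimension labels, the vectors and every number in it are invented purely for illustration; real models work with thousands of unlabeled dimensions and learn these projections implicitly rather than through hand-written rules like these.

```python
import numpy as np

# Toy "embedding space" with three hand-labeled dimensions:
#   [altitude, chemistry, finance]
# All numbers are invented purely to illustrate cross-domain projection.
gold   = np.array([0.1, 0.8, 0.7])   # a metal, but also a financial asset
plunge = np.array([0.9, 0.1, 0.2])   # literally: a steep drop in altitude

altitude_axis = np.array([1.0, 0.0, 0.0])
finance_axis  = np.array([0.0, 0.0, 1.0])

def component(vec, axis):
    """How strongly `vec` expresses a single conceptual dimension."""
    return float(vec @ axis)

# A literal (altitude) reading of "gold plunges" fits poorly,
# because "gold" carries almost no altitude meaning...
literal_fit = component(gold, altitude_axis) * component(plunge, altitude_axis)

# ...so the "downward movement" carried by "plunge" gets re-attached to the
# dimension where "gold" is strong: falling along the finance axis = a falling price.
financial_fit = component(gold, finance_axis) * component(plunge, altitude_axis)

print(f"literal (altitude) fit: {literal_fit:.2f}")   # ~0.09
print(f"financial (price) fit:  {financial_fit:.2f}") # ~0.63
```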
I realize that this explanation is far from satisfactory, but going into any more detail would stray too far from the goal of this article. If you want to learn more, there’s content coming up about the representation of metaphors in vector space models, in which I go deeper into the idea of cross-domain projection.
But even with a cursory understanding of the issue, I’m pretty sure you realize how complex this task is when you consider the number of dimensions to choose from, the magnitude by which each concept has to be projected along its dimension, and the amount of real-world information the resulting new meaning has to be compared against to evaluate it. This is a combinatorial nightmare, which is why it is quite remarkable that the machine that achieves it was created the way it was, without any facts or knowledge being explicitly introduced beyond the text itself.
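Just to give a feel for the scale, here is a back-of-the-envelope count with entirely made-up numbers; the point is only that candidate readings multiply very quickly even for a two-word phrase.

```python
# Back-of-the-envelope count of candidate readings (all numbers invented).
# Suppose each of two words could be read along ~300 conceptual dimensions,
# scaled to one of ~10 coarse magnitudes, and every candidate pairing then
# has to be checked against what we know about the world.
dims_per_word = 300
magnitudes = 10

readings_per_word = dims_per_word * magnitudes      # 3,000
candidate_pairings = readings_per_word ** 2         # 9,000,000

print(f"{candidate_pairings:,} candidate pairings for one two-word phrase")
```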
As discussed in my first article, there are far too many such nuances to language and the world to be able to hard code or enumerate them. Making these connections requires a model with strong generalization abilities trained on substantial amounts of data, which is what we have here.
Interestingly, note that it didn’t get the wordplay behind “buzzier” right in its first attempt (“busy” + “buzzy”), which shows how non-trivial this task is. It did get it right when I asked more specifically.
In my attempts to challenge C, I scoured “Defining Pragmatics” (2010) by Mira Ariel, a classic pragmatics textbook that cites many conversation snippets which exemplify interesting inferences people make when using language. Here is what I found (page 62).
Prompt: I will give you a dialogue snippet. explain to me what is the implicit meaning of "we don't have laborers", what is inferred from it:
M: You don’t build by yourselves
S: No, we don’t have laborers, we have 10, 15 laborers (Lotan: 7).
Response: The implicit meaning of "No, we don’t have laborers, we have 10, 15 laborers" in this context seems to suggest a contradiction. Initially, the statement "we don't have laborers" implies a lack of workers. However, the follow-up clarification of having "10, 15 laborers" contradicts this, indicating that they do have some labor force, albeit perhaps not as many as might be expected or required for the work being discussed.
Compare this to Ariel’s paraphrase: “we don’t have enough laborers to build by ourselves”. Notice also C’s reasoning process, which is very reminiscent of how linguists describe inference: bending the standard meaning of a phrase by the smallest amount that accounts for everything that was said. That is, we have to bend the meaning of “we don’t have laborers” by the smallest magnitude possible that still accounts for the fact that they have 10–15 laborers. This is the same idea as projection along the right dimensions discussed above.
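One toy way to picture “bend by the smallest amount that still fits” is as a search over candidate readings, each with a cost for how far it departs from the literal meaning. The candidates and costs below are of course invented, and nothing suggests the model literally enumerates readings like this; it is only a sketch of the inference pattern.

```python
# Toy sketch of "minimal bending": keep only the readings of "we don't have
# laborers" that are consistent with "we have 10, 15 laborers", then pick the
# one that departs least from the literal meaning. Candidates and costs are
# invented for illustration.
candidates = [
    {"reading": "we have zero laborers",
     "bend": 0.0, "consistent": False},
    {"reading": "we don't have enough laborers to build by ourselves",
     "bend": 0.3, "consistent": True},
    {"reading": "we refuse to employ laborers at all",
     "bend": 0.9, "consistent": False},
]

viable = [c for c in candidates if c["consistent"]]
best = min(viable, key=lambda c: c["bend"])
print(best["reading"])  # -> "we don't have enough laborers to build by ourselves"
```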
There are, though, examples of C missing the intended meaning. These are few and far between, but I did recently find a decent example of one:
Q: what is a list sales
A: A "list sale" generally refers to the sale or rental of a mailing list.
I was thinking of A-list sales, but it didn’t occur to me to capitalize the “A”, and apparently it wasn’t obvious enough what I meant. But this is a misunderstanding that could easily happen between humans as well, so the fact that this is my best example of failed pragmatics with C says a lot. I know there are plenty of examples online of awkward AI moments, but in my experience these are negligible exceptions in my overall flow of communication with LLMs. Not to mention that many of them are the result of attempts to intentionally break C, which is not what I am concerned with here. I haven’t seen a quantitative analysis of these, though, so take this with a grain of salt.
To wrap things up, I also challenged C with a passage from the science fiction novel "The Towers of Titan" (Ben Bova, 1961), which represents a more nuanced and metaphorical sort of implicature. I asked C to break down its pragmatics (i.e., the implicitly communicated information), and here are the results for you to draw your own conclusions from. Consider writing down the implicatures from this passage before reading C’s responses, and then compare.
Bennett's smile broke into an open grin. "Self-pity is a terrible thing. Do you know what those kids think of you? You're Dr. Sidney Lee, the foremost xeno-anthropologist of the human race.
Response:
In conclusion, we’ve looked at some examples of actual language use that require a combination of linguistic and world knowledge to be understood and cannot be broken down into some elegant set of rules. You have to really be familiar with all of it, so that you aren’t surprised by whatever novel combination of concepts is thrown at you, like “guess” and “gossip”. Moreover, you have to be able to project concepts across domains so that you can find the interpretation that makes the most sense, as with altitude and precious metals. To us this might seem trivial, but that’s because we have grown up solving these problems on a daily basis until they’ve become part of our nature. From a computational point of view, though, it wasn’t clear until recently whether such problems could even be solved with the algorithms we have and with access to text data alone, given the complexity of the abstractions required.
In future articles, we’ll take a closer look at the math behind vector space models and how it might give rise to this kind of understanding. But in the meantime, enjoy this excerpt from “The Scarlet Letter” (Nathaniel Hawthorne, 1850), followed by C’s understanding of it:
Literature, its exertions and objects, were now of little moment in my regard. I cared not, at this period, for books; they were apart from me. Nature,—except it were human nature,—the nature that is developed in earth and sky, was, in one sense, hidden from me; and all the imaginative delight, wherewith it had been spiritualized, passed away out of my mind. A gift, a faculty if it had not departed, was suspended and inanimate within me. There would have been something sad, unutterably dreary, in all this, had I not been conscious that it lay at my own option to recall whatever was valuable in the past. It might be true, indeed, that this was a life which could not with impunity be lived too long; else, it might have made me permanently other than I had been without transforming me into any shape which it would be worth my while to take. But I never considered it as other than a transitory life. There was always a prophetic instinct, a low whisper in my ear, that, within no long period, and whenever a new change of custom should be essential to my good, a change would come.
Response: