Convincing v. Accurate: A rubric for understanding Generative AI outputs*
Rendered from Stable Diffusion with the prompt “Convincing vs. Accurate Generative AI”



Among the various Generative AIs that have been released for public use, ChatGPT is commanding the lion’s share of the buzz. One of the many striking things about ChatGPT is the broad range of user prompts it can process to return relevant, often highly cogent outputs. This range includes essay writing, essay editing, poetry writing, script writing, how-to guides, function-specific source code generation, and general ideation on a given topic of interest. As good as ChatGPT and other leading Generative AIs might be, both their developers and users are keenly aware of deficiencies. Among the concerns expressed about Generative AIs, several seem to pull in opposite directions. On the one hand, users and critics point out errors and subpar responses from Generative AIs to certain types of user prompts. On the other, there is a concern that outputs from Generative AIs might be too good, creating the possibility of successful cheating, fraud, and other forms of harmful deception.

In my prior piece, I characterized Generative AIs as creating a socially distributed form of Turing test, discussing why that is both part of their appeal and part of what also leaves us feeling uneasy. In this piece I provide a basic, structured approach for thinking about the failure and success modes of Generative AI outputs – and the implications for how we navigate Generative AIs going forward. I propose that both the deficiencies that annoy users and many of the broader social concerns around Generative AIs can be understood in terms of two factors: the extent to which outputs from a particular AI are convincing, and the extent to which they are accurate. Before we delve in, it is useful to explore a bit about how Generative AIs are designed. Both the technology behind how Generative AIs work and the objectives for having them interact on a broad scale with members of the public help explain their success and failure modes, the latter in particular.

Generative AIs seek to provide highly sophisticated, very useful responses to whatever a user might be interested in learning about or getting done. To that end, ChatGPT in particular represents a significant technical achievement. The current version of the machine learning (ML) model underlying ChatGPT (GPT-3.5) was trained on 45 terabytes of text data and contains 175 billion parameters. Parameters are numerical variables that allow ML models to make new predictions from training data. They are the elements of neural network-based ML models that allow for pattern recognition and the synthesis of relevant new outputs given a user prompt. To put that number of parameters in perspective, it is more than double the number of neurons in the human brain, estimated to be in the neighborhood of 86 billion. OpenAI reportedly spent over $12 million to train GPT-3 in 2020. Specifically, this money was spent on an extensive, energy-intensive process using Microsoft's supercomputing cloud infrastructure. This kind of process is required for all large-scale ML engines to learn from extensive sets of training data and fine-tune the parameters that enable them to make predictions and offer compelling outputs. This helps explain why ChatGPT is good enough to stoke a rapid and unprecedented level of user adoption. Why, then, does ChatGPT get so much right, but still so much wrong?
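To make “parameters” concrete, here is a minimal sketch in Python (using PyTorch; the layer sizes are invented for illustration and bear no relation to GPT-3.5’s actual architecture) showing that a model’s parameter count is simply the tally of its trainable numbers:

```python
import torch.nn as nn

# A toy two-layer network; the sizes are illustrative only and do not
# reflect GPT-3.5's actual architecture.
model = nn.Sequential(
    nn.Linear(1024, 4096),  # 1024*4096 weights plus 4096 biases
    nn.ReLU(),
    nn.Linear(4096, 1024),  # 4096*1024 weights plus 1024 biases
)

# "Parameters" are exactly these trainable numbers; training
# repeatedly nudges each one to improve the model's predictions.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} trainable parameters")  # 8,393,728 for this toy model
```

Scaling that tally up to 175 billion is what demands the supercomputing infrastructure and training budget described above.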

Convincing vs. Accurate

Accuracy is an end goal for many prompts to Generative AIs. If I want to know X from a single source, chances are that I’d like a ground-truth version of X (e.g., What is today’s date? What is Martin Luther King Jr.’s birthday? How many miles away from the Sun is Earth?). Even so, there are many types of prompts for which Generative AIs, including ChatGPT, produce outputs reflecting bias or falsehoods. For other prompts, however, accuracy is precisely not the goal; emulation is. If from an image-rendering Generative AI I want a painting in the style of Jean-Michel Basquiat, I expect an emulation and might hope it is convincingly evocative of the late painter’s work. But I would not expect it to be “accurate” or an exact replication of one of his works.

The first image below is of Jean-Michel Basquiat’s “Warrior.” The image beneath it was generated by the Generative AI Stable Diffusion in response to the prompt “Painting in the style of Warrior by Jean Michel Basquiat”. The generated image has a human figure in it and uses style elements from the broader body of Basquiat’s work, but doesn’t copy the specific elements or style of his Warrior painting. The generated image is meant to be convincing as Basquiat-like, but it is not meant to accurately reproduce a specific Basquiat work (at least that was the intent of my prompt). In fact, if the generated image were both convincing and an accurate reproduction, we might consider it a verbatim copy or forgery. Were it convincing and overly similar to Warrior, we might consider it a derivative work, also on shaky footing under copyright law. (Worth noting is that, for purposes of a socially distributed Turing test, the generated image below might be convincing to a person one step removed from me (and this piece) as having been created by a human rather than a Generative AI model.)

"Warrior" by Jean-Michel Basquiat
Stable Diffusion prompt: ‘Painting in the style of Warrior by Jean Michel Basquiat’

Generative AIs like ChatGPT, which output language in response to user prompts, are designed to synthesize information from several different sources and put conversational framing around whatever they synthesize. Both the synthesized data and the framing are meant to convey information in the way that a human might. To the extent that the framing is both coherent with the synthesized data and relatable to the user entering the prompt, the output is more likely to land well. The more convincing and authoritative the framing, the more reliable the output will seem. As I discuss below, in many fields being accurate but not convincing is problematic, while being convincing without accuracy tends to be worse. Before we get to a structure for thinking about accuracy and convincingness, let’s first explore semantics, a key concept behind how humans and computing systems are able to have meaningful, relevant interactions.

Semantics

Semantics, classically, is concerned with the cognitive structure of meaning, logic, and the meaning of words in relationship to each other. Semantics, in computing, is concerned with how computer programs are able to mathematically determine meaning from data. Generative AIs don’t really “understand” things (at least not yet). They can be trained, however, to make semantic associations between words and phrases and the elements of both the data they are trained on and the outputs that they generate in response to a user prompt. The associations are stored (and updated) in numerical form. This is called semantic segmentation. It is how neural network-based machine learning (ML) systems, of which Generative AIs are a certain type, can differentiate between a cat, a bear, and a horse. It is how facial recognition software can tell that a new image fed to it is you and not me. Suppose training images of you were fed to an ML model. Through pixel- and then tile-level parsing, the model can learn the unique size and shape of your eyes, and the specific placement around your face and head of your eyes, nose, mouth, ears, hair, and other features. With enough images, the model can learn to recognize you in profile or directly facing the camera. The model learns the signatures or features of your face at many different levels of detail or abstraction. The model never learns to “know” who you are in the way that another human being can (at least not yet). However, the model can create a rich enough mapping of your features that any new image of you can easily be identified as you, or that you can easily be picked out in a picture with multiple people. With the speed of analysis that modern processors and cloud technologies afford, ML models can become highly adept at pattern recognition in a manner that conveys a semblance of “understanding.”
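The recognition idea can be sketched in a few lines. In the toy example below (my own illustration, not how any production system works), random vectors stand in for the learned facial-feature mappings described above, and a simple similarity score does the matching:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how closely two feature vectors point in the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a real system, a trained model maps each face image to a numerical
# feature vector (eye shape and size, placement of features, etc.).
# Random vectors stand in for those learned features here.
rng = np.random.default_rng(0)
known_face = rng.normal(size=128)                         # stored features for "you"
new_photo = known_face + rng.normal(scale=0.1, size=128)  # a fresh photo of you
stranger = rng.normal(size=128)                           # someone else entirely

print(cosine_similarity(known_face, new_photo))  # close to 1.0: likely you
print(cosine_similarity(known_face, stranger))   # near 0.0: not you
```

The model never “knows” you; it only scores how well new data matches the numerical signature it has learned.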

Semantic segmentation is useful in applications such as handwriting recognition and the systems used by self-driving cars to recognize and distinguish between objects on the road or at a crosswalk. It is based on the power of prediction rather than understanding. Given enough data about a subject, and enough computing power, the semantic segmentation performed by an ML model allows it to make highly accurate predictions. The more accurate the prediction, the more that prediction, functionally speaking, converges with “understanding” in the way that humans think about that concept. The previous examples were of semantic segmentation used for classification tasks. Generative AIs, of course, are designed to create new content, not simply pattern-match and classify existing content fed to them. Specifically, Generative AIs are designed to take natural language human prompts, interpret them semantically, map that interpretation to data they have trained on and segmented, and then create new content that matches the intent of the prompt.

The task of creation is implicitly concerned with generating something new and useful, not just, as with a search engine, retrieving existing content that best matches the intent of a query. With a search engine, the user must sift through results the engine predicted to be relevant to their query. The user is left with the task of evaluating the context presented by each search result as part of determining its usefulness. In contrast, as a synthesized response to a user prompt, the output of a Generative AI must fill in the context for the user, demonstrating that the substantive content of the output, and how it is being framed, is well matched to what the user might expect. For instance, in response to a prompt asking when humans first landed on the moon, instead of simply answering “1969,” ChatGPT would output the specific date. Asked who landed on the moon, it might output the name of the mission, explain that it was a US effort, and identify the names of the astronauts, among other contextual details.
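As a concrete illustration, here is roughly how a developer would pose that prompt through OpenAI’s chat API (a sketch using the openai Python library as it existed in early 2023; the printed reply is illustrative, not a captured output):

```python
import openai  # pip install openai (2023-era interface shown)

openai.api_key = "YOUR_API_KEY"  # placeholder

# Unlike a search engine, the model does not return links for the user
# to sift through; it synthesizes an answer and wraps conversational
# framing around it.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "When did humans first land on the moon?"}
    ],
)
print(response.choices[0].message.content)
# Illustrative framing: "Humans first landed on the moon on July 20,
# 1969, during NASA's Apollo 11 mission, when Neil Armstrong and Buzz
# Aldrin walked on the lunar surface."
```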

Accuracy – the Veracity Dimension

So how, then, does semantics relate to failures by Generative AIs to be convincing or accurate? Let's first take accuracy. There are many different ways that ML models can be error-prone. Of the various kinds of errors that can creep into an ML model, many arise when models train themselves on data sets, a process called unsupervised learning. In an unsupervised learning process, ML models perform their own semantic segmentation, generate their own parameters, and teach themselves how to make predictions and what kinds of predictions to make. Unsupervised learning can work remarkably well, but generative models are prone to hallucinate. By hallucination in this context, we mean that models that have insufficient or noisy training on a certain data set or data type, or that are unable to semantically segment a user prompt, will provide fanciful or distorted outputs. Generative AIs have a design imperative to create new outputs – regardless of errors in a model’s predictions arising from issues in its training data, prompt interpretation, or internal processing. This can, and frequently does, lead generative models to reach, stretch, and take unexpected turns in rendering outputs. That can be a boon for generating art, but not so much for outputs one hopes will reflect some semblance of accuracy or credibility.
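One way to see why a generative model always produces something, grounded or not: at each step it samples the next token from a probability distribution, and sampling never declines to answer. A minimal sketch (the vocabulary and scores are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented next-token scores for the prompt "The moon is made of ..."
vocab = ["rock", "dust", "cheese", "basalt"]
logits = np.array([2.0, 1.5, 0.2, 1.0])  # illustrative model scores

def sample(temperature: float) -> str:
    """Sample one token; higher temperature flattens the distribution."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

# The model must emit *something* even where its training is thin, and
# at higher temperatures low-probability (possibly fanciful)
# continuations are chosen more often.
print([sample(0.5) for _ in range(5)])  # mostly "rock"
print([sample(2.0) for _ in range(5)])  # "cheese" shows up now and then
```

Insufficient or noisy training skews those scores in the first place, which is how fanciful continuations come to look plausible to the model.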

In fact, the tendency to hallucinate is one of the deep and intrinsic challenges to having a Generative AI tackle scientific prompts. Just last November, Meta launched Galactica, a Generative AI custom-designed for science chats and trained on a large data set of 48 million papers, textbooks and lecture notes, scientific websites, and encyclopedias. Despite its specific focus on science and its massive amount of training data, Galactica was decommissioned after just three days once it showed excessive tendencies toward bias and inaccuracy. In fields where new ideas must be carefully supported and built on rigor, logic, and expertise, outputs reflecting creative but ungrounded hallucinations won’t ever go over well. This challenge isn’t limited to advanced scientific subject matter. Despite recent efforts by OpenAI to improve output accuracy for math-oriented prompts, ChatGPT still has problems answering basic math problems correctly. Much of that doesn’t have to do with hallucinations as such, but with the semantic challenges ChatGPT faces in mapping numerical mathematical calculations to math-oriented prompts phrased in words.

The key takeaway for accuracy is that any number of issues can undermine the veracity of an output, and, depending on the field of inquiry, this may pose lesser or greater concern.

Convincingness – the Authenticity Dimension

The case of math and science inaccuracy provides a good segue to the matter of being convincing. In these domains, the failure of a Generative AI’s output to be accurate very quickly converges with the unconvincingness of that output. Convincingness, however, typically has elements distinct from accuracy. For instance, if an output is meant to convey information in an authoritative tone, a more casual response might let a user down, even if the information presented is accurate (e.g., prompt: “What is the sum of 3, 2 and 1?”; output: “I believe the correct answer is 6, but I can check again if you’d like” vs. output: “The correct answer is 6”). The conversational framing around the core information in a Generative AI’s output is a large part of what makes it convincing to a user. For image-oriented Generative AIs, convincing typically means that the requirements included in a user prompt are reflected in the style and visual elements of an output image (for example, the Basquiat-like image presented above in response to my prompt asking for such an image). The output is convincing to the extent that its visual rendering seems authentically human, or authentically like the expression of a particular person.

Most of the truly problematic cases for Generative AIs are those where an output is convincing but not accurate. That is, the framing and presentation of the output cause a user (or someone one step removed from the user) to believe inaccurate or misleading propositions contained in the output. The first thing OpenAI itself readily acknowledges in the “Limitations” section of its ChatGPT blog is that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.” This serves as a warning: do not believe (or rely on) everything that a Generative AI outputs, even if it sounds awfully good and authoritative.

The key takeaway for convincingness is that its presence in an output can mask inaccuracy, while its absence can undermine the impact of otherwise accurate content.

The Rubric

With that, we turn to the rubric, which lays out key concepts from the discussion to this point as a 2x2 diagram, with an authenticity dimension and a veracity dimension:

A rubric for understanding how Generative AI outputs map to failure and success modes

The rubric above provides a structured way of understanding how any particular output of a Generative AI succeeds or fails, either for the user providing the prompt or for persons one step removed. A structured approach is useful for distinguishing among the many, often subtle issues posed by Generative AIs made available for public use.
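The four quadrants, which the remainder of this piece walks through, can be captured as a simple lookup; a sketch:

```python
# The 2x2 rubric as a lookup table: (convincing, accurate) -> quadrant
RUBRIC = {
    (True, True): "The Sweet Spot",
    (False, True): "The Penalty Box",
    (True, False): "The Danger Zone",
    (False, False): "The Abyss of Failure",
}

def classify(convincing: bool, accurate: bool) -> str:
    """Map an assessment of a Generative AI output to its quadrant."""
    return RUBRIC[(convincing, accurate)]

print(classify(convincing=True, accurate=False))  # The Danger Zone
```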

THE SWEET SPOT - Convincing and Accurate

When Generative AI outputs are both accurate and convincing, this is the ideal state. Depending on the prompt, a great responsive output is one that saves a user time and effort on a complex intellectual or creative task, or one that allows its users to forget they are dealing with a robot in the first instance. (Depending on the Generative AI's focus, this can get weird.)

Hitting the sweet spot can still sometimes be problematic. In the case of Generative AIs that prepare creative works, it is possible, though very rare, to obtain outputs that faithfully replicate a single training image. This runs the risk of copyright violation, even if the user is unaware that the image they obtained closely replicates a single underlying work. And, away from the context of its Generative AI creation, an output that is convincing and accurate may easily be perceived as having been created by a human, an outcome that can range from harmless to deeply problematic, depending on the scenario.

THE PENALTY BOX - Unconvincing though Accurate

When outputs are accurate but poorly framed, something is lost. That something may simply be the illusion that the user is dealing with a superior machine intelligence rather than an imperfect robot, leaving the user less delighted by the output. Or that something may be more dire in fields requiring rigor, logical consistency, and confident presentation, such as the STEM fields. Getting just a few unconvincing outputs is likely to cause distrust in the Generative AI for any meaningful use.

THE DANGER ZONE - Convincing but Inaccurate

When wrong, biased, or incomplete, an output is inaccurate. But, depending on the expertise or attention level of a user, the errant output may nonetheless be convincing. Someone who is an expert in their field may instantly spot inaccuracies in an output, even if the framing of that output reads as authoritative. But someone less sophisticated who is trying to learn about a topic may take the same output to be true or accurate because the framing induces that belief. For higher-stakes use cases – where one might seek to rely on the output to get something important done – convincing but inaccurate can be disastrous. You probably don't want to ask a Generative AI for specific investment, medical, or other professional advice, even if it provides such advice with assurance.

One exception may hold in what is otherwise the Danger Zone. Assuming you found the Basquiat-like image above created by Stable Diffusion to be convincing (that it was in the style of Basquiat), the fact that it is not a replica of a specific Basquiat work may save it from being seen as a derivative work that violates copyright law. Of course, while that work is unlikely to be judged a derivative work just because it emulates the style of a particular artist, it could still be passed off as a previously unknown work of that artist. The possibility of deception remains, even in the absence of an intellectual property violation.

THE ABYSS OF FAILURE - Unconvincing and Inaccurate

When an output is wrong and no one will be fooled by it, that is the worst place for a Generative AI to be. Such an output will be unsatisfying to a user and might deter them from remaining interested in that particular Generative AI. When everyone is clear that an output is a dud, little harm is likely to flow, but the benefits of efficiently getting human-like responses from a machine quickly vanish. This is the province of well-intentioned chatbots released to the public that failed spectacularly, such as Microsoft’s Tay, and other earlier efforts, including the earliest incarnations of GPT-3. Current Generative AIs beware.

* * *

The rubric shows that there are more ways for Generative AI outputs to fail than to succeed. OpenAI is keenly aware of this issue, as evidenced by the number of initiatives it has undertaken to address errors in ChatGPT outputs, or awkwardness in their framing, that undermine their convincingness. These include multiple infusions of supervised learning and reinforcement learning from human feedback at all stages of the prototyping, training, and improvement of ChatGPT and the large language model (GPT-3.5) underlying it. Ongoing efforts at better product design are important for steering Generative AIs toward the sweet spot and away from the other quadrants of the rubric. But OpenAI also acknowledges that its own efforts to keep its products safe may not be enough, and has invited industry regulation. As noted, even outputs falling within The Sweet Spot – the primary goal of Generative AIs – can nonetheless be misused for things like cheating and forgery when the source of their creation is not known.

In future pieces I will use the rubric to analyze the copyright lawsuits that have been filed against certain Generative AI operators, and to explore the prospect of regulating Generative AIs and what the focal areas for such regulation should be. The very next piece will introduce some levity on the topic of Generative AIs and intellectual property.


Copyright © 2023 Duane R. Valz. Published here under a Creative Commons Attribution-NonCommercial 4.0 International License

*The author works in the field of ML/AI. The views expressed herein are his own and do not reflect any positions or perspectives of his current or former employers.

Postscript: Spilling from the Danger Zone – when convincing but inaccurate has collective effects and market impact: https://techxplore.com/news/2023-08-chatgpt-showdown-stack.html
