ChatGPT Plugins and Factual Knowledge Alignment
Is the output of ChatGPT just a bunch of bull?
Note: this article was originally published on my Substack, where I publish more material, a few days earlier than on LinkedIn.
Earlier this year I posted about some of the trends surrounding open source language models, namely that any "moat building" Google or OpenAI may be attempting with Bard and ChatGPT may be rendered moot by open source language models.
Enter ChatGPT Plugins.
Plugins are a way to allow ChatGPT to interact with the web, or with particular parts of the web that expose APIs, such as a real estate website with data on listed houses.
So now it's time to take a look at some of the plugins released so far and try to get an idea of how much more capable and centralized ChatGPT may become, upon mass release of plugins, compared to open source language models and developers building their own integrations.
Seeking the Killer App
ChatGPT's plugins feature appears, at least to a small extent, to be OpenAI's attempt at seeking a series of "killer apps," almost as if to say: a language model is not good enough as a stand-alone product; we need to create an interface with the internet itself.
The problem with Large Language Models (LLMs), which I covered extensively in a YouTube video back in December 2022, is that they are highly probabilistic in nature, which means they do not excel at dealing with discrete values that convey quantity or quality, e.g. statements of fact. What I found in these plugins is that ChatGPT is largely accessing APIs, which are essentially gateways to databases, which contain, well...data: discrete values that convey quantity or quality, e.g. statements of fact.
Now, I know I'm really just a crank with a newsletter and a YouTube channel asking you, dear reader, to believe me on this. In my defense, a recent survey of studies on LLM factual inaccuracies was published in ACM Computing Surveys, a journal with an impact factor of around 14+, which seems to be quite high among computing journals. What that basically means is that the highest-prestige groups in this field rank this particular journal as especially prestigious.
So first off, there are many different types of what LLM researchers term "hallucinations": incorrect information produced by the process underlying the LLM. The type of hallucination we're dealing with here in the survey is:
Per the paper:
Innate divergence. Some NLG tasks by nature do not always have factual knowledge alignment between the source input text and the target reference […]. For instance, it is acceptable for open-domain dialogue systems to respond in chit-chat style, subjective style, […]– this improves the engagingness and diversity of the dialogue generation. However, researchers have discovered that such dataset characteristic leads to inevitable extrinsic hallucinations.
So basically, what this is saying is, as I discussed in a previous article, LLMs are often optimized for human engagement, which reduces factuality in favor of chit-chat.
That being said, let's jump in and look at some of OpenAI's plugins to see how they do on various tasks that require actual factual knowledge alignment.
Real Estate: Redfin
Redfin is a real estate website. I suppose all of the information that ChatGPT can give you from the Redfin plugin could be gleaned from browsing the Redfin website itself. The advantage here seems to be the interface through which we're receiving the data, which is more like a command line tool than a webpage, which is kind of nice because it reduces the noise.
But can we create any additional knowledge or information from this browsing capability?
The above is essentially a sample for all of Minneapolis, which might not be representative of what a particular home buyer may be looking for at a particular time because typically buyers are looking in a specific area. So, I used a zip code radius map to grab the zip codes of a particular area in Minneapolis and fed those in, as shown below.
Searching by explicit zip codes should ostensibly provide much more exacting results. However, when we look at the source map for a particular zip code reported as having a $350,000 average list price, we see quite different results: $350,000 is in fact the lower bound for this zip code, not the average:
Let’s take a look at what ChatGPT is actually doing under the hood. What it’s doing is writing an API call to Redfin, asking for the zip code 55419, with a maximum number of beds and bathrooms.
Then the API response is sent back to ChatGPT, containing the data requested, including the number of beds and baths as well as some other information. ChatGPT then ostensibly calculates an "average" across the prices of all the returned listings, but for some reason it gets the calculation wrong, likely because, under the hood, ChatGPT is a Large Language Model making a probabilistic prediction of what the result of a calculation should look like rather than performing an actual calculation. It's possible that we could engineer a prompt that helps ChatGPT focus in on just that price.
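To make the contrast concrete, here is a minimal sketch of the deterministic calculation ChatGPT would need to perform. The listing structure below is an illustrative assumption, not Redfin's actual API schema; the point is that averaging is a trivial, exact operation once you treat the response as data rather than text to predict over.

```python
# Deterministic average of list prices from a Redfin-style API response.
# The listing fields here are assumptions for illustration, not Redfin's schema.
listings = [
    {"address": "123 Elm St", "zip": "55419", "price": 350000, "beds": 3, "baths": 1},
    {"address": "456 Oak Ave", "zip": "55419", "price": 425000, "beds": 3, "baths": 2},
    {"address": "789 Pine Rd", "zip": "55419", "price": 515000, "beds": 4, "baths": 2},
]

prices = [home["price"] for home in listings]
average_price = sum(prices) / len(prices)  # exact arithmetic, no "prediction"
print(f"Average list price: ${average_price:,.0f}")  # → Average list price: $430,000
```

A plugin (or a code-execution tool) that hands this arithmetic off to actual code sidesteps the probabilistic failure mode entirely.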
What about for finding and scheduling open houses? Using the prompt:
Find houses which are 3 bed, 1 bath with open houses schedule
within the following zip codes:
55419,55410,55409,55424,55408,55423,55407,55435
I got the response:
When I went in and checked the links to find the open house times, only one of the links provided actually had an open house; the rest listed "No upcoming open houses."
That's all well and good, but it would obviously be better to have the open house times laid out in a table rather than just listed. However, when asked for that, we get:
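The tabulation itself is trivial once the underlying data is verified; the failure is in the data, not the formatting. A minimal sketch, with entirely hypothetical listing fields:

```python
# Hypothetical: tabulate verified open-house times from listing data.
# Field names and values are illustrative, not the Redfin plugin's schema.
open_houses = [
    {"address": "123 Elm St", "zip": "55419", "date": "2023-06-10", "time": "1-3 PM"},
    {"address": "456 Oak Ave", "zip": "55410", "date": "2023-06-11", "time": "12-2 PM"},
]

# Fixed-width columns make a readable plain-text table.
header = f"{'Address':<16}{'Zip':<8}{'Date':<12}{'Time':<8}"
rows = [
    f"{h['address']:<16}{h['zip']:<8}{h['date']:<12}{h['time']:<8}"
    for h in open_houses
]
table = "\n".join([header] + rows)
print(table)
```

This is the kind of glue logic one would hope the plugin handled server-side, so the LLM only has to render results it was actually given.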
Simple Coding: CreatiCode
From what I can tell, simple coding appears to be an application where LLMs do not hallucinate to a massive extent, as long as the code being written is confined to the algorithms themselves and to a language and framework that were actively reflected in the training process. That is to say, as soon as you get into obscure languages or esoteric requirements like versioning, it doesn't work out well anymore.
Sure enough, the CreatiCode plugin seems to do an OK job of creating simple Scratch programs.
Hypothetically, we could push this plugin to its logical extent and see if it can generate code to steer a Raspberry Pi-based robot, per an existing example block of code.
After directly copying and pasting a long string of unorganized code from a tutorial, the ChatGPT CreatiCode Scratch plugin gets activated, but it largely errors out because the code calls a library specific to the Raspberry Pi. My guess is that while ChatGPT and CreatiCode could likely write and compile Scratch code, there may not be support for certain edge cases. Below is the error I got.
Visual Diagramming: DiagramIt
This is an application I'm already familiar with, having previously asked ChatGPT to create visual diagrams with a Python library called GraphViz, a task it has done a fairly good job with.
Let’s see what happens if we ask DiagramIt to follow a prompt that ChatGPT provides as an example.
Uhh...weird ontology there, but I guess that's a loose way to describe how a car works. We may just be experiencing some odd hallucinations which aren't strictly incorrect, but aren't precise enough to be of any use in increasing understanding of how cars work. We could instead design our own diagram, which I have seen work fairly well with the GraphViz Python library.
So how does DiagramIt work under the hood? When I follow the link provided to edit the diagram, it flows through to a website called kroki.io, which does indeed show that GraphViz is the underlying code being used to create these diagrams. From that perspective, it might just be better to use GraphViz directly and own the code it produces, depending on what you're trying to do.
My experience with GraphViz in the past has shown that once you go beyond three or four levels of complexity, the graph you're trying to build starts to fall apart, but it can still be useful for putting together a nice quick graph.
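Owning the code is straightforward: the DOT language that GraphViz (and kroki.io) consumes is just text. Here is a minimal sketch that emits DOT source directly, so it runs without the GraphViz binary or Python bindings installed; the car "ontology" is my own simplified invention for illustration.

```python
# Build DOT source for GraphViz by hand, no plugin needed.
# Render the output with `dot -Tpng car.dot -o car.png` if GraphViz is installed.
nodes = {"engine": "Engine", "trans": "Transmission", "wheels": "Wheels"}
edges = [("engine", "trans", "torque"), ("trans", "wheels", "drive")]

lines = ["digraph car {"]
lines += [f'    {name} [label="{label}"];' for name, label in nodes.items()]
lines += [f'    {src} -> {dst} [label="{lbl}"];' for src, dst, lbl in edges]
lines.append("}")
dot_source = "\n".join(lines)
print(dot_source)
```

Because the diagram is plain text, it can be versioned, diffed, and edited by hand when the generated ontology goes sideways, which is exactly the ownership advantage over a plugin-hosted link.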
FiscalNote
FiscalNote seems to be a policy-wonk news database that scrapes the web, stores news stories specific to policies and legislation, and allows users to query them.
What I feel like I'm doing here is basically querying yet another web search platform: essentially a wrapper around a thing that searches news stories. This appears analogous to a Google advanced search restricted to one type of website, where that website has hypothetically already "reviewed" the stories it shows somehow. Under the hood, ChatGPT is accessing FiscalNote's API, which then returns a load of articles. I'm not clear on how valuable this might be, but I don't work in this industry. At the very least, it may be a faster way to look up regulatory questions on a particular topic to put into a blog post.
AI Ticker Chat
This was actually the most interesting plugin for me personally, perhaps for a niche reason, and it worked immediately. I put together a request asking AITickerChat to summarize some risk factors for a random company, AT&T.
Under the hood, the task it undertook was to extract forward-looking statements from a highly structured, pre-existing SEC data source.
Since summarizing text is something LLMs fundamentally do well, it's reasonable to expect they would do a good job here. Here's an example of some of the underlying data that ChatGPT is drawing from to create its summary above.
{
"id": "b50ee629486f_81",
"text": "CAUTIONARY LANGUAGE CONCERNING FORWARD-LOOKING STATEMENTS Information set forth in this report contains forward-looking statements that are subject to risks and uncertainties, and actual results could differ materially. Many of these factors are discussed in more detail in the “Risk Factors” section. We claim the protection of the safe harbor for forward-looking statements provided by the Private Securities Litigation Reform Act of 1995. The following factors could cause our future results to differ materially from those expressed in the forward-looking statements: The severity, magnitude and duration of the COVID-19 pandemic and containment, mitigation and other measures taken in response, including the potential impacts of these matters on our business and operations. Our inability to predict the extent to which the COVID-19 pandemic and related impacts will continue to impact our business operations, financial performance and results of operations.",
"metadata": {
"source": "SEC",
So the natural extension of this plugin, in my mind, would be to create industry summaries or investor pitches from 10-K forms, taking into account summarizations from a wide variety of similar sources to build up a picture of different cross-sections of industry trends. That sounds like a good use case to me.
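A sketch of what that extension might look like: collect filing chunks like the one above, group them per company, and assemble each group into a single summarization prompt. The chunk shape loosely mirrors the plugin output shown earlier, but the `ticker` field and the grouping step are my own assumptions; the actual LLM call is stubbed out.

```python
# Hypothetical pipeline: group 10-K risk-factor chunks by ticker and build
# one summarization prompt per company. Chunk fields beyond "text" and
# "metadata.source" are assumptions, not the AITickerChat plugin's schema.
from collections import defaultdict

chunks = [
    {"text": "COVID-19 pandemic impacts on operations...",
     "metadata": {"source": "SEC", "ticker": "T"}},
    {"text": "Competition in wireless and broadband markets...",
     "metadata": {"source": "SEC", "ticker": "T"}},
    {"text": "Rising content licensing costs...",
     "metadata": {"source": "SEC", "ticker": "NFLX"}},
]

by_ticker = defaultdict(list)
for chunk in chunks:
    by_ticker[chunk["metadata"]["ticker"]].append(chunk["text"])

prompts = {
    ticker: f"Summarize the risk factors for {ticker}:\n" + "\n".join(texts)
    for ticker, texts in by_ticker.items()
}
# An LLM summarization call would go here for each assembled prompt.
```

Feeding one consolidated prompt per company, rather than loose chunks, is what would let the summaries compare like-for-like across an industry.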
What Didn’t Work Due to Errors or Server Problems
Data Science: Notable
Notable is evidently a code notebook platform that allows users to run data science experiments. For those who are familiar, it's basically a fancy Jupyter Notebook or Google Colab notebook. I thought this could be a fairly powerful tool, but unfortunately it doesn't seem to be connected to ChatGPT properly: ChatGPT asks the user to set a default project, but there is no way to set a default project in Notable.
Attempting to Combine Webpilot and Speechki
I attempted a prompt that combines the two tools into something that would automatically create a podcast from my previously written blog post.
The web browser plugin, Webpilot did appear to pick up the text from my previous blog post fairly well, as can be seen in the image below.
However, every attempt to get the Speechki plugin to convert the text of my blog post into a recording failed with an API error. I tried to log into Speechki directly and could not, so this may have been a problem with Speechki itself more than anything else.
PDFs: ChatWithPDF
I was particularly excited to check this one out because I have worked on PDF document text extraction, but alas, there was a system error when I tried it.
Final Thoughts
So, after having used the plugins, here are my initial thoughts, in no particular order of importance.