ChatGPT Trust: A Gap, a Hypothesis, and an Experiment


Why ChatGPT should learn from Search

People have asked whether ChatGPT presents a challenge to Google.

The most plausible argument is that ChatGPT would pull Google into an arms race: converting lucrative search traffic into ChatGPT-style conversations, which are barren ground for harvesting ads and more computationally expensive to serve. In that sense, Google would be fighting an uphill battle.

Such arguments largely leave out the value of trust in the whole picture. Search connects people to trusted knowledge sources, and search algorithms promote web documents with authoritativeness and quality. People using Search take trust for granted, like air and water; in ChatGPT results, by contrast, trust is a scarce resource.

Fixing trust for ChatGPT requires learning how existing Search applies trust to search results. Confucius once said, "If three walk together, one can be my teacher."

"Confucius" by JayPLee is licensed under CC BY-NC-SA 2.0

What's missing in ChatGPT Trust

If you look at Search technologies, you can see how various factors contribute to promoting trusted search results.

  • One can check how web pages link to one another, a strong hint of authoritativeness (a rough sketch of link-based scoring follows this list).
  • One can check the content itself.
  • One can check the context (domain, author, time).
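
To make the first factor concrete, here is a minimal, hedged sketch of a PageRank-style authority score computed over a toy link graph. The graph, the domain names, and the damping factor are invented for illustration; this is not Google's actual algorithm, only the general idea that incoming links concentrate authority.

```python
# Minimal PageRank-style sketch: link structure as a hint of authoritativeness.
# The toy link graph and parameters below are illustrative assumptions only.

def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for src, outs in links.items():
            targets = outs or pages  # dangling page: spread its rank evenly
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank

toy_graph = {
    "trusted-encyclopedia.org": ["research-journal.org"],
    "research-journal.org": ["trusted-encyclopedia.org"],
    "link-farm.biz": ["trusted-encyclopedia.org"],  # links out, but nothing links to it
}
print(pagerank(toy_graph))  # the mutually cited pages end up with higher authority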

At the root level, there is one fundamental factor that all the others depend on: uniquely identifiable content.

For example, each piece of content Google indexes is assigned a doc id. All computations about a document can be grouped together under that id, and at serving time the doc ids are ranked and returned.
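
A hedged sketch of that idea: offline signals keyed by a stable doc id can be grouped per document and then ranked at serving time. The signal names, values, and weights below are placeholders for illustration, not a real ranking pipeline.

```python
from collections import defaultdict

# Hypothetical offline signals, each keyed by a stable doc id.
signals = [
    ("doc_42", "link_authority", 0.81),
    ("doc_42", "content_quality", 0.67),
    ("doc_77", "link_authority", 0.35),
    ("doc_77", "content_quality", 0.90),
]

# Group every computation about the same document under its doc id.
per_doc: dict[str, dict[str, float]] = defaultdict(dict)
for doc_id, name, value in signals:
    per_doc[doc_id][name] = value

# At serving time, rank doc ids by a combined score (weights are made up).
def score(features: dict[str, float]) -> float:
    return 0.6 * features.get("link_authority", 0.0) + 0.4 * features.get("content_quality", 0.0)

ranked = sorted(per_doc, key=lambda d: score(per_doc[d]), reverse=True)
print(ranked)  # e.g. ['doc_42', 'doc_77']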

However, ChatGPT is different: you progressively add qualifiers to your prompt and keep generating new results. There is no fixed set of "documents" in ChatGPT that you can collect in advance to analyze, because ChatGPT answers are generated on the fly.

No one can deny that ChatGPT answers are meaningful; otherwise there would be no hype. Meaningful answers without any pre-existing form, how strange. Plato might come and complain: "Where are my eternal, immutable patterns that these ChatGPT answers embody?"

Plato by Sergey Sosnovskiy is licensed under CC BY-SA 2.0.

That is what is missing in ChatGPT at the root level. Where are the immutables in ChatGPT answers? And what do they look like?

If we can extract those immutables from ChatGPT answers, the trust issue becomes much easier to solve. Imagine that the "immutables" are Search documents, and Search can help establish trust in those "immutables."
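
To make the analogy concrete, here is a hedged sketch of what "treating immutables like documents" could mean: reduce each generated answer to a canonical key, so trust signals can be attached to that key the way Search attaches them to a doc id. The normalization below is a deliberately naive stand-in for a real semantic classifier; a real system would canonicalize meaning, not wording.

```python
import hashlib
import re

def semantic_key(answer: str) -> str:
    """Naive stand-in for a semantic classifier: normalize the surface text
    and hash it into a stable identifier."""
    tokens = sorted(set(re.findall(r"[a-z]+", answer.lower())))
    return hashlib.sha1(" ".join(tokens).encode()).hexdigest()[:12]

# Trust signals keyed by the "immutable" identifier, like doc ids in Search.
trust_store: dict[str, float] = {}

a1 = "Water boils at 100 degrees Celsius at sea level."
a2 = "At sea level, water boils at 100 Celsius degrees."
trust_store[semantic_key(a1)] = 0.92  # e.g. verified against trusted sources

print(semantic_key(a1) == semantic_key(a2))  # True: same key despite rewording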

Hypothesis: ChatGPT has immutable answers - on Semantics

Semantics is the theory of meaning. For example, humans develop concepts such as nouns; each concept has a set of semantic meanings, with an intensional meaning that determines what it is and an extensional meaning that applies the concept to individual entities in the outside world. Meanings of individual words then combine into sentence-level meanings via the principle of compositionality.
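
A toy, hedged formalization of those terms (the concepts, entities, and attributes are invented for illustration): a concept's intension is a membership test, its extension is the set of entities that pass the test, and compositionality builds a sentence-level meaning from word-level meanings.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Concept:
    name: str
    intension: Callable[[dict], bool]  # what it *is*: a membership test

    def extension(self, world: list[dict]) -> list[dict]:
        # which outside-world entities the concept applies to
        return [e for e in world if self.intension(e)]

# Toy world of entities; attributes are illustrative assumptions.
world = [
    {"name": "Rex", "legs": 4, "barks": True},
    {"name": "Tweety", "legs": 2, "flies": True},
]

dog = Concept("dog", lambda e: e.get("barks", False))

# Compositionality: the meaning of "some dog has four legs" is built from word meanings.
def some(concept: Concept, predicate: Callable[[dict], bool]) -> bool:
    return any(predicate(e) for e in concept.extension(world))

print([e["name"] for e in dog.extension(world)])    # ['Rex']
print(some(dog, lambda e: e.get("legs") == 4))      # True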

There are plenty of evaluations in the research community showing that ChatGPT and other LLMs demonstrate the capability of learning concept hierarchies, such as the paper "Dissociating Language and Thought in Large Language Models: A Cognitive Perspective" and blog posts on emergent properties in LLMs.

So we can form a hypothesis: "ChatGPT has immutable patterns in its answers, and those immutable patterns represent semantics." This would align with the conclusions of other research.

An Experiment: Extract Semantics from ChatGPT Answers

We (Pingping Consultation) will run an experiment on this. We will build a semantic classifier that extracts the immutable patterns in ChatGPT answers at the semantic level.

By Pingping Consultation

At a high level, the deliverable should be improvements in search metrics on ChatGPT results, with the treatment being the use of extra semantic features from that semantic classifier to drive search accuracy.
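
A hedged sketch of how such a comparison could be scored offline, assuming hypothetical relevance labels and a standard metric such as NDCG; the two rankings and their labels below are made up for illustration only.

```python
import math

def dcg(relevances: list[float]) -> float:
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances: list[float]) -> float:
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels (3 = best) for the same query, two rankings:
baseline_ranking  = [1, 3, 0, 2]   # ChatGPT answers ranked without semantic features
treatment_ranking = [3, 2, 1, 0]   # re-ranked with features from the semantic classifier

print(f"baseline  NDCG: {ndcg(baseline_ranking):.3f}")
print(f"treatment NDCG: {ndcg(treatment_ranking):.3f}")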

On "Generate Semantic Representations" block, we will use the language toolbox we are developing in this medium post about language engineering toolbox. The goal is to generate such a Semantic Representation that are

  • Trust: no anti-patterns in the semantic representations, such as assigning attributes to the wrong gender categories.
  • Economy: the generated semantic expressions are well covered by the LLM: fewer mathematical symbols, more concise symbolic expressions, easy for the LLM to learn and for humans to read.
  • Diversity: multiple different ways of grouping vocabulary into sentence-level expressions.
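
A hedged sketch of how such checks might look in code, assuming a toy representation made of (entity, attribute) pairs; the anti-pattern rule, the symbol budget, and the diversity count are placeholder criteria standing in for the three goals above.

```python
# Toy semantic representation: a list of (entity, attribute) pairs.
representation = [("entity:queen", "attr:female"), ("entity:king", "attr:male")]

# Trust: reject anti-patterns such as attributes assigned to the wrong category.
FORBIDDEN_PAIRS = {("entity:queen", "attr:male"), ("entity:king", "attr:female")}
trusted = not any(pair in FORBIDDEN_PAIRS for pair in representation)

# Economy: prefer short, concise symbolic expressions the LLM can cover easily.
MAX_SYMBOLS = 16
economical = sum(len(pair) for pair in representation) <= MAX_SYMBOLS

# Diversity: accept multiple distinct groupings of the same vocabulary.
alternative_groupings = [
    [("entity:queen", "attr:female"), ("entity:king", "attr:male")],
    [("role:monarch", "gender:female"), ("role:monarch", "gender:male")],
]
diverse = len({tuple(g) for g in alternative_groupings}) >= 2

print(trusted, economical, diverse)  # True True True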


...



* Pingping Consultation, founded by Pingping XIU in December 2022, is a non-profit developing responsible AI technologies through a multi-disciplinary approach.

Alexandru Armasu

Founder & CEO, Group 8 Security Solutions Inc. DBA Machine Learning Intelligence

8 months ago

Gratitude for your contribution!

Pingping Xiu

Data Engineer Leader @ Caltrans | Data Engineering / AI

1 year ago

ChatGPT will not do that for you, but you can do it yourself! (to identify the knowledge boundary)
