How To Get ChatGPT To Answer Questions Using Your Trusted Documents

ChatGPT is one of the most brilliant software tools ever created. It's going to make you better, faster, and stronger. But if you're not careful, it can also make you look foolish.

ChatGPT is a large language model that was trained on the world's data... before 2021. It's brilliant and incredibly efficient at coming up with creative ideas, new content, and often, factual answers to your questions. But it's too creative for its own good, and is infamous for just making things up when it doesn't have enough information to answer you properly.

I've seen it cite websites that don't exist, provide fake news, and falsify facts. So while I think it's going to change the world, it's also important to stop it from hallucinating like this as much as possible.

That's why it's important to build up a library of trusted sources that contain reliable information, and make sure it cites those sources whenever possible.

What's a hallucination?

You may have never heard the term "hallucination" in the context of an LLM before. It's a relatively new term, but I bet it's going to get a lot more popular over the next 12 months. In the context of large language models, a "hallucination" is simply a false statement that's presented as true. If large language models were alive, we might say they were lying, but even that wouldn't be accurate. It would be more accurate to say the model is doing its best to complete a thought by extrapolating what it thinks makes sense to say next.

In any case, a hallucination is a factually incorrect statement, and hallucinations are particularly troublesome because they're delivered with the same confident tone as accurate answers. They frequently come with made-up sources that could lead you to believe they're legitimate statements of fact. Hallucinations may be innocent enough if you're just using ChatGPT to talk like a pirate, but if you're using it for business productivity, they can do real damage.

That's why it's important to ensure the answers you receive are well-sourced and reliable.

Trusted sources

Today, ChatGPT doesn't reveal the sources behind its answers. There's a lot of work being done to fix this, but for now, the most common way to ensure you're getting reliable answers is to provide ChatGPT with contextual data from trusted sources that you've identified, and ask it to use only those sources whenever possible. Taking it one step further, you can even ask ChatGPT to cite which trusted source it used when providing your answer.

Contextual data

You can think of ChatGPT as having a giant, imperfect memory. It was trained on a ton of digital documents, websites, online forums, and other digital sources. Because it's such a massive model, it can more-or-less remember what it was trained on, and when you don't give it any contextual information to work with, it answers by reconstructing whatever it remembers from its training data in the way that most plausibly addresses your prompt.

But it doesn't need to be that way.

You can prompt it to use specific sources of information when answering your question. A common way to do this is to first identify any sources you think might be helpful for the subject matter, copy and paste the content of those sources into ChatGPT, and then prompt it to provide an answer using only that context.
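As a minimal sketch of what that looks like in code, here's one way to do it with OpenAI's Python client. The model name, document text, and question below are just placeholders, and the exact prompt wording is up to you:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

trusted_context = """(paste the text of your trusted document(s) here)"""
question = "What's our refund policy for annual plans?"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # any chat-capable model works here
    messages=[
        {
            "role": "system",
            "content": "Answer the user's question using ONLY the context "
                       "below. If the answer isn't in the context, say you "
                       "don't know.\n\n"
                       f"Context:\n{trusted_context}",
        },
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```

The system message does the heavy lifting here: it scopes the model to your pasted context and gives it an explicit way out when the answer isn't there, which is your main defense against hallucination.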

This method works quite well at delivering relevant answers, and can even tell you which document the answer came from. But as always, there's a catch: the "context window."

Large language models can only consider a finite range of tokens when responding to a prompt. Said differently, you can only provide ChatGPT with so many words when you ask it a question. This means that if you have a huge document library of thousands of docs, you can't just take them all, copy/paste, and ask ChatGPT to answer your question. You need to selectively provide only those documents, and only those paragraphs within those documents, that have the highest likelihood of being able to answer your questions.

By selecting the small subset of your trusted data sources that is most likely to contain the best content, you make the most of the context window and maximize the likelihood that ChatGPT will find an answer within your documents.
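If you want to be precise about how much you're packing into the window, you can count tokens before sending anything. Here's a rough sketch using OpenAI's tiktoken library; the token budget and the ranked_chunks input are illustrative assumptions (ranking the chunks is what the next section is all about):

```python
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5/gpt-4 family of models
enc = tiktoken.get_encoding("cl100k_base")

def num_tokens(text: str) -> int:
    return len(enc.encode(text))

def pack_context(ranked_chunks: list[str], budget: int = 3000) -> str:
    """Greedily pack the best-ranked chunks until the token budget is spent,
    leaving headroom in the context window for the question and the answer."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = num_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```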

Vector databases

To make this all work, you'll need a way to take a user's query, find the documents most likely to contain the answer, and send the content of those documents to ChatGPT. The hard part is finding documents that match the user's query. That's where vector databases come in.

A vector database is a database that stores vectors: strings of numbers that represent pieces of text. Each vector has a mathematical structure that captures the semantic meaning of a word or group of words. That means the vectors for two words that are semantically related will be mathematically closer together than the vectors for two words with entirely different meanings. By representing your documents as vectors and storing them in a vector database, you can easily identify sentences, paragraphs, and even entire documents whose "meaning" is similar to the meaning implied by a user's question.
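Here's a small sketch of what "mathematically closer" means in practice, using OpenAI's embeddings endpoint and cosine similarity (the model name and sample phrases are just illustrative):

```python
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def embed(text: str) -> list[float]:
    # text-embedding-ada-002 returns a 1,536-dimensional vector
    return client.embeddings.create(
        model="text-embedding-ada-002", input=text
    ).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = embed("How do I reset my password?")
v2 = embed("Steps to recover a forgotten login credential")
v3 = embed("Quarterly revenue grew 12% year over year")

print(cosine(v1, v2))  # semantically related: expect a noticeably higher score
print(cosine(v1, v3))  # unrelated: expect a lower score
```

If everything is wired up correctly, the first score should come back meaningfully higher than the second, which is exactly the property a vector database exploits at scale.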

Once you have those documents, you can feed their text, along with the user's question, into ChatGPT to get a final answer. Additionally, you can prompt ChatGPT to answer only from the context you've provided; if the answer isn't in the documents you've given it, you can try again with new paragraphs and new documents, or fall back to its built-in knowledge if you need to.
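Putting the retrieval and answering steps together might look something like the sketch below. It's illustrative rather than definitive: it assumes a hypothetical Pinecone index named "trusted-docs" whose vectors were stored with the chunk text in their metadata (the indexing step is shown in the next section), and the NOT_FOUND convention is just one way to detect a failed lookup.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("trusted-docs")  # hypothetical index name

question = "What's our refund policy for annual plans?"

# Embed the question with the same model used to index the documents.
q_vec = openai_client.embeddings.create(
    model="text-embedding-ada-002", input=question
).data[0].embedding

# Pull back the chunks whose vectors sit closest to the question's vector.
results = index.query(vector=q_vec, top_k=3, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in results.matches)

answer = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": "Answer using ONLY the context below. If the answer "
                       "isn't there, reply with exactly NOT_FOUND.\n\n"
                       f"Context:\n{context}",
        },
        {"role": "user", "content": question},
    ],
).choices[0].message.content

if answer.strip() == "NOT_FOUND":
    # Retry with more chunks (a larger top_k), or fall back to the model's
    # built-in knowledge by re-asking without the context restriction.
    pass
```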

Time, cost, and effort

The unfortunate part about this whole process is that it takes time and effort to put together, and it requires you to know how to code. A vector database is just another type of data lookup tool, like a relational database or a data warehouse, and you'll need to interact with it using a software development kit provided by the tool's makers. Right now, one of the most popular tools out there is Pinecone, which provides SDKs for Node.js, Java, and Python, among others. It also has a free tier that should be enough to get you started. You'll need to use your OpenAI API key to turn each trusted document into a set of associated vectors, but that's straightforward to do with OpenAI's library.
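For example, indexing a document's chunks might look roughly like this. It's a sketch that assumes a Pinecone index already created with dimension 1536 (to match the ada-002 embedding model); the index name and ID scheme are made up:

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("trusted-docs")  # an index created with dimension=1536

# Chunks of one trusted document (see the chunking sketch below).
chunks = ["First chunk of the document...", "Second chunk of the document..."]

# ada-002 embeddings are 1,536-dimensional, matching the index above.
embeddings = openai_client.embeddings.create(
    model="text-embedding-ada-002", input=chunks
).data

# Store each chunk's vector along with its text so it can be retrieved later.
index.upsert(vectors=[
    {
        "id": f"doc-42-chunk-{i}",        # hypothetical ID scheme
        "values": e.embedding,
        "metadata": {"text": chunks[i]},
    }
    for i, e in enumerate(embeddings)
])
```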

It's relatively easy to get up and running, and it doesn't take a lot of time, but as with any software project, the complexity, time, and cost grow as you scale. As you get more users, you'll need to set up proper namespaces so that Pinecone can minimize the number of documents it searches through, and you'll need to scale your vector database as you index more and more documents. You'll likely still need a relational database to keep track of which documents have been indexed, along with the chunk IDs associated with each document (since you'll generally want to break your documents apart into chunks of 500-1,000 tokens).
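The chunking itself can be as simple as splitting on token boundaries with a little overlap. Here's a rough sketch using OpenAI's tiktoken tokenizer; the chunk size and overlap values are arbitrary starting points you'd tune for your own documents:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text: str, max_tokens: int = 800, overlap: int = 100) -> list[str]:
    """Split text into ~max_tokens-token chunks, overlapping slightly so a
    sentence cut at a boundary still appears whole in at least one chunk."""
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
        start += max_tokens - overlap
    return chunks
```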

Larger context windows + wrapping up

Future versions of LLMs may well solve this problem intrinsically. We've already seen that GPT-4 can provide a context window of up to 32K tokens. One day we might be able to ask ChatGPT a question and have it provide its sources without us inputting any source documents (for publicly available documents, at least). But until then, it's on us as LLM users to do our best to validate the answers we get from a tool that's known to make things up.
