Case Study: Explorations with my “private” ChatGPT client
[Image: very basic UI for the test client]

Many organizations are taking a “wait-and-see” approach to cloud-based generative AI services such as OpenAI’s ChatGPT, and some are banning these tools outright. Just last week, Apple proclaimed it would completely ban the use of ChatGPT and other similar cloud-based generative AI services. Ostensibly this was a preemptive step to protect “data confidentiality” — but who really knows?

And yes — there are currently no assurances that sensitive or proprietary query data will not leak into the logs or the training sets of future model iterations.

RECAP: So on the one hand you have a toolset that is simply too compelling to ignore. And on the other hand you have companies reflexively imposing bans on these tools.

Where is the optimal path forward here?

Q: What if organizations could run ChatGPT “locally” — just the same as any other proprietary in-house tool? Could it work? If so, HOW would it work?

That question is the basis for what follows.

PROBLEM STATEMENT:

Can “ChatGPT” (or a close analog) be configured to run locally? And if so, would it be robust and performant enough to provide the same rich user experience as its cloud-based variant?

USE CASE:

Evaluate how a local instance of ChatGPT performs as a “Q&A” knowledge-bot.

BACKGROUND:

An issue some large (and some small) enterprises struggle with is the fragmentation of knowledge across the organization. So-called “tribal knowledge” can reside in isolated pockets or in the heads of a few select specialists.

Delivery cadence can be impacted by how efficiently (or inefficiently) individuals can get access to that knowledge in time-critical moments.

As a result, the business may have some process which is generally not well understood — but nonetheless critical to delivery. And so “sherpas” (a.k.a. specialists) arise to provide guidance and to facilitate task completion.

What if a “smart Q&A” agent could be written which effectively performs this function?

METHODOLOGY:

I created a “private” LLM, running locally, that does not exchange data of any sort with any online resources.

The LLM I chose to employ is called “GPT4All”. Other tools used were LangChain, LlamaCpp, Chroma, and SentenceTransformers.
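To make the architecture concrete, here is a minimal sketch of how these pieces can be wired together into a local, retrieval-augmented Q&A chain. The model file name, embedding model, and parameter values are illustrative assumptions rather than the exact configuration used in this experiment; the point is simply that nothing in the chain calls out to an online service.

```python
# Minimal local Q&A sketch (file names and parameters are assumptions, not the exact setup used here).
from langchain.embeddings import HuggingFaceEmbeddings  # SentenceTransformers under the hood
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

# Embeddings are computed locally via sentence-transformers.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Re-open a previously built Chroma index (see the ingestion sketch further below).
db = Chroma(persist_directory="db", embedding_function=embeddings)

# Local GPT4All weights; the .bin path is a placeholder.
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin", verbose=False)

# "Stuff" the top-k retrieved chunks into the prompt and let the local model answer.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)

result = qa({"query": "How are CIs related to services in ServiceNow?"})
print(result["result"])
```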

OBJECTIVE & GOAL:

My goal was to see if I could create a specialized knowledge-base LLM able to provide the following functionality:

  • Users are able to get authoritative answers on procedural questions.
  • Users are able to interact with the system in a natural Q&A conversational format.
  • Users are able to make inquiries, follow-up inquiries, and drill down ever deeper on topics.
  • Users are able to get clarification on any facet of any topic which is vague or unclear.


TRAINING DATA:

I trained my LLM on the following publicly available ServiceNow documents:

  • Knowledge Base Article - Service Portal.html
  • Understanding CI's in ServiceNow - ServiceHub.html
  • servicenow-rome-it-asset-management-enus.pdf
  • servicenow-rome-it-business-management-enus.pdf
  • servicenow-rome-it-operations-management-enus.pdf
  • servicenow-sandiego-it-service-management-enus.pdf

All told, I fed it close to 10,000 pages of documentation. These documents were vectorized into a set of embeddings which were made queryable via a basic text UI.
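For illustration, the ingestion step could look roughly like the sketch below: load the PDFs and HTML files, split them into overlapping chunks, embed each chunk with a SentenceTransformers model, and persist the vectors to a local Chroma store. The directory names, chunk sizes, and embedding model here are assumptions for the sketch, not the precise values used in this test.

```python
# Hypothetical ingestion sketch: directory names, chunk sizes, and model names are assumptions.
import os

from langchain.document_loaders import PyPDFLoader, UnstructuredHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

SOURCE_DIR = "source_documents"  # the ServiceNow PDFs and HTML files listed above
PERSIST_DIR = "db"               # on-disk Chroma index

# Load every PDF and HTML file in the source directory.
docs = []
for name in os.listdir(SOURCE_DIR):
    path = os.path.join(SOURCE_DIR, name)
    if name.lower().endswith(".pdf"):
        docs.extend(PyPDFLoader(path).load())
    elif name.lower().endswith(".html"):
        docs.extend(UnstructuredHTMLLoader(path).load())

# Split thousands of pages into overlapping chunks small enough for the model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed the chunks locally and persist the vector store for later querying.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings, persist_directory=PERSIST_DIR)
db.persist()
print(f"Indexed {len(chunks)} chunks from {len(docs)} document pages.")
```

The “basic text UI” can then be as simple as a loop that reads a question from stdin and passes it to the RetrievalQA chain sketched earlier.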

RESULTS:

First-pass results were definitely underwhelming.

While the model had an impressive ability to make sense of natural language queries and to respond with answers which bore some relevance to the original query, it was only partially able to make sense of the documents it was trained on.

TEST CRITERIA:

  • Can it respond to natural language queries?
  • Is it accurate?
  • Is it able to answer obvious questions from the text?
  • Is it able to make inferences despite typos and grammatical mistakes?
  • Is it consistent? If you repeat a question, how much variability is there between responses? (See the sketch below.)
  • Does it perform as well as ChatGPT or Bard?
  • Does it preserve context from one request to the next?
  • Is it able to abstract to higher levels of conceptual “understanding”?
  • Is it able to “drill-down” on topics?
  • Is it able to generate code?
  • Is it able to create tables?

You can view the test prompts and the corresponding output on this Google Sheet.
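As a simple illustration of the consistency check in particular, the same prompt can be re-asked several times against the RetrievalQA chain sketched earlier and the responses compared. The loop below is a hypothetical harness, not the actual script behind the linked sheet, and it assumes the `qa` chain from the earlier sketch is already constructed.

```python
# Hypothetical consistency harness: re-ask the same question and compare answers.
# Assumes `qa` is the RetrievalQA chain constructed in the earlier sketch.
QUESTION = "What is a Configuration Item (CI) in ServiceNow?"

answers = []
for i in range(3):
    result = qa({"query": QUESTION})
    answers.append(result["result"].strip())
    print(f"--- run {i + 1} ---\n{answers[-1]}\n")

# Crude variability signal: how many distinct answers came back?
print(f"{len(set(answers))} distinct answer(s) across {len(answers)} runs")
```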

NEXT STEPS:

By performing some targeted optimizations, I believe performance can be radically improved. Finding and applying those optimizations is the subject of a future evaluation. But generally speaking, I will likely explore these areas:

  • Create a collection of “canonical” definitions — and give these the highest relevance scores.
  • Pay more attention to signals such as headings and sub-links.
  • Use formatting as a signal.
  • Create a glossary of TLAs (three-letter acronyms) with similarly higher-weighted scores.
  • Experiment with higher weightings for the supplied training docs as opposed to the LLM’s “innate” knowledge.
  • Experiment with tweaked model parameters such as context window size (number of tokens), the vector size of embeddings, and the number of simultaneous attention heads (see the sketch below).
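As one concrete example of that last item, the sketch below shows what tuning the generation-side parameters might look like if the local model were served through LangChain’s LlamaCpp wrapper. The parameter names shown (n_ctx, n_batch, n_threads, temperature) exist on that wrapper, but the specific values are illustrative guesses rather than tuned settings; embedding vector size and attention-head count, by contrast, are properties of the chosen models rather than runtime flags.

```python
# Illustrative parameter tweaks (values are guesses, not tuned settings).
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-model-q4_0.bin",  # placeholder path to local quantized weights
    n_ctx=2048,       # context window size in tokens; a larger window fits more retrieved text
    n_batch=512,      # prompt tokens processed per batch
    n_threads=8,      # CPU threads to devote to inference
    temperature=0.2,  # lower temperature -> more deterministic, repeatable answers
    verbose=False,
)

# A larger k stuffs more retrieved chunks into the (now larger) context window.
# retriever = db.as_retriever(search_kwargs={"k": 8})
```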

It’s all so reminiscent of search, in a way. The difference is that the end result is not a URL but rather a collection of relevant, distilled text.

Performance is also a factor, as every query takes a minimum of 27 seconds to run locally on a multi-core M1 Max.

CONCLUSION:

I see tremendous promise for tools like GPT4All. However, it is going to take considerable work to customize and tune these models to boost predictability and accuracy so that they might become de facto “Subject Matter Experts”. And that will be the subject of future posts.

