How I made ChatGPT think like me

Before I get started...

  • This article is for educational purposes, my education to be precise
  • It does not represent the views of my employer or any other person associated with me
  • I have not used AI to write it, and as proof you may find some grammatical errors here and there
  • I have only used my own personal data
  • ChatGPT helped me code the solution, but I can proudly say that I had to step in to solve some issues.
  • You can find the code for this example here.

My goal

Inspired by this article, I wanted to implement an example where I could 'connect' a model like ChatGPT with my own data.

[Image omitted] Credit: Ben Dickson, https://bdtechtalks.com/2023/05/01/customize-chatgpt-llm-embeddings/amp/

So here are some of my choices:

  • For the language I chose Python
  • For the embedding API and language model I selected OpenAI
  • For the vector database I selected Pinecone
  • And for the "Relevant documents" I selected all the files stored in my personal OneDrive (specifically .docx documents only), which are mostly uni assignments, CV versions, notes, etc.

My choice of "Relevant documents" means that the use case for this solution is to enable ChatGPT to answer questions using any context found in my personal files. In some cases that means answering with facts from my files, and in others it means answering as if it were thinking like me.

Here is how it works

First, I indexed and stored my files' vectors

  • A Python script scanned all the .docx files in my OneDrive, extracted the text and sent it to the OpenAI embeddings API
  • The embeddings API returned a vector (e.g. [0.383883, 0.393934, 0.03943, ...]) for each file. This vector is used later to find the documents most related to my questions
  • The vectors are stored in a Pinecone index, as if it were a database table (a minimal sketch of this pipeline follows the list)
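
Here is a minimal sketch of that indexing pipeline, assuming the pre-1.0 `openai` Python SDK, the 2.x `pinecone-client`, and `python-docx` for extraction; the index name, folder path and truncation limit are illustrative placeholders:

```python
import os
import openai
import pinecone
from docx import Document

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("my-documents")  # hypothetical index name

ONEDRIVE_PATH = "/path/to/OneDrive"  # placeholder for the synced OneDrive folder

def extract_text(path: str) -> str:
    """Concatenate the paragraph text of a .docx file."""
    return "\n".join(p.text for p in Document(path).paragraphs)

for root, _, files in os.walk(ONEDRIVE_PATH):
    for name in files:
        if not name.endswith(".docx"):
            continue
        path = os.path.join(root, name)
        # Crude truncation so large files stay under the embedding model's token limit
        text = extract_text(path)[:8000]
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        vector = resp["data"][0]["embedding"]  # a list of floats
        # Store the vector with the file path as metadata, so the query step
        # can recover which document matched
        index.upsert(vectors=[(path, vector, {"path": path})])
```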

Then I can send context-enriched questions to ChatGPT

  • I first generate the vector for my question, e.g. 'Who is Oscar Lopez Veloz?'
  • Using this vector, I look it up in Pinecone (the vector database) to find the files most related to the question
  • The query returns the candidate files; in this case the candidates were different versions of my CV in Word format
  • I select the most relevant file from the candidates and rephrase my question to ChatGPT like 'Who is Oscar Lopez Veloz? Answer this question considering the context below' (a minimal sketch of this step follows the list)
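
And here is a minimal sketch of that query step, under the same assumptions as the indexing sketch (pre-1.0 `openai` SDK, 2.x `pinecone-client`); the model names and prompt wording are illustrative:

```python
import os
import openai
import pinecone
from docx import Document

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("my-documents")  # same hypothetical index as above

question = "Who is Oscar Lopez Veloz?"

# 1. Embed the question with the same model used for the documents
q_vector = openai.Embedding.create(
    model="text-embedding-ada-002", input=question
)["data"][0]["embedding"]

# 2. Look up the most similar documents in Pinecone
result = index.query(vector=q_vector, top_k=3, include_metadata=True)
best = result.matches[0]  # candidates come back sorted by similarity score

# 3. Rephrase the question with the best-matching document as context
context = "\n".join(p.text for p in Document(best.metadata["path"]).paragraphs)
prompt = f"{question} Answer this question considering the context below.\n\n{context[:12000]}"

# 4. Send the context-enriched question to ChatGPT
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)["choices"][0]["message"]["content"]
print(answer)
```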

The results

The results were very satisfying. Here are some of my favourite examples so far:

Original question: "Who is Oscar Lopez Veloz?"

This one looked at my CV files (FYI, I'm not Guatemalan):

Response with NO context:

Oscar Lopez Veloz is a former Guatemalan professional football player who represented his country at the 1998 FIFA World Cup and was a member of the Guatemala national team from 1991 to 2003. He was a versatile defender who played for a number of clubs in his native country, as well as some in Mexico, Brazil, and the United States. He was a two-time winner of the Guatemalan Player of the Year Award.

-------------------

Response with context:

Answer: Oscar Lopez Veloz is a data analytics professional with an energy industry experience and engineering background. He has a permanent resident working right, and he has worked for various companies and organizations, including the Australian Energy Market Operator, The University of Melbourne, PEMEX, Chrysler, and various small businesses. His skills include cloud and data analytics software, programming languages, data engineering, and more. He holds a Master of Energy Systems from The University of Melbourne and a Bachelor of Engineering (Mech...

Original question: "What is an ethical way to distribute the burdens of climate change?"

This one looked at an essay I wrote about this for Uni:

Response with NO context:

1. Implementing a fair and equitable carbon tax or cap-and-trade system to reduce emissions and incentivize businesses to invest in low-carbon alternatives.

2. Establishing a global fund to support adaptation and mitigation efforts in vulnerable communities.

3. Making sure that the costs of adapting to and mitigating climate change are shared equitably.

4. Developing standards and regulations that ensure corporations and industries are held responsible for their emissions.

-------------------

Response with context:

An ethical way to distribute the burdens of climate change is to ensure that those who are more vulnerable to it, particularly those living in undeveloped communities, are provided with access to energy and compensated via the revenues from granting access to the global atmosphere. This approach is superior to the previously discussed principles in that it establishes that the receipts should not be emissions, rather, some other fair package of goods that will necessarily include access to energy. In addition, this approach will enable many people to improve...

What I learned

  • Embeddings are a rapid way to provide LLMs with context, or to 'connect' them with our own data
  • This does require moving data around, though: to the embedding service, to the vector database, and to the LLM in the form of context
  • I continue to be impressed by what can be achieved with AI, and I look forward to learning more about the topic.

