How I made ChatGPT think like me

Before I get started...

  • This article is for educational purposes, my education to be precise
  • It does not represent the views of my employer or any other person associated with me
  • I have not used AI to write it, and as proof you may find some grammatical errors here and there
  • I have only used my own personal data
  • ChatGPT helped me code the solution, but I can proudly say that I had to step in to solve some issues.
  • You can find the code for this example here.

My goal

Inspired by this article, I wanted to implement an example where I could 'connect' a model like ChatGPT with my own data.

[Image omitted] Credit: Ben Dickson, https://bdtechtalks.com/2023/05/01/customize-chatgpt-llm-embeddings/amp/

So here are some of my choices:

  • For the language I chose Python
  • For the embedding API and language model I selected OpenAI
  • For the vector database I selected Pinecone
  • And for the "Relevant documents" I selected all the files stored in my personal OneDrive (specifically .docx documents only), which are mostly uni assignments, CV versions, notes, etc.

My choice of "Relevant documents" means that the use case for this solution is to enable ChatGPT to answer questions using any context found in my personal files. In some cases that means answering with facts from my files, and in others it means answering as if it were thinking like me.

Here is how it works

First, I indexed and stored my files' vectors

  • A Python script scanned all the .docx files in my OneDrive, extracted the text and sent it to the OpenAI embeddings API
  • The embeddings API returned a vector (e.g. [0.383883, 0.393934, 0.03943, ...]) for each file. This vector is used later to find the documents most related to my questions
  • The vectors are stored in a Pinecone index, as if it were a database table (a minimal sketch of this pipeline follows the list)
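
Here is a minimal sketch of that indexing pipeline, assuming the pre-1.0 `openai` Python SDK, the 2.x `pinecone-client`, and `python-docx` for extraction; the index name, folder path and truncation limit are illustrative placeholders:

```python
import os
import openai
import pinecone
from docx import Document

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("my-documents")  # hypothetical index name

ONEDRIVE_PATH = "/path/to/OneDrive"  # placeholder for the synced OneDrive folder

def extract_text(path: str) -> str:
    """Concatenate the paragraph text of a .docx file."""
    return "\n".join(p.text for p in Document(path).paragraphs)

for root, _, files in os.walk(ONEDRIVE_PATH):
    for name in files:
        if not name.endswith(".docx"):
            continue
        path = os.path.join(root, name)
        # Crude truncation so large files stay under the embedding model's token limit
        text = extract_text(path)[:8000]
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        vector = resp["data"][0]["embedding"]  # a list of floats
        # Store the vector with the file path as metadata, so the query step
        # can recover which document matched
        index.upsert(vectors=[(path, vector, {"path": path})])
```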

Then I can send context-enriched questions to ChatGPT

  • I first generate the vector for my question, e.g. 'Who is Oscar Lopez Veloz?'
  • Using this vector, I look it up in Pinecone (the vector database) to find the files most related to the question
  • The query returns the candidate files; in this case the candidates were different versions of my CV in Word format
  • I select the most relevant file from the candidates and rephrase my question to ChatGPT like 'Who is Oscar Lopez Veloz? Answer this question considering the context below' (a minimal sketch of this step follows the list)
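
And here is a minimal sketch of that query step, under the same assumptions as the indexing sketch (pre-1.0 `openai` SDK, 2.x `pinecone-client`); the model names and prompt wording are illustrative:

```python
import os
import openai
import pinecone
from docx import Document

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-east1-gcp")
index = pinecone.Index("my-documents")  # same hypothetical index as above

question = "Who is Oscar Lopez Veloz?"

# 1. Embed the question with the same model used for the documents
q_vector = openai.Embedding.create(
    model="text-embedding-ada-002", input=question
)["data"][0]["embedding"]

# 2. Look up the most similar documents in Pinecone
result = index.query(vector=q_vector, top_k=3, include_metadata=True)
best = result.matches[0]  # candidates come back sorted by similarity score

# 3. Rephrase the question with the best-matching document as context
context = "\n".join(p.text for p in Document(best.metadata["path"]).paragraphs)
prompt = f"{question} Answer this question considering the context below.\n\n{context[:12000]}"

# 4. Send the context-enriched question to ChatGPT
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)["choices"][0]["message"]["content"]
print(answer)
```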

The results

The results were very satisfying. Here are some of my favourite examples so far:

Original question: "Who is Oscar Lopez Veloz?"

This one looked at my CV files (FYI, I'm not Guatemalan):

Response with NO context:

Oscar Lopez Veloz is a former Guatemalan professional football player who represented his country at the 1998 FIFA World Cup and was a member of the Guatemala national team from 1991 to 2003. He was a versatile defender who played for a number of clubs in his native country, as well as some in Mexico, Brazil, and the United States. He was a two-time winner of the Guatemalan Player of the Year Award.

-------------------

Response with context:

Answer: Oscar Lopez Veloz is a data analytics professional with an energy industry experience and engineering background. He has a permanent resident working right, and he has worked for various companies and organizations, including the Australian Energy Market Operator, The University of Melbourne, PEMEX, Chrysler, and various small businesses. His skills include cloud and data analytics software, programming languages, data engineering, and more. He holds a Master of Energy Systems from The University of Melbourne and a Bachelor of Engineering (Mech...

Original question: "What is an ethical way to distribute the burdens of climate change?"

This one looked at an essay I wrote about this for Uni:

Response with NO context:

1. Implementing a fair and equitable carbon tax or cap-and-trade system to reduce emissions and incentivize businesses to invest in low-carbon alternatives.

2. Establishing a global fund to support adaptation and mitigation efforts in vulnerable communities.

3. Making sure that the costs of adapting to and mitigating climate change are shared equitably.

4. Developing standards and regulations that ensure corporations and industries are held responsible for their emissions.

-------------------

Response with context:

An ethical way to distribute the burdens of climate change is to ensure that those who are more vulnerable to it, particularly those living in undeveloped communities, are provided with access to energy and compensated via the revenues from granting access to the global atmosphere. This approach is superior to the previously discussed principles in that it establishes that the receipts should not be emissions, rather, some other fair package of goods that will necessarily include access to energy. In addition, this approach will enable many people to improve...

What I learned

  • Embeddings are a rapid way to provide LLMs with context, or to 'connect' them with our own data
  • This does require moving data around, though: to the embedding service, to the vector database, and to the LLM in the form of context
  • I continue to be impressed by what can be achieved with AI, and I look forward to learning more about the topic.

