An Oracle DBA's journey to Vector DB, LLM and Generative AI

Discovering the Power of #LLMs in Business Applications and Data Analysis

After graduating from university, I completed my first product-specific certification: Oracle 8 DBA (I still remember that the 600-page book weighed almost 0.75 kg). If, like me, you have 20 years of experience in enterprise tech, the recent developments triggered by #OpenAI's #ChatGPT can feel a bit overwhelming.

Not only is the pace of change unprecedented, but the foundational building blocks are entirely different. Some discussions and marketing hype surrounding the topic seem almost surreal. Intrigued, I decided to investigate further.

Eager to unlearn and relearn the concepts of AI and data analysis, I embarked on a new journey by enrolling in an online course with the developers at LangChain. This 3-week online session, called "Chat with Data", focuses on building new business applications using LLMs. The flagship demo of Chat with Data involves using ChatGPT to analyze Tesla's annual report (a 400-page PDF). The future of LLMs lies in creating personalized applications using one's own data, providing answers that are relevant, timely, and actionable for the user's specific situation.

Not many people realize that LLMs introduce fundamental changes to the way text is stored and searched. An LLM application such as the ChatGPT-powered Chat with Data demo is an information retrieval system (i.e., the software produces output based on user input): it uses pre-trained models to turn both documents and user queries into vectors, then matches them in a vector database. It does not rely on the deterministic searches of a traditional RDBMS. Instead, a vector database takes the place of the RDBMS, and nearest neighbor search takes the place of SQL (or regex, if you are using NoSQL).
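To make the contrast with RDBMS concrete, here is a minimal sketch in Python. It assumes hand-picked 3-dimensional vectors standing in for real embeddings (a real embedding model produces hundreds or thousands of dimensions), and it uses brute-force cosine similarity rather than the approximate indexes real vector databases use. The names `DOCS`, `keyword_search` and `nearest_neighbors` are hypothetical, for illustration only.

```python
import math

# Hypothetical toy "embeddings": hand-picked vectors, NOT output from a real model.
DOCS = {
    "Tesla delivered more cars this year": [0.9, 0.1, 0.0],
    "Vehicle shipments grew strongly": [0.8, 0.2, 0.1],
    "The board approved a new CEO bonus": [0.1, 0.9, 0.2],
}


def keyword_search(docs, keyword):
    """Deterministic search, similar in spirit to SQL: WHERE text LIKE '%keyword%'."""
    return [text for text in docs if keyword.lower() in text.lower()]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(docs, query_vector, k=2):
    """Brute-force nearest neighbor search: score every document, keep the top k."""
    ranked = sorted(
        docs,
        key=lambda text: cosine_similarity(docs[text], query_vector),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Pretend this is the embedding of the question "How many cars were sold?"
    query_vector = [0.85, 0.15, 0.05]
    print(keyword_search(DOCS, "cars sold"))  # exact substring match finds nothing
    print(nearest_neighbors(DOCS, query_vector))  # ranks the delivery/shipment sentences first
```

The keyword search returns nothing because no stored text contains the exact string "cars sold". The nearest neighbor search still finds the two sentences that are closest in meaning, because their vectors point in a similar direction to the query vector. This is the deterministic vs. similarity-based trade-off described above.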

Understanding how data is stored, processed, and "generated" has helped me make sense of the current state of AI, where AI models are headed, their limitations, and, most importantly, how to protect our data (something my team and I are continually working on).

This is a diary of our journey exploring LLMs and generative AI, mostly focusing on text data. To help readers understand the transition from traditional computing to AI, I will try my best to compare and contrast each concept with RDBMS. Each day covers one concept in around 150 words (I use ChatGPT-4 to summarise my writing to 150 words), plus links to explore.

Day 1: LLMs store data using a vector DB. Why and how?

Day 2: What is a vector search?

Day 3: What is text embedding?

Day 4: What are transformers?

Day 5: How to control the margin of error?

Day 6: Does encryption work with LLMs/vector DBs?

Day 7: Can we cache our input?


More articles by 馬Antony 裕杰
