
An Oracle DBA's journey to Vector DB, LLM and Generative AI

馬Antony 裕杰

Cybersecurity SaaS developer. Web Isolation and Augmented Whitelisting. #Insurtech #decentralised_insurance

Published: April 13, 2023


Discovering the Power of #LLMs in Business Applications and Data Analysis

After graduating from university, the first product-specific certification I completed was Oracle 8 DBA (I still remember the 600-page book, which weighed almost 0.75 kg). If, like me, you have 20 years of experience in enterprise tech, the recent developments triggered by #OpenAI and #ChatGPT are a bit overwhelming.

Not only is the pace of change unprecedented, but the foundational building blocks are entirely different. Some discussions and marketing hype surrounding the topic seem almost surreal. Intrigued, I decided to investigate further.

Eager to unlearn and relearn the concepts of AI and data analysis, I embarked on a new journey by enrolling in an online course with the developers at LangChain. This 3-week online session, called "Chat with Data", focuses on building new business applications using LLMs. The flagship demo of Chat with Data involves using ChatGPT to analyze Tesla's annual report (a 400-page PDF). The future of LLMs lies in creating personalized applications using one's own data, providing answers that are relevant, timely, and actionable for the user's specific situation.

Not many people realize that LLMs introduce fundamental changes to the way text is stored and searched. A ChatGPT-based application of this kind is an information retrieval system (i.e., the software produces output based on user input) that uses pre-trained models to match user queries against a vector database. It does not rely on the traditional deterministic searches of an RDBMS. Instead, a vector database takes the place of the RDBMS, and nearest-neighbor search supplants SQL or regex (if you are using NoSQL).
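To make the contrast concrete, here is a minimal sketch in Python, not how ChatGPT or any particular vector database is actually implemented. The three-dimensional vectors are hand-made stand-ins for real embeddings, and the names `DOCUMENTS`, `exact_match`, and `nearest_neighbors` are hypothetical, invented for this illustration. It compares a deterministic substring lookup (roughly what a SQL `LIKE` does) with a nearest-neighbor search ranked by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumption: hand-made toy "embeddings". A real system would get these
# vectors from an embedding model, with hundreds of dimensions or more.
DOCUMENTS = {
    "Tesla revenue grew year over year": [0.9, 0.1, 0.0],
    "Oracle 8 DBA certification guide": [0.0, 0.2, 0.9],
    "Electric vehicle deliveries increased": [0.8, 0.3, 0.1],
}


def exact_match(term):
    """Deterministic lookup, in the spirit of SQL: WHERE text LIKE '%term%'."""
    return [text for text in DOCUMENTS if term.lower() in text.lower()]


def nearest_neighbors(query_vector, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in DOCUMENTS.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Assumption: a made-up embedding for a question like "How did car sales do?"
query_vector = [0.85, 0.2, 0.05]

print(exact_match("car sales"))         # no document contains this exact phrase
print(nearest_neighbors(query_vector))  # the two car-related documents rank highest
```

The exact-match lookup finds nothing because no stored text contains the literal phrase, while the vector search still surfaces the two car-related documents because their vectors point in nearly the same direction as the query. Production vector databases typically use approximate nearest-neighbor indexes rather than the full scan shown here.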

Understanding how data is stored, processed, and "generated" has been beneficial for me in comprehending the current state of AI, where AI models are headed, their limitations, and most importantly, how to protect our data (which my team and I are continually building).

This is a diary of our journey exploring LLMs and generative AI, mostly focused on text data. To help readers understand the transition from traditional computing to AI, I will try my best to compare and contrast with RDBMS. Each day covers one concept in around 150 words (I use ChatGPT with GPT-4 to condense my writing to 150 words), with links to explore.

Day 1: LLMs store data using Vector DB. Why and how?

Day 2: What is a vector search?

Day 3: What is text embedding?

Day 4: What are transformers?

Day 5: How to control the margin of error?

Day 6: Does encryption work with LLMs/vector DB?

Day 7: Can we cache our input?
