LLMs store data using Vector DB. Why and how?

馬Antony 裕杰

Cybersecurity SaaS developer. Web Isolation and Augmented Whitelisting. #Insurtech #decentralised_insurance

Published: April 16, 2023

Traditionally, computing has been deterministic: given the same input, a program produces the same output every time, so its results are consistent, repeatable, and provable. This is because the output strictly follows the program logic (the code written by software developers).

#LLMs leverage similarity search to process information. During training, an LLM learns the relationships among text tokens and captures these patterns in the weights of a very large neural network. The patterns are represented in a high-dimensional vector space, which allows for a more nuanced understanding of text.
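
To make "similarity in a vector space" concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The choice of cosine similarity is an assumption for illustration (the text does not name a specific measure), and the four-dimensional vectors are made up; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for the meaning of three sentences.
cat = [0.9, 0.8, 0.1, 0.2]
kitten = [0.85, 0.75, 0.2, 0.25]
stocks = [0.1, 0.05, 0.9, 0.8]

print(f"cat vs kitten: {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs stocks: {cosine_similarity(cat, stocks):.3f}")
```

The two vectors pointing in a similar direction ("cat" and "kitten") score close to 1.0, while the unrelated pair scores much lower.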

When processing user input, the input sentence is first converted into a vector. The system then looks for the nearest vectors in this multi-dimensional space, using the shortest distance as the measure of similarity. No developer writes separate logic for each kind of input: every sentence goes through the same vector search process.
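
The "nearest vectors by shortest distance" idea can be sketched as a brute-force nearest-neighbour search. `TinyVectorStore` and its methods are hypothetical names used only for illustration, the 3-dimensional vectors are invented, and this is not how OpenAI or any particular vector database is implemented.

```python
import heapq
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and scans all of them per query."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def nearest(self, query: list[float], k: int = 1) -> list[tuple[str, float]]:
        # Compute the distance from the query to every stored vector,
        # then keep the k pairs with the shortest distance (most similar first).
        scored = ((text, euclidean_distance(query, vec)) for text, vec in self._items)
        return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


store = TinyVectorStore()
# Made-up 3-dimensional vectors standing in for real sentence embeddings.
store.add("The cat sat on the mat", [0.9, 0.1, 0.0])
store.add("A kitten is sleeping", [0.8, 0.2, 0.1])
store.add("Stock prices fell today", [0.0, 0.1, 0.9])

# A made-up query vector that lies close to the two cat-related sentences.
for text, dist in store.nearest([0.88, 0.12, 0.02], k=2):
    print(f"{dist:.3f}  {text}")
```

This linear scan compares the query with every stored vector, so each query costs O(n). Production vector databases usually rely on approximate nearest-neighbour indexes (for example HNSW) to avoid scanning everything, trading a little accuracy for speed.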

Below I show how a sentence is transformed into a vector. The model in this example (all-MiniLM-L6-v2) produces 384-dimensional vectors. OpenAI offers its own API endpoint that does similar processing (text-embedding-ada-002) and produces 1536-dimensional vectors.

>>> from sentence_transformers import SentenceTransformer
>>> sentences = ["We are at the dawn of a new era...", "Each sentence is converted"]
>>> model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
>>> embeddings = model.encode(sentences)  # NumPy array, one row per sentence
>>> print(embeddings.shape)
(2, 384)
>>> print(embeddings)
[[-9.31226462e-03 -5.73424622e-03  2.18255073e-02 -8.62214249e-03
  -1.94084086e-02  2.32371558e-02 -3.61166969e-02 -5.94260842e-02
   8.96405503e-02  8.60120635e-03 ... ]
 [ ... ]]
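
Vector dimension also matters for storage. all-MiniLM-L6-v2 produces 384-dimensional vectors and text-embedding-ada-002 produces 1536-dimensional ones. As a rough back-of-the-envelope sketch, assuming each dimension is stored as a 32-bit float and ignoring index and metadata overhead (the helper `vector_storage_bytes` is a hypothetical name for illustration), the raw storage cost can be computed as:

```python
# Assumption: each dimension is stored as a 32-bit float (4 bytes),
# with no index, metadata, or compression overhead.
BYTES_PER_FLOAT32 = 4


def vector_storage_bytes(dimensions: int, num_vectors: int = 1) -> int:
    """Raw bytes needed to store num_vectors embeddings of the given dimension."""
    return dimensions * BYTES_PER_FLOAT32 * num_vectors


for name, dims in [("all-MiniLM-L6-v2", 384), ("text-embedding-ada-002", 1536)]:
    per_vector = vector_storage_bytes(dims)
    per_million = vector_storage_bytes(dims, 1_000_000)
    print(f"{name}: {per_vector} bytes per vector, {per_million / 1e9:.2f} GB per million vectors")
```

Under these assumptions, a 1536-dimensional vector takes four times the space of a 384-dimensional one, which is one reason smaller embedding models can be attractive for large collections.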

Closing Thoughts:

For me, the moment of awakening comes as we transition into non-deterministic computing. When AI tools can no longer provide the same response every time, are we prepared to accept the implications, and what kind of risk controls will be applicable?

Don't miss out on future insights and discussions – subscribe to Oracle DBA's AI, LLM journey to stay up-to-date on the latest trends.

References:

OpenAI embedding documentation: https://openai.com/blog/new-and-improved-embedding-model

Pervasive Technology Institute at Indiana University, "Introduction of document similarity" (video): https://www.youtube.com/watch?v=MvG4dPplrRo
