
SingleStore Vector DB and Embeddings

Luca Lattarini

TOGAF CERTIFIED | 3X SFDC 3X Adobe Certified 4X Google Certified | Analytics | Professional Google Architect Certified | Google ML Engineer | Google Data Engineering - Permanent Resident SG

Published: November 17, 2023

Recently, I started exploring vector databases for my Gen AI applications and came across SingleStore DB. SingleStore DB is a high-performance, distributed, in-memory database management system (DBMS) built to handle both operational and analytical workloads. What sets it apart is that it combines the capabilities of a transactional database (OLTP) and an analytical database (OLAP) in a single, integrated platform.

Known for its speed, scalability, and flexibility, SingleStore DB supports a wide range of use cases. It is especially well suited to organizations that need one database able to serve both transactional and analytical workloads with low latency. This makes it a strong choice for modern data-driven applications.

In this article, I'll give a brief overview of how SingleStore works with embeddings. First, let's clarify what embeddings are: they represent data as arrays of numbers, called vectors, where each number is one dimension. An embedding model converts structured or unstructured data, such as text, audio, and images, into these vectors, which can then be stored in a vector database. Items with similar meaning are mapped to vectors that point in similar directions.

In the accompanying illustration, I've simplified the vectors to two dimensions for clarity. In practice, embeddings usually have hundreds or even thousands of dimensions.


Once the vectors are stored, we can use them for similarity search. We embed the query (typically a question or other input) with the same model and compare that query vector with the stored vectors. The stored vectors with the highest cosine similarity (equivalently, the smallest cosine distance) point to the most relevant information.
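To make the idea concrete, here is a minimal Python sketch of cosine similarity and cosine distance that ranks a few stored vectors against a query. The 2D vectors and their labels are made up for illustration, in the spirit of the simplified 2D picture. They are not real embeddings, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    return 1.0 - cosine_similarity(a, b)


def rank(query: list[float], stored: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (label, similarity) pairs, most similar first."""
    scores = [(label, cosine_similarity(query, vec)) for label, vec in stored.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2D vectors (hypothetical values, not real embeddings).
stored = {
    "sports": [0.9, 0.1],
    "travel": [0.2, 0.8],
}
query = [0.85, 0.2]
print(rank(query, stored))
```

Ranking by highest similarity and ranking by lowest distance give the same order. Which one you sort on is only a matter of convention.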

If you want to learn more about vector databases and how vector similarity search works, you can find more details at this link. Now let's get hands-on with embeddings, using SingleStore as the vector database, a platform I've found particularly interesting.

For this example, I kept things simple. I created a table in SQL with two columns: one for the sentence text and one for the vector, stored as a BLOB that holds the array of numbers.
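Here is a minimal Python sketch of such a table and of what a packed vector BLOB contains. The table and column names (`sentence_embeddings`, `sentence`, `embedding`) are hypothetical. The packing assumes 32-bit floats laid out little-endian, which is our understanding of the default encoding produced by SingleStore's `JSON_ARRAY_PACK()`. Treat it as an illustration, not a specification.

```python
from __future__ import annotations

import struct

# Hypothetical DDL for the two-column table described above
# (names are assumptions, not the author's exact schema).
CREATE_TABLE_SQL = """
CREATE TABLE sentence_embeddings (
    sentence  TEXT,
    embedding BLOB
);
"""

FLOAT32_SIZE = 4  # bytes per element in a float32-packed vector


def pack_f32(values: list[float]) -> bytes:
    """Pack floats back to back as little-endian IEEE 754 float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_f32(blob: bytes) -> list[float]:
    """Inverse of pack_f32; the blob length must be a multiple of 4 bytes."""
    if len(blob) % FLOAT32_SIZE != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    count = len(blob) // FLOAT32_SIZE
    return list(struct.unpack(f"<{count}f", blob))


blob = pack_f32([0.5, -1.25, 2.0])
print(len(blob), unpack_f32(blob))
```

A 3-element vector occupies 12 bytes. A 1,536-dimensional embedding would occupy 6,144 bytes in the same encoding.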

To convert each sentence into a vector, I called an OpenAI embedding model through its API and inserted the result into SingleStore, as shown below.
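The insert step can be sketched in Python as a parameterized `INSERT` that lets `JSON_ARRAY_PACK()` convert a JSON array string into a packed BLOB. The table and column names are hypothetical, and the article does not name the model. `text-embedding-ada-002` appears in the comment only as an assumption, since it was OpenAI's standard embedding model at the time. The OpenAI call is left as a comment so the block runs without an API key.

```python
from __future__ import annotations

import json

# Hypothetical table/column names; JSON_ARRAY_PACK converts the JSON
# array string into a packed binary vector inside SingleStore.
INSERT_SQL = (
    "INSERT INTO sentence_embeddings (sentence, embedding) "
    "VALUES (%s, JSON_ARRAY_PACK(%s))"
)


def build_insert(sentence: str, embedding: list[float]) -> tuple[str, tuple[str, str]]:
    """Return the SQL text and its parameters for one row."""
    return INSERT_SQL, (sentence, json.dumps(embedding))


# In practice the embedding would come from OpenAI, e.g. with the openai>=1.0 SDK
# (model name is an assumption):
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   resp = client.embeddings.create(model="text-embedding-ada-002", input=sentence)
#   embedding = resp.data[0].embedding
# A tiny made-up vector stands in for it here.
sql, params = build_insert("I love dragonboat", [0.1, -0.2, 0.3])
print(sql)
print(params)
```

With a DB-API driver that uses `%s` placeholders, such as PyMySQL, you would run it as `cursor.execute(sql, params)`. Passing the vector as a parameter avoids building SQL strings by hand.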

To check what we inserted, we can query the table. SingleStore displays the following row:

Now let's insert three more rows and view them as text and vectors:

Embedding of “I love dragonboat”
Embedding of “I love outrigger”
Embedding of “Italy, a European country with a long Mediterranean coastline, has left a powerful mark on Western culture and cuisine. Its capital, Rome, is home to the Vatican as well as landmark art and ancient ruins. Other major cities include Florence, with Renaissance masterpieces such as Michelangelo’s "David" and Brunelleschi's Duomo; Venice, the city of canals; and Milan, Italy’s fashion capital.”

With four rows and their embeddings inserted, we now search for the word “Italy”. Below is its embedding, which we will use as the query vector.


Now let's run the search with the following SQL query. It uses SingleStore's DOT_PRODUCT() function to score each stored vector against the query vector. When both input vectors are normalized to length 1, the dot product equals their cosine similarity.
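Here is a minimal Python sketch of the query and of why the dot product works as cosine similarity on unit-length vectors. OpenAI's documentation states that its embeddings are normalized to length 1. For other models, you would normalize the vectors first, as `normalize()` does here. The table and column names in `SEARCH_SQL` are hypothetical, and the `LIMIT 3` is an arbitrary choice for illustration.

```python
from __future__ import annotations

import json
import math

# Hypothetical search query (table/column names are assumptions).
# With unit-length vectors, a higher DOT_PRODUCT score means a more similar row.
SEARCH_SQL = (
    "SELECT sentence, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score "
    "FROM sentence_embeddings "
    "ORDER BY score DESC "
    "LIMIT 3"
)


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so that dot(v, v) == 1 (up to rounding)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed directly, for comparison."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# The dot product of the normalized vectors matches the cosine of the originals.
print(dot(normalize(a), normalize(b)), cosine(a, b))

# Parameter for SEARCH_SQL: the (normalized) query vector as a JSON array string.
query_param = json.dumps(normalize([1.0, 1.0]))
```

With a `%s`-placeholder driver, the search would be `cursor.execute(SEARCH_SQL, (query_param,))`. Skipping the division by the two norms is what makes a dot-product ranking cheaper than computing full cosine similarity.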

A vector can be of any length, but the input blob length must be divisible by the size of a packed vector element (1, 2, 4, or 8 bytes, depending on the element type).

If the result of DOT_PRODUCT() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.
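The two rules above (blob length divisible by the element size, NULL for non-finite results) can be mirrored in a small Python sketch that computes a dot product directly on packed blobs. This is a hypothetical emulation, not SingleStore's implementation. It models only the 4-byte (float32) and 8-byte (float64) float encodings and assumes little-endian layout. It also raises an error when the two vectors differ in length, which is a choice made for this sketch.

```python
from __future__ import annotations

import math
import struct

# struct codes for the float element sizes modeled in this sketch.
FORMATS = {4: "f", 8: "d"}


def dot_product_blobs(a: bytes, b: bytes, element_size: int = 4) -> float | None:
    """Emulate the documented DOT_PRODUCT() rules on two packed vectors.

    - Each blob length must be divisible by the element size.
    - Non-finite results (inf, -inf, NaN) come back as None, like SQL NULL.
    """
    code = FORMATS[element_size]
    for blob in (a, b):
        if len(blob) % element_size != 0:
            raise ValueError("blob length must be divisible by the element size")
    xs = struct.unpack(f"<{len(a) // element_size}{code}", a)
    ys = struct.unpack(f"<{len(b) // element_size}{code}", b)
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same number of elements")
    result = sum(x * y for x, y in zip(xs, ys))
    return result if math.isfinite(result) else None


v1 = struct.pack("<2f", 1.0, 2.0)
v2 = struct.pack("<2f", 3.0, 4.0)
print(dot_product_blobs(v1, v2))
```

Mapping non-finite values to `None` keeps them out of an `ORDER BY score DESC` ranking in the same way NULL scores would be set apart in SQL.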

The picture below shows the resulting scores. The rows with scores closest to 1 are the most similar to the query.

Now, if I search for “OC V6”, a term related to outrigger canoes, the closest matches are shown below.

In this article, we looked at how a search use case works with SingleStore and embeddings, summarized below.
