SingleStore Vector DB and Embeddings


Recently, I embarked on an exploration of vector databases for my Gen AI applications and stumbled upon SingleStore DB. SingleStore DB is a high-performance, distributed, in-memory database management system (DBMS) purpose-built for handling both operational and analytical workloads. What sets it apart is its ability to combine the functionalities of a transactional database (OLTP) and an analytical database (OLAP) within a single, integrated platform.

Known for its remarkable speed, scalability, and adaptability, SingleStore DB caters to a diverse array of use cases. It's especially well-suited for organizations in need of a high-performance database capable of managing both transactional and analytical workloads with minimal latency. Thanks to its versatility and scalability, it has become a valuable choice for modern data-driven applications and use cases.

In this article, I'll provide a brief overview of how SingleStore works with embeddings. To begin, let's clarify what we mean by embeddings: they represent data in the form of multi-dimensional arrays of numbers, often referred to as vectors. These vectors are used to convert unstructured or structured data types, such as text, audio, images, and more, into a format suitable for storage in a vector database.

In the accompanying illustration, I've simplified the representation of vectors to 2D for clarity, but it's important to note that in practice, these dimensions can extend into the hundreds or more.

Embeddings


Once we've stored these vectors, we can leverage them to perform similarity searches. This involves comparing a query vector, typically representing a question or input, with the stored vectors. By calculating the cosine distance or similarity between vectors, we can identify the most relevant information.
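To make the comparison concrete, here is a minimal sketch in plain Python of ranking stored vectors by cosine similarity against a query vector. The sentence names and the 2D numbers are made up for illustration; real embeddings have hundreds of dimensions, and in practice the database does this work for us.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2D "embeddings", chosen only to keep the example readable.
stored = {
    "sentence A": [0.9, 0.1],
    "sentence B": [0.1, 0.9],
    "sentence C": [0.7, 0.7],
}
query = [1.0, 0.2]

# Sort the stored sentences from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)  # ['sentence A', 'sentence C', 'sentence B']
```

The vector that points in nearly the same direction as the query comes first, regardless of how long the vectors are.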

If you're keen to delve deeper into the world of vector databases and how they facilitate vector similarity searches, you can explore more details by visiting this link. Now, let's embark on a hands-on journey with embeddings and SingleStore as our chosen vector database, a platform I've found particularly intriguing.

In this example, I've taken a straightforward approach. I've created a simple table using SQL, featuring two columns: one for the sentence text and another for the vector data, stored as blobs to accommodate arrays of numbers.
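To give an idea of what "an array of numbers stored as a blob" looks like, here is a small Python sketch that packs a list of floats into bytes and back. It assumes 32-bit little-endian floats, which as far as I understand is the layout SingleStore's JSON_ARRAY_PACK function produces. The helper names and the DDL in the comment are my own illustration, not the exact statement from the screenshot.

```python
import struct

# Assumed shape of the table described above (hypothetical DDL):
#   CREATE TABLE myvectortable (text TEXT, vector BLOB);

def pack_vector(values):
    """Pack floats into a little-endian float32 blob (4 bytes per element)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_vector(blob):
    """Inverse of pack_vector: turn the blob back into a list of floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

blob = pack_vector([0.5, -1.0, 2.0])
print(len(blob))            # 12
print(unpack_vector(blob))  # [0.5, -1.0, 2.0]
```

Three numbers become 12 bytes, so an embedding with 1,000 dimensions would take 4,000 bytes in the blob column under this assumption.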

Table

To convert my sentence into a vector, I used an OpenAI embedding model through its API and inserted the result into SingleStore, as shown below.
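As a rough sketch of the insert step, the Python helper below builds a parameterized INSERT statement for an embedding that has already been fetched. The helper name, the placeholder sentence, and the three-number embedding are hypothetical; the `%s` placeholders assume a MySQL-style Python driver, and JSON_ARRAY_PACK is the SingleStore function that turns a JSON array into a packed blob. The call to OpenAI itself is left out here.

```python
import json

def build_insert(sentence, embedding):
    """Return a parameterized INSERT statement and its parameters."""
    sql = (
        "INSERT INTO myvectortable (text, vector) "
        "VALUES (%s, JSON_ARRAY_PACK(%s))"
    )
    # The embedding is sent as a JSON array string and packed by the database.
    params = (sentence, json.dumps(embedding))
    return sql, params

# Placeholder values; a real embedding would come from the OpenAI API response.
sql, params = build_insert("example sentence", [0.1, 0.2, 0.3])
print(sql)
print(params)  # ('example sentence', '[0.1, 0.2, 0.3]')
```

Passing the sentence and the vector as parameters, instead of pasting them into the SQL string, avoids quoting problems with long texts.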

Embedding insert into myvectortable


If we want to check what we inserted in the table, we can see the following row displayed in SingleStore.

Inserted Row


Now let's insert three more embedding rows and view them as text and vector:

  1. Embedding of I love dragonboat
  2. Embedding of I love outrigger
  3. Embedding of Italy, a European country with a long Mediterranean coastline, has left a powerful mark on Western culture and cuisine. Its capital, Rome, is home to the Vatican as well as landmark art and ancient ruins. Other major cities include Florence, with Renaissance masterpieces such as Michelangelo’s "David" and Brunelleschi's Duomo; Venice, the city of canals; and Milan, Italy’s fashion capital.


Embeddings stored rows

After inserting four rows with their embeddings, we want to search for the word "Italy". Its vector, shown below, is the one we will use in our search.

Embedding of Italy

Now let's execute the search with the following SQL code. The search uses the DOT_PRODUCT function, which returns the cosine similarity of the two input vectors when both are normalized to length 1.
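The reason normalization matters can be checked with a few lines of Python: once both vectors are scaled to length 1, their plain dot product gives the same number as the full cosine formula. The vectors here are made up, and the query string at the end is only my guess at the shape of such a search, not the exact SQL from the screenshot.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector so that its length is 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]

# Full cosine formula on the raw vectors.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# Plain dot product on the normalized vectors.
unit_dot = dot(normalize(a), normalize(b))
print(cosine, unit_dot)  # both are approximately 0.96

# Hypothetical search query, assuming the table and columns described earlier.
SEARCH_SQL = (
    "SELECT text, DOT_PRODUCT(vector, JSON_ARRAY_PACK(%s)) AS score "
    "FROM myvectortable ORDER BY score DESC"
)
```

If the stored vectors were not normalized, longer vectors would get larger scores even when they point in a less similar direction.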

A vector can be of any length, but the input blob length must be divisible by the packed vector element size (1, 2, 4 or 8 bytes, depending on the vector element).

If the result of DOT_PRODUCT() is infinity, negative infinity, or not a number (NaN), NULL will be returned instead.
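The two rules above can be imitated in a small pure-Python function. This is only an illustration of the described behavior, not how SingleStore implements it: it assumes 4-byte float elements, uses `None` in place of SQL NULL, and adds an equal-length check of my own. Python also computes in double precision, so results near the float32 limits may differ from the database.

```python
import math
import struct

ELEMENT_SIZE = 4  # bytes per float32 element (assumed element type)

def dot_product_blobs(blob_a, blob_b):
    """Dot product of two packed float32 blobs; None stands in for SQL NULL."""
    for blob in (blob_a, blob_b):
        if len(blob) % ELEMENT_SIZE != 0:
            raise ValueError("blob length must be divisible by the element size")
    if len(blob_a) != len(blob_b):
        # Assumption of this sketch: both vectors need the same element count.
        raise ValueError("vectors must have the same number of elements")
    count = len(blob_a) // ELEMENT_SIZE
    a = struct.unpack(f"<{count}f", blob_a)
    b = struct.unpack(f"<{count}f", blob_b)
    result = sum(x * y for x, y in zip(a, b))
    # Infinity, negative infinity, and NaN are reported as None.
    return result if math.isfinite(result) else None

def pack(values):
    """Helper: pack floats as little-endian float32."""
    return struct.pack(f"<{len(values)}f", *values)

print(dot_product_blobs(pack([1.0, 2.0]), pack([3.0, 4.0])))  # 11.0
print(dot_product_blobs(pack([float("inf")]), pack([1.0])))   # None
```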

The picture below shows the resulting scores: the rows with a score closest to 1 are the most similar to the search term.

Now, if I search for "OC V6", a term related to the outrigger canoe, the closest match is shown below.

In this article, we looked at how a search use case works with SingleStore and embeddings, which the picture below sums up.


SingleStore and Embedding








