Building a RAG System with Embeddings, Delphi and Firebird database
Artificial intelligence (AI) has been around in the world of computing for a long time, but it has only recently exploded, mainly thanks to large language models (LLMs), which have produced the magic of letting us talk to computers the same way we talk to people, and of having these machines that work with 0s and 1s answer us in a way we understand.
Development is now so fast that it is difficult to keep up with the pace of new models and techniques for obtaining better and more precise answers.
In this article I present my experiments, carried out to better understand how these systems work and to be able to build programs that take advantage of their possibilities. I will describe the process of creating a Retrieval-Augmented Generation (RAG) system using embeddings generated by OpenAI, with Delphi as the programming tool and Firebird to store and manage the data.
1. Understanding the problem
Large models are trained on huge amounts of data, which allows them to give us an answer to almost anything we ask. Because of the way they construct that answer, however, it may not always be as precise as we are looking for. To improve the quality of the answers, and since the model usually knows nothing about our own business data, we can add context to our prompt that helps it return the correct answer. This technique is known as Retrieval-Augmented Generation (RAG), and I will describe its steps in this article.
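The whole flow can be sketched in a few lines using the functions that are developed step by step in the rest of this article. Note that GetTextByID is a hypothetical helper, not shown here, that would read the stored text for a given ID_EMBED:

```pascal
// High-level sketch of the RAG pipeline described in this article.
// GetEmbedding, FindMostSimilarEmbedding and GetOpenAIResponse are
// implemented in the following sections; GetTextByID is a hypothetical
// helper that reads the TEXT column for the given ID_EMBED.
function AnswerWithRAG(Connection: TFDConnection; const UserQuery: string): string;
var
  QueryEmbedding: TArray<Double>;
  ContextID: Integer;
begin
  // 1. Convert the user question into an embedding vector
  QueryEmbedding := GetEmbedding(UserQuery);
  // 2. Find the stored text whose embedding is closest to the question
  ContextID := FindMostSimilarEmbedding(Connection, QueryEmbedding);
  // 3. Build an augmented prompt and ask the chat model
  Result := GetOpenAIResponse('Context: ' + GetTextByID(Connection, ContextID) +
    sLineBreak + 'Question: ' + UserQuery);
end;
```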
2. Embeddings
AI is based on neural networks made up of many layers, each layer with its nodes, where a series of parameters is adjusted through mathematical operations that transform the inputs into the desired output. An embedding is a numerical representation of data (text, images, audio...) in a high-dimensional vector space where similar items are positioned close to each other. Think of it as converting information into a list of numbers that captures the meaning or features of that data.
Embeddings allow us to convert unstructured data into a structured numerical format, enable machines to understand semantic relationships, and make it possible to apply mathematical operations to them.
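As a toy illustration (the three-dimensional vectors below are invented; real embeddings have hundreds or thousands of dimensions), two texts with similar meaning end up with vectors that point in almost the same direction:

```pascal
var
  VDog, VPuppy, VCar: TArray<Double>;
begin
  // Made-up 3-dimensional "embeddings" just to show the idea
  VDog   := [0.9, 0.8, 0.1];
  VPuppy := [0.8, 0.9, 0.2];  // close in meaning to "dog"
  VCar   := [0.1, 0.2, 0.9];  // unrelated meaning
  // Using the CosineSimilarity function shown later in this article,
  // CosineSimilarity(VDog, VPuppy) is close to 1,
  // while CosineSimilarity(VDog, VCar) is much lower.
end;
```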
Stores of embeddings are called vector databases: specialized databases optimized for storing and querying high-dimensional vectors, with vector indexing and fast similarity search (nearest-neighbor search).
In this case I use the Firebird database. It is not a vector database, but it has the functionality needed to test and better understand these systems. Firebird is a relational database that I have been using for many years; it is an open source project derived from InterBase, requires little maintenance, takes up little disk space, and is actively maintained and improved.
3. Get embeddings from our documents
To obtain the embeddings of our documents we will query the OpenAI text-embedding-ada-002 model, which returns vectors with a fixed length of 1536 double values.
To use the OpenAI API, we need an account and an API key that we must send with every request.
I query OpenAI from the Delphi IDE, which uses Object Pascal as its language. It is an environment that has been in the industry for many years and is continually updated with improvements to the language and the available tools. It compiles programs for Windows, iOS, macOS, Android and Linux, has an excellent set of components ready to use in your applications, and offers excellent database support.
I use a Delphi VCL application and the FireDAC components for database operations. The project's uses clause requires these units:
uses
Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, System.Generics.Collections, System.Generics.Defaults, System.Math, Vcl.Graphics,
Vcl.Controls, Vcl.Forms, Vcl.Dialogs, FireDAC.Stan.Intf, FireDAC.Stan.Option,
FireDAC.Stan.Error, FireDAC.UI.Intf, FireDAC.Phys.Intf, FireDAC.Stan.Def,
FireDAC.Stan.Pool, FireDAC.Stan.Async, FireDAC.Phys, FireDAC.VCLUI.Wait,
FireDAC.Phys.FBDef, FireDAC.Phys.IBBase, FireDAC.Phys.FB, Data.DB,
FireDAC.Comp.Client, FireDAC.Stan.Param, System.JSON,
System.Net.URLClient, System.Net.HttpClient, System.Net.HttpClientComponent,
Vcl.StdCtrls, FireDAC.VCLUI.Async, FireDAC.Comp.UI, FireDAC.DApt,
Vcl.Samples.Spin;
To obtain the array of values that form the embedding of our text, we make a POST request like the following to the OpenAI API endpoint https://api.openai.com/v1/embeddings:
function GetEmbedding(const Text: string): TArray<Double>;
var
HttpClient: TNetHTTPClient;
RequestJSON: TJSONObject;
RequestBody: TStringStream;
Response: IHTTPResponse;
JSONObj: TJSONObject;
EmbeddingData: TJSONArray;
I: Integer;
begin
HttpClient := TNetHTTPClient.Create(nil);
try
HttpClient.ContentType := 'application/json';
HttpClient.CustomHeaders['Authorization'] := 'Bearer ' + ApiKey;
// Build the request body; the temporary JSON object is freed to avoid a memory leak
RequestJSON := TJSONObject.Create
.AddPair('model', 'text-embedding-ada-002')
.AddPair('input', Text);
try
RequestBody := TStringStream.Create(RequestJSON.ToString, TEncoding.UTF8);
finally
RequestJSON.Free;
end;
try
// Make the POST request
Response := HttpClient.Post('https://api.openai.com/v1/embeddings', RequestBody);
if Response.StatusCode = 200 then
begin
JSONObj := TJSONObject.ParseJSONValue(Response.ContentAsString) as TJSONObject;
try
EmbeddingData := JSONObj.GetValue<TJSONArray>('data').Items[0].GetValue<TJSONArray>('embedding');
SetLength(Result, EmbeddingData.Count);
for I := 0 to EmbeddingData.Count - 1 do
Result[I] := EmbeddingData.Items[I].AsType<Double>;
finally
JSONObj.Free;
end;
end else
begin
raise Exception.CreateFmt('Error getting the embedding. Status code: %d - %s', [Response.StatusCode, Response.StatusText]);
end;
finally
RequestBody.Free;
end;
finally
HttpClient.Free;
end;
end;
4. Save the embeddings in the Firebird database
The table I use to store the embeddings in Firebird has the following structure. I store the vector in an array-type field (I had read about its existence, but had never used it), to which I assign a size of 1536, which is the size of the vector returned by the model that generates the embeddings.
CREATE TABLE EMBEDDINGS (
ID_EMBED INTEGER GENERATED BY DEFAULT AS IDENTITY,
TEXT BLOB SUB_TYPE TEXT SEGMENT SIZE 400,
VECTOREM DOUBLE PRECISION [1:1536]
);
ALTER TABLE EMBEDDINGS ADD PRIMARY KEY (ID_EMBED);
In order to save the array field in the embeddings table we use this code:
const
Vector_Size = 1536;
procedure StoreEmbeddingInFirebirdArray(Connection: TFDConnection; const Text: string; const Embedding: TArray<Double>);
var
Query: TFDQuery;
I: Integer;
begin
if Length(Embedding) <> Vector_Size then
raise Exception.Create('Vector size is not correct. Must be 1536.');
Query := TFDQuery.Create(nil);
try
Query.Connection := Connection;
Query.SQL.Text := 'INSERT INTO EMBEDDINGS (TEXT, VECTOREM) VALUES (:text, :embedding)';
Query.Params.ParamByName('text').AsString := Text;
// TFDParam.DataTypeName must be set to
// <Table name>.<Field name>
Query.Params[1].DataTypeName := 'EMBEDDINGS.VECTOREM';
// TFDParam.ArrayType must be set to atTable,
// not to atArray
Query.Params[1].ArrayType := atTable;
// TFDParam.ArraySize must be set to array size
Query.Params[1].ArraySize := Vector_Size;
// Fill the array with the values
for I := Low(Embedding) to High(Embedding) do
begin
Query.Params[1].AsFloats[I]:= Embedding[I];
end;
Query.ExecSQL;
finally
Query.Free;
end;
end;
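Putting the two previous pieces together, indexing a document is just a matter of requesting its embedding and storing it. This is a sketch, using the GetEmbedding function from section 3:

```pascal
// Index one document: get its embedding from OpenAI and persist it
procedure IndexDocument(Connection: TFDConnection; const DocText: string);
var
  Embedding: TArray<Double>;
begin
  // 1536-value vector returned by text-embedding-ada-002
  Embedding := GetEmbedding(DocText);
  // Save text + vector in the EMBEDDINGS table
  StoreEmbeddingInFirebirdArray(Connection, DocText, Embedding);
end;
```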
5. Query the embedding closest to another
Objective: to find the most relevant existing context by comparing the stored embeddings with the embedding of the user's query.
The first step is to find the most similar context available in our database. To do this, the embedding of the user's query is generated using OpenAI's API and then compared against each stored embedding; the one with the highest cosine similarity identifies the text with the closest matching context. Cosine similarity measures the angle between two vectors, which makes it a good choice for text similarity and document comparison; other metrics exist as well, such as Euclidean distance or Manhattan distance.
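Formally, the cosine similarity of two vectors A and B is their normalized dot product, which is exactly what the Delphi function below computes:

```latex
\cos(\theta)
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

The result ranges from -1 to 1; the closer it is to 1, the more similar the two texts are.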
// Mathematical operation to calculate the similarity between two vectors
function CosineSimilarity(const A, B: TArray<Double>): Double;
var
DotProduct, MagnitudeA, MagnitudeB: Double;
I: Integer;
begin
DotProduct := 0;
MagnitudeA := 0;
MagnitudeB := 0;
for I := Low(A) to High(A) do
begin
DotProduct := DotProduct + A[I] * B[I];
MagnitudeA := MagnitudeA + A[I] * A[I];
MagnitudeB := MagnitudeB + B[I] * B[I];
end;
if (MagnitudeA = 0) or (MagnitudeB = 0) then
Result := 0
else
Result := DotProduct / (Sqrt(MagnitudeA) * Sqrt(MagnitudeB));
end;
In Delphi, we query the Firebird database to retrieve all the stored embeddings. The cosine similarity is then calculated programmatically, iterating through the embeddings and finding the one with the highest similarity score.
First we retrieve the embedding of the user prompt from the OpenAI API; then we find the most similar embedding in the table:
// Function to retrieve all the embeddings from the database
function GetAllEmbeddingsFromDatabase(Connection: TFDConnection): TArray<TArray<Double>>;
var
Query: TFDQuery;
Embeddings: TArray<TArray<Double>>;
EmbeddingArray: TArray<Double>;
I, Count: Integer;
F: TArrayField;
begin
Query := TFDQuery.Create(nil);
try
Query.Connection := Connection;
// ORDER BY guarantees the same row order as in FindMostSimilarEmbedding
Query.SQL.Text := 'SELECT ID_EMBED, VECTOREM FROM EMBEDDINGS ORDER BY ID_EMBED';
Query.Open;
while not Query.Eof do
begin
SetLength(EmbeddingArray, Vector_Size);
F:= TArrayField(Query.FieldByName('VECTOREM'));
for I := 0 to Vector_Size - 1 do
begin
EmbeddingArray[I]:= F.FieldValues[I];
end;
Embeddings := Embeddings + [EmbeddingArray];
Query.Next;
end;
Result := Embeddings;
finally
Query.Free;
end;
end;
function FindMostSimilarEmbedding(Connection: TFDConnection; const QueryEmbedding: TArray<Double>): Integer;
var
AllEmbeddings: TArray<TArray<Double>>;
MaxSimilarity, Similarity: Double;
MostSimilarID, I: Integer;
Query: TFDQuery;
begin
// Get all embeddings from database
AllEmbeddings := GetAllEmbeddingsFromDatabase(Connection);
MaxSimilarity := -1;
MostSimilarID := -1;
Query := TFDQuery.Create(nil);
try
Query.Connection := Connection;
// Same ORDER BY as in GetAllEmbeddingsFromDatabase, so the indexes line up
Query.SQL.Text := 'SELECT ID_EMBED FROM EMBEDDINGS ORDER BY ID_EMBED';
Query.Open;
I := 0;
// Calculate the cosine similarity with each stored embedding
while not Query.Eof do
begin
Similarity := CosineSimilarity(QueryEmbedding, AllEmbeddings[I]);
if Similarity > MaxSimilarity then
begin
MaxSimilarity := Similarity;
MostSimilarID := Query.FieldByName('ID_EMBED').AsInteger;
end;
Query.Next;
Inc(I);
end;
Result := MostSimilarID;
finally
Query.Free;
end;
end;
6. Send the request to the OpenAI API with prompt + context
Once the most relevant context has been identified, it is used to enhance the prompt that will be sent to the OpenAI API. The goal is to provide the model with additional background information to improve the accuracy and relevance of the generated response. The prompt is built by combining the retrieved context along with the user's query. This comprehensive prompt is then sent to OpenAI's API, specifically using the gpt-3.5-turbo model (or another suitable model), which will generate a response based on both the context and the query. This approach ensures that the generated answer is as precise and useful as possible.
In Delphi, we use the TNetHTTPClient component to make the HTTP request to OpenAI's API. Here is a sample Delphi function to send the request:
function ExecuteQueryWithContext(Connection: TFDConnection; const UserQuery: string): string;
var
QueryEmbedding: TArray<Double>;
MostRelevantID: Integer;
MostRelevantContext, FinalPrompt: string;
Query: TFDQuery;
begin
QueryEmbedding := GetEmbedding(UserQuery);
// Find the id of the nearest context text
MostRelevantID := FindMostSimilarEmbedding(Connection, QueryEmbedding);
// Retrieve the stored text for that id
Query := TFDQuery.Create(nil);
try
Query.Connection := Connection;
Query.SQL.Text := 'SELECT TEXT FROM EMBEDDINGS WHERE ID_EMBED = :id';
Query.ParamByName('id').AsInteger := MostRelevantID;
Query.Open;
MostRelevantContext := Query.FieldByName('TEXT').AsString;
finally
Query.Free;
end;
// Build the final prompt
FinalPrompt := 'Relevant context: ' + MostRelevantContext + sLineBreak +
'User prompt: ' + UserQuery + sLineBreak +
'Answer:';
// Get the answer from OpenAI
Result := GetOpenAIResponse(FinalPrompt);
end;
function GetOpenAIResponse(const Prompt: string): string;
var
HttpClient: TNetHTTPClient;
RequestJSON: TJSONObject;
RequestBody: TStringStream;
Response: IHTTPResponse;
JSONObj: TJSONObject;
ChoicesArray: TJSONArray;
CompletionText: string;
begin
HttpClient := TNetHTTPClient.Create(nil);
try
HttpClient.ContentType := 'application/json';
HttpClient.CustomHeaders['Authorization'] := 'Bearer ' + ApiKey;
// Build the request body; the temporary JSON object is freed to avoid a memory leak
RequestJSON := TJSONObject.Create
.AddPair('model', 'gpt-3.5-turbo')
.AddPair('messages', TJSONArray.Create(
TJSONObject.Create
.AddPair('role', 'user')
.AddPair('content', Prompt)
))
.AddPair('max_tokens', 150)
.AddPair('temperature', 0.7);
try
RequestBody := TStringStream.Create(RequestJSON.ToString, TEncoding.UTF8);
finally
RequestJSON.Free;
end;
try
Response := HttpClient.Post('https://api.openai.com/v1/chat/completions', RequestBody);
if Response.StatusCode = 200 then
begin
// Parse the JSON response
JSONObj := TJSONObject.ParseJSONValue(Response.ContentAsString) as TJSONObject;
try
ChoicesArray := JSONObj.GetValue<TJSONArray>('choices');
if ChoicesArray.Count > 0 then
CompletionText := ChoicesArray.Items[0].GetValue<TJSONObject>('message').GetValue<string>('content')
else
CompletionText := '';
Result := CompletionText;
finally
JSONObj.Free;
end;
end
else
begin
raise Exception.CreateFmt('Error getting the answer. Status code: %d - %s', [Response.StatusCode, Response.StatusText]);
end;
finally
RequestBody.Free;
end;
finally
HttpClient.Free;
end;
end;
Conclusion
The described process effectively demonstrates how to create a Retrieval-Augmented Generation (RAG) system by leveraging stored embeddings saved in a Firebird database. By retrieving the most relevant context and combining it with the user query, we can generate high-quality responses that are informed by previous knowledge.
This approach combines the power of retrieval-based methods with generative models, resulting in a system that not only provides relevant information but also creates contextually appropriate answers to user queries.
This article has been written as a first contact, to understand how these systems work and how the information is exchanged. There is a lot of room for improvement, such as indexing the embeddings table with search words, segmenting the table into groups or clusters, using other embedding query methods, etc.
My first contact with AI from Delphi code was through the Maker AI components, which I recommend visiting and using.