Demystifying RAG and Vector Databases: How Browser-Based Vector Search Can Shape the Future of Personalized and Responsible AI-Powered Applications
Vlad Larichev
Industrial AI Lead @Accenture IX | Software Engineer | Keynote Speaker | Research Enthusiast | Building with LLMs since 2020 | Helping Industries Scale AI with Impact - Beyond the Hype and Marketing Fluff.
2024 was the year Retrieval Augmented Generation (RAG) dominated most expert discussions, changing how we interact with AI by combining large language models (LLMs) with external knowledge sources.
Many people still view RAG and vector search as a highly technical and complex topic. In reality, vector search is a simple yet elegant solution, and understanding its core principles reveals both the strengths and weaknesses of current RAG-based applications.
This article is a journey: we will break down the key concepts behind RAG and vector databases and implement a complete vector search from scratch, directly in YOUR browser, using the browser's IndexedDB API, to better understand how it works.
Finally, I'd like to explore together a future where a vector database becomes a standard browser feature. It could offer an alternative to OpenAI's "Memory" concept, securely managing your personal memory fingerprint and connecting AI-powered applications from different vendors, while prioritizing privacy and security.
Ready?
Understanding RAG and Vector Databases: How Retrieval Augmented Generation Finds the Data
RAG pairs a language model with an external repository of information. Instead of the model relying solely on what it has “memorized” during training, RAG dynamically fetches the most relevant documents or data points from a stored knowledge base.
To efficiently handle this, we need a system that translates words into numbers, allowing machines to work with them effectively. This translation involves creating embeddings: mathematical representations of text as vectors in a multi-dimensional space, stored in a vector database:
When you run a query, the vector database compares the query's embedding to the embeddings of your documents. The result is a "semantic search" that finds contextually similar content, even if the exact keywords don't match.
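To make this concrete, here is roughly what one stored record could look like. This is only a sketch: the text and numbers are invented for illustration.

```js
// Conceptually, each record in a vector database pairs the original text
// with its embedding. The numbers below are made up for illustration;
// real embeddings have hundreds of dimensions.
const record = {
  text: 'Guidelines for financial compliance in banking',
  embedding: [0.12, -0.48, 0.33, 0.91, -0.05],
};
```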
To simplify, imagine creating vector representations for the words "fire" and "truck." If you add these two vectors mathematically, the resulting vector might approximate the vector representation of "firetruck." (Amazing, right?)
However, it could also represent a "truck on fire," which shows that how words are translated into numbers (embeddings) plays a crucial role in determining meaning. Other techniques use these tokens to understand the context of whole sentences, but for now, let's focus on the meaning of individual words, as in the toy sketch below.
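Here is a toy sketch of this idea, using made-up 3-dimensional vectors; real embeddings have hundreds of dimensions, but the arithmetic is the same.

```js
// Made-up 3-dimensional "embeddings"; purely illustrative.
const fire = [0.9, 0.1, 0.0];
const truck = [0.0, 0.2, 0.9];
const firetruck = [0.8, 0.3, 0.8];

// Add the two word vectors element-wise.
const sum = fire.map((v, i) => v + truck[i]);

// Compare directions with cosine similarity (we build this properly in step 3).
const cos = (a, b) =>
  a.reduce((s, v, i) => s + v * b[i], 0) / (Math.hypot(...a) * Math.hypot(...b));

console.log(cos(sum, firetruck).toFixed(3)); // close to 1: "fire" + "truck" ≈ "firetruck"
```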
The beauty of vector databases is their ability to group data by the “meaning” behind words. For instance, if you search “guidelines for financial compliance,” a vector database can find documents discussing “banking regulations” and “financial oversight” even if those exact words aren’t used.
This semantic power lies at the heart of modern RAG systems.
A pretty similar process happens on the other side: when you ask something, the embedding model translates your query into a vector representation, finds the matching vectors in the database, and feeds them to the LLM as context:
Let's build our own vector search solution, directly in the browser, in 5 steps.
Enough theory: let's implement it in the browser, speedrun style.
In the second article, we will implement it step by step together, but for now, let’s cover the main components.
To realize this process, we will need two pipelines: one for ingesting documents (chunk, embed, store) and one for handling queries (embed the query, retrieve the best matches).
For the browser implementation, we can easily build both pipelines in just a few lines of code using { pipeline } from the Hugging Face Transformers.js library. Pipelines make it straightforward to handle complex AI tasks directly in the browser.
1) Selecting Models and Creating the Pipelines
We will use a lightweight model through the feature-extraction pipeline to generate our embeddings. Lightweight models ensure efficient operation in a browser environment without relying on server-side infrastructure:
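A minimal sketch of creating such a pipeline follows. The model name is an assumption on my part; any small embedding model from the Hugging Face Hub that ships ONNX weights would work.

```js
import { pipeline } from '@huggingface/transformers';

// Load a small embedding model that runs fully in the browser.
// The model choice here is illustrative.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Turn a sentence into a vector: mean-pool the token embeddings and normalize.
const output = await extractor('Pump P-101 shows reduced heat transfer efficiency.', {
  pooling: 'mean',
  normalize: true,
});
const embedding = Array.from(output.data); // a plain array of numbers
console.log(embedding.length); // e.g. 384 dimensions for this particular model
```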
2) Creating a Vector Database in the Browser
Now we need a database.
Instead of a traditional server-based vector database, we'll use the browser's built-in IndexedDB. IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs.
By leveraging the extractor pipeline, we create embeddings and store chunks of text along with their vector representations in our DB, in a simple table format with two columns: the text and its vector. Yes, that's all we'll need!
Let’s call this our vectorStore:
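Here is a minimal sketch of such a vectorStore using the raw IndexedDB API. Names like openVectorStore and addChunk are my own, for illustration.

```js
// Open (or create) the database with a single object store for our records.
function openVectorStore() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('vectorStore', 1);
    request.onupgradeneeded = () => {
      request.result.createObjectStore('chunks', { autoIncrement: true });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Store one chunk of text together with its embedding.
function addChunk(db, text, embedding) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('chunks', 'readwrite');
    tx.objectStore('chunks').add({ text, embedding });
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```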
3) Implementing Similarity Search
Now comes the fun part: implementing similarity search using simple school math!
Let’s define a cosineSimilarity function to measure how similar two vectors are by calculating the cosine of the angle between them:
Here’s how cosine similarity works step by step:
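Below is a minimal sketch of such a function; the original code may differ in details, but the math is the same, and the comments walk through each step.

```js
// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a, b) {
  let dot = 0;   // step 1: accumulate the dot product of a and b
  let normA = 0; // step 2: accumulate the squared length of a...
  let normB = 0; //         ...and of b
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // step 3: divide the dot product by the product of the vector lengths
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```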
What does this similarity mean?
1 → the embeddings are identical (they point in the same direction),
0 → the vectors are completely unrelated (90° apart),
-1 → the vectors are opposites (they point in opposite directions).
In a vector database, cosine similarity is used to compare a query vector to all other vectors in the database and find the items most similar to the query based on their cosine similarity score.
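A sketch of that lookup over our browser vectorStore follows; getAllChunks and search are my illustrative helper names.

```js
// Read every stored record out of the object store.
function getAllChunks(db) {
  return new Promise((resolve, reject) => {
    const req = db.transaction('chunks', 'readonly').objectStore('chunks').getAll();
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Brute-force search: score every record against the query and keep the top K.
// Fine for a browser-sized store; dedicated vector databases use smarter indexes.
async function search(db, queryEmbedding, topK = 5) {
  const chunks = await getAllChunks(db);
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```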
Isn't it amazing that our AI agents use simple school math to understand our language?
4) Uploading and Testing with Synthetic Data
To test it, I've built a simple web project, VectorVault, where you can upload synthetic data such as machine reports or incident logs (more about it in a following article).
The documents will be processed and stored in the local vectorStore database we just created.
When new data is added, it is split into chunks, and the browser console shows the items being stored in the local database one by one, each with its embedding, in my case a 1536-dimensional vector:
5) Testing!
Now we can test whether searching for specific data returns similar documents, which could later be fed to the LLM as context.
Let's search for "heat transfer efficiency"; the most similar documents should appear in the results...
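Putting the pieces from the previous steps together, the query could look like this (a sketch using the helpers defined above):

```js
const db = await openVectorStore();

// Embed the query with the same model we used for the documents.
const queryOutput = await extractor('heat transfer efficiency', {
  pooling: 'mean',
  normalize: true,
});

// Retrieve the three most similar chunks from the browser database.
const results = await search(db, Array.from(queryOutput.data), 3);
console.table(results.map(({ text, score }) => ({ text, score: score.toFixed(3) })));
```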
It works!
This simple and efficient solution runs entirely in the browser, requiring just a few lines of code for the main algorithm, all based on straightforward school math!
A look into the future: responsible, personalized AI and the role of vector databases in the browser
Imagine a future where your browser is also your personal and secure AI hub, with functionality similar to a password manager.
An optimized version of a vector store could become a standard browser feature: always running locally, indexing what you need on demand, and serving as a "personal memory" for all your AI-powered agents, giving YOU control over what the AI copilots know about you.
Different applications and websites could tap into this shared vector database to provide deeply personalized experiences, understanding your interests, goals, and needs at a semantic level, to the extent you allow.
Like a password manager, but for your personal digital fingerprint, with you owning your data and deciding who can access it.
Of course, this raises important questions about responsible AI practices: How do we ensure privacy and security when sharing a common vector database across applications? How can we guarantee fairness and prevent biases from creeping into this shared memory?
While this idea might sound unusual at first, it’s arguably less strange than many things we already do today, like handing all your data to OpenAI or relying on password managers to store sensitive details.
What do you think about the future of personalized AI and where would you store this data? Would you try such a personalized memory fingerprint?
In the next article, we will go through all the elements of building VectorVault, a client-side vector search application that demonstrates how RAG works by implementing it directly in your browser. By exploring this project together, we will learn how to optimize RAG results, strategies for splitting documents into chunks, creating embeddings, storing and retrieving them, and using them to power AI-driven applications with privacy in mind.
Stay tuned as we dive deeper into the technical implementation. I'm looking forward to your feedback!
A second strand in this article is agent memory. I believe this is a big open question. One important aspect is probably captured by MemGPT and Letta: https://github.com/letta-ai/letta A second aspect that I reason about is how agents are allowed to use information across application boundaries, potentially compromising information security or privacy. A third aspect is how the state of the agent is synchronized between the cloud and a local system in real time. Any ideas on these issues?
One strand in this article is running vector search in the browser. I find this interesting by itself and I intend to try it myself. This would make it easy to run search on any site locally, without requiring a backend service. In many cases I believe that semantic search is enough. A site could deliver the index data and allow search to be executed in the browser. Great!! Any ideas on how to go further with this?
Tech Innovation Manager at Accenture
Great article - thanks! I would add that the idea of personal browser search and a personal search AI agent is similar to the Digital Wallet concept in Web3, where personal and shared data are handled through distribution, public/private key combinations, and consensus. I think distributed data and digital wallet / AI personal wallet concepts might help enable it.
AI/ML @ Accenture | Educator
Excellent share here Vlad, this is what the community needs heading into the new year!
Automation in Manufacturing | Robotics & Industrial Metaverse
Really cool! The next step is to combine this with a small model also running in the browser and avoid the OpenAI API dependence.