RAG Pipeline with Deepseek-R1
Sourabh Solkar
Introduction
Before we dive into building the RAG (Retrieval-Augmented Generation) pipeline, let me set the context.
Imagine we have some non-public data, such as an internal research paper, HR policy document, or a confidential contract. Now, we want employees within the company to be able to ask questions related to that document, and the LLM should provide answers strictly based on the context of that document, not generic answers that traditional language models typically generate.
To solve this problem, we are going to build a RAG pipeline that will fetch the most relevant data from the document and pass it to the LLM for more accurate and context-aware responses.
Prerequisite
Python 3 with the following packages installed: langchain, langchain-huggingface, sentence-transformers, scikit-learn, numpy, and requests. You'll also need Ollama to run Deepseek-R1 locally (we set it up in Step 9).
Step 1: Collect Text Data
Gather any non-public text data, such as a dummy HR policy, research paper, or a sample contract. Alternatively, you can quickly create a personal text document (avoiding any sensitive information).
text = """
put_dummy_data_here
"""
Step 2: Split Data into Chunks
Use a text-splitting library to break the text into smaller chunks. Here, we use LangChain's RecursiveCharacterTextSplitter:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=2)
chunks = splitter.split_text(text)
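As a quick sanity check (optional, not part of the original steps), you can inspect how many chunks the splitter produced and peek at the first few:
# Optional: inspect the chunking result
print(f"Number of chunks: {len(chunks)}")
print(chunks[:3])  # the first few chunks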
Step 3: Convert Chunks into Vectors (Embedding Process)
In this step, we transform each text chunk into a numeric vector, known as an embedding. Embeddings capture the semantic meaning of the text, which is what lets us compare the chunks against a query later.
from langchain_huggingface import HuggingFaceEmbeddings
# Generate embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
embeddings = embedding_model.embed_documents(chunks)
print(embeddings)
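The result is a list containing one vector per chunk. As an optional check, you can confirm the counts line up; all-MiniLM-L6-v2 produces 384-dimensional vectors:
# Optional: one embedding per chunk, each 384-dimensional for all-MiniLM-L6-v2
print(len(embeddings))     # should equal len(chunks)
print(len(embeddings[0]))  # 384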
Step 4: Embed the Query
Now, embed the query to convert it into a numeric vector, similar to the text chunks.
# Embed the query
query = "What is the timeline for review?"
query_embedding = embedding_model.embed_query(query)
print(query_embedding)
Step 5: Reshape the Query Embedding
To ensure compatibility for similarity search, reshape the query embedding into a 2D array.
import numpy as np
# Reshape query embedding to 2D array
query_embedding = np.array(query_embedding).reshape(1, -1)
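The reshape turns the single query vector into a matrix with one row, which is the 2D input that scikit-learn's cosine_similarity expects. A quick optional check:
# Optional: confirm the query is now a (1, n_dimensions) array
print(query_embedding.shape)  # (1, 384) when using all-MiniLM-L6-v2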
Step 6: Compute Similarity Scores
from sklearn.metrics.pairwise import cosine_similarity
# Compute similarity between query and all document embeddings
similarity_scores = cosine_similarity(query_embedding, embeddings)
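cosine_similarity returns a matrix of pairwise scores, with one row for the query and one column per chunk, which is why the next step indexes similarity_scores[0]. An optional peek at the result:
# Optional: similarity_scores has shape (1, number_of_chunks)
print(similarity_scores.shape)
print(similarity_scores[0][:5])  # scores for the first five chunks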
Step 7: Retrieve Top Matching Chunks
top_k_indices = similarity_scores[0].argsort()[::-1]
# Print the top matching chunks and build the context
top_k = 3
context = ""
for i in top_k_indices[:top_k]:
    print(f"Score: {similarity_scores[0][i]:.4f} -> {chunks[i]}")
    # Append the top chunks to build the context
    context += chunks[i] + " "
print(context)
Step 8: Build the Final Prompt for the LLM
Combine the retrieved context and the query to create the final prompt.
final_prompt = f"Context: {context} \n\nQuery: {query} \n\nAnswer:"
print(final_prompt)
This prompt will be fed into the language model (LLM) to generate a more accurate and context-aware response. In our case, we chose Deepseek-R1 because it's free :-)
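Optionally, you can make the grounding requirement explicit in the prompt. The sketch below is a variation on the prompt above (not part of the original steps); it asks the model to refuse when the context doesn't contain the answer:
# Optional stricter prompt: instructs the model to answer only from the retrieved context
strict_prompt = (
    "Answer the question using ONLY the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    f"Context: {context}\n\nQuery: {query}\n\nAnswer:"
)
print(strict_prompt)
If you want tighter grounding, pass strict_prompt instead of final_prompt in Step 10.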
Step 9: Set Up Deepseek-R1 Model
To run Deepseek-R1 locally, we'll use Ollama, a tool that lets you run LLMs on your own machine without relying on cloud-based APIs.
1️⃣ Install Ollama
Download and install Ollama from https://ollama.com (installers are available for macOS, Windows, and Linux).
2️⃣ Pull the Deepseek-R1 Model
Run the following command to download and start the Deepseek-R1 (1.5B) model (the first run pulls the weights automatically):
ollama run deepseek-r1:1.5b
Deepseek-R1 1.5B is lightweight and ideal for local setups. Heavier distilled variants such as 7B or 70B require more GPU power and RAM.
3️⃣ Run the Ollama Server
After pulling the model, start the Ollama server:
ollama serve
By default, the Ollama server listens at: http://localhost:11434
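Before wiring the pipeline to Ollama, it helps to confirm the server is reachable from Python. This is an optional check; the root endpoint simply reports that Ollama is running:
import requests

# Optional: confirm the Ollama server is reachable
resp = requests.get("http://localhost:11434")
print(resp.status_code, resp.text)  # expect 200 and "Ollama is running"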
Step 10: Generate the Final Answer Using Deepseek-R1
Now that we have the query and relevant context, it's time to generate the final answer by sending the prompt to Deepseek-R1 via Ollama API.
import requests
import json

# Send the final prompt to the local Deepseek-R1 model via the Ollama chat API
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "messages": [
            {"role": "user", "content": final_prompt}
        ]
    },
    stream=True,  # the chat endpoint streams the answer as newline-delimited JSON
)

# Handle the streaming response
for chunk in response.iter_lines():
    if chunk:
        data = json.loads(chunk.decode('utf-8'))
        message = data.get('message', {}).get('content', '')
        if message:
            print(message, end='', flush=True)
1️⃣ We send a POST request to the local Ollama endpoint (localhost:11434), which serves the Deepseek-R1 model.
2️⃣ The final prompt (context + query) is passed in the messages body.
3️⃣ We handle the streaming response, which allows us to read the output in real time.
4️⃣ Finally, the response is printed as the LLM generates the text.
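If you'd rather receive the whole answer in a single response instead of a stream, the chat endpoint also accepts "stream": false and returns one JSON object. A minimal alternative sketch:
import requests

# Alternative: disable streaming and read the full answer at once
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-r1:1.5b",
        "messages": [{"role": "user", "content": final_prompt}],
        "stream": False,  # return one JSON object instead of newline-delimited chunks
    },
)
print(response.json()["message"]["content"])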
Congrats! Your RAG Pipeline with Deepseek-R1 is Complete 🎉
Github : https://github.com/jhm164/RAG