Deploying DeepSeek-R1 Locally with a Custom RAG Knowledge Base
The primary goal of this article is to explore how to deploy the popular open-source large language model (LLM) DeepSeek-R1 and integrate it with a customized Retrieval-Augmented Generation (RAG) knowledge base on your local machine (PC/server). This setup enables the model to draw on domain-specific knowledge for expert-level responses while maintaining data privacy and customization flexibility. Users can enhance the model's expertise in specific technical domains, enabling applications such as AI-powered support chatbots, private code generation, and industry-specific assistants. Most importantly, this setup keeps proprietary data private, ensuring sensitive documents, licensed software, or non-public information remain secure while still benefiting from AI-powered insights.
The implementation in this article covers four main sections:
- Step 1: Deploy the DeepSeek-R1 model on your local machine
- Step 2: Install the nomic-embed-text embedding model
- Step 3: Install AnythingLLM and deploy RAG
- Step 4: Load RAG data and start testing
# Version: v_0.0.1
# Created: 2025/02/06
# License: MIT License
Introduction
DeepSeek, a Chinese AI firm, is disrupting the industry with its low-cost, open-source large language models, challenging U.S. tech giants. Its models have shown high performance in mathematics, coding, and both English and Chinese conversation. The DeepSeek-R1 model is open source (MIT License). This article will explore the detailed steps to deploy the DeepSeek-R1:7b model on a Windows laptop with an NVIDIA RTX 3060 (12GB) GPU, create a customized AI-powered chatbot or program-code generator using a Retrieval-Augmented Generation (RAG) knowledge base, and run a simple comparison between the plain LLM answer and the RAG-enhanced answer.
To implement this project, we will use four key tools:
- Ollama: a lightweight, extensible framework for running LLMs locally
- DeepSeek-R1: the open-source reasoning LLM itself
- nomic-embed-text: the embedding model that converts documents into vector representations
- AnythingLLM: an open-source AI chatbot application used to build and query the RAG workspace
This approach significantly improves AI-assisted decision-making, technical support, and software development by ensuring that responses are grounded in reliable, domain-specific information.
Background Knowledge
DeepSeek-R1: A High-Performance Open-Source LLM
DeepSeek AI is pioneering a new era of reasoning-based large language models (LLMs) with its DeepSeek-R1 series, designed to push the boundaries of mathematical, coding, and logical reasoning capabilities. Unlike traditional LLMs that rely heavily on supervised fine-tuning (SFT), DeepSeek AI adopts a reinforcement learning (RL)-first approach, enabling models to naturally develop complex reasoning behaviors.
Evolution of DeepSeek-R1 Models
- DeepSeek-R1-Zero was the first-generation model, trained purely through large-scale reinforcement learning (RL), allowing it to self-verify, reflect, and generate long chains of thought (CoT) without SFT. However, it faced challenges such as language mixing, readability issues, and repetitive outputs.
- DeepSeek-R1 improved upon this by incorporating cold-start data before RL training, resulting in a more refined and human-aligned model with performance comparable to OpenAI-o1 across various reasoning benchmarks.
Reference link: https://api-docs.deepseek.com/
Understanding Retrieval-Augmented Generation (RAG)
Retrieval-augmented generation is a technique for enhancing the accuracy and reliability of generative AI models with information from specific and relevant data sources. RAG enhances generative AI models by retrieving external data before generating responses, leading to more accurate, up-to-date, and context-aware answers.
The workflow of RAG is shown below: the user's query is embedded and used to retrieve the most relevant document chunks from a vector database, the retrieved context is appended to the query, and the LLM generates an answer grounded in that context.
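The sketch below illustrates this workflow in Python against a local Ollama server, using the deepseek-r1:7b and nomic-embed-text models installed in the following steps. It is a minimal illustration only: the brute-force cosine search stands in for a real vector database, which (as AnythingLLM does) would precompute and store chunk embeddings instead of embedding them per query.

import requests

OLLAMA = "http://localhost:11434"

def embed(text):
    # Turn text into a vector using the nomic-embed-text model (Step 2).
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def rag_answer(query, chunks):
    # Retrieve: rank document chunks by similarity to the query embedding.
    qv = embed(query)
    top = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:3]
    # Augment + generate: pass the retrieved context to the LLM with the query.
    prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: " + query
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "deepseek-r1:7b", "prompt": prompt,
                            "stream": False})
    return r.json()["response"]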
Step 1: Deploy the DeepSeek-R1 Model on Your Local Machine
To set up the DeepSeek-R1 model locally, you first need to install Ollama, a lightweight, extensible framework for running large language models on your machine. Then, you will download the appropriate DeepSeek-R1 model based on your hardware specifications.
1.1 Install Ollama
Download Ollama from the official website: https://ollama.com/download, and select the installation package for your operating system:
Once the installation is complete, verify that Ollama is properly installed by running the following command in a terminal:
ollama --version
If a version number is displayed, Ollama is ready for use:
Next, start the Ollama service by running:
ollama serve
1.2 Choose the Right DeepSeek-R1 Model
DeepSeek-R1 offers models ranging from a compact 1.5-billion-parameter version to a massive 671-billion-parameter model. The model size you choose should match your GPU memory (VRAM) and system resources. On the Ollama website, select "Models" and search for "deepseek" as shown below:
Below is a hardware requirement table to help you decide which model to deploy. If your hardware is below the recommended specs, you can still run a larger model by using optimization tools like LM Studio (https://lmstudio.ai/), but this will increase processing time. DeepSeek-R1 hardware requirements:
For the 671b model, approximately 480 GB of VRAM is required, so multi-GPU setups (for example, multiple 80GB A100 or H100 cards) are mandatory.
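As a rough sanity check when matching a model tag to your GPU, you can estimate the weight memory from the parameter count. The sketch below assumes 4-bit quantization (typical for these Ollama tags) and ignores the KV cache and runtime overhead, so real usage will be somewhat higher:

def vram_gb(params_billion, bits=4):
    # Weight memory only: parameter count times bits per weight, in GiB.
    return params_billion * 1e9 * bits / 8 / 1024**3

for size in (1.5, 7, 14, 32, 70, 671):
    print(f"deepseek-r1:{size}b -> ~{vram_gb(size):.1f} GB of weights")

For example, the 7b tag needs roughly 3.3 GB for its weights, which fits comfortably in a 12GB RTX 3060 with room left for context.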
1.3 Download and Run DeepSeek-R1
For my local configuration, I use an RTX 3060 GPU (12GB), so I can try the 7b model. You can use the ollama pull command to download the model, or just use the run command; if the model has not been downloaded yet, Ollama will download it automatically:
ollama run deepseek-r1:7b
Now, DeepSeek-R1 is successfully deployed on your local machine, and you can start asking AI questions directly from the terminal.
Step 2: Install nomic-embed-text
To build the RAG (Retrieval-Augmented Generation) knowledge base, we need nomic-embed-text, which converts data (such as PDF files or text strings) into vector representations. These vector embeddings allow the AI model to understand semantic relationships between different pieces of text, improving search and retrieval accuracy.
2.1 Download nomic-embed-text
Visit the official page: https://ollama.com/library/nomic-embed-text and download the latest version as shown below:
2.2 Install via Ollama
You can also install nomic-embed-text directly using the Ollama pull command:
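ollama pull nomic-embed-text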
Once downloaded, nomic-embed-text is ready to be integrated into your RAG pipeline.
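You can quickly sanity-check the embedding model from a terminal (assuming the Ollama service from Step 1 is still running on its default port 11434; adjust the JSON quoting if you use Windows cmd):

curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "hello world"}'

A successful response returns a JSON object whose embedding field is a 768-dimension vector.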
Step 3: Install AnythingLLM and Deploy RAG
To set up the RAG (Retrieval-Augmented Generation) system, we will use AnythingLLM, an open-source AI chatbot that allows seamless interaction with documents.
3.1 Download and Install AnythingLLM
Visit the official AnythingLLM download page: https://anythingllm.com/desktop and download the appropriate installer for your operating system.
3.2 Create a Workspace
After installing and running AnythingLLM, create a new workspace named "DeepSeek-R1-RAG". Then, click on the "Open Settings" icon for the workspace, as shown below:
3.3 Configure LLM Settings
In the workspace LLM settings, set the LLM provider to Ollama and select deepseek-r1:7b (deployed in Step 1) as the chat model, then save the change.
3.4 Configure Vector Database
AnythingLLM uses LanceDB as its default vector database, which stores all embeddings locally on your machine, so the default setting can be kept unless you prefer another provider.
3.5 Configure Embedding Model
In the embedding settings, set the embedding provider to Ollama and select nomic-embed-text (installed in Step 2) as the embedding model so that documents are vectorized locally.
Step 4: Load RAG Data and Start Testing
Now that we have completed the setup, we can load documents into the RAG system and test the DeepSeek-R1 chatbot.
4.1 Prepare the Knowledge Base
We will use four PDF documents to build the AI's knowledge base:
- Power Grid Simulation System documents: PowerGrid_introduction.pdf and PowerGrid_UsageManual.pdf
- Cluster User Action Emulation (CUE) System documents: CUE_Introduction.pdf and Action_API_Doc.pdf
4.2 Load Power Grid System Data
In AnythingLLM, create a "Power Grid Chat Bot" thread and click the upload icon:
Upload the two PDF files PowerGrid_introduction.pdf and PowerGrid_UsageManual.pdf, then select both files and click "Move to Workspace":
Then select "Save and Embed" as shown below. After the embedding progress finishes, the LLM with RAG is ready for use.
4.3 Test the DeepSeek-R1 Chatbot with the Power Grid RAG Data
Now we can ask DeepSeek-R1 a question related to the power grid simulation system and compare the answers with and without RAG.
Question:
Give a short summary about the design of PLC and Remote Control Circuit Breaker Design in Power_Grid_OT_Simulation_System project.
DeepSeek-R1 (Without RAG) Answer - Without RAG, DeepSeek-R1 gives a very general answer, as shown below, and the response has no relationship to the Power_Grid_OT_Simulation_System project:
DeepSeek-R1 (With RAG Enabled) - The AI provides an accurate response based on the uploaded documents, covering HMI breaker control and system details:
4.4 Load the Cluster User Action Emulator Project Data
This time we remove the power grid documents and load the Cluster User Action Emulator project introduction document CUE_Introduction.pdf and the Python API document Action_API_Doc.pdf, as shown below:
4.5 Test the DeepSeek-R1 Chatbot with the CUE Data
Now we can ask DeepSeek-R1 to create a Python script using the library functions of the cluster user action simulation system.
Question:
Help create a python script/function uses the cluster user emulator(CUE) function API to ping an IP 192.168.10.100 and ssh login to the server with (username: admin, password: P@ssword) to run a command "ifconfig" .
DeepSeek-R1 (Without RAG) Answer - The AI does not recognize CUE and incorrectly generates a solution using the requests library, as shown below:
DeepSeek-R1 (With RAG Enabled) - The AI correctly utilizes the CUE API to generate the script. However, while it correctly finds the ping API function, it incorrectly initializes the SSH action:
As we can see, DeepSeek-R1 uses the correct library module provided in the API document to build the script. For the ping action, it finds the correct API function (page 2 of the API document) and uses it correctly. For the SSH action, it finds the correct API (page 4 of the API document) but does not initialize the connector object correctly:
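Since the CUE API itself lives only in the project's private documents, here for comparison is what the same task looks like with standard tooling (a subprocess ping plus paramiko for SSH). The point to note is the sequence the model got wrong: the SSH connection object must be created and connected before any command is run.

import subprocess
import paramiko

def ping(ip):
    # One ping probe; use "-n" instead of "-c" on Windows.
    return subprocess.run(["ping", "-c", "1", ip],
                          capture_output=True).returncode == 0

def ssh_run(ip, user, pwd, cmd):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Initialize and open the connection first -- the step the RAG answer missed.
    client.connect(ip, username=user, password=pwd)
    _, stdout, _ = client.exec_command(cmd)
    output = stdout.read().decode()
    client.close()
    return output

if __name__ == "__main__":
    ip = "192.168.10.100"
    if ping(ip):
        print(ssh_run(ip, "admin", "P@ssword", "ifconfig"))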
With RAG enabled, DeepSeek-R1 can generate responses based on domain-specific documents, making it far more accurate and useful than the standard model. However, reviewing AI-generated code is still necessary to ensure correctness.
Conclusion
Deploying DeepSeek-R1 locally with a custom Retrieval-Augmented Generation (RAG) knowledge base enables AI-powered applications with enhanced domain-specific expertise while maintaining data privacy. By leveraging tools like Ollama, nomic-embed-text, and AnythingLLM, users can build intelligent chatbots, code generators, and AI-assisted decision-making systems tailored to their unique needs. The comparison between standard LLM responses and RAG-enhanced answers highlights the significant improvements in accuracy and relevance when integrating external knowledge sources. This setup not only enhances AI reliability but also ensures proprietary data remains secure, making it a powerful solution for businesses, researchers, and developers seeking localized AI-driven insights.
Thanks for reading. If you have any questions or suggestions, please feel free to message me. Many thanks if you can leave comments and share improvement advice so we can make this work better ~