Run DeepSeek AI Assistant on Your Local Machine
Introduction
DeepSeek is an advanced AI model tailored for natural language processing, featuring robust functionalities like text generation, summarization, and reasoning. Its ability to run locally makes it a fantastic option for users seeking privacy, control, and offline access to AI.
In this article, I will provide a quick overview of how to set up a local chatbot that can perform web searches using the open-source DeepSeek R1 model.
Writing this was quite challenging due to its technical nature. I grappled with it, especially since the newsletter aims to present tech concepts in a way that's easily understandable for the audience. Nevertheless, I went ahead because it would benefit those of you looking to embark on your AI journey.
Why Run Locally? Can’t I Just Use the DeepSeek Web Version?
Of course, you can!
As the DeepSeek AI Assistant has become more and more popular, I have been seeing this more and more often:
Besides, people might have privacy concerns about using these companies’ AI assistant services.
DeepSeek sends all the data it collects on Americans to servers in China, according to the company’s terms of service.
You would not have this concern if the model runs only on your own computer.
Lastly, this would also be a great start to becoming familiar with some development tools if you want to learn AI.
Getting to work
One of DeepSeek’s strengths is its flexibility: it runs on CPU-only systems, but a dedicated GPU greatly improves performance. On a CPU, responses can be slow, and larger models need considerable RAM. With a GPU, the model generates responses much faster through parallel processing, which makes real-time interaction noticeably smoother.
This guide will walk you through the installation and setup of DeepSeek, ensuring you can get started with AI on your machine, whether you have a high-end GPU or not.
Before we begin, ensure your system meets the minimum requirements. While DeepSeek can run on a CPU-only machine, having a high-performance processor and sufficient RAM will improve execution speed. If a compatible GPU is installed, Ollama will automatically detect and utilize it for accelerated processing.
Step 1: Install Ollama and Pull the DeepSeek Model
Ollama is a framework for running and managing large language models (LLMs) on local computing resources. It enables the loading and deployment of selected LLMs and provides access to them through an API.
Ollama is required to run DeepSeek models. It provides an optimized local runtime for running machine learning models efficiently.
Download and run the official Ollama installation script:
$ curl -fsSL https://ollama.com/install.sh | sh
After installation, verify that Ollama is installed correctly by checking its version:
$ ollama --version
Additionally, ensure that the Ollama service is running with:
$ systemctl is-active ollama.service
Now, fetch the model you want to run. DeepSeek-R1 models come in several sizes, trading off speed and accuracy against your hardware capabilities. Larger models provide better reasoning and accuracy but require more RAM, VRAM, and disk space. To install the 14B model as an example, run:
$ ollama pull deepseek-r1:14b
pulling manifest
pulling 6e9f90f02bb3... 100% ▕██████████████████████████████████▏ 9.0 GB
verifying sha256 digest
writing manifest
success
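If you want to double-check that the download succeeded, the Ollama API exposes the locally installed models. This is an optional, illustrative sketch, assuming Python with the requests package is available and Ollama is listening on its default port 11434:
import requests

# List the models Ollama has installed locally (default API port 11434).
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])  # expect to see "deepseek-r1:14b" here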
Other, smaller distilled variants of DeepSeek R1 are also available for download. However, I had a poor experience with models smaller than 14 billion parameters.
Once the download completes, the model is ready to use. Running it with ollama run (shown below) opens an interactive prompt in your terminal where you can type your questions.
Choosing the Right Model:
Even with 512+ GB RAM and multiple GPUs with 100+ GB VRAM, the DeepSeek-R1:671B model remains slow due to its massive 671 billion parameters, requiring an immense number of calculations per response.
While multiple GPUs improve overall throughput, they don’t significantly reduce latency for a single request, as data movement, memory bandwidth, and computational limits create bottlenecks.
Even high-end AI infrastructure struggles with this scale, making smaller models (7B–14B) far more practical for real-time applications.
The 671B model is best suited for research and large-scale AI experiments, where precision outweighs speed.
Once the model is downloaded, you can start interacting with it directly. To run the DeepSeek-R1 model, use:
$ ollama run deepseek-r1:14b
>>> Hey DeepSeek, how are you doing today!
<think>
Hi! I'm just a virtual assistant, so I don't have feelings, but thanks for asking! How can I help you today?
</think>
If you need to interact with DeepSeek programmatically, you can use Ollama’s HTTP API, which listens on http://localhost:11434 (if the Ollama service is already active, there is no need to start ollama serve again):
$ curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:14b", "prompt": "Hello, how are you?"}'
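The same call can be made from Python. This is a minimal sketch, assuming the requests package is installed, Ollama is reachable at its default address, and the deepseek-r1:14b model has been pulled:
import requests

# Minimal sketch: ask Ollama's generate endpoint for a single (non-streamed) response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "Hello, how are you?",
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])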
Step 2: Set Up SearXNG and Build the Chat Interface
SearXNG is an open-source search engine that aggregates search results from various engines without storing or tracking user data.
We will run SearXNG in a container, so you will need Docker and Docker Compose installed. Create a file with the following content and save it as “docker-compose.yml”.
version: '3.8'
services:
  searxng:
    image: docker.io/searxng/searxng:latest
    container_name: searxng
    ports:
      - "4000:8080"
    volumes:
      - ./searxng:/etc/searxng
    restart: unless-stopped
We can spin up the search engine by running the following in a terminal, in the same directory where you saved the “docker-compose.yml” file.
docker-compose up -d
It will download the SearXNG image, and once you see the following, the search engine is running on your machine.
Open up your browser, and you will be able to see a search interface like this at http://localhost:4000/search :
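The chatbot script later in this article uses SearXNG’s JSON output rather than the HTML page, so it is worth confirming the JSON API responds before wiring everything together. Here is a small sketch, assuming the port mapping from the docker-compose file above; if it returns a 403, you may need to add json to the search formats list in ./searxng/settings.yml and restart the container:
import requests

# Sanity check: ask SearXNG for JSON results on the port mapped in docker-compose.yml.
resp = requests.get(
    "http://localhost:4000/search",
    params={"q": "test query", "format": "json"},
    timeout=10,
)
if resp.status_code == 200:
    print(len(resp.json().get("results", [])), "results returned")
else:
    print("Status:", resp.status_code, "- check that 'json' is listed under formats in ./searxng/settings.yml")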
Gradio is an open-source Python library that makes it easy to create interactive web-based interfaces for your Python scripts and machine-learning models without any front-end development skills.
Let’s now install the packages we need for the user interface and for talking to Ollama and SearXNG from Python:
pip install gradio ollama requests
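If you have never used Gradio before, here is a minimal, illustrative sketch of the pattern the full script below follows: a plain Python function wrapped in gr.Interface, which Gradio turns into a web page.
import gradio as gr

# Minimal illustration of the Gradio pattern: a plain function becomes a web UI.
def echo(message: str) -> str:
    return f"You said: {message}"

demo = gr.Interface(fn=echo, inputs="text", outputs="text", title="Hello Gradio")
demo.launch()  # serves the interface at http://127.0.0.1:7860 by default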
In the last step, let’s create a simple Python script, app.py, that ties all the pieces together:
import gradio as gr
import requests
import ollama


def search_web(query: str) -> list:
    """Query the local SearXNG instance and return the raw result list."""
    SEARXNG_URL = "http://localhost:4000/search"
    params = {'q': query, 'format': 'json'}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }
    response = requests.get(SEARXNG_URL, params=params, headers=headers)
    if response.status_code != 200:
        print("Response status code:", response.status_code)
        print("Response text:", response.text)
        raise Exception(f"Search query failed with status code {response.status_code}")
    return response.json().get("results", [])


def chat_with_search(query: str, use_web_search: bool):
    # Optionally integrate web search based on the user's toggle
    if use_web_search:
        results = search_web(query)
        context_str = format_search_results(results, max_results=5)
    else:
        context_str = "No additional context provided."
    return generate_augmented_response(query, context_str)


def format_search_results(results: list, max_results: int = 5) -> str:
    """
    Format the top search results into a context string.
    """
    formatted = []
    for result in results[:max_results]:
        title = result.get("title", "No title")
        url = result.get("url", "No URL")
        snippet = result.get("content", "No snippet")
        formatted.append(f"Title: {title}\nURL: {url}\nSnippet: {snippet}")
    return "\n\n".join(formatted)


def generate_augmented_response(query: str, context: str) -> str:
    """
    Combine the user's query with the search context and send it to DeepSeek R1 via Ollama.
    """
    # Create a composite prompt: context first, then the instruction and the question
    composite_prompt = f"""
{context}

Please use the web search results above to provide a detailed summary of the following request.

{query}

Answer:"""
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=[
            {"role": "user", "content": composite_prompt}
        ]
    )
    return response["message"]["content"]


iface = gr.Interface(
    fn=chat_with_search,
    inputs=[
        gr.Textbox(label="Query"),
        gr.Checkbox(label="Enable Web Search", value=True)
    ],
    outputs="text",
    title="Houshang - Deepseek-r1:14b AI Model",
    description="Ask questions and get answers augmented with real-time web search results."
)

iface.launch(share=False, debug=False, server_name="0.0.0.0")
Let’s run it in the terminal:
python app.py
You are expected to see this:
* Running on local URL: http://127.0.0.1:7860
You may now open the app at http://127.0.0.1:7860 in your browser:
Also, you can use the app's API to interact with it:
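For example, the gradio_client package (installable with pip) can call the running app from another Python script. This is a hedged sketch that assumes the app is up at http://127.0.0.1:7860 and exposes Gradio’s default /predict endpoint for a single-function Interface:
from gradio_client import Client

# Sketch: drive the locally running Gradio app programmatically.
client = Client("http://127.0.0.1:7860/")
answer = client.predict(
    "What is DeepSeek R1?",  # the Query textbox
    True,                    # the Enable Web Search checkbox
    api_name="/predict",
)
print(answer)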
Congrats! You now have a local chatbot and no longer have to rely so much on DeepSeek’s hosted applications.
You can try typing in some prompts and interacting with the chatbot to see how it performs. Gradio provides a “Flag” button, a simple UI feature that saves the response and chat history as a CSV file in the project’s working directory.
Conclusion
This was just a simple demo of setting up a functional local chatbot. Many optimizations could be made, such as improving the response formatting, using a larger distilled model (more parameters), improving the UI, providing feedback to the model, and preserving context history. You might also want to refactor and organize the code into a proper software project as you become more seasoned in programming.
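As one illustration of the last point, preserving context history can start out as simply as keeping a running list of messages and passing it back to ollama.chat on every turn. This is a rough sketch, separate from the script above, and assumes the deepseek-r1:14b model from earlier is installed:
import ollama

# Rough sketch of preserving conversation history across turns.
history = []

def chat_turn(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = ollama.chat(model="deepseek-r1:14b", messages=history)
    reply = response["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat_turn("My name is Sam."))
print(chat_turn("What is my name?"))  # the earlier turn is now available as context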