Run DeepSeek AI Assistant on Your Local Machine

Introduction

DeepSeek is an advanced AI model tailored for natural language processing, featuring robust functionalities like text generation, summarization, and reasoning. Its ability to run locally makes it a fantastic option for users seeking privacy, control, and offline access to AI.

In this article, I will provide a quick overview of how to set up a local chatbot that can perform web searches using the open-source DeepSeek R1 model.

Writing this piece was quite challenging because of its technical nature. I grappled with it, since the newsletter aims to present tech concepts in a way that is easy for the audience to understand. Nevertheless, I went ahead, because it should benefit those of you looking to embark on your AI journey.


Why Run Locally? Can’t I Use the DeepSeek Web Version?

Of course, you can!

As the DeepSeek AI Assistant has grown more and more popular, I have been seeing messages like the one below more and more often:

I can imagine how busy it was.

Besides, people might have privacy concerns about using these companies’ AI assistant services.

DeepSeek sends all the data it collects on Americans to servers in China, according to the company’s terms of service.

You would not have this concern if the model runs only on your own computer.

Lastly, this is also a great way to become familiar with some development tools if you want to learn AI.


Getting to work

DeepSeek excels in its flexibility: while it runs on CPU-only systems, a dedicated GPU greatly improves performance. On a CPU, responses can be slow, and larger models may need considerable RAM. With a GPU, DeepSeek generates responses much faster thanks to parallel processing, making real-time interactions smoother.

This guide will walk you through the installation and setup of DeepSeek, ensuring you can get started with AI on your machine, whether you have a high-end GPU or not.

Before we begin, ensure your system meets the minimum requirements. While DeepSeek can run on a CPU-only machine, having a high-performance processor and sufficient RAM will improve execution speed. If a compatible GPU is installed, Ollama will automatically detect and utilize it for accelerated processing.

Software Requirements

Step 1

  • Install Ollama

Ollama is a framework for running and managing large language models (LLMs) on local computing resources. It enables the loading and deployment of selected LLMs and provides access to them through an API.

Ollama is required to run DeepSeek models. It provides an optimized local runtime for running machine learning models efficiently.

Download and run the official Ollama installation script:

$ curl -fsSL https://ollama.com/install.sh | sh        

After installation, verify that Ollama is installed correctly by checking its version:

$ ollama --version        

Additionally, ensure that the Ollama service is running with:

$ systemctl is-active ollama.service        

  • Download DeepSeek-R1

Now, fetch the model you want to run. DeepSeek-R1 models come in several sizes, trading off speed and accuracy against your hardware capabilities. Larger models provide better reasoning and accuracy but require more RAM, VRAM, and disk space. To install the 14B model as an example, run:

$ ollama pull deepseek-r1:14b
pulling manifest
pulling 6e9f90f02bb3... 100% ▕██████████████████████████████████▏ 9.0 GB
verifying sha256 digest
writing manifest
success        

Other, smaller distilled DeepSeek-R1 models are also available for download. However, I had a bad experience with models smaller than 14 billion parameters.

Once the download completes, the model is stored locally and ready to use; we will open an interactive prompt with it in a moment.

Available DeepSeek-R1 Models, Hardware Requirements and Recommendations
Choosing the Right Model:

  • 1.5B – 7B models: Best for everyday tasks, chat applications, and lightweight inference.
  • 8B – 14B models: Balanced models offer improved reasoning while staying relatively efficient.
  • 32B – 70B models: Highly advanced, suitable for research and deep analysis, but require substantial resources.
  • 671B model: Requires data-center-level hardware. Used for cutting-edge AI research.

Even with 512+ GB RAM and multiple GPUs with 100+ GB VRAM, the DeepSeek-R1:671B model remains slow due to its massive 671 billion parameters, which require an immense number of calculations per response. While multiple GPUs improve overall throughput, they do not significantly reduce latency for a single request, as data movement, memory bandwidth, and computational limits create bottlenecks. Even high-end AI infrastructure struggles with this scale, making smaller models (7B–14B) far more practical for real-time applications. The 671B model is best suited for research and large-scale AI experiments, where precision outweighs speed.
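
To get an intuition for these sizes, a rough back-of-the-envelope estimate is simply the parameter count multiplied by the storage used per parameter. The short sketch below is only an illustration under the assumption of roughly 5 bits per parameter, which is in the ballpark of the quantized builds Ollama serves; it is not an exact sizing tool:

# Rough sketch: model size ≈ parameters × bits per parameter ÷ 8 (bytes).
# Assumption: ~5 bits/parameter for quantized builds; full precision would use 16.
def approx_size_gb(params_billions: float, bits_per_param: float = 5.0) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("1.5b", 1.5), ("7b", 7), ("14b", 14), ("70b", 70), ("671b", 671)]:
    print(f"deepseek-r1:{name}  ~{approx_size_gb(params):.0f} GB")

# The 14b estimate (~9 GB) matches the pull output above; 671b lands in the
# hundreds of GB, which is why it calls for data-center-class hardware.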

  • Begin Using DeepSeek

Once the model is downloaded, you can start interacting with it directly. To run the DeepSeek-R1 model, use:

$ ollama run deepseek-r1:14b
>>> Hey DeepSeek, how are you doing today!

<think>
Hi! I'm just a virtual assistant, so I don't have feelings, but thanks for asking! How can I help you today?
</think>        

  • Use a Local API for Integration

If you need to interact with DeepSeek programmatically, you can use Ollama's built-in HTTP API, which listens on port 11434. If the Ollama service is already active (as checked above), you can skip the ollama serve part and just run the curl command.

$ ollama serve & curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:14b", "prompt": "Hello, how are you?"}'
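
If you prefer Python over curl, here is a minimal sketch of the same call using the requests library (assuming the default port 11434 and the deepseek-r1:14b model pulled above; setting "stream" to false returns one JSON object instead of streamed chunks):

import requests

# Call the local Ollama HTTP API and print the model's reply.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "Hello, how are you?",
        "stream": False,   # ask for a single JSON response instead of a stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])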

Step 2

  • Running Your Search Engine

SearXNG is an open-source search engine that aggregates search results from various engines without storing or tracking user data.

Save the following Docker Compose file as “docker-compose.yml”.

version: '3.8'
services:
  searxng:
    image: docker.io/searxng/searxng:latest
    container_name: searxng
    ports:
      - "4000:8080"
    volumes:
      - ./searxng:/etc/searxng
    restart: unless-stopped        

We can spin up the search engine by running the following command in a terminal, from the same directory where you saved the “docker-compose.yml” file.

docker-compose up -d        

Docker will download the SearXNG image, and once the container is up, the search engine is running on your machine.

Open up your browser, and you will see a search interface like this at http://localhost:4000/search :

SearXNG user interface
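
One detail worth checking before wiring SearXNG into the chatbot: the script in the final step requests results with format=json, and a default SearXNG configuration may only allow HTML output, in which case those requests are rejected. If that happens, adding the json format to the settings.yml that SearXNG generates in the mounted ./searxng folder (an assumption based on the volume mapping above) and restarting the container should fix it:

# ./searxng/settings.yml – allow the JSON output format used by the chatbot script
search:
  formats:
    - html
    - json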


  • Install Gradio

Gradio is an open-source Python library that makes creating interactive web-based interfaces for your Python scripts and machine-learning models easy without needing any front-end development skills.

Let’s now install the essential components for our user interface. The script in the next step also imports the requests and ollama Python packages, so install those as well if you don’t already have them:

pip install gradio requests ollama

  • Writing the app to make function calls

In the last step, let’s create a simple Python script that combines all our resources and makes the function calls:

import gradio as gr
import requests
import ollama

def search_web(query: str) -> list:
    """Query the local SearXNG instance and return the raw list of results."""
    # Note: SearXNG must allow the "json" output format (see the settings.yml note above).
    SEARXNG_URL = "http://localhost:4000/search"
    params = {'q': query, 'format': 'json'}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }

    response = requests.get(SEARXNG_URL, params=params, headers=headers)
    if response.status_code != 200:
        print("Response status code:", response.status_code)
        print("Response text:", response.text)
        raise Exception(f"Search query failed with status code {response.status_code}")
    return response.json().get("results", [])

def chat_with_search(query: str, use_web_search: bool):
    # Optionally integrate web search based on user toggle
    if use_web_search:
        results = search_web(query)
        context_str = format_search_results(results, max_results=5)
    else:
        context_str = "No additional context provided."
    return generate_augmented_response(query, context_str)

def format_search_results(results: list, max_results: int = 5) -> str:
    """
    Format the top search results into a context string.
    """
    formatted = []
    for result in results[:max_results]:
        title = result.get("title", "No title")
        url = result.get("url", "No URL")
        snippet = result.get("content", "No snippet")
        formatted.append(f"Title: {title}\nURL: {url}\nSnippet: {snippet}")
    return "\n\n".join(formatted)

def generate_augmented_response(query: str, context: str) -> str:
    """
    Combine the user's query with the search context and send it to DeepSeek R1 via Ollama.
    """
    # Create a composite prompt
    composite_prompt = f"""
{context}
Please use the web search results above to provide a detailed answer to the following request.
{query}
Answer:"""
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=[
            {"role": "user", "content": composite_prompt}
        ]
    )
    return response["message"]["content"]

iface = gr.Interface(
    fn=chat_with_search,
    inputs=[
        gr.Textbox(label="Query"),
        gr.Checkbox(label="Enable Web Search", value=True)
    ],
    outputs="text",
    title="Houshang - Deepseek-r1:14b AI Model",
    description="Ask questions and get answers augmented with real-time web search results."
)

iface.launch(share=False, debug=False, server_name="0.0.0.0")        

Let’s run it in the terminal:

python app.py        

You should see something like this:

* Running on local URL:  http://127.0.0.1:7860

You may now open the app at http://127.0.0.1:7860 in your browser:

Houshang - The DeepSeek-r1:14B AI Assistant

Also, you can use the app's API to interact with it:

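For example, here is a minimal sketch using the gradio_client package (an extra dependency, installable with pip install gradio_client, not covered above) that calls the same function the web UI uses:

from gradio_client import Client

# Connect to the locally running Gradio app started by app.py
client = Client("http://127.0.0.1:7860/")

# The two positional arguments mirror the UI inputs:
# the query textbox and the "Enable Web Search" checkbox.
answer = client.predict("What is DeepSeek-R1?", True, api_name="/predict")
print(answer)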

Congrats! You now have a local chatbot and no longer have to rely so heavily on the hosted DeepSeek applications.

You can try typing in some prompts and interacting with the chatbot to see how it performs. Gradio provides a “Flag” button, a simple UI feature that lets you save the response and chat history as a CSV file in the project working directory.


Conclusion

This was just a simple demo of setting up a functional local chatbot. Many optimizations could be made, such as improving the response formatting, using a larger distilled model (more parameters), improving the UI, providing feedback to the model, and preserving context history, as shown in the sketch below. You might also want to refactor and organize the code into a proper software project as you become more seasoned in programming.
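
For instance, preserving context history can be as simple as keeping the running list of messages that ollama.chat already accepts. A minimal sketch, reusing the same deepseek-r1:14b model as above:

import ollama

# Keep the whole conversation so the model sees earlier turns as context.
history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = ollama.chat(model="deepseek-r1:14b", messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat("My name is Sam."))
print(chat("What is my name?"))  # the model now remembers the earlier turn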
