Use a Simple Web Wrapper to Share the Local DeepSeek-R1 Model Service with LAN Users
In the previous article, Deploying DeepSeek-R1 Locally with a Custom RAG Knowledge Data Base, we introduced the detailed steps for deploying DeepSeek-R1:7b locally with a customized RAG knowledge database on a desktop with an RTX 3060. Once the LLM deepseek-r1:7b is running on the local GPU-equipped computer, a new challenge emerges: the service can only be used on the GPU computer itself. What if we want to use it from other devices in the LAN, such as a mobile device or another computer in the same network? By default, Ollama only opens its API to localhost, meaning that external devices in your LAN cannot easily interact with the model. Changing the configuration to expose the API fully may solve the connectivity issue, but it also removes the safeguards that limit potentially risky operations, such as creating complete conversational chains. The user needs a controlled interface that exposes only the question-and-answer functionality while keeping the rest of the Ollama API and the host system protected.
# Created: 2025/03/01
# version: v_0.0.1
# Copyright: Copyright (c) 2025 LiuYuancheng
# License: MIT License
Introduction
This article provides an overview of the Flask wrapper, explores practical use case scenarios, and explains how to configure Ollama to expose its service for LLM API calls. We will show how a simple Python-Flask-based web wrapper acts as a controlled “bridge” between the local LLM service (deepseek-r1) and LAN users, giving them access to the model without exposing the full Ollama API.
The use case flow diagram is shown below:
By implementing this web wrapper, users gain secure, controlled access to DeepSeek-R1 models through a user-friendly interface, suitable for both web-based and programmatic interaction.
Introduction to the DeepSeek Flask Web Wrapper
This application provides a user-friendly interface for remote access to multiple LLM models running on different GPUs (with Ollama hosting the models). The chatbot is designed to let LAN users query these models through a simple web page or a lightweight HTTP API, while keeping the underlying Ollama services hidden from direct access.
The workflow is very simple:
User → Web Wrapper (Port 5000) → Ollama API (Port 11434, localhost-only/remote) → Response
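Conceptually, the wrapper is just a small relay program. The sketch below shows the idea in a minimal form; it assumes the /getResp route name used later in this article, plain HTTP, and Ollama's /api/generate endpoint with streaming disabled. It is not the repository's actual app.py (the real program additionally maps a model ID to a host/model pair via its OllamaHosts table).

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"   # localhost-only Ollama API

@app.route("/getResp", methods=["GET"])
def getResp():
    # Relay the LAN user's question to the local Ollama service and return
    # only the generated answer text.
    data = request.get_json(force=True)               # expects {'model': ..., 'message': ...}
    payload = {"model": data["model"], "prompt": data["message"], "stream": False}
    ollamaResp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    return jsonify(ollamaResp.json().get("response", ""))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)                # reachable from other LAN devices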
The chatbot web UI is shown below:
Users can interact with the chatbot via a web-based UI that includes a model selection dropdown in the navigation bar. The mobile device (phone) view is shown below:
A remote programmatic API call (HTTP GET) is shown below:
import requests
# Plain HTTP is assumed here; use https only if the wrapper is configured with TLS.
resp = requests.get("http://127.0.0.1:5000/getResp", json={'model':'localhost-DS1.5b', 'message':"who are you"})
print(resp.content)
Program source repo: https://github.com/LiuYuancheng/Deepseek_Local_LATA/tree/main/Testing/1_Simple_Flask_Deepseek_ChatBot
Expose the Ollama Service API in the LAN
Using the web wrapper, you can safely expose your Ollama service to LAN users. Instead of directly modifying Ollama’s configuration—which would expose all API functions—the wrapper acts as an intermediary.
For comparison, a direct request to the Ollama API looks like this:
curl http://localhost:11434/api/generate -d '{ "model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?"}'
With the wrapper in place, users can send questions and receive answers, but they cannot modify internal system state or access logs and debugging details. In addition, raw API calls like the one above are awkward on mobile devices such as a phone or an iPad, where opening a command line is impractical, so using the Ollama server directly from those devices would be inconvenient.
To expose the Ollama service to the LAN, configure the OLLAMA_HOST environment variable for your operating system as described below.
Setting environment variables on Mac
If Ollama is run as a macOS application, environment variables should be set using launchctl:
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
Setting environment variables on Linux
If Ollama is run as a systemd service, environment variables should be set using systemctl:
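Based on the standard Ollama systemd configuration (please verify against the Ollama documentation shipped with your version), the typical commands are:

sudo systemctl edit ollama.service
# In the editor that opens, add the following under the [Service] section:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama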
Setting environment variables on Windows
On Windows, Ollama inherits your user and system environment variables.
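One common way to set it (an assumption on our side, not taken from the wrapper repository) is from a command prompt, followed by restarting the Ollama application:

setx OLLAMA_HOST "0.0.0.0:11434"

After restarting Ollama on any of the systems above, you can confirm the API is reachable from another machine in the LAN, for example by listing the installed models:

curl http://<gpu_host_IP>:11434/api/tags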
Use Case Scenarios
Use Case Scenario 01: Secure Sharing on a Headless GPU Server
Problem: Imagine you have a GPU server running DeepSeek on an Ubuntu system without a desktop environment. You want to share the LLM service with others on the same subnet without exposing SSH credentials or the full Ollama API functionality.
You also want to limit what is exposed, for example returning only the final answer without DeepSeek's "thinking" log, and to add customized filters for the users' requests and the LLM's responses.
The wrapper solution and workflow diagram are shown below:
The Flask web wrapper lets you share only the chat functionality with LAN users, keep SSH credentials and the full Ollama API private, and filter both requests and responses before they reach the model or the user.
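A minimal sketch of such filtering is shown below. It assumes DeepSeek-R1 wraps its reasoning in <think>...</think> tags; the helper names and the banned-keyword policy are illustrative assumptions, not the repository's code.

import re

def filterResponse(llmText):
    # Strip DeepSeek-R1's <think>...</think> reasoning block so the LAN user
    # only sees the final answer.
    answer = re.sub(r"<think>.*?</think>", "", llmText, flags=re.DOTALL)
    return answer.strip()

def filterRequest(userMsg, banned=("password", "ssh key")):
    # Reject questions containing banned keywords (example policy only).
    return None if any(w in userMsg.lower() for w in banned) else userMsg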
Use Case Scenario 02: Customized Query Handling Based on User Expertise
Problem: Different users have different levels of expertise. For example, a beginner might need a simplified explanation of an algorithm like bubble sort, whereas an expert might require a detailed technical example.
The wrapper solution and workflow diagram are shown below:
The wrapper can intercept user queries and append context-specific prompts before sending the query to the LLM. For example:
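The sketch below illustrates the idea; the expertise levels and prefix wording are illustrative assumptions rather than the repository's actual code.

# Hypothetical prompt-prefix table keyed by the user's expertise level.
PROMPT_PREFIX = {
    'beginner': "Explain in simple terms with a short analogy: ",
    'expert':   "Give a detailed technical answer with a code example: ",
}

def buildPrompt(userLevel, userQuestion):
    # Prepend a context-specific instruction before forwarding to the LLM.
    return PROMPT_PREFIX.get(userLevel, "") + userQuestion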
This dynamic prompt engineering tailors responses to the user’s needs.
Use Case Scenario 03: Multi-GPU Server and Model Comparison
Problem: In environments with several GPU servers running various DeepSeek LLM models (such as DeepSeek R1-1.5B, DeepSeek R1-7B, and DeepSeek Coder V2), comparing the performance and responses of these models can be challenging when managed separately.
The wrapper solution and workflow diagram are shown below:
The web wrapper serves as a central hub that registers each GPU server's Ollama endpoint and model, lets users select a model from the dropdown menu, and routes each query to the corresponding host so the models' responses can be compared side by side.
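A sketch of this central-hub routing is shown below; the host IPs are placeholders and the helper function is an illustrative assumption built on the OllamaHosts table described in the deployment section.

import requests

# Each entry maps a display ID (shown in the dropdown) to an Ollama host and model.
OllamaHosts = {
    'GPU1-DS-1.5b': {'ip': '192.168.1.101', 'model': 'deepseek-r1:1.5b'},
    'GPU1-DS-7b':   {'ip': '192.168.1.101', 'model': 'deepseek-r1:7b'},
    'GPU2-Coder':   {'ip': '192.168.1.102', 'model': 'deepseek-coder-v2'},
}

def routeQuery(modelID, prompt):
    # Forward the prompt to the Ollama service that hosts the selected model.
    host = OllamaHosts[modelID]
    url = "http://%s:11434/api/generate" % host['ip']
    payload = {"model": host['model'], "prompt": prompt, "stream": False}
    return requests.post(url, json=payload, timeout=300).json().get("response", "")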
Use Case Scenario 04: GPU Load Balancing and Request Monitoring
Problem:
In a multi-GPU cluster, efficiently managing requests from different users or node IP addresses can be challenging. Without proper request distribution, some GPUs may become overloaded while others remain underutilized. Additionally, logging request data for monitoring and optimization purposes is crucial.
The wrapper solution:
The web wrapper acts as a request management layer, implementing a queue system to log user queries and distribute them efficiently across available GPU servers. By balancing workloads dynamically, it prevents overloading a single GPU while ensuring optimal resource utilization. The request logs can also be stored for analysis, enabling administrators to track usage patterns and improve system performance.
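A naive sketch of such a dispatch-and-logging layer is shown below; the host IPs, the least-loaded selection policy, and the log format are illustrative assumptions.

import logging, threading, requests

logging.basicConfig(filename='wrapperRequests.log', level=logging.INFO)
IN_FLIGHT = {'192.168.1.101': 0, '192.168.1.102': 0}   # example GPU hosts
LOCK = threading.Lock()

def dispatch(userIP, model, prompt):
    # Pick the GPU host with the fewest in-flight requests, log the request,
    # forward it to that host's Ollama API, then release the slot.
    with LOCK:
        host = min(IN_FLIGHT, key=IN_FLIGHT.get)
        IN_FLIGHT[host] += 1
    logging.info("from=%s host=%s model=%s", userIP, host, model)
    try:
        payload = {"model": model, "prompt": prompt, "stream": False}
        resp = requests.post("http://%s:11434/api/generate" % host, json=payload, timeout=300)
        return resp.json().get("response", "")
    finally:
        with LOCK:
            IN_FLIGHT[host] -= 1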
Program Deployment and Usage
To install Ollama and set up the DeepSeek model on a local computer, follow "Step 1: Deploy DeepSeek-R1 Model on Your Local Machine" in this manual: https://github.com/LiuYuancheng/Deepseek_Local_LATA/blob/main/Articles/1_LocalDeepSeekWithRAG/readme.md
To deploy the program, follow the setup section in the wrapper readme file: https://github.com/LiuYuancheng/Deepseek_Local_LATA/blob/main/Testing/1_Simple_Flask_Deepseek_ChatBot/readme.md
Then modify app.py to add each GPU server's (Ollama service's) details under a unique ID in the wrapper program:
OllamaHosts[<unique_ID>] = {'ip': <host IP address>, 'model': <llm model name>}
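For example, the entry matching the localhost-DS1.5b model ID used in the earlier test call could look like this (assuming the Ollama service runs on the same machine as the wrapper):

OllamaHosts['localhost-DS1.5b'] = {'ip': '127.0.0.1', 'model': 'deepseek-r1:1.5b'}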
Execute the following command to start the chatbot:
python app.py
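If the wrapper host runs a firewall (for example ufw on the headless Ubuntu server from use case scenario 01), also allow inbound connections on port 5000 so LAN devices can reach the web UI:

sudo ufw allow 5000/tcp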
Access the web UI at http://127.0.0.1:5000/ or http://<wrapper_host_IP>:5000/ and select the desired model from the dropdown menu.
API Request: For programmatic usage, use the Python requests library to send an HTTP GET request to get the response:
requests.get("http://<wrapper_host_IP>:5000/getResp", json={'model':'<model ID>', 'message':"<Questions>"})
Alternatively, refer to requestTest.py for more API usage examples.
Conclusion
A simple Flask wrapper unlocks powerful use cases for local LLM deployments: secure sharing on a headless GPU server, expertise-based query customization, multi-model comparison across GPU servers, and GPU load balancing with request monitoring.
The simple Flask web wrapper is a powerful solution for sharing the DeepSeek-R1 model service with LAN users securely and efficiently. By bridging the gap between the local Ollama API and external devices, the wrapper ensures that the service remains accessible yet controlled. Whether you’re looking to offer a streamlined mobile interface, safeguard sensitive API endpoints, or compare multiple LLM models, this approach addresses common challenges and enhances the usability of local DeepSeek deployments.
Thanks for spending time checking the article details. If you have any questions or suggestions, or find any program bugs, please feel free to message me. Many thanks if you can give some comments and share any improvement advice so we can make our work better.