Use a Simple Web Wrapper to Share the Local DeepSeek-R1 Model Service with LAN Users
In the previous article, Deploying DeepSeek-R1 Locally with a Custom RAG Knowledge Data Base, we introduced the detailed steps for deploying DeepSeek-R1:7b locally with a customized RAG knowledge database on a desktop with an RTX 3060. Once the LLM deepseek-r1:7b is running on the local GPU-equipped computer, a new challenge emerges: the service can only be used on the GPU computer itself. What if we want to use it from other devices in the LAN, such as a mobile device or another computer in the same network? By default, Ollama only opens its API to localhost, meaning that external devices in your LAN cannot easily interact with the model. Changing the configuration to expose the API fully may solve the connectivity issue, but it also removes the safeguards that limit potentially risky operations, such as creating complete conversational chains. The user needs a controlled interface that exposes only the question-and-answer functionality while keeping the rest of the Ollama API and the host system protected.
# Created: 2025/03/01
# version: v_0.0.1
# Copyright: Copyright (c) 2025 LiuYuancheng
# License: MIT License
Introduction
This article provides an overview of the Flask wrapper, explores practical use case scenarios, and explains how to configure Ollama to expose its service for LLM API calls. We will show how a simple Python-Flask-based web wrapper acts as a controlled “bridge” between the local LLM service (deepseek-r1) and LAN users, giving them access to the model without exposing the full Ollama API.
The use case flow diagram is shown below:
By implementing this web wrapper, users gain secure, controlled access to DeepSeek-R1 models through a user-friendly interface, suitable for both web-based and programmatic interaction.
Introduction to the DeepSeek Flask Web Wrapper
This application provides a user-friendly interface for remote access to multiple LLM models running on different GPUs (with Ollama hosting the models). The chatbot is designed to let LAN users query these models through a simple web page or a lightweight HTTP API, while keeping the underlying Ollama services hidden from direct access.
The workflow is very simple:
User → Web Wrapper (Port 5000) → Ollama API (Port 11434, localhost-only/remote) → Response
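Conceptually, the wrapper is just a small relay program. The sketch below shows the idea in a minimal form; it assumes the /getResp route name used later in this article, plain HTTP, and Ollama's /api/generate endpoint with streaming disabled. It is not the repository's actual app.py (the real program additionally maps a model ID to a host/model pair via its OllamaHosts table).

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"   # localhost-only Ollama API

@app.route("/getResp", methods=["GET"])
def getResp():
    # Relay the LAN user's question to the local Ollama service and return
    # only the generated answer text.
    data = request.get_json(force=True)               # expects {'model': ..., 'message': ...}
    payload = {"model": data["model"], "prompt": data["message"], "stream": False}
    ollamaResp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    return jsonify(ollamaResp.json().get("response", ""))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)                # reachable from other LAN devices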
The chatbot web UI is shown below:
Users can interact with the chatbot via a web-based UI that includes a model selection dropdown in the navigation bar. The mobile device (phone) view is shown below:
A remote programmatic API call (HTTP GET) is shown below:
import requests
# Plain HTTP is assumed here; use https only if the wrapper is configured with TLS.
resp = requests.get("http://127.0.0.1:5000/getResp", json={'model':'localhost-DS1.5b', 'message':"who are you"})
print(resp.content)
Program source repo: https://github.com/LiuYuancheng/Deepseek_Local_LATA/tree/main/Testing/1_Simple_Flask_Deepseek_ChatBot
Expose the Ollama Service API in the LAN
Using the web wrapper, you can safely expose your Ollama service to LAN users. Instead of directly modifying Ollama’s configuration—which would expose all API functions—the wrapper acts as an intermediary.
For comparison, a direct request to the Ollama API looks like this:
curl http://localhost:11434/api/generate -d '{ "model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?"}'
With the wrapper in place, users can send questions and receive answers, but they cannot modify internal system state or access logs and debugging details. In addition, raw API calls like the one above are awkward on mobile devices such as a phone or an iPad, where opening a command line is impractical, so using the Ollama server directly from those devices would be inconvenient.
To expose the Ollama service to the LAN, configure the OLLAMA_HOST environment variable for your operating system as described below.
Setting environment variables on Mac
If Ollama is run as a macOS application, environment variables should be set using launchctl:
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
Setting environment variables on Linux
If Ollama is run as a systemd service, environment variables should be set using systemctl:
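Based on the standard Ollama systemd configuration (please verify against the Ollama documentation shipped with your version), the typical commands are:

sudo systemctl edit ollama.service
# In the editor that opens, add the following under the [Service] section:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama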
Setting environment variables on Windows
On Windows, Ollama inherits your user and system environment variables.
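One common way to set it (an assumption on our side, not taken from the wrapper repository) is from a command prompt, followed by restarting the Ollama application:

setx OLLAMA_HOST "0.0.0.0:11434"

After restarting Ollama on any of the systems above, you can confirm the API is reachable from another machine in the LAN, for example by listing the installed models:

curl http://<gpu_host_IP>:11434/api/tags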
Use Case Scenarios
Use Case Scenario 01: Secure Sharing on a Headless GPU Server
Problem: Imagine you have a GPU server running DeepSeek on an Ubuntu system without a desktop environment. You want to share the LLM service with others on the same subnet without exposing SSH credentials or the full Ollama API functionality.
You also want to limit what is exposed, for example returning only the final answer without DeepSeek's "thinking" log, and to add customized filters for the users' requests and the LLM's responses.
The wrapper solution and workflow diagram are shown below:
The Flask web wrapper lets you share only the chat functionality with LAN users, keep SSH credentials and the full Ollama API private, and filter both requests and responses before they reach the model or the user.
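A minimal sketch of such filtering is shown below. It assumes DeepSeek-R1 wraps its reasoning in <think>...</think> tags; the helper names and the banned-keyword policy are illustrative assumptions, not the repository's code.

import re

def filterResponse(llmText):
    # Strip DeepSeek-R1's <think>...</think> reasoning block so the LAN user
    # only sees the final answer.
    answer = re.sub(r"<think>.*?</think>", "", llmText, flags=re.DOTALL)
    return answer.strip()

def filterRequest(userMsg, banned=("password", "ssh key")):
    # Reject questions containing banned keywords (example policy only).
    return None if any(w in userMsg.lower() for w in banned) else userMsg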
Use Case Scenario 02: Customized Query Handling Based on User Expertise
Problem: Different users have different levels of expertise. For example, a beginner might need a simplified explanation of an algorithm like bubble sort, whereas an expert might require a detailed technical example.
The wrapper solution and workflow diagram are shown below:
The wrapper can intercept user queries and append context-specific prompts before sending the query to the LLM. For example:
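The sketch below illustrates the idea; the expertise levels and prefix wording are illustrative assumptions rather than the repository's actual code.

# Hypothetical prompt-prefix table keyed by the user's expertise level.
PROMPT_PREFIX = {
    'beginner': "Explain in simple terms with a short analogy: ",
    'expert':   "Give a detailed technical answer with a code example: ",
}

def buildPrompt(userLevel, userQuestion):
    # Prepend a context-specific instruction before forwarding to the LLM.
    return PROMPT_PREFIX.get(userLevel, "") + userQuestion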
This dynamic prompt engineering tailors responses to the user’s needs.
Use Case Scenario 03: Multi-GPU Server and Model Comparison
Problem: In environments with several GPU servers running various DeepSeek LLM models (such as DeepSeek R1-1.5B, DeepSeek R1-7B, and DeepSeek Coder V2), comparing the performance and responses of these models can be challenging when managed separately.
The wrapper solution and workflow diagram are shown below:
The web wrapper serves as a central hub that registers each GPU server's Ollama endpoint and model, lets users select a model from the dropdown menu, and routes each query to the corresponding host so the models' responses can be compared side by side.
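A sketch of this central-hub routing is shown below; the host IPs are placeholders and the helper function is an illustrative assumption built on the OllamaHosts table described in the deployment section.

import requests

# Each entry maps a display ID (shown in the dropdown) to an Ollama host and model.
OllamaHosts = {
    'GPU1-DS-1.5b': {'ip': '192.168.1.101', 'model': 'deepseek-r1:1.5b'},
    'GPU1-DS-7b':   {'ip': '192.168.1.101', 'model': 'deepseek-r1:7b'},
    'GPU2-Coder':   {'ip': '192.168.1.102', 'model': 'deepseek-coder-v2'},
}

def routeQuery(modelID, prompt):
    # Forward the prompt to the Ollama service that hosts the selected model.
    host = OllamaHosts[modelID]
    url = "http://%s:11434/api/generate" % host['ip']
    payload = {"model": host['model'], "prompt": prompt, "stream": False}
    return requests.post(url, json=payload, timeout=300).json().get("response", "")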
Use Case Scenario 04: GPU Load Balancing and Request Monitoring
Problem:
In a multi-GPU cluster, efficiently managing requests from different users or node IP addresses can be challenging. Without proper request distribution, some GPUs may become overloaded while others remain underutilized. Additionally, logging request data for monitoring and optimization purposes is crucial.
The wrapper solution:
The web wrapper acts as a request management layer, implementing a queue system to log user queries and distribute them efficiently across available GPU servers. By balancing workloads dynamically, it prevents overloading a single GPU while ensuring optimal resource utilization. The request logs can also be stored for analysis, enabling administrators to track usage patterns and improve system performance.
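A naive sketch of such a dispatch-and-logging layer is shown below; the host IPs, the least-loaded selection policy, and the log format are illustrative assumptions.

import logging, threading, requests

logging.basicConfig(filename='wrapperRequests.log', level=logging.INFO)
IN_FLIGHT = {'192.168.1.101': 0, '192.168.1.102': 0}   # example GPU hosts
LOCK = threading.Lock()

def dispatch(userIP, model, prompt):
    # Pick the GPU host with the fewest in-flight requests, log the request,
    # forward it to that host's Ollama API, then release the slot.
    with LOCK:
        host = min(IN_FLIGHT, key=IN_FLIGHT.get)
        IN_FLIGHT[host] += 1
    logging.info("from=%s host=%s model=%s", userIP, host, model)
    try:
        payload = {"model": model, "prompt": prompt, "stream": False}
        resp = requests.post("http://%s:11434/api/generate" % host, json=payload, timeout=300)
        return resp.json().get("response", "")
    finally:
        with LOCK:
            IN_FLIGHT[host] -= 1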
Program Deployment and Usage
To install Ollama and set up the DeepSeek model on a local computer, follow "Step 1: Deploy DeepSeek-R1 Model on Your Local Machine" in this manual: https://github.com/LiuYuancheng/Deepseek_Local_LATA/blob/main/Articles/1_LocalDeepSeekWithRAG/readme.md
To deploy the program, follow the setup section in the wrapper readme file: https://github.com/LiuYuancheng/Deepseek_Local_LATA/blob/main/Testing/1_Simple_Flask_Deepseek_ChatBot/readme.md
Then modify app.py to add each GPU server's (Ollama service's) details under a unique ID in the wrapper program:
OllamaHosts[<unique_ID>] = {'ip': <host IP address>, 'model': <llm model name>}
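For example, the entry matching the localhost-DS1.5b model ID used in the earlier test call could look like this (assuming the Ollama service runs on the same machine as the wrapper):

OllamaHosts['localhost-DS1.5b'] = {'ip': '127.0.0.1', 'model': 'deepseek-r1:1.5b'}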
Execute the following command to start the chatbot:
python app.py
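If the wrapper host runs a firewall (for example ufw on the headless Ubuntu server from use case scenario 01), also allow inbound connections on port 5000 so LAN devices can reach the web UI:

sudo ufw allow 5000/tcp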
Access the web UI at http://127.0.0.1:5000/ or http://<wrapper_host_IP>:5000/ and select the desired model from the dropdown menu.
API Request: For programmatic usage, use the Python requests library to send an HTTP GET request to get the response:
requests.get("http://<wrapper_host_IP>:5000/getResp", json={'model':'<model ID>', 'message':"<Questions>"})
Alternatively, refer to requestTest.py for more API usage examples.
Conclusion
A simple Flask wrapper unlocks powerful use cases for local LLM deployments: secure sharing on a headless GPU server, expertise-based query customization, multi-model comparison across GPU servers, and GPU load balancing with request monitoring.
The simple Flask web wrapper is a powerful solution for sharing the DeepSeek-R1 model service with LAN users securely and efficiently. By bridging the gap between the local Ollama API and external devices, the wrapper ensures that the service remains accessible yet controlled. Whether you’re looking to offer a streamlined mobile interface, safeguard sensitive API endpoints, or compare multiple LLM models, this approach addresses common challenges and enhances the usability of local DeepSeek deployments.
Thanks for spending time checking the article details. If you have any questions or suggestions, or find any program bugs, please feel free to message me. Many thanks if you can give some comments and share any improvement advice so we can make our work better.