Creating an MCP Agent with Local/LAN DeepSeek Service for Browser Control
In this article, we explore how to build an AI-driven Model Context Protocol (MCP) Agent that helps users operate a web browser to complete tasks or generate network traffic from human-language instructions. This is achieved by combining the powerful browser-use library with a local/LAN-hosted DeepSeek LLM service.
We will walk step by step through setting up the Ollama DeepSeek-R1 service in a local LAN environment, linking it with the MCP Agent, and integrating it with browser automation. Since the DeepSeek model runs locally, you won't have to worry about "DeepSeek service busy" issues or token fees. Additionally, this setup allows you to test various models, including customized fine-tuned DeepSeek versions, and compare their performance.
This article covers the following sections:
- MCP Agent Task Scenarios
- MCP Agent Operation Detailed Design
- Environment Introduction and Setup
- Test Result Summary and Conclusion
# Created: 2025/03/15
# version: v_0.0.1
# Copyright: Copyright (c) 2025 LiuYuancheng
# License: MIT License
MCP Agent Task Scenarios
To demonstrate the capabilities of the MCP Agent, we evaluate its performance through two task scenarios. In each case, the agent receives a human language input string and autonomously interacts with a web browser to retrieve and summarize relevant information. The final output is a concise text summary file.
Scenario 1: General Information Search & Summarization
In this scenario, the agent performs a web search, gathers relevant content, and generates a structured summary.
Human Language Input string:
Use google search 'deepseek', go to its web and summarize the web contents in 300 words.
Agent Operation: The agent initiates a search query, extracts key details from multiple sources, and compiles a summary. The demo video is shown below:
Scenario 2: Targeted Web Content Extraction & Summarization
Here, the agent is tasked with visiting a specific website or project repository, extracting critical details (e.g., a README file), and summarizing the content.
Human Language Input string:
Find the project “Deepseek_Local_LATA,” open the README file, and summarize the project in 100 words.
Agent Operation: The agent locates the repository, extracts the README content, and generates a concise summary. The demo video is shown below:
These scenarios showcase how the MCP Agent can autonomously navigate the web, retrieve relevant information, and provide structured summaries, all powered by local/LAN DeepSeek AI processing for efficiency and control.
MCP Agent Operation Detailed Design
Before diving into the agent's detailed design, let's first introduce some background knowledge about the Model Context Protocol (MCP).
Background: Model Context Protocol (MCP)
MCP is an open standard that enables secure and standardized connections between AI assistants and various data sources. It allows Large Language Models (LLMs) to access tools and datasets directly, improving their ability to retrieve and process information.
In an MCP architecture, MCP Servers act as lightweight programs that expose specific functionalities through the standardized protocol. The MCP Service serves as an intermediary layer, bridging applications or tools with the LLM service. These services include various MCP Agents, each providing tools, resources, and prompt templates that enable dynamic interactions between AI systems and clients.
For this project, we develop a simple MCP Agent that interacts with a web browser. The workflow of this agent is shown below:
By managing resources with URI-based access patterns and supporting capability negotiation, MCP Servers play a crucial role in extending the functionalities of AI systems, allowing them to perform actions or retrieve information securely and efficiently.
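For illustration, the sketch below shows what a minimal MCP server exposing a single browser-related tool could look like, using the FastMCP helper from the official MCP Python SDK. The server name and tool body are hypothetical and not part of this project's code:
# An illustrative MCP server (hypothetical, not this project's code) that
# exposes one browser-related tool through the MCP standard protocol.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("browser-demo")        # hypothetical server name

@mcp.tool()
def fetch_page_text(url: str) -> str:
    """Return the page text of the given URL for the LLM to process."""
    # A real tool would drive a browser (e.g., via Playwright); stubbed here.
    return "page text of %s" % url

if __name__ == "__main__":
    mcp.run()                        # serves MCP clients over stdio by default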
Agent Workflow Overview
The agent workflow is simple, as shown below; it operates in three primary steps:
Step 1: Add Scenario Prompt & Generate a To-Do List
The agent begins by converting the user's input request into a structured To-Do list. This step ensures that the agent understands how to systematically execute the requested task. Below is an example prompt we prepend to the user's request:
Prompt: I am an AI agent program that can simulate human actions as a beginner user using a browser; please create the TO-DO steps that need to be simulated for the task:
When the user inputs the string Google search DeepSeek and summarize the product features in 300 words, the request the agent sends to the AI is modified to the contents below:
# Input to deepseekR1:8b model
Prompt: I am an AI agent program that can simulate human actions as a beginner user using a browser; please create the TO-DO steps that need to be simulated for the task: Using Google search to find DeepSeek and summarize its product features in 300 words.
The output should exactly follow the JSON format below:
{
"initURL":"<First URL for browser to open>",
"tasksList":[
"1. <Step 1 - Perform an action>",
"2. <Step 2 - Process results based on previous step>",
"3. <Step 3 - Perform next action>",
"4. <Step 4 - Process results based on previous step>",
...
]
}
Then we send the request to DeepSeek and get back the To-Do list below:
# Output from deepseekR1:8b model
initURL = "https://www.google.com/"
tasksList = [
"1. Type in for 'deepseek' in the Google page's search bar",
"2. Click the first search result",
"3. Select the first deepseek link in the google search result page"
"4. Base on the web content in the link, summarize the contents in 300 words"
]
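For reference, below is a minimal Python sketch of this step. It is not the project's exact code; the host IP, model name, and NUM_CTX values are assumptions that mirror the configuration file shown later in this article:
# A minimal sketch of Step 1: send the scenario prompt plus the user request
# to the LAN DeepSeek service and parse the returned JSON To-Do list.
import json
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="deepseek-r1:8b",                 # DP_MODEL_NAME in the config file
    base_url="http://192.168.50.12:11434",  # OLLAMA_HOST_IP + default port
    num_ctx=6000,                           # NUM_CTX for the 7b/8b models
)

PROMPT_PREFIX = ("I am an AI agent program that can simulate human actions as "
                 "a beginner user using a browser; please create the TO-DO "
                 "steps that need to be simulated for the task: ")

def build_todo_list(user_request):
    # Query DeepSeek, strip the model's <think> reasoning block, then pull
    # out the JSON object holding initURL and tasksList.
    text = llm.invoke(PROMPT_PREFIX + user_request).content
    if "</think>" in text:
        text = text.split("</think>", 1)[1]
    return json.loads(text[text.find("{"):text.rfind("}") + 1])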
Step 2: Agent Interacts with the Host's Browser
Once the To-Do list is generated, the agent executes the steps autonomously using the Playwright library and the browser-use interaction module:
Each step is evaluated against real-time webpage analysis to ensure accurate execution. If a step cannot be completed, the agent attempts corrective measures or logs an error.
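As an illustration, here is a minimal sketch of this step using the browser-use Agent class together with the same LAN DeepSeek model; the task wording and result handling are assumptions, not the project's exact code:
# A minimal sketch of Step 2: let the browser-use Agent execute the To-Do
# list produced in Step 1 against a real browser session.
import asyncio
from browser_use import Agent
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:8b",
                 base_url="http://192.168.50.12:11434", num_ctx=6000)

async def run_todo(todo):
    # Feed the generated steps to the agent as one natural-language task.
    task = "Open %s and then: %s" % (todo["initURL"], " ".join(todo["tasksList"]))
    agent = Agent(task=task, llm=llm)
    history = await agent.run()       # drives the browser via Playwright
    return history.final_result()     # extracted content of the final step

# extracted = asyncio.run(run_todo(todo))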
Step 3: Generate the Result Summary
Once all tasks in the To-Do list are completed, the extracted content is sent to the DeepSeek LLM for summarization. The final verification prompt sent to the LLM is:
The result content is: <Extracted text>.
Can this content fulfill the user's goal to Google search DeepSeek and summarize its product features in 300 words?
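A minimal sketch of this verification call, reusing the same ChatOllama client from the earlier sketches (the function name and wording are illustrative):
# A minimal sketch of Step 3: ask DeepSeek whether the extracted content
# fulfills the original user goal and return the final summary text.
def verify_and_summarize(extracted_text, user_goal):
    prompt = ("The result content is: %s. Can this content fulfill the "
              "user's goal to %s?" % (extracted_text, user_goal))
    return llm.invoke(prompt).content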
The final output is a concise text summary file that fulfills the user's request.
This agent can be further expanded to support different cloud or local AI models, customized browsing automation, and multi-step reasoning tasks based on user-defined scenarios.
Environment Introduction and Setup
Environment Overview
In this test environment setup, we configured a Local Area Network (LAN) with at least two types of machines: a GPU server node that hosts the DeepSeek service, and one or more operation laptops that run the MCP Agent.
The AI agent will be responsible for controlling browsers on the laptops, and the network topology is illustrated below:
To establish this environment, we need to complete the following setup; the configuration is shown below:
Setting Up the DeepSeek Service on the GPU Node
The GPU server needs to run DeepSeek-R1 using Ollama and expose the service connection to LAN nodes.
Step 1: Install Ollama LLM Service
Download Ollama from the official site (https://ollama.com/download), select the appropriate package for your OS, and install it.
Step 2: Download and Run DeepSeek-R1
Since our GPU server has an RTX 3060 (12GB), we use the 8B model. Run the following command to download and launch the DeepSeek-R1:8b model:
ollama run deepseek-r1:8b
To check the largest DeepSeek model your GPU node can run, refer to this link: https://www.dhirubhai.net/pulse/deploying-deepseek-r1-locally-custom-rag-knowledge-data-yuancheng-liu-uzxwc
Step 3: Expose DeepSeek Service to LAN
Setting environment variables on Mac
If Ollama is run as a macOS application, environment variables should be set using launchctl:
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
Setting environment variables on Linux
If Ollama is run as a systemd service, environment variables should be set using systemctl:
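Following the Ollama documentation, run systemctl edit ollama.service and add the variable under the [Service] section:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then reload the daemon and restart the service:
systemctl daemon-reload
systemctl restart ollama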
Setting environment variables on Windows
On Windows, Ollama inherits your user and system environment variables, so set OLLAMA_HOST (e.g., 0.0.0.0:11434) in your user environment variables and restart the Ollama application.
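To verify the service is reachable from the LAN, you can query the Ollama API from an operation node (the IP below is the GPU node address used in the later configuration example):
curl http://192.168.50.12:11434/api/tags
If a JSON list of the installed models is returned, the DeepSeek service is exposed correctly.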
Configure the Agent on the Operation Node
Now that the GPU node is ready, we need to configure each operation node laptop. The operation node requires a modern web browser, Python 3.11 or higher, and the required Python libraries installed.
Step 1: Install Required Python Libraries
Install Ollama and LangChain support:
pip install ollama
pip install langchain-ollama
Install the browser-use automation library:
pip install browser-use
Install Playwright for browser control:
playwright install
Step 2: Configure Agent Parameters
Each laptop runs a browser control agent that interacts with DeepSeek. Download the files dsBrowserCtrlAgent.py, dsBrowserCtrlConfig.txt, and ConfigLoader.py to the operation node.
Set up the DeepSeek service configuration in dsBrowserCtrlConfig.txt based on the GPU node and the model it runs:
# This is the config file template for the module <dsBrowserCtrlAgent.py>
# Setup the parameters with below format (every line follow <key>:<val> format, the
# key can not be changed):
#-----------------------------------------------------------------------------
# GPU node IP address which provide the ollama service.
OLLAMA_HOST_IP:192.168.50.12
#-----------------------------------------------------------------------------
# The deepseek model name we want to use.
DP_MODEL_NAME:deepseek-r1:8b
#-----------------------------------------------------------------------------
# The deepseek context (CTX) window size to use: for deepseek-r1:7b and 8b
# use 6000; for larger models use 32000.
NUM_CTX:6000
Then add the request to the USER_REQUEST parameter as shown below and start the agent:
#-----------------------------------------------------------------------------
# the user request string such as Use google search 'deepseek', go to its web
# and summarize the web contents in 100 words
USER_REQUEST:Use google search 'deepseek', go to its web and summarize the web contents in 100 words
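For reference, below is a minimal sketch of how this <key>:<val> config format can be parsed; the project's actual ConfigLoader.py may differ:
# A minimal sketch of loading dsBrowserCtrlConfig.txt into a dict.
def load_config(path="dsBrowserCtrlConfig.txt"):
    config = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                       # skip comments and blank lines
            key, _, val = line.partition(":")  # split on the first ':' only
            config[key.strip()] = val.strip()
    return config

# cfg = load_config()
# ollama_url = "http://%s:11434" % cfg["OLLAMA_HOST_IP"]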
Step 3: Run the MCP Agent
Then execute the agent with the command below and collect the result.
python dsBrowserCtrlAgent.py
The agent will connect to the GPU server running DeepSeek-R1, perform browser interactions (e.g., searching, clicking, summarizing), and return the processed output based on the request.
Test Result Summary and Conclusion
During the evaluation, we tested four different DeepSeek model sizes to analyze their impact on task completion and execution speed. The test involved executing a series of browser-based tasks, including searching for a specific project, selecting relevant links, reading content, and summarizing information.
Below is the task result for Scenario 2:
During the test, we made some observations:
Conclusion:
The test results highlight that model size significantly impacts task success rate and execution efficiency. The 8B model is the minimum viable option for basic browsing tasks, while a 14B or larger model is recommended for handling more complex, multi-step processes with higher accuracy and reliability.
This design ensures a structured, automated, and accurate approach to executing browser-based tasks using an MCP-powered AI agent. By integrating local DeepSeek LLM processing, users benefit from avoiding "service busy" interruptions, eliminating token fees, and gaining the flexibility to test different or fine-tuned models.
For users seeking higher accuracy and robust execution, investing in larger DeepSeek models (14B and above) is recommended to maximize performance and reliability in AI-driven browser task automation.
Project GitHub Repo Link: https://github.com/LiuYuancheng/Deepseek_Local_LATA
If you are interested in other DeepSeek-R1 local deployments and tests, please refer to:
Deepseek test 01 : https://www.dhirubhai.net/pulse/deploying-deepseek-r1-locally-custom-rag-knowledge-data-yuancheng-liu-uzxwc
Deepseek test 02 : https://www.dhirubhai.net/pulse/use-simple-web-wrapper-share-local-deep-seek-r1-model-yuancheng-liu-n7p6c
Thanks for spending time checking the article details. If you have any questions or suggestions, or find any program bugs, please feel free to message me. Many thanks if you can leave comments and share improvement advice so we can make our work better ~