openEuler × DeepSeek 3: Containerized vLLM Deployment Guide

Welcome back to our openEuler × DeepSeek series!

In our previous blog, we explored deploying vLLM with DeepSeek on openEuler using GPUs and CPUs. While effective, that setup was relatively complex. Today, we're introducing a much simpler method that lets you deploy DeepSeek quickly. The entire process consists of three straightforward steps:

  1. Prepare the environment – Use a Kunpeng server or a server with NVIDIA GPUs.
  2. Pull the image & start the container – Download the image and start a container with a single command.
  3. Start DeepSeek – Access the container and initiate your AI inference journey.


System Requirements

Before deployment, ensure your hardware meets the necessary specifications; a quick way to inspect what your host provides is shown below.

CPU Inference Requirements:

GPU Inference Requirements:
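
As a quick sanity check, you can inspect the host from the shell before pulling any images (the nvidia-smi call applies only to servers with NVIDIA GPUs and the driver installed):

lscpu | grep -E 'Architecture|Model name|CPU\(s\)'
nvidia-smi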


Deploying vLLM × DeepSeek on CPUs

To deploy and run inference using Kunpeng CPUs, follow these steps:

  • Pull the container image:

docker pull hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_cpu        
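You can confirm the image was pulled successfully before moving on (the image ID and size shown will vary):

docker images hub.oepkgs.net/neocopilot/deepseek_vllm
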

  • Create and start a container:

docker run --name deepseek_kunpeng_cpu -it hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_cpu bash        
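If you intend to call the inference API from the host rather than from inside the container, a variant of the command above also publishes vLLM's default port (assuming the service stays on port 8000):

docker run --name deepseek_kunpeng_cpu -p 8000:8000 -it hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_cpu bash
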

  • Inside the container, launch the vLLM model service for DeepSeek from the command line:

vllm serve /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ --max_model_len 32768 &        

Explanation of Key Parameters:

  • /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ specifies the path to the preloaded model.
  • --max_model_len 32768 sets the maximum context length (prompt plus generated tokens); requests that exceed this limit are rejected by the server.

When the startup logs show that the API server is up and listening (port 8000 by default), your deployment is complete.
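
A quick way to confirm the server is up, assuming the default port 8000, is to query its health endpoint from inside the container:

curl http://localhost:8000/health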


Deploying vLLM × DeepSeek on GPUs

To run inference on an NVIDIA GPU, follow these steps:

  • Pull the container image:

docker pull hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_gpu        

  • Create and start a GPU-enabled container:

docker run --gpus all --name deepseek_kunpeng_gpu -it hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_gpu bash
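
Once inside the container, it is worth confirming that all GPUs are visible before launching the service; the number of GPUs listed is also the upper bound for --tensor-parallel-size:

nvidia-smi -L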

  • Launch the vLLM Model Service:

vllm serve /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ --tensor-parallel-size 8 --max_model_len 32768 &        

Explanation of Key Parameters:

  • --tensor-parallel-size 8 enables tensor parallelism across 8 GPUs. Adjust this based on your available hardware.
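
For example, on a server with only 4 GPUs you would launch the service like this (a variant of the command above, not an additional required step):

vllm serve /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ --tensor-parallel-size 4 --max_model_len 32768 &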


Testing Your Deployment

To make sure everything is running smoothly, try asking your model to tell you something about openEuler OS. Test your deployment with this simple curl command:

curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/",
    "messages": [{"role": "user", "content": "Tell me about openEuler OS."}]
  }'

Note: vLLM listens on port 8000 by default, and the "model" field must match the served model name, which defaults to the path passed to vllm serve (use --served-model-name to register a shorter alias such as deepseek-r1).

If everything is set up correctly, you'll receive a response with some cool insights about openEuler! :D
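
If you are unsure which model names the server has registered, you can list them first (assuming the default port 8000):

curl http://localhost:8000/v1/models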


What's Next?

With this streamlined approach, deploying vLLM × DeepSeek on openEuler with CPUs or GPUs has never been easier. By using containerized deployment, you can set up and run AI inference in just a few minutes, enabling efficient scaling across different hardware architectures.

Stay tuned for our next openEuler × DeepSeek blog as we continue to explore AI deployment optimizations on openEuler!

Got questions or feedback? Feel free to reach out to us via the openEuler Intelligent SIG. Let's continue to innovate and build the future of AI together!


Quick Links for More openEuler × DeepSeek Blogs

DeepSeek-R1 671B Distributed Training Achieved on openEuler 24.03

openEuler × DeepSeek 1: Quick Deployment of DeepSeek-R1 on openEuler 24.03 LTS

openEuler × DeepSeek 2: vLLM Deployment Guide (CPU + GPU)

