openEuler × DeepSeek 3: Containerized vLLM Deployment Guide
Welcome back to our openEuler × DeepSeek series!
In our previous blog, we explored deploying vLLM with DeepSeek on openEuler using GPUs/CPUs. While effective, that setup process was relatively complex. Today, we're introducing a much simpler method that lets you deploy DeepSeek quickly. The entire process consists of three straightforward steps: pull the prebuilt container image, start the container, and launch the vLLM server inside it.
System Requirements
Before deployment, ensure your hardware meets the necessary specifications; a quick way to verify your environment is sketched after the requirement lists below.
CPU Inference Requirements:
GPU Inference Requirements:
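Whatever model size you plan to serve, it is worth confirming the basics before pulling the images. The commands below are a minimal sketch: they assume Docker is already installed, that the CPU image targets Kunpeng (AArch64) hosts, and, for the GPU path, that the NVIDIA driver and container runtime are set up.
# confirm the CPU architecture (the CPU image is built for Kunpeng / aarch64 hosts)
uname -m
# confirm Docker is installed and the daemon is running
docker info
# GPU path only: confirm the NVIDIA driver can see your GPUs
nvidia-smi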
Deploying vLLM × DeepSeek on CPUs
To deploy and run inference using Kunpeng CPUs, follow these steps:
docker pull hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_cpu
docker run --name deepseek_kunpeng_cpu -it hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_cpu bash
vllm serve /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ --max-model-len 32768 &
Explanation of Key Parameters:
--max-model-len 32768: caps the context length (prompt plus generated tokens) at 32,768 tokens, which bounds the memory vLLM reserves for the KV cache.
&: runs the server in the background so the container shell stays free for testing.
Once the server logs show that the API server has started and is listening for requests, the deployment is complete.
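To double-check, you can query the OpenAI-compatible endpoint that vLLM exposes from inside the container; this is a minimal check assuming the default port 8000 (no --port flag was passed above):
# list the models the running server is serving (default port 8000)
curl http://localhost:8000/v1/models
The model name returned here is what you pass as "model" in chat requests; unless you start the server with --served-model-name, it defaults to the path given to vllm serve.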
Deploying vLLM × DeepSeek on GPUs
To run inference on NVIDIA GPUs, follow these steps:
docker pull hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_gpu
docker run --gpus all --name deepseek_kunpeng_gpu -it hub.oepkgs.net/neocopilot/deepseek_vllm:openeEuler2203-lts-sp4_gpu bash
vllm serve /home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/ --tensor-parallel-size 8 --max-model-len 32768 &
Explanation of Key Parameters:
--tensor-parallel-size 8: shards the model across 8 GPUs using tensor parallelism; set this to the number of GPUs visible inside the container.
--max-model-len 32768: caps the context length at 32,768 tokens, the same as in the CPU setup.
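The tensor-parallel size must not exceed the number of GPUs the container can see. A quick sanity check, assuming nvidia-smi is available inside the container:
# count the GPUs visible inside the container; use this value for --tensor-parallel-size
nvidia-smi --list-gpus | wc -l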
Testing Your Deployment
To make sure everything is running smoothly, try asking your model to tell you something about openEuler OS. From the same container shell, test your deployment with this simple curl command:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "Tell me about openEuler OS."}]
}'
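If jq is installed, a slight variant of the same request that sets a couple of common sampling parameters and extracts only the assistant's reply might look like this; it's a minimal sketch, so adjust the model name to whatever /v1/models reports on your server:
# same request, with sampling parameters, printing only the reply text
curl -s -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/home/deepseek/model/DeepSeek-R1-Distill-Qwen-7B/",
    "messages": [{"role": "user", "content": "Tell me about openEuler OS."}],
    "temperature": 0.6,
    "max_tokens": 512
  }' | jq -r '.choices[0].message.content'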
If everything is set up correctly, you'll receive a response with some cool insights about openEuler! :D
What's Next?
With this streamlined approach, deploying vLLM × DeepSeek on openEuler with CPUs or GPUs has never been easier. By using containerized deployment, you can set up and run AI inference in just a few minutes, enabling efficient scaling across different hardware architectures.
Stay tuned for our next openEuler × DeepSeek blog as we continue to explore AI deployment optimizations on openEuler!
Got questions or feedback? Feel free to reach out to us via the openEuler Intelligent SIG. Let's continue to innovate and build the future of AI together!
Quick Links for More openEuler × DeepSeek Blogs