Running a Compact AI Model Locally: How I Deployed Phi3 on a Raspberry Pi 5 Using FastAPI
Harri J Salomaa
Executive Advisor | Entrepreneur | Applied AI & ML Specialist | Non-Executive Director
As a coding hobbyist and an executive with a deep interest in AI, I often experiment with technology that bridges the gap between professional development and hands-on fun. When Amazon delivered my new 8GB Raspberry Pi 5 this Sunday at 7am, I couldn't wait to see how far I could push this tiny powerhouse.
In just a few hours, I had the Phi3 model running locally and served via a FastAPI server. Quite fast, right? This post recounts how I unboxed the Raspberry Pi, tackled memory constraints, and turned it into a functional AI demo. It wasn't totally smooth sailing, but by the end, I had a fully working model on hardware most people wouldn't expect to handle such a task. Here's how I made it work.
Taking the Raspberry Pi 5 Out of the Box
Upon receiving my brand new CanaKit Raspberry Pi 5 Starter Kit PRO (Turbine Black, 128GB Edition with 8GB RAM), I quickly realized that a bit of assembly was required before getting the Pi up and running. The kit included not only the Raspberry Pi 5 board but also a heat sink, cooling fan, case, power supply, and a microSD card preloaded with the operating system. The setup was straightforward, but it did involve carefully assembling the heat sink and fan to ensure proper cooling—a crucial step considering the Pi would be running AI models that demand efficient thermal management.
Once the hardware was fully assembled and everything connected, including the HDMI and USB cables, I booted up the Raspberry Pi with the standard Raspbian OS. It was a pleasant surprise to discover that Visual Studio Code is now available for Raspberry Pi OS. As someone who regularly uses VS Code for development, having access to my go-to IDE on the Pi made the entire setup process even more seamless and familiar.
Installing Ollama to Run the Phi3 Model
Phi-3 is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft.
Next, I needed a tool to actually run the Phi3 model. For this, I turned to Ollama, a great tool that simplifies local inference for AI models.
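Since Raspberry Pi OS is a Debian-based Linux, installation is a one-liner; the command below is Ollama's official Linux install script at the time of writing:

curl -fsSL https://ollama.com/install.sh | sh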
Once Ollama was installed, it was time to see if the Phi3 model could run smoothly on the Raspberry Pi 5. I knew that 8GB of RAM wasn’t much for an AI model, so I wasn’t expecting blazing-fast performance. But hey, this was about pushing boundaries, right?
It's worth noting that when you run the command ollama run phi3 for the first time, it downloads the Phi3 model onto the device, as shown below. This step took a bit of time.
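For reference, starting the model looks like this (the first invocation pulls a couple of gigabytes of quantized weights for the Mini variant, then drops into an interactive prompt):

ollama run phi3
>>> How fast is a cheetah?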
Sure enough, the model started up! However, I quickly realized that I’d need a larger swap file to make up for the memory limitations.
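On Raspberry Pi OS, swap is managed by dphys-swapfile, so enlarging it is a matter of editing one config value. A sketch of the steps I'd use (the 2048 MB size is my choice for this experiment, not a requirement):

sudo dphys-swapfile swapoff
sudo nano /etc/dphys-swapfile    # set CONF_SWAPSIZE=2048 (value in MB)
sudo dphys-swapfile setup
sudo dphys-swapfile swapon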
Making the Model Accessible via FastAPI
Next, I wanted to make the model accessible via an API, so I turned to FastAPI, which is both easy to use and highly performant. Installing FastAPI on the Raspberry Pi was straightforward.
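For completeness, the install is just a pip command. On recent Raspberry Pi OS releases, system-wide pip installs are blocked by default, so I'd assume a virtual environment (the path name here is arbitrary):

python3 -m venv ~/phi3-api && source ~/phi3-api/bin/activate
pip install fastapi uvicorn requests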
Running the FastAPI App with Uvicorn
After setting up the FastAPI framework, I needed a way to run the app. This is where Uvicorn comes into play. Uvicorn is a lightweight, high-performance ASGI (Asynchronous Server Gateway Interface) server. In simple terms, it's the engine that takes a FastAPI app and turns it into a running web service, allowing your API to handle requests and return responses.
Once FastAPI and Uvicorn were installed, I wrote a simple Python server to handle requests and pass prompts to the Phi3 model.
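I haven't reproduced my exact script here, but a minimal sketch looks like the following. It assumes Ollama is serving its local REST API on the default port 11434, and it exposes the /chat/phi3 route used in the test further down; the file name main.py is just my convention.

# main.py - minimal FastAPI wrapper around the local Ollama server
import requests
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat/phi3")
def chat_phi3(req: ChatRequest):
    # Forward the prompt to Ollama and wait for the full (non-streamed) reply.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "phi3", "prompt": req.prompt, "stream": False},
        timeout=300,  # inference on the Pi is slow, so allow a generous timeout
    )
    if resp.status_code != 200:
        raise HTTPException(status_code=502, detail="Ollama request failed")
    return {"response": resp.json()["response"]}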
The Moment of Truth: Testing the API
To make the API accessible, I started the FastAPI server using Uvicorn. This made the server live on my local network, and I could access it from any device connected to the network.
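Binding to 0.0.0.0 is what makes the server reachable from other devices on the network; the command below assumes the app object lives in main.py:

uvicorn main:app --host 0.0.0.0 --port 8000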
The real test came when I sent a prompt to the model via curl:
curl -X 'POST' \
  'http://<raspberry-pi-ip>:8000/chat/phi3' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "How fast is a cheetah?"}'
And it worked! While the response was slow, the Phi3 model eventually provided a concise and accurate answer:
“The cheetah can run at speeds up to 70-75 miles per hour.”
Success!
Is It Worth It?
Running an AI model like Phi3 on a Raspberry Pi 5 was a fun experiment, but it wasn’t without challenges. The swap file allowed me to overcome memory limitations, but the speed of inference reminded me that this hardware, while impressive, still has its limits when it comes to high-end AI models.
That said, it’s fascinating to see how much you can accomplish with a small, affordable device like the Raspberry Pi. This experiment showed me that you don’t need a massive cloud infrastructure to explore AI models—sometimes, all you need is a Pi and a bit of persistence.
I hope this post inspires others to experiment with AI models on resource-constrained devices. Feel free to reach out if you have any questions or connect with me on LinkedIn to discuss further!
Best regards,
Harri, AI enthusiast.