Setting up Ollama + OpenWebUI on Docker
I just got a 128GB M4 MacBook Pro (woo hoo!) - well, out of need, since my older 16GB M1 Pro couldn't run any of the 70B models. (The challenge with Mac continues to be the lack of CUDA!) Anyway, the easiest way to try out models is to serve them using Ollama and OpenWebUI. More recently, running things in Docker has felt cleaner - everything gets its own container and I can observe the processes more easily. Running everything from the command line, on the other hand, means a lot of switching between windows and messy logging.
But running these things on Docker takes a bit of a learning curve, so here are the steps for running them on macOS.
1. Install Docker Desktop - this is a useful UI for managing Docker images and containers. A Docker image is a prepackaged collection of software built to run a particular application, e.g. there is a Docker image for Ollama and a Docker image for OpenWebUI. In my Docker Desktop these look like this:
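If you prefer the terminal, the same views are available from the Docker CLI:
docker images   # list the images you have pulled locally
docker ps -a    # list containers, running or stopped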
2. To get Ollama into Docker Desktop, go to your command line and run the following command (this has to be run from the command line). It will download the Ollama image if you don't already have it - as you can see from the previous image, it's a 7GB image:
docker run -d -v /Users/prayank/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
This command takes the local directory /Users/prayank/ollama and maps it to the /root/.ollama directory in the container, and also maps local port 11434 to port 11434 in the container (both arguments are in host:container format). You can see the contents of my ollama directory here:
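To make the host:container format explicit, here is the same command as a sketch with placeholder values (the angle-bracket names are just illustrative):
docker run -d -v <host_dir>:<container_dir> -p <host_port>:<container_port> --name <container_name> <image>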
3. To run the image from Docker Desktop, go to Images and click the play icon next to the ollama image. It will show an options screen, where you can enter the following:
Once you set these options and run, they become the initialization parameters of the container created from the image.
4. You should be able to access the Ollama API server now. Try this from the command line:
curl http://localhost:11434/api/tags
Or you can open the URL in the browser. You will get an output like this:
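The response is JSON listing the models available locally. With nothing pulled yet it is simply an empty list, roughly:
{"models":[]}
Once you pull models, each entry includes details such as the model name and size.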
5. You can go into the Ollama container in Docker Desktop (the Exec tab) to run commands for listing, downloading and running models. See below:
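For reference, the usual commands inside the container look like this (the model name is just an example):
ollama list                 # show downloaded models
ollama pull llama3.1:70b    # download a model
ollama run llama3.1:70b     # chat with it interactively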
6. You can also see the logs of the Ollama container:
7. And the stats - this is where I see how much RAM / CPU my models are consuming:
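The same information is also available from the command line, if that is more convenient:
docker logs -f ollama    # follow the Ollama container logs
docker stats ollama      # live CPU and memory usage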
8. For Open-WebUI (github), we need to make sure it can access our Ollama server, so we use the following command to run the container:
docker run -d -p 3000:8080 --restart always --name open-webui --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data ghcr.io/open-webui/open-webui:main
This command tells Docker to run the container as a daemon (-d); '--restart always' makes it start up every time Docker starts; and the open-webui named volume is mapped to the /app/backend/data directory in the container.
The most important part is '--add-host=host.docker.internal:host-gateway', which lets the container resolve the host's IP and connect to services running on the host machine. (documentation)
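In other words, from inside the Open-WebUI container the Ollama server we started earlier is reachable through the host gateway. Assuming curl is available in that container, you can verify this from its Exec tab:
curl http://host.docker.internal:11434/api/tags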
9. When you run Open-WebUI from Docker Desktop, the options should look like this:
10. You should now be able to access Open-WebUI at http://localhost:3000/
11. If all is well, you should be able to log in to your Open-WebUI interface. Now go to User → Settings → Admin Settings
12. You should see your Ollama connection settings (with this setup, the Ollama API URL points at http://host.docker.internal:11434):
13. If you go to Models, you should see all the models from your Ollama container, if everything is working.
14. You can now start chatting with your model.
You are good to go!
(TIP: In my case I have forgotten my Open-WebUI password a couple of times. To reset it, go to Docker Desktop → the open-webui container → Exec, then go to /app/backend/data, delete the webui.db file and restart the container. It will ask you to recreate the accounts.)
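Concretely, in the container's Exec tab the reset looks roughly like this (note that this wipes all accounts and any chats stored in that database):
cd /app/backend/data
rm webui.db
Then restart the container from Docker Desktop.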
(TIP: One thing I need control over is the context length of my chats in Open-WebUI. By default the Ollama / Open-WebUI context length is 2048 tokens. You can work around this by setting a chat-specific context length: click "Controls" at the top right of the chat and set the context length there.)
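If you ever call the Ollama API directly rather than through Open-WebUI, the equivalent knob is the num_ctx option - a rough sketch (the model name is illustrative):
curl http://localhost:11434/api/generate -d '{ "model": "llama3.1:70b", "prompt": "Hello", "options": { "num_ctx": 8192 } }'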