How Much Does It Cost to Self-Host an LLM? A Comprehensive Cost Analysis

TL;DR: Hosting the Llama-3 8B model on AWS EKS costs around $17 per 1 million tokens at full utilization. GPT-4 Turbo costs $10 per 1 million prompt tokens and $30 per 1 million sampled tokens. If you buy your own hardware, operating costs can drop to well under $1 per 1 million tokens, but at partial utilization it may take up to 5.5 years to break even on the hardware.


1. Hosting on AWS EKS

I started by testing the Llama-3 8B model on AWS EKS. Initially, I tried a smaller single-GPU Nvidia Tesla T4 instance (g4dn.2xlarge) but eventually switched to the more powerful g4dn.12xlarge with 4 Tesla T4 GPUs, which cut response times down to 5-7 seconds.

The cost of running this setup on AWS:

  • $3.912 per hour, or $2,816.64 per month (720 hours), for the g4dn.12xlarge with 4 Tesla T4 GPUs.

After running the numbers, I found that this setup can process around 157,075,200 tokens per month, which works out to roughly $17.93 per 1 million tokens at full utilization (see the quick calculation sketch below).
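
To make the arithmetic explicit, here is a minimal Python sketch of the calculation above. The hourly rate and monthly token count are the figures from this section; everything else is an illustrative assumption rather than anything pulled from an AWS billing API.

```python
# Cost-per-token math for the AWS EKS setup described above.
# Assumes sustained full utilization; real throughput depends on batch size,
# context length, and the serving stack.

HOURS_PER_MONTH = 720                   # 30-day month, matching the $2,816.64 figure
HOURLY_RATE_USD = 3.912                 # on-demand rate for the 4x T4 instance
TOKENS_PER_MONTH = 157_075_200          # estimated throughput at full utilization

monthly_cost = HOURLY_RATE_USD * HOURS_PER_MONTH
cost_per_million = monthly_cost / (TOKENS_PER_MONTH / 1_000_000)

print(f"Monthly cost:       ${monthly_cost:,.2f}")     # $2,816.64
print(f"Cost per 1M tokens: ${cost_per_million:.2f}")  # ~$17.93
```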

Pros of AWS EKS:

  • Flexibility and scalability: You can dynamically scale resources up or down as needed.
  • No upfront hardware costs: You only pay for what you use.
  • Optimization: You can reduce costs by reserving instances. It's a good option if you run your model 24x7.

Cons of AWS EKS:

  • Relatively high cost: While AWS provides convenience and scalability, GPU instance pricing remains high.
  • Management: You have to manage instance uptime yourself to keep costs under control.
  • Price changes: Costs can change over time (it's AWS, after all).

2. GPT-4: The Convenience of a Cloud Solution

Now, let's look at GPT-4, particularly with its updated pricing:

  • GPT-4 Turbo (128k context length) costs $10 per 1 million prompt tokens and $30 per 1 million sampled tokens.
  • On average, this works out to around $20 per 1 million tokens, assuming an even split between prompt and sampled tokens.

Example Cost Breakdown:

Let's say you send a query with 500,000 prompt tokens, and the model generates 500,000 sampled tokens. In this case:

  • 500,000 prompt tokens would cost $5.
  • 500,000 sampled tokens would cost $15.

Total: $20 for 1 million tokens.
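
As a quick sanity check, here is a minimal sketch of that breakdown in Python. The rates are the GPT-4 Turbo prices quoted above; the function name is just illustrative, not part of any API.

```python
# Token-based pricing for GPT-4 Turbo, as quoted in this section.

PROMPT_RATE_PER_1M = 10.0    # USD per 1M prompt tokens
SAMPLED_RATE_PER_1M = 30.0   # USD per 1M sampled (output) tokens

def gpt4_turbo_cost(prompt_tokens: int, sampled_tokens: int) -> float:
    """Return the total API cost in USD for one workload."""
    return (prompt_tokens / 1_000_000) * PROMPT_RATE_PER_1M + \
           (sampled_tokens / 1_000_000) * SAMPLED_RATE_PER_1M

# The example above: 500,000 prompt + 500,000 sampled tokens
print(gpt4_turbo_cost(500_000, 500_000))   # 20.0  ->  $5 + $15
```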

Pros of GPT-4:

  • No infrastructure management: Everything is set up and ready to use.
  • Ease of use and flexibility: You can start with no upfront investment in hardware.
  • Optimized infrastructure: You only pay for the tokens you actually process.

Cons of GPT-4:

  • Pay-as-you-go model: The monthly cost can increase for large-scale or long-term usage.
  • Less control over infrastructure: You can't fully customize the hardware or environment as you would with self-hosting.
  • Security and compliance: For some companies, sending data to an external provider instead of keeping models in-house is unacceptable.

3. Self-Hosting: Control and Long-Term Savings

For those who want full control over their infrastructure and are looking for long-term savings, self-hosting can be a good option. I explored a scenario where you buy 4 Nvidia Tesla T4 GPUs, which can be found for $700 each. The total upfront cost for the hardware would be:

  • $2,800 for the GPUs.
  • $1,000 for the rest of the setup (motherboard, power supply, cooling, etc.).

Total upfront cost: $3,800.

Monthly Operating Costs:

Electricity and maintenance costs are estimated at around $100 per month. If fully utilized, this setup could process 157,075,200 tokens per month, bringing the operating cost down to roughly $0.64 per 1 million tokens, excluding hardware amortization (a rough sketch follows below). On paper that looks like the best option by far, but there are important caveats.
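
Here is a small back-of-the-envelope sketch of the self-hosting math, assuming the hardware and operating figures from this section. The utilization rate and amortization horizon are assumptions added purely for illustration, not measurements.

```python
# Self-hosting economics with the numbers from this section.

UPFRONT_USD = 3_800                  # 4x Tesla T4 plus motherboard, PSU, cooling
MONTHLY_OPEX_USD = 100               # electricity and maintenance
TOKENS_PER_MONTH_FULL = 157_075_200  # throughput at 100% utilization

def cost_per_million(utilization: float, amortization_months: int) -> float:
    """Effective USD cost per 1M tokens, spreading the hardware over a fixed horizon."""
    monthly_tokens = TOKENS_PER_MONTH_FULL * utilization
    monthly_total = MONTHLY_OPEX_USD + UPFRONT_USD / amortization_months
    return monthly_total / (monthly_tokens / 1_000_000)

# Operating cost only, at full utilization: ~$0.64 per 1M tokens
print(MONTHLY_OPEX_USD / (TOKENS_PER_MONTH_FULL / 1_000_000))

# Assumed 20% utilization with hardware amortized over 3 years:
# the effective cost climbs to roughly $6-7 per 1M tokens.
print(cost_per_million(utilization=0.20, amortization_months=36))
```

Even under those assumptions the per-token cost stays well below GPT-4 Turbo's, but the upfront investment takes far longer to pay back when the hardware sits idle most of the time.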

Pros of Self-Hosting:

  • Low cost for high-volume usage: Once the hardware is paid for, the cost per token becomes incredibly low.
  • Full control over the infrastructure: You can tailor the environment and model to your specific needs.

Cons of Self-Hosting:

  • High upfront costs: The initial investment in hardware is significant.
  • Long break-even period: If the hardware isn't used at full capacity, it could take up to 5.5 years to break even. In practice, many companies only keep their models busy around 20% of the time.
  • Maintenance responsibilities: You are responsible for managing and maintaining the hardware.
  • Downtime risk: If a server component fails, you can spend a long time getting the project back up and running, which adds cost and extends the payback period.
  • Additional risks: Power outages, natural disasters, etc., can lead to unexpected expenses.

4. When to Choose What?

GPT-4: Quick Start with No Hassles

If you need to get up and running quickly without worrying about managing infrastructure, GPT-4 is a solid choice. It's ideal for businesses with unpredictable workloads, startups, or anyone looking to minimize upfront costs. This option is for those who experiment and need flexibility.

AWS EKS: Flexibility for Short-Term Use

AWS EKS offers flexibility and scalability without requiring you to buy your own hardware. It's useful for companies that need to scale up or down quickly, and it suits teams that care about compliance but still want flexibility. There are ways to optimize costs in the cloud, but be prepared to invest more than with a pay-per-token API.

Self-Hosting: Long-Term Savings with Full Control

If your project involves consistent, heavy usage, and you're willing to invest in the initial hardware setup, self-hosting can significantly reduce costs in the long run. However, it requires a significant upfront investment, risks, and ongoing maintenance.

Conclusion

Each solution has its own set of advantages and trade-offs, and the best choice depends on the specific needs and goals of your project.

  • GPT-4 is great for simplicity and flexibility, with no upfront hardware investment.
  • AWS EKS offers flexibility for scaling up resources without buying hardware.
  • Self-hosting provides the lowest cost per token after the hardware is paid off, but requires a long-term commitment and significant upfront investment.

Ultimately, the choice between cloud-based solutions like GPT-4 or AWS EKS and self-hosting depends on your workload, budget, and operational needs. For many projects, starting with GPT-4 is the way to go, then migrating to a cloud deployment such as AWS EKS when the business needs to scale. For others, self-hosting can be the more balanced long-term approach.

This article provides a neutral comparison of the available options, allowing you to make an informed decision based on your specific requirements. Whether you choose the ease of cloud-based solutions or the long-term savings of self-hosting, understanding the cost dynamics is key to optimizing your approach to LLM hosting.
