How to Run Stable Diffusion 3X Faster for 5X Less


Pre-Accelerated Stable Diffusion 2.1 Available in OctoML Compute Service on AWS

For a team of AI fanatics like us, it's been a thrill to see the AI market take off over the last 12 months. But we continue to hear that the long-term compute costs of production deployment threaten the economic viability of any AI offering. And that's if the developer or enterprise can even get access to the AI compute they need to build their app or service in the first place.

At OctoML, we are on a mission to deliver affordable AI compute services for those who want control over the business they are building. That’s why we built a new compute service, available now in early access.

It delivers the AI infrastructure and advanced machine learning optimization techniques that you can otherwise only find in large-scale AI services like OpenAI, but gives you the power to control your own API, choose your own models, and work within your AI budget.

Early access users can try the fastest Stable Diffusion 2.1 model on the market (with no change to the model's accuracy or output quality), without needing to train or retrain the model.

Here is some early data that demonstrates the performance gains:

[Chart: OctoML Stable Diffusion 2.1 runs super fast on abundant A10Gs.]

Stable Diffusion Runs Blazing Fast on A10Gs. Why Are You Waiting on A100s?

We are hearing time and time again from AI developers that GPU availability is hampering their ability to build new AI-powered apps. When we dig into these conversations, we find that organizations are taking it on faith that only newer NVIDIA hardware (i.e., A100s) delivers the price/performance they need to run their models at scale. That's why we are excited to share that A10Gs can deliver the user experience (1.35 seconds to generate an image) that any mainstream Stable Diffusion-powered app needs. Most importantly, A10Gs are available everywhere and aren't being rationed.
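If you want to sanity-check per-image latency on your own A10G (for example, an AWS g5 instance), a minimal timing sketch using the stock open-source Hugging Face diffusers pipeline looks like the following. The prompt is illustrative, and this unoptimized baseline will be slower than our accelerated build:

```python
# Minimal per-image latency check on an A10G using the stock open-source
# diffusers pipeline (a baseline, not OctoML's optimized Stable Diffusion 2.1).
import time

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, oil painting"  # illustrative prompt
pipe(prompt)  # warm-up run so one-time CUDA/model setup isn't timed

torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(prompt).images[0]
torch.cuda.synchronize()
print(f"one image in {time.perf_counter() - start:.2f} s")
```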

Not only is OctoML's optimized version of Stable Diffusion 2.1 blazing fast, it outperforms the best-in-class do-it-yourself configuration available to sophisticated users (those with machine learning engineering experience) by 30%.
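The exact do-it-yourself baseline isn't spelled out in this post, but a typical hand-tuned setup (our assumption) layers half-precision weights, memory-efficient attention, and a compiled UNet on top of the stock pipeline:

```python
# A typical hand-tuned "DIY" Stable Diffusion 2.1 configuration (illustrative;
# the exact baseline used in the comparison is not specified in this post).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,  # half-precision weights
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package
pipe.unet = torch.compile(pipe.unet)  # requires PyTorch 2.x

image = pipe("a watercolor map of Seattle").images[0]
```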

[Screenshot: The OctoML Compute Service in early access mode]

How does OctoML stack up against hosted services?

When running AI in production, hosted services like HuggingFace (Inference Endpoints) are popular options because they're easy to use and reduce the headaches of manual deployment and infrastructure management. Now that we've grounded you in the fact that you don't need the latest and greatest NVIDIA hardware to run your models, let's compare against HuggingFace, the most popular distribution source for Stable Diffusion.

By contrast, the HuggingFace version runs on Inference Endpoints infrastructure that was designed for and optimized around the ML researcher community; it was never built to deliver best-in-class compute performance.

Run Stable Diffusion 2.1 3X Faster for 5X Less

[Image: generated 2.6X faster in OctoML]
[Image: generated 3.6X faster in OctoML]

As a proof point, the Stable Diffusion model hosted in our compute service delivers a speedup ranging from 2X at a lower image quality setting (512x512, 30 steps) to 3X at a very high image quality setting (768x768, 150 steps).
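To make those quality settings concrete, here is how they map onto generation parameters in the open-source diffusers pipeline (same illustrative setup and prompt as the earlier sketches):

```python
# The two benchmark quality settings expressed as diffusers parameters.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
prompt = "a lighthouse at dusk, oil painting"  # illustrative prompt

# Lower image quality: 512x512 resolution, 30 denoising steps.
low = pipe(prompt, height=512, width=512, num_inference_steps=30).images[0]

# Very high image quality: 768x768 resolution, 150 denoising steps.
high = pipe(prompt, height=768, width=768, num_inference_steps=150).images[0]
```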

Check out the full post with more data and details on how OctoML delivers faster, more efficient AI compute.

Sign up for early access to the OctoML Compute Service and start building with the fastest Stable Diffusion available.
