This is why Kubernetes isn't enough for scaling LLM inference workloads
How to autoscale LLM inference workloads properly?

Autoscaling is critical for online LLM inference workloads, as it helps handle the traffic without over-provisioning GPUs that you don't need. But implementing autoscaling for LLM inference is not as straightforward as you may think. Traditional container orchestration platforms (like Kubernetes) that only have access to resource utilization and simple request metrics don't cut it:

❌ GPU Memory (DCGM_FI_DEV_FB_USED): Amount of GPU memory used. Doesn't apply to workloads that preallocate GPU memory (e.g. vLLM KV cache).

❌ GPU Utilization (DCGM_FI_DEV_GPU_UTIL): Amount of time the GPU is active. Does not measure how much effective compute is being done (e.g. batch size).

❌ QPS (Queries Per Second): A simple request-based scaling metric. Not applicable to LLM workloads, because processing time per request varies with input and output token length and cache hits.

🆗 Queue Size: Number of requests pending in an external queue before they're processed. Easy to implement for workloads without batching; for LLM workloads with continuous batching, additional guardrails on concurrency control are required.

What's proven to work effectively is ✅ concurrency-based autoscaling, which scales on the number of active requests being processed (see image below, and the replica-count sketch at the end of this post). It accurately reflects system load, scales precisely, and is easy to configure based on batch size. The only downside: it requires a specialized infrastructure and serving stack, which can be complex and time-consuming to build and optimize:

• Workload-specific metrics are required to gain visibility into batch size, queue size, inference latency, and request concurrency. AI teams should ship AI-specific containers that include those metrics, and pair them with infrastructure that leverages those metrics for scaling (see the instrumentation sketch below).

• Cold start acceleration is necessary for efficient scaling. Pulling large container images and loading large models can drastically slow down the scale-up process, leading to failed requests or slow responses.

• Scaling to zero: reduce cost by scaling down to zero replicas for inactive models to free up compute resources, and spin the model up only when a request is received.

At BentoML, we've optimized every layer of the inference and serving stack to ensure efficient scaling of private LLM inference workloads, while letting developers easily fine-tune scaling behavior to their specific needs (see the service config sketch below).

Check out our team's learnings on scaling AI inference at BentoML, by Sean Sheng: https://lnkd.in/gZvNgE3i

BentoML documentation on Concurrency and autoscaling: https://lnkd.in/gmVpgW93

#LLM #Autoscaling #Inference #OpenLLM
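To make the concurrency-based approach concrete, here is a minimal sketch of the scaling rule it implies: pick a target concurrency per replica (roughly the engine's max batch size) and size the deployment by the number of in-flight requests. Setting the minimum to zero gives the scale-to-zero behavior mentioned above. This is an illustrative calculation with placeholder numbers, not BentoML's exact autoscaling algorithm.

```python
import math

def desired_replicas(
    inflight_requests: int,   # active requests currently being processed across all replicas
    target_concurrency: int,  # requests one replica should handle, e.g. its max batch size
    min_replicas: int = 0,    # 0 allows scaling to zero when a model is idle
    max_replicas: int = 10,
) -> int:
    """Concurrency-based scaling rule (illustrative, not an exact product policy)."""
    if inflight_requests == 0:
        return min_replicas
    wanted = math.ceil(inflight_requests / target_concurrency)
    return max(min_replicas, min(wanted, max_replicas))

# Example: 130 active requests, each replica comfortably batches 32 requests
# -> ceil(130 / 32) = 5 replicas (bounded by min/max).
print(desired_replicas(inflight_requests=130, target_concurrency=32))  # 5
```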
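And a rough sketch of the workload-specific instrumentation the first bullet calls for: exposing in-flight requests, queue depth, and request latency so an autoscaler can act on them. It assumes prometheus_client is installed; the metric names, the semaphore-based concurrency cap, and the sleep standing in for the model call are all hypothetical.

```python
import asyncio
import random

from prometheus_client import Gauge, Histogram, start_http_server

# Illustrative metric names, not a standard.
IN_FLIGHT = Gauge("llm_inflight_requests", "Requests currently being processed")
QUEUE_DEPTH = Gauge("llm_queue_depth", "Requests waiting for a batch slot")
LATENCY = Histogram("llm_request_latency_seconds", "End-to-end request latency")

# Cap concurrent work roughly at the engine's max batch size (assumed value).
MAX_BATCH_SIZE = 32
slots = asyncio.Semaphore(MAX_BATCH_SIZE)

async def handle_request(prompt: str) -> str:
    QUEUE_DEPTH.inc()              # request is waiting for a slot
    async with slots:
        QUEUE_DEPTH.dec()
        IN_FLIGHT.inc()            # request is actively being processed
        try:
            with LATENCY.time():
                await asyncio.sleep(random.uniform(0.1, 0.5))  # stand-in for the model call
                return f"echo: {prompt}"
        finally:
            IN_FLIGHT.dec()

async def main() -> None:
    start_http_server(9090)        # /metrics endpoint for the autoscaler to scrape
    await asyncio.gather(*(handle_request(f"req-{i}") for i in range(100)))

if __name__ == "__main__":
    asyncio.run(main())
```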
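Finally, a sketch of how a BentoML service can declare its concurrency target through the traffic settings covered in the docs linked above. The concurrency value, service name, and method body are placeholders; see the documentation for the exact options and defaults.

```python
import bentoml

@bentoml.service(
    traffic={
        "concurrency": 32,       # target number of in-flight requests per replica
        "external_queue": True,  # buffer excess requests on a platform that provides a queue (e.g. BentoCloud)
    },
    resources={"gpu": 1},
)
class LLMService:
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # Replace with a call into your inference engine (e.g. a vLLM backend).
        return f"generated text for: {prompt}"
```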