BentoML Newsletter | September 2024
Welcome to the BentoML newsletter, with the latest updates and resources for AI engineers who are building and scaling AI systems in production!
Mastering LLM Deployments: Self-Hosting or Serverless API
Join us on October 2nd, 2024, from 9:00 AM to 10:00 AM PDT for the first LIVE AGI Builders Meetup! Tired of skyrocketing AI costs, unpredictable performance, or data security concerns? You're not alone. Our CEO Chaoyu Yang will explore these challenges and share actionable strategies for improving your LLM deployments.
Tuning TensorRT-LLM for Optimal Serving with BentoML
This blog post explores inference configuration tuning for LLM serving, using TensorRT-LLM as an example. We demonstrate how adjusting configurations like batch size and prefix chunking can improve serving performance, focusing on improvements in Time to First Token and Token Generation Rate.
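The core trade-off the post measures, larger batches raise aggregate throughput but delay each request's first token, can be illustrated with a toy latency model. All constants and the `toy_metrics` helper below are illustrative assumptions for this sketch, not benchmark results from the post:

```python
# Toy model of the batching trade-off in LLM serving.
# All constants are illustrative assumptions, not measurements.

PREFILL_MS_PER_REQ = 50.0   # assumed prefill cost per request in a batch
DECODE_MS_PER_STEP = 20.0   # assumed per-step decode latency (roughly flat
                            # until the batch saturates the GPU)

def toy_metrics(batch_size: int) -> tuple[float, float]:
    """Return (time to first token in ms, aggregate tokens/sec)."""
    # Prefill for the whole batch must finish before any token is emitted,
    # so TTFT grows with batch size.
    ttft_ms = PREFILL_MS_PER_REQ * batch_size
    # Each decode step emits one token per request in the batch,
    # so aggregate throughput grows with batch size.
    tokens_per_sec = batch_size / (DECODE_MS_PER_STEP / 1000.0)
    return ttft_ms, tokens_per_sec

for bs in (1, 4, 16):
    ttft, tps = toy_metrics(bs)
    print(f"batch={bs:2d}  TTFT={ttft:6.0f} ms  throughput={tps:7.0f} tok/s")
```

The point of tuning configurations like batch size is to pick the spot on this curve that fits your workload: latency-sensitive chat favors smaller batches, while offline generation favors larger ones.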
Function Calling with Open-Source LLMs
Open-source LLMs are rapidly evolving into key components of compound AI systems. This blog post explores function calling, a key method for connecting the LLMs in such systems to external APIs and custom functions. We also dive into use cases like agents and data extraction, the trade-offs between open-source and proprietary models for function calling, and challenges like generating structured outputs.
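At its core, function calling is a loop: the application describes its tools to the model, the model replies with a structured call, and the application parses and executes it. Here is a minimal sketch of the dispatch side; the `get_weather` tool and the hard-coded model reply are hand-written stand-ins for what an LLM server would actually produce:

```python
import json

# Hypothetical tool schema the application would send to the model.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city",
        "parameters": {"city": "string"},
    }
}

def get_weather(city: str) -> str:
    # Stub standing in for a real weather API call.
    return f"Sunny in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(model_reply: str) -> str:
    """Parse the model's structured output and run the named function."""
    call = json.loads(model_reply)   # expect {"name": ..., "arguments": {...}}
    fn = REGISTRY[call["name"]]      # KeyError here means the model named a nonexistent tool
    return fn(**call["arguments"])

# Simulated structured output from the model:
reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(reply))  # -> Sunny in Paris
```

The structured-output challenge mentioned above shows up exactly at the `json.loads` and registry-lookup steps: an open-source model that emits malformed JSON or hallucinates a tool name breaks this loop, which is why constrained decoding and schema validation matter for function calling.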
Featured Content and Events: In Case You Missed It