vLLM reposted
vLLM running hot on a 5080! Thank you so much, Ian Buck and NVIDIA, for letting me test it on Blackwell! Try it yourself with the instructions here and make your GPU go brrr! https://lnkd.in/g5UgmuDz
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs
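For anyone following the linked instructions, here is a minimal sketch of a smoke test with vLLM's offline API, assuming a Blackwell-capable vLLM build is already installed (the model name is just a placeholder):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM picks up the local GPU (e.g., an RTX 5080) automatically.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Sample a short completion to confirm the build works end to end.
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Why do GPUs go brrr?"], params)
print(outputs[0].outputs[0].text)
```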
vLLM reposted
Awesome turnout for the presentation by Anyscale's Cody Yu at the vLLM meetup: nearly 300 people joined to hear about the vLLM roadmap and our team's release of new LLM APIs in Ray Data and Ray Serve. The new batch inference APIs seamlessly integrate vLLM, improving both speed and scalability. See the APIs here:
Ray Data + LLMs: https://lnkd.in/gJ_Ucc4W
Ray Serve for LLMs: https://lnkd.in/gi2TVSAz
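As a rough sketch of what the Ray Data batch inference API looks like with a vLLM engine (based on the ray.data.llm module; exact parameter names may vary across Ray versions, and the model name is illustrative):

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Configure a vLLM engine for batch inference.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    concurrency=1,
    batch_size=64,
)

# Build a processor that maps chat-formatted rows through the engine.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is vLLM?"}])
ds = processor(ds)
ds.show(limit=1)
```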
vLLM reposted
We built an open-source RAG with DeepSeek-R1. Here's what we learned:
- Don't use DeepSeek-R1 for retrieval: use specialized embeddings instead. Qwen's embedding model is amazing!
- Do use R1 for response generation.
- Use vLLM & SkyPilot to boost performance by 5x & scale up by 100x!
Our complete code and learnings: https://lnkd.in/g6B6Y3SE
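A hedged sketch of that split, assuming two vLLM OpenAI-compatible servers (the ports and model names below are illustrative, not taken from the linked repo):

```python
from openai import OpenAI

# Two vLLM servers started with `vllm serve ...`:
# one hosting an embedding model, one hosting DeepSeek-R1.
embed_client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
gen_client = OpenAI(base_url="http://localhost:8002/v1", api_key="EMPTY")

question = "How do I configure tensor parallelism in vLLM?"

# Retrieval side: embed the query with the specialized embedding model.
query_emb = embed_client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",  # illustrative embedding model
    input=question,
).data[0].embedding

# ... nearest-neighbor search over precomputed document embeddings
#     would go here; `context` stands in for the retrieved passages ...
context = "retrieved passages"

# Generation side: let R1 write the answer grounded in retrieved context.
resp = gen_client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
)
print(resp.choices[0].message.content)
```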
DeepSeek AI is dropping a lot of goodies this week! Join tomorrow's vLLM office hours to discover what they are and how they work seamlessly with vLLM, and bring your questions to learn more with Michael Goin.
Date: Thursday, Feb. 27
Time: 2:00 PM ET / 11:00 AM PT
Register: https://lnkd.in/euF8m73q
RunLLM powers the Ask AI button on https://docs.vllm.ai and has successfully answered 3,000+ questions every week! The answers combine the docs with GH issues, code comments, and Slack. No hallucination, no handwaving: a real AI Support Engineer with grounded answers. Congrats!
The RunLLM Public Beta is Live! After almost two years of work, we're launching RunLLM, the first AI Support Engineer. Built for advanced technical support, RunLLM:
- Saves support engineering time
- Accelerates customer adoption
- Generates customer and product insights
We've also designed an awesome new onboarding experience that we can't wait for you to try: go to runllm.com to sign up and try it for free. You can read about our vision for the AI Support Engineer here: https://lnkd.in/eMNF2Yaw
We are welcoming AIBrix to the vLLM organization! It is a batteries-included vLLM Kubernetes serving stack developed by ByteDance. Born in early 2024, AIBrix was built with scalability at its core and has already powered multiple production use cases with:
- High-Density LoRA Management: streamlined support for lightweight, low-rank adaptations of models.
- LLM Gateway and Routing: efficiently manage and direct traffic across multiple models and replicas.
- LLM App-Tailored Autoscaler: dynamically scale inference resources based on real-time demand.
- Unified AI Runtime: a versatile sidecar enabling metric standardization, model downloading, and management.
- Distributed Inference: a scalable architecture to handle large workloads across multiple nodes.
- Distributed KV Cache: enables high-capacity, cross-engine KV reuse.
- Cost-Efficient Heterogeneous Serving: mixed GPU inference that reduces costs with SLO guarantees.
- GPU Hardware Failure Detection: proactive detection of GPU hardware issues.
...and more!
vLLM Blog: https://lnkd.in/gkU4cG94
Code Repo: https://lnkd.in/g7yyVUs3
Detailed technical blog: https://lnkd.in/ghN8dyAc
vLLM reposted
We're excited to see the vLLM Project team at UC Berkeley's Sky Computing Lab unbox their new #NVIDIADGX B200 system.
We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you NVIDIA!
Friends from the East Coast! Join us on Tuesday, March 11 in Boston for the first ever East Coast vLLM Meetup. You will meet vLLM contributors from Neural Magic (Acquired by Red Hat), Red Hat, Google, and more. Come share how you are using vLLM and see what's on the roadmap!