vLLM reposted
vLLM running hot on a 5080! Thank you so much, Ian Buck and NVIDIA, for letting me test it on Blackwell! Try it yourself with the instructions here and make your GPU go brrr! https://lnkd.in/g5UgmuDz
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs
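For anyone following the linked instructions, here is a minimal sketch of a smoke test with vLLM's offline API, assuming a Blackwell-capable vLLM build is already installed (the model name is just a placeholder):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM picks up the local GPU (e.g., an RTX 5080) automatically.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Sample a short completion to confirm the build works end to end.
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Why do GPUs go brrr?"], params)
print(outputs[0].outputs[0].text)
```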
vLLM reposted
Awesome turnout for the presentation by Anyscale's Cody Yu at the vLLM meetup: nearly 300 people joined to hear about the vLLM roadmap and our team's release of new LLM APIs in Ray Data and Ray Serve. The new batch inference APIs seamlessly integrate vLLM, improving both speed and scalability. See the APIs here:
Ray Data + LLMs: https://lnkd.in/gJ_Ucc4W
Ray Serve for LLMs: https://lnkd.in/gi2TVSAz
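As a rough sketch of what the Ray Data batch inference API looks like with a vLLM engine (based on the ray.data.llm module; exact parameter names may vary across Ray versions, and the model name is illustrative):

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Configure a vLLM engine for batch inference.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    concurrency=1,
    batch_size=64,
)

# Build a processor that maps chat-formatted rows through the engine.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is vLLM?"}])
ds = processor(ds)
ds.show(limit=1)
```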
vLLM reposted
We built an open-source RAG with DeepSeek-R1. Here's what we learned:
- Don't use DeepSeek-R1 for retrieval: use specialized embeddings instead. Qwen's embedding model is amazing!
- Do use R1 for response generation.
- Use vLLM & SkyPilot to boost performance by 5x & scale up by 100x!
Our complete code and learnings: https://lnkd.in/g6B6Y3SE
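A hedged sketch of that split, assuming two vLLM OpenAI-compatible servers (the ports and model names below are illustrative, not taken from the linked repo):

```python
from openai import OpenAI

# Two vLLM servers started with `vllm serve ...`:
# one hosting an embedding model, one hosting DeepSeek-R1.
embed_client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
gen_client = OpenAI(base_url="http://localhost:8002/v1", api_key="EMPTY")

question = "How do I configure tensor parallelism in vLLM?"

# Retrieval side: embed the query with the specialized embedding model.
query_emb = embed_client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",  # illustrative embedding model
    input=question,
).data[0].embedding

# ... nearest-neighbor search over precomputed document embeddings
#     would go here; `context` stands in for the retrieved passages ...
context = "retrieved passages"

# Generation side: let R1 write the answer grounded in retrieved context.
resp = gen_client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
)
print(resp.choices[0].message.content)
```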
DeepSeek AI is dropping a lot of goodies this week! Join tomorrow's vLLM office hours to discover what they are and how they work seamlessly with vLLM, and bring your questions to learn more with Michael Goin.
Date: Thursday, Feb. 27
Time: 2:00 PM ET / 11:00 AM PT
Register: https://lnkd.in/euF8m73q
RunLLM powers the Ask AI button on https://docs.vllm.ai and has successfully answered 3,000+ questions every week! The answers combine the docs with GH issues, code comments, and Slack. No hallucination, no handwaving: a real AI Support Engineer with grounded answers. Congrats!
The RunLLM Public Beta is Live! After almost two years of work, we're launching RunLLM, the first AI Support Engineer. Built for advanced technical support, RunLLM:
- Saves support engineering time
- Accelerates customer adoption
- Generates customer and product insights
We've also designed an awesome new onboarding experience that we can't wait for you to try: go to runllm.com to sign up and try it for free. You can read about our vision for the AI Support Engineer here: https://lnkd.in/eMNF2Yaw
We are welcoming AIBrix to the vLLM organization! It is a batteries-included vLLM Kubernetes serving stack developed by ByteDance. Born in early 2024, AIBrix was built with scalability at its core and has already powered multiple production use cases with:
- High-Density LoRA Management: streamlined support for lightweight, low-rank adaptations of models.
- LLM Gateway and Routing: efficiently manage and direct traffic across multiple models and replicas.
- LLM App-Tailored Autoscaler: dynamically scale inference resources based on real-time demand.
- Unified AI Runtime: a versatile sidecar enabling metric standardization, model downloading, and management.
- Distributed Inference: a scalable architecture to handle large workloads across multiple nodes.
- Distributed KV Cache: enables high-capacity, cross-engine KV reuse.
- Cost-Efficient Heterogeneous Serving: mixed GPU inference that reduces costs with SLO guarantees.
- GPU Hardware Failure Detection: proactive detection of GPU hardware issues.
...and more!
vLLM Blog: https://lnkd.in/gkU4cG94
Code Repo: https://lnkd.in/g7yyVUs3
Detailed technical blog: https://lnkd.in/ghN8dyAc
vLLM reposted
We're excited to see the vLLM Project team at UC Berkeley's Sky Computing Lab unbox their new #NVIDIADGX B200 system.
We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you NVIDIA!
Friends from the East Coast! Join us on Tuesday, March 11 in Boston for the first ever East Coast vLLM Meetup. You will meet vLLM contributors from Neural Magic (Acquired by Red Hat), Red Hat, Google, and more. Come share how you are using vLLM and see what's on the roadmap!