PyTorch
Research Services
San Francisco, California · 268,920 followers
An open source machine learning framework that accelerates the path from research prototyping to production deployment.
About us
An open source machine learning framework that accelerates the path from research prototyping to production deployment. PyTorch is an open source project at the Linux Foundation.
- Website
- https://www.pytorch.org
- Industry
- Research Services
- Company size
- 501-1,000 employees
- Headquarters
- San Francisco, California
- Type
- Public company
- Specialties
- Artificial Intelligence, Deep Learning, Machine Learning, and AI
Locations
-
Primary
548 Market St
San Francisco, California, US
Employees at PyTorch
Updates
-
Starting soon: our live PyTorch Expert Exchange webinar on DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference with Hao Zhang
Join us for our next #PyTorch Expert Exchange Webinar on Wednesday, October 16th at 4 PM PT: DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference with Hao Zhang, Assistant Professor at the Halıcıoğlu Data Science Institute and the Computer Science and Engineering Department (CSE), UC San Diego. Details: In this talk, I'll present our work DistServe (OSDI '24). DistServe disaggregates the prefill and decoding computation to eliminate interference between the two phases, thereby improving the performance of large language model (LLM) inference. DistServe has seen adoption in frameworks like vLLM and at companies including Google. (A toy sketch of the prefill/decode split follows the link below.)
DistServe: disaggregating prefill & decoding for LLM inference
www.dhirubhai.net
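To make the prefill/decode distinction concrete, here is a minimal, self-contained sketch. This is toy code, not DistServe's actual implementation; all names and shapes are illustrative. Prefill processes the whole prompt in one compute-bound pass and fills the KV cache; decode then generates one memory-bound token at a time against that cache. DistServe's observation is that these two workloads interfere when colocated on the same GPUs, so it places them on separate pools.

```python
import torch

torch.manual_seed(0)
D = 64                                   # toy hidden size
Wq, Wk, Wv = (torch.randn(D, D) for _ in range(3))

def prefill(prompt_emb):
    """Compute-bound phase: attend over the entire prompt at once, fill the KV cache."""
    T = prompt_emb.shape[0]
    q, k, v = prompt_emb @ Wq, prompt_emb @ Wk, prompt_emb @ Wv
    scores = (q @ k.T) / D ** 0.5
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    out = scores.masked_fill(causal, float("-inf")).softmax(-1) @ v
    return out, k, v                     # k, v become the cache handed to decode

def decode_step(x_emb, k_cache, v_cache):
    """Memory-bound phase: one new token attends to the whole (growing) cache."""
    k_cache = torch.cat([k_cache, x_emb @ Wk])
    v_cache = torch.cat([v_cache, x_emb @ Wv])
    att = ((x_emb @ Wq) @ k_cache.T / D ** 0.5).softmax(-1)
    return att @ v_cache, k_cache, v_cache

prompt = torch.randn(16, D)              # 16 prompt-token embeddings
out, K, V = prefill(prompt)              # phase 1: prefill
tok = out[-1:]                           # stand-in for sampling the next token
for _ in range(4):                       # phase 2: decode, one token per step
    tok, K, V = decode_step(tok, K, V)
```

In a disaggregated deployment, the prefill worker would ship (K, V) to a separate decode worker instead of looping in-process; managing that KV transfer efficiently is part of what DistServe optimizes.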
-
Learn about faster CPU performance for #PyTorch on Windows from Intel Corporation in our latest blog: https://hubs.la/Q02TrR060
-
Live #PyTorch Expert Exchange Webinar starts soon! Grab your ☕ and join Guangxuan Xiao for a talk and Q&A on Efficient Streaming Language Models with Attention Sinks. Participate at: https://hubs.la/Q02T57N_0
Efficient Streaming Language Models with Attention Sinks
https://www.youtube.com/
-
Highlights from PyTorch Conference 2024! Watch the video: https://hubs.la/Q02T0zGn0 Amazing talks, powerful collaborations, and a welcoming community. What was your favorite part? Share below! #AI #DeepLearning #PyTorchConf #OpenSource #ML #PyTorch
PyTorch Conference 2024 Highlights
https://www.youtube.com/
-
We are pleased to announce the first-ever Chair and Vice Chair of the PyTorch Foundation's Technical Advisory Council (TAC). Congrats to Chair Luca Antiga of Lightning AI and Vice Chair Jiong Gong of Intel Corporation. Both leaders bring extensive experience and deep commitment to the PyTorch community. Learn more in our blog: https://hubs.la/Q02SFVl80
-
Join us next Friday, October 11th at 10 AM PT for our next LIVE PyTorch Expert Exchange Webinar on Efficient Streaming Language Models with Attention Sinks with Guangxuan Xiao, MIT EECS. Tune in at: https://hubs.la/Q02SfXqm0

Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. First, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Second, popular LLMs cannot generalize to texts longer than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach, but it fails once the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink: keeping the KV of the initial tokens largely recovers the performance of window attention. In this paper, we first demonstrate that the attention sink emerges because initial tokens attract strong attention scores, acting as a "sink" even when they are not semantically important.

Based on this analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite-length attention window to generalize to infinite sequence lengths without any fine-tuning. We show that StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. In addition, we discover that adding a placeholder token as a dedicated attention sink during pre-training can further improve streaming deployment. In streaming settings, StreamingLLM outperforms the sliding-window-with-recomputation baseline by up to a 22.2x speedup. (A toy sketch of the sink-plus-window cache policy follows below.)

#PyTorch #LLMs #ExpertExchange #AttentionSinks #Live #Webinar
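As a companion to the abstract above, here is a minimal sketch of the cache policy it describes: when the KV cache exceeds its budget, keep the first few "sink" tokens plus a rolling window of recent tokens, rather than a pure sliding window that evicts the sinks. This is an illustrative toy, not the released StreamingLLM code; `sink_evict`, `n_sink`, and `window` are assumed names and parameters.

```python
import torch

def sink_evict(k_cache, v_cache, n_sink=4, window=1020):
    """Trim (T, D) K/V caches to n_sink initial 'sink' tokens + `window` recent tokens."""
    T = k_cache.shape[0]
    if T <= n_sink + window:
        return k_cache, v_cache                       # within budget, keep everything
    keep = torch.cat([
        torch.arange(n_sink),                         # attention-sink tokens (always kept)
        torch.arange(T - window, T),                  # most recent tokens
    ])
    return k_cache[keep], v_cache[keep]

# After appending each generated token's K/V, enforce the budget:
D = 64
K, V = torch.randn(5000, D), torch.randn(5000, D)
K, V = sink_evict(K, V)
assert K.shape[0] == 4 + 1020                         # sinks + recent window
```

Note that the real StreamingLLM also reassigns positional indices within the trimmed cache; this sketch shows only the eviction rule that distinguishes the approach from plain window attention.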
-
PyTorch Conference 2024 Recap: On Fire 🔥 Check out the key themes, highlights, & major takeaways from this year's conference on our blog: https://hubs.la/Q02RVXbg0
- Generative AI and LLMs
- Democratizing AI Through Open Source
- Distributed and Edge Computing
#PyTorchConf #AI #ML #PyTorch #OpenSource
-
Congratulations to the 2024 PyTorch Contributor Awardees! The Annual PyTorch Contributor Awards, presented at the PyTorch Conference, honor individuals who have made significant contributions to the PyTorch ecosystem. Their contributions have enriched our community and accelerated the advancement of AI and machine learning.

This year's awardees:
- PyTorch Newcomer: Jerome Ku
- PyTorch Review Powerhouse: Aaron Gokaslan
- PyTorch Problem Solver: Natalia Gimelshein
- PyTorch Innovator: Kaichao You
- PyTorch Trail Blazer: Andrew Hoblitzell
- PyTorch Rock Turner: Stefano Fabri
- PyTorch Ecosystem Champion: Philippe Tillet
- PyTorch Pacesetter: Eddie Yan
- PyTorchbearer: Jiong Gong
- PyTorch Superhero: K. Frank

And nominees: Eddie Yan, Salman Mohammadi, Leslie Fang, Jithun Nair, Davis Wertheimer, Sahdev Zala, Masahiro Hiramori, Xuehai Pan, Thien Tran, Jonathan Chuang, Sunita Nadampalli, and Pritam Damania.

More info: https://hubs.la/Q02RLRBt0 #PyTorchConf #AI #ML #PyTorch #Contributors #Awards
-
Thank you to all who joined us for PyTorch Conference 2024! Recordings are now live. Watch the playlist: https://hubs.la/Q02RGm630 #PyTorchConf #AI #ML #LLM #PyTorch