[vLLM Office Hours #22] Introduction to vLLM V1 Join us to learn about new developments in vLLM V1, including updates to the scheduler, memory manager, model runner, API server, and more.
About us
- Website: https://neuralmagic.com/
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: Somerville, Massachusetts
- Type: Privately Held
- Specialties: machine learning, deep learning, and artificial intelligence
Locations
- Primary: 55 Davis Sq, Floor 3, Somerville, Massachusetts 02144, US
Updates
-
Neural Magic (Acquired by Red Hat) reposted this
Check out our latest work on LLM compression and efficient training & deployment!
Excited to share our latest preprint detailing our team's recent work at LinkedIn, https://lnkd.in/dWHTuKJm! Our focus has been on training and deploying efficient Large Language Models (LLMs) across various predictive and generative applications. Through techniques like knowledge distillation, model compression via pruning and quantization, and CUDA kernel optimization, we've successfully developed and deployed small language models that mostly maintain the quality of larger foundation models while offering significantly higher inference throughput and lower latency. Notably, we've achieved over a 20x reduction in model size with minimal impact on model quality. In our paper, we discuss the specifics of our approach towards model compression and efficiency, sharing practical insights gained along the way. Our paper touches upon both methodology and practice of efficient LLM deployment. Particularly, we demonstrate the power of model pruning through combinatorial optimization, adding to the growing list of real-world applications of discrete optimization. Read more about our work: Efficient AI in Practice: Training and Deployment of Efficient LLMs for Industry Applications: https://lnkd.in/dWHTuKJm Structured pruning with OSSCAR: https://lnkd.in/d8emmFQM Model quantization with QuantEase: https://lnkd.in/dZna796n 360Brew: A foundation model for personalized recommendation: https://lnkd.in/dUXydhaZ Kudos to our amazing team, and specially, Aman Gupta, Yun Dai, Qingquan Song and Ata Fatahi who made this work possible!
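The pruning in the paper is done via combinatorial optimization (OSSCAR, linked above). As a much simpler illustration of the general idea of pruning, here is a minimal sketch of unstructured magnitude pruning in PyTorch; the function name and the 50% sparsity target are illustrative, not from the paper.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of `weight` so that the
    requested fraction of elements becomes zero (unstructured pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # Threshold at the k-th smallest absolute value; drop everything below it.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune a random linear layer's weights to ~50% sparsity.
w = torch.randn(1024, 1024)
w_pruned = magnitude_prune(w, 0.5)
print(f"sparsity: {(w_pruned == 0).float().mean():.2%}")
```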
-
[vLLM Office Hours #21] vLLM Production Stack Deep Dive Join us for an overview of the components in the vLLM Production Stack (https://lnkd.in/gsSnNb9K) and practical guidance on deploying it effectively. We’ll dive into the technical details, including an in-depth look at the prefix-aware router and its role in optimizing request routing, as well as KV cache offloading and its impact on performance and scalability.
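For intuition about what a prefix-aware router buys you, here is a minimal sketch, assuming a fixed pool of vLLM replicas and a fixed-length character prefix as the routing key. The real Production Stack router matches token-level prefixes; `BACKENDS`, `PREFIX_CHARS`, and `route` are hypothetical names, not its actual API.

```python
import hashlib

# Hypothetical replica pool; in practice this comes from service discovery.
BACKENDS = ["http://vllm-0:8000", "http://vllm-1:8000", "http://vllm-2:8000"]
PREFIX_CHARS = 24  # how much of the prompt to treat as the routing key

def route(prompt: str) -> str:
    """Pin prompts that share a prefix (e.g., the same system prompt or
    few-shot examples) to the same replica, so the KV cache computed for
    that prefix can be reused instead of recomputed."""
    digest = hashlib.sha256(prompt[:PREFIX_CHARS].encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

# Requests that share a system prompt land on the same backend.
system = "You are a helpful assistant.\n"
print(route(system + "Summarize this article..."))   # same replica
print(route(system + "Translate this sentence..."))  # same replica
```

KV cache offloading complements this kind of routing: instead of evicting a prefix's cache under memory pressure, it can spill to cheaper storage and be pulled back on a later hit.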
-
Neural Magic (Acquired by Red Hat) reposted this
Friends from the East Coast! Join us on Tuesday, March 11 in Boston for the first ever East Coast vLLM Meetup. You will meet vLLM contributors from Neural Magic (Acquired by Red Hat), Red Hat, Google, and more. Come share how you are using vLLM and see what's on the roadmap!
-
Neural Magic (Acquired by Red Hat) reposted this
Our very own Yihua Cheng will be presenting in the vLLM office hour this week with Neural Magic (Acquired by Red Hat)! It is time to learn more about production-grade LLM serving with the vLLM Production-Stack! We will be discussing in depth the design and functionality of the vLLM Production-Stack, an open-source framework to serve LLM models at scale with low cost and high performance. We will also share updates on the latest stack developments. Mark your calendar and register at this link: https://lnkd.in/euF8m73q Comment below for anything else you want us to talk about in the office hour!
Code: https://lnkd.in/gsSnNb9K
Blog: https://lnkd.in/gdXdRhEj
Tutorials: https://lnkd.in/gWz7gW6T
30-sec demo: youtu.be/RLk8zbQ-eqM
#LLM #vLLM #opensource #K8s #AI #AIInfra #Systems
-
Neural Magic (Acquired by Red Hat) reposted this
Quantized DeepSeek-R1 is here – exploring the accuracy and performance of quantized reasoning models! While many companies are still reasoning through the implications of the open-source DeepSeek R1 launch, we've been heads-down refining and compressing these models for deployment to maximize efficiency with quantization while maintaining accuracy. To our knowledge, this is the first comprehensive exploration of quantization for reasoning LLMs, spanning all distilled models, accuracy of responses on reasoning benchmarks, and increases in inference performance. Our findings offer key insights into model behavior, trade-offs, and performance improvements across different quantization techniques.

Specifically, we found:
- Larger models are robust to quantization: 7B+ models retained full accuracy at 8-bit (FP8, INT8) and 97% at 4-bit (INT4)
- Smaller models need more tuning: After more thorough hyperparameter tuning, the 1.5B model recovered 97% at 8-bit and 94% at 4-bit
- Meaningful performance gains: Speedups averaged around 1.5X, with some up to 4X, depending on the model size, hardware, and inference scenario

Some interesting insights:
- Reasoning benchmarks have high variance: AIME had up to a 7-point standard deviation in pass@1; to address this, we ran 20 different random seeds for each eval
- Tokenizer quirks degrade quality: The DeepSeek tokenizer on Hugging Face was missing the <think> token in chat templates, leading to degraded accuracy even in baseline models

If you're interested in diving in more, check out our latest Red Hat research blog (our first piece not published solely through Neural Magic (Acquired by Red Hat)!): https://lnkd.in/eENPT8xz Or get hands-on with the Hugging Face collection and start deploying now: https://lnkd.in/eQu_bhMR Let me know what you think and what you want to see next!
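As a rough sketch of the seed-averaging protocol described above, assuming vLLM's offline `LLM` API (the `is_correct` helper and the toy problem list are placeholders, not the actual evaluation harness):

```python
from vllm import LLM, SamplingParams

# Hypothetical correctness check; real evals parse the boxed final answer.
def is_correct(completion: str, answer: str) -> bool:
    return answer in completion

problems = [("What is 2 + 2?", "4")]  # stand-in for AIME problems
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

NUM_SEEDS = 20
scores = []
for seed in range(NUM_SEEDS):
    # A fresh sampling seed per run; averaging tames pass@1 variance.
    params = SamplingParams(temperature=0.6, max_tokens=4096, seed=seed)
    outputs = llm.generate([q for q, _ in problems], params)
    correct = sum(
        is_correct(out.outputs[0].text, ans)
        for out, (_, ans) in zip(outputs, problems)
    )
    scores.append(correct / len(problems))

print(f"pass@1 averaged over {NUM_SEEDS} seeds: {sum(scores)/len(scores):.3f}")
```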
-
Accurately quantized DeepSeek-R1 is here!
Introducing state-of-the-art quantized DeepSeek-R1 reasoning models for blazingly fast inference! DeepSeek recently released a suite of distilled reasoning models across the Llama and Qwen families. These models demonstrated impressive performance on a wide range of reasoning benchmarks and applications. Given their potential, we set out to quantize them while preserving their reasoning capabilities, ensuring faster inference without compromising accuracy.

Key findings from quantization:
- Larger models quantize exceptionally well; minimal tuning was required to retain near-lossless accuracy.
- Smaller models require careful tuning; we had to employ techniques like MSE-optimal clipping and activation reordering to stabilize quantization.

How do different quantization schemes compare?
- FP W8A8: Practically lossless accuracy recovery.
- INT W8A8: Competitive, recovering ~99% of the original accuracy.
- INT W4A16: Modest drop on AIME & GPQA-Diamond, but strong on MATH-500.

You can find the models fully open-sourced on our Hugging Face Hub at: https://lnkd.in/d3stvmvi For more details about evaluations and performance benchmarking, please check out our Neural Magic (Acquired by Red Hat) blog at: https://lnkd.in/dr-D5WTB
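To make the "INT W4A16" scheme concrete (4-bit integer weights with per-group scales, 16-bit activations), here is a minimal round-to-nearest sketch. It deliberately omits the MSE-optimal clipping and activation reordering mentioned above, and all names and parameters are illustrative.

```python
import torch

def quantize_w4a16(w: torch.Tensor, group_size: int = 128):
    """Symmetric 4-bit weight quantization (W4A16): weights are stored as
    INT4 with one scale per group of 128; activations stay in 16-bit."""
    groups = w.reshape(-1, group_size)
    scales = groups.abs().amax(dim=1, keepdim=True) / 7  # int4 range: [-8, 7]
    q = torch.clamp((groups / scales).round(), -8, 7)
    return q.to(torch.int8), scales  # int8 as a container for 4-bit values

def dequantize(q: torch.Tensor, scales: torch.Tensor, shape) -> torch.Tensor:
    return (q.float() * scales).reshape(shape).half()

w = torch.randn(4096, 4096)
q, s = quantize_w4a16(w)
w_hat = dequantize(q, s, w.shape)
print(f"mean abs quantization error: {(w - w_hat.float()).abs().mean():.5f}")
```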
-
[vLLM Office Hours #20] DeepSeek and vLLM DeepSeek is dropping a lot of exciting goodies this week during their Open Source Week, and we're thrilled to spotlight them at our bi-weekly vLLM Office Hours! We'll dive into "DeepSeek on vLLM: New Features, Optimizations, and More," making it a hands-on learning opportunity for anyone curious about DeepSeek's innovations and how they work with vLLM. DeepSeek's advancements aren't just cool tech; they're reshaping how we build and deploy AI. DeepSeek's focus on efficiency means you can tackle bigger problems with fewer resources. Plus, with vLLM's seamless integration, you get these benefits without the headache. This week's Office Hours is your chance to learn how DeepSeek and vLLM team up. We'll unpack DeepSeek's features, demo their vLLM integration, and brainstorm what's next, together.
-
Neural Magic (Acquired by Red Hat) reposted this
Make sure to join the first vLLM East Coast meetup in Cambridge, MA! It's a great opportunity to learn more about production-grade inference serving with vLLM. We are excited to share project updates! The best feature requests are made in person: https://lu.ma/7mu4k4xx
-
Neural Magic (Acquired by Red Hat) reposted this
It has been an amazing week for open source AI infrastructure as the DeepSeek team releases key components of the infrastructure stack that supports their complex and innovative V3/R1 architecture. https://lnkd.in/eWV7wQgr So far, we have seen:
- *FlashMLA*: an efficient MLA decoding kernel for Hopper GPUs, which helps to accelerate attention (the bottleneck for long-context "reasoning-style" workloads)
- *DeepEP*: the first open-source EP communication library for MoE model training and inference, which helps to enable more complex parallelism schemes for serving the 600B+ parameter model with 256 experts
- *DeepGEMM*: an FP8 matmul library that supports both dense and MoE layers, helping to accelerate the first foundation model trained in FP8
- *EPLB*: an expert-parallel load balancer for V3/R1, enabling more complex expert-parallel deployments for at-scale workloads

At 2pm ET during the Neural Magic (Acquired by Red Hat) Office Hours, Lucas Wilkinson, Tyler Michael Smith, Michael Goin, Simon Mo, and I will dive into these items and cover our progress integrating them into vLLM! Stop by to ask questions! Link to sign up: https://lnkd.in/ePqDYgpT
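For intuition about what an FP8 matmul library computes, here is a pure-PyTorch sketch that emulates a per-tensor-scaled e4m3 matmul by quantizing and then dequantizing the operands. This is a numerical reference only, not DeepGEMM's API; real kernels keep the operands in FP8 on-chip and typically use finer-grained scaling.

```python
import torch

def fp8_matmul_reference(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Emulate an FP8 (e4m3) matmul with per-tensor scales: quantize both
    operands to float8, then dequantize to bf16 for the actual multiply."""
    def to_fp8(x):
        scale = x.abs().amax() / 448.0  # 448 is the max normal e4m3 value
        q = (x / scale).to(torch.float8_e4m3fn)
        return q, scale

    qa, sa = to_fp8(a)
    qb, sb = to_fp8(b)
    # Dequantize for the reference computation; fast kernels skip this step.
    return (qa.to(torch.bfloat16) @ qb.to(torch.bfloat16)) * (sa * sb)

a = torch.randn(256, 512)
b = torch.randn(512, 128)
err = (fp8_matmul_reference(a, b) - (a @ b).to(torch.bfloat16)).abs().mean()
print(f"mean abs error vs full precision: {err:.4f}")
```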