Llama 3.1 405B now runs on Cerebras Inference at 969 tok/s, a new world record!

Highlights:
- 969 tokens/s: this is frontier AI at instant speed
- 12x faster than GPT-4o, 18x faster than Claude, 75x faster than AWS
- 128K context length with 16-bit weights
- Industry-leading time-to-first-token: 240 ms

This year we pushed Llama 3.1 8B and 70B to over 2,000 tokens/s, but frontier models were still stuck at GPU speed. Not anymore. On Cerebras, Llama 3.1 405B now runs at 969 tokens/s, so code, reasoning, and RAG workflows just got 12-18x faster than closed frontier models.

Cerebras Inference for Llama 3.1 405B is in customer trials today, with general availability coming in Q1 2025, priced at $6 per million input tokens and $12 per million output tokens. Frontier AI now runs at instant speed on Cerebras. #Llama #Inference #AI

Read more here: https://lnkd.in/g-RGjf9Q
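The throughput, time-to-first-token, and pricing figures above can be turned into a quick back-of-the-envelope estimate. This is a hedged sketch using only the numbers stated in the announcement (969 tok/s, 240 ms TTFT, $6/$12 per million input/output tokens); the helper names are illustrative, not part of any Cerebras SDK.

```python
# Back-of-the-envelope estimate of per-request latency and cost for
# Llama 3.1 405B on Cerebras Inference, using the announced figures.
TOKENS_PER_SEC = 969       # announced generation throughput
TTFT_SEC = 0.240           # announced time-to-first-token
PRICE_IN_PER_M = 6.00      # USD per million input tokens
PRICE_OUT_PER_M = 12.00    # USD per million output tokens

def latency_sec(output_tokens: int) -> float:
    """Time to first token plus steady-state generation time."""
    return TTFT_SEC + output_tokens / TOKENS_PER_SEC

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Linear per-token pricing for input and output."""
    return (input_tokens * PRICE_IN_PER_M
            + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# Example: a 2,000-token prompt generating 1,000 tokens
print(f"latency ~ {latency_sec(1000):.2f} s")    # ~1.27 s end to end
print(f"cost    ~ ${cost_usd(2000, 1000):.4f}")  # $0.0240 per request
```

At these rates a 1,000-token completion finishes in well under two seconds, which is what makes interactive agentic and RAG loops over a 405B model practical.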
Cerebras Systems
Computer Hardware
Sunnyvale, California · 41,258 followers
AI insights, faster! We're a computer systems company dedicated to accelerating deep learning.
About us
Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, functional business experts, and engineers of all types. We have come together to build a new class of computer that accelerates artificial intelligence work by three orders of magnitude beyond the current state of the art.

The CS-2 is the fastest AI computer in existence. It contains a collection of industry firsts, including the Cerebras Wafer Scale Engine (WSE-2). The WSE-2 is the largest chip ever built: it contains 2.6 trillion transistors and covers more than 46,225 square millimeters of silicon. The largest graphics processor on the market has 54 billion transistors and covers 815 square millimeters.

In artificial intelligence work, large chips process information more quickly, producing answers in less time. As a result, neural networks that in the past took months to train can now train in minutes on the Cerebras CS-2 powered by the WSE-2.

Join us: https://cerebras.net/careers/
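The chip figures in the blurb above imply the scale gap directly; a small sketch of that arithmetic, using only the numbers as stated there:

```python
# Scale comparison: Cerebras WSE-2 vs. the largest GPU on the market,
# using the figures quoted in the company description.
WSE2_TRANSISTORS = 2.6e12   # 2.6 trillion transistors
WSE2_AREA_MM2 = 46_225      # silicon area in mm^2
GPU_TRANSISTORS = 54e9      # 54 billion transistors
GPU_AREA_MM2 = 815          # silicon area in mm^2

print(f"transistor ratio:   {WSE2_TRANSISTORS / GPU_TRANSISTORS:.0f}x")  # ~48x
print(f"silicon area ratio: {WSE2_AREA_MM2 / GPU_AREA_MM2:.0f}x")        # ~57x
```

So the WSE-2 carries roughly 48x the transistors on roughly 57x the silicon area of the largest GPU, which is the "three orders of magnitude" thesis in hardware terms.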
- Website: https://www.cerebras.ai
- Industry: Computer Hardware
- Company size: 201-500 employees
- Headquarters: Sunnyvale, California
- Type: Privately Held
- Founded: 2016
- Specialties: artificial intelligence, deep learning, natural language processing, and inference
Updates
-
It's that time of the year again! We're headed to Vancouver for NeurIPS 2024! Test drive Cerebras Inference, serving the biggest LLMs 70x faster than NVIDIA GPUs. Learn about the latest #ML research that's powering the next wave of #genAI. Meet us there: https://lnkd.in/gPZAs2VA
-
How did we achieve 70x faster inference than NVIDIA? Watch Daniel Kim's talk at Llamapalooza NYC to learn about the hardware and software optimizations Cerebras is using to accelerate next-gen AI. https://lnkd.in/gwCjYBMY
Behind the Scenes: Achieving 2100 tok/s with Llama-70B | Daniel Kim
https://www.youtube.com/
-
Thank you to Zetta Venture Partners for hosting and giving me the opportunity to give a keynote talk at the #AINative2024 conference! Cerebras Systems now powers the fastest frontier model on the planet: Llama 405B at 969 tokens/s. This is GPU-impossible performance! To learn more about how Cerebras Systems can enable your next-generation AI application, check out https://lnkd.in/guwS5mbV #ai #ml
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference - Cerebras
https://cerebras.ai
-
At the supercomputing show this week, Cerebras Systems announced the fastest inference for Llama 3.1 405B in the industry, a whopping 969 tokens per second. The interest was overwhelming. Thank you to the hundreds of people who visited our booth!
-
Nothing excites us more than seeing what people are able to build on Cerebras. We are grateful for our partners at #LLNL, Los Alamos National Laboratory and Sandia National Laboratories with whom we get to peer into the future and do the impossible.
A team of researchers from #LLNL, Los Alamos National Laboratory, Sandia National Laboratories, and Cerebras Systems has unveiled a revolutionary approach to molecular dynamics simulations using the Cerebras Wafer-Scale Engine. Explore how the work is unleashing new frontiers in materials science: https://lnkd.in/gtHASdgB
-
From breaking a few world records to a packed happy hour, we couldn't have asked for a better time at #SC24. Thank you for helping us take Cerebras to the next level. We hope to see you all next year! If you missed us at the conference, no problem, you can reach out to us any time: https://lnkd.in/ge74WNtj
-
#SC24 is a wrap! We kicked off the conference announcing new records in Llama inference and molecular dynamics (our fourth Gordon Bell finalist in a row!) Over the course of the week, Cerebras and the broader community presented five accepted papers and posters powered by our hardware. We also showcased the power of wafer-scale at Argonne Leadership Computing Facility's tutorial on AI accelerator inference, and our recurring BoF on using AI accelerators for HPC applications. A huge thank you to the incredible work done every day by everyone at Cerebras, and all our partners, collaborators, and friends. A special shoutout to Michael James, Sivasankaran Rajamanickam, and their teams at Cerebras Systems and Sandia National Laboratories on their groundbreaking Gordon Bell finalist in molecular dynamics! See everybody next year! #IAmCerebras
-
"I am excited to join the Cerebras team and help guide this visionary company as it revolutionizes AI compute. Cerebras is uniquely positioned to transform the pace and innovation of AI workloads, and I look forward to contributing to its continued success as it defines the future of computing." Please join us in welcoming Thomas (Tom) Lantzsch to the Cerebras Board of Directors to help advance our AI compute leadership. Read more here: https://lnkd.in/gwF_FwPy