180W vs. 1,000W: RNGD delivers power-efficient inference with LLMs

2024 was a transformative year for FuriosaAI. Since we unveiled our second-gen inference chip, RNGD (“Renegade”), at Hot Chips in August, it has achieved compelling performance metrics in real-world deployments with enterprises. We’re now rapidly scaling production and expanding our leadership team with the appointment of Alex Liu as SVP of Product and Business to meet the strong market demand for RNGD’s unique combination of performance and efficiency.

With these technical, customer, and business foundations in place, we’re positioned to demonstrate RNGD’s global impact in 2025. RNGD is the only product that delivers power efficiency, programmability, and performance together, addressing the three most pressing challenges for inference with LLMs, multimodal models, and future model architectures. That combination helps you prepare and scale your AI infrastructure for the rapidly growing demand for and adoption of agentic AI.

Sign up to be notified first about RNGD


Llama 3.1: RNGD Performance Breakthrough

Our latest performance data demonstrates that RNGD equipped with HBM3 48GB already meets the industry’s needs for throughput when deploying generative AI applications with Meta’s Llama 3.1-8B and Llama 3.1-70B. And with additional optimizations still in progress, we expect to deliver even better performance, greater power efficiency and new developer tools in the coming months.

As we shared in this press release, notable recent achievements include:

  • In addition to running Llama 3.1-8B, RNGD now runs Llama 3.1-70B and handles up to 100 concurrent queries using just two RNGD cards. Further optimization is in progress, targeting 8,000 tokens per second (TPS) per server with just 8 RNGD cards.
  • New SDK optimization tools, such as KV cache management with PagedAttention, vLLM API capability, and PyTorch integration, empower developers to create efficient and scalable AI services.

We are continuously rolling out software features and seeing significant performance improvements. We’re excited for the continued evolution of RNGD as we head into 2025.


Alex Liu Joins Furiosa as SVP of Product and Business

Furiosa welcomes Alex Liu as SVP of Product and Business. An award-winning industry leader and co-founder of NETINT Technologies, Alex brings over 20 years of expertise in startup leadership, technology innovation, and strategic market development. His achievements at NETINT include launching the world’s first Video Processing Unit (VPU) SoC, a milestone that earned a Technology Emmy.

Alex will lead Furiosa’s global expansion efforts, overseeing product management, strategic partnerships, and regional operations. Partnering closely with Furiosa’s executive team, Alex aims to align Furiosa’s AI-native Tensor Contraction Processing technologies with client needs, driving innovation and scaling AI solutions worldwide.

Reach out to him to start conversations about partnerships in 2025!


Enterprise Customer Sampling and SDK Enhancements

Our RNGD cards deployed in a customer's data center

RNGD is now in the hands of early enterprise customers who are testing both cloud and on-premises environments. We’re working closely with our partners, including TSMC, to ramp up RNGD production.

Early Access Program (EAP) customers can now use our latest SDK version, v2024.1.0, to implement advanced optimization techniques such as PagedAttention, Block KV Cache, and Continuous Batching, along with various token sampling methods.
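For readers new to these techniques, here is a minimal sketch of the block-table idea behind PagedAttention-style KV cache management. It is a toy illustration in plain Python, not the Furiosa SDK API: the class `PagedKVCache` and its methods `append_token` and `release` are hypothetical names invented for this example, and the block counts are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block-table allocator (hypothetical; not an SDK class)."""

    num_blocks: int   # total fixed-size blocks in the cache
    block_size: int   # tokens stored per block
    free_blocks: list[int] = field(init=False)
    block_tables: dict[str, list[int]] = field(default_factory=dict)
    seq_lens: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> int:
        """Reserve space for one new token and return its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        # Allocate a new block only when the last one is full (or none exists),
        # so a sequence never reserves memory for its maximum length up front.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1]

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("request-a")  # 3 tokens occupy 2 blocks
cache.release("request-a")           # blocks become reusable immediately
```

Because blocks return to the pool as soon as a request finishes, a scheduler that admits new requests between decoding steps (the idea behind continuous batching) can reuse that memory right away rather than waiting for the whole batch to complete.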

These resources are just the beginning, however. Our upcoming SDK release will include support for tensor parallelism, enabling seamless distribution of processing across multiple elements without requiring model modifications, and a torch.compile backend, providing the foundation for executing customized models. Integration with HuggingFace Optimum will further empower customers to leverage a broader variety of models.

Sign up to be notified about RNGD availability here


AI Chip Integration at KubeCon + CloudNativeCon 2024

Our engineer is the 3rd from the left in the first row!

Several Furiosa engineering leaders attended KubeCon + CloudNativeCon North America 2024 in Denver. New Kubernetes innovations like Dynamic Resource Allocation (DRA) and the Container Device Interface (CDI) will make it much easier to manage complex deployments on different kinds of AI acceleration hardware. Furiosa is currently developing CDI and DRA plugins for RNGD inference deployments in enterprise and cloud environments.

See Furiosa senior engineer Byonggun Chun’s analysis


Furiosa featured in DIGITIMES

A quote from the interview. Read the full article!

Furiosa was recently featured in DIGITIMES, which highlighted our rapid growth and innovative approach in the AI chip market. We’re thrilled to see our vision gaining attention in the industry. And we’re just getting started.

Check out the full article here


AI Inference at Scale: The Key to Advancing Generative AI


Image credit: Generative Value

The AI inference landscape is evolving rapidly, as the industry races to optimize for performance, cost, and energy efficiency, while also serving new kinds of generative AI applications, such as agents.

The Generative Value newsletter recently provided a thought-provoking overview of what’s changing, noting that “AI inference is the bottleneck for deploying advanced AI at scale.”

Read Eric Flaningam’s analysis in Generative Value


RNGD Accelerates Academic Innovation at KAIST

Students at both ends are holding up RNGD

RNGD cards arrived this month at the Korea Advanced Institute of Science and Technology (KAIST), where they will be used in the Concurrency and Parallelism Laboratory to accelerate AI compiler research.

We’re excited to see how our cutting-edge hardware can help drive the field forward and we look forward to additional academic collaborations in 2025.


Until next time, keep your servers cool.
