RNGD #furiousprogress since the unveiling. Llama 3.1-8B update!

Chip optimization is a continuous journey, and we want to bring you along.

Just 3 months after RNGD's raw silicon arrived, we presented GPT-J results at Hot Chips in August. RNGD can now deliver 3,200–3,300 tokens per second on a single chip running the Llama 3.1-8B model, consuming just 159W at the chip level (181W at the board).
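Those figures work out to a handy back-of-the-envelope efficiency number. A minimal sketch (the derived tokens-per-joule metric is our own arithmetic from the throughput and power stated above, not an official spec):

```python
# Rough efficiency estimate from the reported figures:
# 3,200-3,300 tok/s on Llama 3.1-8B at 159W (chip) / 181W (board).
tokens_per_second = 3200      # lower end of the reported range
chip_power_watts = 159        # chip-level power draw
board_power_watts = 181       # board-level power draw

# Tokens per joule = tokens per second divided by watts (J/s).
tokens_per_joule_chip = tokens_per_second / chip_power_watts
tokens_per_joule_board = tokens_per_second / board_power_watts

print(f"~{tokens_per_joule_chip:.1f} tokens/J at the chip level")
print(f"~{tokens_per_joule_board:.1f} tokens/J at the board level")
```

That lands around 20 tokens per joule at the chip level, which is the kind of number the sustainability claim below rests on.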

While we are still far from our finish line, this milestone is solid proof of our mission to make AI computing sustainable.

We’ve also been busy speaking at technical conferences, prepping RNGD-Max (!), and welcoming aboard Alex Liu, our new SVP of Product and Business.

Onward!

Sign up to be notified first about RNGD: https://furiosa.ai/signup

Furiosa CEO June Paik speaks at AI Open Source Summit

Furiosa co-founder and CEO June Paik delivered a talk at the Open Source AI Summit, cohosted by the Technology Innovation Institute (TII) and the Falcon Foundation. Taking the stage alongside AWS and Cerebras in the "Compute for AI Models" session, June spoke about open source as a critical catalyst in the age of inference.


Gartner listed FuriosaAI in its 2024 Emerging Tech: Adoption Trends for Energy-Efficient Semiconductors report.

Power consumption is a crucial issue for the AI industry. Simply put, the machines running GenAI models today use a truly astonishing amount of energy. And that's neither sustainable nor workable for the businesses that have to pay those power bills.

Read more: https://lnkd.in/dKzNYqDr or https://furiosa.ai/blog/gartner-report-mentions-furiosaai-as-sample-vendor-highlights-need-for-power-efficient-ai-chips


RNGD Roadshow continued

It’s not quite the Eras Tour, but this RNGD sample has been racking up the frequent flier miles since its first reveal at Hot Chips this year.

Over the past couple of months, it’s made stops in Buenos Aires, London, Silicon Valley, Las Vegas, Silicon Valley again, Singapore, Seoul, Dubai, and Abu Dhabi.

Most recently, RNGD was at AI Expo Tokyo 2024, where we showcased a live demo running Llama 3.1-8B and 70B. We also showed our Gen 1 Vision NPU running 25 channels of object detection and pose estimation on a single card.


Not only that: an RNGD-Max sneak peek

"What's your direct comparison to the H100?" We often got this question from engineers and AI leaders interested in new inference solutions.

Our answer: RNGD-Max.

It has two RNGD chips on a single card, unlocking 1 petaFLOPS of FP8 compute with unparalleled power efficiency.
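The card-level figure implies a simple per-chip number. A quick sketch (the per-chip peak here is inferred from the stated card total, not an official per-chip spec):

```python
# RNGD-Max: two RNGD chips per card, 1 petaFLOPS FP8 per card (stated).
card_pflops_fp8 = 1.0
chips_per_card = 2

# Implied peak per chip, converted from PFLOPS to TFLOPS.
per_chip_tflops = card_pflops_fp8 * 1000 / chips_per_card

print(f"~{per_chip_tflops:.0f} TFLOPS FP8 per chip (inferred)")
```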

If you didn't know, our RNGD card is only the first of the RNGD Series, and we're excited to introduce the full lineup in 2025.

Our CEO shared the first sneak peek of the RNGD-Max prototype that arrived at Seoul HQ last week. The team jumped right into testing!


Read our latest technical blogs

Is Furiosa’s chip architecture actually innovative? Or just a fancy systolic array?



We could just tell you that our #TCP (Tensor Contraction Processor) architecture is “groundbreaking.” OR we could show you exactly how it works and let you decide.

We chose the latter.

Here's TCP explained in painfully technical detail by two of our senior HW architects. Let us know what you decide.


Implementing HBM3 in RNGD: Why it’s tricky, why it’s important, how we did it

In this blog post, our CTO and HW engineering leads candidly discuss how we overcame some tricky challenges to bring High Bandwidth Memory 3 (HBM3) to our second-gen chip.

TL;DR HBM3 is an important part of what makes RNGD a great solution for inference with multimodal models and LLMs.

But it was also a pain in the ASIC to implement.




Until next time, keep your servers cool.

