Love the phrase you coined, #furiousprogress... keep it going Team FuriosaAI!
Chip optimization is a continuous journey, and we want to bring you along. Just 3 months after RNGD's raw silicon arrived, we presented GPT-J results at Hot Chips in August. RNGD can now deliver 3,200–3,300 tokens per second on a single chip running the LLaMA 3.1-8B model, consuming just 159W at the chip level (181W at the board). While we are still far from our finish line, this milestone is solid proof of our mission to make AI computing sustainable. Onward! Note: We utilized MLPerf frameworks for testing. #furiousprogress #llm