GenAI Weekly — Edition 24
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
Zuckerberg says Meta will need 10x more computing power to train Llama 4 than Llama 3
Meta, which develops Llama, one of the largest open source foundation large language models, believes it will need significantly more computing power to train its models in the future.
Mark Zuckerberg said on Meta’s second-quarter earnings call on Tuesday that to train Llama 4, the company will need 10x more compute than what was needed to train Llama 3. But he still wants Meta to build capacity to train models rather than fall behind its competitors.
“The amount of computing needed to train Llama 4 will likely be almost 10 times more than what we used to train Llama 3, and future models will continue to grow beyond that,” Zuckerberg said.
“It’s hard to predict how this will trend multiple generations out into the future. But at this point, I’d rather risk building capacity before it is needed rather than too late, given the long lead times for spinning up new inference projects.”
Meta released Llama 3 with 8 billion parameters in April. Last week, the company released an upgraded version of the model, called Llama 3.1 405B, which has 405 billion parameters, making it Meta's biggest open source model.
Meta’s CFO, Susan Li, also said the company is thinking about different data center projects and building capacity to train future AI models. She said Meta expects this investment to increase capital expenditures in 2025.
Training large language models can be a costly business. Meta’s capital expenditures rose nearly 33% to $8.5 billion in Q2 2024, from $6.4 billion a year earlier, driven by investments in servers, data centers and network infrastructure.
According to a report from The Information, OpenAI spends $3 billion on training models and an additional $4 billion on renting servers at a discounted rate from Microsoft.
My take on this: As time goes by, the theory of open source as the future direction of LLMs looks more and more viable—unless GPT-5 turns out exponentially more capable than the most powerful open source models available today. As always, it’s important to note that by “open source”, most model developers just mean open weights.
Character.AI CEO Noam Shazeer returns to Google
In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed chatbot startup. In his previous stint, Shazeer spearheaded the team of researchers that built LaMDA (Language Model for Dialogue Applications), a language model that was used for conversational AI tools.
Character.AI co-founder Daniel De Freitas is also joining Google, along with some other employees from the startup. Dominic Perella, Character.AI's general counsel, is becoming interim CEO at the startup. The company noted that most of the staff is staying at Character.AI. Google is also signing a non-exclusive agreement with Character.AI to use its tech.
The reason is the kicker, giving readers an insight into how the business of foundational models works:
Character.AI has raised over $150 million in funding, largely from a16z.
“When Noam and Daniel started Character.AI, our goal of personalized superintelligence required a full stack approach. We had to pre-train models, post-train them to power the experiences that make Character.AI special, and build a product platform with the ability to reach users globally,” Character.AI mentioned in its blog announcing the move.
“Over the past two years, however, the landscape has shifted; many more pre-trained models are now available. Given these changes, we see an advantage in making greater use of third-party LLMs alongside our own. This allows us to devote even more resources to post-training and creating new product experiences for our growing user base.”
An Open Course on LLMs, Led by Practitioners
Today, we are releasing Mastering LLMs, a set of workshops and talks from practitioners on topics like evals, retrieval-augmented generation (RAG), fine-tuning, and more.
This course is unique: we have organized and annotated the talks from our popular paid course. It is a survey course for technical ICs (including engineers and data scientists) who have some experience with LLMs and need guidance on how to improve AI products.
My take on this: May those who teach others be blessed by the universe.
SAM 2: The next generation of Meta Segment Anything Model for videos and images
What AI is best at: reducing manual work. This must be a blessing for video editing.
Black Forest Labs announces Flux text-to-image models
Flux, the largest SOTA open source text-to-image model to date, developed by Black Forest Labs (the original team behind Stable Diffusion), is now available on fal. Flux pushes the boundaries of creativity and performance with an impressive 12B parameters, delivering aesthetics reminiscent of Midjourney.
To play around with the model now, check out the demo page here on fal.
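If you prefer code to the demo page, something along these lines should work with fal's Python client. Treat it as a minimal sketch rather than an official example: the fal-client package name, the "fal-ai/flux/dev" application id, the argument names, and the result shape are assumptions on my part and worth verifying against fal's documentation.

```python
# Minimal sketch of generating an image with Flux via fal's Python client.
# Assumptions (not from the article): the `fal-client` package, the
# "fal-ai/flux/dev" application id, the argument names, and the result
# shape. Requires a FAL_KEY environment variable with your API credentials.
import fal_client

result = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumed Flux endpoint id on fal
    arguments={
        "prompt": "a watercolor painting of a lighthouse at dawn",
        "image_size": "landscape_4_3",   # assumed parameter name
        "num_inference_steps": 28,       # assumed parameter name
    },
)

# Assumed result shape: a dict containing a list of generated images.
print(result["images"][0]["url"])
```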
My take on this: Looks like a Midjourney-quality model just became open source.
Stability AI announces Stable Fast 3D: Rapid 3D Asset Generation From Single Images
Sounds like fun!
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.
For the extra curious