GenAI Weekly — Edition 24

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions



Zuckerberg says Meta will need 10x more computing power to train Llama 4 than Llama 3

Ivan Mehta, writing for TechCrunch:

Meta, which develops one of the biggest foundational open source large language models, Llama, believes it will need significantly more computing power to train models in the future.

Mark Zuckerberg said on Meta’s second-quarter earnings call on Tuesday that to train Llama 4, the company will need 10x more compute than what was needed to train Llama 3. But he still wants Meta to build capacity to train models rather than fall behind its competitors.

“The amount of computing needed to train Llama 4 will likely be almost 10 times more than what we used to train Llama 3, and future models will continue to grow beyond that,” Zuckerberg said.
“It’s hard to predict how this will trend multiple generations out into the future. But at this point, I’d rather risk building capacity before it is needed rather than too late, given the long lead times for spinning up new inference projects.”

Meta released Llama 3 with 8 billion parameters in April. Last week, the company released an upgraded version of the model, called Llama 3.1 405B, which has 405 billion parameters, making it Meta’s biggest open source model.

Meta’s CFO, Susan Li, also said the company is thinking about different data center projects and building capacity to train future AI models. She said Meta expects this investment to increase capital expenditures in 2025.
Training large language models can be a costly business. Meta’s capital expenditures rose nearly 33% to $8.5 billion in Q2 2024, from $6.4 billion a year earlier, driven by investments in servers, data centers and network infrastructure.
According to a report from The Information, OpenAI spends $3 billion on training models and an additional $4 billion on renting servers at a discounted rate from Microsoft.

My take on this: As time goes by, the case for open source as the future direction of LLMs looks more and more viable, unless GPT-5 turns out to be exponentially more capable than the most powerful open source models available today. As always, it’s important to note that by “open source”, most model developers really mean open weights.


Character.AI CEO Noam Shazeer returns to Google

Ivan Mehta, writing for TechCrunch:


In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed chatbot startup. In his previous stint, Shazeer spearheaded the team of researchers that built LaMDA (Language Model for Dialogue Applications), a language model that was used for conversational AI tools.
Character.AI co-founder Daniel De Freitas is also joining Google, along with some other employees from the startup. Dominic Perella, Character.AI’s general counsel, is becoming interim CEO at the startup. The company noted that most of the staff is staying at Character.AI. Google is also signing a non-exclusive agreement with Character.AI to use its tech.

The reason is the kicker, giving readers an insight into how the business of foundational models works:

Character.AI has raised over $150 million in funding, largely from a16z.
“When Noam and Daniel started Character.AI, our goal of personalized superintelligence required a full stack approach. We had to pre-train models, post-train them to power the experiences that make Character.AI special, and build a product platform with the ability to reach users globally,” Character.AI wrote in the blog post announcing the move.
“Over the past two years, however, the landscape has shifted; many more pre-trained models are now available. Given these changes, we see an advantage in making greater use of third-party LLMs alongside our own. This allows us to devote even more resources to post-training and creating new product experiences for our growing user base.”

An Open Course on LLMs, Led by Practitioners

From Hamel Husain’s blog:

Today, we are releasing Mastering LLMs, a set of workshops and talks from practitioners on topics like evals, retrieval-augmented generation (RAG), fine-tuning and more.

This course is unique because it is:

  • Taught by 25+ industry veterans who are experts in information retrieval, machine learning, recommendation systems, MLOps and data science. We discuss how this prior art can be applied to LLMs to give you a meaningful advantage.
  • Focused on applied topics that are relevant to people building AI products.
  • Free and open to everyone.

We have organized and annotated the talks from our popular paid course. This is a survey course for technical ICs (including engineers and data scientists) who have some experience with LLMs and need guidance on how to improve AI products.

My take on this: May those who teach others be blessed by the universe.


SAM 2: The next generation of Meta Segment Anything Model for videos and images

From the Meta blog:

  • Following up on the success of the Meta Segment Anything Model (SAM) for images, we’re releasing SAM 2, a unified model for real-time promptable object segmentation in images and videos that achieves state-of-the-art performance.
  • In keeping with our approach to open science, we’re sharing the code and model weights with a permissive Apache 2.0 license.
  • We’re also sharing the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).
  • SAM 2 can segment any object in any video or image—even for objects and visual domains it has not seen previously, enabling a diverse range of use cases without custom adaptation.
  • SAM 2 has many potential real-world applications. For example, the outputs of SAM 2 can be used with a generative video model to create new video effects and unlock new creative applications. SAM 2 could also aid in faster annotation tools for visual data to build better computer vision systems.

What AI is best at: reducing manual work. This must be a blessing for video editing.
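
For the curious, here is roughly what prompting SAM 2 looks like from Python. This is a minimal sketch based on the API in Meta’s segment-anything-2 repository at release; the names build_sam2 and SAM2ImagePredictor, plus the config and checkpoint paths, are assumptions that may have changed since.

    import numpy as np
    from PIL import Image
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    # Config and checkpoint names are assumptions; download them from the repo.
    predictor = SAM2ImagePredictor(
        build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
    )

    # Prompt the model with a single positive click at pixel (x, y).
    image = np.array(Image.open("frame.jpg").convert("RGB"))
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[450, 300]]),
        point_labels=np.array([1]),  # 1 = foreground click, 0 = background
    )
    print(masks.shape, scores)  # masks of shape (num_masks, H, W) with confidence scores

Video works the same way in spirit: you prompt one frame, and the model propagates the resulting masklets through the rest of the clip.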


Black Forest Labs announces Flux text-to-image models

From their blog:

Flux, the largest SOTA open source text-to-image model to date, developed by Black Forest Labs (the original team behind Stable Diffusion), is now available on fal. Flux pushes the boundaries of creativity and performance with an impressive 12B parameters, delivering aesthetics reminiscent of Midjourney.

To play around with the model now, check out the demo page here on fal.


Prompt: Portrait of a bearded man with dark hair wearing red sunglasses and a light gray Patagonia fleece jacket. He has a serious expression and is looking directly at the camera. The background shows a blurred outdoor scene with rocky terrain and a vibrant pink and purple sunset sky. The lighting gives the image a warm, golden-hour glow. The overall mood is rugged yet stylish, with a touch of adventure.

My take on this: Looks like a Midjourney-quality model just became open source.
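
If you’d rather script it than click around the demo page, here’s a minimal sketch using fal’s Python client (pip install fal-client). The endpoint id fal-ai/flux/dev and the argument and response field names are assumptions based on fal’s docs at the time and may have changed; set a FAL_KEY API key in your environment first.

    import fal_client

    # Assumed endpoint id for the Flux dev variant; argument names are from
    # fal's docs at the time and may differ now.
    result = fal_client.subscribe(
        "fal-ai/flux/dev",
        arguments={
            "prompt": "Portrait of a bearded man with red sunglasses at sunset",
            "image_size": "landscape_4_3",
        },
    )
    print(result["images"][0]["url"])  # URL of the generated image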


Stability AI announces Stable Fast 3D: Rapid 3D Asset Generation From Single Images

From Stability AI:

  • Stable Fast 3D generates high-quality 3D assets from a single image in just 0.5 seconds.
  • Built on the foundation of TripoSR, Stable Fast 3D features significant architectural improvements and enhanced capabilities.
  • The model has applications for game and virtual reality developers, as well as professionals in retail, architecture, design and other graphic-intense professions.
  • The model is available on Hugging Face and is released under the Stability AI Community License.
  • Access the model easily via the Stability AI API and the Stable Assistant chatbot, share your 3D creations in a 3D viewer, and play with them in augmented reality. Get started with a free trial.

Sounds like fun!
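
Here’s a minimal sketch of calling Stable Fast 3D through the Stability AI REST API. The endpoint path and form field names are assumptions from Stability’s v2beta documentation and may have changed; you’ll need an API key from platform.stability.ai.

    import os
    import requests

    # Assumed v2beta endpoint; the API returns the asset as a binary .glb file.
    response = requests.post(
        "https://api.stability.ai/v2beta/3d/stable-fast-3d",
        headers={"authorization": f"Bearer {os.environ['STABILITY_API_KEY']}"},
        files={"image": open("product_photo.png", "rb")},
    )
    response.raise_for_status()

    with open("asset.glb", "wb") as f:
        f.write(response.content)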


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.


