Breaking down Gemma, Google’s new open-source AI model
Welcome to AI Decoded, Fast Company’s weekly LinkedIn newsletter that breaks down the most important news in the world of AI. I’m Mark Sullivan, a senior writer at Fast Company, covering emerging tech, AI, and tech policy.
This week, I’m focusing on why Google is releasing its new Gemma models as open-source. I also look at the implications and limitations of OpenAI’s new Sora video generator, as well as how Google Gemini’s huge context window might be used in practice.
Sign up to receive this newsletter every week via email here or through LinkedIn here. And if you have comments on this issue and/or ideas for future ones, drop me a line at [email protected], and follow me on X (formerly Twitter) @thesullivan.
Google revives its open-source game with Gemma models
Google announced today a set of new large language models, collectively called “Gemma,” and a return to the practice of releasing new research into the open-source ecosystem. The new models were developed by Google DeepMind and other teams within the company that already brought us the state-of-the-art Gemini models.
The Gemma models come in two sizes: one built around a neural network with 2 billion adjustable variables (called parameters) and one with 7 billion parameters. Both sizes are significantly smaller than the largest Gemini model, “Ultra,” which is said to be well beyond a trillion parameters, and more in line with the 1.8B- and 3.25B-parameter Gemini Nano models. While the Gemini Ultra is capable of handling large or nuanced requests, it requires data centers full of expensive servers.
The Gemma models, meanwhile, are small enough to run on a laptop or desktop workstation. Or they can run in the Google cloud, for a price. (Google says its researchers optimized the Gemma models to run on Nvidia GPUs and Google Cloud TPUs.)
The Gemma models will be released to developers on Hugging Face, accompanied by the model weights that resulted from pretraining. Google will also include the inference code and the code for fine-tuning the models. It is not supplying the data or code used during pretraining. Both Gemma sizes are released in two variants: one that’s been pretrained and one that’s already been fine-tuned with pairs of questions and corresponding answers.
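For developers who want to try the models right away, loading a Gemma checkpoint looks roughly like the sketch below. It assumes the Hugging Face transformers library; the model ID shown is the instruction-tuned 2B variant Google published on Hugging Face, and the prompt and generation settings are purely illustrative.

```python
# Minimal sketch, assuming the Hugging Face transformers library (and a
# Hugging Face account that has accepted Gemma's license terms).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B variant; "google/gemma-7b" is the larger pretrained one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion. The 2B model is small
# enough to run on a laptop CPU, though a GPU is much faster.
inputs = tokenizer("Explain what a context window is in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```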
But why is Google releasing open models in a climate where state-of-the-art LLMs are hidden away as proprietary? In short, Google is acknowledging that a great many developers, large and small, don’t build their apps solely atop a third-party LLM (such as Google’s Gemini or OpenAI’s GPT-4) accessed via a paid API; they also use free and open-source models at certain times and for certain tasks.
The company would rather see those non-API developers build with a Google model than move their apps to Meta’s Llama or some other open-source model. Such a developer remains in Google’s ecosystem and might be more likely to host their models in Google Cloud, for example. For the same reasons, Google built Gemma to work on a variety of common development platforms.
There’s of course a risk that bad actors will use open-source generative AI models to do harm. Google DeepMind director Tris Warkentin said during a call with media on Tuesday that Google researchers tried to simulate all the nasty ways that bad actors might try to use Gemma, then used extensive fine-tuning and reinforcement learning to keep the model from doing those things.
OpenAI’s Sora video generator still has a way to go
Remember that scene in The Fly when the scientist Seth (played by Jeff Goldblum) tries to teleport a piece of steak from one pod to another but fails? “It tastes synthetic,” says science journalist Ronnie (Geena Davis). “The computer is rethinking it rather than reproducing it, and something’s getting lost in the translation,” Seth concludes. I was reminded of that scene, and that problem, last week when I was getting over my initial open-mouthed reaction to videos created by OpenAI’s new Sora tool.
Sora uses a hybrid architecture that combines the accuracy of diffusion models with the scalability of transformer models (meaning that the more computing power you give the model, the better the results). The resulting videos seem more realistic and visually pleasing than those created by the text-to-video generator from Runway, which has been the leader in that space.
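To make that diffusion-plus-transformer pairing a bit more concrete, here is a deliberately tiny, hypothetical PyTorch sketch of the general diffusion-transformer pattern: a noisy latent is cut into patch tokens, a timestep embedding is mixed in, and a stack of transformer blocks predicts the noise to strip away. OpenAI hasn’t released Sora’s code, so none of the names or sizes below come from the real model; they only illustrate the idea.

```python
import torch
import torch.nn as nn

class TinyDiffusionTransformer(nn.Module):
    """Toy denoiser in the diffusion-transformer style: patch tokens in,
    predicted noise out. Real video models add spacetime patches, text
    conditioning, and vastly more capacity."""
    def __init__(self, patch_dim=64, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)        # patch -> token
        self.time_embed = nn.Embedding(1000, d_model)     # diffusion timestep
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, patch_dim)          # token -> predicted noise

    def forward(self, noisy_patches, t):
        # noisy_patches: (batch, num_patches, patch_dim); t: (batch,) timesteps
        tokens = self.embed(noisy_patches) + self.time_embed(t).unsqueeze(1)
        return self.out(self.blocks(tokens))

# One denoising "step": predict the noise in a batch of random latent patches.
model = TinyDiffusionTransformer()
noisy = torch.randn(2, 16, 64)        # 2 clips, 16 patches each
t = torch.randint(0, 1000, (2,))
predicted_noise = model(noisy, t)     # same shape as the input patches
```

The scalability the article mentions comes from that middle stack: transformer blocks can simply be made wider and deeper, and fed more data and compute, without changing the overall recipe.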
But as I looked a bit closer at some of the Sora videos, the cracks began to show. The shapes and movements of things are no longer ridiculously, nightmarishly wrong, but they’re still not quite right, and that’s enough to break the spell. Objects in videos often move in unnatural ways. The generation of human hands remains a challenge in some cases. For all its flash appeal, Sora still has one foot in the Uncanny Valley.
The model still seems to lack a real understanding of the laws of physics that govern the play of light over objects and surfaces, the fine details of facial expressions, and the textures of things. That’s why text-to-video AI still isn’t ready to start putting thousands of actors out of work. Still, Sora could well prove useful for producing “just in time” or “just good enough” videos, such as short-run ads for social media.
OpenAI has been able to rapidly improve the capabilities of its large language models by increasing their size, the amount of data they train on, and the amount of compute power they use. A unique quality of the transformer architecture that underpins GPT-4 is that it scales up in predictable and (surprisingly) productive ways. Sora is built on the same transformer architecture, so within just a few years we may see the same rapid improvements in Sora that we’ve seen in the GPT language models.
Developers are doing crazy things with Google’s Gemini 1.5 Pro
Google announced last week that a new version of its Gemini LLM called Gemini 1.5 Pro offers a one-million-token (words or word parts) context window. This is far larger than the previous industry leader, Anthropic’s Claude 2, which offered a 200,000-token window. You can tell Gemini 1.5 Pro to digest an hour of video, or 11 hours of audio, or 30,000 lines of computer code, or 700,000 words.
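To give a rough sense of how a developer might point that window at something big, here is a hypothetical sketch using Google’s google-generativeai Python SDK: upload one long artifact, then ask a single question over all of it. The model ID, file name, and prompt are placeholders, and access to Gemini 1.5 Pro was still limited at the time of writing.

```python
# Hypothetical sketch using Google's google-generativeai Python SDK.
# Assumes an API key with access to Gemini 1.5 Pro; the model id, file
# path, and prompt below are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload a large artifact, e.g. an hour-long recording or a big codebase dump.
audio_file = genai.upload_file(path="lecture.mp3")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    "Summarize every topic covered in this recording and list where each one starts.",
    audio_file,
])
print(response.text)
```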
In the past, the “context window size” metric has been somewhat overplayed because, regardless of the prompt’s capacity for data, there’s no guarantee the LLM will be able to make sense of it all. As one developer told me, LLMs can become overwhelmed by large amounts of prompt data and start spitting out gibberish. This doesn’t seem to be the case with Gemini 1.5 Pro, however. Here are some of the things developers have been doing with the model and its context window:
More AI coverage from Fast Company:
From around the web: