The Gemini family tree

Last month at Google I/O, we introduced Gemini 1.5 Flash, the latest model in the growing Gemini family. We asked Hamidou Dia, vice president for applied engineering at Google Cloud, to explain the different models that now belong to the Gemini family tree, when and where to use them, and why Gemini stands out from other AI models. (This post originally appeared as part of Google Cloud’s monthly executive insights email newsletter, which you can sign up for here.)


The Gemini family is a big one, and it just keeps growing. And like any family, each member has its own strengths and personality. Gemini 1.5 Flash is the newest of the bunch, and one of our most capable offerings yet. What’s so special about Flash and all its relatives? What makes each of them (Gemini 1.5 Pro, Gemini 1.0 Nano, Gemini 1.0 Pro, and Gemini 1.0 Ultra, as well as their cousin Gemma, the open model) different?

Or, what you’re really wondering: Which of them is right for your business or specific applications?

Rarely are any two AI use cases the same, and those use cases keep growing in number and maturity each day. It takes a wide range of models to satisfy these different needs, and that might even include another family of models altogether, like Anthropic’s Claude or the open source Mistral models. This diversity of needs is why Google Cloud has taken a truly open approach since day one for our model offerings and capabilities, highlighted by Vertex AI’s Model Garden and its selection of more than 150 first-party, third-party, and open models.

One of the most important considerations across the latest Gemini models, and what sets them apart from the competition, is their long context window. When we announced Gemini 1.5 Pro in February, it was the first widely available model with not only a context window of 1 million tokens, but also near-perfect recall across large amounts of input data. At I/O, Sundar Pichai, Google’s CEO, revealed that the context window would expand to 2 million tokens. He even remarked that this was part of the pathway to “infinite context.”

Want to try Gemini 1.5 Pro for yourself? Check it out now in the Google Cloud console.

A token is the smallest unit that a piece of data is broken into for use in a particular model. It can be as small as a single letter or character but, depending on how the model and the data are configured, a token could be as large as a word or phrase. The larger the context window, the more information a model can process and compare without “forgetting” what has already been processed or prompted.

If your context window only covers a few thousand tokens, the model might be able to take in a single whitepaper or a few emails. When it gets into the millions, that’s enough capacity to understand and analyze entire books or movies or, more practically for the enterprise, entire codebases, large financial datasets and research reports, or hours of footage from a manufacturing floor and a shelf’s worth of production manuals.
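To make these sizes concrete, here is a minimal sketch that estimates whether a piece of text would fit in a given context window. It assumes roughly 4 characters per token, a common rule of thumb for English prose and not how any Gemini tokenizer actually works. The function names `estimate_tokens` and `fits_in_window` are hypothetical helpers for illustration, not part of any Google API.

```python
import math

# Assumption: roughly 4 characters per token for English prose. Real
# tokenizers vary by model and by content, so treat this as a rough guide.
CHARS_PER_TOKEN = 4.0

# Context window sizes mentioned in the article, in tokens.
ONE_MILLION = 1_000_000
TWO_MILLION = 2_000_000


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token count using a characters-per-token ratio."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


def fits_in_window(text: str, window_tokens: int) -> bool:
    """Return True if the estimated token count fits within the window."""
    return estimate_tokens(text) <= window_tokens


if __name__ == "__main__":
    # Stand-in documents: a short email and a large codebase (hypothetical sizes).
    email = "word " * 400
    codebase = "x" * 6_000_000
    for name, doc in [("email", email), ("codebase", codebase)]:
        print(name, estimate_tokens(doc), fits_in_window(doc, TWO_MILLION))
```

For a real count, use the token-counting facility of the model you are calling rather than a character ratio, since the ratio changes with language, code, and non-text inputs.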

That’s where things really get interesting, when you start to combine some of these materials. The other important aspect of Gemini is that all the models are natively multimodal. Previous generations of models could maybe identify an image or video while also deciphering text or code, but that was basically shuttling the information between a set of sub-models. Gemini was developed from the start to handle a range of information types, just as a person normally would.

This means less latency and energy usage and better results for queries involving multiple sources and types of information. A manufacturing company, for example, could upload those manuals and potentially use them to spot dangers or inefficiencies in the factory footage by seamlessly cross-referencing the two. Or an investment firm could upload an investor call, regulatory filings, and references to social media and combine them for investment insights.

This is where the family of models becomes so important. For the most lightweight application on a mobile phone or edge device, there’s Gemini 1.0 Nano. Gemini 1.0 Pro is the mid-weight model with a context window and features optimized for common tasks and scale, while Gemini 1.0 Ultra tackles more complex and demanding tasks. Our Gemini 1.5 models step up with context windows of 1 million or more tokens and native multimodal reasoning. Gemini 1.5 Flash, which offers our best combination of long context capabilities, advanced analysis, and low latency, will now serve most enterprise applications, though some of the most advanced needs will require the full power of Gemini 1.5 Pro. And for those who need an open model for greater flexibility or access, Gemma, our family of open models, is at the ready.
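The guidance above can be sketched as a small decision helper. This is an illustration of the paragraph's reasoning, not an official selection procedure. The `Needs` fields, the `suggest_model` function, and the order in which the checks run are all assumptions, and the returned strings are display names, not API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical description of an application's requirements."""
    on_device: bool = False      # runs on a mobile phone or edge device
    open_model: bool = False     # needs an open model for flexibility or access
    most_advanced: bool = False  # among the most demanding workloads


def suggest_model(needs: Needs) -> str:
    """Return a display name following the article's guidance.

    The priority order of the checks is an assumption made for this sketch.
    """
    if needs.open_model:
        return "Gemma"
    if needs.on_device:
        return "Gemini 1.0 Nano"
    if needs.most_advanced:
        return "Gemini 1.5 Pro"
    # The article positions 1.5 Flash as the fit for most enterprise applications.
    return "Gemini 1.5 Flash"


if __name__ == "__main__":
    print(suggest_model(Needs()))
    print(suggest_model(Needs(on_device=True)))
    print(suggest_model(Needs(most_advanced=True)))
```

In practice the choice also depends on cost, latency targets, and evaluation results on your own data, which this sketch does not model.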

It’s a big family, ready to get to work.


Speaking of the capabilities of our models, underlying infrastructure, and enterprise tooling in Vertex AI Platform, we’re excited to share that Google was named a Leader in The Forrester Wave™: AI Foundation Models for Language, Q2 2024. Google received the highest scores of all vendors evaluated in the Current Offering and Strategy categories, with Forrester noting:

“Gemini is uniquely differentiated in the market especially in multimodality and context length while also ensuring interconnectivity with the broader ecosystem of complementary cloud services.”

You can read more in our blog or download a complimentary copy of the full report.

