The Gemini family tree
Last month at Google I/O, we introduced Gemini 1.5 Flash, the latest model in the growing Gemini family. We asked Hamidou Dia, vice president for applied engineering at Google Cloud, to explain a bit about all the different models that now belong to the family.
The Gemini family is a big one, and it just keeps growing. And like any family, each member has its own strengths and personality. Gemini 1.5 Flash is the newest of the bunch, and one of our most capable offerings yet. What’s so special about Flash and all its relatives? What makes each of them — Gemini 1.5 Pro, Gemini 1.0 Nano, Gemini 1.0 Pro, and Gemini 1.0 Ultra, as well as their cousin Gemma, the open model — different?
Or, what you’re really wondering: Which of them is right for your business or specific applications?
Rarely are any two AI use cases the same, and those use cases keep growing in number and maturity every day. It takes a wide range of models to satisfy these different needs.
One of the most important considerations across the Gemini family is the context window: how much information, measured in tokens, a model can take in and reason over in a single request.
Want to try Gemini 1.5 Pro for yourself? Check it out now in the Google Cloud console.
A token is fundamentally the smallest segment that a piece of data can be broken down into for use in a particular model. This could be thought of as a letter or character, but depending on the configuration of both the model and the data, these tokens could be as large as a word or phrase. The larger the context window, the more a model can process and compare information without “forgetting” what has already been processed or prompted.
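To get a feel for how text maps to tokens, you can ask the model to count them before sending a full request. The snippet below is a minimal sketch using the Vertex AI Python SDK; the project ID, region, and exact model version are placeholder assumptions and may differ in your environment.

```python
# Minimal sketch: counting tokens for a prompt with the Vertex AI Python SDK.
# The project ID, region, and model version are placeholders, not recommendations.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

prompt = "Summarize the key risks discussed in our Q2 earnings call transcript."
response = model.count_tokens(prompt)

# The same text can tokenize differently across models and tokenizer versions.
print(f"Total tokens: {response.total_tokens}")
print(f"Billable characters: {response.total_billable_characters}")
```

Token counts matter because both pricing and the context window are measured in tokens, so a quick count tells you whether a document will fit in a single request.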
If your context window only covers a few thousand tokens, maybe the model could understand a single whitepaper or a few emails. When it gets into the millions, that’s enough capacity to understand and analyze entire books or movies or, more practically for the enterprise, entire codebases, large financial datasets and research reports, or hours of footage from a manufacturing floor and a shelf’s worth of production manuals.
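As a rough back-of-envelope check, assuming the common rule of thumb that one token is about three quarters of an English word (the exact ratio varies by tokenizer and language):

```python
# Back-of-envelope sizing: how much text fits in a context window.
# The ratios are rough rules of thumb, not exact figures for any tokenizer.
WORDS_PER_TOKEN = 0.75      # ~3/4 of an English word per token
WORDS_PER_BOOK = 90_000     # roughly a 300-page book

def books_per_window(context_tokens: int) -> float:
    """Approximate number of 300-page books that fit in a context window."""
    return (context_tokens * WORDS_PER_TOKEN) / WORDS_PER_BOOK

for window in (8_000, 1_000_000, 2_000_000):
    print(f"{window:>9,} tokens ≈ {books_per_window(window):.1f} books")
```

On those rough numbers, a few thousand tokens holds a long memo, while a million-token window holds several novel-length books, which is why whole codebases and hours of transcripts become practical inputs.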
That’s where things really get interesting, when you start to combine some of these materials. The other important aspect of Gemini is that all the models are natively multimodal. Previous generations of models could maybe identify an image or video while also deciphering text or code, but that was basically shuttling the information between a set of sub-models. Gemini was developed from the start to handle a range of information types, just as a person normally would.
This means less latency and energy usage and better results for queries involving multiple sources and types of information. A manufacturing company, for example, could upload those manuals and potentially use them to spot dangers or inefficiencies in the factory footage by seamlessly cross-referencing the two. Or an investment firm could upload an investor call, regulatory filings, and references to social media and combine them for investment insights.
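As a sketch of how such a combined request might look in code, here is one way to pass a video clip and a PDF manual to the model in a single call using the Vertex AI Python SDK; the Cloud Storage paths, prompt, and model name are placeholder assumptions.

```python
# Minimal sketch: one multimodal request mixing video footage, a PDF manual, and text.
# The Cloud Storage URIs, prompt, and model name are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-pro")

factory_footage = Part.from_uri(
    "gs://your-bucket/factory-floor-cam3.mp4", mime_type="video/mp4"
)
safety_manual = Part.from_uri(
    "gs://your-bucket/line-3-safety-manual.pdf", mime_type="application/pdf"
)

response = model.generate_content([
    factory_footage,
    safety_manual,
    "Cross-reference the footage against the safety manual and list any "
    "procedures that appear to be violated, citing the relevant section.",
])
print(response.text)
```

Because the model ingests the video and the document in the same request, the cross-referencing happens in one pass rather than being stitched together from separate vision and text pipelines.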
This is where the family of models becomes so important. For the most lightweight applications, Gemini 1.0 Nano runs directly on devices; Gemini 1.5 Flash is optimized for speed and cost efficiency at high volume; Gemini 1.0 Pro and Gemini 1.5 Pro cover general-purpose work, with 1.5 Pro handling the longest contexts and the most complex multimodal tasks; Gemini 1.0 Ultra is built for the most demanding reasoning; and Gemma is the open model that teams can adapt and run themselves. The right fit comes down to the latency, cost, and context requirements of your workload.
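One way to make that choice concrete is a simple rule of thumb. The helper below is purely illustrative: the selection criteria and the model identifiers are assumptions for the sketch, not an official sizing guide.

```python
# Illustrative sketch only: mapping rough workload needs to a Gemini model.
# The selection rules and model names are assumptions, not official guidance.
def pick_gemini_model(on_device: bool, needs_long_context: bool,
                      latency_sensitive: bool) -> str:
    if on_device:
        return "gemini-nano"       # runs locally, no round trip to the cloud
    if needs_long_context:
        return "gemini-1.5-pro"    # longest context window in the family
    if latency_sensitive:
        return "gemini-1.5-flash"  # tuned for speed and cost at volume
    return "gemini-1.0-pro"        # general-purpose default

print(pick_gemini_model(on_device=False, needs_long_context=True,
                        latency_sensitive=False))
```

In practice the decision also weighs price, quotas, and where your data lives, so treat a sketch like this as a starting point for evaluation rather than a rule.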
It’s a big family, ready to get to work.
Speaking of the capabilities of our models, underlying infrastructure, and enterprise tooling in Vertex AI Platform, we’re excited to share that Google was named a Leader in The Forrester Wave™: AI Foundation Models for Language, Q2 2024. Google received the highest scores of all vendors evaluated in the Current Offering and Strategy categories, with Forrester noting:
“Gemini is uniquely differentiated in the market especially in multimodality and context length while also ensuring interconnectivity with the broader ecosystem of complementary cloud services.”
You can read more in our blog or download a complimentary copy of the full report.