Google's new Gemini model - on moats and the future of AI
Gianluca Mauro
AI entrepreneur, public speaker, and troublemaker | Follow me for hot takes on the world of AI
Before we get started, AI Academy’s January edition of the Master in Prompt Engineering is almost sold out. If you’re considering joining you should do it now (click here to read more).
Google yesterday released Gemini, its new AI model. And it's the most powerful AI technology we've ever seen, finally dethroning GPT-4.
In this newsletter, I want to explain a bit what it is, what it can do, and share some reflections on the future of AI and competition.
What is Gemini
First of all, Gemini is not a single model, but a family of three models: Ultra, Pro, and Nano, in decreasing order of size and capability.
The most important attribute of the Gemini models is that they’re developed for multimodality from the ground up. “Multimodality” means being able to process and output not just text, but also images, audio, and video.
Basically, Gemini is not an improvement to Google’s old models - Google went back to the drawing board and re-thought completely how to use all these heterogeneous datasets (this will be important in the analysis piece later).
What can Gemini do?
Google mostly demoed the Ultra model, so let’s talk about that.
First of all, it’s “smarter” than GPT-4. How do we define “intelligence”, though? Today AI companies use a benchmark called MMLU (Massive Multitask Language Understanding): essentially a giant list of multiple-choice questions on topics ranging from politics to law, biology, math, and anything else you can think of. Researchers put these questions to AI models and measure their accuracy.
GPT-4 has an MMLU accuracy of 86.4%. Gemini Ultra: 90.0%.
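To make the metric concrete, here is a minimal sketch of how MMLU-style accuracy is computed. The questions and the toy "model" below are invented placeholders, not real MMLU items or a real AI model; the point is only that accuracy is the fraction of multiple-choice questions answered correctly.

```python
# Sketch of benchmark scoring: accuracy = correct answers / total questions.
# The question set and the toy model are invented placeholders.

def score(model, questions):
    """questions: list of (prompt, correct_choice) pairs."""
    correct = sum(1 for prompt, answer in questions if model(prompt) == answer)
    return correct / len(questions)

def toy_model(prompt):
    # A stand-in "model" that always picks option B.
    return "B"

questions = [
    ("Which gas do plants absorb? A) O2 B) CO2 C) N2 D) He", "B"),
    ("2 + 2 = ? A) 3 B) 4 C) 5 D) 22", "B"),
    ("Capital of France? A) Rome B) Paris C) Madrid D) Berlin", "B"),
    ("Largest planet? A) Jupiter B) Mars C) Venus D) Mercury", "A"),
]

print(score(toy_model, questions))  # 3 of 4 correct -> 0.75
```

A real benchmark run does exactly this, just with ~16,000 questions across 57 subjects instead of four toy ones.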
But the most exciting thing is the new set of capabilities unlocked by this advanced multimodality. There are quite a lot of videos in Google's blog post, so here I'll report the ones that surprised me most.
In the GIF above, Gemini is shown two balls of yarn of different colors and is asked (via voice) to generate some ideas on what to do with them. Notice how Gemini can generate images of realistic, fun ideas that you could actually knit using the yarn in the picture. This is a seamless experience going from images + voice → text + image, showing pretty good understanding of context and creativity while staying grounded in the input data.
In this other GIF, you can see Gemini “reasoning” about which car would be faster, starting from two car sketches on post-its. Gemini correctly answers that the one on the right would be more aerodynamic, so it would be the faster one. This is pretty mind-blowing because it seems to prove it has some real “understanding” of the two items in the image, and it can link them to an internal model of physics.
The last example: the user hides a ball of paper under a cup and shuffles it with two other cups. Gemini correctly identifies where the ball is. This was the most shocking example to me because it shows an important element missing from earlier AI models: time. To complete this task, understanding the content of a single image isn’t enough. Gemini had to “watch” each frame of the video, link what it knew about one frame to the next, and extrapolate information from that sequence. Wild.
On moats and the future of AI
A company's moat refers to its ability to maintain the competitive advantages that are expected to help it fend off competition and maintain profitability in the future. A moat can be anything hard to replicate for competitors: data, network effects, tech, etc.
Since GPT-4 was introduced, and remained the most powerful model out there for a while, people started wondering whether anyone could catch up with OpenAI, and whether its technology lead was a big enough moat to crown it the winner of the AI race.
Then there was a piece of news that not many people cared about: OpenAI started a data partnership initiative to collect more high-quality data from partners.
Did that mean they ran out of data to build GPT-5?
Potentially, but now Gemini is showing that Google may never have that problem. Gemini is a testament to the value of the wildly heterogeneous data Google has collected over the years, and to that data's value as a moat. Think about video data from YouTube, audio data recorded through Android, and all the partnerships for Google Books, news, and so on. If “more data” is the solution to more powerful AI models, Google has just flexed its muscles and shown everyone who would win that race. (By the way, I’m not sure that “more data” is the answer, but it’s paying off for now, so let’s roll with that assumption.)
What does that mean for us? I want to take my point of view as an entrepreneur and reflect on how that can impact you too.
As you know, I’m building a generative AI product that’s powered by GPT models for now. As soon as I get access to Gemini, I could run a test and check whether my product's performance improves (this would take between a few hours and a few days, depending on the complexity of my product).
Let’s assume now that Gemini does improve the performance of my system. What would I need to do to cut ties with OpenAI completely and power my entire company with Google’s technology? I’d probably have to change a single line of code.
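To illustrate why the switch can be that cheap, here's a hypothetical sketch. The client classes below are stand-ins, not the real OpenAI or Google SDKs; the point is the common pattern of hiding the provider behind one interface, so that the vendor choice lives in a single line of configuration.

```python
# Hypothetical provider-agnostic setup. FakeGPT4Client and
# FakeGeminiClient are invented stand-ins, NOT real SDKs; product
# code only ever touches the shared `complete` interface.

from typing import Protocol


class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class FakeGPT4Client:
    def complete(self, prompt: str) -> str:
        return f"[gpt-4] {prompt}"


class FakeGeminiClient:
    def complete(self, prompt: str) -> str:
        return f"[gemini-ultra] {prompt}"


# The "one line of code": which provider powers the product.
MODEL: TextModel = FakeGeminiClient()  # was: FakeGPT4Client()


def answer_user(question: str) -> str:
    # All product logic is written against the interface, not the vendor.
    return MODEL.complete(question)


print(answer_user("Summarize my meeting notes"))
```

In practice there's a bit more to it (prompts often need re-tuning per model), but architecturally the vendor really can be a swappable dependency.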
Switching from one model to another is incredibly easy. This means the models themselves are a weak moat: if a better one ships tomorrow, customers can move to it almost overnight.
So that’s it, we have a new king in the AI race. Just a regular Thursday in this crazy world of AI.
If you’ve read this far you must be really keen to take part in this crazy AI revolution. I think you’d love the Master in Prompt Engineering, I hope I’ll see you in class in the new year.