Google's Gemini tells us a lot about the AI race
The release of ChatGPT fired a starting gun no one expected. As the NYT reports, when the Big Tech firms saw the reception of ChatGPT, they immediately pivoted to developing their own AI products, with minimal care for risks. Meta open-sourced Llama-2, Microsoft added GPT to its products and Google rush-released Bard. It seems like a blind sprint in a race no one fully understood.
Last week, the next leg was revealed with the announcement of Google’s Gemini models. They are multi-modal from the ground up, meaning they can reason and understand across modalities such as text, image, audio and video. For an impressive example of this, see here how Gemini reads, understands and filters 200,000 scientific papers over a lunch break. The largest of the models, Gemini Ultra, looks to be the best model on the market, finally beating GPT-4. It has achieved state-of-the-art (SOTA) results on 30 of the 32 most common research benchmarks, and it is the first model ever to outperform humans on the well-known MMLU benchmark.
But this doesn’t tell the full story. If anything shows how intense the AI race has become, it’s the marketing tricks that Google used to make its model seem better than it is.
Firstly, their MMLU SOTA score lacks nuance. It beats GPT-4 using Chain-of-Thought@32 prompting, a method that has the AI generate 32 different reasoning paths for a single question, considering various angles and possibilities, before choosing the most consistent or convincing answer. While this can lead to more nuanced and considered responses, it’s a more complex process and one less commonly employed for quick, everyday queries where users prioritise immediate and concise answers. On the other hand, GPT-4 beats Gemini Ultra using the 5-shot method, which involves presenting the AI with five examples of a task, complete with questions and the correct answers, to help it understand what’s expected before posing a new, similar question. This approach is likely closer to how users naturally give context to guide the AI towards the kind of response they’re seeking. This highlights some of the limitations of benchmarks, as we have previously covered in Chartpack: (Mis)measuring AI. Because LLMs have such wide-ranging use cases, benchmarks often represent their performance inadequately and abstract away from the actual qualia of using these models. We will have to wait for Gemini Ultra’s release to properly judge it.
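To make the difference between the two setups concrete, here is a minimal sketch in Python. It is not Google's or OpenAI's evaluation code: `sample_answer` is a hypothetical stand-in for a single model call that reasons step by step and returns a final answer, and the "most consistent answer" rule is assumed to be a plain majority vote, which may differ from what the labs actually did.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], question: str) -> str:
    """5-shot style: show solved examples first, then pose the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def chain_of_thought_at_k(
    sample_answer: Callable[[str], str],  # hypothetical model call (assumption)
    question: str,
    k: int = 32,
) -> Tuple[str, float]:
    """CoT@k style: sample k reasoning paths and keep the most common answer.

    Returns the winning answer and the share of samples that agreed with it.
    """
    answers = [sample_answer(question) for _ in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / k


if __name__ == "__main__":
    # A stub "model" that answers B three times out of four (illustration only).
    canned = iter(["B", "B", "C", "B"] * 8)
    answer, agreement = chain_of_thought_at_k(lambda q: next(canned), "Which option?")
    print(answer, agreement)

    # With five (question, answer) pairs in `examples`, this is a 5-shot prompt.
    print(build_few_shot_prompt([("2 + 2?", "4")], "3 + 3?"))
```

The point of the sketch is the cost asymmetry: the few-shot prompt is one model call, while CoT@32 is 32 calls per question, so the two headline scores are not measuring the same kind of usage.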
Their second marketing trick was in their announcement video. They presented a demo that made Gemini look like some miraculous real-time assistant. However, it turns out that the model's response time was sped up, the interaction was not done in real time, and some complex prompting was done on the backend. I can’t help but think this is a sign that Google feels threatened and needs to make things look better than they are.
This contrasts with OpenAI’s “low key research preview” that was ChatGPT’s release. In fairness, the game has changed since then. One thing is certain: with the competition hotting up, no tech firm can afford to sit still, whether or not we see models much better than Gemini Ultra or GPT-4 in the next few months.
See also my commentary from earlier this year:
COO at Incode Group // Business Advisor at MLPCo
11 months ago: Great read!
Realtor in Louisville Kentucky | Finding the Land to Build Dream Homes | Residential, Farms, Land, Acreage
11 months ago: The horse and first car analogy is perfect for the insufferable resistance to intelligent progress. I suggest going to the Henry Ford museum near Detroit, the best North American museum regarding innovations. It has lots of fabulous cars, including 100-year-old electric cars, and it has also brought together a group of buildings like the Edison and Wright brothers garages. You can ride around in a real Model T, and kids can build a Model T in a day.
Vice President, Client Partnerships at Massive Insights, Founder & Host at AwokenWord Podcast
11 months ago: Your opening point, Azeem Azhar, is super important: “…. when the Big Tech firms saw the reception of ChatGPT, they immediately pivoted to developing their own AI products, with minimal care for risks.” We have to be mindful of the risks of this race, and not just cast them to the side to be dealt with later. That’s a central point Martin Ryan made throughout our conversation on AI. https://www.dhirubhai.net/posts/anuj-rastogi-22a0ab2_samaltman-chatgpt-podcast-activity-7133521290954473473-ObVv?utm_source=share&utm_medium=member_ios
Artificial Intelligence | Futures Studies | Sustainability & Impact | Ecosystem Builder
11 months ago: Perfect! Let's all remember the promise of the world-changing Google Duplex in 2018, which is still not a reality.