Google's Gemini tells us a lot about the AI race
Google DeepMind



The release of ChatGPT fired a starting gun no one expected. As the NYT reports, when the Big Tech firms saw the reception of ChatGPT, they immediately pivoted to developing their own AI products, with minimal care for risks. Meta open-sourced Llama-2, Microsoft added GPT to its products and Google rush-released Bard. It looked like a blind sprint in a race no one fully understood.

Last week, the next leg was revealed with the announcement of Google’s Gemini models. They are multi-modal from the ground up, meaning they can reason and understand across modalities such as text, image, audio and video. In one impressive demonstration, Gemini reads, understands and filters 200,000 scientific papers over a lunch break. The largest of the models, Gemini Ultra, looks to be the best model on the market, finally beating GPT-4. It has achieved state-of-the-art (SOTA) results on 30 of the 32 most common research benchmarks, and it is the first model reported to outperform human experts on the well-known MMLU benchmark.

But this doesn’t tell the full story. If anything shows how intense the AI race has become, it’s the marketing tricks that Google used to make its model seem better than it is.

Firstly, their MMLU SOTA score lacks nuance. Gemini Ultra beats GPT-4 using Chain-of-Thought@32 prompting, a method that has the AI generate 32 different reasoning paths for a single question, considering various angles and possibilities, before choosing the most consistent or convincing answer. While this can lead to more nuanced and considered responses, it is more complex and less commonly used for quick, everyday queries where users prioritise immediate, concise answers.

On the other hand, GPT-4 beats Gemini Ultra using the 5-shot method, which involves presenting the AI with five examples of a task (complete with questions and the correct answers) to help it understand what’s expected before posing a new, similar question. This approach is likely closer to how users naturally give context to guide the AI towards the kind of response they are seeking.

This highlights some of the limitations of benchmarks, as we have previously covered in Chartpack: (Mis)measuring AI. Because LLMs have such wide-ranging use cases, benchmarks often represent their performance inadequately and abstract away from the actual experience of using these models. We will have to wait for Gemini Ultra’s release to properly judge it.
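To make the difference between the two prompting setups concrete, here is a minimal Python sketch. It assumes a hypothetical `sample_answer` callable standing in for a model call (no real API is used), and it implements plain majority voting over sampled answers, which is a simplification and not necessarily the exact procedure behind Google's reported score. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], question: str) -> str:
    """Format k worked examples followed by the new question (k=5 gives a 5-shot prompt)."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def majority_vote_answer(
    sample_answer: Callable[[str], str],  # hypothetical stand-in for one sampled model run
    question: str,
    n_samples: int = 32,
) -> Tuple[str, float]:
    """Sample n reasoning paths and return the most common final answer with its vote share."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


if __name__ == "__main__":
    import itertools

    # Fake "model" that cycles through fixed answers, purely for illustration.
    fake_answers = itertools.cycle(["B", "B", "C", "B"])
    answer, share = majority_vote_answer(lambda q: next(fake_answers), "Which option is correct?")
    print(answer, share)
```

The practical point is cost: the voting setup needs 32 model calls per question, while the 5-shot setup needs one call with a longer prompt, so the two scores are not measuring the same thing.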

Their second marketing trick was in their announcement video. They presented a demo that made Gemini look like some miraculous real-time assistant. However, it turns out that the model’s response time was sped up, the interaction was not done in real time, and there was some complex prompting done on the backend. I can’t help but think this is a sign that Google feels threatened and needs to make things appear better than they really are.

This contrasts with OpenAI’s “low key research preview” that was ChatGPT’s release. In fairness, the game has changed since then. One thing is certain: with the competition hotting up, no tech firm can afford to sit still, whether or not we see models much better than Gemini Ultra or GPT-4 in the next few months.



