Review: Google Gemini, Meh
Google Bard Start Screen


Yesterday, I started playing with Google Gemini, the competitor to OpenAI's GPT-4, which powers ChatGPT. Developed through a large collaborative effort across many teams at Google, Gemini stands out as a "natively" multimodal model, capable of understanding and integrating diverse types of information, including text, audio, images, and video.

Google Gemini Versions and Ways to Access

The three versions of Gemini are:

  1. Gemini Ultra: The most advanced version, tailored for highly complex tasks.
  2. Gemini Pro: A model designed for scalability across various tasks.
  3. Gemini Nano: The most efficient version, optimized for on-device tasks. It will come to Google Pixel phones this month.

Google plans to integrate Gemini into its offerings in different ways:

  • Gemini Pro will power Google’s free chatbot, Bard, providing advanced reasoning and understanding capabilities.
  • Gemini Ultra, the most advanced version, will power Bard Advanced, which is expected to launch in 2024. This version will handle more intricate tasks involving images, audio, video, and code.
  • Gemini Nano is set to be incorporated into Android phones, starting with Google's Pixel 8 Pro, to answer complex queries directly on the device without needing an internet connection.

Gemini Review with Bard

I have started playing with Google Bard, now powered by Gemini, and my initial results weren't bad. The first thing I noticed is that Bard is now very fast: complex answers were rendered in a second or less. Overall, though, it wasn't the Bard I had hoped for. Read on.

Contextual Search, Fail

I was excited about the prospect of using Google Bard with Gemini for better search results, but surprisingly, the results were not good. I asked Bard for a list of reviews with summaries. I got a list of reviews with logos, but no hyperlinks to the sources like the ones I get from Perplexity.ai.

Image Creation, Nope

OpenAI integrates GPT-4 and DALL-E 3 in ChatGPT, but they are two separate models working together. Google Gemini is a single multimodal model that, as of today, can accept both text and image inputs, but it cannot yet create images.

Fact-Checking, Not Bad

Google's approach to fact-checking differs from the citations included by OpenAI's ChatGPT or Perplexity. Instead, to get citations, you click the Google "G" logo at the end of the output. Bard then highlights the relevant passages of the response and shows the sources in pop-up windows, so you don't have to leave the page to verify a response.

Fact-checking hyperlinks to sources contextually

Privacy, Pass

When you log into Google Bard it provides a splash screen with the following statement:

Your conversations are processed by human reviewers to improve the technologies powering Bard. Don’t enter anything you wouldn’t want reviewed or used.

There's a "How it works" link that takes you to a page explaining how to opt out of having your data used for training, not unlike what ChatGPT offers.

Integration with Google Docs

One thing I do like is that the share button lets you export the chat output to a Google Doc. My biggest writing use case is using ChatGPT, and probably now Google Bard, to create outlines for articles and presentations.

Summary, Keep on Trying Google

Overall, when I tried to create content the way I have for many months with ChatGPT, Bard didn't do a great job. Where it did shine was creativity, though, and I might use it for brainstorming content ideas.

Strengths:

  • Performance Metrics: Google reports that Gemini outperforms OpenAI's models, including GPT-3.5, on certain benchmarks, particularly Massive Multitask Language Understanding (MMLU). On MMLU, Gemini Ultra scored 90%, slightly above human experts (89.8%) and above GPT-4, which scored 86% on these text-based questions. However, experts have raised questions about the comprehensiveness and transparency of these benchmarks, so they may not fully represent Gemini's capabilities across diverse applications.

Source: Google

  • Multimodal Capabilities: One of Gemini's standout features is its multimodal understanding, which lets it process and integrate different types of information such as text, code, audio, images, and video. It was designed this way from the ground up, unlike traditional multimodal models that stitch together separate components for different modalities. This native multimodality could make it useful in fields ranging from science to finance. The demos have been impressive, but until the mobile app is available, it's hard to verify what the demo videos that accompanied Google's release show.
  • Variants of Gemini: Gemini comes in three versions: Ultra, Pro, and Nano. Ultra is optimized for highly complex tasks, Pro for a wide range of tasks, and Nano for efficient on-device tasks. This segmentation suggests that Google is targeting a diverse range of applications and use cases with the Gemini model.
  • Coding Proficiency: Gemini has demonstrated the ability to understand, explain, and generate high-quality code in multiple programming languages, and Google positions it as a leading model in this domain. My own evidence is anecdotal, since I have generated far more code with ChatGPT, but Gemini's coding hasn't seemed bad. I built a simple Chrome extension with both Bard and ChatGPT and got similar results (this was simple JavaScript and HTML).

Weaknesses:

  • Content Creation Is Mediocre: Despite being integrated with all of Google's services, including search, the content Gemini creates doesn't seem to be great.
  • No Image Creation: Unlike ChatGPT Plus, which integrates DALL-E 3, Google Bard can't create images, and that's a big differentiator today.
  • Early Stage of Development: Gemini is still under development and needs further testing and refinement. The interface is not intuitive.
  • Limited Public Access: Some versions (e.g., Gemini Nano) are not yet available for wide public use, which makes their capabilities difficult to assess firsthand.

Final Thought, Meh

Despite its technical achievements, for the average user the incremental improvement Gemini offers over the previous PaLM 2 model may not be noticeable. It's also just not as good as OpenAI's ChatGPT Plus.



Ramesh Reddi

Cyber Security Consultant


I believe Gemini is replacing PaLM 2, right?

Christian Reilly

CTO // Technology Strategy // Enterprise & Vendor // Human // Mental Health Advocate


Why do they need both?

Dean Peters

Product Management Trainer, Consultant, & Mentor | Innovation Coach & AI Tamer | Hakawati


I asked Bard to help me with a curl command to a RapidAPI movie database. When the command it suggested failed, I posted the error message. It said it was an LLM and, as such, wasn't equipped to answer such a question. So Meh for sure!


More articles by Mark Hinkle

  • MCP, The USB-C of AI

    MCP, The USB-C of AI

    How the Model Context Protocol is Creating a Universal Standard for Enterprise AI Integration Artificial intelligence…

    4 条评论
  • Using the ChatGPT Mobile App to Fix Anything

    Using the ChatGPT Mobile App to Fix Anything

    ChatGPT’s mobile app is a powerful tool for troubleshooting, problem-solving, and quick fixes Last summer, my family…

    4 条评论
  • AI is About People

    AI is About People

    With artificial intelligence, we need to focus on the people as much as we do the technology When I got into AI one of…

    91 条评论
  • Creating Killer Presentations with ChatGPT

    Creating Killer Presentations with ChatGPT

    Save time, improve clarity, and create impactful slides with AI Creating Presentations with ChatGPT Save time, improve…

    6 条评论
  • Who Will Win the LLM Wars

    Who Will Win the LLM Wars

    Hint: The Future of AI Won’t Belong to OpenAI, DeepSeek, or even Google The Age of LLM Routing: Right Model, Right Task…

    2 条评论
  • ChatGPT for Conference Survival

    ChatGPT for Conference Survival

    ChatGPT for capturing, organizing, and summarizing key insights from sessions, talks, and networking chats Has this…

    3 条评论
  • Is DeepSeek the New Open Source or the New Electricity

    Is DeepSeek the New Open Source or the New Electricity

    Why the reality behind DeepSeek’s open source model is more complicated than the hype Electricity transformed America…

    6 条评论
  • Optimizing Prompts for Reasoning LLMs

    Optimizing Prompts for Reasoning LLMs

    Techniques for getting great results from reasoning LLMs Reasoning models are advanced large language models designed…

    2 条评论
  • FOBO - Fear of Being Obsolete

    FOBO - Fear of Being Obsolete

    The K-Shaped Market: Who Thrives with AI and Who Falls Behind? FOBO - Fear of Being Obsolete The K-Shaped Market: Who…

    2 条评论
  • Next-Gen AI Automation

    Next-Gen AI Automation

    Beyond RPA: How AI-Powered Models Are Automating Workflows, Extracting Data, and Revolutionizing Digital Interactions…

    6 条评论
