What Google Gemini AI means to your business!
Image created with www.canva.com

Introduction

Large Language Models (LLMs) are evaluated on a standard set of tasks that are well recognized in the generative AI space. Google has published a report evaluating Gemini, its generative AI model and a competitor of ChatGPT. This article summarizes what Google Gemini and its evaluation mean to your business.

Background of Google Gemini compared to ChatGPT

ChatGPT version 3.5 (based on GPT-3.5) is an LLM capable of many text-oriented natural language tasks. ChatGPT version 4 (based on GPT-4) is a multi-modal LLM that can accept both text and images as input.

Google Gemini is claimed to be a slightly better multi-modal LLM, in that it can generalize across, and seamlessly understand, operate across and combine, different types of information, including text, code, audio, images and video.

Gemini 1.0, the first version, has three different sizes:

  • Gemini Ultra — the largest and most capable model for highly complex tasks.
  • Gemini Pro — the best model for scaling across a wide range of tasks.
  • Gemini Nano — the most efficient model for on-device tasks.

An overview of LLM evaluation benchmarks

Stanford has a reference framework for evaluating LLMs called HELM (Holistic Evaluation of Language Models), maintained by the Center for Research on Foundation Models (CRFM).

The evaluation framework is summarized in Figure 1.

Figure 1: HELM by CRFM, Stanford

As of December 2023, HELM evaluates 119 models on 116 scenarios using 110 metrics. Since summarizing (or even reading) all of these results in one go is a daunting task, Figure 2 highlights a summarized portion of them.

Figure 2: Typical evaluations conducted on LLMs
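
Frameworks like HELM condense results across many scenarios into summary statistics; the HELM leaderboard, for instance, reports a mean win rate for each model. The sketch below shows one simple way to compute such a statistic from a small score table. It is not HELM's code: the model names, scenarios and scores are hypothetical, every model is assumed to have a score on every scenario, and counting ties as half a win is this sketch's own choice.

```python
# Toy scores: scores[model][scenario] -> accuracy in [0, 1].
# Model names, scenarios and numbers are hypothetical placeholders.
scores = {
    "model_a": {"qa": 0.80, "summarization": 0.60, "classification": 0.90},
    "model_b": {"qa": 0.70, "summarization": 0.65, "classification": 0.85},
    "model_c": {"qa": 0.75, "summarization": 0.55, "classification": 0.90},
}


def mean_win_rate(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each model, average over scenarios the fraction of other models it beats.

    Assumes every model has a score for every scenario.
    Ties count as half a win (an assumption made for this sketch).
    """
    models = list(scores)
    scenarios = next(iter(scores.values())).keys()
    result = {}
    for m in models:
        others = [o for o in models if o != m]
        rates = []
        for s in scenarios:
            wins = sum(
                1.0 if scores[m][s] > scores[o][s]
                else 0.5 if scores[m][s] == scores[o][s]
                else 0.0
                for o in others
            )
            rates.append(wins / len(others))
        result[m] = sum(rates) / len(rates)
    return result


# Print models from highest to lowest mean win rate.
# prints model_a: 0.75, then model_c: 0.42, then model_b: 0.33
for model, rate in sorted(mean_win_rate(scores).items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rate:.2f}")
```

A single summary number like this is convenient for ranking, but it hides which scenarios drive the ranking, which is why the per-scenario view still matters for a specific business case.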

Evaluation of Google Gemini

Google's evaluation report states that Gemini Ultra achieves new state-of-the-art results on 30 of 32 benchmarks, broken down as follows:

  • 10 of 12 popular text and reasoning benchmarks,
  • 9 of 9 image understanding benchmarks,
  • 6 of 6 video understanding benchmarks, and
  • 5 of 5 speech recognition and speech translation benchmarks
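
The per-category figures in the list above can be re-tallied to confirm that they add up to the headline claim and to see where the two misses fall. The sketch below only encodes the numbers from the list; nothing beyond them is assumed.

```python
# (state-of-the-art results, benchmarks) per category, as listed in the report summary.
gemini_ultra_sota = {
    "text and reasoning": (10, 12),
    "image understanding": (9, 9),
    "video understanding": (6, 6),
    "speech recognition and translation": (5, 5),
}

wins = sum(w for w, _ in gemini_ultra_sota.values())
total = sum(n for _, n in gemini_ultra_sota.values())
print(f"{wins} of {total} benchmarks")  # prints "30 of 32 benchmarks"

# Categories where Gemini Ultra did not set a new state of the art on every benchmark.
missed = {cat: n - w for cat, (w, n) in gemini_ultra_sota.items() if n > w}
print(missed)  # prints {'text and reasoning': 2}
```

Both benchmarks where Gemini Ultra did not set a new state of the art fall in the text and reasoning category, which is worth keeping in mind if your use case is primarily text-based.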

The most important claim is Gemini's cross-modal reasoning capability (CMRC): it can combine what it learns from one form of input (for example, an image) with other forms (text, voice or video) to draw joint conclusions. Figure 3 illustrates this with an example.

Figure 3: Example from the Gemini technical report (Reference 2)

In this educational example, the model evaluates a student's solution to an assignment. "The model is able to understand the messy handwriting, correctly understand the problem formulation, convert both the problem and solution to mathematical typesetting, identify the specific step of reasoning where the student went wrong in solving the problem, and then give a worked through correct solution to the problem" - Google.

What does this mean to your business?

As one would expect, each new model, whether from the same provider or a competitor, tends to outperform the current state-of-the-art models. However, the important question to ponder is: what is critical for your specific business case?

LLMs are generally known to have these three problems:

  1. They tend to hallucinate, that is, convincingly make up information that does not actually exist
  2. They are weak at numerical reasoning, especially in multi-step problems
  3. They are susceptible to jailbreaking: carefully engineered prompts can lead them to provide unethical, biased or dangerous information

Some example business cases by business function

As products like Google Gemini AI provide cutting-edge capabilities, it helps to study many example use cases before settling on ideas that will work well for your specific business scenario. You may not have identified an applicable use case yet, but reviewing many examples of successful LLM implementations can help you find one.

The high-level examples that follow are meant to start a narrowing-down process that ultimately leads to one or a few fruitful business cases in the context in which you operate.

Finance: Interpret physical invoices, payment vouchers, bank statements and other financial documents to prepare journals or summaries that cross-check system entries

Human Resources: Summarize the vast body of research in behavioral sciences to guide HR policies that nudge employees toward better performance

Operations: Pilot image-based recognition, analysis and reconciliation of physical items in place of expensive, maintenance-heavy technologies such as RFID (Radio-Frequency Identification)

Audit and Compliance: Use visual surveillance and audit in place of manual photography and checklists (for example, store visual compliance)

Customer support: Leverage voice interactions to improve service

Marketing and sales: Develop marketing headlines for given products, promotions, events or campaigns

Research and Development: Summarize large volumes of information on current products or services and ask LLMs to suggest new ideas
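
To make the Finance example more concrete, the sketch below shows the cross-checking step that might follow the LLM's reading of scanned invoices. It assumes, hypothetically, that the model's output has already been parsed into records with an `invoice_id`, line items and a stated total; all field names and values are illustrative. Given the weakness in numerical reasoning noted earlier, it recomputes totals from the line items instead of trusting the model's arithmetic, and flags anything that needs human review.

```python
from decimal import Decimal

# Hypothetical LLM extraction results, already parsed into records.
# Each line item is (description, quantity, unit price).
extracted = [
    {"invoice_id": "INV-001",
     "lines": [("Widgets", 2, Decimal("10.00")), ("Shipping", 1, Decimal("5.00"))],
     "stated_total": Decimal("25.00")},
    {"invoice_id": "INV-002",
     "lines": [("Service", 3, Decimal("40.00"))],
     "stated_total": Decimal("130.00")},
    {"invoice_id": "INV-003",
     "lines": [("Parts", 1, Decimal("99.99"))],
     "stated_total": Decimal("99.99")},
]

# Hypothetical amounts already booked in the accounting system.
ledger = {"INV-001": Decimal("25.00"), "INV-002": Decimal("120.00")}


def reconcile(extracted, ledger):
    """Return (invoice_id, issue) pairs that need human review."""
    issues = []
    for inv in extracted:
        inv_id = inv["invoice_id"]
        # Recompute the total deterministically rather than trusting the model's arithmetic.
        computed = sum((qty * price for _, qty, price in inv["lines"]), Decimal("0"))
        if computed != inv["stated_total"]:
            issues.append((inv_id, f"line items sum to {computed}, document states {inv['stated_total']}"))
        if inv_id not in ledger:
            issues.append((inv_id, "not found in system"))
        elif ledger[inv_id] != computed:
            issues.append((inv_id, f"system has {ledger[inv_id]}, document implies {computed}"))
    return issues


for inv_id, issue in reconcile(extracted, ledger):
    print(inv_id, "->", issue)
```

The design choice here is to let the LLM do the reading and let deterministic code do the arithmetic and matching; `Decimal` avoids floating-point rounding errors in currency amounts.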

Thoughts to take away and ponder further

The examples above reiterate the point that LLM evaluations need to be interpreted in the context of the business case for which you plan to use a tool like ChatGPT or Google Gemini. This may require designing specific evaluations that demonstrate adequate performance for your business case, in your particular setting and over time.
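
One way to start designing such an evaluation is a small, business-specific test suite with known correct answers, run against whichever model you are considering. The sketch below is a minimal version under stated assumptions: `fake_model` is a hypothetical stand-in for a real ChatGPT or Gemini API call, the questions and answers are illustrative, exact match after normalization is assumed to be an acceptable scoring rule, and the 0.9 pass threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison; a real evaluation may need
    # task-specific scoring (numeric tolerance, rubric grading, and so on).
    return " ".join(text.lower().split())


def evaluate(ask_model: Callable[[str], str], cases: list[EvalCase],
             threshold: float) -> tuple[float, bool, list[EvalCase]]:
    """Score a model on the cases; return (accuracy, passed threshold?, failed cases)."""
    failures = [c for c in cases if normalize(ask_model(c.prompt)) != normalize(c.expected)]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, accuracy >= threshold, failures


# Hypothetical stand-in for a call to ChatGPT, Gemini or any other model.
canned = {
    "Currency on invoice INV-001?": "EUR",
    "Due date on invoice INV-001?": "2024-01-31",
    "Supplier name on invoice INV-001?": "acme  ltd",
}


def fake_model(prompt: str) -> str:
    return canned.get(prompt, "")


cases = [
    EvalCase("Currency on invoice INV-001?", "EUR"),
    EvalCase("Due date on invoice INV-001?", "2024-01-30"),
    EvalCase("Supplier name on invoice INV-001?", "Acme Ltd"),
]

accuracy, passed, failures = evaluate(fake_model, cases, threshold=0.9)
print(f"accuracy={accuracy:.2f} passed={passed}")  # prints "accuracy=0.67 passed=False"
for c in failures:
    print("FAILED:", c.prompt)
```

Re-running the same suite whenever a provider releases a new model version, or when your own documents change, gives a repeatable way to track performance over time instead of relying only on published benchmark results.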

References

  1. https://blog.google/technology/ai/google-gemini-ai/#performance
  2. https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
  3. https://www.toolify.ai/gemini-ai/unleash-the-power-of-gemini-googles-gamechanging-gpt4-model-158188
  4. https://crfm.stanford.edu/helm/latest/#/groups/boolq
  5. https://aclanthology.org/N19-1300/
