What Google Gemini AI means to your business!

Srikrishna KrishnaraoSrinivasan

ML AI Research and Innovation | Business Solutions | Decision Automation | Advanced Analytics | Data, Information Governance | Leadership | Consulting

Published: December 12, 2023

Introduction

Large Language Models (LLMs) are evaluated on a standard set of tasks that are well recognized in the generative AI space. Google has published a report on the evaluation of Gemini, its generative AI model and a competitor of ChatGPT. This article summarizes what Google Gemini and its evaluation mean for your business.

Background of Google Gemini compared to ChatGPT

ChatGPT version 3.5 is a Large Language Model (LLM) capable of many text-oriented natural language tasks. ChatGPT version 4 is a multi-modal Large Language Model that accepts both text and images as inputs.

Google Gemini is claimed to be a slightly better multi-modal LLM in that it can generalize across, seamlessly understand, operate on and combine different types of information, including text, code, audio, image and video.

Gemini 1.0, the first version, has three different sizes:

Gemini Ultra — the largest and most capable model for highly complex tasks.
Gemini Pro — the best model for scaling across a wide range of tasks.
Gemini Nano — the most efficient model for on-device tasks.

An overview of LLM evaluation benchmarks

Stanford has a reference framework for evaluating LLMs. It is called HELM (Holistic Evaluation of Language Models) and is managed by the Center for Research on Foundation Models (CRFM).

The evaluation model is summarized as shown in Figure 1.

As of December 2023, 119 models have been evaluated on 116 scenarios with 110 metrics. Summarizing (or reading) all of them in one go is a daunting task, so a condensed portion is highlighted in Figure 2.

Figure 2: Typical evaluations conducted on LLMs

Evaluation of Google Gemini

According to the Google Gemini evaluation report, Gemini Ultra achieves new state-of-the-art results on 30 of 32 benchmarks, including:

10 of 12 popular text and reasoning benchmarks,
9 of 9 image understanding benchmarks,
6 of 6 video understanding benchmarks, and
5 of 5 speech recognition and speech translation benchmarks
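As a quick sanity check, the per-modality counts listed above can be tallied to confirm that they add up to the headline figure. The snippet below is only a minimal sketch: the dictionary name `reported` and the helper `tally` are illustrative, and the numbers are copied from the list above rather than from any API.

```python
# Per-modality counts as listed above: (state-of-the-art results, benchmarks evaluated)
reported = {
    "text and reasoning": (10, 12),
    "image understanding": (9, 9),
    "video understanding": (6, 6),
    "speech recognition and translation": (5, 5),
}


def tally(counts):
    """Return (total state-of-the-art results, total benchmarks)."""
    wins = sum(w for w, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return wins, total


wins, total = tally(reported)
print(f"{wins} of {total} benchmarks")
```

The two benchmarks where Gemini Ultra does not set a new state of the art both fall in the text and reasoning group.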

The most important claim is cross-modal reasoning capability (CMRC): the model can combine what it understands from one modality (e.g., an image) with what it understands from others (text, voice or video) to draw collective conclusions. Figure 3 provides an example.

In an educational setting, this is an evaluation of a student's solution for an assignment. "The model is able to understand the messy handwriting, correctly understand the problem formulation, convert both the problem and solution to mathematical typesetting, identify the specific step of reasoning where the student went wrong in solving the problem, and then give a worked through correct solution to the problem" - Google.

What does this mean to your business?

As one might expect, each new iteration of a model, whether from the same provider or a competitor, tends to outperform the previous state-of-the-art models. The important question to ponder, however, is: "What is critical for your specific business case?"

LLMs are generally known to have these three problems:

They tend to hallucinate (convincingly make up information that does not actually exist)
They are weak at numerical reasoning, particularly in multi-step problems
They are susceptible to jail-breaking through prompt engineering, which can lead them to provide unethical, biased or dangerous information

Some example business cases/business functions

Because products like Google Gemini AI provide cutting-edge capabilities, it helps to review many example use cases before settling on ideas that will work well for your business scenario. You may not have identified an applicable business use case yet, but one may emerge after reviewing examples where LLMs have been implemented successfully.

The examples that follow are high-level. They are meant to start a narrowing-down process that ultimately leads to one or a few fruitful business cases in the context in which you operate.

Finance: Interpret physical invoices, payment vouchers, bank statements and other such financial instruments to prepare journals or summaries for cross-checking system entries

Human Resources: Get a summary of the vast amount of research in behavioral sciences to guide HR policies that nudge employees toward better performance

Operations: Develop test runs of image-based recognition, analysis and reconciliation of physical items in place of expensive, maintenance-heavy technologies such as RFID (Radio Frequency Identification)

Audit and Compliance: Visual surveillance and audit in place of manual photography and checklists (for example, store visual compliance)

Customer support: Leverage voice interactions for improving service

Marketing and sales: Develop marketing headlines for given products, promotions, events or campaigns

Research and Development: Summarize vast information on current products or services and get suggestions for new ideas from LLMs

Thoughts to take away and ponder further

The examples above are samples that reiterate one point: better benchmark results for LLMs need to be seen in the context of the business case for which you plan to use a tool like ChatGPT or Google Gemini. You may need to design specific evaluations that provide a guarantee of performance for your specific business case, in its own context of place and time.
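To make the idea of a business-specific evaluation concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `EvalCase` type, the `pass_rate` function, the invoice prompts, the 0.9 acceptance bar, and `stub_model`, which stands in for a real LLM call. Note that a pass rate measured on a handful of cases gives evidence about performance rather than a guarantee of it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str    # question drawn from your own business data
    expected: str  # answer a domain expert has verified


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count as failures."""
    return " ".join(text.lower().split())


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases where the model's answer matches the expected answer."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    passed = sum(normalize(model(c.prompt)) == normalize(c.expected) for c in cases)
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; replace with your provider's client.
    canned = {
        "Invoice total for INV-001?": "1,250.00 USD",
        "Invoice total for INV-002?": "980.00 usd",
    }
    return canned.get(prompt, "unknown")


# Hypothetical finance cases; in practice these come from verified records.
cases = [
    EvalCase("Invoice total for INV-001?", "1,250.00 USD"),
    EvalCase("Invoice total for INV-002?", "980.00 USD"),
    EvalCase("Invoice total for INV-003?", "410.50 USD"),
]

THRESHOLD = 0.9  # hypothetical acceptance bar set by the business
rate = pass_rate(stub_model, cases)
print(f"pass rate: {rate:.2f}, meets bar: {rate >= THRESHOLD}")
```

Exact matching after normalization is the simplest scoring rule. Tasks such as summarization or headline generation would need a different scorer, for example human review or a rubric.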
