What Google Gemini AI means to your business!
Srikrishna KrishnaraoSrinivasan
ML AI Research and Innovation | Business Solutions | Decision Automation | Advanced Analytics | Data, Information Governance | Leadership | Consulting
Introduction
Large Language Models are evaluated on a standard set of tasks that are well recognized in the generative AI space. Google has published a report on the evaluation of Gemini, its generative AI model and a competitor of ChatGPT. This article summarizes what Google Gemini and its evaluation mean for your business.
Background of Google Gemini compared to ChatGPT
ChatGPT version 3.5 is a Large Language Model (LLM) capable of many text-oriented natural language tasks. ChatGPT version 4 is a multi-modal Large Language Model that can handle text and images as inputs.
Google Gemini is claimed to be a slightly better multi-modal LLM in that it can generalize and seamlessly understand, operate across and combine different types of information, including text, code, audio, image and video.
Gemini 1.0, the first version, comes in three sizes: Ultra, Pro and Nano.
An overview of LLM evaluation benchmarks
Stanford has a reference framework for evaluating LLMs. It is called HELM (Holistic Evaluation of Language Models) and is managed by the Center for Research on Foundation Models (CRFM).
The evaluation framework is summarized in Figure 1.
As of December 2023, 119 models had been evaluated on 116 scenarios, generating 110 metrics. While it is a daunting task to summarize (or read) all of them in one go, a summarized portion is highlighted in Figure 2.
Evaluation of Google Gemini
The evaluation report of Google Gemini states that Gemini Ultra achieves new state-of-the-art results in 30 of 32 benchmarks that include:
The important claim being made concerns cross-modal reasoning capabilities (CMRC). That is, the model can combine what it learns from one form of input (e.g., an image) with other forms of input (text, voice or video) to draw collective conclusions. To illustrate this further, an example is provided in Figure 3.
In an educational setting, this is an evaluation of a student's solution for an assignment. "The model is able to understand the messy handwriting, correctly understand the problem formulation, convert both the problem and solution to mathematical typesetting, identify the specific step of reasoning where the student went wrong in solving the problem, and then give a worked through correct solution to the problem" - Google.
What does this mean to your business?
As one can expect, each new iteration of a model, whether from the same provider or a competitor, tends to improve on the current state-of-the-art models. However, the important question to ponder is, "What is critical for your specific business case?"
LLMs are generally known to have these three problems:
Some example business cases/business functions
As products like Google Gemini AI provide cutting-edge capabilities, we need to familiarize ourselves with many examples of use cases before arriving at ideas that will work well for a specific business scenario. You may not have identified an applicable business use case yet, but you may identify one after reviewing many examples where LLMs have been successfully implemented.
The examples that follow are at a high level. They are meant to start a process of narrowing down that ultimately leads to one or a few fruitful business cases in the context in which you operate.
Finance: Interpret physical invoices, payment vouchers, bank statements and other such financial documents to prepare journals or summaries for cross-checking system entries
Human Resources: Get a summary of the vast body of research in behavioral sciences to guide HR policies that will nudge employees toward better performance
Operations: Develop test runs of image-based recognition, analysis and reconciliation of physical items in place of expensive and maintenance-heavy technologies such as RFID (Radio Frequency Identification)
Audit and Compliance: Use visual surveillance and audit in place of manual photography and checklists (for example, store visual compliance)
Customer support: Leverage voice interactions to improve service
Marketing and sales: Develop marketing headlines for given products, promotions, events or campaigns
Research and Development: Summarize vast information on current products or services to get suggestions for new ideas from LLMs
Thoughts to take away and ponder further
The examples discussed above are samples that reiterate one point: better evaluation results for an LLM need to be seen in the context of the business case for which you plan to use a tool like ChatGPT or Google Gemini. This may require designing specific evaluations that give you assurance of performance for your particular business case, in the place and time in which you operate.
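As a rough illustration of what a business-specific evaluation could look like, the sketch below scores a model against a small set of prompts with expert-approved answers. Everything in it is an assumption made for illustration: the EvalCase structure, the exact-match scoring rule, the invoice prompts and the stub_model function, which stands in for a real call to a tool like ChatGPT or Google Gemini.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str    # question taken from your own business documents
    expected: str  # answer approved by a domain expert


def exact_match(prediction: str, expected: str) -> bool:
    # Simplest possible scoring rule (an assumption); real cases may need
    # numeric tolerance or human review instead.
    return prediction.strip().lower() == expected.strip().lower()


def evaluate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    # Returns the fraction of cases the model answers correctly.
    if not cases:
        raise ValueError("at least one evaluation case is required")
    hits = sum(exact_match(model(c.prompt), c.expected) for c in cases)
    return hits / len(cases)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; replace with your provider's client.
    answers = {"What is the total on invoice INV-001?": "1200.00"}
    return answers.get(prompt, "unknown")


# Hypothetical finance cases, not real data.
cases = [
    EvalCase("What is the total on invoice INV-001?", "1200.00"),
    EvalCase("What is the total on invoice INV-002?", "830.50"),
]

accuracy = evaluate(stub_model, cases)
print(f"Accuracy on business-specific cases: {accuracy:.0%}")
```

The same set of cases can be rerun whenever a provider releases a new model version, so the comparison stays anchored to your business case instead of to general benchmarks.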