登录查看更多内容

Image Recognition Battle: Google vs. OpenAI

Tiran Dagan

Strategy, Transformation & Alliances Executive | Sales Management & Revenue Optimization | Partner & Alliance Management | Strategic & Financial Planning | Offering & Product Lifecycle Management

发布日期: 2024年1月4日

+ 关注

I looked at a few APIs for image analysis and sent them all the same prompt. Here are the services I explored:

Google Bard (whose image recognition is actually Vertex AI ), try this in your own Google Cloud console (link to VertexAI image query )
OpenAI Vision (accessed via the Chat Completion API, using the gpt-4-vision-preview model
AWS Rekognition

I quickly discarded AWS's service because it is not easy to implement through CURL and requires building a model and custom code. I was looking to quickly send API calls using flutterflow. So how do Google and OpenAI's services compare?

Please note that both services are still in preview mode, so they might improve and invalidate my results. This is a very specific use case, so do not draw conclusions on your specific task. I also wanted to use an LLM model out of the box (no custom training)

Here is the image I provided:

I sent a detailed system prompt requesting the API return an analysis of the contents of the fridge in JSON format. I gave detailed instructions on the format of the results, and structure of the JSON and asked the LLM models to look for occluded items, provide quantities, best estimates of the volume of the containers and the location within the fridge.

I provided the same prompt across both services. Here are the outputs (scroll past the outputs to see a summary and analysis of the results):

Open AI Vision GPT ("gpt-4-vision-preview")

{
  "Description": "A refrigerator with a left door and main storage area, filled with various food items and condiments. The shelves and compartments are well-stocked.",
  "Numerical confidence": 0.7,
  "Improvements": "To improve recognition, provide higher resolution images, better lighting, and remove occlusions by repositioning items for a clearer view.",
  "Ingredients": [
    {
      "Ingredient": "Butter",
      "Object Type": "Box",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 8.0,
        "Units": "Oz",
        "Multiples": 1
      },
      "Description": "Yellowish block",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Yogurt",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 5.3,
        "Units": "Oz",
        "Multiples": 2
      },
      "Description": "White container with label",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Eggs",
      "Object Type": "Carton",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Dozen",
        "Multiples": 1
      },
      "Description": "White and brown eggs visible",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Bell peppers",
      "Object Type": "Plastic bag",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Green and red peppers",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Salad mix",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 5.0,
        "Units": "Oz",
        "Multiples": 2
      },
      "Description": "Green leafy vegetables",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Tomatoes",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Red round tomatoes",
      "Location": "Vegetable drawer"
    },
    {
      "Ingredient": "Carrots",
      "Object Type": "Plastic bag",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Orange elongated vegetables",
      "Location": "Vegetable drawer"
    },
    {
      "Ingredient": "Milk",
      "Object Type": "Plastic bottle",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 0.5,
        "Units": "Gallons",
        "Multiples": 1
      },
      "Description": "White liquid",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Condiments",
      "Object Type": "Glass containers",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 12.0,
        "Units": "Oz",
        "Multiples": 5
      },
      "Description": "Various sauces and spreads",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Cheese",
      "Object Type": "Plastic wrap",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 8.0,
        "Units": "Oz",
        "Multiples": 1
      },
      "Description": "Yellowish block",
      "Location": "Main shelf"
    }
  ]
}

Google Bard/VertexAI

I used the new multi-modal API which is exposed through Bard beta and is available via the Vertex AI API which I was given access to. Here are the results from VertexAI:

{
"Description": "A refrigerator with two doors and two shelves. The left door has three shelves and the main storage has two shelves. The shelves are made of glass and the walls are white.",
"Numerical Confidence": 0.8,
"Improvements": "Higher resolution images and better lighting would improve the recognition.",
"Ingredients": [
{
"Ingredient": "Eggs",
"Object Type": "Carton",
"Brand": "Unknown",
"Quantity": {
"Number": 12,
"Units": null,
"Multiples": 1
},
"Description": "White eggs in a white carton",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Milk",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "White milk in a white plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Butter",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow butter in a yellow plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cheese",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow cheese in a yellow plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Yogurt",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Quarts",
"Multiples": 1
},
"Description": "White yogurt in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "White fish in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Bread",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Loaf",
"Multiples": 1
},
"Description": "White bread in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cereal",
"Object Type": "Box",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Box",
"Multiples": 1
},
"Description": "Blue and yellow cereal box",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Juice",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Orange juice in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Soda",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Black soda in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Water",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Clear water in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Condiments",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bottle",
"Multiples": 1
},
"Description": "Red and yellow condiments in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Spices",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "Green and red spices in a clear plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Frozen Food",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bag",
"Multiples": 1
},
"Description": "White frozen food in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Ice Cream",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "White ice cream in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {

Notice the JSON is truncated at the end of the Google query? It reached the maximum token count of (2048) so there are more items that this query would spit out - I hope to figure out how to access a larger token count (or if the model is currently limited at this point). I didn't pursue the token further because the difference between OpenAI and bard is glaringly obvious, here is a tabulation of the results side by side:

MIT Technology Review 12 个月前

Will Microsoft Acquire OpenAI?

Michael Spencer 2 年前

Recipe for Instant Homemade AI Agents With Azure…

John Maeda 2 个月前

GPT Vision

Pluses

Did a much better job identifying the structure of the fridge (door versus main shelf)
Did a better job finding 5 containers of condiments in the door, while Google identified a single plastic bottle
Used more accurate language in quantifying items, e.g. "1 Dozen eggs"
Did a better job at estimating the volume of the contents (e.g. 2x 5Oz yogurt containers, vs. Google finding 1 Quart Yogurt)

Challenges

Stopped at a very small number of items, and did not reach the maximum token count (I gave it a 5000 token limit)

Google

Pluses

Picked up on many more products (fruit, vegetables, meat, fish bread, cereal, etc)
Much much quicker response (30s vs. 1 minute for a smaller list from GPT-Vision)

Challenges

Had too much hallucinations: some of the products (Fish, Ice Cream) are incorrect - clearly I don't have Ice Cream in my fridge

Summary

Neither service understood the structure of the photo (various shelves and locations in the door), and they missed several occluded items. Both missed the peppers (both red and green) and the lettuce (though it is possible Vertex AI would provide these if it didnt expire the tokens).

Overall - I give Google VertexAI a thumbs up on this task - it still needs some work and possibly a customer model, but the speed of the response and the level of details in identifying objects was very impressive!

What do you think about this experiment? Any comments or suggestions on improving it?

Ricardo Cortes

Data Analyst / Open to Work! | SQL | Looker | Excel | Python

2 个月

Thank you for this fascinating article. I appreciate your effort in testing and explaining the use of this API.

Jozsef Boros

Business Development Director, Playbook Labs & 815Labs - powered by Mito Group

10 个月

Congrats! It is obvious. We have to learn more and more about AI its limitations, and possible use cases. I had a similar experience with my bookshelves. As you did?without any specialized training. The result is fascinating.

John Gnotek

Communications, Media & Technology Consultant at Cognizant

10 个月

Awesome experiment. Well done.

Chad Andrews

Global AI Solutions Leader, M&E | Sand Technologies

10 个月

On behalf of peppers everywhere, I am outraged. But very interesting experiment. Thanks for sharing.

查看更多评论

要查看或添加评论，请登录

查看全部

Image Recognition Battle: Google vs. OpenAI

Tiran Dagan

Strategy, Transformation & Alliances Executive | Sales Management & Revenue Optimization | Partner & Alliance Management | Strategic & Financial Planning | Offering & Product Lifecycle Management

Open AI Vision GPT ("gpt-4-vision-preview")

Google Bard/VertexAI

领英推荐

GPT Vision

Google

Summary

更多精彩文章

社区洞察

其他会员也浏览了

Google's Imagen Is More Relatable than OpenAI's DALL-E 2

OpenAI's SearchGPT: It's About the Optics

A Comprehensive Guide to Azure OpenAI Service

Transform Your Business with Azure OpenAI Services: The Future of AI Integration

Google Gemma – Gemini junior

Issue #283 - The ML Engineer ??

Azure OpenAI Tutorial: Mastering AI with Azure and OpenAI

Is Databricks + MosaicML now competing with OpenAI, Vertex, Azure and Bedrock?

Using Amazon OpenSearch Serverless Vector Search and OpenAI CLIP Multimodal Model for Semantic Image?Search

ICYMI: All Azure OpenAI announcements at Microsoft Build 2024

Open AI Vision GPT ("gpt-4-vision-preview")

Google Bard/VertexAI

领英推荐

GPT Vision

Google

Summary

Tiran Dagan: Master Index

2024年3月4日

The ISG Operating Model

2024年2月29日

Financial Analysis of Non-Profit Media Companies

2024年2月22日

Chain of Thought: A New Frontier in Prompt Engineering

2024年1月26日

The Paragraph Method: An Ingenious Prompt Engineering Method

2024年1月25日

Efforts to Jam Starlink Communications in Ukraine: A Detailed Analysis

2023年12月21日

Chapter 4: Beyond Dollars: ISG's Influence and Brand Impact

2023年11月28日

From Myth to Mastery: 2) Bridging Myths and Mechanisms – The Age of Enlightenment and Mechanical Wonders

2023年10月20日

The Chinese Room Argument in the Age of AGI and Generative AI

2023年10月20日

From Myth to Mastery: Tracing the Evolution of Artificial Intelligence

2023年10月20日

社区洞察

其他会员也浏览了

Google's Imagen Is More Relatable than OpenAI's DALL-E 2

OpenAI's SearchGPT: It's About the Optics

A Comprehensive Guide to Azure OpenAI Service

Transform Your Business with Azure OpenAI Services: The Future of AI Integration

Google Gemma – Gemini junior

Issue #283 - The ML Engineer ??

Azure OpenAI Tutorial: Mastering AI with Azure and OpenAI

Is Databricks + MosaicML now competing with OpenAI, Vertex, Azure and Bedrock?

Using Amazon OpenSearch Serverless Vector Search and OpenAI CLIP Multimodal Model for Semantic Image?Search

ICYMI: All Azure OpenAI announcements at Microsoft Build 2024