Image Recognition Battle: Google vs. OpenAI
Image created with DALL-E

Image Recognition Battle: Google vs. OpenAI

I looked at a few APIs for image analysis and sent them all the same prompt. Here are the services I explored:

I quickly discarded AWS's service because it is not easy to implement through CURL and requires building a model and custom code. I was looking to quickly send API calls using flutterflow. So how do Google and OpenAI's services compare?

Please note that both services are still in preview mode, so they might improve and invalidate my results. This is a very specific use case, so do not draw conclusions on your specific task. I also wanted to use an LLM model out of the box (no custom training)

Here is the image I provided:

I sent a detailed system prompt requesting the API return an analysis of the contents of the fridge in JSON format. I gave detailed instructions on the format of the results, and structure of the JSON and asked the LLM models to look for occluded items, provide quantities, best estimates of the volume of the containers and the location within the fridge.

I provided the same prompt across both services. Here are the outputs (scroll past the outputs to see a summary and analysis of the results):

Open AI Vision GPT ("gpt-4-vision-preview")

{
  "Description": "A refrigerator with a left door and main storage area, filled with various food items and condiments. The shelves and compartments are well-stocked.",
  "Numerical confidence": 0.7,
  "Improvements": "To improve recognition, provide higher resolution images, better lighting, and remove occlusions by repositioning items for a clearer view.",
  "Ingredients": [
    {
      "Ingredient": "Butter",
      "Object Type": "Box",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 8.0,
        "Units": "Oz",
        "Multiples": 1
      },
      "Description": "Yellowish block",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Yogurt",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 5.3,
        "Units": "Oz",
        "Multiples": 2
      },
      "Description": "White container with label",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Eggs",
      "Object Type": "Carton",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Dozen",
        "Multiples": 1
      },
      "Description": "White and brown eggs visible",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Bell peppers",
      "Object Type": "Plastic bag",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Green and red peppers",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Salad mix",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 5.0,
        "Units": "Oz",
        "Multiples": 2
      },
      "Description": "Green leafy vegetables",
      "Location": "Main shelf"
    },
    {
      "Ingredient": "Tomatoes",
      "Object Type": "Plastic container",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Red round tomatoes",
      "Location": "Vegetable drawer"
    },
    {
      "Ingredient": "Carrots",
      "Object Type": "Plastic bag",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 1.0,
        "Units": "Lbs",
        "Multiples": 1
      },
      "Description": "Orange elongated vegetables",
      "Location": "Vegetable drawer"
    },
    {
      "Ingredient": "Milk",
      "Object Type": "Plastic bottle",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 0.5,
        "Units": "Gallons",
        "Multiples": 1
      },
      "Description": "White liquid",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Condiments",
      "Object Type": "Glass containers",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 12.0,
        "Units": "Oz",
        "Multiples": 5
      },
      "Description": "Various sauces and spreads",
      "Location": "Door shelf"
    },
    {
      "Ingredient": "Cheese",
      "Object Type": "Plastic wrap",
      "Brand": "Unknown",
      "Quantity": {
        "Number": 8.0,
        "Units": "Oz",
        "Multiples": 1
      },
      "Description": "Yellowish block",
      "Location": "Main shelf"
    }
  ]
}        

Google Bard/VertexAI

I used the new multi-modal API which is exposed through Bard beta and is available via the Vertex AI API which I was given access to. Here are the results from VertexAI:

{
"Description": "A refrigerator with two doors and two shelves. The left door has three shelves and the main storage has two shelves. The shelves are made of glass and the walls are white.",
"Numerical Confidence": 0.8,
"Improvements": "Higher resolution images and better lighting would improve the recognition.",
"Ingredients": [
{
"Ingredient": "Eggs",
"Object Type": "Carton",
"Brand": "Unknown",
"Quantity": {
"Number": 12,
"Units": null,
"Multiples": 1
},
"Description": "White eggs in a white carton",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Milk",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "White milk in a white plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Butter",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow butter in a yellow plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cheese",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow cheese in a yellow plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Yogurt",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Quarts",
"Multiples": 1
},
"Description": "White yogurt in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "White fish in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Bread",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Loaf",
"Multiples": 1
},
"Description": "White bread in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cereal",
"Object Type": "Box",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Box",
"Multiples": 1
},
"Description": "Blue and yellow cereal box",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Juice",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Orange juice in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Soda",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Black soda in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Water",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Clear water in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Condiments",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bottle",
"Multiples": 1
},
"Description": "Red and yellow condiments in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Spices",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "Green and red spices in a clear plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Frozen Food",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bag",
"Multiples": 1
},
"Description": "White frozen food in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Ice Cream",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "White ice cream in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {        

Notice the JSON is truncated at the end of the Google query? It reached the maximum token count of (2048) so there are more items that this query would spit out - I hope to figure out how to access a larger token count (or if the model is currently limited at this point). I didn't pursue the token further because the difference between OpenAI and bard is glaringly obvious, here is a tabulation of the results side by side:

GPT Vision

Pluses

  • Did a much better job identifying the structure of the fridge (door versus main shelf)
  • Did a better job finding 5 containers of condiments in the door, while Google identified a single plastic bottle
  • Used more accurate language in quantifying items, e.g. "1 Dozen eggs"
  • Did a better job at estimating the volume of the contents (e.g. 2x 5Oz yogurt containers, vs. Google finding 1 Quart Yogurt)

Challenges

  • Stopped at a very small number of items, and did not reach the maximum token count (I gave it a 5000 token limit)


Google

Pluses

  • Picked up on many more products (fruit, vegetables, meat, fish bread, cereal, etc)
  • Much much quicker response (30s vs. 1 minute for a smaller list from GPT-Vision)

Challenges

  • Had too much hallucinations: some of the products (Fish, Ice Cream) are incorrect - clearly I don't have Ice Cream in my fridge

Summary

Neither service understood the structure of the photo (various shelves and locations in the door), and they missed several occluded items. Both missed the peppers (both red and green) and the lettuce (though it is possible Vertex AI would provide these if it didnt expire the tokens).

Overall - I give Google VertexAI a thumbs up on this task - it still needs some work and possibly a customer model, but the speed of the response and the level of details in identifying objects was very impressive!

What do you think about this experiment? Any comments or suggestions on improving it?


Ricardo Cortes

Data Analyst / Open to Work! | SQL | Looker | Excel | Python

2 个月

Thank you for this fascinating article. I appreciate your effort in testing and explaining the use of this API.

回复
Jozsef Boros

Business Development Director, Playbook Labs & 815Labs - powered by Mito Group

10 个月

Congrats! It is obvious. We have to learn more and more about AI its limitations, and possible use cases. I had a similar experience with my bookshelves. As you did?without any specialized training. The result is fascinating.

回复
John Gnotek

Communications, Media & Technology Consultant at Cognizant

10 个月

Awesome experiment. Well done.

回复
Chad Andrews

Global AI Solutions Leader, M&E | Sand Technologies

10 个月

On behalf of peppers everywhere, I am outraged. But very interesting experiment. Thanks for sharing.

回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了