Image Recognition Battle: Google vs. OpenAI
Tiran Dagan
Strategy, Transformation & Alliances Executive | Sales Management & Revenue Optimization | Partner & Alliance Management | Strategic & Financial Planning | Offering & Product Lifecycle Management
I looked at a few APIs for image analysis and sent them all the same prompt. Here are the services I explored:
I quickly discarded AWS's service because it is not easy to implement through CURL and requires building a model and custom code. I was looking to quickly send API calls using flutterflow. So how do Google and OpenAI's services compare?
Please note that both services are still in preview mode, so they might improve and invalidate my results. This is a very specific use case, so do not draw conclusions on your specific task. I also wanted to use an LLM model out of the box (no custom training)
Here is the image I provided:
I sent a detailed system prompt requesting the API return an analysis of the contents of the fridge in JSON format. I gave detailed instructions on the format of the results, and structure of the JSON and asked the LLM models to look for occluded items, provide quantities, best estimates of the volume of the containers and the location within the fridge.
I provided the same prompt across both services. Here are the outputs (scroll past the outputs to see a summary and analysis of the results):
Open AI Vision GPT ("gpt-4-vision-preview")
{
"Description": "A refrigerator with a left door and main storage area, filled with various food items and condiments. The shelves and compartments are well-stocked.",
"Numerical confidence": 0.7,
"Improvements": "To improve recognition, provide higher resolution images, better lighting, and remove occlusions by repositioning items for a clearer view.",
"Ingredients": [
{
"Ingredient": "Butter",
"Object Type": "Box",
"Brand": "Unknown",
"Quantity": {
"Number": 8.0,
"Units": "Oz",
"Multiples": 1
},
"Description": "Yellowish block",
"Location": "Door shelf"
},
{
"Ingredient": "Yogurt",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 5.3,
"Units": "Oz",
"Multiples": 2
},
"Description": "White container with label",
"Location": "Door shelf"
},
{
"Ingredient": "Eggs",
"Object Type": "Carton",
"Brand": "Unknown",
"Quantity": {
"Number": 1.0,
"Units": "Dozen",
"Multiples": 1
},
"Description": "White and brown eggs visible",
"Location": "Main shelf"
},
{
"Ingredient": "Bell peppers",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1.0,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green and red peppers",
"Location": "Main shelf"
},
{
"Ingredient": "Salad mix",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 5.0,
"Units": "Oz",
"Multiples": 2
},
"Description": "Green leafy vegetables",
"Location": "Main shelf"
},
{
"Ingredient": "Tomatoes",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1.0,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red round tomatoes",
"Location": "Vegetable drawer"
},
{
"Ingredient": "Carrots",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1.0,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Orange elongated vegetables",
"Location": "Vegetable drawer"
},
{
"Ingredient": "Milk",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 0.5,
"Units": "Gallons",
"Multiples": 1
},
"Description": "White liquid",
"Location": "Door shelf"
},
{
"Ingredient": "Condiments",
"Object Type": "Glass containers",
"Brand": "Unknown",
"Quantity": {
"Number": 12.0,
"Units": "Oz",
"Multiples": 5
},
"Description": "Various sauces and spreads",
"Location": "Door shelf"
},
{
"Ingredient": "Cheese",
"Object Type": "Plastic wrap",
"Brand": "Unknown",
"Quantity": {
"Number": 8.0,
"Units": "Oz",
"Multiples": 1
},
"Description": "Yellowish block",
"Location": "Main shelf"
}
]
}
Google Bard/VertexAI
I used the new multi-modal API which is exposed through Bard beta and is available via the Vertex AI API which I was given access to. Here are the results from VertexAI:
{
"Description": "A refrigerator with two doors and two shelves. The left door has three shelves and the main storage has two shelves. The shelves are made of glass and the walls are white.",
"Numerical Confidence": 0.8,
"Improvements": "Higher resolution images and better lighting would improve the recognition.",
"Ingredients": [
{
"Ingredient": "Eggs",
"Object Type": "Carton",
"Brand": "Unknown",
"Quantity": {
"Number": 12,
"Units": null,
"Multiples": 1
},
"Description": "White eggs in a white carton",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Milk",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "White milk in a white plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Butter",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow butter in a yellow plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cheese",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Yellow cheese in a yellow plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Yogurt",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Quarts",
"Multiples": 1
},
"Description": "White yogurt in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "White fish in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Bread",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Loaf",
"Multiples": 1
},
"Description": "White bread in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Cereal",
"Object Type": "Box",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Box",
"Multiples": 1
},
"Description": "Blue and yellow cereal box",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Juice",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Orange juice in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Soda",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Black soda in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Water",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Gallon",
"Multiples": 1
},
"Description": "Clear water in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Condiments",
"Object Type": "Plastic bottle",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bottle",
"Multiples": 1
},
"Description": "Red and yellow condiments in a clear plastic bottle",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Spices",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "Green and red spices in a clear plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Frozen Food",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Bag",
"Multiples": 1
},
"Description": "White frozen food in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Ice Cream",
"Object Type": "Plastic container",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Container",
"Multiples": 1
},
"Description": "White ice cream in a white plastic container",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fruit",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red and green fruit in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Vegetables",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Green vegetables in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Meat",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
"Number": 1,
"Units": "Lbs",
"Multiples": 1
},
"Description": "Red meat in a clear plastic bag",
"Location": "Top shelf, left door"
},
{
"Ingredient": "Fish",
"Object Type": "Plastic bag",
"Brand": "Unknown",
"Quantity": {
Notice the JSON is truncated at the end of the Google query? It reached the maximum token count of (2048) so there are more items that this query would spit out - I hope to figure out how to access a larger token count (or if the model is currently limited at this point). I didn't pursue the token further because the difference between OpenAI and bard is glaringly obvious, here is a tabulation of the results side by side:
领英推荐
GPT Vision
Pluses
Challenges
Pluses
Challenges
Summary
Neither service understood the structure of the photo (various shelves and locations in the door), and they missed several occluded items. Both missed the peppers (both red and green) and the lettuce (though it is possible Vertex AI would provide these if it didnt expire the tokens).
Overall - I give Google VertexAI a thumbs up on this task - it still needs some work and possibly a customer model, but the speed of the response and the level of details in identifying objects was very impressive!
What do you think about this experiment? Any comments or suggestions on improving it?
Data Analyst / Open to Work! | SQL | Looker | Excel | Python
2 个月Thank you for this fascinating article. I appreciate your effort in testing and explaining the use of this API.
Business Development Director, Playbook Labs & 815Labs - powered by Mito Group
10 个月Congrats! It is obvious. We have to learn more and more about AI its limitations, and possible use cases. I had a similar experience with my bookshelves. As you did?without any specialized training. The result is fascinating.
Communications, Media & Technology Consultant at Cognizant
10 个月Awesome experiment. Well done.
Global AI Solutions Leader, M&E | Sand Technologies
10 个月On behalf of peppers everywhere, I am outraged. But very interesting experiment. Thanks for sharing.