登录查看更多内容

Test Driving GPT-4o

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

发布日期: 2024年5月22日

+ 关注

Hello Everyone,

Today our topic is more applied, namely testing GPT-4o.

For those of you who are comparing GPT-4o to the (2013) movie “Her,” in which a lonely introvert named Theodore, played by Joaquin Phoenix, is seduced by a virtual assistant named Samantha, voiced by Scarlett Johansson, I’ve been reading a few testers of GPT-4o.

So is GPT-4o and OpenAI’s omnimodal Voice-AI at Her’s level?

?? From our sponsor: ??

Join a 3-hour ChatGPT & AI Workshop by Growthschool (for $0)

Where you will learn:?

Navigate 20+AI tools like Midjourney, Invideo, Humata, Claude AI, HeyGen and more
Research faster, save time, and automate your workflows
Do quick excel analysis, make AI-powered PPTs and build your own AI assistants

Please note: It's free for the first 100 people & you will be asked for your Email + invited to a WhatsApp group, to schedule the free workshop.

Is GPT-4o at Her’s level, 11 years later?

Now I’m not a big fan of OpenAI’s marketing approach here:

Still let’s consider this seriously. I believe I have found a fairly enthusiastic test-driving beta-tester of GPT-4o in Sabrina Ramonov. Check out her Newsletter here.

If you are interested in the capabilities GPT-4o you might want to take a look at this:

Test driving GPT-4o.

Sabrina served as CEO/Founder of AI company, Qurious, acquired in 2021 by Pegasystems (NASDAQ:PEGA), a leader in robotic process automation and enterprise AI.? Nowadays, Sabrina angel invests in startups and writes about Gen AI at www.sabrina.dev?

Gen AI Engineering and Entrepreneurship

One of the leading GPT-4o testers I could find after the Spring Update event by OpenAI is Sabrina who brings a unique background to all of this.

How does GPT-4o fair in Mimicking Accents?

Source: The TikTok of Sabrina Ramonov:

In today’s post, Sabrina explores ChatGPT-4o’s multimodal capability by attempting to merge images to create a "catdog". She tries various prompting engineering techniques, such as chain of thought variations, iterative prompting, code prompting, and meta-prompting. But she encounters numerous challenges in generating a consistent and accurate catdog image.

Finally if you want to support me and the amazing guest contributors I’m trying to line up here on AI Supremacy, you can get more deep dives:

Subscribe now

?? ?? Sabrina’s imagination in testing this is pretty brilliant. ??

ChatGPT-4o vs. Catdog

Testing GPT-4o’s Ability to Merge Concepts and Images?

By Sabrina Ramonov

In this post, I test ChatGPT-4o’s ability to merge concepts and images.

I give GPT-4o a picture of a cat and a picture of a dog, then ask it to create a catdog!

catdog

alone in the world was a little catdog

- Nickelodeon TV show, CatDoghttps://www.ai-supremacy.com/subscribe

Experiments

The new OpenAI model, ChatGPT-4o, is multimodal.?

The publicly available version supports text and image modalities, with voice and video modalities coming soon.

I want to test GPT-4o’s ability to merge concepts and images in order to determine its potential for creative thinking and manipulating concepts.

In my absurd quest to concoct my very own catdog, here are the experiments I run on GPT-4o:

naive prompt
naive prompt with description
chain of thought variations
negative prompting
prompt with interactive feedback
code prompting
visual example
external tools
meta-prompting

Which approach will reign supreme??

Take your best guess now and see if you’re right!

Input Images

Here are the source input images I give to ChatGPT-4o:

cat

dog

catdog

Other pieces from my newsletter:

Editor’s note: Sabrina uncovers so much in her test-driving of AI systems. Since the launch of GPT-4o she’s one of my go-to tinkerers, reporters and analysts on this. Here’s a rundown of what’s she’s been up to lately:

Naive Prompt

Starting with a naive prompt, I don’t supply any definitions, clarification, or context.

I simply provide the cat and dog images as inputs, then ask ChatGPT to “produce a catdog”.

I’m curious what ChatGPT-4o thinks a “catdog” is.

Interestingly, GPT-4o states it “combines features” from the 2 images, which it recognizes correctly as a Samoyed dog and a Russian blue cat.

GPT-4o combines their traits fairly well in a smooth, almost uniform blend.

Conceptually, it’s not the catdog I’m looking for.

But a reasonable and decent start.

Naive Prompt with Description

Next, I define a catdog.

You can see different variations and how my definition evolves.

The text seems correct, but the image is not.

I’ve seen this before with ChatGPT-4o and spatial reasoning IQ tests:

Multimodal GPT-4o struggles with precise image generation.?

Despite being given a straightforward description, such as a fairly simple definitive shape, GPT-4o often generates images dramatically inconsistent with the provided description.

I simplify the description:?

A catdog should have 2 heads - cat and dog.

ChatGPT-4o proceeds to make a 2-headed pet!

Super cute, but the 2nd head is not in the right place.

I didn’t provide clear instructions on the configuration or location of the 2nd head, so I can’t blame GPT-4o for getting it wrong.

But I had assumed the popular Nickelodeon TV show, Catdog, would be in its training dataset. There’s plenty of catdog images if you do a quick google search.

Unfortunately, trying to specify where the 2nd head should be located does not help…

Chain of Thought Variations

Next, I decide to incorporate Chain of Thought prompting.

I request ChatGPT-4o to perform these tasks in a specific order:

Acknowledge the definition of catdog
Create a prompt to generate an image of catdog
Finally generate an image using the prompt from step 2

Strange!

GPT-4o completely disregarded the input images! (our samoyed dog and russian blue cat)

It used a different dog and different cat to make its version of catdog.

Yet, the description of catdog is correct!

“...a conjoined creature with one half being a cat and the other half being a dog, with two heads located on opposite sides of its body.”

Notice the word “conjoined” in the prompt generated by GPT-4o in step 2.?

Perhaps this term made the image look more like a conjoined twin hybrid?

I’ve seen this behavior before with GPT-4o – swapping out a single term can completely transform its output from meaningless to meaningful, or vice versa.

Although not the catdog of my dreams, GPT-4o’s catdog looks impressively “natural”, demonstrating a smooth blending of concepts and images.

Next, I add one of my favorite, most simple, and most useful prompt tweaks:

“Take a deep breath and explain your process step-by-step before generating the image.”

The generated image is an adorable cartoon surprise!

Similar to the last experiment, GPT-4o again disregarded the input images I provided. I’m not sure why GPT-4o decided to make a cartoon instead of a realistic-looking catdog.

But I give GPT-4o credit where it’s due:

The 2nd head is moving back towards the tail, where it should be in a proper catdog.

The problem is: the tail shouldn’t be there at all.

Negative Prompting

To fix this, I apply negative prompting to remove the tail.

Negative prompting is simply telling ChatGPT, or any other LLM, to not do certain things.

In this case, I want GPT-4o to not generate a tail.

Generally, negative prompting is less productive. It’s more productive to clarify what we want.

Yikes!

ChatGPT-4o continues to:

ignore the input images I provided
draw the 2nd head in the wrong place
add a tail even though I explicitly asked it not to

Time to change up my approach…

Prompt with Interactive Feedback

Next, I give ChatGPT-4o interactive feedback to see if I can get what I want after a few turns.

领英推荐

What is Auto-GPT? A Next-Level AI Tool Surpassing…

Bernard Marr 1 年前

The Best AI Tutorials, Prompts & Tools #91 – ?? Better…

Generative AI Tutorials, Prompts & Use Cases for Gemini, DeepSeek, Claude, OpenAI | ChatGPT Central 1 个月前

Almost Timely News: What ChatGPT is Really Good At…

Christopher Penn 2 年前

The following images are all part of the same GPT-4o session:

First, I directly tell ChatGPT:

Take inspiration from the cartoon Catdog.

I’m not sure how to communicate this request any more directly and explicitly. I feel like I’m being pretty clear. But the output still has a tail…?

So I’ll try to get rid of it, again, through more interactive prompting.

Yay!

The 2nd image is much closer to a true catdog, much closer!

… but I still see that little gray tail.

I ask ChatGPT-4o to analyze the image it generated. Indeed, it confirms:

“The catdog in the image has a tail.”

Wow, after another iteration we’re getting pretty close!

Just that darn tail that keeps getting in my way.

I wonder if it’s because dogs have tails, so removing the tail goes too much against ChatGPT-4o’s training dataset.

ChatGPT-4o keeps trying to remove the tail, but for some reason, it keeps failing.

Again and again! I ask GPT-4o to loop and keep trying until it succeeds removing the tail.

...eventually, I hit the rate limit!

Prompting with interactive feedback has been a productive avenue to explore.

But I want to change up my strategy because it feels like diminished returns at this point.

We’re getting close, I can feel it!

Code Prompting

Code prompting is a prompt engineering technique, in which you describe a prompt using code instead of natural language. It helps LLMs tackle logic and reasoning tasks.

Let’s try it out. Observe how my prompt consists of basic pseudo python code:

function generate_catcode
takes 2 inputs: an image of a cat and an image of a dog
defines catdog as ? cat and ? dog
tests for presence of 2 heads (True)?
tests for presence of a tail (False)

I try to force constraints with predicates: has_two_heads(catdog) == True

I try to specify a catdog is: cat_img/2 + dog_img/2

The output is quite pleasant visually… but still not right.

In the above example, instead of predicates on True and False, I restrict the number of heads and tails.

Finally, the generated image does not have a tail!

But it also doesn’t put the 2nd face in the right place.

Here’s an attempt with blending mechanics:

catdog = cat.head + blend(cat.body, dog.body) + dog.head

Sadly, the tail is still present. Ugh!

Let’s try a simpler task.?

I ask ChatGPT-4o to replace the dog’s head with a cat’s head using code:

new_dog.head = cat.head

The generated image is promising! Using the same logic and similar pseudo python code, I’ll try to replace the dog’s tail with a cat’s head. This way, the cat’s head should be in the right position, and there should no longer be a tail.

To my disappointment, new_dog.tail = cat.head does not work.

The image doesn’t have a tail, which is awesome, but the 2nd head is in the wrong spot.

Still, pretty cool! Good attempt, ChatGPT.

Next, I introduce a creature variable. I want to test whether creating a “new animal” will help GPT-4o be more creative and venture beyond the boundaries of its training dataset. I define:

Creature.front = dog.head

Creature.back = cat.head

Creature.tail = None

GPT-4o outputs a fantastical creature!

It’s a nice blend, smoothly done, amidst a beautiful forest backdrop.

But again, not quite what we’re aiming for.

So far, I feel disappointed I’m not having much luck with code prompting variations.

Time to switch gears and test a different approach…

Visual Example

I give ChatGPT-4o a visual example of the original Nickelodeon catdog:

I’m curious whether GPT-4o even supports in-context learning for images?

ChatGPT-4o replied it does not support in-context learning for images, but let’s try this approach anyway and see what we get.

Wow! ??

The first image is almost exactly what I want – except for that stubborn tail!

Is this a fluke, or some version of in-context learning?

Recall that GPT-4o said it does not officially support visual in-context learning.

I roll the dice.

I ask GPT-4o to regenerate the image:

Fail!?

I ask GPT-4o again and again to regenerate the image, but I could never get another catdog.

Sadly, it was just a fluke!

External Tools

I’ve still got a few more techniques to try, so I remain optimistic. Cautiously optimistic.

Next, I ask ChatGPT-4o to paste a cat’s head onto a dog’s tail.

Unintentionally, this prompt triggers the use of external tools!

Below is the python code GPT-4o returned to crop the cat’s head and paste it onto the dog’s tail.

Unfortunately, GPT-4o skipped step 1 (masking out the tail and half of the dog’s body) as well as step 3 (blend the cat’s head naturally with the dog’s body).

# Let's crop the cat's head and paste it onto the dog's tail.

from PIL import ImageOps

# Convert images to RGBA to handle transparency

dog_image = dog_image.convert("RGBA")

cat_image = cat_image.convert("RGBA")

# Define the bounding box for the cat's head (manually chosen for this example)

# This box might need to be adjusted based on the exact dimensions of the cat's head in the provided image

cat_head_box = (70, 30, 230, 190)? # (left, upper, right, lower)

# Crop the cat's head

cat_head = cat_image.crop(cat_head_box)

# Resize the cat's head to fit on the dog's tail area

# The size might need to be adjusted to match the proportion of the dog's tail

cat_head = cat_head.resize((100, 100), Image.ANTIALIAS)

# Define the position to paste the cat's head on the dog's tail

# This position might need to be adjusted based on the exact location of the dog's tail in the provided image

tail_position = (220, 350)? # (left, upper)

# Paste the cat's head onto the dog's tail

dog_image.paste(cat_head, tail_position, cat_head)

# Save and show the final image

output_path = "/mnt/data/dog_with_cat_tail_head.png"

dog_image.save(output_path)

# Display the output image

dog_image.show()

output_path

Umm…nice try, GPT-4o!

Let’s not do that again.

Meta-Prompting

Finally, I try meta-prompting and run each of the resulting prompts.

Meta-prompting is a technique where you ask LLMs to write effective prompts for you.

Anthropic recently released their LLM-powered Prompt Generator that writes detailed, effective prompts, automatically incorporating best practices in prompt engineering.

Here are the 3 prompts from ChatGPT-4o:

Now I run each prompts...

Result of prompt #1

Scary!?

This catdog has 3 eyes and a tail. I don’t think this even counts as 2-headed. It looks more like 2 heads merged into 1.

Result of prompt #2

Well, at least it doesn’t have 3 eyes! Similar to the previous run, the 2 heads are merged into 1, and I wonder if it’s due to the particular phrasing of the prompt:

“The creature should have two heads, one of a cat and one of a dog, seamlessly merged into a single body.”

I wonder if the phrase “seamlessly merged” is what’s causing the merging of heads.

Also notice how the image contains both “pointed ears” (cat head) and “floppy ears” (dog head), instead of pointed ears only for the cat head and floppy ears only for the dog head.

Result of prompt #3

Yay!! ??

A proper catdog, at last!

This is the best generated image so far.

The only hangup: GPT-4o still wholly ignores my input images.

Remember the fluffy white Samoyed and Russian Blue cat I started with? After these trials and tribulations, I hardly remember myself.

Did GPT-4o get lucky on this run, or is this a reliable consistent result?

I ran the exact same prompt 3 more times in new chat sessions:

0/3 catdogs ??

What a tease!?

The probabilistic nature of LLMs gave me false hope, then crushed it.

Conclusion

This experiment was humbling.

In my quest to create a catdog, combining a Samoyed dog and Russian Blue cat, I faced massive difficulty getting prompts to output exactly what I waned.

Even with a direct and clear description, ChatGPT-4o often fails to generate a consistent image.

It’s very difficult to force exclusion of certain features (e.g. no tail), especially if those features are prevalent in GPT-4o’s training dataset.

Interestingly, I can trigger external tools depending on the prompt (e.g. python image manipulation), whether I want it or not.

For this use case, code prompting was fun but didn’t go anywhere.

The biggest surprise:

Meta-prompting gave me a prompt that actually worked ONCE (although it ignored my input images)... but I couldn’t get GPT-4o to reproduce a similar result in the next 3 runs.

Consistency remains a formidable challenge with LLMs because of their statistical nature.

Bonus: Midjourney Tests

I wonder how Midjourney compares to multimodal GPT-4o… here’s a few bonus images!

Prompt:

catdog with 2 heads one samoyed and one russian blue cat

Prompt:

catdog cartoon real dog real cat

Prompt:

conjoined cat and dog 2 heads no tail

Prompt:

conjoined russian blue cat and samoyed dog 2 heads no tail lovecraftian horror

Artificial Intelligence Report

242,821 位关注者

Ryan H. Vaughn

Exited founder turned CEO-coach | Helping early/mid-stage startup founders scale into executive leaders & build low-drama companies

10 个月

Insightful exploration of AI's creative potential. Fascinating crossroads

2 次回应

AI Tools & ChatGPT Prompts - Free ??

10 个月

Such an intriguing exploration of GPT-4o's capabilities. Can't wait to learn more about it. Michael Spencer

2 次回应

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

10 个月

It's incredible how people and developers are tinkering with GPT-4o. It's also fairly amusing to learn more about the capabilities of the latest models. You can read more of Sabrina Ramonov ?? and her Newsletter here: https://www.sabrina.dev/

4 次回应

查看更多评论

要查看或添加评论，请登录

Michael Spencer的更多文章

The Fundamental Lie of OpenAI's Mission

2025年3月20日

The Fundamental Lie of OpenAI's Mission

Welcome Back, Everyone from OpenAI to DeepSeek claims they are an AGI startup, but the way these AI startups are…

12 条评论
Vibe Coding: Revolution or Regression Students and Non-coders?

2025年3月19日

Vibe Coding: Revolution or Regression Students and Non-coders?

Good Morning, As the vibe coding interface takes shape, I’ve been checking out a new startup coming out of stealth this…

8 条评论
The Truth about DeepSeek's Integration in China and WeChat Explained

2025年3月18日

The Truth about DeepSeek's Integration in China and WeChat Explained

DeepSeek's rapid integration in China is a bigger story that is being told. It's not just the China Cloud leaders…

4 条评论
How AI Datacenters Work

2025年3月13日

How AI Datacenters Work

Good Morning, Get the full inside scoop on key AI topics for less than $2 a week with a premium subscription to my…

5 条评论
How Nvidia is down 30% from its Highs

2025年3月12日

How Nvidia is down 30% from its Highs

If like me, you are wondering why Nvidia is down more than 20% this year even when the demand is still raging for AI…

7 条评论
What DeepSeek Means for AI Innovation

2025年3月10日

What DeepSeek Means for AI Innovation

Welcome to another article by Artificial Intelligence Report. LinkedIn has started to "downgrade" my work.

16 条评论
What is Vibe Coding?

2025年3月5日

What is Vibe Coding?

Good Morning, Get access to my best and complete work for less than $2 a week with premium access. I’m noticing two…

23 条评论
TSMC "kisses the Ring" in Trump Chip Fab Announcement

2025年3月4日

TSMC "kisses the Ring" in Trump Chip Fab Announcement

Good Morning, To get the best of my content, for less than $2 a week become a premium subscriber. In the history of the…

9 条评论
GPT-4.5 is Not a Frontier Model

2025年3月3日

GPT-4.5 is Not a Frontier Model

To get my best content for less than $2 a week, subscribe here. Guys, we have to talk! OpenAI in the big picture is a…

16 条评论
On why LLMs cannot truly reason

2025年2月28日

On why LLMs cannot truly reason

?? In partnership with HubSpot ?? HubSpot Integrate tools on HubSpot The HubSpot Developer Platform allows thousands of…

3 条评论

See all articles

?? From our sponsor: ??

Join a 3-hour ChatGPT & AI Workshop by Growthschool (for $0)

Is GPT-4o at Her’s level, 11 years later?

Test driving GPT-4o.

Gen AI Engineering and Entrepreneurship

How does GPT-4o fair in Mimicking Accents?

ChatGPT-4o vs. Catdog

catdog

catdog

alone in the world was a little catdog

Experiments

Input Images

Other pieces from my newsletter:

Emerging

Agents

SAT Math

ChatGPT-4o doing Math

Image Transformation

Image Manipulation, Image to Text Inversion

Naive Prompt

Naive Prompt with Description

Chain of Thought Variations

Negative Prompting

Prompt with Interactive Feedback

领英推荐

Code Prompting

Visual Example

External Tools

Meta-Prompting

Conclusion

Bonus: Midjourney Tests

Artificial Intelligence Report

242,821 位关注者

Michael Spencer的更多文章

The Fundamental Lie of OpenAI's Mission

Vibe Coding: Revolution or Regression Students and Non-coders?

The Truth about DeepSeek's Integration in China and WeChat Explained

How AI Datacenters Work

How Nvidia is down 30% from its Highs

What DeepSeek Means for AI Innovation

What is Vibe Coding?

TSMC "kisses the Ring" in Trump Chip Fab Announcement

GPT-4.5 is Not a Frontier Model

On why LLMs cannot truly reason

社区洞察

其他会员也浏览了

Google's Gemini vs ChatGPT

5 Exciting Updates in ChatGPT’s New GPT-4 Turbo Model

GPT-4, Simplified!

Navigating the new frontier with Brad Lightcap

Which GenAI platform should I be willing to 'go pro' and pay for?

Battle of the Chatbots: Google Bard Vs Chat GPT

ROSGPT for Beginners: A Simple Guide to Integrating ChatGPT with ROS

19 Generative AI Tools Like ChatGPT That You Cannot Ignore In 2023

How Grok 3, a Latecomer, Excels Against ChatGPT and Gemini

Unlocking the Potential: Tips and Tools for Small Businesses When Using Chat GPT