I Stumped All AI Models with My First-Grader's Homework

Caleb Sima

CSO | CEO | Founder

Published: March 10, 2025

I was sitting at the dining table helping my daughter with her homework when I hit a question with multiple possible interpretations. Curious, I tested it on several AI models, and they all failed spectacularly.

Tell me what you think: what's the right answer? Then see below for how each AI model fared.

Claude 3.7 Sonnet

OpenAI o1 Pro

OpenAI o3-mini-high

OpenAI 4.5

OpenAI o1

OpenAI 4o

Gemini Flash 2.0

Glitch

3,600 followers

Francesco Cipollone

Reduce risk - focus on vulnerabilities that matter - Contextual ASPM - CEO & Founder - Phoenix security - Runner - Application Security Cloud Security | 40 under 40 | CSA UK Board | CSCP Podcast Host

1 week ago

Caleb

Mark Conklin

Principal Engineer at ARM

1 week ago

This isn't really a first-grader test. From the looks of it, it's an IQ-test question. It does seem very difficult from a spatial-reasoning perspective. If you know, you know; if you don't, I believe AI would struggle with this question too.

Max Solonski

I build effective cybersecurity programs, exceptional teams, and rational processes

2 weeks ago

Can we please stop confusing ourselves by testing non-deterministic generative technologies with deterministic tasks? It does not prove that AI is stupid. It proves that we are.


Christopher M. Babie

Protecting the technology to electrify & decarbonize the planet @ GE Vernova

2 weeks ago

“C”: as you rotate the figure, you get every other representation (A, B, D), but never C.
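Christopher's rotation argument can be made mechanical. Below is a minimal Python sketch, assuming each answer choice is encoded as a set of filled (row, column) grid cells. The helper names and the four shapes are hypothetical illustrations (an L-shaped piece in three rotations plus its mirror image), not the actual figures from the worksheet.

```python
from collections import Counter

# Each shape is a set of filled (row, col) cells; rows grow downward.


def normalize(cells):
    """Shift a shape so its smallest row and column indices are both 0."""
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def rotate90(cells):
    """Rotate a shape 90 degrees clockwise: (row, col) -> (col, -row), then re-normalize."""
    return normalize(frozenset((c, -r) for r, c in cells))


def canonical(cells):
    """Smallest sorted form over all four rotations.

    Two shapes get the same key exactly when one is a rotation of the other.
    Mirror images are NOT treated as equal, because no flips are applied.
    """
    forms = []
    current = normalize(cells)
    for _ in range(4):
        forms.append(tuple(sorted(current)))
        current = rotate90(current)
    return min(forms)


def odd_one_out(options):
    """Return the labels whose shape is not a rotation of the most common shape."""
    keys = {label: canonical(cells) for label, cells in options.items()}
    majority, _ = Counter(keys.values()).most_common(1)[0]
    return [label for label, key in keys.items() if key != majority]


# Hypothetical answer choices (NOT the worksheet's actual figures):
# A, B and D are one L-shaped piece in different rotations; C is its mirror image.
OPTIONS = {
    "A": {(0, 0), (1, 0), (2, 0), (2, 1)},
    "B": {(0, 0), (0, 1), (0, 2), (1, 0)},
    "C": {(0, 1), (1, 1), (2, 1), (2, 0)},
    "D": {(0, 0), (0, 1), (1, 1), (2, 1)},
}

print(odd_one_out(OPTIONS))  # ['C']
```

Mirror images are deliberately not matched here. If flips were also allowed, C would become equivalent to the other three and this rule would find no odd one out at all, which is one concrete way the same picture can support different answers.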


Ajay Arora

Entrepreneur | Investor

2 weeks ago

Given how broad the question is, I would default to the simplest possible explanation, especially since this was asked of a first-grader (context is king, imo). Not that other answers aren't also correct. In my interpretation, A is the “correct” answer, because all the others have one square in the second row from the bottom while A has three. Happy to be wrong; I'd love to hear why.
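Ajay's reading applies a different rule: compare how many squares each option has in the second row from the bottom. Below is a minimal Python sketch, assuming each option is encoded as a set of (row, column) cells with rows counted downward. The helper names and shapes are hypothetical illustrations, not the worksheet's actual figures.

```python
from collections import Counter

# Each shape is a set of filled (row, col) cells; rows grow downward,
# so a shape's bottom row is the one with the largest row index.


def cells_in_row_from_bottom(cells, k):
    """Count filled cells in the k-th row from the bottom (k=1 is the bottom row)."""
    bottom = max(r for r, _ in cells)
    target = bottom - (k - 1)
    return sum(1 for r, _ in cells if r == target)


def odd_by_row_count(options, k=2):
    """Return the labels whose count in the k-th row from the bottom differs from the most common count."""
    counts = {label: cells_in_row_from_bottom(cells, k) for label, cells in options.items()}
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return [label for label, n in counts.items() if n != majority]


# Hypothetical answer choices (NOT the worksheet's actual figures).
OPTIONS = {
    "A": {(0, 0), (1, 0), (2, 0), (2, 1)},
    "B": {(0, 0), (0, 1), (0, 2), (1, 0)},
    "C": {(0, 1), (1, 1), (2, 1), (2, 0)},
    "D": {(0, 0), (0, 1), (1, 1), (2, 1)},
}

print({label: cells_in_row_from_bottom(c, 2) for label, c in OPTIONS.items()})
# {'A': 1, 'B': 3, 'C': 1, 'D': 1}
print(odd_by_row_count(OPTIONS))  # ['B']
```

Unlike a rotation check, this rule depends on how each shape happens to be drawn on the page. For these hypothetical shapes, where A, B and D are rotations of one L-shaped piece and C is its mirror image, a rotation-based comparison would single out C, while the row-count rule singles out B.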


More articles by Caleb Sima

The Real Story Behind AI Security Incidents

October 29, 2024

Headlines scream about the latest "AI threat." But our analysis of 243 documented AI security incidents/issues between…

25 comments

Building a Comprehensive AI LLM/ML Ops Marketecture

October 1, 2024

Access Google Slide Version: Introduction The complexity of the AI pipeline can be challenging to grasp, especially for…

10 comments

Future of Cybersecurity: How AI is Revolutionizing Context, Coverage, and Communication

June 18, 2024

This is a short summary of my keynote talks at BSides and RVAsec. I encourage you to read the full version on my blog or…

17 comments

