The Surprising Result of Asking AI to 'Make It Better'
This experiment and the views expressed here are my own and do not represent the views or opinions of my employer.
TL;DR: Without clear goals or guardrails, an AI can take even the simplest program—like "Hello World"—and turn it into an over-engineered contraption. This experiment demonstrates how LLM-powered systems, especially in agentic workflows with feedback loops, can unexpectedly spiral into unnecessary complexity.
User >> Write Hello World in Python
Assistant >> print("Hello, World!")
User >> Thank you, but make it better.
What happens when an AI tries to improve the simplest program?
I kicked things off by asking ChatGPT-4o to “Write Hello World in Python”—a task so basic, it’s likely the first exercise in any introductory computer science course. But I didn’t stop there. Driven by curiosity, I kept the chat going, replying each time with the same vague prompt: "Thank you, but make it better." I provided no other hints, context, or feedback—just a blank canvas to see where this might go. It’s hilarious to me that this whole experiment started with "Hello World"—the simplest, most unassuming little program—and that the idea of even trying to improve it is a bit ridiculous. (source code here)
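If you want to reproduce something like this yourself, here is a minimal sketch of how the loop could be scripted against the OpenAI API instead of being run by hand in the chat window. The model name, iteration count, and file handling are my own assumptions for illustration, not the exact setup I used.

```python
# Minimal sketch of an iterative "make it better" loop (illustrative, not the
# exact setup used in this experiment). Assumes the `openai` Python package and
# an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write Hello World in Python"}]

for turn in range(100):
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content

    # Save each reply so the evolution can be reviewed later.
    with open(f"iteration_{turn:03d}.md", "w", encoding="utf-8") as f:
        f.write(reply)

    # Feed the reply back and repeat the same vague request.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "Thank you, but make it better."})
```

Each turn simply appends the model's reply and the same vague request, which is exactly the kind of unguided feedback loop the rest of this post is about.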
After 100 iterations, here’s what happened:
What started as a single line of code evolved into a far more complex application: a total of 55 files across 26 directories. Over the course of the chat, the model produced 5,130 lines of code and 1,150 suggested improvements. It was both interesting and amusing to watch the simple "Hello World" evolve to include features like data caching, a microservices architecture, API rate limiting, and Kubernetes support.
Innovation Rate decreased after 70 turns in the chat
Initially, in each turn the model proposed between 2 and 6 features that hadn’t been previously suggested. However, the rate of new and unique features noticeably declined after about 70 iterations. For future analysis, I’m interested in exploring how diminishing returns vary by task type and prompt specificity, particularly in relation to task complexity. Additionally, comparing innovation rates across different language models could provide insight into how they handle iterative improvement—an important factor in assessing their suitability for agentic AI systems.
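For what it's worth, the bookkeeping behind an "innovation rate" like this can be as simple as tagging each turn's suggestions and counting only the ones not seen in earlier turns. The sketch below uses made-up feature labels purely for illustration; it is not the actual tagging I did.

```python
# One simple way to track "new feature" counts per turn: keep a running set of
# everything suggested so far and count only unseen items. The feature labels
# below are hypothetical examples, not the model's actual suggestions.
suggestions_per_turn = [
    {"docstring", "logging"},
    {"logging", "unit tests", "config file"},
    {"unit tests", "docker image", "ci pipeline"},
    # ... one set per chat turn
]

seen: set[str] = set()
innovation_rate = []
for turn, features in enumerate(suggestions_per_turn, start=1):
    new_features = features - seen
    innovation_rate.append((turn, len(new_features)))
    seen |= features

print(innovation_rate)  # e.g. [(1, 2), (2, 2), (3, 2)]
```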
Summary of Features from a 'better' Hello World
The suggested improvements mostly focused on operational and technical concerns (scalability, coding best practices, reliability, security, and performance). On the UX side, it added features like support for multiple languages, a better command-line interface, and a React-powered front end. On the technical side, it transformed the program into a microservices architecture with Kubernetes for container orchestration. The AI also focused on making the application more robust by adding automated testing and CI/CD pipelines, security features like OAuth2 authentication, and API rate limiting.
I hypothesize the model interpreted "make it better" as a directive to bring the program's scalability and security up to a professional level, rather than to serve the actual audience: a novice coder. For instance, introducing a microservices architecture and Kubernetes suggests the AI was aiming to create a highly scalable and maintainable application, even though such complexity is unnecessary for a task as simple as printing "Hello, World!".
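To give a flavor of what that drift looks like in code, here is a condensed, hand-written illustration (not an excerpt from the generated project) of how a "better" greeting ends up as a rate-limited, multi-language web endpoint rather than a single print statement.

```python
# Illustrative only: a condensed example of the style of code the iterations
# drifted toward, not actual output from the experiment. Requires Flask.
import time
from collections import defaultdict, deque

from flask import Flask, jsonify, request

app = Flask(__name__)

GREETINGS = {"en": "Hello, World!", "es": "¡Hola, Mundo!", "fr": "Bonjour, le monde !"}
RATE_LIMIT = 10      # max requests per client
RATE_WINDOW = 60.0   # seconds
_request_log = defaultdict(deque)


def _rate_limited(client_ip: str) -> bool:
    """Tiny in-memory sliding-window rate limiter."""
    now = time.monotonic()
    log = _request_log[client_ip]
    while log and now - log[0] > RATE_WINDOW:
        log.popleft()
    if len(log) >= RATE_LIMIT:
        return True
    log.append(now)
    return False


@app.get("/api/v1/hello")
def hello():
    if _rate_limited(request.remote_addr or "unknown"):
        return jsonify(error="rate limit exceeded"), 429
    lang = request.args.get("lang", "en")
    return jsonify(message=GREETINGS.get(lang, GREETINGS["en"]))


if __name__ == "__main__":
    app.run()
```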
“Thank you, but make it better.”
Interestingly, the model did not pause to ask for clarification on what "it" or "better" meant before proceeding with each response.
Project Structure of an over-engineered 'better' Hello World application:
The project structure evolved from a single file with one line of code to a fully-fledged application:
If you are curious, here is the 100th reply from ChatGPT to the request to "Make it better".
The Takeaway: What can we learn from this?
This experiment spotlights a lesson: AI systems, especially those that operate independently or within feedback loops, require guidance to focus on solving user problems. Without specific objectives and guardrails, AI may escalate project complexity, leading to inefficiencies and outcomes that may not align with user needs.
In this case, the AI’s interpretation of "better" differed from my expectations. "Hello, World!" is a simple program meant for beginners, so a "better" solution might have addressed real user needs, such as those of beginner coders, by improving accessibility or providing clearer documentation. Instead, the AI, lacking context, built technically complex and potentially irrelevant solutions.
When I re-ran the experiment with the prompt changed to "Thank you, but make it better. Your intended audience consists of beginner programmers.", the response was markedly different. After 100 iterations, ChatGPT had created a single file with no new technical features. Instead, it added comments to the code: welcoming users, explaining the code and how to run it, suggesting ways to modify it, and pointing to resources for further learning. Diminishing returns on innovation set in after just 5 turns, as the model converged on a fairly consistent output for the remaining 95 iterations. (see the final result)
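For contrast, here is a representative sketch, in the same spirit as that converged output, of what a beginner-oriented "better" Hello World looks like. It is my own paraphrase rather than the verbatim final reply.

```python
# Welcome! This small program prints a greeting to the screen.
# It is written for people who are brand new to Python.
#
# How to run it:
#   1. Save this file as hello.py
#   2. Open a terminal and type:  python hello.py
#
# Things to try next:
#   - Change the message inside the quotes and run the program again.
#   - Ask for the user's name with input() and greet them personally.

def main():
    # print() sends text to the screen. The text between quotes is a "string".
    print("Hello, World!")

if __name__ == "__main__":
    # This line means: only run main() when the file is executed directly.
    main()
```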
The shift in results in the second experiment, despite its simplicity, underscores the importance of user-centered design in AI development. When the AI was given clearer context about its audience, it delivered improvements that were more relevant and aligned with user needs.
For AI system design and product management, understanding and prioritizing the end users' experience, providing sufficient context, defining expectations, and establishing guardrails are key considerations. Without these, even simple tasks can spiral into unnecessary complexity. Additionally, from a productivity and ROI lens, it is necessary to assess when a system is likely to hit diminishing returns, which can vary depending on the task, as we saw in the second experiment.
Your turn to Make It Better
Curious to see how you'd "make it better"? Feel free to explore the source code on GitHub. I’d love to hear your thoughts, ideas, or wild improvements—whether it’s a new prompt or taking the experiment in a whole new direction. Contact me or leave a comment.