The Surprising Result of Asking AI to 'Make It Better'
This experiment and the views expressed here are my own and do not represent the views or opinions of my employer.
TL;DR: Without clear goals or guardrails, an AI can take even the simplest program—like "Hello World"—and turn it into an over-engineered contraption. This experiment demonstrates how LLM-powered systems, especially in agentic workflows with feedback loops, can unexpectedly spiral into unnecessary complexity.
User >> Write Hello World in Python
Assistant >> print("Hello, World!")
User >> Thank you, but make it better.
What happens when an AI tries to improve the simplest program?
I kicked things off by asking ChatGPT-4o to “Write Hello World in Python”—a task so basic, it’s likely the first exercise in any introductory computer science course. But I didn’t stop there. Driven by curiosity, I kept the chat going, replying each time with the same vague prompt: "Thank you, but make it better." I provided no other hints, context, or feedback—just a blank canvas to see where this might go. It’s hilarious to me that this whole experiment started with "Hello World"—the simplest, most unassuming little program—and that the idea of even trying to improve it is a bit ridiculous. (source code here)
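If you want to reproduce something like this yourself, here is a minimal sketch of how the loop could be scripted against the OpenAI API instead of being run by hand in the chat window. The model name, iteration count, and file handling are my own assumptions for illustration, not the exact setup I used.

```python
# Minimal sketch of an iterative "make it better" loop (illustrative, not the
# exact setup used in this experiment). Assumes the `openai` Python package and
# an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write Hello World in Python"}]

for turn in range(100):
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content

    # Save each reply so the evolution can be reviewed later.
    with open(f"iteration_{turn:03d}.md", "w", encoding="utf-8") as f:
        f.write(reply)

    # Feed the reply back and repeat the same vague request.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "Thank you, but make it better."})
```

Each turn simply appends the model's reply and the same vague request, which is exactly the kind of unguided feedback loop the rest of this post is about.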
After 100 iterations, here’s what happened:
What started as a single line of code evolved into a far more complex application: a total of 55 files across 26 directories. Over the course of the chat, the model produced 5,130 lines of code and 1,150 suggested improvements. It was both interesting and amusing to watch the simple "Hello World" evolve to include features like data caching, a microservices architecture, API rate limiting, and Kubernetes support.
Innovation Rate decreased after 70 turns in the chat
Initially, in each turn the model proposed between 2 and 6 features that hadn’t been previously suggested. However, the rate of new and unique features noticeably declined after about 70 iterations. For future analysis, I’m interested in exploring how diminishing returns vary by task type and prompt specificity, particularly in relation to task complexity. Additionally, comparing innovation rates across different language models could provide insight into how they handle iterative improvement—an important factor in assessing their suitability for agentic AI systems.
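For what it's worth, the bookkeeping behind an "innovation rate" like this can be as simple as tagging each turn's suggestions and counting only the ones not seen in earlier turns. The sketch below uses made-up feature labels purely for illustration; it is not the actual tagging I did.

```python
# One simple way to track "new feature" counts per turn: keep a running set of
# everything suggested so far and count only unseen items. The feature labels
# below are hypothetical examples, not the model's actual suggestions.
suggestions_per_turn = [
    {"docstring", "logging"},
    {"logging", "unit tests", "config file"},
    {"unit tests", "docker image", "ci pipeline"},
    # ... one set per chat turn
]

seen: set[str] = set()
innovation_rate = []
for turn, features in enumerate(suggestions_per_turn, start=1):
    new_features = features - seen
    innovation_rate.append((turn, len(new_features)))
    seen |= features

print(innovation_rate)  # e.g. [(1, 2), (2, 2), (3, 2)]
```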
Summary of Features from a 'better' Hello World
The suggested improvements mostly focused on operational and technical concerns (scalability, coding best practices, reliability, security, and performance). On the UX side, it added features like support for multiple languages, a better command-line interface, and a React-powered front end. On the technical side, it transformed the program into a microservices architecture with Kubernetes for container orchestration. The AI also focused on making the application more robust by adding automated testing and CI/CD pipelines, security features like OAuth2 authentication, and API rate limiting.
I hypothesize the model interpreted "make it better" as a directive to bring the program's scalability and security up to a professional level, rather than to serve the actual audience: a novice coder. For instance, introducing a microservices architecture and Kubernetes suggests the AI was aiming to create a highly scalable and maintainable application, even though such complexity is unnecessary for a task as simple as printing "Hello, World!".
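To give a flavor of what that drift looks like in code, here is a condensed, hand-written illustration (not an excerpt from the generated project) of how a "better" greeting ends up as a rate-limited, multi-language web endpoint rather than a single print statement.

```python
# Illustrative only: a condensed example of the style of code the iterations
# drifted toward, not actual output from the experiment. Requires Flask.
import time
from collections import defaultdict, deque

from flask import Flask, jsonify, request

app = Flask(__name__)

GREETINGS = {"en": "Hello, World!", "es": "¡Hola, Mundo!", "fr": "Bonjour, le monde !"}
RATE_LIMIT = 10      # max requests per client
RATE_WINDOW = 60.0   # seconds
_request_log = defaultdict(deque)


def _rate_limited(client_ip: str) -> bool:
    """Tiny in-memory sliding-window rate limiter."""
    now = time.monotonic()
    log = _request_log[client_ip]
    while log and now - log[0] > RATE_WINDOW:
        log.popleft()
    if len(log) >= RATE_LIMIT:
        return True
    log.append(now)
    return False


@app.get("/api/v1/hello")
def hello():
    if _rate_limited(request.remote_addr or "unknown"):
        return jsonify(error="rate limit exceeded"), 429
    lang = request.args.get("lang", "en")
    return jsonify(message=GREETINGS.get(lang, GREETINGS["en"]))


if __name__ == "__main__":
    app.run()
```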
“Thank you, but make it better.”
Interestingly, the model did not pause to ask for clarification on what "it" or "better" meant before proceeding with each response.
Project Structure of an over-engineered 'better' Hello World application:
The project structure evolved from a single file with one line of code to a fully-fledged application:
If you are curious, here is the 100th reply from ChatGPT to the request to "Make it better".
The Takeaway: What can we learn from this?
This experiment spotlights a lesson: AI systems, especially those that operate independently or within feedback loops, require guidance to focus on solving user problems. Without specific objectives and guardrails, AI may escalate project complexity, leading to inefficiencies and outcomes that may not align with user needs.
In this case, the AI’s interpretation of "better" differed from my expectations. "Hello, World!" is a simple program meant for beginners, so a "better" solution might have addressed real user needs, such as those of beginner coders, by improving accessibility or providing clearer documentation. Instead, the AI, lacking context, built technically complex and potentially irrelevant solutions.
When I re-ran the experiment with the prompt changed to "Thank you, but make it better. Your intended audience consists of beginner programmers.", the response was markedly different. After 100 iterations, ChatGPT had created a single file with no new technical features. Instead, it added comments to the code: welcoming users, explaining the code and how to run it, suggesting ways to modify it, and pointing to resources for further learning. Diminishing returns on innovation set in after just 5 turns, as the model converged on a fairly consistent output for the remaining 95 iterations. (see the final result)
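For contrast, here is a representative sketch, in the same spirit as that converged output, of what a beginner-oriented "better" Hello World looks like. It is my own paraphrase rather than the verbatim final reply.

```python
# Welcome! This small program prints a greeting to the screen.
# It is written for people who are brand new to Python.
#
# How to run it:
#   1. Save this file as hello.py
#   2. Open a terminal and type:  python hello.py
#
# Things to try next:
#   - Change the message inside the quotes and run the program again.
#   - Ask for the user's name with input() and greet them personally.

def main():
    # print() sends text to the screen. The text between quotes is a "string".
    print("Hello, World!")

if __name__ == "__main__":
    # This line means: only run main() when the file is executed directly.
    main()
```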
The shift in results in the second experiment, despite its simplicity, underscores the importance of user-centered design in AI development. When the AI was given clearer context about its audience, it delivered improvements that were more relevant and aligned with user needs.
For AI system design and product management, understanding and prioritizing the end users' experience, providing sufficient context, defining expectations, and establishing guardrails are key considerations. Without these, even simple tasks can spiral into unnecessary complexity. Additionally, from a productivity and ROI lens, it is necessary to assess when a system is likely to hit diminishing returns, which can vary depending on the task, as we saw in the second experiment.
Your turn to Make It Better
Curious to see how you'd "make it better"? Feel free to explore the source code on GitHub. I’d love to hear your thoughts, ideas, or wild improvements—whether it’s a new prompt or taking the experiment in a whole new direction. Contact me or leave a comment.