The New AI Iterative Development Paradigm (and Why AI == IA)
There are a number of open-source AI tools that claim to create an entire application from only one prompt. I decided to take them for a test drive.
The results were mixed, although this is not surprising given these tools are still in their infancy. They will almost certainly improve over time along with the rest of the AI technology.
But one takeaway was clear ...
You will never be able to remove humans completely from the systems development loop.
Why is this the case? Read on to find out, but first, let's examine the results of our "completely automated AI engineering" experiment.
AI projects that generate apps from a single prompt
Projects including GPTEngineer, MetaGPT, and LangChain-Coder leverage the concept of prompt chaining. The goal of these projects is to perform the work of an entire software development team. Based on only a single prompt, they generate the entire code for an application. Each project has varying levels of interactivity, although each aspires to be largely automated.
MetaGPT states that it "takes a one-line requirement and outputs user stories / competitive analysis/requirements/data structures/APIs/ documents, etc." It does this by using AI prompts to simulate "product managers/architects/project managers/engineers. It provides the entire process of a software company along with carefully orchestrated SOPs."
Initial generative AI prompts are used to define requirements. Subsequent prompts in the chain are used to design and build the application.
So, what application are we going to build? The example given in the MetaGPT docs is "Write a cli snake game." Rather than a video game, I decided to write a utility that I can use in my business. In addition to technical writing, I also create puzzle books including crossword puzzles. This can be a time-consuming task, so I have been looking to make a crossword puzzle editor.
The basic capability could be simple. The app could display the crossword grid and allow the user to edit characters, words, and clues.
It could also be complicated. The editor could include the capability to generate portions of the puzzle, place theme words in primary locations, etc. There are a wide range of possibilities.
Here is the single prompt I used.
The product should be a Python Django application that allows users to create a 13 x 13 crossword puzzle. The application should only accept valid dictionary words from a specified file (dict.txt) and convert these words to uppercase for the puzzle. The application should also provide an easy-to-use interface for users to input words into the puzzle and save and load their crossword puzzles.
NOTE: I had installation issues with LangChain-Coder, and GptEngineer simply asked me what the requirements were. I thought this was odd since I attempted to specify them in the initial prompt. Perhaps it is too interactive. Thus, this article will focus on my experience with MetaGPT.
AI-Generated Requirements and Design Artifacts
MetaGPT parsed out the requirements as follows.
[("Develop a Python Django application that allows users to create a 13 x 13 crossword puzzle", "P0"),
("Ensure the application only accepts valid dictionary words from a specified file (dict.txt)", "P0"),
("Convert all words from the dict.txt file to uppercase for the puzzle", "P0"),
("Provide an easy-to-use interface for users to input words into the puzzle", "P1"),
("Allow users to save and load their crossword puzzles", "P1")]
This is an accurate restatement of the requirements from my prompt. I expected it would elaborate a bit more on the capabilities, but at least it accurately summarized what I told it.
MetaGPT further stated, "There are no unclear points.". Great, it seems confident that it knows what to do.
The AI chose some good design patterns. It said "The main challenge will be the implementation of the crossword puzzle logic. We will use a backtracking algorithm to fill the 13x13 grid with words from the dictionary. The words will be stored in a Trie data structure for efficient lookup."
There are two key design choices here, both of which I endorse.
Data models and sequence diagrams were automatically created using Mermaid. The AI output also included a competitive analysis. This was a nice surprise. I verified that each of the other references was a commercial product or open-source project.
Given our basic requirements, it may have overstated the utility of our product. Nonetheless, it was helpful to get pointers to other similar products out there.
领英推荐
Finally, we get to the Code
MetaGPT proceeded to write what looked like decent code. It summarized basic knowledge of each component fed into the prompts. For example, Django apps have a models.py file. Here is what MetaGPT said about it.
The 'models.py' file defines the Puzzle and TrieNode models. The Puzzle model has a grid attribute for storing the crossword grid and a words attribute for storing the words used in the puzzle. The TrieNode model is used for efficient word lookup.
The code for the Trie data structure seemed quite good. It was simpler actually than what I had written in the past.
class TrieNode:
def init(self, letter: str):
self.letter = letter
self.children = {}
self.is_end_of_word = False
def insert(self, word: str):
node = self
for letter in word:
if letter not in node.children:
node.children[letter] = TrieNode(letter)
node = node.children[letter]
node.is_end_of_word = True
def search(self, word: str) -> bool:
node = self
for letter in word:
if letter not in node.children:
return False
node = node.children[letter]
return node.is_end_of_word
class Trie:
def init(self):
self.root = TrieNode("*")
def insert(self, word: str):
self.root.insert(word)
def search(self, word: str) -> bool:
return self.root.search(word)
It also gave updates along the way including how much money I was spending on the ChatGPT API calls.
2023-08-28 09:54:30.493 | INFO | metagpt.provider.openai_api:update_cost:81 - Total running cost: $0.417 | Max budget: $3.000 | Current cost: $0.104, prompt_tokens: 2616, completion_tokens: 419
2023-08-28 09:54:30.494 | INFO | metagpt.actions.write_code:run:77 - Writing utils.py ...
Unfortunately, it then crashed. And it was going so well up until this point.
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-<obfuscated> on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
Through no control of my own, the program hit an OpenAI rate limit. By itself, this would have been okay. However, there is no restart button. On my second attempt, the entire process started all over again.
And this was my biggest complaint. Software development, by its nature, is an iterative process. This whole exercise was a big reminder of this fact.
Further, iterations within the process will always involve humans, at least to some extent. Coding tasks may indeed shift largely to the machines over time. However, as any technology practitioner understands, that is only one piece of the puzzle.
AI as Intelligent Assistant
After that experiment, I still wanted a crossword editor. So I went to ChatGPT and used a smaller scope prompt to start by simply displaying a crossword grid. I decided to use Python and the PtQt6 library for the user interface, a wrapper around the Qt library.
This was a perfect use case for engineering with AI, as I have never developed using this GUI library before. AI can lead the way.
ChatGPT wrote some nice code and I gave it a whirl. However, instead of a 13x13 grid, it displayed all the letters/tiles in one big column. I informed it of its error, and it humbly apologized before providing me with working code. I was able to successfully build off of that and have been learning PtQt6 along the way.
AI as an Intelligent Assistant is the sweet spot for the technology at the moment.
Why humans can't be removed from the loop
In the first edition of this newsletter, I put forth the thesis that AI turns software engineering squarely into a requirements problem. If you can clearly define what you want, AI can help you build it. But you need to understand your application in extraordinary detail.
Formal programming languages leave no detail to chance. Every bit, byte, and pixel is specified by the code. Given the vast amount of permutations possible, any unspecified requirements or assumptions can cause the as-built application to veer further and further away from the desired target.
The reason iterative development took hold is that there are so many details that eventually need to be defined, it is nearly impossible to identify and specify them upfront.
The same is true for building applications with AI. Use AI to build components and services iteratively, and constantly review, edit, and refine what is generated. Feed this updated code back to the AI so it stays in sync with the software being built going forward.
Thus, AI == IA (Intelligent Assistant) in engineering use cases. You go back and forth with your AI engineering assistant during development, i.e. the new AI-assisted iterative paradigm.
Even as AI gets more efficient and effective, there will almost always be gaps in the requirements. Humans will always need to guide that iterative process. Humans still need to prioritize and decide what the business actually needs. These higher-level activities can also benefit from an intelligent assistant. Engineering with AI will certainly get easier, but humans will still remain in the loop.
Additional Resources
For a primer on how to use ChatGPT to write code for you, please see my book Rapid Software Engineering with ChatGPT. It walks you through how to successfully design and build an entire application using AI.