Will AI automate software developers anytime soon? Reflections after having tested OpenAI Codex.
Michael Håkansson
Founder | Building the future of analytics for small businesses
The discussion around machines replacing humans in their jobs has been going on for decades. We have typically seen repetitive physical work in factories being replaced by physical robots. As more and more of the products and services we build are digital, more and more workers also produce digital products - like software developers. A couple of years ago, no one would even have suggested that AI could automate a software developer. Today, some do suggest that, and we might have seen the start of it. This article will cover my reflections after testing OpenAI Codex – the most advanced AI system to date for generating programming code.
What is OpenAI Codex?
Before we talk about the premise of AI replacing human programmers, we need to understand what an AI capable of doing this would be. The most simplified description of a software developer I can think of is that they give instructions to computers by writing code. Writing code is what OpenAI Codex has been trained to do. As an extension of OpenAI's advanced natural language model GPT-3, Codex is an AI system that can translate natural language to code. Imagine that, in the same way you use natural language to tell your voice assistant (Siri, Alexa, Google Assistant, ...) to set a timer for 5 minutes, you could tell it to write a Python program that imports a CSV file and returns the difference in days between the dates of the earliest and the latest entry. Codex can do that, and it was the first problem I faced in the OpenAI Codex Challenge.
The OpenAI Codex Challenge
On 12 August 2021, OpenAI arranged a coding competition where close to 4800 participants, for the first time, could test out Codex and their coding skills through five Python coding assignments. This is how I first got to play with Codex. Out of the 4800 participants, 842 completed all assignments (brag time: I placed in the top 1%).
The assignments were similar to the problems you typically find in programming contests and in cases used in software development recruitment.
The problems were of increasing complexity. It started with the problem described earlier, where the assignment was to find the difference in days between the earliest and latest entry dates in a CSV file. The given input was the problem description, examples of input and output, and an output explanation. There was also a code template with imports, a function definition, and the examples expressed as code.
Codex itself tried to solve the assignments on its own throughout the competition, simply by trying different solutions that it generated itself. To the right, you can see the solution code that OpenAI Codex generated and successfully submitted after 7 attempts.
And here's the code I submitted. It's not how I would usually write it, but I only wrote a total of 5 characters of this code: Codex generated everything except for abs(). In the generated code, the max date is subtracted from the min date, so the returned difference will always be 0 or negative. This is a strange (and apparently incorrect) way to answer. The short fix I chose was to take the absolute value of the calculation.
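The submitted code is not reproduced here, so the following is only a minimal sketch of the same idea, not the code Codex generated. The function name `date_span_in_days`, the `date` column name, and the ISO date format are all assumptions made for illustration. It deliberately keeps the min-minus-max subtraction described above and wraps it in abs().

```python
import csv
import io
from datetime import datetime


def date_span_in_days(csv_file, date_column="date", date_format="%Y-%m-%d"):
    """Return the number of days between the earliest and latest entry.

    csv_file is any open text file (or file-like object) with a header row.
    The column name and date format are assumptions for this sketch.
    """
    reader = csv.DictReader(csv_file)
    dates = [datetime.strptime(row[date_column], date_format) for row in reader]
    # min - max is zero or negative, so abs() turns it into a usable span.
    return abs((min(dates) - max(dates)).days)


# Usage with an in-memory CSV instead of a file on disk.
sample = io.StringIO("date,value\n2021-08-12,3\n2021-08-01,7\n2021-08-20,1\n")
print(date_span_in_days(sample))  # 19
```

Subtracting in the other order (max minus min) would make abs() unnecessary, which is probably how most people would write it by hand.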
And this is the way I continued the competition. I let Codex generate the first bit of code, read it through, and tweaked it. Codex generated more than 90% of the code I submitted. Wait, what, more than 90%!?
Codex playing on its home turf?
After the Codex Challenge, I was left with the feeling that Codex had performed a bit too well. My first thoughts were a bit on the conspiracy theory end: "Did OpenAI hard-code different suggestions?" When the initial conspiracy theories had cooled down and I started thinking more clearly, I realized that since OpenAI develops both Codex and the problems of the coding competition, it wouldn't be strange if they steered the first-ever public event toward problems where they think the system can shine. It turns out they did, and no one can blame them. I would have done the same.
Codex is evaluated on a benchmark called HumanEval, which contains programming puzzles like the ones in the competition; the model itself was trained on a large body of publicly available code. The competition problems were not the same as the benchmark problems, of course, but similar. And this is what's so cool. By training Codex on enough examples of code, OpenAI has made it able to generate code that (sometimes) solves problems it hasn't seen before.
Codex's ability to generate code to solve problems it hasn't seen is, of course, cool and impressive. But what happens if we move away from these "toy puzzles" and into the "real problems" that human programmers solve?
What programming really is, and where Codex fits in
We can break down the art of programming into two parts:
1. Understand the high-level problem you're trying to solve and decompose it into smaller problems. Let's say you're developing a to-do list app. The high-level problem you're solving is to help people organize and remember what they should do. You need to implement a list of text items with their corresponding checkboxes, the ability to sort the list by priority, and logic that updates an item's check status when it is checked.
2. Translate the smaller problems to code. Let's take some parts of the problems defined above and translate them into JavaScript code. First, we need an object structure of a to-do list item with text, priority, and done-status values. Let's tell Codex exactly that. Here's what happens:
The green comment is what I fed into Codex's JavaScript sandbox via the private beta I got access to after the competition. What it generates is an item with the requested fields.
Then we need a function that inverts the done status of a todo item and returns it. So now let's once again tell Codex that:
The green comment is what we input to Codex, and the code is what Codex writes. It does what we would expect it to do.
So for the two parts describing what programming really is, we see that Codex can be quite helpful in the second – translating already decomposed, small simplified problems into code.
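The two prompts above were given to Codex's JavaScript sandbox, and its output is not reproduced here. As a rough analogue in Python, the language of the challenge, the two decomposed problems might translate to something like the sketch below. The names `make_todo_item` and `toggle_done` and the dictionary layout are assumptions for illustration, not what Codex actually wrote.

```python
# Prompt 1: create a to-do list item with text, priority, and done status.
def make_todo_item(text, priority, done=False):
    """Build a to-do item as a plain dictionary."""
    return {"text": text, "priority": priority, "done": done}


# Prompt 2: invert the done status of a to-do item and return the item.
def toggle_done(item):
    """Flip the item's done flag in place and return the same item."""
    item["done"] = not item["done"]
    return item


item = make_todo_item("Buy milk", priority=1)
print(toggle_done(item)["done"])  # True
```

Each function maps to one small, already decomposed problem, which is the kind of input that worked well in my tests.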
Most innovation is reusing and combining existing stuff
I recently read the book Eat, Sleep, Innovate, in which the authors define innovation as "something different that creates value." I like this definition because it removes the pressure for innovation to be something big and new. This is how many innovations happen nowadays. We take already existing things or concepts and combine them – like combining a camera with a phone. If the output is different from what existed before and creates value, it is innovation. If we put this into a software development context, the first to-do app didn't have to invent a new type of checkbox and a new sorting function – it could use existing code and packages. You, too, can combine existing code and, in that way, build your unique product. If this is what you want to do, Codex can be of great help. On the other hand, if you aim to develop a groundbreaking algorithm solving a problem no one has researched before, your luck with Codex will be limited.
So... will AI automate software developers anytime soon?
The short answer: No.
The more nuanced answer: Some aspects of software development can become more efficient and maybe even automated using Codex. In all versions of the future that I can imagine, this kind of technology will power many applications and code editors (as it already does in GitHub Copilot, which I've written about before). Soon, many programmers will generate working code using natural language for clearly defined and simple functions and operations. This means you might not have to go to Stack Overflow as often to look things up (this is what ~80% of programming is for me, haha). But just as using a voice assistant to start a timer when boiling eggs doesn't automate the whole egg boiling process, Codex doesn't automate the whole programming process. And even if Codex can generate code, someone will still need to validate and test it. Software testing and quality assurance will remain crucial, even with tools like this.
Another time-consuming but important part of programming is writing proper documentation and understanding existing code. Codex can translate not only natural language to code but also the other way around. Here to the right, we have given Codex the text marked in blue. This is the code that Codex itself produced to solve problem 1 in the competition. I then wrote the start of a description section for this code. Codex then goes on to explain the code on lines 9-12. Really convenient when it works! Really dangerous if it doesn't work and it goes undiscovered!
To summarize: Right now, Codex is nothing more than a fun experiment to me. Many issues need to be handled before we can use the system in the real world, such as the copyright and licensing status of the code it produces and, of course, the model's fairness and bias. But let's worry about that another day. Today, let's enjoy getting a glimpse of the future and playing around with it.
Want more?