Writing Programs with AI (Part 1)
Eric Rachlin
Cofounder & CTO of Count | Previously: Cofounder & CTO of Body Labs (acq. by Amazon), Principal Scientist at Amazon
Last month, OpenAI opened up its ChatGPT Code Interpreter model to all ChatGPT+ subscribers. Since launch, ChatGPT has been able to help users write code. Now, with Code Interpreter selected, ChatGPT can respond to users by writing a program, running it, then using that program's output to generate its response. When it works well, it's remarkable. I first got access to Code Interpreter in May and my first impression was eye-opening.
As an initial test, I uploaded a CSV file of credit card transactions. Without me even asking, the Code Interpreter model immediately wrote code to load and inspect my CSV file. After running the code, it produced a plain English description of my file's overall structure and contents, which it then asked me to confirm. Once I did, ChatGPT generated a list of options for how it could automatically analyze my credit card transactions. The model's ability to proactively offer helpful suggestions, and then execute them automatically, was so impressive that it felt like I was watching a canned demo. By enabling ChatGPT to write and execute code, Code Interpreter substantially expands the collection of tasks ChatGPT can take on (more on this in Part 2).
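For readers curious what "code to load and inspect a CSV file" might look like, here is a minimal sketch in Python. It is not the code ChatGPT actually wrote for me. The sample data, the column names, and the `describe_csv` function are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for an uploaded file of transactions.
SAMPLE_CSV = """date,merchant,amount
2023-05-01,Coffee Shop,4.50
2023-05-02,Grocery Store,62.10
2023-05-03,Coffee Shop,3.75
"""


def describe_csv(text):
    """Summarize a CSV's structure: column names, row count, numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    numeric_columns = []
    for column in columns:
        try:
            # A column counts as numeric only if every value parses as a number.
            for row in rows:
                float(row[column])
            numeric_columns.append(column)
        except (ValueError, TypeError):
            pass
    return {
        "columns": columns,
        "row_count": len(rows),
        "numeric_columns": numeric_columns,
    }


summary = describe_csv(SAMPLE_CSV)
print(summary)
```

A summary like this is the kind of raw material a model could turn into a plain English description of the file before suggesting what to analyze next.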
Tools like Code Interpreter will change the way we work with computers. What's more, Code Interpreter is merely the latest in a series of impressive AI-powered tools for creating and controlling software with language. If the current rate of progress continues, the cost of creating software is about to plummet. Similarly, many of the barriers that prevent people from learning to code will be eliminated. Even if you don't write code yourself, it's worth understanding the productivity gains AI is about to unleash.
Part 1 of this post talks through how language models like GPT make it much easier to tell computers how to perform tasks. Part 2 will dive into specific AI-powered coding tools and explain why they have so much disruptive potential. After Part 2, my plan is to write a follow-up post on the implications of this disruption. I'll do my best to make all these posts as accessible to non-programmers as possible.
Why Code Matters
When you use your computer or phone, the programs, apps and webpages you interact with are written using code. Code, written by programmers, is how computers are instructed to do things. Of course most people aren't programmers, and even for those of us who can program, writing code takes time. As a result, we're all heavily reliant on programs and code written by other people. What we can do with computers depends almost entirely on code we can't see or control.
Writing code is a major bottleneck across the entire economy. When working with software, new features and integrations that are easy to describe often take weeks or months to implement. No-code scripting solutions like Zapier aim to address this bottleneck by allowing non-programmers to create custom workflows that glue together various products and services (e.g. connecting Google Sheets to Salesforce).
Without code, Zapier and other no-code platforms are limited in the types of workflows they can support. Zapier is great for moving information from one app to another, but less effective at manipulating or transforming that information. It's also constrained in the range of UIs and apps it supports. In recent years, the market for no-code and low-code platforms has grown by billions of dollars per year. There is high demand for easy-to-use tools for creating and controlling software. At the same time, anyone who uses these tools is aware of their shortcomings.
Performing Tasks with Language
When you want someone to perform a task, you typically use language to communicate your request. You can't waltz around asking folks to do anything you want, but you can ask all sorts of people to do all sorts of things. Natural language isn't as precise as code, but it's much more flexible. For instance, when you seek assistance, you can easily use language to provide key context-specific details, such as who you are and why you need help. Language then allows both parties to seek clarifications and offer feedback. In general, language empowers people to deliver higher-level services (e.g., completing someone's taxes or planning a wedding) that surpass the capabilities of most software.
Recently, AI-powered interfaces have begun to bridge the gap between language and code. In late 2022, when OpenAI introduced ChatGPT, millions of people started telling a computer to perform tasks using natural language. Even though ChatGPT's output was initially restricted to text (as opposed to executing actions), it presented an exciting glimpse of the future. The combination of Language + AI could result in a general-purpose interface for task execution. Instead of relying solely on traditional software, you could treat ChatGPT more like Google. You arrive, ask it to do something, and it does!
Google for tasks
With Google, when you need information, you don't need to know where to look for it. Instead you go to google.com, type what you're looking for and (hopefully) receive a list of relevant search results. With ChatGPT, that same paradigm now exists for a wider range of activities. You don't need a standalone app to check grammar, translate text, summarize transcripts, answer questions about documents, generate to-do lists, revise existing content, etc. It's all baked into ChatGPT.
As of now, you can't use ChatGPT to file your taxes or plan an entire wedding, but you absolutely can use it as a low-cost tax advisor or wedding planning assistant. What's more, with millions of users, ChatGPT allows OpenAI to collect massive amounts of data about what tasks it can and can't perform. Over time, this enables OpenAI to improve performance and expand the range of tasks ChatGPT supports. This feedback loop is extremely powerful and we should expect it to accelerate.
If ChatGPT's output were limited to text, it wouldn't be a viable option for performing most tasks. To be a useful and powerful assistant, ChatGPT also needs to be able to take actions on its users' behalf. This is why, at the end of March, OpenAI introduced ChatGPT plugins. At its core, ChatGPT is a chat interface built on top of two large language models (GPT-3.5 and GPT-4). These models only output text, but with plugins enabled, the text outputs can be interpreted as API calls. Using API calls, plugins allow ChatGPT to run commands, query external data, and trigger actions in third-party apps. If ChatGPT started out having "ears + a brain + a mouth", plugins give it "hands".
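To make "text outputs interpreted as API calls" a bit more concrete, here is a rough sketch in Python. The JSON format, the plugin names, and the `handle_model_output` function are my own assumptions for illustration. This is not OpenAI's actual plugin protocol, and real plugins call external web APIs rather than local functions.

```python
import json


# Hypothetical plugins, written as local functions to keep the sketch runnable.
def search_restaurants(city, cuisine):
    return f"Searching {city} for {cuisine} restaurants"


def text_to_audio(text):
    return f"Generating audio for {len(text)} characters"


PLUGINS = {
    "search_restaurants": search_restaurants,
    "text_to_audio": text_to_audio,
}


def handle_model_output(output_text):
    """Run a plugin if the model's text describes a call; otherwise return the text."""
    try:
        call = json.loads(output_text)
    except json.JSONDecodeError:
        return output_text  # ordinary prose, shown to the user as-is
    if not isinstance(call, dict) or call.get("plugin") not in PLUGINS:
        return output_text
    plugin = PLUGINS[call["plugin"]]
    return plugin(**call.get("arguments", {}))


model_text = (
    '{"plugin": "search_restaurants", '
    '"arguments": {"city": "Boston", "cuisine": "sushi"}}'
)
print(handle_model_output(model_text))
print(handle_model_output("Happy to help with dinner plans!"))
```

The model still only produces text. The "hands" come from the surrounding program, which decides whether that text is a reply to show or an instruction to carry out.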
Plugin-Powered Workflows
Right now there are substantial guardrails on how plugins work, but these will be relaxed over time. Currently each ChatGPT plugin only supports a handful of specific functions (e.g. "search OpenTable for X" or "Turn this text into audio") and any plugin ChatGPT uses must be explicitly turned on by the user. In theory multiple plugins can be chained together to carry out complex tasks (see video below), but in practice this won't happen organically until OpenAI allows ChatGPT to activate multiple pre-approved plugins at once on the user's behalf.
Over the next year, ChatGPT will continually improve based on real world usage. As this happens, we can expect to see more and more multistep, plugin-powered workflows being generated directly from language. This may place ChatGPT in competition with no-code tools like Zapier, or it may result in ChatGPT being incorporated into no-code and low-code platforms. In either case, users will find themselves asking AI to perform an increasingly wide range of tasks. The resulting GPT-generated workflows will include plugins that search the web, retrieve information from a user's documents, perform actions using third-party apps, and of course write and run code.
As in the early days of Google, these increasingly powerful workflows will spread organically via shared links. Instead of the now antiquated "Let me Google that for you", ChatGPT power users will respond with "Let's ask ChatGPT!" when asked "Can you help me do X?" by a friend or colleague. At that point, the inadvertent AI evangelist will simply ask ChatGPT "Can you please help my friend with X?" and then share a link to the resulting ready-to-use AI-powered workflow. Soon enough, the most useful ChatGPT prompts will spread like memes, as they are created, shared and iterated upon.
AI-powered Code Generation
ChatGPT-style plugins are a simple, elegant solution for getting large language models (LLMs) to perform tasks. Rather than output words, LLMs can output a series of explicit commands, each of which is executed by a plugin. From the user's perspective, plugins turn an LLM into an AI-powered assistant which accomplishes tasks by creating and executing no-code workflows. Using such assistants, users will be able to perform an increasingly wide range of tasks without relying on pre-existing task-specific software.
Reducing people's reliance on prewritten software is a big deal, but code isn't going away. Even as AI improves, code is still the most efficient, reliable way to provide instructions to a computer. Code is also needed whenever AI interfaces with existing software. For example, ChatGPT plugins are themselves written using code. In some cases, a plugin could be powered directly by AI, but in most situations code is needed to ensure that a plugin's behavior is efficient, reliable and repeatable. On top of this, we are still years away from full apps (including UIs, data storage, system access, etc.) being powered purely with AI.
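As a rough illustration of "a series of explicit commands, each of which is executed by a plugin", the Python sketch below runs a three-step workflow in which each step's output feeds the next. The plugin names, the sample transactions, and the `run_workflow` function are hypothetical. In a real assistant, the list of commands would come from the model rather than being written by hand.

```python
# Hypothetical plugins. Each one takes the previous step's result as input.
def fetch_transactions(_previous):
    # Made-up data standing in for a call to a bank or spreadsheet service.
    return [("Coffee Shop", 4.50), ("Grocery Store", 62.10), ("Coffee Shop", 3.75)]


def total_by_merchant(transactions):
    totals = {}
    for merchant, amount in transactions:
        totals[merchant] = totals.get(merchant, 0.0) + amount
    return totals


def format_report(totals):
    lines = [f"{merchant}: ${total:.2f}" for merchant, total in sorted(totals.items())]
    return "\n".join(lines)


PLUGINS = {
    "fetch_transactions": fetch_transactions,
    "total_by_merchant": total_by_merchant,
    "format_report": format_report,
}


def run_workflow(commands):
    """Execute each named command in order, passing results down the chain."""
    result = None
    for name in commands:
        result = PLUGINS[name](result)
    return result


# In this sketch the command list is hard-coded; an LLM would generate it.
report = run_workflow(["fetch_transactions", "total_by_merchant", "format_report"])
print(report)
```

Notice that every plugin here is ordinary code. The AI's job is choosing and ordering the steps, not replacing what each step does.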
For all these reasons, AI's near-term impact on software is tied to its ability to write code. In Part 2 of this post, I'll talk more about the specific AI-powered coding tools that I and other software developers are using. I'll also connect these tools back to how the Code Interpreter UX enables an incredibly wide range of software, potentially even allowing ChatGPT to generate new plugins on the fly. Once again, even if you don't write code yourself, I think it's useful to understand how disruptive AI-powered code generation is for software development.
Stay tuned!