The Limitations of GPT and GitHub Copilot in Real-World Programming
I'm writing this without access to GitHub Copilot X, and I'm not sure my conclusions would be any different if I had it.
As a programmer, I've never been able to take the word "laptop" literally. I couldn't keep my MacBook Pro on my lap for more than 30 minutes while writing code because it became too hot. The situation improved after switching to the M1 chip, but not fundamentally. Using an online IDE lets my laptop return to my lap, but it's worth noting just how much computing power is needed to verify that code is correct. That brings me to the limitations of GPT and GitHub Copilot as programming tools.
ChatGPT or GitHub Copilot Doesn't Know How to Write Code
I apologize for the disappointment, but it's true: they don't. GPT and Copilot are trained language models (with Copilot fine-tuned on programming languages) that predict the most likely next tokens. That is not the same as programming. Although we write code in clear abstraction layers, much as we structure an article, and use features such as static typing to reduce the chance of mistakes, we still have to execute the code to verify that it is correct. GPT models learn language patterns above all else, which makes them advanced code auto-completion tools, but that is not programming.
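To make this concrete, here is a minimal sketch (the `median` function and its test are hypothetical, not taken from any model output) of how a completion can read perfectly plausibly and still be wrong; only executing the test exposes the bug.

```python
# A minimal sketch, with hypothetical names, of why execution is the real check.
# The function below reads like a perfectly reasonable auto-completion for
# "median of a list", yet it is wrong for even-length inputs.

def median(values: list[float]) -> float:
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # plausible, but ignores the even-length case


def test_median() -> None:
    assert median([3.0, 1.0, 2.0]) == 2.0        # odd length: passes
    assert median([1.0, 2.0, 3.0, 4.0]) == 2.5   # even length: fails when executed


if __name__ == "__main__":
    test_median()  # only running the test reveals the bug; reading the code does not
```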
The current AI-programmer fad is full of wishful thinking. A few weeks ago, I shared a programming challenge with ChatGPT to demonstrate that GPT 3.5 and 4 struggle to find a solution that is correct for all test cases. However, when given a detailed natural language description of the problem, the GPT models (including GPT 3.5) had a 50% chance of generating an answer that passed every test. This suggests the test code itself is not even needed as input, underscoring that these are, at heart, language models.
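For context, the verification step looks roughly like the sketch below (a hypothetical harness, not the exact one I used): the generated code is judged solely by whether it passes every test case when executed, not by how fluent it reads.

```python
# A minimal, hypothetical harness for judging a model-generated solution:
# load the generated source, then run it against every test case.

CANDIDATE_SOURCE = """
def solve(n: int) -> int:
    return n * (n + 1) // 2   # stand-in for whatever the model produced
"""

TEST_CASES = [(1, 1), (4, 10), (10, 55)]  # (input, expected output) pairs


def passes_all_tests(source: str) -> bool:
    namespace: dict = {}
    exec(source, namespace)              # define the generated function
    solve = namespace["solve"]
    return all(solve(x) == expected for x, expected in TEST_CASES)


if __name__ == "__main__":
    print(passes_all_tests(CANDIDATE_SOURCE))  # True only if every case passes
```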
The Copilot That Flees the Real Programming Battlefield
For me, programming means growing the code. Occasionally I need to start a new piece of code, and GitHub Copilot can be helpful there (ChatGPT even more so). However, I'm usually conservative about adopting its suggestions and often end up deleting most of the code it provides. Most of the time I'm performing small transformations on existing code: I'm probably not writing a new line so much as modifying an existing one. These modifications usually happen in complicated contexts where only my automated tests can verify that the code still works. In those moments my Copilot falls completely silent, and I keep checking whether it's still active.
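To show what I mean by these small transformations, here is a minimal sketch (the `total_cents` function and its test are hypothetical, not from my codebase): the edit itself is tiny, but it lives inside existing behavior that only the regression test can vouch for.

```python
# A hypothetical example of a small, test-guarded transformation: rewriting an
# explicit loop as a sum() over a generator.  The change is tiny, but only the
# existing regression test can confirm that behavior is preserved.

def total_cents(items: list[dict]) -> int:
    """Sum line totals in cents, applying a 10% discount above 10,000 cents."""
    subtotal = sum(item["price_cents"] * item["quantity"] for item in items)
    return subtotal * 9 // 10 if subtotal > 10_000 else subtotal


def test_discount_still_applies() -> None:
    items = [
        {"price_cents": 6_000, "quantity": 1},
        {"price_cents": 5_000, "quantity": 1},
    ]
    assert total_cents(items) == 9_900  # 11,000 cents minus the 10% discount


if __name__ == "__main__":
    test_discount_still_applies()
```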
At this point, I'm unsure whether I should adapt my programming style to allow my Copilot to contribute more effectively or if the Copilot should improve to better fit my programming style. Interestingly, this conundrum resembles a typical pair programming dilemma.
While GPT and Copilot are helpful tools, they do not change the foundation of real programming. At least, not yet.