Software Engineering in the Age of LLMs
Recent advances in Large Language Models have captured the public imagination through popular tools like ChatGPT. Unsurprisingly, these tools have also generated a lot of interest within technology companies. The emergence of LLMs has introduced some uncertainty about the future of software engineering as a profession, even among seasoned programmers. In this article I want to share what researchers have learned so far about LLM applications in software engineering, how Copilot is being used, how LLMs are used beyond code generation, and what I think this means for software engineers.
Machine Learning in Software Engineering
Over the last decade, Machine Learning applications have been finding their way into software engineering, usually in places where optimization via heuristics has been exhausted: for example, in optimizing data center operations, or in test case selection and prioritization [1]. Large technology companies like Facebook [2] and Google have been experimenting with applying ML to software engineering; however, the same capabilities have not been universally available across the industry. More recently, with the advances in Large Language Models, ML is being applied to automate the "primary" activity of software development: writing code. This was enabled by advances in ML architecture, access to cloud compute, and the availability of large, freely consumable code repositories provided by open source.
Academic research on the application of LLMs to software engineering [3] is primarily focused on software development (code generation, completion, and comprehension) and software maintenance (program repair, a.k.a. bug fixes). There are also areas of software engineering where the application of LLMs has not been thoroughly explored, such as requirements engineering, software design, and software management (e.g., estimation). Most folks have probably heard of ChatGPT and Copilot; however, researchers have been exploring a large variety of LLMs in the context of software engineering, with more than 50 different models applied in research going back to 2018.
Learning how to code is hard, so it is no wonder that folks are excited about making code generation easier. On the other hand, this idea might feel threatening to those who make their living from having learned that hard skill.
GitHub Copilot
Several research studies have focused specifically on GitHub Copilot, the most popular commercial tool. Copilot is not the first intelligent code assistant to be studied. Google blogged about its ML-enhanced code completion, which showed promise in reducing coding time by 6%. At the same time, researchers at CMU published findings [4] about a Copilot-like tool in which they found no improvement in time to completion or correctness. Researchers at Harvard University, working with participants recruited mostly from academia, concluded [5] that Copilot did not necessarily improve completion time; however, most participants preferred to code with Copilot because it reduced search time and gave them a good starting point. Research published by GitHub [6] noted a high correlation between users' acceptance of Copilot code suggestions and their perception of productivity. Researchers from Microsoft [7], GitHub's parent company, noted a 55% productivity boost in their experiment using GitHub Copilot to implement an HTTP server in JavaScript.
Note that this experiment was set up to incentivize participants to complete the task faster, and these findings have yet to be published in a journal or presented at a conference. Finally, researchers from UC San Diego worked to establish a theory [8] of how programmers work with Copilot. They noted two modes of usage: an acceleration mode, in which the programmer knows the correct code to type and Copilot simply produces it faster, and an exploration mode, in which the programmer does not know the correct code and uses Copilot to explore. In the second mode, a sizable portion of time is spent verifying Copilot's suggestions.
Personal Experience
The theory of usage published in [8] aligns closely with my personal experience with Copilot in Stripe's production codebase. In cases where I know the correct solution, Copilot works as a better autocomplete: I know what code needs to be produced next, Copilot correctly predicts what I want to type, and it is trivial for me to accept the result and move on. In cases where the solution eludes me, whether because I am working in an unfamiliar language or in a codebase that is new to me, I sometimes write a comment and let Copilot suggest an implementation, which I then spend time modifying and validating. When Copilot guesses right, the experience is seamless and genuinely feels more productive; when Copilot guesses wrong, the experience is frustrating to the point where I tend to turn it off in codebases where I am comfortable. Copilot in its current form is a tool for experts: people who already have a solid grounding in programming and can quickly evaluate the suggestions being provided.
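To make the exploration mode concrete, here is a hypothetical illustration in Python of the comment-driven workflow described above; the comment plays the role of the prompt, and the function body stands in for the kind of suggestion Copilot might produce (both are invented for illustration):

# Return the median of a list of numbers, or None for an empty list.
def median(values):
    if not values:
        return None
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even-length case: average the two middle values. Details like
    # this are exactly what the programmer must stop and verify.
    return (ordered[mid - 1] + ordered[mid]) / 2

Accepting a suggestion like this takes a keystroke; checking that the even-length branch is actually correct is where the exploration-mode time goes.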
An Emerging Specialization
Code generation is not the only use for LLMs in technology companies. LLMs are being used to quickly apply ML to all kinds of technical problems. About half the projects submitted for a recent hackathon at Stripe leveraged an LLM, like ChatGPT, to implement a novel application or capability. The ease of access to the technology and its flexibility make it a compelling tool for solving all types of problems. There is an emerging tech specialization known as the "AI Engineer." Folks successful in these roles are skilled in prompt engineering: they understand the capabilities of a particular LLM well enough to provide the right use-case-specific context for the model to produce the correct answer to a specific problem. One example I have seen in practice is user support applications, where the prompt focuses the model on the documentation for a particular system, making it an excellent assistant to the user and reducing dependence on human support agents. These positions do not require a deep understanding of ML, NLP, or Deep Learning and can therefore be staffed by folks without an advanced degree; however, they are technical in nature and, as a result, quite well compensated.
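As a minimal sketch of that support-assistant pattern, the snippet below grounds a model in product documentation using the OpenAI Python SDK. The model name, the answer_support_question helper, and the doc_excerpts parameter are my own illustrative assumptions; a production system would retrieve the excerpts from a search index over the documentation rather than passing them in by hand.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_support_question(question, doc_excerpts):
    # Put the relevant documentation into the prompt so the model
    # answers from it instead of from its general training data.
    context = "\n\n".join(doc_excerpts)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[
            {"role": "system",
             "content": "You are a support assistant. Answer only from "
                        "the documentation below. If the answer is not "
                        "there, say you do not know.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

The prompt engineering here lives in the system message: constraining the model to the supplied context is what turns a general-purpose chatbot into a focused assistant for one product.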
What does this mean for new software engineers?
This fall, when I was visiting the RIT campus, I overheard a conversation between a parent and a student in which the parent stated that all the CS undergrads are in trouble since all the code is going to be written by AI. While it is highly likely that the use of intelligent assistant tools will become standard practice across software engineering, it is not likely to eliminate the profession. I have not yet seen interview questions aimed at evaluating a software engineer's ability to work with an AI assistant. However, just like the ability to use Google Search effectively to find documentation and known-good solutions, the ability to interact with AI assistants gives professionals a practical advantage.

ChatGPT can already solve most CS1-level questions, yet it is still very much required that software engineering graduates understand the fundamentals of programming. AlphaCode 2, Google's newest code-generating model, is estimated to perform better than 85% of programming competition participants, so grinding on LeetCode might not be as important in the future. Consider LLMs as tools: more advanced tools than what we have seen before, but tools nonetheless. It is still up to you, the engineer, to decide what software to write, to validate that the software produces the desired outcome, and to integrate this new software into the overall system. As a software engineer, you need to learn to leverage these tools in your work, and you will feel more productive, if for no other reason than having fewer browser tabs open. To excel in software engineering, you need to keep learning even after you have left college and entered the industry. The knowledge CS students acquire in an undergraduate program sets them up well to succeed with this new generation of tools.
References