AI for Software Engineering
For corporates, the Software Engineering lifecycle matters most. This is especially relevant for IT majors, where the question is where and how we can optimize the lifecycle with LLMs. In this post I will cover 2 papers which provide deep insight into the state of the art for Software Engineering.
The paper titled "Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code" provides a systematic review of recent advancements in code processing with language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 related works. The authors break down code processing models into general language models, represented by the GPT family, and specialized models that are specifically pretrained on code, often with tailored objectives. They discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that NLP had taken. They also discuss code-specific features such as abstract syntax trees (AST), control flow graphs (CFG), and unit tests, along with their application in training code language models, and identify key challenges and potential future directions in this domain.
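To make one of these code-specific features a little more concrete, here is a minimal sketch of pulling an AST out of a piece of source code with Python's standard-library `ast` module. The helper name `node_type_sequence` and the sample snippet are my own illustrative assumptions; the survey does not prescribe this procedure, and real code models use much richer encodings of the tree.

```python
import ast
from typing import List


def node_type_sequence(source: str) -> List[str]:
    """Return the AST node type names of `source`, in the order ast.walk yields them.

    Hypothetical helper for illustration only: it flattens the tree into a
    list of node type names, which is one simple way to expose syntactic
    structure that a plain token stream does not show.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# Sample input (an assumption, not taken from the survey).
SNIPPET = "def add(a, b):\n    return a + b\n"

if __name__ == "__main__":
    print(node_type_sequence(SNIPPET))
```

The output lists structural nodes such as `FunctionDef`, `Return` and `BinOp`, which is the kind of information AST-aware training objectives try to exploit.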
Based on their evaluation, the authors provide a table and a chart showing where these models stand on HumanEval.
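HumanEval scores are usually reported as pass@k: the probability that at least one of k generated samples for a problem passes its unit tests. As a rough sketch, assuming n samples were generated for a problem and c of them passed, the commonly used unbiased estimator is 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` and the numbers in the usage lines are illustrative assumptions, not figures from the survey.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem as 1 - C(n-c, k) / C(n, k).

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0:
    # every possible draw of k samples then contains a passing one.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative numbers only: 10 samples, 3 of which pass.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score is then the average of this per-problem estimate over all problems.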
It is important to note that the authors maintain a live website where they keep updating the results. Worth checking out.
The 2nd paper was published in November, and its extensive references (236) alone kept me browsing for hours. The "Large Language Models for Software Engineering: Survey and Open Problems" paper covers the current state of the art in Software Engineering and how LLMs are being used. It sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity, with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. At a high level they cover the topics shown below.
Further, they elaborate on specific areas of Software Engineering, such as the evolution of program transformation.
Most models released in 2023 are covered in this excellent survey.