
Migrating Code with LLMs

Dave Costenaro

Chief AI Officer at Invisibly

Published: March 5, 2024

Much of what has been written about AI code generation deals with authoring new code. However, tremendous business value lies in taking old code and updating or migrating it toward new functionality, architectures, or programming languages.

At CSG, designing software and technology solutions for our clients often involves a lot of custom development along these lines: updating, migrating, translating, tweaking, connecting, and more.

Over the past year, we have been evolving our practices to strengthen and accelerate such code migrations by leveraging the radical new capabilities of LLMs like GPT-4, Mistral, and Code Llama.

The technology is impressive, but it comes with best practices and caveats that must be understood to apply it effectively to code migration. Here's what we've learned, in terms of Do's and Don'ts:

Don’t:

Don’t expect a magic solution: you won’t be able to simply chuck an entire codebase written in “Language A” into an LLM and prompt it to rewrite the whole thing in “Language B”. LLMs are limited in how holistically they can treat large texts and whole architectures, so the work has to be segmented, which I’ll discuss in more detail below.
Don’t assume the output is correct: you shouldn’t deploy any LLM-written code to production without first performing human review, automated testing, or both. LLMs are phenomenal at taking a first pass at migrating code to get you off the starting block, but they will still need significant handholding and iteration, especially on the last mile.
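A minimal sketch of one kind of automated testing mentioned above: a differential check that runs the original function and its LLM-migrated counterpart on the same inputs and reports every case where they disagree. It assumes both versions can be called from Python (for example, when migrating a Python module to a new architecture). `differential_check`, `legacy_total`, and `migrated_total` are hypothetical names, and the migrated version contains a deliberate boundary bug of the kind a quick human read might miss.

```python
from typing import Any, Callable, Iterable, List, Tuple


def differential_check(
    legacy: Callable[..., Any],
    migrated: Callable[..., Any],
    cases: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Tuple[Any, ...], Any, Any]]:
    """Run both implementations on each argument tuple and collect mismatches.

    Returns (args, legacy_result, migrated_result) for every case where the
    outputs differ; an empty list means all supplied cases agreed.
    """
    mismatches = []
    for args in cases:
        expected = legacy(*args)
        actual = migrated(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Hypothetical legacy function: sums prices, with a 10% discount above 100.
def legacy_total(prices):
    total = 0
    for p in prices:
        total += p
    if total > 100:
        total = total * 0.9
    return round(total, 2)


# Hypothetical LLM rewrite with a subtle boundary bug (>= instead of >).
def migrated_total(prices):
    total = sum(prices)
    return round(total * 0.9 if total >= 100 else total, 2)


if __name__ == "__main__":
    cases = [([10, 20],), ([60, 50],), ([100],), ([],)]
    for args, want, got in differential_check(legacy_total, migrated_total, cases):
        print(f"MISMATCH for {args}: legacy={want}, migrated={got}")
```

An empty result only means the sampled inputs agreed, so the cases should deliberately cover edge conditions such as boundaries and empty inputs; a check like this complements human review rather than replacing it.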

Do:

Work with small segments of the codebase, one at a time: LLMs can only consider and generate a limited amount of text at once. More specifically, they have a finite context window, so it is best to have a system that methodically works through individual pieces of the codebase in sequence. (At the time of this writing, most LLM context windows are on the order of “thousands of words”. Window sizes are growing as new research comes out, but larger windows typically come at some cost to accuracy and performance. New architectures will likely be needed to move the needle significantly on this.)
Prepare your codebase before migrating or translating: run each piece of code through an initial series of prompts that ask the LLM to add explanatory comments, look for bugs and optimizations, add unit tests where they are missing, and suggest architectural simplifications that divide the code into more manageable, modular sections. (As an aside, this helps both LLMs and human developers.)
Calibrate your expectations: anticipate an 80/20 solution that vastly accelerates the work but still leaves heavy lifting on the last mile. Think iterative and dynamic; test and verify. Research now emerging suggests that this kind of approach speeds up development but can be a mixed bag for code quality without proper attention. Be sure to prepare frameworks for unit testing, expert QA/QC review, and user acceptance testing.
Consider your target programming languages: out-of-the-box accuracy in generating code varies from language to language (see Figure 2 below). LLMs are typically most accurate with the most popular languages, like Python and JavaScript, because those have the most training data available on the Internet. An LLM is still likely to help with obscure languages, but test this before pursuing a full-blown migration. If your language(s) are striking out, having the LLM translate code into pseudocode can be effective, either as an output for human developers or as an intermediate step for an additional set of LLM prompts.
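The recommendation to work through small segments one at a time can be sketched for Python source with the standard-library `ast` module: split a module into its top-level statements (functions, classes, imports), then greedily pack consecutive statements into chunks under a size budget. The budget is measured in characters as a rough stand-in for tokens, which is an assumption; a real system would count tokens with the target model's own tokenizer, and code in other languages would need a parser for that language. `top_level_segments`, `chunk_source`, and `max_chars` are illustrative names, not part of any existing tool.

```python
import ast
from typing import List


def top_level_segments(source: str) -> List[str]:
    """Split Python source into one text segment per top-level statement.

    Each segment runs from a statement's first line (including decorators) up to
    the next statement, so comments and blank lines between statements stay with
    the preceding segment, and joining all segments reproduces the source.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    starts = []
    for node in tree.body:
        decorators = getattr(node, "decorator_list", [])
        starts.append(min([node.lineno] + [d.lineno for d in decorators]))
    if not starts:
        return [source] if source else []
    starts[0] = 1  # attach any leading comments to the first segment
    bounds = starts + [len(lines) + 1]
    return ["".join(lines[bounds[i] - 1 : bounds[i + 1] - 1]) for i in range(len(starts))]


def chunk_source(source: str, max_chars: int = 4000) -> List[str]:
    """Greedily pack consecutive segments into chunks of at most max_chars characters.

    max_chars is a rough character-based proxy for a token budget. A segment that
    is larger than max_chars on its own becomes a single oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for seg in top_level_segments(source):
        if current and len(current) + len(seg) > max_chars:
            chunks.append(current)
            current = ""
        current += seg
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_source(open("legacy_module.py").read(), max_chars=8000)` would return a list of strings to send to the model in order; the file name and budget are placeholders. Oversized single segments are natural candidates for the modularizing step described under preparing the codebase.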

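The preparation prompts and the pseudocode fallback described in the list can be combined into a per-segment pipeline. The sketch below is one possible arrangement under stated assumptions, not CSG's actual tooling: `call_llm` is a hypothetical placeholder for whatever model client you use (any function that takes a prompt string and returns text), and the prompt wording is illustrative. Every intermediate output is kept so that a human can review each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model client signature: takes a prompt string, returns completion text.
LLMFn = Callable[[str], str]

# Illustrative preparation prompts, run in order on each segment.
PREP_STAGES = [
    ("commented", "Add explanatory comments to this {src} code without changing its behavior:\n{code}"),
    ("review", "List likely bugs and possible optimizations in this {src} code:\n{code}"),
    ("tests", "Write unit tests in {src} for any untested behavior in this code:\n{code}"),
    ("structure", "Suggest how to split this {src} code into smaller, modular units:\n{code}"),
]
DIRECT_PROMPT = "Translate this {src} code into idiomatic {dst}:\n{code}"
PSEUDOCODE_PROMPT = "Rewrite this {src} code as language-neutral pseudocode:\n{code}"
FROM_PSEUDOCODE_PROMPT = "Implement this pseudocode in idiomatic {dst}:\n{code}"


@dataclass
class SegmentResult:
    """Keeps every intermediate output so humans can review each stage."""
    original: str
    artifacts: Dict[str, str] = field(default_factory=dict)


def migrate_segment(code: str, src: str, dst: str, call_llm: LLMFn,
                    via_pseudocode: bool = False) -> SegmentResult:
    result = SegmentResult(original=code)
    current = code
    for name, template in PREP_STAGES:
        output = call_llm(template.format(src=src, code=current))
        result.artifacts[name] = output
        if name == "commented":
            # Later stages see the commented version; review, test, and
            # structure outputs are stored for human follow-up, not applied.
            current = output
    if via_pseudocode:
        # Fallback for languages the model handles poorly: go through pseudocode.
        pseudo = call_llm(PSEUDOCODE_PROMPT.format(src=src, code=current))
        result.artifacts["pseudocode"] = pseudo
        result.artifacts["translated"] = call_llm(FROM_PSEUDOCODE_PROMPT.format(dst=dst, code=pseudo))
    else:
        result.artifacts["translated"] = call_llm(DIRECT_PROMPT.format(src=src, dst=dst, code=current))
    return result


def migrate_all(segments: List[str], src: str, dst: str, call_llm: LLMFn,
                via_pseudocode: bool = False) -> List[SegmentResult]:
    """Process segments one at a time, in order."""
    return [migrate_segment(s, src, dst, call_llm, via_pseudocode) for s in segments]
```

To exercise the flow offline, pass a stub such as `lambda prompt: prompt[:40]` as `call_llm`. The translated output of each segment still needs the human review and testing called for above before it is accepted.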
In conclusion, leveraging LLMs for code migration presents a transformative opportunity for businesses to modernize legacy systems. Our experience at CSG underscores the need for a balanced approach that combines the power of AI with thoughtful human oversight, systematic planning, and testing. By adhering to strategic practices such as segmenting code, enhancing codebase readability, and setting realistic expectations, organizations can use AI not only to expedite the migration process but also to improve the quality and maintainability of their software.

***

Dave Costenaro is Chief Data Officer at CSG Solutions | Helping Businesses Thrive with AI and Data
