GPT-4 - a new drunk senior developer in your team

This weekend I finally found enough time and motivation to dig deeper into Large Language Models (LLMs), mainly due to the release of GPT-4. So I decided to hire it (a ChatGPT+ subscription) as an assistant for the least favourite parts of my daily software engineering job. In four days, GPT-4 has helped me write a database migration and better text-formatting code for iOS and Android applications, answered questions about various kinds of configuration, and wrote an interactive front-end for summarising the content of 1M+ line JSON files. Beyond just speeding up the work, it also reduced the friction of taking on these tasks at all (I probably wouldn't have got to them for another couple of weeks).

To summarise my experience in one sentence:

Coding with GPT-4 is like having a new drunk senior developer in the team who doesn't have access to a compiler.

Let’s dissect this sentence to find some topics that could improve developers’ collaboration with LLMs:

  • Coding with GPT-4 - it feels like an interview or a collaboration. The model seldom gets everything right on the first try, so it needs some direction. Sometimes that is interesting and requires thorough thought, and sometimes it just involves copying a compiler error back. How can we ask better questions? Or more concretely, how can we write (or let IDE plugins write) good prompts, so that LLMs give the expected results with the fewest messages and tokens?
  • new - the model doesn’t have any knowledge of the code base or architecture, so we cannot expect it to give us an ideal solution without providing at least some insight into what we use and how. This is especially annoying with GitHub Copilot’s autocomplete feature, where I would expect the model to at least get the types right in statically typed languages. How can we automatically provide LLMs with the necessary context and knowledge of our code base, architecture, and practices to improve their responses? (One possible approach is sketched after this list.)
  • drunk senior developer - the model already has quite extensive knowledge of software engineering, but sometimes makes weird mistakes, i.e. it hallucinates. How can we reduce hallucinations in LLMs?
  • in the team - it could be used individually, e.g. as an IDE plugin, but it could just as easily be used in other contexts. For example, automated PR reviews as an additional layer of defence against hard-to-notice bugs. Where can we make LLMs ubiquitous so they can help the team without anyone manually calling them? (at the time of writing this post
  • doesn’t have access to a compiler - GPT is like a developer who writes everything in one go and only then finds out whether it works as expected. Except that we are the ones who have to check whether it works and tell it when something is off. How can we automatically check the model’s response (e.g. compile the code) and ask for improvements (e.g. ask it to fix the resulting error message)? (A sketch of such a loop also follows this list.)
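
To make the "new" point a bit more concrete: the simplest way to give the model knowledge of the code base is to paste the relevant files into the prompt yourself. This is only a minimal sketch in Python, assuming the OpenAI chat API client as it looked at the time; the file names and helper names (build_prompt, ask_llm) are entirely hypothetical:

    import pathlib

    import openai  # assumes the 0.x OpenAI Python client that was current at the time


    def build_prompt(question: str, context_files: list[str]) -> str:
        """Prepend the relevant source files so the model sees our types and conventions."""
        context = "\n\n".join(
            f"# {path}\n{pathlib.Path(path).read_text()}" for path in context_files
        )
        return (
            "You are helping on an existing code base. "
            "Use only the types and helpers shown below.\n\n"
            f"{context}\n\nTask: {question}"
        )


    def ask_llm(prompt: str) -> str:
        """Send a single-turn prompt to GPT-4 and return the text of its reply."""
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]


    # Hypothetical usage: hand the model only the files the task actually touches.
    print(ask_llm(build_prompt(
        "Write a migration that renames the users.name column to full_name.",
        ["db/schema.py", "db/migrations/0007_create_users.py"],
    )))

An IDE plugin could pick the context files automatically, which is roughly what Copilot-style tools already attempt with the surrounding buffer.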

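And to make the "doesn’t have access to a compiler" point concrete, the feedback loop can be automated: compile whatever the model produced and send the compiler’s complaints straight back instead of copy-pasting them by hand. Again only a sketch: swiftc stands in for whatever compiler, linter or test runner your stack uses, and ask_llm is the hypothetical wrapper from the previous snippet, passed in as a callable:

    import pathlib
    import subprocess
    from typing import Callable


    def compile_errors(path: str) -> str:
        """Type-check the generated file and return the compiler's complaints, if any."""
        result = subprocess.run(
            ["swiftc", "-typecheck", path],  # swap in your own compiler, linter or test runner
            capture_output=True,
            text=True,
        )
        return result.stderr.strip()


    def generate_until_it_compiles(
        task: str, path: str, ask_llm: Callable[[str], str], max_rounds: int = 3
    ) -> str:
        """Ask for code, compile it, and feed any errors back until the compiler is satisfied."""
        prompt = task
        for _ in range(max_rounds):
            code = ask_llm(prompt)
            pathlib.Path(path).write_text(code)
            errors = compile_errors(path)
            if not errors:
                return code  # the compiler is happy; a human reviewer takes it from here
            # Send the error output straight back instead of copy-pasting it by hand.
            prompt = (
                f"{task}\n\nYour previous attempt failed to compile with:\n{errors}\n"
                "Please return a corrected version of the whole file."
            )
        raise RuntimeError(f"Still not compiling after {max_rounds} attempts")

The same shape works with unit tests or a linter in place of the compiler; the point is that the round-trip no longer needs a human in the copy-paste seat.
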
Excluding GitHub Copilot, I’ve actively learned about and used LLMs for only a couple of days, so some, if not all, of these questions are probably already solved. And if not yet, they might be by tomorrow morning. For example, while I was writing this post, a blog post about GitHub Copilot X came out that answers or solves quite a few of the issues I’ve had.

We are living in fascinating times. While I'm excited to delegate mundane tasks to advanced AI ["to advanced AI" added by GPT-4], I'm at the same time slightly anxious about the potential misuse of these powerful tools by individuals with malicious intentions.


PS! No LLM was used in writing this post.

PS! On second thought, while writing the last statement, I realized that I should probably run this by my new "drunk"[quotes added by GPT-4] team member, who is more than ["also" -> "more than" by GPT-4] capable of taking on an editor's job.
