How I wrote a working web app with ChatGPT and Claude, and what you can learn from it
Mark Ridley
Experienced CIO/CTO - tech, strategy, corporate innovation & startup advisor. Founder @ Seeto
Prelude
I’m a professional techie, but no longer a professional developer - my badge and git permissions were handed in over a decade ago. This (long) post is aimed at engineers, technologists and technically minded product folk, and reflects my experience using Generative AI tools to build a working web application from the perspective of someone who is extremely rusty on the day-to-day tasks of being a software engineer. I'm afraid this isn't very accessible if you're not familiar with software or product development.
I am writing this because I found the experience of using off-the-shelf Large Language Models (LLMs) was transformative for me; I’ve written more working applications in the last 6 months than over the last 6 years, and I’ve really enjoyed doing it. This piece chronicles the development of the most complex of those applications, a vanilla Javascript, HTML and CSS client side application for visual concept mapping. I’m going to cover what I learned about the process of working within a team of GenAI agents to build a simple web application.
Why build an application with GenAI anyway?
I’m writing a book. My co-author, Dr Andrew MacLaren, and I are currently about 80,000 words (or 400 pages) of content into the manuscript. Ahead of one of our weekly calls I decided that I wanted to sketch out the current structure visually. If I had a physical white board, it would be full of Post-it notes, but in the absence of the space to do that in what passes for my ‘office’ (in reality, a pile of books, with bottles of wine on the side), I needed a digital tool to do it.
I know what you’re thinking. Miro, Lucidchart, Diagrams.net, even Figma or Google Slides already exist. There’s an entire genre of author-focused tools, like Scapple (built by the good folks at Literature & Latte Ltd) that deserve a shout out. But I wanted something simple - just some notes on a canvas. I had a pretty clear idea what I wanted it to do too, but simplicity was absolutely the goal. Just notes, and a canvas.
I’ve been playing with GenAI tools to write code for about a year now, and it’s important to reiterate that although I am, in general terms, a techie, I am not a professional developer. I may have passed for one about 15 years ago, but those times are long past. However, having ChatGPT at my disposal has meant that I’ve created more useful, working code in the last six months than I have in the previous six years; among the random things that I’ve built this year are a Python video time-lapse script for a Raspberry Pi (to… record a mushroom growing), an OpenAI powered personal assistant that summarises my Google calendar for the day ahead, a Chrome plugin that pastes into text boxes as if it was being typed (unusually useful if you want to demo ChatGPT), and an internal version of Golinks for my teammates at Seeto to use. A year ago, most of these projects would have been started, but never finished or made usable.
The thing I have found most useful in using a GenAI tool is how much having the assistance cuts down on the frustrating setup time to start actually creating something useful. This is especially relevant to occasional or hobbyist developers; for those of us who are infrequent coders, our developer toolchain is not well used, and isn’t furrowed with the deep and comfortable grooves from our digital reuse. It’s normally frustratingly complex to dust off and set up our environment each time we have an idea.
This, then, is the story of actually building a working app with GenAI tools and what I learned along the way. If you want the TLDR? Those people that tell you that GenAI will never write working code - they’re wrong. Those people that tell you that GenAI will replace developers? They’re wrong too. As it so often is, the murky truth is somewhere in the middle.
Let’s meet the tools
As I mentioned, I’ve been using LLMs to help write scripts and micro-apps for over a year, and found them extremely helpful, but I always bumped into limitations as the codebase and scope grew. I’ve also not really had time to spend pushing that envelope until the last few weeks.
I’m lucky enough to have a number of tools at my disposal; I tend to code in VS Code, and have the paid version of Github Copilot available within the IDE. I also have premium/paid versions of ChatGPT (and now Claude), and the free versions of Amazon Q and Google Gemini Code Assist. However, I settled on using only ChatGPT and Claude rather than the IDE-centric assistants, which I’ll discuss as we go.
Having worked regularly with the GenAI tools, I tend to fall back to ChatGPT, and that was still the case when I started this project. With prolonged use, I’ve developed some behaviours to work around the limitations of context windows (the ‘working memory’ of LLMs), which often mean that the output can become untrustworthy. It’s become habitual for me to start new conversations when I spot signs of hallucination, but in this instance I decided to take a more ‘agentic’ approach, where I started three ChatGPT chats in different browser tabs; a ‘product manager’ persona, a ‘senior engineer’ persona and a ‘devops engineer’ persona. These tabs became my team of agents to work on the project.
“Are you aware of an online mindmapping tool called Scapple?”
Being a more practised CTO than I am a developer, the first thing I did after deciding that I wanted a tool was to consider my “build vs buy” options. This is a packed space, and I’m already familiar with a number of tools that have the capacity to draw mind maps and concept maps. Naturally, Miro, Lucidchart and Draw.io are brilliant choices for this type of thing, but there are also a large number of specialist tools for authors. As I researched and tested, I discovered Scapple from Literature & Latte, and was deeply impressed with the small team building tools for a specific set of users that they deeply understood. However, installing an application on my PC wasn’t the experience that I wanted, and the feature set was a little too broad for my requirements (but if you’re a writer, you should check Scapple out).
Having ticked off my personal “build vs buy” checklist, I still hadn’t found my ideal solution for an incredibly simple application that ran in a browser. Given that I now have some experience building out simple apps with LLMs, the desire to quickly sketch out an application in code grew, and the next thing I knew I had started prompting.
ChatGPT said: Yes, I’m aware of Scapple. Scapple is a freeform mind-mapping software developed by Literature and Latte, the creators of Scrivener. It allows users to make notes and connect them in a flexible, non-linear way, mimicking the process of jotting down ideas on a piece of paper and drawing lines between related concepts. Scapple is particularly popular among writers, researchers, and anyone who needs to organize their thoughts visually. It is available for both macOS and Windows.
Having set the scene for the scope of the application, I followed up with
“I’d like to use an online version so I don’t need to install it, but nothing equivalent seems to exist for writers. How might I go about creating an online version that uses only a browser?”
This led to GPT giving me a fairly thorough set of recommendations for how to proceed with building the application.
Because I’m pretty rusty on the current best practice for building web apps, I proceeded to ask ChatGPT to give me instructions on how to set up a VS Code devcontainer and the git repository on Github and followed the instructions. Having got to a point where my blank sheet was laid out in front of me, I asked ChatGPT, “Please create me a README.md in markdown to get started”.
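For anyone who hasn’t used one, a devcontainer is just a small JSON file in the repository that tells VS Code what environment to build. Mine amounted to little more than a sketch like this (the image and extensions here are illustrative stand-ins, not necessarily what I ended up using):

```json
{
  // .devcontainer/devcontainer.json - illustrative only; your image and extensions will differ
  "name": "mindmeld",
  "image": "mcr.microsoft.com/devcontainers/javascript-node:20",
  "customizations": {
    "vscode": {
      "extensions": [
        "dbaeumer.vscode-eslint",
        "ritwickdey.LiveServer"
      ]
    }
  },
  "forwardPorts": [8080]
}
```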
Why README.md?
The well-known issue of LLMs hallucinating is often exacerbated by the size of their context window. Over the last year we’ve seen astonishing capacity increases in the number of tokens (tokens are nearly, but not quite, equivalent to words) that an LLM can retain and work with (called the context window), but there are still limitations. When you are pasting hundreds of lines of application code into any LLM, you can stretch its limitations quite quickly. Over time I’ve started to think of LLMs as friendly but helplessly untrustworthy and deeply forgetful colleagues. It has helped me to develop a sixth sense for when the LLM is reaching the limits of its context window (the mistakes and misremembering become more frequent), and I now think of the context window as the tool’s short term memory. Within the tools themselves, they have little in the way of long term memory, and so we need to help them. When I’m coding with an LLM, the README.md file serves as one of my pillars of long term memory to keep the assistant on-mission.
There are dual benefits, of course; there’s no secret to the value of documentation for the human developers who come along behind you, including future you returning to your own code. It’s become second nature for me now to start projects with a README that will not only remind me of what I was doing, but will also serve as a document I can share back with an agent to re-initiate a conversation when I’ve blown through the context window limitations.
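The exact structure matters far less than keeping it current, but for what it’s worth mine settled into roughly this shape (a sketch rather than the real file):

```markdown
<!-- A sketch of the shape of the README, not the actual file from the repo -->
# Mindmeld

A client-side concept-mapping tool: notes on a canvas, vanilla HTML/JS/CSS, no backend.

## Current state
- Notes can be created, edited, moved and connected
- State is held in memory for the current session

## Architecture
- index.html, styles.css, and separate JS modules for notes, connections and data handling

## Conventions
- Vanilla Javascript only; no frameworks, no build step
- One branch per story; tests for anything beyond trivial UI behaviour
```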
Back to the app
GPT had made some understandable but undesirable suggestions when defining the scope of the app. Amongst other things, it had designed a stateful, two-tier application with a front end (in HTML & Javascript) and a backend that made use of Node.js and MongoDB. Whilst a completely pragmatic, if somewhat generic, suggestion, it wasn’t what I wanted. I always try to follow the principle of ‘Simple and working first’, and building the complexity of a backend was not necessary for my ‘working’ criteria. So, I asked GPT to simplify, and started to steer it into its role as a product owner, saying:
“great! For now, let’s keep this extremely simple. My first user story is: As a User, I want to access a mind mapping tool in a browser. The browser window should show a blank, white canvas. For non-functional requirements, for now there will be no data storage except in memory in the current session so please remove Mongo. Let’s first get our application running in a browser, preferably only using HTML and Javascript. Remove any other complex functionality”
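What came back for that first story was, more or less, the skeleton you’d expect - something along these lines (a reconstruction of the shape of that first version, not GPT’s exact output):

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>Mind map</title>
    <style>
      #canvas { width: 100vw; height: 100vh; background: #fff; position: relative; }
    </style>
  </head>
  <body>
    <div id="canvas"></div>
    <script>
      // Reconstruction, not GPT's actual code: double-click the canvas to add an editable note
      document.getElementById("canvas").addEventListener("dblclick", (e) => {
        if (e.target !== e.currentTarget) return; // ignore double-clicks on existing notes
        const note = document.createElement("div");
        note.contentEditable = "true";
        note.style.cssText = `position:absolute; left:${e.clientX}px; top:${e.clientY}px; padding:8px; border:1px solid #ccc;`;
        e.currentTarget.appendChild(note);
        note.focus();
      });
    </script>
  </body>
</html>
```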
And so, the process began. I coerced ChatGPT to focus more on the product owner role, saying, “Help me fill a backlog with user stories, so translate from my basic requirements into proper user stories I can add to Jira.”
I’m sure there will have been groans from some readers that I mentioned Jira, but this is an important part of structuring the long term memory for the ‘team’. Whether it’s Jira, or Github Issues, or any other task management tool, it’s important to externalise the outputs of the LLM into a more appropriate store. As I know Jira pretty well and could get a project set up quickly, this was my natural preference. Over the following conversation, ChatGPT provided me with basic user stories that I could turn into tasks.
Spinning up single-minded assistants
As I progressed from simply recording the requirements to building out the application, I was using the single-threaded GPT conversation to provide me with more than just user stories. Looking back over my chat history, I started to redirect the conversation with ChatGPT to ask questions about how I could set up my VS Code dev containers and configure Docker. Realising I was overextending the scope, I entered a prompt to make the assistant more single-minded, correcting my earlier mistake in distracting the easily distracted language model.
“From now on, I’d like you to ignore technical elements.
You should assist me like a professional product manager, who is going to write user stories and deliver the project. Another LLM is going to act as the engineer. You may ask questions or provide guidance to the other LLM, but you should remain focused on user stories as a way to describe your intent to the engineer.”
At this point I added the two additional assistants into the team. The Devops engineer was going to assist as I struggled through getting my devcontainer working and git branches reasonably architected;
“I’d like you to work as a devops engineer working with me on creating an application. I also have a Product Owner and an Engineer working with me on requirements and code. You will help me with the developer toolchain, and they will help with the other elements. I can provide code and requirements to you from the others”
While the software engineer was going to do the hard graft of writing the actual application,
“I would like you to act as a senior engineer helping me design an application I am working on. Another LLM is acting as the product owner and creating user stories for us. As an engineer you should provide code which I can paste into a development environment. You will receive user stories and requirements from the product manager. When requested you should provide updates for the product manager on changes you have completed. Here is the readme.md”
As I’ve mentioned, you can see here not only that I’m using prompts to constrain the scope of each assistant, but also that I’m starting to use the README.md file as the shared understanding of the current state of the application. From this point, I would regularly ask the different assistants to read and rewrite the readme.
I regularly re-used the prompt for the software engineer to start new sessions. While the context window and scope for the product manager and devops engineer remained manageable for ChatGPT, the constant input and output of hundreds of lines of code regularly overwhelmed the LLM and created frustrating and avoidable problems in code. Being able to stop the session and start again from the ‘last known good’ of the README and the last stable commit became essential.
The compounding complexity of software and the context windows of LLMs
As the application grew in size from a simple three-file web app (index.html, app.js and styles.css) to something much more modular and complex, ChatGPT started to struggle. The 4o model (before the August update) was substantially better than any previous OpenAI model at writing code, but pasting more than around 150 lines of code into the chat would cause some serious issues (I’m going to discuss later why I’m using general purpose LLMs and not an IDE-based assistant like Github Copilot). My first response was to break the application into more modules, with a clear separation of concerns for logical blocks, such as notes, connections and data handling. Prior to the separation of code into smaller blocks, I would often paste in an app.js file of over 200 lines, and GPT became very prone to frustrating hallucinations, often subtly changing important parts of the code in its output.
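The split itself was nothing exotic - a sketch along these lines, with illustrative module names and functions rather than necessarily the repo’s real ones, each file owning a single concern:

```javascript
// Illustrative layout (the real names in the repo may differ):
//   js/notes.js        - creating, editing and moving notes
//   js/connections.js  - drawing and updating the lines between notes
//   js/dataStore.js    - in-memory state, import/export
//   js/app.js          - event wiring that pulls the modules together

// notes.js - the whole module is little more than functions like this:
export function createNote({ x, y, text = "" }) {
  return { id: crypto.randomUUID(), x, y, text };
}

// connections.js
export function connect(fromId, toId, type = "line") {
  return { from: fromId, to: toId, type };
}
```

Smaller files meant I could paste a single module into the chat rather than the whole application, and stay under the point where the hallucinations started.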
Any experienced software engineer is familiar with the cumulative effect of writing code, where each new feature - every line of code in fact - results in a non-linear increase in the complexity of the codebase. Each addition makes the code just a little harder to understand. Building an application is all about running into edge cases; we find out that we have to impose a character limit, or that we need to save state so that things don’t break when the browser reloads, or indeed that we’re saving state every 1ms and ruining our performance. Each time we hit another edge case, each time we try to make the application a little more robust or a little more user friendly, we add complexity. That complexity makes the application a little harder to understand, and we hit even more edge cases and add even more complexity.
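(To make that last example concrete: the fix for ‘saving state every 1ms’ is only a handful of lines - a sketch, with the function and storage key being my stand-ins rather than the app’s real code - but it’s one more thing whoever maintains the code now has to hold in their head.)

```javascript
// Stand-in example: persist once per burst of activity, 500ms after the
// last change, instead of on every mousemove.
function saveState(state) {
  localStorage.setItem("mindmeld-state", JSON.stringify(state)); // key is illustrative
}

let saveTimer = null;
function scheduleSave(state) {
  clearTimeout(saveTimer);
  saveTimer = setTimeout(() => saveState(state), 500);
}
```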
The compounding complexity of software doesn’t fit well with the current generation of LLMs. When our applications are small and simple, the code fits easily into context windows; the LLM can ‘remember’ what it’s doing without too much hallucination. But as the app grows, using a general purpose LLM like ChatGPT becomes harder to manage without side effects.
Some of this is specific to the way that OpenAI have implemented the code authoring in ChatGPT. If you ask ChatGPT to write code that meets a particular user story, it will inevitably return the full piece of code in a single-threaded output. Although there is a code block with syntax highlighting, it will almost always contain the code from start to finish, meaning long outputs (which count towards the use of your short term memory context window). If you need to refactor a specific piece of code, the assistant will regularly rewrite the entire file and reply with the application code in full, rather than showing the edited and affected sub-section of the code.
Because the code being sent back as an output is an entire inline block (usually without explicit explanation of where the changes are being made), it’s often easiest for the user (i.e., me) to simply copy the entire code block and paste it back into the IDE in full. Because even small changes to a variable or function name will introduce a breaking change, it was altogether too easy for me to regularly break working code by ‘trusting’ that the output of the LLM would be accurate.
As the code grew more complex there were increasingly long expeditions in the wilderness, where OpenAI’s GPT-4o model would hallucinate an almost correct application of the code, but with subtle tweaks that could throw the entire application out of kilter. Maybe a function name varied slightly. Maybe a parameter or a return was imperceptibly different. If we had been asking for help writing text for an email it wouldn’t be a problem, but for something as deterministic as code, it was deeply frustrating.
Here’s an example of the kind of tiny unintended refactor that can cause havoc if you just copy and paste without paying attention. The snippet below is a stand-in rather than the app’s actual code, but the flavour of the change is exactly what I kept hitting. Can you spot the problem?
Actual code:
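```javascript
// Stand-in for the app's real code: redraw every connection touching a note
// (redrawConnection is assumed to exist elsewhere)
function updateConnections(noteId, connections) {
  connections
    .filter((c) => c.from === noteId || c.to === noteId)
    .forEach((c) => redrawConnection(c));
}
```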
Proposed code from an unrelated refactor:
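```javascript
// Stand-in for the app's real code: redraw every connection touching a note
// (redrawConnection is assumed to exist elsewhere)
function updateConnections(noteId, connections) {
  connections
    .filter((c) => c.from === noteId && c.to === noteId)
    .forEach((c) => redrawConnection(c));
}
```

The answer: a single || has quietly become &&, so connections only redraw when a note is somehow connected to itself. Nothing errors, nothing crashes - the feature just silently dies. Multiply that by a few hundred pasted lines taken on trust, and you can see where the day went.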
After losing a day or more to bugs introduced by ChatGPT’s subtly and insidiously changing outputs, I decided to try yet another alternative approach. Breaking it down into smaller, more modular files had helped but wasn’t sufficient. Altering my prompts to request partial updates on specific areas of code, or within functions, improved matters but didn’t resolve the appearance of avoidable errors. It was time to look outside of ChatGPT for help.
Switching to Claude - comparing OpenAI and Anthropic
I was aware that Anthropic had recently introduced Projects to Claude, a seemingly simple but enormously powerful addition to the application that enables a number of files to be uploaded. By uploading the main project files to a Claude project, the long term memory of the assistant was markedly improved. Instead of having to constantly paste in and copy out application code, I could simply reference files within the project. Projects also persist outside of chats, and so I could re-start multiple chats on different features without having to re-educate the LLM on the scope of the application. It made it even easier to prompt, “Using README.md as a guide, I’d like you to…”
In addition to Projects, Claude has also added Artifacts, a very helpful UI feature that shows some output - like code blocks and previews - in a side-by-side window to the right of the main chat. This purely UI change made it substantially easier to use Claude than ChatGPT for code.
Claude’s ability to write code, outside of the project knowledge and improved UI, was broadly equivalent to ChatGPT’s. Claude tends to return partial code snippets rather than entire files, often interspersing code with comments that require the user to dig around in the application and consciously paste the code into the correct place.
I suspect that this decision may have been a pragmatic one by Anthropic. One thing that I noticed very quickly after starting to use Claude was that it runs into issues with long chats quite quickly - something that ChatGPT hasn’t done for a very long time.
My suspicion is that although on paper context window limits are now very high in both Claude and ChatGPT, the two applications behave quite differently. It may not be immediately obvious to the casual user of ChatGPT, but if you have implemented an LLM programmatically via API, you’ll have been struck by the fact that you have to send conversation history with the prompt to replicate some of ChatGPT’s context-aware magic. You don’t just send the user prompt and expect the API to maintain state, but rather you preserve state by sending back n messages (from both the user and the assistant).
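If you’ve only ever used the chat UI this can be surprising, so here’s a minimal sketch of what a call to a chat-completions style API looks like (illustrative values throughout): the ‘memory’ is nothing more than the messages array you choose to send.

```javascript
// Sketch only: state lives with the caller, so every request re-sends the history.
const history = [
  { role: "system", content: "You are a senior engineer on the Mindmeld app." },
  { role: "user", content: "Add a character limit to notes." },
  { role: "assistant", content: "Here's an updated createNote()..." },
];

async function ask(prompt) {
  history.push({ role: "user", content: prompt });
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-4o", messages: history }),
  });
  const data = await response.json();
  const reply = data.choices[0].message;
  history.push(reply); // forget to do this and the assistant 'forgets' too
  return reply.content;
}
```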
I have a theory that, while much increased over earlier versions, ChatGPT has a rolling conversation context that doesn’t include older messages. This has pros and cons - ChatGPT is quite happy to maintain a nearly unlimited conversation, but I found it entirely likely that it would ‘forget’ earlier comments, I assume as they were pushed out of the main context window.
OpenAI’s approach to suggesting code is to include everything each time. If you send it ‘app.js’, which contains 5 functions and it just needs to change one, it will return the whole of app.js. It means that often, to save yourself from reading every line of code in each response, you would just trust it to be correct and copy and paste the whole app.js contents. But, sadly, you can’t trust it all the time, and that makes it a poor companion, with suggested changes being far less atomic than they should be.
Claude is somewhat different, and at first it surprised me. Not only does it prefer snippets to full files, but at the point where I’d just be warming up a conversation with GPT, Claude starts complaining: “Tip: Long chats cause you to reach your usage limits faster.” These complaints become more strident, with it completely shutting down long conversations and burning through tokens very quickly in comparison to ChatGPT.
Regardless of the intent behind the design, Claude’s propensity to return snippets of code, rather than entire blocks as ChatGPT does, actually forces you to interact with the supplied code more - checking where to insert code blocks and what needs to be removed. I found Claude ended up writing substantially better code with fewer (but not zero) frustratingly avoidable errors. Claude was generally much less forgetful, and more trustworthy in use, with fewer bad hallucinations of half-remembered code.
After a day of working with Claude’s free version I was a convert, and switched my Software Engineer agent from ChatGPT to Claude. The snippet-based approach to providing code, the long term memory provided by the new Projects feature, and the Artifacts user interface made Claude the better partner for writing code.
However, it wasn’t all rosy for Claude. ChatGPT’s (apparently) near-infinitely scrolling ability to maintain conversations was much more useful for the Product Manager role, and to a lesser extent for the Devops Engineer role. Claude’s limited conversation length and constant insistence on starting a new chat made me feel less comfortable continuing a long conversation, as I had done with ChatGPT. As well as the chat length limitations, it could also be frustrating how many times I hit the ‘you have 1 message left before 10pm’ notification, still with a lot of code to write. It has become rare that I hit a message limit in ChatGPT, and that alone was enough to convince me to keep the Product Manager in OpenAI’s product (yes, that’s two premium tier LLM subscriptions, for great science).
Good product management and engineering principles endure
There are a few practices that most developers (and product owners) are trained in from their very first steps into the profession. Write good requirements that include acceptance criteria. Branch for each new task, commit often (and with a good reason). Make sure there are tests for your code. Write good documentation. Tidy as you go. These are all things I could rhyme off if I was giving advice to a more junior developer, but they are also things I had lost touch with. I knew the words, but had lost the rhythm.
Developing an application, with a virtual product owner translating my requirements into user stories, and an equally virtual software engineer translating stories into code, taught me again not only that these things were important, but also why they were important. Working with a virtual product team of unpredictably untrustworthy agentic colleagues, with short memories and a literally encyclopaedic knowledge of software design, was instructive.
As with most things, it was process that helped me scale. My previous attempts at knocking out applications with LLMs had been small - usually barely more than a few hundred lines. The Mindmeld application (as I called it) currently stands at around 130 commits over 5 weeks of activity, 28 separate files (including configs and tests) and 2,600 lines of Javascript, almost all of which was created by ChatGPT and Claude (and somewhat edited by me).
I was also increasingly proud of the output, and enjoying the experience. From the initial prototype, which simply put a movable, editable box on a web page, the app was solving increasingly complex usability problems, and I was absolutely relishing it.
With that complexity came not just complexity in code, but complexity in requirements. As I had new ideas about how I wanted the app to behave, I would instruct the product owner agent to turn my partially formed ideas into well-written epics and user stories, which I might slightly tweak and edit. These stories represented not just the work I was about to do, but a growing list of the work that I wanted to save for later. I needed to rigidly enforce my work-in-progress limit of one (for my limited team), and using Jira to keep track of bigger epics and more detailed user stories became a necessity.
My workflow was simple. I’d come up with an idea, or a refactor, and I’d ‘discuss’ it with the product manager agent.
Once the agent had created the user story (or epic and multiple subtasks), I would add the stories onto a simple Kanban board, manually creating the stories and moving the tasks as a real team would.
I would then take the story from the Jira task, and offer it up to the engineer agent by manually pasting it into the prompt.
As I fed the prompt to the engineer agent, I would create a new branch in git for the fix or the feature, and get ready for the process of copying and pasting code between VS Code and the LLM chat. It was rare that the code provided worked first time, but there were frequent occasions when I could make exceptionally rapid progress adding surprisingly complex functionality.
On debugging and mental models
But. There were also times - just like in real software development - where I could lose hours on seemingly trivial changes. I will never get back the five hours I lost to my VS Code dev container crashing, seemingly irreparably. It wasn’t anything to do with the LLMs (in fact my Devops agent was really helpful in troubleshooting), but it did remind me just how important developer experience (and platform engineering) is to maintaining the velocity of great dev teams. I also spent two or three days trying to replace the standard HTML DOM coordinates (everything measured from top and left) with a centroid-based approach, where the centre of the page is treated as 0,0, and I still haven’t fully solved it.
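For the curious, the heart of that change is a conversion something like the sketch below (my own stand-in, not the code from the app) - simple enough to write down, surprisingly fiddly to retrofit once every drag handler, connector and template assumes top/left.

```javascript
// Stand-in sketch: translate between DOM top/left coordinates and a system
// where the centre of the canvas is (0, 0).
function toCentroid(noteEl, canvasEl) {
  const c = canvasEl.getBoundingClientRect();
  const n = noteEl.getBoundingClientRect();
  return {
    x: (n.left + n.width / 2) - (c.left + c.width / 2),
    y: (n.top + n.height / 2) - (c.top + c.height / 2),
  };
}

function toTopLeft(point, noteEl, canvasEl) {
  const c = canvasEl.getBoundingClientRect();
  return {
    left: point.x + c.width / 2 - noteEl.offsetWidth / 2,
    top: point.y + c.height / 2 - noteEl.offsetHeight / 2,
  };
}
```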
In a way, the times when the application doesn’t work are more painful and time consuming because we are using GenAI tools. When we are coding without tools (the way developers have been doing for decades), we are building up an emergent conceptual understanding of the application. We’re building links and connections, understanding where classes and methods are implemented and where they’re called. We recollect the previous design compromises we’ve made, creating a delicate, crystalline mental model of the way the code interacts with itself and the user.
GenAI tools rob us of the immediacy of this mental model building. It’s not completely lost, but it’s neither quite as urgent nor as obvious. When we - the conductor of this agentic orchestra - are copying and pasting from one window to another, we’re only performing a shadow of the mental gymnastics we are required to do as a real engineer. Without the finely detailed context we build by creating the app, we lose a large part of our ability to understand where and why bugs have crept in. We simply jockey the code around, from chat window to IDE and back again, asking repeatedly why the code we’ve pasted doesn’t work the way we intended it to, and somewhat helpless to investigate.
Were I still a competent developer, I would have been under the hood of the code, tracing the logic and data. But with GenAI creating the code, it’s simpler just to ask it to debug for you, at which point it will return some more code, robbing you of another opportunity to understand how the application works.
I regularly observed that the LLMs tended towards overcomplication rather than simplification and elegance. Especially when faced with attempting to find, understand and fix a bug, the models would attempt to extend an existing solution to solve a problem, rather than going back to basics and simplifying as a more competent developer would. I would often have to steer the agent with prompts to request that we keep it simple.
Once again, it was process that would come to the rescue here. In my earlier collaboration with the tools, I had been branching and committing when I felt there was likely to be an update of some significance. Working with LLMs (re-)taught me that even relatively simple seeming fixes and features could have wide ranging consequences and my branching became much more frequent. It became common practice for me to attempt a fix, realise that the LLM was overcomplicating a solution (or destroying previously working code), and simply roll back to the last known good state without remorse. Asking an LLM to dig itself out of a buggy hole it has dug usually results in it backing up a tipper truck of randomly assorted code additions and dumping the contents over the problem.
Red, green, refactor
Another part of good development practice that became substantially more important was adding tests to cover the major parts of the application. As the complexity grew, my manual testing of the UI was taking an increasing amount of time to complete, and growing more feeble. The app had grown so that there were individual notes and groups of notes, group selection, different types of connectors, content length limits, multiple canvas templates, and data storage and management, and it all became too time-consuming to test properly. The result, of course, is that the application wasn’t tested properly, and that meant it was easy for some hallucinatory code additions to break a previously working part of the application that my manual tests weren’t covering.
Every developer knows that tests are theoretically important, but they can often feel like they get in the way of the creative parts of building solutions. When building with LLMs - as helpful as they are - tests are not optional beyond a certain level of fairly minimal complexity. Tests save time. Working with a forgetful, hallucinating but otherwise polite virtual co-worker is a wonderful example of just why we need them, and for anyone stepping into using LLMs to assist with code, it’s as good an opportunity as any to remind yourself of Kent Beck’s ‘red, green, refactor’.
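Here’s a hedged sketch of the kind of test that started to earn its keep - Jest syntax, with a hypothetical createNote() and MAX_NOTE_LENGTH standing in for whatever the app actually exposes:

```javascript
// notes.test.js - illustrative only; the real module names and limits may differ
import { createNote, MAX_NOTE_LENGTH } from "./notes.js";

describe("createNote", () => {
  test("truncates content beyond the character limit", () => {
    const longText = "x".repeat(MAX_NOTE_LENGTH + 50);
    const note = createNote({ x: 0, y: 0, text: longText });
    expect(note.text).toHaveLength(MAX_NOTE_LENGTH);
  });

  test("leaves short content untouched", () => {
    const note = createNote({ x: 0, y: 0, text: "Chapter 4: the reveal" });
    expect(note.text).toBe("Chapter 4: the reveal");
  });
});
```

Once tests like these existed, spotting a bad LLM suggestion stopped being a matter of instinct and became a matter of a failing test.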
So this is my experience
This is not an article that is aimed at convincing anyone that LLM coding assistants will replace software engineers. Nor is it an attempt to suggest that software engineers are irreplaceable. The truth is going to be somewhere on the spectrum between those two poles, and this post is simply to share my experience of using LLMs to build applications, express the very simple pleasure that I got from doing it and to offer some advice to those that are curious about using the tools.
It’s been years since I experienced the flow state that I used to feel when I was writing code. There is something intensely compelling about flow; it’s not exactly enjoyable, but it is something that you seek out, and something that you realise that you miss when it’s absent. Nothing else that I do - no hobby, no activity, no type of work, and not even writing - puts me into a flow state in the same way that writing code does. And you know what? I had forgotten it, and that’s the saddest part.
As I was working on this little application, building features that possibly only I wanted, I fell back into that flow state. In the evenings, I found I would still be coding late at night. I’d wander back to turn the computer off, only to sit down and start another prompt to knock out one more feature, delaying the inconvenient necessity of sleep. Waking up in the morning, I’d realise that I’d half dreamed a partial solution, and would rush to turn the PC on before I’d even started the coffee machine warming.
Here’s my very simple truth - regardless of how LLMs will impact professional software development teams, for me, with my level of familiarity with software development and my interest in building some relatively pointless applications, GenAI tools are exceptionally empowering. They allow me to augment the working parts of my engineering ability with long-forgotten or never-learned information that the LLMs possess. They allow me to bypass the long and painful process of trying to remember how to best implement a class in Javascript, or the format of selectors in CSS. Put bluntly, they’re a cheat code for making software.
Like all cheats, they are a shortcut, and from my experience I would not suggest for a minute that we can be without skilled, experienced and pragmatic engineers to keep the language models in line. However, are they capable of producing valuable output? Yes. Yes, they are.
If I had a magic wand…
Frankly, both Claude and ChatGPT do a compelling job of writing code for basic applications. For the use case that I created, where I was modelling the behaviour of a product team with a group of agents, there are some seemingly simple but extremely powerful integrations that I’d love to see.
In this article I’ve explained the major limitation that LLMs experience when working on an application: they have no good proxy for long term memory. You’ve seen how I worked around this limitation, by using the Projects capability of Claude to offer some semblance of Github Copilot’s better contextual understanding of a project. But wouldn’t it be even better if we could simply specify our git (Github/Gitlab/Bitbucket) repository and connect the LLM, allowing us to direct the agent to inspect the existing application state as we needed?
An equivalently simple integration could be offered to the product management / engineering handover process. Using whichever task/issue tracking tool we preferred, imagine the simplicity if we could instruct the product owner LLM to create Epics and user stories directly onto the board, and for the engineer LLM to pull the stories and create the appropriate code.
Note that I’m suggesting integration here, not an expansion of LLM features. The beauty of the LLMs for me is their wide range of capabilities, and that’s also the key reason I preferred using Claude and ChatGPT over Github Copilot. I can jump around freely in the LLM, asking product questions, testing questions, engineering questions, or even ask what the etymology of a term is. I like that freedom, and I personally prefer it to the experience of using Github Copilot, AWS Q or Google’s Code Assist.
I also want to maintain the freedom to mix and match tools, and to blend human with machine elements where I want to. I don’t want OpenAI to offer a git repository hosting solution and get into developer tooling. I don’t want Anthropic to offer task management, and I’m sanguine about what we might expect from Atlassian’s Rovo. I would just like some nice, clean, secure API and minimal UI integrations that we can hook into our existing preferred tools. These two improvements alone would elevate the capability of the current GenAI tools immeasurably, and allow us to integrate them much more directly with real, human teams.
If you’re interested in seeing the code or the working app…
This isn’t a post about the code, but in the spirit of showing my working you can see both the current working version of the application and the source code that the LLMs have produced on Github, where I’ve released the code on an open source licence. I have no doubt there will be ample and well deserved criticism of the quality of the code, but suffice it to say that the application does exactly what I wanted it to do. That was my intent, and this public release is a demonstration of possibility, not a recommendation for wholesale adoption of GenAI tools.
In sharing this with you now, I’m following Reid Hoffman’s advice, that “If you’re not embarrassed by the first version of your product, you’ve launched too late”.
But that’s the point of this; here’s a working application which meets the needs of the user (me) and I’m happy with it AND I enjoyed building it. Maybe other people will find it useful too, and that in itself demonstrates that GenAI can, in fact, build useful, valuable applications.
Top Ten Tips for First Time LLM Coding Collaborators
Some tips for optimising your coding partner
I hope that you found this useful, and that you’ll share your experiences with us.
// Mark