Lessons Learned During My Two-Hour Pair Programming Session With Steve Yegge


In September, I got the chance to pair program for two hours with the legendary Steve Yegge, who coached me on what he calls “CHOP,” or chat-oriented programming, and helped me build something that I’ve wanted to build for nearly a year.

I learned how to use Cody from Sourcegraph to level up in ways that I couldn’t quite imagine before.

You may have seen the video excerpts I’ve made of talks and podcasts I enjoyed — here’s the first one I created, from the famous Dr. Erik Meijer, where he talks about how we might be the last generation of developers to write code by hand, and that we should have fun doing it.

(You can see those video excerpts of that talk here: https://twitter.com/RealGeneKim/status/1833291950726033453)

I created the tool to make these video excerpts during this two-hour pair programming session with Steve!

We recorded the entire session — and here, I'm posting the “highlight reels,” where I show the prompts that I used (coached by Steve) to create the app, and the lessons learned along the way.

I can’t think of a better way to learn. Dr. Anders Ericsson, renowned for his research on expertise and deliberate practice, wrote the fantastic book “Peak,” where he identifies the key elements of acquiring new skills and achieving mastery, whether learning to play musical instruments, playing sports, or practicing medicine. Those elements are:

  • Expert coaching: you learn best when guided by an expert (that’s Steve!)
  • Fast feedback: you learn best when you get immediate, actionable feedback, so you can identify and correct errors quickly, and reinforce positive behaviors (check!)
  • Intentional practice: you learn best when focusing on specific tasks (let’s CHOP more, as opposed to manually typing out code!)
  • Challenging tasks: you learn best when you tackle tasks slightly beyond your current abilities (check!)

I can’t overstate how much I learned in two hours. In this thread, I post segments from that session, with some introductions, a statement of goals, and portions from the approximately 50 minutes required to build the code that uses ffmpeg to generate video excerpts, with transcribed captions.

(I built the app in 60 easy steps!)

It was fascinating to re-watch the recording — I’ve watched it in its entirety several times, which I found wildly entertaining. But I wanted to see if I could extract the lessons, so people wouldn’t need to watch the entire 90-minute video.

I inserted video captions that describe what is going on, with any prompts I’m giving to Cody / Claude / ChatGPT, so you can follow along, as well as other insights or lessons learned.

(In the lower-right corner of the video, you can see the elapsed time — I was astonished to discover that, with Steve’s help, we had gotten the video extraction working in about 47 minutes. The remainder of the two hours was learning the tools, chit-chatting, joking around, etc.)

Among the lessons learned:

  • in the beginning, my prompts were unambitious — Steve kept encouraging me to “type less, and lean on the LLM more.”
  • despite Steve saying that tools fully supporting CHOP are still a long way off, you’ll see that the interaction model becomes very evident by the end — give the LLM the relevant context, ask it to build or modify something for you, and ideally, it’ll appear in place, or it’ll be something you can copy/paste into your code base.
  • a key skill is breaking tasks down to make steps more concrete for the LLM — or as Steve (and many Clojure programmers) likes to say, you reify your tasks (i.e., you make them more concrete or realized)
  • having a good way to run your tests quickly becomes critical, because you often won't read the code that the LLM wrote — until the tests fail.
  • when tests fail, one technique is just to ask the LLM to “try it again,” but lots of human judgment is required here. Sometimes this works, while other times, you’ll be iterating in circles, never getting closer to your goal.

First, here's the introduction that Steve and I recorded afterward to set the stage — I describe how we met, how it resulted in his amazing “Death of the Junior Developer” post, how he gave an amazing talk at the Enterprise Technology Leadership Summit in August, and I describe what we wanted to achieve in the two hours we allocated.

You may want to skip this first clip, but I include it for completeness. This is where I’m walking Steve through the problem to be solved. I had drafted a plan in a GitHub issue that looked like this:

Desired outcomes:

  • generate a video clip excerpt from a YouTube video, given a highlighted start time and duration
  • add captions to the video

Inputs:

  • a highlight, which includes {:begin, :end, :transcript}
  • a YouTube video
  • a transcript of the video

Steps:

  • download the YouTube video (using yt-dlp)
  • extract a segment from the video (using ffmpeg)
  • extract the transcript (given the start time and duration)
  • overlay subtitles (using ffmpeg)

I’m giving him the context that I think will be relevant, so he can get a better idea of what I’m trying to achieve.
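(To make the plan concrete, here's a rough sketch of how those steps might compose in Clojure; the function names are hypothetical placeholders for illustration, not the code we ended up writing.)

(defn make-excerpt! [{:keys [begin end transcript]} youtube-url]
  (let [video     (download-video! youtube-url)         ;; yt-dlp
        segment   (extract-segment! video begin end)    ;; ffmpeg cut
        srt       (write-srt! transcript begin end)     ;; transcript -> SRT file
        captioned (burn-captions! segment srt "/tmp/out-captioned.mp4")] ;; ffmpeg subtitles overlay
    captioned))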

Other highlights:

  • I mention other tools I’ve considered using, such as Video.ai, Opus Pro, and Audiocado, which perform similar tasks — but my frustration is that these tools want to pick the highlights, whereas I already have the highlights as screenshots.
  • I program in Clojure, my favorite programming language. I am using IntelliJ with Cursive and GitHub Copilot, and I’ll be using Cody, my new favorite coding assistant.

Okay, after those 8 minutes of orientation, we’re ready to start coding — the timer has started!

My first prompt: “given vars beginning and end, which are in seconds, give me the ffmpeg command to extract that portion of the video and go ahead and shell out and put the excerpt in a file ‘/tmp/out.mp4’”

Okay, that was easy! Less than one minute to implement this. But there’s something I don't like...

…the "problem" is that Claude used the https://clojure.java.shell.? That's "fine," but I want to use the fantastic babashka.process library from?Michiel Borkent (@borkdude), which supports streaming output.

So my prompt is: “I like using babashka.process instead”

I loved Steve’s comment: “Just say ‘instead.’ Let Claude figure it out. CHOP is all about being lazy.” (In other words, no need to specify what is being replaced!)

A lesson: Claude is fantastic at inferring intent — you can often be very lazy and ambiguous indeed! (Later, you’ll see where being ambiguous bites me in the butt.)
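(For those following along at home, here is roughly what such an extraction function might look like using babashka.process. This is a sketch from memory, not the exact code Claude generated; the -ss/-to/-c copy flags are standard ffmpeg arguments for cutting a segment.)

(require '[babashka.process :refer [shell]])

(defn extract-segment!
  "Cut the portion of video-file between begin and end (seconds) into /tmp/out.mp4."
  [video-file begin end]
  (shell "ffmpeg"
         "-ss" (str begin)   ;; seek to the start time
         "-to" (str end)     ;; stop at the end time
         "-i"  video-file    ;; input video
         "-c"  "copy"        ;; copy streams rather than re-encoding
         "/tmp/out.mp4"))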

Also note that as I’m adding modules, code completion is very helpful — it auto-completes the module I need. I’m guessing that it’s because Cody had noticed files I had opened recently, from which I had copied code.

(We also joke about emacs, Colin Fleming [author of the fantastic Cursive Clojure plug-in for IntelliJ], and IntelliJ breaking changes.)

The next step was to extract the begin and end times from the big data structure I passed in. The error in the REPL was: “REPL Exception: Null Pointer Exception for java.lang.Double operation”

I take Steve’s advice of trying to do less typing.

Prompt: “before calling ffmpeg, print out begin and end vars”

That worked — maybe not faster than typing it in manually, but it wasn’t much slower, either. (And also a bit mind-expanding, because this is a much different interaction modality. It’s an example of typing in intent, as opposed to actual code.)

Thanks to the logging messages, I now see that I got the map key wrong. So the correction is very straightforward.

The next prompt: “Typo: begin is actually start”

Cody changes the code, modifying the let bindings.? And after I manually fix the ffmpeg input file parameter, the shell command to ffmpeg succeeds!

But did it correctly generate the video file?

Haha. I find this so funny to watch. I open the MP4 file, and I’m so shocked that it’s the correct portion of the video that I keep exclaiming, “Holy sh*t!”

(It didn’t open when I double-clicked it, because it was encoded in VP9 — QuickTime Player doesn’t support that codec. So I had to open it with VLC.)

Steve notes that he’s relieved that Cody handles Clojure just fine, and it knows all the ffmpeg parameters, because it’s read every document on the internet.

He says, “Let’s not stop there.” But I can’t stop myself from reflecting on how fun this is. (And not frustrating, which typifies the endless hours of my life I've wasted trying to get ffmpeg to do the right thing.)

Steve notes that I should be leaning on the LLM more — less typing, more CHOPping.


I “pause the game” for a bit to exclaim just how much fun this is — after all, I’m amazed at how a data structure of a start time, duration, and a YouTube video has just turned into a video excerpt…

…which took less than five minutes to do, and without having to deal with painstakingly assembling ffmpeg arguments. It’s unlike any interaction I’ve had with ffmpeg! (I even had to censor out another “holy sh*t!” Hahah.)

Steve responds by saying, “you can be doing a lot more chatting [and less typing].” He suggests writing more ambitious prompts to get the LLM to write more of the code (alluding to some of the corrections that I did by hand).

I convey to him how productive it felt to be modifying portions of the code via the Cody inline edit mode — it’s where you highlight a function, and write a prompt in a modal dialog box, and the code changes are made in-place.

(This is made super-easy by the structural editing modes found in any LISP editor. Because all code is nested in parentheses, it’s one keystroke to highlight a form, another to expand it one level, etc.)

Steve expounds that there are three primary interaction modes emerging right now:

  • Code completions (code suggestions show up as auto-completes)
  • Chat functionality
  • Inline edits (described as a bridge between completions and chat) — this is my favorite, and is also found in editors like Cursor.

Inline edits are fantastic because context is explicit, and it often obviates any need for copying and pasting.

Okay, back to working on captions!

Onwards! Let’s create the transcript SRT file, so we can generate the video captions!

I spend one minute walking through the big data structure, looking for where the transcript text is in this large, nested data structure.

Watching this, Steve ridicules me, saying: “Gene, are you typing that code manually again? We’ve talked about this….”

He proposes that I just copy the whole thing into the Cody chat window.

I’m actually quite dubious that this will actually work, but I still follow his lead — and I’m glad I did.

The prompt: “find me the path of the transcript inside this nested map: [pasted in data structure]”

I think it’s hilarious that this is a “man vs. machine” race — and I find it within a couple of seconds. Man beats machine!!

It's because Cody generated an error: the prompt was too big for the Claude input context window — it’s a lot of data.? We tried putting the same prompt into ChatGPT, and also got the “too big to fit” error message.

Steve insists that we try Gemini — it turns out the data structure is 120K tokens. I’m blown away that it generates an ALMOST correct answer — it didn’t get two of the keys quite right. (Clojure can have “namespaced keywords.” Very easy to forgive this error.)

FWIW, the path into the data structure is this (the namespaced keys are what it got wrong):

(->>? EXCERPT :props :podcast-episode/transcripts first :episode-transcript/transcript first)

I mentioned to Steve how amazing I find Gemini to be — just the previous week, I gave it 120K tokens of HTML, and asked it to extract all the <a href> tags. Which it did fabulously!

Lesson: Do you have a huge nested data structure, and can't find the exact path to what you're looking for? Just give it to Gemini, which can search through the whole thing, and tell you how to retrieve what you're looking for. Amazing.

(Thank you, Paige Bailey and Logan Kilpatrick!)

(Watching this video, I now realize that back then, I didn’t know there was a difference between AI Studio and Vertex AI, or had gotten them confused in my mind. Which is why I didn’t know how to continue a chat conversation! Thanks to Paige for finally explaining the difference to me a couple of weeks ago!)

Okay, we’ve found the transcript text — onwards! (It's been 20 minutes since we started coding.)

Okay, we’ve found the transcript. The next step is to extract all the transcript entries that are between the start and end times.

The prompt: “[sample of transcript data structure] Here's what the transcript looks like - it's a vector of maps. write a function that, given a start and end, extracts out all the relevant entries in the transcript”

Cody returns code that looks right, but who knows? So, let’s have Cody write some tests.
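(For reference, here is a minimal sketch of what such a function might look like, assuming each transcript entry is a map with :start, :duration, and :text keys; that shape is my guess for illustration, not necessarily what Cody produced.)

(defn extract-transcript-entries
  "Return the transcript entries that overlap the [start, end] window, in seconds."
  [transcript start end]
  (filterv (fn [{entry-start :start duration :duration}]
             (let [entry-end (+ entry-start duration)]
               (and (< entry-start end)    ;; entry begins before the window ends
                    (> entry-end start)))) ;; and ends after the window begins
           transcript))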

I take a minute to get some TDD infrastructure set up.

(It’s the RCF library from @dustingetz, which will run all the tests in the REPL every time I load the file, which is awesome. Gives fast and frequent feedback in a very natural way.)
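(If you haven't seen RCF before, the tests look roughly like this; a sketch with made-up data, using the hypothetical extract-transcript-entries from above.)

(require '[hyperfiddle.rcf :refer [tests]])
(hyperfiddle.rcf/enable!)  ;; RCF tests are off by default; enable them at the REPL

(tests
  "entries overlapping the window are kept"
  (extract-transcript-entries
    [{:start 0 :duration 5 :text "hello"}
     {:start 5 :duration 5 :text "world"}]
    4 9)
  := [{:start 0 :duration 5 :text "hello"}
      {:start 5 :duration 5 :text "world"}])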

Seeing this, Steve says: "Less typing, Gene!" — I’m not sure Steve realizes that I'm not coding, but instead getting the above TDD infrastructure wired up.

Now we’re ready to make some tests.

The prompt: “this is where to write tests; write me tests of extract-transcript-entries”

Haha. I censor out another “holy sh*t!” upon seeing four tests! It really is hilarious watching me so delighted by all these things that Cody is doing for me.

What’s also interesting is that two of the four tests are failing. Steve recommends just copy/pasting the errors into the chat session, which will often fix the error…

Will it?

Okay, we’re twenty minutes into the coding session — Claude wrote four tests, of which two are failing.? Steve recommends just copying/pasting the errors into the Cody chat session, which will often fix the error…

Steve walks through the various ways that we can get Cody to suggest fixes.

The prompt: “[copy/paste the entire output from the test run]”. (That's it! No instructions. Just the error.)

Claude changes a ">" to "<=" — this suggests that it thought it was an off-by-one error? Sounds plausible, I suppose….

But the tests still failed.

Instead of looking at the code, I study the tests that Cody had written — it turns out the test assertions were actually slightly wrong. In one case, I fix the time range; in another, I fix the resulting text.

All tests are now passing, and I feel reasonably confident that the code will work in the general case.

Steve describes this episode of using an LLM to fix a bug as Zeno’s Paradox, where Achilles (the faster runner) will never overtake a slower runner (a tortoise), because by the time the faster runner reaches the point where the tortoise was, the tortoise has moved a bit further ahead.

Steve says it sometimes takes him 10-15 iterations of pulling the slot machine lever to get a correct answer — so judgment is definitely required here.

(I have a story I’ll share later of how I spent almost 45 minutes running around in circles with OpenAI’s o1 model, because it was so persuasive about how it knew how to solve a specific problem. It actually didn't.)

Steve’s take: “Know when to use the LLM as a leaf blower, and when you need to come in with a broom or shovel for the last little bit.”

The total time spent fixing the tests was something like two minutes — it felt great, and I certainly felt productive.? I have no doubt that there was a time savings — in the video, I suggest 2-3x faster.? I think that seems about right.

What was interesting to me: I never actually looked at the code implementation — my comprehension started by studying the test cases.? (Whereas Steve's attention started with the code.? I suspect he's better at reading code than me.)

Okay, we’re ready for the next step!

In this clip, we push onward to actually generating a valid SRT transcript file, so ffmpeg can generate the captions!

But first, Steve describes how CHOP is a new skill that needs to be learned. He emphasizes that some of these skills are only for the short term, because they won’t be needed as tools improve —

But other skills will be more timeless, such as judging and learning what types of problems LLMs are good at solving, versus those that they’re lousy at.

Idan Gazit described this as “Fingerspitzengefühl” — a German term that literally means “fingertips feeling”: intuitive flair or instinct.

Steve: Another CHOP skill is knowing when you copy/paste, when you use inline edits, when you use ChatGPT vs Gemini vs Claude — and right now, it’s an art, not a science.

Okay, after a minute of reflection and philosophizing, we march again towards the finish line — the next step is to generate the SRT transcript file.

I use the “Document Code” feature to generate a docstring for the function that it just wrote — it’s easier to read that than the code. (And reading code is what devs do 80-90% of the time, as opposed to the 10% of the time spent writing code.)

Two minutes into this clip, I manually verify that, given the start and end times, the retrieved portion of the transcript matches what’s in the generated video segment.

It's looking good!

I open up ChatGPT, and ask what a valid SRT file looks like.

Three minutes in, I write this prompt:

“This is what an SRT file (transcript) looks like: [sample SRT file] ... Write a function to transform my list of transcripts entries (which look like this [paste data structure here]) to an SRT format”

Steve says, “There you go! That's what we're talking about. No typing!” Haha.

We look at the code it generates, and Steve remarks, “That would definitely have taken longer to write by hand.” (For sure. There’s a loop-recur in there to generate the incrementally numbered SRT entries.)
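(For flavor, here is a sketch of what a transcript-to-SRT transform might look like, with a loop-recur to number the entries. It assumes the same hypothetical entry shape as earlier, plus a helper that renders seconds as an SRT timestamp; it is an illustration, not the generated code.)

(require '[clojure.string :as str])

(defn format-srt-time
  "Render seconds as an SRT timestamp, e.g. 75.5 -> \"00:01:15,500\"."
  [secs]
  (let [ms    (long (* secs 1000))
        h     (quot ms 3600000)
        m     (quot (mod ms 3600000) 60000)
        s     (quot (mod ms 60000) 1000)
        milli (mod ms 1000)]
    (format "%02d:%02d:%02d,%03d" h m s milli)))

(defn transcript->srt
  "Turn a vector of transcript entries into SRT-formatted text."
  [entries]
  (loop [entries entries, n 1, out []]
    (if-let [{:keys [start duration text]} (first entries)]
      (recur (rest entries)
             (inc n)
             (conj out (str n "\n"
                            (format-srt-time start) " --> "
                            (format-srt-time (+ start duration)) "\n"
                            text "\n")))
      (str/join "\n" out))))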

Time to write the tests.

Prompt: “write some tests at bottom of the file, using same form as (tests ...)”. [That's what the RCF tests look like]

Five minutes later, a bunch of passing tests. "Looks good to me!" (Haha.)

Steve makes the compelling assertion that it’s so clear what the ideal workflow should be: tools need to support specifying context from various sources, assembling it quickly, generating code, and reintegrating it seamlessly. Some of this is manual now, but as tools improve, it will become increasingly automated, creating a virtuous cycle.

I totally believe him. I’m twenty minutes into the coding session, and I’m having fun, and we’re marching ever closer towards the goal of captioned transcripts in the video!

(I take a moment to see if I can make him jealous that he’s not using Clojure in his daily work — of course, he needs no selling on this concept at all. After all, I’m well aware that Steve wrote the foreword to The Joy of Clojure book.)

Okay, we've generated what we think is a valid SRT transcript file — all we need to do now is give it to ffmpeg to process.

At this point, I have no doubt we’re going to achieve this by the end of our session.

I do a little homework on ChatGPT, to make sure I understand the order of operations — ffmpeg only generates the captions for the subset of video I specify, right? Right.

90 seconds in, Steve mentions how he’s finally figured out the “rich comment” forms in Clojure — it’s a convention where you put exploratory code in (comment ...) blocks. He mentions it’s like emacs scratch buffers, except that “rich comments” live alongside the code.

(They're called "rich comments" because Clojure creator Rich Hickey did this when implementing Clojure core functions.)
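(A rich comment block looks something like this: exploratory calls parked right next to the code they exercise, which never run when the file is loaded. The function names here are the hypothetical ones from the earlier sketches.)

(comment
  ;; scratch calls I can evaluate one at a time from the editor,
  ;; assuming `transcript` is bound at the REPL
  (extract-segment! "/tmp/video.mp4" 125 180)
  (transcript->srt (extract-transcript-entries transcript 125 180)))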

Okay, I check in the code before going further.

My next prompt: “I have a transcript SRT file "transcript.srt". Generate captions from this file.”

Oops.? That was ambiguous.

Modified prompt: “Modify ffmpeg command: I have transcript srt file "/tmp/steve-pairing/transcript.srt"; generate captions from this file.”

When I run the command, I get an error message, because the output file already exists.

Prompt: “Modify ffmpeg command: overwrite file always.”

(My goodness, a lot got done in 90 seconds!)

While ffmpeg is running, Steve describes the prompt library that is meant to be shared across a team or organization, and the ability to see what Cody is sending to the LLM.

At two minutes in, I’m so excited, because the ffmpeg output mentions “font providers,” which I’ve never seen before — but it's obvious that it's making captions! I realize that all the “competitors” we surveyed before are obviously using ffmpeg, too. Duh!

Quick question to ChatGPT, where I learn that rendering one minute of video with captions may take 1-3 minutes. Okay, we’ll keep waiting!

One minute later, it’s done… and… Wow, there it is. The video has CAPTIONS!!!!

But there’s something a little strange going on. There are two lines of text being rendered, and they seem strangely out of sync…

At 3.5 minutes in, I’m obviously so excited that something is working, even if it’s not entirely correct!

Steve proposes telling the LLM what is happening to see if it can fix it.

Okay, let’s take Steve’s advice.

Prompt: “this is so close! but in the captions, it's not clearing the previous caption correctly. bounding box problem?”

We look at the generated code, and we joke that "it looks good to me!" (LGTM).? This is obviously a joke, because the ffmpeg command is completely unreadable and nearly impossible to verify if correct by cursory examination.

But let’s pull the slot machine handle again!

Wait, before we do that, Steve has the best idea ever:

Prompt: “please construct the ffmpeg command step by step and document, so we can follow along.”

OMG, it is so much more readable! Each parameter is now on a separate line, and documented. So, so, so much better!!!
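(Something in the spirit of what it produced, with the command assembled argument by argument, each with a comment. This is a sketch for illustration, not the actual generated code; burn-captions! and the file paths are made up.)

(require '[babashka.process :refer [shell]])

(defn burn-captions!
  "Render the SRT captions onto a video segment."
  [segment-file srt-file out-file]
  (apply shell
         ["ffmpeg"
          "-y"                              ;; always overwrite the output file
          "-i" segment-file                 ;; the extracted video segment
          "-vf" (str "subtitles=" srt-file) ;; burn captions in from the SRT file
          out-file]))                       ;; e.g. /tmp/steve-pairing/out-captioned.mp4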

And let’s hardcode the end time so that it only renders 20 seconds to make the runs go faster.

Another pull of the slot machine handle at 3 min — while it runs, Steve asks what changed in the code. I answer, “It looks like it’s all new code.” Haha.

Steve’s response: “Magic.”

And then he posits why he is skeptical about coding agents at this time — he ponders whether you could just give the LLMs a broad specification: “Use ffmpeg, here’s an SRT file, here’s an input clip, generate the excerpt? Honestly, this was hard for a human — it wasn’t obvious and involved a lot of fumbling, restarts, and retries. If a person struggles with it, an agent is likely to, as well. Over time, yes, but now?”

In the video, I agreed with him — and since then, I’ve thought more about it, and can describe with more logical rigor why.? So much of the work in this session was concretizing and reifying the problem so that it could be solved.? It was the incremental breaking down of problems so that the LLM could implement/fix/solve the problem.

In a later conversation, we describe how right now, the LLMs are good at the leaves of the task tree. Over time, LLMs will be able to do not just the leaves, but more of the layers above them (the parents of the leaves).

In other words, over time, more of the bottom portions of the task tree will be able to be solved by LLMs.

But right now, it's pretty hit-or-miss, and needs a human in the loop — or at least for the problems I work on!

Okay, we continue to philosophize until 5 minutes in — I’m wondering why it’s taking so long to render. (In hindsight, I think it’s because there were multiple ffmpeg runs going — I later used “pkill” after each canceled run. Or maybe because the video is VP9 encoded.)

Steve mentions how he really wants to see what happens if you ask Claude to render the captions in a Quentin Tarantino font, hoping for yellow spaghetti-western font.

Thirty seconds later, we’re looking at the new video file — the captions are on a black background now (a big improvement, as it’s more readable).

And we're starting to put our fingers on what is wrong: the lines are rendering independently, for starters. (I will learn why later that evening.)

One more pull of the slot machine — and we note that in two hours, we’re almost done building the tool.? (Actually, looking at the timer, it’s only been one hour — the rest was chit-chat, joking, learning the tool, etc…)

Prompt: “almost! it's rendering two lines of captions, which aren't updating correctly. let's make it simpler, by showing only one line of caption. Use Tarantino font. And use current style of composing ffmpeg command step by step, documented.”

(And again, the generated code looks so different — comments are on the same line, instead of on the line above. Another great example of how non-deterministic LLMs are!)


We start wrapping up — while we’re waiting for the ffmpeg to finish rendering, I reflect on why I’m so happy.

Not only was the whole session incredibly fun, I’m thrilled because, until yesterday, this felt like a “not this month’s problem” — it just seemed too big to tackle.

But incredibly, what seemed like a multi-day or week-long project was completed in just two hours — or more accurately, in about 47 minutes of coding time.

I promise to keep Steve posted on my progress —

(The actual problem: the SRT entries overlapped! So the two lines of captions were due to two segments of the transcript needing to show up in the same frame. The solution required ensuring that no transcript ranges overlapped. That took less than 10 minutes to do, IIRC.)
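(Conceptually, the fix looked something like this sketch: clamp each entry's duration so it ends no later than the next entry starts. The actual code I wrote that evening differed.)

(defn de-overlap
  "Ensure no transcript entry's time range overlaps the next entry's start."
  [entries]
  (mapv (fn [{:keys [start duration] :as entry} next-entry]
          (if next-entry
            (assoc entry :duration (min duration (- (:start next-entry) start)))
            entry))
        entries
        (concat (rest entries) [nil])))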

Okay, let’s go to the wrap-up segments!

Okay, back to the Steve and Gene reflections that we recorded after the pairing session — we talk about what we did, what we learned, etc.

  • I talk about how I generated and posted my highlights from the video of Dr. Erik Meijer, and the coolest part was Dr. Meijer replying to the tweet, saying:

“Looks amazing! Thanks for doing this. Feels much faster to grasp than the watch the whole talk, even at 2x speed.”

Awesome!

  • My plans to use Google Gemini for laughter detection on some of the other pairing sessions I’ve done, to find the super interesting and exciting parts. (I’m thinking of you, Eric Normand and Tudor Girba, for all the Smalltalk/Pharo/Moldable Development stuff we did years ago!)
  • I walked through many of the observations I wrote about in this thread: https://x.com/RealGeneKim/status/1833298959890321503

Specifically, on the two categories of problems for LLMs:

  • Straightforward tasks: Problems solved in one turn (e.g., generating FFmpeg commands or calculating YouTube "percentage complete" from a screenshot).
  • Complex, situated tasks: Problems requiring multiple iterations and human intervention to concretize abstract goals into actionable steps (e.g., fixing overlapping captions in SRT files).
  • How TDD is almost mandatory for doing CHOP — you need fast feedback, and some assurance that the code the LLM is writing for you actually works!
  • And how grateful I was for the experience — the lesson is that if Steve ever offers to pair with you, the only possible answer is “YES!”

Okay, I think that’s it for now — I’ve spent days writing this up, so it’s finally time to post it!

I’ll get the complete video posted to YouTube in the next day or so, with and without captions (so people can see the whole screen, with nothing to obscure it).

The full 1 hour video is here: https://youtu.be/jpzv-_YQf6k?feature=shared


Niels Roesen Abildgaard

brb just gonna build some cool stuff

2 days ago

At first this sounded incredibly interesting, as one of the more serious approaches to getting benefits from coding with the help of GenAI. After watching all the groundwork you laid before the pair programming session, and seeing the increment actually produced in it, I can't help but think this could easily have been done without AI in a very similar time frame. The advantage of doing it without the help of AI would be simultaneously learning of and validating the quality of relevant documentation, and how to approach even more complex problems in the future, where GenAI would struggle to produce code without compromising either scope or precision in the solution. All in all, the headlines here left me with a very different impression than the actual content.

Sean Corfield

Veteran Software Architect

4 days ago

Fascinating description of the process! This sort of article is so important as we all come to terms with these new tools, what they're capable of, and to integrate them into our workflow. I've been very skeptical about GenAI but I've also been very impressed with how fast it has evolved in the last year or so. I use VS Code with GitHub Copilot and lately I've been relying on it more and more as a "pair programmer" for bouncing ideas off and reviewing my code and suggesting improvements. I've used it to generate tests but, so far, haven't used it much to generate new code. With the latest VS Code update, I am starting to use it more for edits, but it's all still early days for me.
