Whitepaper on How to Learn a New & Unfamiliar Codebase

Michael Feathers released a whitepaper on learning a new & large code base. It provides 4 tactics, in order, that help you understand the code quickly and become productive, plus 3 tactics for starting to work in it. I've been in this situation a lot, especially in my past consulting and freelance work. I've tried a variety of techniques, so this paper is neat: it's way better than mine, it's organized, and it has co-contributors.

I _do_ have a few thoughts on it, too, based on having used some of these.

https://communications.globant.com/public/wp/system-renewal-patterns/White_Paper-Patterns_of_Systems_Renewal-v4.pdf

The first, a Quick Read, is dead on. The best way to get over your fear of "this code base is so big and complicated, oh no" is to just dive in head first. Reading helps with the shock because you're not affecting the code, your fellow developers, or the company in any way. Reading code lets you safely do recon at a distance. Just open it in your IDE and start exploring the modules and classes. Whether you go folder by folder or just follow the imports doesn't matter.
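If "follow the imports" is your reading strategy, a throwaway script can list what a file pulls in so you know where to read next. This is a minimal sketch, assuming ES module syntax; `listImports` is a hypothetical helper, and the regex is a rough stand-in for a real parser, so it skips `require()`, dynamic `import()`, and re-exports.

```typescript
// Rough sketch: list the module specifiers a source file imports.
// Assumes ES module syntax. A regex is not a parser: it skips require(),
// dynamic import(), and re-exports like `export { x } from "y"`.
function listImports(source: string): string[] {
  // Fresh regex per call so the global flag's `lastIndex` never leaks between calls.
  // The optional group covers `import x from`, `import { a, b } from`, `import * as m from`;
  // without it, side-effect imports like `import "./polyfills"` still match.
  const pattern = /^\s*import\s+(?:[^;"']*?\s+from\s+)?["']([^"']+)["']/gm;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1] !== undefined) {
      specifiers.push(match[1]);
    }
  }
  return specifiers;
}

// Usage: feed it a file's text, then open whatever it points to next.
const example = [
  'import { parseOrder } from "./orders/parse";',
  'import "./polyfills";',
  'import * as fs from "node:fs";',
  "const total = 1;",
].join("\n");

const found = listImports(example); // ["./orders/parse", "./polyfills", "node:fs"]
```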

In psychology, one of the ways to get over a fear is to expose the patient to it in SUPER small bits at a time. You gradually increase the exposure: "can we talk about insects", then "can we talk about spiders", then "can you look at a picture", and eventually handling them. You can do the same with code, over multiple sessions. Look at just the types, or just the Data Transfer Objects, Value Objects, or some UI code... whatever doesn't fill you with dread and lets you get up to speed more comfortably. You don't have to "dive in head first" like I mentioned above. But you should eventually do a thorough read of the important bits so you have context for every step after.

While Michael is referring to a process here, there have been times when I did a 2nd reading months later, once I had context: I understood the code better, I understood the programming language & architecture(s) better, AND I had grokked how Conway's Law (systems end up mirroring the communication structure of the organization that builds them) had influenced this code base. So feel free to do _multiple_ readings over time.

The Detective, step 2, is defined as a separate step, and I've done this too after reading a client's code. Nowadays, however, a lot of us have real-time chat like Slack, Teams, or Discord. If there are rooms with multiple people in them, like Developers + Designers + Product + QA, that's a good place to ask questions AS YOU READ. They can quickly get you up to speed so you don't waste time on paths they already know newcomers tend to go down.

"There is a huge reason why these 4 services aren't DRY" or "Yes, the duplicate Data Transfer Objects are there to shield us from the ever-changing back-end so our UI doesn't have to change all the time. The types are basically the same, just with different database field mappings; you can see the effect by looking at the git changes over the past 3 months" or "Yeah, the original architect used MobX, and we later added Redux, but that old part doesn't hurt anybody, has good tests, and never changes, so we haven't prioritized it".

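The DTO answer is easier to picture in code. A minimal TypeScript sketch, assuming two hypothetical back-end record shapes (`UserRowV1`, `UserRowV2`) and one UI-facing `User` type; these names are illustrations, not anything from that team's code base. The "duplicate" types stay nearly identical, and only the small mapping functions absorb back-end field renames.

```typescript
// Hypothetical back-end shapes: the same user, with different database field mappings.
interface UserRowV1 {
  user_id: number;
  first_nm: string;
  last_nm: string;
}

interface UserRowV2 {
  id: number;
  firstName: string;
  lastName: string;
}

// The type the UI depends on. It stays stable while the back-end shapes churn.
interface User {
  id: number;
  displayName: string;
}

// Only these mapping functions change when a back-end field is renamed.
function fromV1(row: UserRowV1): User {
  return { id: row.user_id, displayName: `${row.first_nm} ${row.last_nm}` };
}

function fromV2(row: UserRowV2): User {
  return { id: row.id, displayName: `${row.firstName} ${row.lastName}` };
}

// Usage: both produce { id: 7, displayName: "Ada Lovelace" }, so UI code never sees the rename.
const fromOldApi = fromV1({ user_id: 7, first_nm: "Ada", last_nm: "Lovelace" });
const fromNewApi = fromV2({ id: 7, firstName: "Ada", lastName: "Lovelace" });
```

The cost is a bit of apparent duplication; the payoff is that a renamed database column touches one mapper instead of every component that displays a user.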
If you didn't ask those kinds of questions immediately, you'd spend hours on slide decks about the "costs of 2 architecture styles" and in therapy over "what in the world were they thinking here..." when the entire team can answer you with good context in 20 seconds. You'll _also_ get the disagreements: either 2 developers with completely different takes on why a part of the code base is the way it is, or Product and Design talking about the tough decisions they had to make.

More importantly, you'll glean the culture of the company: what it was when the code base started, how it changed (if at all), and what it is now. For example, does everyone hate the code base because it has no tests and deployments are miserable, yet nothing is done to prioritize fixing it because it funds the core of the business? Or is it the opposite, where everyone wants to make things better but doesn't know how? It's never that black and white; the truth is always a mess somewhere in the middle. This is important, though, because it tells you where to focus your efforts if you like improving things, from both a coding and a management perspective.

The Map is interesting to me because I'm not a note taker. I've slowly been doing more and more architecture doodles since I learned that taking notes helps retention. I'm a bit spoiled, though, since I have a minor photographic memory, meaning I can memorize a lot quickly just by talking things through or hearing someone lecture. Still, I understand others need to write things down, and having these maps is a great first start.

Additionally, you can update the Map as your understanding changes over time, and the _changes_ are interesting because they reveal your misconceptions. Sometimes they're just that: benign, harmless misconceptions. Other times, they're massive miscommunications that could be signs of bad technical debt and of a misunderstanding of how a particular architecture style should work (e.g. Hexagonal with no bounded contexts, Redux with mutable reducers, Services with no dependency injection).
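Of those examples, "Redux with mutable reducers" is the easiest to show. A minimal sketch in plain TypeScript with no Redux import, assuming a hypothetical todo-list state. Redux expects a reducer to return a new state object rather than edit the old one, and UI bindings such as React-Redux generally rely on reference comparisons to notice changes, so a mutating reducer can look like "nothing happened" even though the data changed.

```typescript
interface TodoState {
  items: string[];
}

type TodoAction = { type: "added"; text: string };

// The misconception: edit the existing state in place and return the same object.
function mutableReducer(state: TodoState, action: TodoAction): TodoState {
  if (action.type === "added") {
    state.items.push(action.text); // silently changes the *previous* state too
  }
  return state;
}

// What Redux expects: leave the old state alone and return a new object on change.
function immutableReducer(state: TodoState, action: TodoAction): TodoState {
  if (action.type === "added") {
    return { ...state, items: [...state.items, action.text] };
  }
  return state;
}

// A reference check, the kind of cheap comparison subscribers tend to use.
function didChange(before: TodoState, after: TodoState): boolean {
  return before !== after;
}

// Usage: the mutable version reports "no change" even though items grew.
const before = { items: [] as string[] };
const mutated = mutableReducer(before, { type: "added", text: "read the code" });
console.log(didChange(before, mutated)); // false

const fresh = { items: [] as string[] };
const updated = immutableReducer(fresh, { type: "added", text: "read the code" });
console.log(didChange(fresh, updated)); // true
```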

Pull Threads is where I always get into trouble. The steps above always make me feel better, and then I open a 1-line PR and everything catches fire. "Oh, wait, you didn't name your PR this!? Oh no, that'll nuke the universe" or "Why did you push something that broke QA?" "Dude, it was a console.log... what's more harmless than that?" "You have NO IDEA with this code base, trust me". I agree with Michael's takes here; I'd just exercise caution. The exuberance you get from the learning steps above can work _against_ you here. Be on guard, be conservative.

Attempt a Build: agreed, this is the #1 most important thing when you start working with a code base. I cannot stress enough how important this step is, however obvious it seems. I've had clients I had to stop working with because no one on the team could even build the code, and they didn't think fixing that was a #1 priority. That is nothing but a path to misery and non-paying clients. Without the ability to build consistently and reliably, you can't effectively do any of the other steps.

The Testing REPL part, to me, is a bit weird. In dynamic code bases, like Python or JavaScript, almost all the unit testing frameworks can run individual files _somehow_, out of the box; no matter how painful it is (:: cough :: Jest), it CAN be done. Conversely, even with a horrible test suite, it would take herculean effort to make PyTest or Mocha/Vitest NOT respect describe.only and commented-out tests.

This topic is also pretty vast; introducing a Test First or Test After style into a code base that traditionally hasn't had one could fill books on what you need to do, what to be careful of, etc. For example, if your small, isolated tests pass, but the whole suite suddenly fails when run together, you likely have shared mutable state and/or global mocks leaking between tests. Those are a sign you're in deep trouble, and fixing the test suite just got a lot more challenging.
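Here's that failure mode in miniature. It's a sketch with a hand-rolled runner instead of Jest, Mocha, or Vitest so it runs on its own; `featureFlags`, `checkoutLabel`, and `runSuite` are hypothetical. Each test passes when the shared state is reset before it, but run in sequence without a reset, the first test leaks its global change into the second.

```typescript
// Module-level state shared by every test in the process: the "global" trap.
const featureFlags = { newCheckout: false };

// Hypothetical code under test that reads the global.
function checkoutLabel(): string {
  return featureFlags.newCheckout ? "Checkout (beta)" : "Checkout";
}

// Test A flips the global flag and never restores it.
function testBetaLabel(): boolean {
  featureFlags.newCheckout = true;
  return checkoutLabel() === "Checkout (beta)";
}

// Test B silently assumes the default flag value.
function testDefaultLabel(): boolean {
  return checkoutLabel() === "Checkout";
}

function resetFlags(): void {
  featureFlags.newCheckout = false;
}

// Runs each test in order; `isolate` simulates a per-test reset hook.
function runSuite(tests: Array<() => boolean>, isolate: boolean): boolean[] {
  return tests.map((test) => {
    if (isolate) resetFlags();
    return test();
  });
}

const isolated = runSuite([testBetaLabel, testDefaultLabel], true);
console.log(isolated); // [ true, true ]

resetFlags();
const shared = runSuite([testBetaLabel, testDefaultLabel], false);
console.log(shared); // [ true, false ]: testDefaultLabel fails only in the full run
```

In a real framework the reset would live in a `beforeEach` or `afterEach` hook; the point is that a failure appearing only in the full run usually points at state like `featureFlags` rather than at the test that reports it.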

The Rules of Engagement are pretty important, too. I have a lot of overconfidence, but I've been burned so many times that I'll always go to the most benign, least important areas to make my first changes. That ensures I can successfully build the code, make small changes, and deploy them without everything catching fire. The key is to have a clear understanding of importance and value, like Michael outlines here. It also helps you garner goodwill when you later start making fixes in high-value, high-profile areas that everyone else is afraid to touch.

Overall, this is a great framework. I'd just suggest doing some detective work early by asking the team questions, not just the devs, as you read through the code base. Focus early on being able to build repeatedly, reliably, and safely. Then you can start with small, safe steps and get more and more adventurous in helping the team once you've proven you're aware of what dragons lurk around certain corners.

I can relate, having done most of this well myself. The biggest thing is to just dive into those massive codebases and get familiar… don’t be intimidated!


More articles by Jesse Warden

  • Error Handling for fetch in TypeScript

    Originally posted at https://jessewarden.com/2025/02/error-handling-for-fetch-in-typescript.

  • Encoding and Decoding in TypeScript

    Originally posted at https://jessewarden.com/2025/02/encoders-and-decoders-in-typescript.

  • 1st Angular UI Story in 2025

    After 11 months, I got my 1st UI story. Been doing Back-End-for-Front-End stories since we're a platform team.

  • Elm to Angular for 10 Months

    It's been 10 months since I went from Elm to TypeScript Angular, and I still struggle with the following: Void Return…

  • Energy to Learn & Build When Burnt Out

    Reminder to be nice to yourself right now if you're trying and failing to learn new things after work or practice;…

  • YAGNI For Types

    Noticed a disturbing trend the past 3 years that I'll often end up with too many/overly verbose types. TDD has helped…

  • Thoughts on ThoughtWorks Radar 2024

    The ThoughtWorks 2024 Radar was released (you can download the PDF with 1 click, no annoying sign up required). Below…

  • TypeScript's Lack of Naming Types and Type Conversion in Angular

    Random thoughts on TypeScript. I’ve noticed 2 things working on a larger Angular project: it is not common to name (aka…

  • Six Alternatives to Using any in TypeScript

    There are a few options, and strategies. We've listed from easiest to most thorough.

  • RxJS Observables That Can't Fail

    A habit I made in JavaScript, and later TypeScript, was to have Promises never fail, and instead return a Result type…
