登录查看更多内容

点击“继续加入或登录”，即表示您同意遵守领英的《用户协议》、《隐私政策》及《Cookie 政策》。

How to get the most out of your open-source contributions?

Sofie Van Landeghem

NLP/ML expert and Open-Source maintainer at OxyKodit

发布日期: 2023年11月2日

As a core maintainer of a largeish open-source library, I'm often asked for guidelines on how to collaborate efficiently on a code base, as a team. In particular, typical questions involve Pull Requests (PRs): how to get your own contributions approved more easily/quickly on the one hand, and how to do effective code reviews on the other hand.

What do effective Pull Requests look like?

This week, I'm celebrating 10 years of contributing open-source code on Github, and I wanted to share some tips & tricks from my own personal experience - both as a contributor and as a reviewer.

Let's first dive into the contributor's perspective. You've spent days, maybe weeks, to implement a new feature or hunt a bug, and you're finally close to wrapping this up. You're (rightfully so!) proud of the work and effort you put in, and you'd like your contribution to be merged into the 'main' branch as soon as possible. You wrap things up by quickly creating a PR contributing all the code from your branch, write one or two lines in the PR message body, and call it a day. You're looking forward to all the positive feedback on your hard work in the morning!

How to showcase the value of your work clearly?

Unfortunately, one of these things might happen instead: your PR sits around and doesn't get a review. Or it does get a review, and it's endless questions and comments. You're getting slightly annoyed or even disappointed. Can't they see the clear value of this work and just merge it in? But maybe ask yourself: what could I have done to make the value of my work more clear? To convince the reviewers of the correctness? Here's a quick shortlist that'll be helpful to give your PR a flying start the next time around:

Ensure your code is formatted according to the project's guidelines. This instills trust that you're paying attention to details.
If it's a bug fix: contribute a clear, concise unit test that fails on 'main' but succeeds on your branch.
If it's a new feature: provide sample code that demonstrates the functionality. This will incite enthousiasm and make the value of your work crystal clear and tangible.
Update all documentation relevant to the changed functionality. You might be inclined to do this "after review" or even "after approval", but in most cases that's a mistake. Writing the documentation will help you think about the general API design and whether it's consistent with your current code & functionality. Furthermore, it will help reviewers better understand the implications of your PR. Eventually, this will help speed up the approval process.
In the PR description, write a clear summary of the proposed changes and indicate any potential issues with backwards-compatibility.

Focus on one unit of work per PR

Think about what constitutes one unit of work and address only one problem/feature in one PR. Did you run into a vaguely related bug X while implementing a new feature Y? Submit the bug fix X to 'main' with a clear unit test and submit the PR for feature Y independently. Having more 'atomic' PRs has many advantages:

reviewers can focus on one thing at a time
small fixes get merged & released more quickly: a bug fix can go into the next patch release, while a major feature/refactor may have to wait for the next minor or major release
if you ever need to revert a PR because of unexpected issues, you don't end up accidentally reverting a good fix as well
you might have a few merge conflicts, but they'll be smaller & easier to deal with if you keep the functionality of each PR minimal

Be your own PR's first reviewer

Once you've done all the above: be your own PR's first reviewer. What does this mean exactly?

Submit the PR as draft, step out of "developer mode" and into "review mode", and inspect your own code changes with a healthy dose of criticism & sceptism. Is the PR description up-to-date? Does it provide enough context for someone else to follow your reasoning? Are all the changes in fact related to one core functionality? Are there any debug comments or print statements still lingering? Did you write unit tests for everything? Does the CI run succesfully, are there no merge conflicts? Does the example code run correctly? What if you change the input slightly, do you get appropriate error messages? Can you make the code more robust & user-friendly? Is the code easy to read and maintain?

And one of the most important tips: write review comments (on your own PR!) for things you're uncertain about, important design choices or non-obvious implementations you require feedback on. This will help reviewers zoom in on the important stuff. It might be tempting to think "hey maybe they don't notice this weird approach I took at L342", but you're not doing anyone a favour: a reviewer might flag it and become less confident in the remainder of the PR, or they might not flag it, which could be worse: the code gets merged into 'main' and you'll be explaining this odd code block when a bug report hits your issue tracker 2 months from now. Overall, and somewhat counter intuitively, telling your team you're uncertain about something actually helps them have confidence in you and your contributions overall. It sets the stage for open communication, positive feedback and constructive collaboration.

Any tips for the reviewer?

Switching perspective - let's now consider how a reviewer can be as effective and helpful as possible. Do consider that not all review(er)s are equal: some will just write a quick nit-pick comment, others may spend 2 hours going through all the code. Either way: explain clearly what you've done and where you're at. Useful context to give, for instance, could be:

"Just a quick drive-by comment, didn't review everything"
"I looked at the PR in detail, and only had a few minor comments"
"Started reviewing, but I need you to address point X before I can continue"

Giving this context beyond just the bare comments, helps other reviewers as well as the original contributor better understand the state of the PR, what is needed to push it along, and who should do what. And while you're at it, I'd advice to say something nice and positive as well. Even when you're unhappy with the implementation but understand the motivation for the feature, you can say so: "this would be a great feature to have". At least, acknowledge the work and effort the contributor has put in, as otherwise a bunch of critical comments may come across as very demotivating. Of course, everyone on the team should realise that reviewing is also hard work, requires context switches, and you're only doing this to help and/or that's just your job. Nevertheless, I've found it pleasant to work with team mates who uphold a positive and constructive attitude, even if they have plenty of comments or criticism.

The same principle about open and clear communication is true for the reviewer as well: if you're uncertain about certain parts of the code, simply say so. Perhaps find someone else on the team to do an additional review, or decide to lean on the contributor's expertise. In general, make it clear what the next steps should be for this PR to move along. Avoid getting trapped in the "bystander effect", where e.g. 3 people are requested for review and they all wait for someone else to do it. Avoid relying on implicit expectations such as "I thought you were still working on this" or "I thought X was still reviewing" - clarify if need be and make expectations explicit.

PR's do not necessarily follow a linear path

And finally, as a team, understand that work on PRs, like any other dev work, is not always a linear process. Sometimes you'll need to go back and revisit earlier decisions, or ultimately decide on a different design that requires some refactoring. That's not a problem in itself, as long as you're converging to a final solution and not endlessly debating / refactoring. It's still better to do one more round of changes before merging, than having to do it in subsequent releases and impacting users. As a team, realise that you're all working towards the same goal of improving the code and functionality so that it's ultimately as useful as possible for your users. And in the process, view every PR as an opportunity to learn from other developers and further improve your coding skills, whether you're the original contributor or the reviewer.

Happy coding!

Nick Sorros

CTO and Founder @ MantisNLP | AI Consultancy, NLP

1 年

This is super valuable to anyone new to to open source. Thanks for sharing Sofie Van Landeghem ??

1 次回应

Kevin Knights

Machine Learning Engineer leading operational excellence in AI systems

1 年

Thanks for sharing Sofie Van Landeghem, one question that I have is: “What would you advise to those getting started (or that want to get started) with open-source contribution?”

Chong Shen Ng, Ph.D.

Research Engineer at Flower Labs

1 年

Thanks Sofie Van Landeghem for writing and sharing this! It’s written as good as a high quality PR ??. Very useful as a reminder for myself and your tips similarly apply to closed-source contributions ??.

2 次回应

Sofie Van Landeghem

NLP/ML expert and Open-Source maintainer at OxyKodit

1 年

P.S. Thomas Van Parys: Thinking back fondly on our days when we were writing Diffany together, 10 years ago! Each their own module and code style ?? Not exactly scalable or best practices, but at least we were having a lot of fun in the process ??

查看更多评论

要查看或添加评论，请登录

Sofie Van Landeghem的更多文章

Setting your ML project up for success

2022年12月29日

Setting your ML project up for success

What can you do to maximize probability of success for your Machine Learning solution? Throughout my 15 years as data…

9 条评论
Training a custom Entity Linking model with spaCy

2020年5月7日

Training a custom Entity Linking model with spaCy

If your NLP project involves disambiguating textual mentions to different meanings (linked to unique IDs), this new…

11 条评论
Entity linking in spaCy

2019年7月12日

Entity linking in spaCy

spaCy is an awesome open-source Python library for advanced Natural Language Processing (NLP), designed specifically…

9 条评论

What do effective Pull Requests look like?

How to showcase the value of your work clearly?

Focus on one unit of work per PR

Be your own PR's first reviewer

Any tips for the reviewer?

PR's do not necessarily follow a linear path

Sofie Van Landeghem的更多文章

Setting your ML project up for success

Training a custom Entity Linking model with spaCy

Entity linking in spaCy

社区洞察