
Unable To Reproduce

Bef Ayenew

Technology Leader

Published: February 18, 2020


“Seeing a spider is not a problem. But it becomes a problem when it disappears.”

I can only fix what I can reproduce, and I should be able to reproduce any real issue. That is the mindset many engineers have when they triage issues and try to reproduce bug reports. On one hand, it’s hard to find fault with this approach, because ruthless prioritization dictates that we favor issues that affect more users and are easier to reproduce. However, it also creates a false equivalence between ease of reproduction and severity, which sometimes leads to the ill-advised de-prioritization of important bugs.

Show Empathy. A product leader at LinkedIn recently confided in me how helpless it made him feel to report a seemingly severe bug only to see it shelved because the on-call engineer could not reproduce it. I can easily empathize with this leader, because I would be equally frustrated if I kept experiencing the same issue with no remedy in sight. For one, it is always important to openly acknowledge that the reporter is getting a less than ideal experience because of a bug we introduced, albeit one that has been difficult to reproduce so far. The fact that the issue is intermittent does not make the reporter any less credible or the engineers any less culpable. As the builders and owners of the experience, we are completely at fault here and should take full responsibility for this suboptimal experience, which is only compounded by our inability to offer the reporter immediate relief or, at the very least, reason for hope.

Be Transparent. As part of our triage, we should overcommunicate about every step taken to reproduce the bug. This serves two purposes. First, it gives reporters confidence that their report was taken seriously and that sufficient effort went into addressing their concerns. Second, it leaves a valuable audit trail of the work that has already gone into reproducing the bug, so we can be methodical in our triage should the bug reappear or should we decide to come back to it later. Ideally, this documentation also includes any observations about metrics and other signals that help capture the severity and impact of the observed behavior.

Be Systematic. As product owners, we should be systematic in our reproduction efforts, because a random check to find a random bug is only going to lead to more random outcomes. For instance, have multiple people try to reproduce the issue, because one person’s biases and usage patterns may not allow for easy discovery of the bug. Or narrow down the possibilities through a process of elimination instead of randomly trying different paths to the faulty behavior.
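One way to picture a process of elimination is a binary search over an ordered list of suspects, such as builds in release order. What follows is only a minimal sketch under stated assumptions, not a prescribed workflow: the names `first_bad`, `attempts`, and `fails_every_other_time` are hypothetical, and it assumes that once the bug appears in a candidate it is present in every later one, and that a handful of repeated attempts is enough to surface an intermittent failure.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_bad(
    candidates: Sequence[T],
    fails: Callable[[T], bool],
    attempts: int = 5,
) -> Optional[T]:
    """Return the earliest candidate on which the bug shows up, or None.

    Assumes candidates are ordered so that once the bug appears it stays.
    Each probed candidate is tried up to `attempts` times because an
    intermittent bug may not fail on every run.
    """

    def seen_failing(candidate: T) -> bool:
        return any(fails(candidate) for _ in range(attempts))

    lo, hi = 0, len(candidates)  # the answer, if any, lies in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if seen_failing(candidates[mid]):
            hi = mid  # bug observed here; the first bad one is here or earlier
        else:
            lo = mid + 1  # bug not observed; look at later candidates
    return candidates[lo] if lo < len(candidates) else None


# Toy stand-in for a real reproduction attempt (hypothetical): builds 13 and
# later are broken, but the failure only shows on every other call.
calls = {"n": 0}


def fails_every_other_time(build: int) -> bool:
    calls["n"] += 1
    return build >= 13 and calls["n"] % 2 == 0


print(first_bad(list(range(1, 21)), fails_every_other_time))  # → 13
```

The search needs about log2(n) probes instead of n, which matters when each reproduction attempt is slow. The trade-off is that a candidate which happens to pass all of its attempts is treated as good, so a very rare failure can still send the search in the wrong direction.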

Look for Correlation. Lastly, we should keep a careful eye on the issues we are unable to reproduce and look for suspicious patterns. In isolation, these incidents may seem like unusual corner cases, but considered in totality, they may reveal something far more troubling about how the system was architected. Oftentimes, issues that are difficult to reproduce are manifestations of erratic side effects born of the software’s failure to behave systematically. And much like the sneaky spider, these bugs can continue to wreak havoc under the cover of non-determinism.
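As a rough illustration of looking for patterns, one could tally the attributes that unreproducible reports have in common. The sketch below assumes each report has been recorded as a plain dictionary with hashable values; the function name `common_traits`, the field names, and the sample data are all invented for this example.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def common_traits(
    reports: Iterable[Mapping[str, object]],
    min_share: float = 0.6,
) -> List[Tuple[str, object, float]]:
    """List (field, value, share) for values found in at least `min_share` of reports."""
    reports = list(reports)
    if not reports:
        return []
    # Count how many reports carry each (field, value) pair.
    counts = Counter(
        (field, value) for report in reports for field, value in report.items()
    )
    traits = [
        (field, value, n / len(reports))
        for (field, value), n in counts.items()
        if n / len(reports) >= min_share
    ]
    # Most widely shared traits first.
    return sorted(traits, key=lambda t: t[2], reverse=True)


# Hypothetical reports that were all closed as "unable to reproduce".
unreproduced = [
    {"platform": "android", "locale": "en_US"},
    {"platform": "android", "locale": "de_DE"},
    {"platform": "android", "locale": "fr_FR"},
    {"platform": "ios", "locale": "en_US"},
]

print(common_traits(unreproduced))  # → [('platform', 'android', 0.75)]
```

A shared trait is a lead, not a root cause. It only becomes suspicious when it is over-represented compared with the overall user base, so a tally like this is a starting point for forming hypotheses and not a verdict.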

These issues can often be irritating for both reporter and owner, and they are just as inevitable as they are unwanted. So how do you and your organization strike the right balance between responsible product ownership, ruthless prioritization and empathy for your bug reporters, especially when these issues persist?

I want to thank Pete Davies for providing the inspiration for this topic, being generous with his time and giving me helpful editorial feedback. I also want to thank Lee Mallabone for his help in completing this piece.

To see my writings beyond "Stuff Engineers Say," visit my articles page or follow me.


Richard Fletcher (Ignys Ltd)

Expert at helping you turn your ideas into innovative, reliable, profitable & manufacturable electronics and software products

4 years ago

I’ve seen the situation where even given the exact steps to reproduce a bug there was a huge reluctance to reproduce the issue. In this case there was a big “them vs us” culture where the product was designed on one site and tested at another.


Anthony Veloccia

Developer

5 years ago

Unable to reproduce is fine as a status for a bug so long as it doesn't end up in the pile with resolved reports. At sites I belong to where members are stakeholders, the bug tracking system is publicly visible so you marking a bug as unable to reproduce does not prevent me/another user from trying to reproduce. More info is good. Since I am another average person at many sites and the dev crews make up not even 1% of total users in them...it's more likely that someone else can see that [unable to reproduce] label and has either already come across it or could possibly reproduce it which may bring forward a viable solution from a dev who otherwise can't solve problems which they can't identify. It works for them because that place cares more about finding all the bugs and fixing them than they do about people knowing how many bugs something shipped with. We as the community around these places benefit from it and we like them that much more for it. We spend our time helping improve their product and increasing the value of their company and in direct return we get a better product and a more stable ecosystem. It's symbiotic, and a public/UG bug tracker mostly builds trust and brings faster results in a two-way fashion.


Amitava Ray Chaudhuri

Staff Engineer at LinkedIn

5 years ago

Great article, Bef. Nowadays software is so complex that there are millions of use cases that need to be tested. The probability of reproducing some issues is high, but for others it is so low that they seem like ghosts, and reproducing them requires many attempts. This reminds me of the experiments performed by Lord Henry Cavendish to find the value of the universal gravitational constant. Reproducing a low-probability issue is an example of great craftsmanship, and every failed attempt gives the developer a lot of experience. So you definitely made a great point.


Abe Coffman

Engineering Leadership @ LinkedIn

5 years ago

It strikes me that oftentimes bugs are hard to reproduce due to the complexities of state within the system. Somewhere someone will make an argument for purely functional programming.


Chris Pruett

SVP of Engineering @ Yahoo

5 years ago

Great article, Bef. I’d like to add one more thought to the conversation. A critical skill to develop and practice is to form and test hypotheses about what’s causing the bug. I think a common fallacy is to focus too much on reproducing from the outside in. As engineers, we shouldn’t treat it as a black box. We know how the software is built and if we ask ourselves what might cause the bug, we can often get to the root without the benefit of perfect repro steps.
