To retry or not, that is the question

Flaky tests are a big problem for an effective continuous integration setup. We want our CI to run all tests and continuously provide feedback about changes going into main. Flaky tests disrupt that flow and make developers question the usefulness of CI.

The worst outcome in this situation is to let CI ‘pass’ in spite of failures:

allow_failure: true
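In GitLab CI, this anti-pattern typically appears in a job definition like the following sketch (the job name and script are illustrative, not from the original article):

```yaml
# Anti-pattern: the job's failures no longer block the pipeline,
# so flaky failures -- and real ones -- are silently tolerated.
flaky_suite:
  script:
    - ./run_tests.sh   # hypothetical test command
  allow_failure: true
```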

It's a slippery slope: developers won’t know whether tests are failing, passing, or flaky, and they start ignoring tests altogether. A slightly less bad outcome is the mindset of a ‘manual’ retry, i.e. the first instinct of a developer who sees a failure in the pipeline is to hit the retry button instead of investigating the error.

The focus of this article is automatic retries, i.e. deciding when it is OK to programmatically retry a failed pipeline. We want a test to fail a pipeline only when there is a genuine problem, not because of external issues.
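As a concrete starting point, GitLab CI supports job-level automatic retries through the `retry` keyword. A minimal sketch (job name and script are illustrative):

```yaml
# Retry the job automatically on failure, up to 2 extra attempts
# (GitLab caps `retry: max` at 2).
unit_tests:
  script:
    - ./run_unit_tests.sh   # hypothetical test command
  retry:
    max: 2
```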


Criteria for flakiness

Tests can be flaky for a number of reasons, but broadly speaking the causes fall into three categories.

  1. Test issues: the tests themselves are flaky and give different results even when all the other variables are held constant.
  2. Infrastructure issues: every test has to run in some context. A lower-level test such as a unit test might run only in memory, but an end-to-end test might need an iOS simulator and network calls.
  3. Downstream effects: if your tests are not isolated with mocks the way unit tests are, they might fail due to issues outside your code. Perhaps a test fetches a 2FA login token from a backend service, and that service does not respond in a reasonable time.

Retries for failed tests can be employed in various ways, and the right choice depends on the criteria above. Be mindful of the trade-offs between test complexity, codebase maintenance status (legacy vs. actively developed), and the impact of a flaky pipeline failure.

Recommendations

  1. Don’t employ a retry strategy blindly; if you follow good testing practices and have stable infrastructure, you might not need retries at all.
  2. If your platform lets you specify when to retry (e.g. GitLab’s retry:when), use it and retry only for known flaky infrastructure reasons.
  3. Start with a low retry count and increase it only if the data justifies it.
  4. If you do use retries, make sure your job timeouts are reasonable. Say a test genuinely fails, and the job has 2 retries and a 1-hour timeout: it could take up to 3 hours before a developer sees the failure.
  5. Prefer retrying at the lowest level possible, i.e. test level over suite level over job level, so that you conserve resources and provide feedback as quickly as possible.
  6. Explore the retry options provided by your test framework: for example, iOS and Android retries via gcloud with Firebase Test Lab (Espresso), or Flutter’s suite-level and test-level retries.
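Recommendation 2 can be sketched in GitLab CI as follows (job name and script are illustrative). Only failure classes that GitLab attributes to infrastructure trigger a retry; an ordinary script failure surfaces immediately:

```yaml
# Retry only on known infrastructure failure classes, not on
# genuine test failures (which fall under script_failure).
e2e_tests:
  script:
    - ./run_e2e_tests.sh   # hypothetical test command
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```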


Conclusion

Just like many other software development practices, retrying tests should be based on evaluating trade-offs and principles. Don’t be swayed by blanket arguments that retrying is always a bad pattern.
