To retry or not, that is the question

Flaky tests are a big problem for an effective continuous integration setup. We want our CI to run all tests and continuously provide feedback about changes going into main. Flaky tests disrupt that flow and make developers question the usefulness of CI.

The worst outcome in this situation is to let CI ‘pass’ in spite of failures:

allow_failure: true
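In GitLab CI, this anti-pattern typically appears in a job definition like the following sketch (the job name and script are illustrative, not from the original article):

```yaml
# Anti-pattern: the job's failures no longer block the pipeline,
# so flaky failures -- and real ones -- are silently tolerated.
flaky_suite:
  script:
    - ./run_tests.sh   # hypothetical test command
  allow_failure: true
```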

It's a slippery slope: developers won’t know whether tests are failing, passing, or flaky, and they start ignoring tests altogether. A slightly less bad outcome is the mindset of a ‘manual’ retry, i.e. the first instinct of a developer who sees a failure in the pipeline is to hit the retry button instead of investigating the error.

The focus of this article is automatic retries, i.e. deciding when it is OK to programmatically retry a failed pipeline. We want a test to fail a pipeline only when there is a genuine problem, not because of external issues.
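As a concrete starting point, GitLab CI supports job-level automatic retries through the `retry` keyword. A minimal sketch (job name and script are illustrative):

```yaml
# Retry the job automatically on failure, up to 2 extra attempts
# (GitLab caps `retry: max` at 2).
unit_tests:
  script:
    - ./run_unit_tests.sh   # hypothetical test command
  retry:
    max: 2
```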


Criteria for flakiness

Tests can be flaky for a number of reasons, but broadly speaking the causes fall into three categories.

  1. Test issues: the tests themselves are flaky and give different results even when all the other variables are held constant.
  2. Infrastructure issues: every test has to run in some context. A lower-level test such as a unit test might run only in memory, but an end-to-end test might need an iOS simulator and network calls.
  3. Downstream effects: if your tests are not isolated with mocks the way unit tests are, they might fail due to issues outside your code. Perhaps a test fetches a 2FA login token from a backend service, and that service does not respond in a reasonable time.

Retries for failed tests can be employed in various ways, and the right choice depends on the criteria above. Be mindful of the trade-offs between test complexity, codebase maintenance status (legacy vs. actively developed), and the impact of a flaky pipeline failure.

Recommendations

  1. Don’t employ a retry strategy blindly; if you follow good testing practices and have stable infrastructure, you might not need retries at all.
  2. If your platform lets you specify when to retry (e.g. GitLab’s retry:when), use it and retry only for known flaky infrastructure reasons.
  3. Start with a low retry count and increase it only if the data justifies it.
  4. If you do use retries, make sure your job timeouts are reasonable. Say a test genuinely fails, and the job has 2 retries and a 1-hour timeout: it could take up to 3 hours before a developer sees the failure.
  5. Prefer retrying at the lowest level possible, i.e. test level over suite level over job level, so that you conserve resources and provide feedback as quickly as possible.
  6. Explore the retry options provided by your test framework: for example, iOS and Android retries via gcloud with Firebase Test Lab (Espresso), or Flutter’s suite-level and test-level retries.
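Recommendation 2 can be sketched in GitLab CI as follows (job name and script are illustrative). Only failure classes that GitLab attributes to infrastructure trigger a retry; an ordinary script failure surfaces immediately:

```yaml
# Retry only on known infrastructure failure classes, not on
# genuine test failures (which fall under script_failure).
e2e_tests:
  script:
    - ./run_e2e_tests.sh   # hypothetical test command
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```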


Conclusion

Just like many other software development practices, retrying tests should be based on evaluating trade-offs and principles. Don’t be swayed by blanket arguments that retrying is always a bad pattern.
