Code Coverage is used wrong
Adrian Richter
Mature Lead Software Engineer, Software Architect, Manager of Solution Delivery
I recently came across the following maths problem...
You run the first lap of a track at a given speed. Find the speed you must run the second lap around the same track in order to double your average speed for the 2 laps combined.
If you don't know the answer, let me tell you at the end.
But this is not about Maths (or, as Americans say, Math). It is an article about Code Coverage and why we are using Code Coverage wrong.
What is Code Coverage?
Code Coverage is a term used quite broadly, but specifically it can be thought of as the percentage of code covered by automated tests.
How do we interpret the number?
Generally speaking, the closer the number is to 100%, the more units, methods, branches or lines within a code base are covered by tests.
0% code coverage means no automated testing.
100% code coverage is the maximum automated test coverage.
In layman's terms, the higher the number, the better the quality of the code.
And generally you would be right.
What are we missing?
The number literally means just the percentage of lines covered by the automated tests run while generating the code coverage metric.
It does not include manual testing, user testing or any other factor in the quality control process, nor factors that are not considered quality processes at all.
So it is a metric associated with testing? Actually, NO. It is literally the amount of code that is executed when running a test.
Well tested code is well covered, but well covered code does not mean it is well tested.
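A small sketch makes the difference concrete (assuming Java with JUnit 5; the class and the requirement are invented for illustration). The single test below executes every line of isAdult, so a coverage tool will report 100%, yet the boundary that actually matters is never checked.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class AgeCheck {
    // The only line of production logic, and every bit of it is executed by the test below.
    static boolean isAdult(int age) {
        return age > 18; // defect: the requirement is "18 or over", but no test probes the boundary
    }
}

class AgeCheckTest {
    @Test
    void recognisesAnAdult() {
        // Executes 100% of the lines in isAdult, so line coverage reports 100%...
        assertTrue(AgeCheck.isAdult(30));
        // ...yet isAdult(18) still gives the wrong answer, and no test will ever notice.
    }
}
```

Coverage counted the lines that ran; it said nothing about whether the assertions pinned down the behaviour that matters.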
Thought Experiment...
Let's say we have 2 code bases of similar size. They have 20% and 90% coverage respectively.
The first code base has been around for 20 years, making it a legacy code base, and only in the last year has automated testing been introduced. Up to that point, all testing was performed by a testing team, along with user acceptance testing before release. The quality is pretty high and the defect rate after release has been pretty good.
The second code base is relatively new, a few years old, but automated tests were written from the start. Every bit of code was accompanied by an automated test that proved it worked within the expected behaviour and requirements. This is referred to as Test Driven or Behaviour Driven Development. For those in the know, I am not saying whether it was test-first or not, just that there is an accompanying test.
Which one has higher quality?
The Answer
The true answer is...we do not have enough information, but on the surface, the 90% coverage is better.
If we have only a single metric, we can only use that to determine what is better.
But we left out a number of factors. The 90% covered code base was produced recently, using modern techniques.
The general consensus is that code can be written in less time and with fewer defects when it is accompanied by automated tests. This naturally leads to higher code coverage.
So higher code coverage is the outcome of following these techniques. The reality is that higher code coverage is a result of using good techniques from the start of the project.
How are we getting it wrong?
So if higher code coverage means higher quality, then increasing the code coverage will lead to higher quality...Right?
Actually...NO.
While higher code coverage is an indication of quality, increasing the coverage does not in itself increase quality, and it can introduce development behaviours that are detrimental to the code base over time.
An Example
Let's say we take a code base with no automated tests, i.e. 0% test coverage. This is obviously an extreme example, but we have to start with nothing to get to something.
We draw a line in the sand and we say,
"From this day forward, all new code must have 80% line coverage."
This is great. We now have a clear target: 80% of new lines of code must be covered by a test.
80% effectively means that for every 5 lines of code produced, 4 of them must be covered. Any change of fewer than 5 lines requires 100% coverage.
For new code, getting 100% or close to 100% can be relatively easy as you can cover all scenarios with tests and generate the required coverage while getting the code right.
Now consider changing existing code, which has no existing tests and hence no coverage. A single-line change. A relatively insignificant change. But our rule says the change must be 80% covered.
Because it is a single line of code (fewer than 5 lines), we need to cover 100% of it; you cannot reduce a single line down any further. So you have to call the method that line sits in, which can have multiple branches and may be buried within many different classes/units, just to reach that one line of code.
So the testing effort becomes significantly more cumbersome: you cannot restructure the code, as that could break existing functionality, so you are forced to work within the constraints of the existing code.
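A contrived sketch of that situation (again Java with JUnit 5; the class, method and numbers are made up): only the discount line changes, but to get that single line covered the test still has to drive the whole previously untested method and steer past the branches around it.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class LegacyInvoice {
    // Imagine a much longer legacy method; only the discount line below was changed.
    static double total(double amount, boolean loyalCustomer, boolean overdue) {
        if (overdue) {
            amount += amount * 0.05;      // untouched legacy branch
        }
        if (loyalCustomer) {
            amount -= amount * 0.10;      // <-- the single-line change (the rate used to be 0.05)
        }
        return Math.round(amount * 100) / 100.0;  // untouched legacy rounding
    }
}

class LegacyInvoiceTest {
    @Test
    void loyalCustomerGetsTenPercentOff() {
        // To cover one changed line we still have to exercise the whole method:
        // choose arguments that reach the loyalCustomer branch, avoid the overdue
        // branch, and account for the legacy rounding on the way out.
        assertEquals(90.00, LegacyInvoice.total(100.00, true, false), 0.001);
    }
}
```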
The positive side is that we now have more test coverage than before, and it is on an upward trajectory over time. However, we are not testing all that other code; the automated tests only exercise the change we made. Yet the coverage report says that all the code we touched is now covered. As we carry on making changes, the code coverage goes up, and on paper we are moving towards a better, higher quality code base.
But we have not addressed the problem. Code changes are now taking longer than before, because we have to write tests for code we didn't write. The code we are not changing is now being covered, but it is not being tested...at least not properly. This gives a false sense of quality: tests, which exist to verify correct behaviour, are now being written to improve coverage rather than to fulfil their true intention.
The behaviour that starts to become apparent is writing a test that simply calls a method in order to cover the change. The test does not verify or assert any behaviour, and we get what we call in rowing terms, "Putting your oar in the water."
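In code, that behaviour looks something like this (a hypothetical sketch with JUnit 5; OrderService stands in for whatever class was touched). The test executes the changed lines, so the coverage report goes green, but with no assertions it can never fail, whatever the code actually does.

```java
import org.junit.jupiter.api.Test;

class OrderServiceCoverageTest {

    @Test
    void coverTheChangedLines() {
        // "Putting your oar in the water": this call executes the changed code,
        // so the coverage tool marks the lines as covered, but nothing is
        // asserted, so the test passes no matter what applyDiscount does.
        new OrderService().applyDiscount(42L);
    }
}

// Stand-in for the real production class; it exists only so the sketch compiles.
class OrderService {
    void applyDiscount(long orderId) {
        // ...legacy logic we did not write and are not really testing...
    }
}
```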
Increasing code coverage is the wrong message. Code Coverage is the result of changes in behaviour.
Hence the maths problem...how fast must you go on the second lap to double your average overall? 2 times faster? 3 times faster? Speed of Sound? Speed of Light? The answer is...it's impossible. To double your average speed, both laps together would have to take the same time as the first lap alone, leaving zero time for the second lap.
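If you prefer numbers to algebra, a throwaway calculation shows the same thing (the lap length and first-lap speed are made up; any values give the same answer):

```java
public class LapPuzzle {
    public static void main(String[] args) {
        double lapLength = 400.0;                                   // metres
        double firstLapSpeed = 4.0;                                 // m/s
        double firstLapTime = lapLength / firstLapSpeed;            // 100 s already spent
        double targetAverage = 2 * firstLapSpeed;                   // 8 m/s over both laps
        double totalTimeAllowed = (2 * lapLength) / targetAverage;  // 800 m / 8 m/s = 100 s
        double timeLeftForLapTwo = totalTimeAllowed - firstLapTime;
        // Prints 0.0: the second lap would have to take no time at all,
        // which means infinite speed. No finite speed doubles the average.
        System.out.println("Time left for the second lap: " + timeLeftForLapTwo + " s");
    }
}
```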
Increasing Code Coverage afterwards will not give you the result you want. Once the code is written, going back to cover it is actually an anti-pattern.
Increasing Code Coverage comes with a Cost
The higher the Code Coverage, the longer things take. As you approach 100%, it takes much more effort to maintain it. Similar to approaching the speed of light: the closer you get, the more energy you must expend.
Most teams who practise Test Driven Development have coverage of anywhere from 50% to 95%. Personally, I find a good balance between effort and coverage is around 65%-80%, but there are many variables around what a good number is.
If you start out high, it is easy to stay high, even when taking on new requirements. But as soon as it starts to drop, the effort to get it back up can be harder.
If you start with nothing, it can be almost impossible to reach 100%.
The lower you are at the start, the harder it will be to increase it.
What should we do instead?
Once you have measured the code coverage of your code base...that is the coverage you should aim to meet. Existing code should then be excluded from all measurements.
Eventually, you will need to modify the existing code. When you do, you will need to refactor it. But when you go in and change that code, you will not be able to test it with the same intentions as its original author.
Hence, write Characterization Tests, as highlighted in "Working Effectively with Legacy Code" by Michael Feathers. This is a great book on how to get a unit of code under test before tackling the required change. It gives many techniques for splitting code apart to make it testable, without breaking the fragile logic within.
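A characterization test does not assert what the code should do; it pins down what it currently does. A minimal sketch (JUnit 5 again; ShippingCalculator is a made-up stand-in for an untouched legacy class): run the code once, record the output it produces today, and assert that it keeps producing exactly that while you refactor around it.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class ShippingCalculatorCharacterizationTest {

    @Test
    void pinsDownCurrentBehaviourBeforeRefactoring() {
        // 12.5 was not derived from the requirements; we ran the legacy code
        // once, observed 12.5, and captured it. If a later refactor changes
        // this result, the test tells us before production does.
        assertEquals(12.5, ShippingCalculator.costFor(3, "EU"), 0.001);
    }
}

// Stand-in for the untouched legacy class being characterised.
class ShippingCalculator {
    static double costFor(int items, String region) {
        double base = "EU".equals(region) ? 5.0 : 8.0;
        return base + items * 2.5;
    }
}
```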
As you do this, and it will take a lot of time, you will start to see coverage increase, but at the same time you will really start testing the system, in such a way that changes to existing code can be made with confidence.
Confidence to Change
A quality code base is one where the person changing the code can make a change knowing that a mistake will not slip through, because previous authors have put enough tests in place that the change would highlight any issues.
The software developer can change things with confidence knowing that they are free to think about the problem at hand without the fear of introducing a side effect or defect.
There is no metric for Code Confidence.
Conclusion
Good Quality Code produced with Modern Techniques CAUSES High Code Coverage.
Increasing Code Coverage on its own does not lead to better quality code and good techniques.
The true measure of quality of a code base should be the confidence that the code can be modified easily and reliably.
"Putting your oar in the water" is a term used when a rowing. Basically it is when a rower, or rowers, are not putting in any effort. In a larger boat crew, during a race, if someone is getting tired, it can be easy to just go through the motion of rowing, but not put any effort into pulling to oar through the water, but still moving back and forth with everyone else. Stopping rowing while everyone is rowing at full speed can cause injury, such as oar in your back from the guy behind, or putting your oar into the person in front.
In a rowing boat, you can feel when not everyone is "pulling their weight". When there is one person "not rowing", it is easy to figure it out who, as the timing and puddle size changes. If more than one person just does minimum effort, the whole rhythm of the boat changes. You start to feel it. Eventually you have to reduce your effort, in order to keep the rhythm. Eventually everyone slackens off and the whole boat slows down.
Rowing is the only team sport, that I can think of, where everyone is doing the same thing at the same time in the same way. Every other team sport relies on each person using their skill at their time (eg. when you have the ball).