Eliminating Testers is the Biggest Mistake that Aspiring Agile Organizations Make
Cliff Berg
Co-Founder and Managing Partner, Agile 2 Academy; Executive level Agile and DevOps advisor and consultant; Lead author of Agile 2: The Next Iteration of Agile
I don’t know where it comes from, but some people have the deep misunderstanding that Agile products don’t need testers.
It is true that manual testing should largely be replaced by automated testing - although not entirely, because exploratory testing is inherently manual and is still needed. Even so, you still need people whose main focus is testing - automated testing. In fact, the more complex your product is, the more you need those people.
Perhaps the misunderstanding comes from the popular conception of Test-Driven Development (TDD), a software development practice in which one writes low-level unit tests before writing the application code. It is true that for unit tests, programmers write their own tests. However, TDD is controversial; and whether one uses TDD or not, unit tests are only a part of overall testing.
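For readers who have not seen TDD in practice, here is a minimal, hypothetical sketch (Python with pytest; the pricing rule and names are invented for illustration) - the tests are written first, then just enough code to make them pass:

```python
# test_pricing.py - written FIRST, before the implementation exists (TDD).
import pytest

from pricing import apply_discount


def test_ten_percent_discount_above_threshold():
    assert apply_discount(total=200.00, threshold=100.00, rate=0.10) == pytest.approx(180.00)


def test_no_discount_at_or_below_threshold():
    assert apply_discount(total=100.00, threshold=100.00, rate=0.10) == pytest.approx(100.00)


def test_negative_total_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(total=-5.00, threshold=100.00, rate=0.10)
```

```python
# pricing.py - the minimal implementation written to make those tests pass.
def apply_discount(total: float, threshold: float, rate: float) -> float:
    if total < 0:
        raise ValueError("total must be non-negative")
    if total > threshold:
        return total * (1 - rate)
    return total
```

Note the scope: these tests exercise one function in isolation. They say nothing about how that function’s component behaves when other components call it in a deployed system - which is the subject of the rest of this article.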
Early Internet applications, which were usually simple three-tier architectures - derided as “monoliths” today - could be tested using mostly unit tests, and the “test pyramid” was conceived as a metaphor for how most tests should be at the unit level, with fewer tests above at the integration level, and fewer still at the outermost “end-to-end” level. However, today’s highly component-oriented and distributed applications are different: their problems tend to be integration-related, and so a strong case can be made that the test pyramid should be replaced by a test “diamond” or “test trophy” metaphor.
Look at it this way. Consider your car. Unit tests are analogous to tests of the smallest piece parts - the nuts, bolts, gears, knobs, cables, valves, bearings, crankshaft, pistons, and other pieces that make up the many components of the car. Above the piece-part tests, component-level tests - for example, tests of your digital platform’s microservices - are analogous to tests of the car’s components - its brakes, its pumps, its engine (apart from the many components that support the engine), and so on. Above that, product-level (integration) tests are analogous to tests of the various independent car assemblies such as the engine (with all of its supporting components), the drivetrain, the brake system, the chassis, the suspension system, the dashboard, etc. And above that, “end-to-end” tests are analogous to tests of the entire car.
Relying on unit tests alone for your digital platform is like relying on only the lowest-level tests for your car - the tests of the nuts, bolts, gears, knobs, cables, valves, bearings, etc. Would you buy a car that had only been tested at that level? I wouldn’t.
In a typical unit test suite, the software “nuts and bolts” being tested are the granular “methods” or “functions” - most of which are internal. The larger-scale interactions between components are not being tested.
When teams report on their “code coverage” - claiming, say, “our code coverage is 80%” - this granular testing is what they are talking about. Code coverage measures what the lowest-level tests - the unit tests, the “nuts and bolts” - exercise. Nothing else. For this reason, unit tests are tragically insufficient - almost comically insufficient - for ensuring that today’s highly distributed, Internet-scale applications work.
A testing person will know that.
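To make the gap concrete, here is a small hypothetical sketch (Python; the event fields and function names are invented for illustration) of two components whose unit test suites each show high coverage, yet whose integration is broken:

```python
# Team A's component: builds an order event. Its unit tests give it 100% coverage.
def build_order_event(order_id: str, amount_cents: int) -> dict:
    return {"orderId": order_id, "amountCents": amount_cents}


# Team B's component: consumes the event. Its unit tests also show 100% coverage,
# but they feed it a hand-written fixture rather than Team A's real output.
def total_in_dollars(event: dict) -> float:
    # Team B assumes a field called "amount", in dollars.
    return event["amount"]


# Both teams' dashboards are green. Only a test that wires the two together
# exposes the broken contract:
def test_consumer_understands_real_producer_event():
    event = build_order_event("A-123", amount_cents=1999)
    assert total_in_dollars(event) == 19.99  # fails with KeyError: 'amount'
```

Each team’s coverage number looks fine; only a test that exercises the real producer-consumer contract reveals the defect.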
One organization that I helped, which had eliminated all of its testing staff in one broad stroke, was suffering from chronic production incidents in one of its products. There were about 30 Agile teams maintaining the product. It turned out - no surprise! - that the root cause of about half of their incidents was integration problems between their deployed components. (The second-largest root cause category was related to network topology - a common issue for today’s cloud systems if one does not test in a production-like configuration.) These problems went undetected during development because the teams were focused entirely on unit-level tests. Each team proudly reported its unit test code coverage in its SonarQube dashboard.
Components - microservices in this case - were unit tested, and sometimes they were component tested a little using Cucumber - most often with one “happy path” scenario and one “unhappy path” scenario - and then deployed straight to the production environment! No one checked if the component level test scenarios were sufficient. A testing expert would have known immediately that they were not.
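To illustrate what “sufficient” looks like at the component level, here is a hedged sketch (Python with pytest; the order endpoint, payloads, and status codes are invented for illustration) contrasting the two-scenario suites the teams had with the kind of case list a testing specialist would insist on:

```python
import pytest

# The teams' Cucumber suites typically covered only:
#   - one happy path   (valid order   -> created)
#   - one unhappy path (missing field -> rejected)
#
# For a deployed microservice, a component-level suite also needs cases like
# these (hypothetical order endpoint; payloads and status codes are illustrative):
CASES = [
    pytest.param({"orderId": "A-1", "amountCents": 1999}, 201, id="happy-path"),
    pytest.param({"amountCents": 1999}, 400, id="missing-order-id"),
    pytest.param({"orderId": "A-1"}, 400, id="missing-amount"),
    pytest.param({"orderId": "A-1", "amountCents": -5}, 422, id="negative-amount"),
    pytest.param({"orderId": "A-1", "amountCents": 1999, "currency": "XXX"}, 422, id="unknown-currency"),
]


@pytest.mark.parametrize("payload, expected_status", CASES)
def test_create_order(payload, expected_status, http_client):
    # http_client is an assumed fixture that targets the deployed component.
    response = http_client.post("/orders", json=payload)
    assert response.status_code == expected_status

# ...plus separate tests for duplicate submissions, oversized payloads, expired
# credentials, and downstream timeouts - none of which fit a single
# happy/unhappy pair.
```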
As for automated integration tests? There were none. Integration testing was manual and ad hoc. There was a fairly complete set of automated end-to-end tests at the user interface level, but end-to-end tests only exercise a fraction of the permutations of how components can interact - and it also turned out that when the end-to-end tests were run, many of the back-end microservices were “mocked”, so the tests were not actually end-to-end!
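The mocking problem is easy to picture. In a sketch like the following (Python; checkout_service and its functions are hypothetical), the “end-to-end” test drives the user-facing service but patches out the microservice behind it, so the interaction most likely to break in production is never exercised:

```python
from unittest import mock

import checkout_service  # hypothetical user-facing service under "end-to-end" test


def test_checkout_end_to_end():
    # The real inventory microservice is replaced with a canned response, so this
    # test can never catch a broken contract - or a network topology problem -
    # between checkout_service and the inventory service.
    with mock.patch.object(
        checkout_service,
        "call_inventory_service",
        return_value={"inStock": True},
    ):
        result = checkout_service.checkout(order_id="A-123")

    assert result["status"] == "confirmed"
```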
These oversights occurred because no one was looking at the entire testing process to make sure that it was robust overall.
The teams thought they were “doing DevOps” because they had component “pipelines”, but it was cargo-cult DevOps: they did not really understand their testing process, because they did not have one. What they had was ad hoc, with each team doing what it knew - which was programming, not testing - and not worrying very much about how its work integrated with the work of other teams. And by the way, they had a Scrum of Scrums that met once a week.
I believe I eventually convinced the development manager of the product that someone was needed in a product-level test lead role. Whether it was my persuasion or not, he put someone in that role. The first thing that person did was create a product-level Confluence page in which he defined a holistic product testing strategy.
Let’s look at what Google does. Is Google a DevOps shop? Are they an Agile shop? Who cares? - what they are doing is working: they can deploy market-facing features many times a day at enormous scale and their stuff largely works. Here are some of the test-related roles that they have (in their words):
- Test Engineers (TEs) - With their deep product knowledge and test/quality domain expertise, TEs focused on what should be tested.
- Site Reliability Engineers (SREs) - managed systems and data centers 24x7.
- “SETI” engineers - Originally software engineers with deep infrastructure and tooling expertise, SETIs built the frameworks and packages required to implement automation, with an engineering productivity focus.
Testing cannot be “left to the teams”. To do that is to abdicate technical delivery leadership at the product level.
Don’t read too much into the phrase “autonomous self-organizing teams”
Perhaps another source of confusion about this issue is the maxim that in an Agile or DevOps organization, products should be built by “autonomous self-organizing teams”. However, like so many Agile maxims, it does not work if you take it literally or interpret it in an absolute manner.
Teams need to be largely autonomous, so that they can complete their work without depending too much on other teams - that is what the “autonomous” is about. But when you have multiple teams, they still require coordination and oversight - in other words, there needs to be leadership at a product level, with respect to the technical delivery processes used by the teams. Servant leadership, but leadership nevertheless.
In the work that I do, I spend most of my time talking to managers, because managers make decisions about the operating models for product teams and supporting teams. We need managers to make decisions, because a pool of autonomous teams that has no systematic coordination, measurement, and leadership will flounder. Managers are leaders, and leadership roles are important for large, complex digital platforms. However, their decisions can be dangerous. Managers need to have a deep understanding of Agile and DevOps, or they will make poor decisions - such as the colossally poor decision that no testers are needed.