Testing Blind Spots
Over my years in industry I have noticed certain types of bugs recurring again and again, which leads me to believe that there are some areas which are, for want of a better phrase, blind spots in testing.
Following are some of the areas that QA engineers are likely to miss when focusing on qualifying a product or component within a limited time -
1. Error handling and recovery
2. Feature interactions
3. Contention, deadlock and race
4. Config coverage
Areas 1), 2) and 3) above are blind spots for a variety of reasons, the first and foremost being a lack of visibility and understanding. In order to be effective in these categories of testing there needs to be clarity with respect to the following -
1) Failure and recovery paths that exist in the code, and the reachability of these paths
2) Common failure scenarios in the field that customers experience on a regular basis in their environments. This could be anything related to the environment, e.g. loose connections in network cables leading to "port flaps", or the way administrators usually operate, e.g. manually typing entries in config files leading to typographical errors (a fault-injection sketch for this kind of error path follows the list below).
3) Operational boundaries between the component being tested and the other products and components that run alongside it. These boundaries could be common resources on which contention occurs, or interfaces between the components.

Contention, deadlock and race conditions need a focused approach that considers resources, threads and their interplay; a small stress-test sketch illustrating this appears below. Unfortunately there is very limited visibility into these aspects, so they need significant emphasis at the time of functional and test design. There also needs to be strong governance that monitors consumers of common resources and ensures there is enough headroom; individual product teams have limited visibility into common resources and may end up leaving no headroom in the overall system.
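To make point 2) concrete, here is a minimal fault-injection sketch in Python. The parse_config function and the malformed inputs are hypothetical stand-ins for whatever configuration loader the component under test actually uses; the point is simply to force each error-handling path to execute and assert that it fails cleanly instead of crashing or silently accepting bad input.

```python
# Minimal fault-injection sketch for error-handling paths (illustrative only).
# parse_config and its error cases are hypothetical stand-ins for the real
# config loader of the component under test.

import pytest


def parse_config(text: str) -> dict:
    """Toy parser: one 'key = value' per line; raises ValueError on malformed input."""
    config = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: missing '='")
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not key or not value:
            raise ValueError(f"line {lineno}: empty key or value")
        config[key] = value
    return config


# Each case simulates an administrator's typo; the test asserts the failure
# path is actually reached and reports a usable error.
@pytest.mark.parametrize("bad_input", [
    "port 8080",   # missing '='
    "port =",      # missing value
    "= 8080",      # missing key
])
def test_malformed_config_is_rejected(bad_input):
    with pytest.raises(ValueError):
        parse_config(bad_input)
```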
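For the contention, deadlock and race category, even a crude stress test can surface lost updates on shared state. The UnsafeCounter below is a deliberately broken toy, not anyone's real component; whether the assertion actually trips depends on interpreter scheduling and timing, which is exactly why these defects evade casual, single-pass testing.

```python
# Stress-test sketch for races on shared state (illustrative only).
# UnsafeCounter is a deliberately broken toy standing in for any component
# that does unsynchronized read-modify-write on a shared resource.

import threading

THREADS = 8
ITERATIONS = 100_000


class UnsafeCounter:
    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value   # read
        current += 1           # modify
        self.value = current   # write; another thread may have written in between


def hammer(counter):
    for _ in range(ITERATIONS):
        counter.increment()


def test_concurrent_increments():
    counter = UnsafeCounter()
    threads = [threading.Thread(target=hammer, args=(counter,)) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # May fail intermittently (lost updates) depending on thread scheduling;
    # adding a lock around increment() makes it pass deterministically.
    assert counter.value == THREADS * ITERATIONS, f"lost updates: got {counter.value}"


if __name__ == "__main__":
    test_concurrent_increments()
    print("no lost updates observed on this run")
```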
Over and above these lacunae, testing of supported configurations is an area that needs attention. Earlier the issue used to be visibility, but documentation has improved and this has largely addressed that gap. Customers are also insisting that all supported configs be published in the documentation, and they follow the documentation diligently.
But there is still a challenge by virtue of the sheer scope involved -
A large number of configs get deployed, yet testing usually covers a limited set due to time and resource constraints. There needs to be a methodical approach that lists all possible configs and, as a policy, ensures each of them is covered at least once in the test cycle (a small enumeration sketch follows below). A prioritized list, identified in consultation with the Product Manager and customers, will help determine which configs need repeated testing and which need to be tested only once. If some configs are not available in-house, partners need to be engaged to test them. Unfortunately, in many cases we get by with a "paper qual" where we review the changes and decide, based on those changes, that physical qualification is not needed. Such an approach has pitfalls and should be avoided as far as possible.
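A sketch of that methodical enumeration in Python: the dimensions, versions and priority rules below are made up for illustration, and real values would come from the product documentation and the consultation with the Product Manager and customers. The value of the exercise is that every supported combination is at least listed and assigned an explicit testing policy, so nothing silently falls through to a "paper qual".

```python
# Sketch of enumerating the supported-config matrix and assigning a test
# priority to each combination. Dimensions and priority rules are hypothetical.

from itertools import product

SUPPORTED = {
    "os":       ["RHEL 8", "RHEL 9", "Ubuntu 22.04"],
    "database": ["PostgreSQL 14", "PostgreSQL 15"],
    "protocol": ["TLS 1.2", "TLS 1.3"],
}

# Configs seen most often in the field get repeated testing every cycle;
# everything else must still be covered at least once (in-house or via partners).
HIGH_PRIORITY = {
    ("RHEL 9", "PostgreSQL 15", "TLS 1.3"),
    ("Ubuntu 22.04", "PostgreSQL 15", "TLS 1.3"),
}


def build_config_matrix():
    matrix = []
    for combo in product(*SUPPORTED.values()):
        priority = "every-cycle" if combo in HIGH_PRIORITY else "once-per-cycle"
        matrix.append({"config": dict(zip(SUPPORTED.keys(), combo)), "priority": priority})
    return matrix


if __name__ == "__main__":
    matrix = build_config_matrix()
    print(f"{len(matrix)} supported configurations enumerated")
    for entry in matrix:
        print(entry["priority"], entry["config"])
```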