The Worst or Most Difficult Bug I’ve Encountered
Alex Andrade, M.Eng., Master QA Automation Engineer
M.Eng., MBA, and Project Management Specialist
Someone asked me what the worst bug I have ever encountered and solved was.
In a widely used application I worked on, we found that 3 out of every 1,000 transactions (0.3% of the time) were recorded under another customer's name. Our team suspected it had something to do with concurrency, but the curious thing was that it had nothing to do with:
Because it happened so rarely, only a few hours were allocated to the investigation, and no solution was found in that time. For almost a year, we only occasionally debated the strange phenomenon and its possible causes, until one day I found the solution while trying to fix something else.
The error was caused by a combination of anti-patterns (the counterpart of design patterns: common solutions that end up causing more problems than they solve) and other human errors made while developing one of the project's classes, a class that:
Explanation
After finding the problem and the solution, we understood exactly why the situation occurred. It was a combination of system behavior and an aspect of user behavior that we had been unaware of:
To intervene as little as possible and provide a quick fix, I eliminated the class's global variables and changed all the methods and objects involved so that they passed values through parameters instead. Since the class was exposing two services at once, I applied the same change to the methods and objects behind both of them.
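The original class is not shown, so the following is only a hypothetical Python sketch of the general failure mode described above: a single shared service instance that keeps per-request data in a field (the "global variable"), next to a version that passes the value through parameters. The names SharedStateService, ParameterService, simulate, and the customers "Alice" and "Bob" are illustrative assumptions. Two threading.Event objects force the unlucky interleaving that, in production, only happened occasionally.

```python
import threading


class SharedStateService:
    """Anti-pattern (hypothetical): one shared instance keeps per-request data in a field."""

    def __init__(self):
        self.customer_name = None  # shared by every thread that uses this instance

    def handle(self, customer_name, pause=lambda: None):
        self.customer_name = customer_name  # store request data in the shared field
        pause()  # another request may run here and overwrite the field
        return self._record()

    def _record(self):
        return f"recorded for {self.customer_name}"  # reads whatever was written last


class ParameterService:
    """Fix (hypothetical): no per-request fields; values travel through parameters."""

    def handle(self, customer_name, pause=lambda: None):
        pause()  # another request may run here, but it cannot touch this call's value
        return self._record(customer_name)

    def _record(self, customer_name):
        return f"recorded for {customer_name}"


def simulate(service):
    """Force the rare interleaving: A starts, B runs to completion, then A records."""
    results = {}
    a_midway = threading.Event()
    b_finished = threading.Event()

    def request_a():
        def pause():
            a_midway.set()     # A is mid-request; let B start
            b_finished.wait()  # A finishes only after B is done
        results["Alice"] = service.handle("Alice", pause)

    def request_b():
        a_midway.wait()        # B starts only once A is mid-request
        results["Bob"] = service.handle("Bob")
        b_finished.set()

    threads = [threading.Thread(target=request_a), threading.Thread(target=request_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(simulate(SharedStateService())["Alice"])  # recorded for Bob
    print(simulate(ParameterService())["Alice"])    # recorded for Alice
```

With the shared field, Alice's transaction is recorded under Bob's name; with parameters, each call keeps its own value on its own stack, so the interleaving no longer matters.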
Broken Windows Theory?
Shortly before finding the solution, since the situation could not be reproduced in a test environment, I placed logging points where I suspected the data crossing could occur and made the flow throw an exception if swapped data reached its end. That is when we realized the problem did not occur 3 times per 1,000 transactions (0.3% of the time) but about 30 times per 200 (15%). It was actually quite common: users had simply learned to live with it by quickly running the customer through the flow again, so that on the retry the entire flow was handled by the same execution thread.
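The article does not show the actual instrumentation, so this is a hypothetical Python sketch of the idea: log what each suspected step sees, and raise an exception at the end of the flow if the record no longer belongs to the customer who started it. DataCrossingError, checkpoint, verify_end_of_flow, run_flow, and the two step functions are illustrative names, not part of the original system.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(threadName)s %(levelname)s %(message)s")
log = logging.getLogger("flow-check")


class DataCrossingError(RuntimeError):
    """Raised when the record leaving the flow belongs to another customer."""


def checkpoint(step_name, expected_customer, record):
    """Logging point placed where data crossing is suspected."""
    log.info("%s: expected=%s observed=%s", step_name, expected_customer, record["customer"])


def verify_end_of_flow(expected_customer, record):
    """Fail loudly instead of silently saving the transaction under the wrong name."""
    if record["customer"] != expected_customer:
        log.error("data crossing: expected=%s got=%s", expected_customer, record["customer"])
        raise DataCrossingError(
            f"expected {expected_customer!r}, got {record['customer']!r}"
        )
    return record


def run_flow(expected_customer, steps):
    """Run each step, log what it produced, and check the result at the end."""
    record = {"customer": expected_customer}
    for step in steps:
        record = step(record)
        checkpoint(step.__name__, expected_customer, record)
    return verify_end_of_flow(expected_customer, record)


def add_amount(record):
    # A well-behaved step: adds data without touching the customer.
    return dict(record, amount=100)


def leak_other_customer(record):
    # Simulates a step that picked up another request's data.
    return dict(record, customer="Bob")


if __name__ == "__main__":
    print(run_flow("Alice", [add_amount]))  # {'customer': 'Alice', 'amount': 100}
    try:
        run_flow("Alice", [add_amount, leak_other_customer])
    except DataCrossingError as error:
        print("blocked:", error)  # blocked: expected 'Alice', got 'Bob'
```

Raising at the end of the flow turns silent data corruption into a visible, countable failure, which is what makes it possible to measure how often the crossing really happens.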
This situation illustrates the Broken Windows Theory applied to software products: leaving defects (broken windows) unfixed leads to undesired behavior in the community (the users), such as vandalism and apathy, and apathy is the more damaging of the two for software. Our users did not report this very common problem because they did not believe we would take them seriously (apathy), and we never prioritized it because we thought it happened only a few times a month.
Conclusion
In retrospect, two actions can be taken in this situation, one preventive and the other corrective:
#bug #code-smells #concurrency #scalability #experience #thread #instance #issue #multi-thread #design-pattern #anti-pattern #services #solid #srp