
The Worst or Most Difficult Bug I’ve Encountered

Alex Andrade, M.Eng., Master QA Automation Engineer

M.Eng., MBA, and Project Management Specialist

Published: July 13, 2024



Someone asked me what the worst bug I have encountered and solved was.

In a widely used application I worked on, we found that 3 out of every 1000 transactions (0.3% of the time) were recorded with another customer's name. Our team suspected it had something to do with concurrency, but the curious thing was that it had nothing to do with:

The time.
The region of the country.
Matching data between customers.
Common data among users registering the information: office, manager, etc.
It wasn't simultaneous transactions or transactions created almost simultaneously; there could be hours of difference between the entered data.

Due to the low frequency, only a few hours were allocated for investigation, during which no solution was found. For almost a year, we only occasionally debated the strange phenomenon and its possible causes until one day, I found the solution while trying to fix something else.

The error was caused by a combination of anti-patterns (the opposite of design patterns) and other human errors during development in a class of the project that:

Violated SOLID's Single Responsibility Principle (SRP), summarized as "one class, one task." In this case, the class exposed two services simultaneously.
Exhibited the code smell of "Variable Shadowing," or "reusing variable names in different scopes," which generally only confuses developers when reading the code. Combined with the previous list item, this generated the bug. See "Do not reuse variable names in sub-scopes."
This was difficult to perceive since the class had more than 1000 lines, making it an excellent example of the "God Class" anti-pattern. However, static code analysis using SonarQube had reported the code smell as soon as it was introduced, and everyone on the team (including myself) dismissed it as a merely aesthetic suggestion.
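
To make the "Variable Shadowing" smell concrete, here is a minimal Java sketch. It is not the original code: the class `ShadowingDemo`, the field `customerName`, and the method `label` are all hypothetical names chosen for illustration. It only shows how the same name can refer to two different variables depending on scope.

```java
// Hypothetical illustration of variable shadowing; not the original code.
class ShadowingDemo {
    // Instance field: visible to every method of this object.
    private String customerName = "Alice";

    // The parameter has the same name as the field, so it shadows it.
    // Inside this method the bare name means the parameter; the field is
    // only reachable through "this".
    String label(String customerName) {
        return "param=" + customerName + ", field=" + this.customerName;
    }

    public static void main(String[] args) {
        // Prints: param=Bob, field=Alice
        System.out.println(new ShadowingDemo().label("Bob"));
    }
}
```

In a class of more than 1000 lines, it is easy to lose track of which of the two variables a given line is reading or writing.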

Explanation

After finding the problem and solution, we could understand exactly why the situation occurred: a mix between system behavior and a part of user behavior that we were unaware of:

To ensure application availability, a horizontal scalability strategy is applied. The system generally has one instance and three threads, increasing to nine during peak hours.
Users usually processed a client quickly through the entire application flow. Thus, when the application flow called the second service of the God Class, the same instance that had executed the first service responded.
However, the business dynamics sometimes caused a pause in the application flow while the user and client negotiated or clarified terms. When the flow resumed, it could be handled by a different thread with another client's data in the global variable.
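
The sequence described above can be sketched in Java under some simplifying assumptions. The names `GodService`, `startTransaction`, and `recordTransaction` are hypothetical, and the interleaving is simulated with plain sequential calls on one shared object rather than real threads, so that the outcome is deterministic.

```java
// Hypothetical sketch of the failure mode; names and structure are assumptions.
class SharedStateDemo {
    // One class exposing two services that communicate through a field.
    static class GodService {
        private String currentCustomer; // shared mutable state

        // First service: remembers the customer for later.
        void startTransaction(String customer) {
            currentCustomer = customer;
        }

        // Second service: trusts whatever the field holds right now.
        String recordTransaction(String txId) {
            return txId + " -> " + currentCustomer;
        }
    }

    // Simulates a flow that pauses between the two service calls.
    static String pausedFlow() {
        GodService service = new GodService();
        service.startTransaction("Alice"); // Alice's flow starts, then pauses
        service.startTransaction("Bob");   // Bob's flow is served in the meantime
        return service.recordTransaction("tx-alice"); // Alice's flow resumes
    }

    public static void main(String[] args) {
        // Prints: tx-alice -> Bob
        System.out.println(pausedFlow());
    }
}
```

When the two calls happen back to back, nothing overwrites the field in between, which is why a quick pass through the flow hid the problem.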

Trying to intervene as little as possible and provide a quick solution, I eliminated the class's global variables and modified all the methods and objects involved, in both of the services the class exposed, to communicate values via parameters.
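
A minimal sketch of that kind of change, assuming a hypothetical `FlowContext` object and `TransactionService` class (neither is from the original project): the value travels with the flow as a parameter instead of living in a field of the service.

```java
// Hypothetical sketch of the fix; names are assumptions, not the original code.
class StatelessServiceDemo {
    // Immutable carrier for the data that belongs to one flow.
    static final class FlowContext {
        final String customer;

        FlowContext(String customer) {
            this.customer = customer;
        }
    }

    // The service keeps no per-customer fields; callers pass the context in.
    static class TransactionService {
        FlowContext startTransaction(String customer) {
            return new FlowContext(customer);
        }

        String recordTransaction(FlowContext context, String txId) {
            return txId + " -> " + context.customer;
        }
    }

    // Same paused flow as before, now with each flow holding its own context.
    static String pausedFlow() {
        TransactionService service = new TransactionService();
        FlowContext alice = service.startTransaction("Alice"); // flow pauses
        service.startTransaction("Bob");                       // another flow runs
        return service.recordTransaction(alice, "tx-alice");   // flow resumes
    }

    public static void main(String[] args) {
        // Prints: tx-alice -> Alice
        System.out.println(pausedFlow());
    }
}
```

Because the service holds no state between calls, it no longer matters which instance or thread handles the second step.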

Broken Windows Theory

Shortly before finding the solution, since it was impossible to reproduce the situation in a test environment, I decided to place some logging points where I suspected data crossing could occur and throw an exception if swapped data arrived at the end of the flow. At that moment, we realized that the case did not happen three times per 1000 transactions (0.3% of the time) but about 30 times per 200 (15%), meaning it was pretty common, and users had simply learned to live with the problem, quickly passing the client through the flow again so that the entire flow was handled by the same execution thread.
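
A guard of that kind might look roughly like the following Java sketch. The method `requireSameCustomer` and its parameters are hypothetical; the idea is simply to compare the customer the flow started with against the customer about to be recorded, log the mismatch, and refuse to continue.

```java
import java.util.logging.Logger;

// Hypothetical end-of-flow guard; names are assumptions, not the original code.
class CrossCheckDemo {
    private static final Logger LOG = Logger.getLogger(CrossCheckDemo.class.getName());

    // Throws if the customer about to be recorded is not the one that
    // started the flow, after logging both identifiers for diagnosis.
    static void requireSameCustomer(String expectedCustomerId, String actualCustomerId) {
        if (!expectedCustomerId.equals(actualCustomerId)) {
            LOG.severe("Customer mismatch at end of flow: expected="
                    + expectedCustomerId + ", actual=" + actualCustomerId);
            throw new IllegalStateException("Customer data crossed between flows");
        }
    }

    public static void main(String[] args) {
        requireSameCustomer("c-1", "c-1"); // same customer: passes silently
        try {
            requireSameCustomer("c-1", "c-2"); // crossed data: logged and rejected
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Besides stopping bad records, each logged mismatch is a data point, which is how a real frequency can be measured instead of estimated from user reports.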

This situation reflects the Broken Windows Theory in the context of software products: not fixing defects (broken windows) quickly leads to undesired behaviors in society (users), such as vandalism and apathy, with the latter being the most detrimental to software. Our users did not report this very common situation because they did not believe we would take them seriously (apathy); we never prioritized it because we thought the situation occurred very few times a month.

Conclusion

In retrospect, two actions can be taken in this situation, one preventive and the other corrective:

Take static code analysis recommendations very seriously, making 100% compliance with the analysis indicators part of the definition of done for each feature.
Prioritize defects as soon as they are reported, regardless of their frequency. If the solution is not found, consider setting up error logging points or even throwing exceptions to prevent the situation from occurring entirely.

#bug #code-smells #concurrencia #scalability #experience #thread #instance #issue #multi-thread #design-pattern #anti-pattern #services #solid #srp
