Week 51 - Why developers implement OS-specific tests, whether treatment adherence impacts experiment results in TDD, and a framework for TDD compliance rules
Besouro: A framework for exploring compliance rules in automatic TDD behavior assessment
The improvements promoted by Test-Driven Design (TDD) have not been confirmed by quantitative assessment studies. To a great extent, the problem lies in the lack of a rigorous definition for TDD. An emerging approach has been to measure conformance to TDD practices with the support of automated systems that embed an operational definition, which represents the specific TDD process assumed and the validation tests used to determine its presence and quantity. Building an empirical understanding of TDD, and consensus around it, requires the ability to compare different definitions, evaluate them against practitioners' perceptions, and explore code information to improve automatic assessment.
More on TDD:
AgoneTest: Automated creation and assessment of Unit tests leveraging Large Language Models
Software correctness is crucial, with unit testing playing an indispensable role in the software development lifecycle. However, creating unit tests is time-consuming and costly, underlining the need for automation. Leveraging Large Language Models (LLMs) for unit test generation is a promising solution, but existing studies focus on simple, small-scale scenarios, leaving a gap in understanding LLMs' performance in real-world applications, particularly regarding integration and assessment efficacy at scale. Here, we present AgoneTest, a system focused on automatically generating and evaluating complex class-level test suites. Our contributions include a scalable automated system, a newly developed dataset for rigorous evaluation, and a detailed methodology for test quality assessment.
How and why developers implement OS-specific tests
(1) We find that OS-specific tests are common: 56% of the analyzed Python projects have OS-specific tests and Windows is the most targeted OS. (2) We detect that OS verification happens more frequently in test decorators (65%) than in test code (35%). (3) OS-specific tests target a diversity of code, including file/directory, network, and permission/privilege. (4) Developers may perform multiple operations in OS-specific tests, including calling OS-specific APIs, mocking OS-specific objects, and suspending execution. (5) We find that OS-specific tests are implemented mostly to overcome unavailable external resources, unsupported standard libraries, and flaky tests.
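The two places where OS verification can happen, in a test decorator or inside the test code, can be sketched with Python's standard `unittest` module. This is a minimal illustration, not code from the study: the class name `OsSpecificTests`, the test names, and the helper `run_suite` are hypothetical.

```python
import io
import os
import sys
import tempfile
import unittest


class OsSpecificTests(unittest.TestCase):  # hypothetical example class
    # OS check in a decorator: the test only runs on Windows.
    @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
    def test_windows_drive_letter(self):
        drive, _ = os.path.splitdrive(r"C:\Users\demo")
        self.assertEqual(drive, "C:")

    # OS check in a decorator: the test is skipped on Windows,
    # where POSIX permission bits are not fully supported.
    @unittest.skipIf(sys.platform.startswith("win"), "POSIX permissions only")
    def test_chmod_read_only(self):
        with tempfile.NamedTemporaryFile() as handle:
            os.chmod(handle.name, 0o400)
            self.assertEqual(os.stat(handle.name).st_mode & 0o777, 0o400)

    # OS check inside the test code: the test decides at run time to skip.
    def test_posix_path_separator(self):
        if os.name != "posix":
            self.skipTest("only meaningful on POSIX systems")
        self.assertEqual(os.sep, "/")


def run_suite():
    """Run the example tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OsSpecificTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print(f"run={result.testsRun} skipped={len(result.skipped)}")
```

A decorator makes the OS condition visible to the test runner before the test body executes, while an in-body check lets the test decide after some setup has already happened.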
Does Treatment Adherence Impact Experiment Results in TDD?
Context: In software engineering (SE) experiments, the way in which a treatment is applied could affect results. Different interpretations of how to apply the treatment and decisions on treatment adherence could lead to different results when data are analysed. Objective: This paper aims to study whether treatment adherence has an impact on the results of an SE experiment. Method: The experiment used as a test case for our research uses Test-Driven Development (TDD) and Incremental Test-Last Development (ITLD) as treatments. We reported elsewhere the design and results of this experiment, for which 24 participants were recruited from industry. Here, we compare experiment results depending on the use of data from adherent participants or data from all the participants irrespective of their adherence to treatments. Results: Only 40% of the participants adhered to both the TDD protocol and the ITLD protocol; 27% never followed TDD; 20% used TDD even in the control group; 13% are defiers (used TDD in the ITLD session but not in the TDD session). Considering that both TDD and ITLD are less complex than other SE methods, we can hypothesize that more complex SE techniques could get even lower adherence to the treatment. Conclusion: Both TDD and ITLD are applied differently across participants. Training participants may not be enough to ensure a medium to large adherence of experiment participants. Adherence to treatments impacts results and should not be taken for granted in SE experiments.
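The four adherence groups in the results can be read as the four combinations of two yes/no observations per participant: whether TDD was used in the TDD session and whether it was used in the ITLD session. The sketch below is one reading of the abstract, not the paper's analysis code; the function names and group labels are hypothetical, and how the authors actually measured TDD use is not stated here.

```python
from collections import Counter


def classify_adherence(tdd_in_tdd_session: bool, tdd_in_itld_session: bool) -> str:
    """Map one participant's observed behaviour to an adherence group.

    Assumption: each session is reduced to a single boolean meaning
    "the participant followed TDD in this session".
    """
    if tdd_in_tdd_session and not tdd_in_itld_session:
        return "adherent"             # followed both protocols as assigned
    if not tdd_in_tdd_session and not tdd_in_itld_session:
        return "never followed TDD"   # no TDD in either session
    if tdd_in_tdd_session and tdd_in_itld_session:
        return "used TDD in control"  # TDD even in the ITLD session
    return "defier"                   # TDD in the ITLD session only


def adherence_shares(observations):
    """Return the fraction of participants in each adherence group."""
    counts = Counter(classify_adherence(a, b) for a, b in observations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Made-up observations for illustration only, not data from the paper.
    sample = [(True, False), (True, False), (False, False), (True, True), (False, True)]
    print(adherence_shares(sample))
```

Comparing an analysis restricted to the "adherent" group with one over all participants is the kind of contrast the abstract describes.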