Code review and automated tests - what do developers think?
Matheus S. Moreira
iOS Developer at Stefanini | Computer Science undergraduate
In software development today, a mature development process is expected to be well defined, replicable, and auditable. For quality assurance and control, applying Modern Code Review (MCR) across teams has become common practice. Alongside review processes, various types of software tests are also applied, which likewise play a key role in minimizing defects and maximizing the quality of the software and its associated artifacts. Given this relevance, it is worth building a working knowledge of code review, quality, and testing.
Here I analyze two papers that set out to answer questions raised in the context of MCR. One explores the code review process itself, while the other pays special attention to reviews of automated tests. Broadly, both study developers' perceptions of their craft: how they perceive and define quality, what they consider during a review, and which practical issues and obstacles affect them.
1. Code review - An overview
Given that reviewing code before integrating it into the main branch of a versioned project has become routine, it is natural that the quality of this step relates to the quality of the software itself: this is when bugs can be caught before they crash in the user's hands, and when improvements and conventions are suggested to keep the codebase consistent. Yet, due to several factors, technical and non-technical, this stage of development can be compromised and turned into a mere formality, producing the opposite of what a good review is expected to deliver. So it is worth asking:
What empirical evidence do we have to know which factors contribute to the quality of a review?
In a survey of Mozilla developers, respondents were asked how they perform this task. The most relevant point raised concerns workload: while most developers both write patches and review code, there is a dedicated group responsible for reviewing code changes, providing an additional review of specific changes based on their unique expertise and knowledge. The existence of such a group is motivated by the observation that, when workload is correlated with patch writing, developers with high workloads tend to concentrate their effort on a single task: either writing patches or reviewing.
When developers are asked which factors they consider influential on review time and on the approval decision, more details emerge.
Regarding time, factors related to the size of the submitted change (such as the extent of the patch, the number of modified files, and the number of code blocks involved) are the most relevant. The second category concerns the experience of those involved, that is, the reviewer and the author of the change: the greater the experience, the faster the review. This finding agrees with previous research.
Regarding the decision itself, the scenario is slightly different. The two most relevant factors are the experience of those involved (just as with review time) and the nature of the bug, that is, its priority and severity. Here, factors related to the size of the change lose importance. As for a developer's workload, this research observed that it does not affect the decision much, which contradicts some previous research and signals a particularity. Code quality and the presence of tests, in turn, were highly influential on the decision. The social dimension was also notable, as one of the interviewees commented:
"If it's someone you trust, you don't have to check things so rigorously."
Regarding the characteristics of a good review, the study revealed an emphasis on the clarity and thoroughness of the feedback, especially since it can contain constructive and valuable advice, serving as a form of mentorship. Among human and personal attributes, reviewers considered good are those who are punctual in their communication, encouraging (especially when rejecting a change), and who know how to express appreciation for a contribution.
Regarding factors that affect the quality of a review, most agree that the reviewer's experience and the technical properties of the reviewed patch are strong indicators of quality. Most also agree that human and personal factors matter, such as the reviewer's workload and participation in change discussions, as well as mood, communication style, and stress levels.
Finally, developers were surveyed about the challenges they face when performing review tasks.
Among the technical challenges, the greatest proved to be gaining familiarity with the code, which makes sense: reviewers often read code they did not write, so assessing whether a change is good can be difficult. In this context, many ask themselves whether they are really able to carry out the review or whether they should delegate it to someone else. Dealing with the complexity of a patch and getting support from the available tools were other aspects mentioned.
Among the personal ones, managing time, setting priorities, and overcoming procrastination were the most mentioned. Beyond these, there is also keeping technical knowledge up to date and dealing with context switching, that is, handling multiple tasks simultaneously.
2. Testing is also programming
Now that code review has established itself as a routine practice, it is worth asking what exactly we mean by code. One existing complaint is that most of the literature either focuses on production code or does not explicitly differentiate it from test code. It is therefore also worth analyzing the particularities of test code when it comes to review.
Are bugs in test code as significant in number as bugs in production code? In a preliminary analysis of repositories from the open source projects Eclipse, Qt, and OpenStack, carried out to determine whether test and production files are equally associated with the occurrence of defects, the researchers applied machine learning for attribute ranking and observed that the is_test variable was the least significant of those selected in the relationship between file type and defects. In other words, test files are no less likely to contain defects. However, the same study also noted that, when it comes to review, this is not taken as seriously in practice.
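To give a feel for the attribute-ranking idea, here is a minimal sketch. It is not the paper's actual pipeline: the toy data, the attribute names (`loc`, `churn`, `is_test`), and the use of plain correlation as the ranking criterion are all illustrative assumptions.

```python
# Illustrative sketch: rank file attributes by how strongly they
# correlate with defect occurrence. Data and attributes are invented.
from statistics import mean

# toy file records: (lines of code, churn, is_test flag, had_defect label)
files = [
    (120, 15, 0, 1), (300, 40, 0, 1), (80, 5, 0, 0),
    (200, 30, 1, 1), (90, 10, 1, 0), (150, 25, 1, 1),
    (60, 2, 0, 0), (250, 35, 1, 1),
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

labels = [f[3] for f in files]
attrs = {"loc": 0, "churn": 1, "is_test": 2}

# rank attributes by absolute correlation with the defect label
ranking = sorted(
    ((name, abs(pearson([f[i] for f in files], labels)))
     for name, i in attrs.items()),
    key=lambda t: -t[1],
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

On this toy data, `is_test` ends up at the bottom of the ranking, mirroring the study's finding that whether a file is a test file explains little about defect occurrence; a real replication would use the papers' actual metrics and a proper learner.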
Interviews with 12 developers, from the same open source projects mentioned above as well as from industry, yielded some interesting results.
Regarding the rigor applied when reviewing test files, it was observed that these are about half as likely to be discussed when reviewed alongside production files: 29% of such files received at least one comment, versus 48% in reviews submitted without production files. Despite this, the difference turned out to be small in terms of the number and length of comments and the number of reviewers involved.
During the execution of these reviews, it was seen that the practice is similar to that of writing new code: some prefer to start with production, others with tests.
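As a concrete illustration of such a patch, here is a hypothetical pair of changes a reviewer might face in one submission; the function, file names, and test cases are invented for the example, not drawn from either study.

```python
# Hypothetical patch under review: a production change and its
# accompanying test change, which a reviewer may read in either order.

# --- production change (say, pricing.py) ---
def apply_discount(price: float, percent: float) -> float:
    """Return `price` reduced by `percent` percent, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# --- test change (say, test_pricing.py) ---
import unittest

class TestApplyDiscount(unittest.TestCase):
    def test_regular_discount(self):
        # 25% off 200.0 should yield 150.0
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percent_rejected(self):
        # out-of-range percentages must be refused
        with self.assertRaises(ValueError):
            apply_discount(100.0, 120)
```

A reviewer who starts from the tests reads the expected behavior first and then checks the production change against it; starting from the production code reverses that order, with the tests serving as a cross-check.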
The most relevant point about the practice of reviewing tests concerns the context of the test and its relationship with the review tool in use. During review it is only possible to view the changed files, while there is a desire (at least among some respondents) to switch easily between test code and the associated production code, and to check other test cases left unchanged by the submitted patch. Because of this, many turn to other tools, such as a local IDE.
It turns out that production code and test code are substantially different; therefore, acquiring the full context of the tests, with adequate support from the available tools, is crucial for an effective review practice. Navigating between the two types of code is one such wish; viewing which paths are covered by which test suites is another.
Unfortunately, the average developer has been seen to prefer saving time over software quality, as they do not see immediate value in well-tested software. One of the interviewees claims:
"Developers are not rewarded for writing good code, but for delivering the functionality the customer wants."
3. What can we conclude?
And you, did you recognize yourself in what was reported here? Is the reality where you work very different? Don't forget to contribute to this discussion too!
References
Code review quality: how developers see it
Oleksii Kononenko, Olga Baysal, and Michael W. Godfrey. 2016. Code review quality: how developers see it. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). Association for Computing Machinery, New York, NY, USA, 1028–1038. https://doi.org/10.1145/2884781.2884840
When testing meets code review: why and how developers review tests
Davide Spadini, Maurício Aniche, Margaret-Anne Storey, Magiel Bruntink, and Alberto Bacchelli. 2018. When testing meets code review: why and how developers review tests. In Proceedings of the 40th International Conference on Software Engineering (ICSE '18). Association for Computing Machinery, New York, NY, USA, 677–687. https://doi.org/10.1145/3180155.3180192