Week 14 - Technical Debt Accumulation, NPM Analysis, and API Harvest
Photo by Towfiqu barbhuiya on Unsplash: https://unsplash.com/photos/M8z2SwSwpbg


The Impact of Ownership and Contribution Alignment on Code Technical Debt Accumulation

Context: Software development organisations strive to maintain their effectiveness whilst the complexity of the systems they develop continues to grow. To tackle this challenge, they tend to be organised into small teams working on components that can be developed, tested, and deployed separately. In this scenario, organisations must design their software architecture and organisational structures in a way that enables communication, minimises dependencies, and helps teams reduce code and architectural degradation. Making each small, independent team responsible for the components it primarily contributes to is one approach to achieving this goal.

Objective: This article reports a study that aims at understanding the impact of ownership and contribution alignment (contribution alignment, for short) on accumulation of code technical debt (TD) and how abrupt changes in team constellation affect teams’ effectiveness in managing TD.

Method: We have conducted an embedded case study in a software development company developing a very large software system, analysing ten components belonging to one team. During the studied period, the team was split into two, and the components owned by that team were distributed between the two new teams. We have collected archival data from the company’s tools in their daily development operations.

Results: In most cases with high degrees of contribution alignment, we noticed a negative correlation between contribution alignment and TD per line of code (TD Density) before the team split. In four components, this correlation is statistically significant. In other words, a higher degree of contribution alignment is associated with a lower TD Density. After the split, we observe a statistically significant negative correlation in three components. The positive correlation observed in the other five components may be attributable to low contribution alignment, which makes TD Density harder to manage.

Conclusion: Our findings suggest that contribution alignment can be important in controlling TD in software development organisations. Making teams responsible for the quality of components they have more expertise over and minimising dependencies between teams can help organisations mitigate the growth of TD.

A Large Scale Analysis of Semantic Versioning in NPM

The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every JavaScript application uses the NPM package manager to install packages from the NPM repository.

NPM relies on a “semantic versioning” (‘semver’) scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times.
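The idea behind a time-travelling resolver can be sketched in a few lines of Python: given a package's release history, pick the highest version that both satisfies the declared constraint and had already been published at the chosen moment. This is a simplified illustration, not the authors' resolver. It only handles caret (^X.Y.Z), tilde (~X.Y.Z), and exact X.Y.Z constraints, ignores pre-release tags and the rest of npm's range grammar, and the release history is hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'X.Y.Z' version (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies(version: Version, constraint: str) -> bool:
    """Check a version against a '^X.Y.Z', '~X.Y.Z', or exact 'X.Y.Z' constraint."""
    if constraint.startswith("^"):
        low = parse(constraint[1:])
        major, minor, patch = low
        # Caret allows changes that do not modify the left-most non-zero part.
        if major > 0:
            high = (major + 1, 0, 0)
        elif minor > 0:
            high = (0, minor + 1, 0)
        else:
            high = (0, 0, patch + 1)
        return low <= version < high
    if constraint.startswith("~"):
        low = parse(constraint[1:])
        # Tilde with a full X.Y.Z version allows patch-level changes only.
        return low <= version < (low[0], low[1] + 1, 0)
    return version == parse(constraint)


def resolve_at(releases: List[Tuple[str, datetime]], constraint: str,
               when: datetime) -> Optional[str]:
    """Highest matching version that had already been published at `when`."""
    candidates = [v for v, published in releases
                  if published <= when and satisfies(parse(v), constraint)]
    return max(candidates, key=parse, default=None)


# Hypothetical release history of a made-up package.
history = [
    ("1.0.0", datetime(2020, 1, 1)),
    ("1.0.1", datetime(2020, 3, 1)),
    ("1.1.0", datetime(2020, 6, 1)),
    ("2.0.0", datetime(2021, 1, 1)),
]

print(resolve_at(history, "^1.0.0", datetime(2020, 4, 1)))  # 1.0.1
print(resolve_at(history, "^1.0.0", datetime(2021, 6, 1)))  # 1.1.0
print(resolve_at(history, "~1.0.0", datetime(2021, 6, 1)))  # 1.0.1
```

The same constraint resolves to different versions depending on the date, which is exactly why a resolver that can be pointed at an arbitrary moment is needed to study how updates propagate.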

We segment our analysis so that security-relevant updates (those that introduce or patch vulnerabilities) can be compared directly with the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always happen, because developers use both semver version constraints and semver version number increments imperfectly. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.
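Whether a security fix actually reaches dependents depends on both halves of the semver contract mentioned above: how the maintainer numbers the fix, and which constraint style dependents declare. The following Python sketch encodes that interaction under a simplifying assumption (versions at 1.0.0 or later, where caret accepts minor and patch updates and tilde accepts patch updates only). The function and table names are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Bump(Enum):
    PATCH = "patch"
    MINOR = "minor"
    MAJOR = "major"


def classify_bump(old: str, new: str) -> Bump:
    """Classify the increment between two plain X.Y.Z versions."""
    o = tuple(int(p) for p in old.split("."))
    n = tuple(int(p) for p in new.split("."))
    if o[0] == 0:
        raise ValueError("0.x versions follow stricter caret rules; not modelled here")
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return Bump.MAJOR
    if n[1] != o[1]:
        return Bump.MINOR
    return Bump.PATCH


# Increments each constraint style picks up without manual edits,
# assuming the dependent's declared version is >= 1.0.0.
ACCEPTS: Dict[str, FrozenSet[Bump]] = {
    "exact": frozenset(),
    "~": frozenset({Bump.PATCH}),
    "^": frozenset({Bump.PATCH, Bump.MINOR}),
}


def fix_flows(constraint_style: str, old: str, new: str) -> bool:
    """Would release `new` reach a dependent declaring `old` with this style?"""
    return classify_bump(old, new) in ACCEPTS[constraint_style]


for style in ("exact", "~", "^"):
    print(style, fix_flows(style, "1.4.0", "1.4.1"), fix_flows(style, "1.4.0", "2.0.0"))
# exact False False
# ~ True False
# ^ True False
```

A fix shipped as a patch flows automatically to tilde and caret dependents but not to exact pins, while the same fix mislabelled as a major release flows to nobody without manual intervention.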

APIHARVEST: Harvesting API Information from Various Online Sources (https://github.com/soarsmu/APIHarvest)

Using APIs to develop software applications is the norm. APIs help developers build applications faster because they do not need to reinvent the wheel. It is therefore important for developers to understand the APIs they plan to use and to stay aware of updates to relevant information about those APIs. To do so, developers need to find and keep track of relevant information about the APIs they are concerned with. Yet this information is scattered across various online sources, which makes it difficult to track by hand. Moreover, identifying content that is related to an API is not trivial. Motivated by these challenges, we introduce a tool named APIHARVEST that aims to ease the process of finding API information from various online sources. APIHARVEST builds on prior work that links APIs or libraries to various online sources. It supports finding API information in GitHub repositories, Stack Overflow posts, tweets, YouTube videos, and Common Vulnerabilities and Exposures (CVE) entries, and is extensible to support other sources.
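The extensibility claim suggests a plugin-style design in which each online source implements a common search interface. The Python sketch below is a hypothetical illustration of such a design, not APIHARVEST's actual code or API. The class names are invented, the URLs are placeholders, and the naive case-insensitive substring match stands in for the much harder API-linking techniques the tool builds on.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    source: str
    title: str
    url: str


class Source(ABC):
    """One online source (e.g. Q&A posts, videos, CVE entries)."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def search(self, api: str) -> List[Hit]:
        """Return items from this source that mention the given API."""


class InMemorySource(Source):
    """Toy source over (title, url, text) triples; a real one would query a web API."""

    def __init__(self, name: str, documents: List[Tuple[str, str, str]]) -> None:
        super().__init__(name)
        self.documents = documents

    def search(self, api: str) -> List[Hit]:
        needle = api.lower()
        # Naive substring match; real API linking is much harder than this.
        return [Hit(self.name, title, url)
                for title, url, text in self.documents
                if needle in text.lower()]


class Harvester:
    """Fans a query out to every registered source; new sources just subclass Source."""

    def __init__(self) -> None:
        self._sources: List[Source] = []

    def register(self, source: Source) -> "Harvester":
        self._sources.append(source)
        return self

    def harvest(self, api: str) -> Dict[str, List[Hit]]:
        return {source.name: source.search(api) for source in self._sources}


qa = InMemorySource("qa-posts", [
    ("Parsing dates", "https://example.com/q/1", "Use datetime.strptime for this."),
    ("Sorting lists", "https://example.com/q/2", "sorted() returns a new list."),
])
cves = InMemorySource("cve-entries", [
    ("Example advisory", "https://example.com/cve/1", "Issue in a made-up XML parser."),
])

results = Harvester().register(qa).register(cves).harvest("datetime.strptime")
for name, hits in results.items():
    print(name, [hit.title for hit in hits])
# qa-posts ['Parsing dates']
# cve-entries []
```

Keeping the per-source logic behind one abstract method means adding a new source touches only a new subclass, which is one plausible way to get the kind of extensibility the abstract describes.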

Scented since the beginning: On the diffuseness of test smells in automatically generated test code

Software testing is a key software engineering practice for ensuring source code quality and reliability. To support developers in this activity and reduce testing effort, several automated unit test generation tools have been proposed. Most of these approaches have the main goal of covering as many branches as possible. While these approaches perform well, little is known about the maintainability of the test code they produce, i.e., whether the generated tests are of good code quality and whether they introduce issues that threaten their effectiveness. To bridge this gap, in this paper we study to what extent existing automated test case generation tools produce potentially problematic test code. We consider seven test smells, i.e., suboptimal design choices applied by programmers during the development of test cases, as a measure of the code quality of the generated tests, and evaluate their diffuseness in the unit test classes automatically generated by three state-of-the-art tools, namely Randoop, JTExpert, and EvoSuite. Moreover, we investigate whether there are characteristics of test and production code that influence the generation of smelly tests. Our study shows that all the considered tools tend to generate a high quantity of two specific test smell types, i.e., Assertion Roulette and Eager Test, which previous studies have shown to negatively impact the reliability of production code. We also discover that test size is correlated with the generation of smelly tests. Based on our findings, we argue that more effective automated generation algorithms that explicitly take test code quality into account should be investigated and devised.
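To illustrate what the two most frequent smells look like, here is a rough Python detector built on the standard ast module. It works on Python unittest code rather than the Java tests studied in the paper, and its rules are simplified heuristics assumed for this example: a test method shows Assertion Roulette when it contains more than one assertion without a msg argument, and Eager Test when it calls more than one distinct method on objects other than self. Real smell detectors use more careful definitions.

```python
import ast
from typing import Dict, List


def _is_self_assert(call: ast.Call) -> bool:
    """True for calls like self.assertEqual(...)."""
    func = call.func
    return (isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "self"
            and func.attr.startswith("assert"))


def detect_smells(source: str) -> Dict[str, List[str]]:
    """Map each smelly test method name to the (heuristic) smells it shows."""
    smells: Dict[str, List[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.FunctionDef) and node.name.startswith("test")):
            continue
        calls = [n for n in ast.walk(node) if isinstance(n, ast.Call)]
        # Assertions with no msg= keyword give no hint about which one failed.
        undocumented = [c for c in calls
                        if _is_self_assert(c)
                        and not any(k.arg == "msg" for k in c.keywords)]
        # Distinct methods invoked on objects other than self (a rough proxy
        # for "methods of the production code under test").
        invoked = {c.func.attr for c in calls
                   if isinstance(c.func, ast.Attribute)
                   and not (isinstance(c.func.value, ast.Name) and c.func.value.id == "self")}
        found = []
        if len(undocumented) > 1:
            found.append("Assertion Roulette")
        if len(invoked) > 1:
            found.append("Eager Test")
        if found:
            smells[node.name] = found
    return smells


SAMPLE = '''
class StackTest(unittest.TestCase):
    def test_push_and_pop(self):
        s = Stack()
        s.push(1)
        self.assertEqual(s.size(), 1)
        self.assertEqual(s.pop(), 1)

    def test_new_stack_is_empty(self):
        s = Stack()
        self.assertEqual(s.size(), 0, msg="a new stack should be empty")
'''

print(detect_smells(SAMPLE))
# {'test_push_and_pop': ['Assertion Roulette', 'Eager Test']}
```

The Eager Test rule here would also fire on incidental helper calls such as json.dumps, which is one reason purpose-built detectors restrict it to methods of the class under test.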
