Three Simple Steps Promote Rapid Understanding of Software in IT Modernisation

Software is executable knowledge, and year after year the share of business-critical knowledge encoded in software keeps growing. As companies need to adapt ever faster to changing markets, identifying which part of the software has to be adapted becomes an increasingly critical activity during software change. This task is called Feature Location or Concept Location. It always starts with a change request and ends with the place in the source code where the change should be implemented. A good example is extracting one or more features from a monolith and rebuilding them as a microservice.

Concept Location is a necessary but expensive task in Software Evolution

Historical data show (1) that software evolution accounts for up to 90% of total software costs, and a major part of these costs is spent on understanding and comprehending the system. Because most software systems are too complex to be understood as a whole by one person or even one team of developers, Concept Location is a challenging task that has to be carried out as an iterative, exploratory activity.

There are several ways to identify hidden knowledge in a software system, ranging from knowledge crunching techniques like EventStorming, Domain Storytelling or User Story Mapping to static and dynamic source code analysis. For Concept Location, a smart and efficient way of extracting valuable knowledge is crucial, and three fundamental questions have to be answered:

  1. Where does a new concept or feature have to be implemented?
  2. Which existing software artefacts are affected by the change and must therefore also be tested?
  3. Who in the development team is responsible for the desired change or should be informed?

First: Apply Artificial Intelligence and Machine Learning to Source Code Analysis

Researchers have proposed the Software Naturalness hypothesis (2): source code, like natural language, is repetitive and predictable, so it can be studied with the same statistical approaches. Following this hypothesis, source code can be analysed with the Natural Language Processing (NLP) toolbox. We use topic models such as Latent Dirichlet Allocation (LDA), and other techniques such as Doc2Vec, to build semantic models of source code.
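The article does not name its tooling, so the following is only a minimal sketch, assuming class bodies are available as plain text. It splits camelCase and snake_case identifiers into words and fits an LDA topic model with collapsed Gibbs sampling in pure Python. The stop-word list, the helper names (split_identifiers, lda_gibbs, top_words), the sample classes and the hyperparameters (alpha, beta, iterations) are illustrative assumptions; a real analysis would use an established library and a far larger corpus. Doc2Vec is not covered here.

```python
import random
import re
from collections import Counter

# Illustrative stop list (an assumption): Java keywords that carry no domain meaning.
STOP_WORDS = {"public", "private", "class", "void", "return", "new", "int",
              "string", "static", "final", "this", "null", "import", "package"}


def split_identifiers(source):
    """Turn source text into lowercase word tokens by splitting snake_case and camelCase."""
    words = []
    for ident in re.findall(r"[A-Za-z][A-Za-z0-9_]*", source):
        for part in ident.split("_"):
            words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
    return [w.lower() for w in words if len(w) > 2 and w.lower() not in STOP_WORDS]


def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=42):
    """Fit LDA with collapsed Gibbs sampling; return per-document and per-topic counts."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})   # vocabulary size
    ndk = [[0] * k for _ in docs]               # tokens of topic t in document d
    nkw = [Counter() for _ in range(k)]         # occurrences of word w in topic t
    nk = [0] * k                                # total tokens assigned to topic t
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                     # remove the token's current assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return ndk, nkw


def top_words(nkw, n=3):
    """Most frequent words per topic, a rough label for each topic."""
    return [[w for w, c in topic.most_common(n) if c > 0] for topic in nkw]


# Hypothetical classes; each class body becomes one "document".
CLASSES = {
    "InvoiceService": "Invoice createInvoice(Order order) { invoiceRepository.save(invoice); }",
    "InvoiceMailer": "void sendInvoice(Invoice invoice) { mailClient.send(invoice.getCustomerEmail()); }",
    "PaymentGateway": "Receipt chargePayment(Payment payment) { paymentClient.charge(payment); }",
}
docs = [split_identifiers(src) for src in CLASSES.values()]
doc_topics, topic_words = lda_gibbs(docs, k=2)
for name, counts in zip(CLASSES, doc_topics):
    print(name, "-> topic", counts.index(max(counts)))
print(top_words(topic_words))
```

Splitting identifiers first matters because names like `createInvoice` only share vocabulary with `InvoiceMailer` once they are broken into words.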

Combining NLP with domain knowledge extracted from knowledge crunching sessions leads to guided models that can be applied effectively to Concept Location tasks. These models help us identify the classes, and the packages containing them, where a feature is located.
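One simple way to guide the search with domain knowledge, offered here as an assumption rather than the article's method, is to use terms from a knowledge crunching session (for example an EventStorming event such as InvoiceSent) as a query and rank classes by TF-IDF cosine similarity to that query. The helper names, the smoothed IDF formula and the sample classes below are hypothetical; there is no stemming, so "sent" does not match "send".

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens from identifiers, splitting camelCase and snake_case."""
    words = []
    for part in re.findall(r"[A-Za-z]+", text):
        words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
    return [w.lower() for w in words]


def rank_classes(domain_terms, class_sources, top=3):
    """Rank classes by TF-IDF cosine similarity between their identifiers and domain terms."""
    bags = {name: Counter(tokens(src)) for name, src in class_sources.items()}
    n = len(bags)
    df = Counter(w for bag in bags.values() for w in bag)  # document frequency per word

    def idf(w):
        return math.log((1 + n) / (1 + df[w])) + 1.0         # smoothed inverse document frequency

    def vector(bag):
        return {w: count * idf(w) for w, count in bag.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
        norm_a = math.sqrt(sum(x * x for x in a.values()))
        norm_b = math.sqrt(sum(x * x for x in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vector(Counter(tokens(" ".join(domain_terms))))
    scores = {name: cosine(query, vector(bag)) for name, bag in bags.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical domain terms from a knowledge crunching session and hypothetical class sources.
DOMAIN_TERMS = ["Invoice", "InvoiceSent", "Payment"]
CLASS_SOURCES = {
    "InvoiceService": "class InvoiceService { Invoice createInvoice(Order order) "
                      "{ return new Invoice(order); } void sendInvoice(Invoice invoice) {} }",
    "PaymentGateway": "class PaymentGateway { Receipt charge(Payment payment) "
                      "{ return gateway.submit(payment); } }",
    "CustomerRepository": "class CustomerRepository { Customer findById(long id) "
                          "{ return db.load(id); } }",
}
ranking = rank_classes(DOMAIN_TERMS, CLASS_SOURCES)
print([name for name, _ in ranking])  # ['InvoiceService', 'PaymentGateway', 'CustomerRepository']
```

IDF weighting keeps ubiquitous words such as `class` or `return` from dominating the similarity, so the ranking is driven by domain vocabulary.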

Second: Use Graph Theory to Model and Visualise the Dependencies of a Software Program

Every software program can be represented as a dependency graph. In the first step, we identified the classes, and their packages, where features are located. In a well-designed system, a class normally has only one reason to change (Single Responsibility Principle). Next, we want to know which other classes are affected by the change. The dependency graph answers this question: it lets us locate all classes that depend, directly or transitively, on the class where the change is made.
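As a minimal sketch, assuming class-level dependencies have already been extracted (for example by a static analysis tool, not shown) into a mapping from each class to the classes it uses, the affected classes are all those from which the changed class can be reached. Reversing the edges and running a breadth-first search collects them. The function name and the class names are hypothetical.

```python
from collections import defaultdict, deque


def impacted_classes(dependencies, changed):
    """Return every class that directly or transitively depends on `changed`."""
    dependents = defaultdict(set)   # reversed edges: used class -> classes that use it
    for source, targets in dependencies.items():
        for target in targets:
            dependents[target].add(source)
    seen, queue = set(), deque([changed])
    while queue:                    # breadth-first search over the reversed graph
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen and dependent != changed:  # also guards against cycles
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical dependency graph: each class maps to the classes it uses.
DEPENDENCIES = {
    "OrderController": {"OrderService"},
    "OrderService": {"InvoiceService", "OrderRepository"},
    "InvoiceService": {"InvoiceRepository"},
    "ReportJob": {"InvoiceRepository"},
    "CustomerController": {"CustomerRepository"},
}
print(sorted(impacted_classes(DEPENDENCIES, "InvoiceService")))  # ['OrderController', 'OrderService']
```

The result is both the set of classes to re-test (question 2) and the input for the change-history lookup in the third step.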

Third: Extract the Change History from the Source Code Repository

Finally, we want to know who made the most recent changes to the classes identified in steps 1 and 2. This can be done by extracting the log history of the source code repository.
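A minimal sketch, assuming the repository is a Git repository (the article does not name the version control system): `git log --name-only` lists the files touched by each commit, newest first, so the first time a file appears in the output gives its most recent author. The `@@` header marker, the `--format` string, the helper names and the sample log are illustrative choices.

```python
import subprocess
from collections import defaultdict

HEADER = "@@"  # illustrative marker that separates commit headers from file names


def read_git_log(repo_path, *paths):
    """Return raw `git log` output: one header per commit, followed by the files it touched."""
    command = ["git", "log", f"--format={HEADER}%an|%ad", "--date=short", "--name-only"]
    if paths:
        command += ["--", *paths]   # restrict the log to the classes found in steps 1 and 2
    result = subprocess.run(command, cwd=repo_path, capture_output=True, text=True, check=True)
    return result.stdout


def file_ownership(log_text):
    """Map each file to its most recent (author, date) and to the set of everyone who changed it."""
    latest, authors = {}, defaultdict(set)
    author = date = None
    for line in log_text.splitlines():
        if line.startswith(HEADER):
            author, date = line[len(HEADER):].rsplit("|", 1)  # the date never contains "|"
        elif line.strip() and author is not None:
            path = line.strip()
            latest.setdefault(path, (author, date))  # git log lists newer commits first
            authors[path].add(author)
    return latest, dict(authors)


# Sample output in the shape read_git_log produces (hypothetical authors and files).
SAMPLE_LOG = """@@Alice|2024-03-01

src/InvoiceService.java
src/OrderService.java

@@Bob|2024-02-10

src/InvoiceService.java
src/ReportJob.java
"""

latest, authors = file_ownership(SAMPLE_LOG)
print(latest["src/InvoiceService.java"])  # ('Alice', '2024-03-01')
```

Keeping the set of all authors, not only the latest one, also answers who should be informed about a change (question 3).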

CONCLUSIONS

Concept Location is a necessary but expensive task in Software Evolution. It can be done much more efficiently by combining machine learning with knowledge crunching techniques. With the right toolbox, a lot of information can be extracted and learnt from the data: the source code as a natural language, the dependencies between code artefacts, and the commit history of the source code repository. Visualisation techniques such as dependency graphs, domain model clouds, interactive topic models and business capability maps reveal relevant insights and promote rapid understanding.

References

(1) Tom Mens: History and Challenges of Software Evolution. In: Software Evolution, Springer-Verlag Berlin Heidelberg, 2008.

(2) Abram Hindle, Earl Barr, Mark Gabel, Zhendong Su, Prem Devanbu: On the Naturalness of Software. https://people.inf.ethz.ch/suz/publications/natural.pdf

More articles by Dominik Neumann