How can we manage software that is becoming more and more complex?
In previous articles we talked about the design principles that govern the relationships between classes. Today we will shift our attention to larger-scale issues: we will look for principles to govern the relationships between components, understood as large groups of classes. But before defining what a component is, let's look for a good metric for software complexity, because it is from the need to manage growing software complexity that "Component-Based Software Engineering" developed (1).
Once we have adopted a programming paradigm (in our case, the object-oriented paradigm), we can take the number of lines of source code (LOC) as a measure of the complexity of a software system. It is a very simple metric, but since we will only make qualitative considerations about the evolution of software complexity, LOC is fine. If you want more rigorous complexity metrics (cyclomatic complexity, methods per class, cohesion, coupling, function points, etc.), refer to the book in the bibliography (1).
But to understand where the need for component design comes from, a need that has grown into a branch of software engineering called "Component-Based Software Engineering", we need to look at the evolution of software complexity over the years.
And here I am lucky, because I can use my own work experience as a clear example. For the past 25 years I have been working on software development for cardiovascular training equipment. We can therefore analyze how the software complexity of a piece of cardiovascular equipment such as the treadmill (tapis roulant) has evolved over the last 25 years.
The graph in figure 1 shows the evolution of the treadmill's software complexity over the last two decades.
fig 1 : tapis-roulant - sw complexity evolution.
As established, I took the number of lines of source code (LOC) as the indicator of software complexity, including the application code and all the parts of the operating system that we had to modify. The first software version, made in the year 2000, counted 15K LOC and managed all the basic functions of a treadmill: the motor drive, a dot-matrix display for messaging, some 7-segment displays, a membrane keyboard, an emergency device, and a cardio receiver. There were also about twenty training programs and a connection to a portable memory, from which the loads to be set (belt speed and slope) were read and to which the training results were saved. The application was developed with the object-oriented paradigm using C++ as the programming language; a few parts were written in assembly language. The operating system was NUCLEUS and the microprocessor was an ATMEL ARM7. Compared to the software of the team that preceded mine, my team halved the lines of source code: the previous version counted over 30K LOC of C and assembly code. Thanks to the object-oriented paradigm and to a good architecture, we were able to decrease the complexity of the software and also to obtain a software base more open to future evolutions.
After two years, in 2002, following the evolution of the hardware platform (an Intel 386SX microprocessor, a graphic display, and a touch screen), we had to adapt the software to a new operating system ("OnTime") and to the new microprocessor, and we had to develop a new Graphical User Interface module. LOC increased to 25K, of which about 30% was dedicated to the management of the user interface.
Then, in 2005, an analog TV tuner and other training features were added; the hardware and software platform did not change, but LOC increased to 45K.
In 2007 the analog tuner was replaced with a new analog-digital tuner, and about twenty medical and military tests for the evaluation of physical performance were added. LOC increased to 75K.
In 2009, internet connectivity was added, and with it a web browser, YouTube, and IP-TV. We had to change the hardware platform (an EDEN C5 from VIA Technologies) and the operating system (Windows CE 3.0). LOC increased to 150K.
Those were the years of Flash technology for websites. Windows CE did not support Flash, so our software did not display websites developed with Flash well. We made several attempts to develop our own plugin to handle navigation on Flash sites, but unfortunately the results were not exceptional. So we decided to replace Windows CE with a new operating system that was fully Flash compliant. We built our own custom Linux distribution and adapted our software to it. It was 2011 and LOC had increased to 300K. In the LOC calculation I also included all the lines of operating system code that we had to develop or modify.
Later the era of smartphone stores arrived (Apple Store and Play Store), and it became clear that we had to go there if we wanted to be open to the new business models offered by these technologies. So in 2013 we finally had the first version of the treadmill with Android. Obviously Android was the worst of all operating systems for managing equipment with motor drives, because of the need for real-time software that Android does not support. So we had to move the entire real-time part to a new dedicated microcontroller (where we adapted the FreeRTOS operating system). On the main microprocessor (NVIDIA's Tegra 2) we developed our own custom AOSP (Android Open Source Project) distribution of Android 4.0.2. The main microprocessor and the microcontroller exchanged information through a USB channel. The total LOC of Java and C++ code increased to 600K.
Then management asked us to adapt our platform to support customization of Play Store applications such as YouTube or Facebook. Basically we had to add multi-user support to Android, an operating system born strictly single-user (it is no coincidence that Android was developed for personal devices). Unfortunately, the Android developers had used Linux's multi-user functionality to manage the isolation and security of the application sandbox: in Android, the users basically are the applications themselves. So we had to begin the adventure of rewriting a large part of the operating system (the "Zygote", the "ActivityManagerService", and part of the "PackageManagerService"). We also obtained a nice patent on this feature. After a couple of years, in 2015, we were ready with the new software; we had also moved our Android AOSP distribution up to version 5.1. In the meantime, the microprocessor had changed as well (Tegra 3). The new processor allowed us to develop some advanced graphics applications using Unity3D technology, such as "training races" (races against other connected competitors on virtual paths). LOC had increased to 800K.
Over time, other software versions were created, in which new functions and applications for both entertainment and training were added.
In 2020 the hardware platform became NXP's i.MX8, the Android AOSP version became 9, and the treadmill software reached one million LOC.
As we can see in figure 1, in this case study the complexity of the software grows geometrically over time: it increases by a factor of 10 every decade.
In my experience, from 500K LOC upwards it becomes very difficult to manage the complexity of the software with a development team of fewer than ten people; an additional solution beyond object-oriented design becomes necessary.
There is also another aspect to consider: the failure probability of an overly complex software project.
The graph in fig 2 comes from IBM research. It shows the probability of failure of a software project as a function of its complexity. This research uses "Function Points" as the complexity metric instead of LOC. Function points measure the size of an application based on a functional view of the system; they are language independent, so complexity in different programming languages can be compared (2). If you are interested in a detailed quantitative analysis of software complexity and the failure probability of software projects, refer to the article by Fenton and Ohlsson (3).
fig 2 : software project failure probability vs complexity.
As we can see from the graph of figure 2, as the complexity of the software system increases, the probability that the project fails increases, and there is a maximum complexity beyond which 100% of systems fail. There may be a limit to how large our software systems can be, but our civilization needs increasingly complex software systems, so we must find some other mechanism that helps us reduce complexity, so that we can build ever larger software systems.
Where can we find the solution? Which industrial sector has been solving the problem of complexity for many years? Look at the image below:
fig 3 : microprocessor evolution
Microelectronic silicon "chips" have grown in capability from a single transistor in the 1950s to hundreds of millions of transistors per chip in today's microprocessors and memory devices. So the model we should draw inspiration from is that of the hardware industry: computers are built as a modular structure that uses standard components at various levels.
To find a solution to the growing complexity in the software industry, we can start by matching the levels of the modules in the hardware industry with the elements of software development. See the table below:
HARDWARE -> SOFTWARE
1. Gate -> language statements (if, for, while, ...)
2. Block -> Function / Class
3. Chip -> Component
4. Card -> Process / Application
The first two levels are implemented by programming languages. At the third level we find what are commonly referred to as SW components.
We can now give a first definition of a software component, derived from that of a hardware component: a component is a "SW integrated circuit" that communicates with the outside through a series of "pins".
An application capable of incorporating components is called a "container" and is the SW equivalent of an electronic board. There are three types of pins:
Properties: "status pins", variables that allow you to act in a protected manner on the internal state.
Methods: "input pins", commands that cause actions to be executed.
Events: "output pins", which cause the execution of methods in the container when something occurs in the component (callbacks).
So the model is based on these three concepts: Property, Event, Method. We can use an acronym for this model: PEM (Property, Event, Method).
The idea of "software integrated circuits" is justified by the enormous advances in hardware and its relative absence of bugs compared to software. The software IC idea corresponds to a reuse model that has seen several technologies over the years: from standard function libraries, to DLLs, to COM components, to JAR packages, everything revolves around the concept of "black box" reuse. As long as the implementation of the functions respects a specification, we can change and compose the components without problems.
Now that we have understood that it was the hardware industry that inspired "Component-Based Software Engineering", we can give some definitions of a software component:
The first definition is the official one formulated by the OMG (Object Management Group 2000) (https://www.omg.org/) : "A component is a modular, deployable, and replaceable part of a system that encapsulates implementation and exposes a set of interfaces".
Another definition, more oriented to the design of the component, is the one provided by Booch: "A component is a logically cohesive, loosely coupled module" (4).
The last definition we see is that of Martin, it is a definition more linked to the management aspects of the component: "Components are the units of deployment. They are the smallest entities that can be deployed as part of a system. In Java, they are jar files. In .Net, they are DLLs. In compiled languages, they are aggregations of binary files. In interpreted languages, they are aggregations of source files. In all languages, they are the granule of deployment".
Components can be linked together into a single executable. Or they can be independently deployed as separate dynamically loaded plugins, such as .jar or .dll or .exe files. Regardless of how they are eventually deployed, well designed components always retain the ability to be independently deployable and, therefore, independently developable.
Now that we have understood what a software component is, and before we begin to analyze component principles (principles first formalized by Robert Martin (4)), I would like to make a few more considerations about software complexity. In the case study that opened this article, about the software evolution of the treadmill over two decades, I used source code lines as the metric of complexity, that is, the size of the software system.
This was fine because we only did a qualitative analysis, but there is another very important aspect to consider: the degree of coupling between modules, which can dramatically increase the complexity of software systems. Coupling is the term used to describe the dependence of one software module upon another. When coupling is high, there are many dependencies between the modules.
As we can see from figure 4, seven modules can generate up to 7*6 = 42 dependencies between them, and in general a system of N totally connected modules can have N*(N-1) dependencies, i.e. O(N²). So with a bad architecture, or with a total lack of architecture, the coupling can grow quadratically with the number of components (graph in figure 4).
fig 4 : to the left, a system with a fully coupled architecture; to the right, the coupling impact of change.
The impact of making a change to a module is a function of its coupling: every module that depends upon the changed module must be inspected, compiled, tested, and redeployed. This causes a decrease in productivity. As the application grows, more and more effort goes into dealing with coupled modules; at the knee of the curve, more effort is applied to coupling than to adding features. So the effort expended to add new features drives complexity upwards and productivity downwards (fig 5).
fig 5 : decrease of productivity.
Many of us have worked on a project that was fine at the beginning: we all programmed like lightning and delivered in a very short time the functions that management had asked of us. A year later everything had changed: every time we touched the code it broke, and it was like walking in mud with a lead ball chained to one foot. Our productivity curve fell dramatically, and everything took a long time.
And what does management do in this situation? They hire more programmers to make us go faster and raise productivity again. But there is a problem: the new people don't understand the design, so they cause more trouble than we do, and the situation gets worse.
What can we do to reverse this trend and increase productivity again?
We could break the software into individual components that can be independently built, tested, and deployed. However, componentization depends critically on being able to create subsystems that are independent: as long as coupling is high and dependency cycles are many, componentization will not be feasible. Let's look at figures 6 and 7.
fig 6 : maximally coupled components. fig 7 : minimally coupled components.
Figure 7 shows the structure of a seven-module system with six unidirectional couplings, the minimum number a seven-module system can have; the maximum number is instead 42 (7*6), as in the fully coupled system of figure 6. We can give our systems an appearance similar to that of figure 6 or to that of figure 7; obviously we would like our systems to resemble the solution of figure 7.
fig 8 : Faulty Architecture : Dependency Cycles
But we must be very careful: a system that looks like the one in figure 8 can hide a great danger behind the scenes. Let's see what happens if an inexperienced programmer starts working on this system. This programmer needs to call from module C4 a function of module C2, so he calls it, creating a dependency from C4 to C2. He honestly thinks: "OK, I added one coupling, what harm can it do?"
But that's not true: he did not add only one unidirectional dependency (the red arrow between C4 and C2 in figure 8), he added many more.
Now C4 depends on C2, but since C2 depends on C1 and C3, also C4 will depend on C1 and C3, similarly since C1 depends on C7 and C6, transitively also C4 will depend on C7 and C6, moreover since C3 depends on C5, also C4 will depend on C5.
And that's not all: C3 depended on C4, and since C4 now depends on all the others, C3 too transitively depends on all the others.
So adding one innocent-looking unidirectional dependency has created a loop in the dependency graph and 14 transitive dependencies; we can't even see them when we look at the diagram, but they are there.
So we have to be very careful about "Dependency Cycles" and "Faulty Architecture".
A single cyclic dependency in an otherwise acyclic structure can dramatically increase coupling. Such dependencies have a tendency to appear over time as the system is being maintained and enhanced.
The solution is to create a software architecture with well managed interdependencies.
Figure 9 shows the cost of modularity.
fig 9 : modularity & software cost.
The effort (cost) to develop an individual software module decreases as the total number of modules increases: given the same set of requirements, more modules means a smaller individual size. However, as the number of modules grows, the effort (cost) associated with integrating the modules also grows.
If we divide the problem into a large number of components, each providing small functionalities, the cost of integration and the interaction effort increase, along with the coding complexity, the testing effort, and the number of duplicate test cases. If, on the other hand, an application is split into fewer components, each providing many functionalities, it will cost more in terms of testing and maintenance. It is desirable to reach a minimum cost region, where cost and effort are balanced against the number of components (the "Region of minimum cost" in figure 9).
I would like to end these first considerations on the componentization of software systems with a quote from Grady Booch (5):
"The class is a necessary but insufficient vehicle for decomposition. Large systems of objects and classes would be overwhelming; a way must exist to deal with groups of classes. Otherwise, it's almost like building a sand castle from individual grains of sand."
Fortunately, great computer scientists such as Booch, Martin, and Fowler have identified principles and rules that help us keep the dependencies between components under control.
In the next article we will see principles and rules about cohesion and coupling between components.
I also remind you of one of my previous articles, which explains cohesion and coupling between classes; I suggest you read it before the next one.
Thanks for reading my article; I hope you found the topic useful. Feel free to leave feedback, it is appreciated.
Stefano
References:
1. Tiwari, Kumar, "Component-Based Software Engineering", Chapman & Hall (2021).
2. Ian Sommerville, "Software Engineering", Addison-Wesley (8th edition, 2007), pp. 592-594.
3. Fenton, Ohlsson, "Quantitative Analysis of Faults and Failures in a Complex Software System", Centre for Software Reliability, City University, Northampton Square, London (https://www.eecs.qmul.ac.uk/~norman/papers/FentonOhlsson105280_final.pdf).
4. Robert C. Martin, "Clean Architecture: A Craftsman's Guide to Software Structure and Design", Prentice Hall (November 2018), pp. 111-118.
5. Grady Booch, "Object-Oriented Analysis and Design with Applications", Addison-Wesley.