Design, Endurance and Entropy
Life of Software Products
Physics of software
A few years back, I was reading an article by Lex Sisney. He talks about physics and its relationship with any organisation, and calls it Organisational Physics. It starts by observing that a rocket goes very fast, while a parachute drags and slows an object in motion. Why so? Because each is designed for that purpose. There is also a saying in architecture and design: form follows function, i.e. the design of something should serve its purpose! Period.
While software is not physical (tangible), many principles of physics still apply to it remarkably well. This has kept me thinking for a while, and thoughts started emerging on the design of software, its non-functional aspects, and its associated life-cycle. What should software be designed for, especially mass-market product software, with respect to its non-functional aspects?
The design & architecture
Now, when we talk about the design or architecture of software, there are a lot of definitions, and these words are often used interchangeably or mean significantly different things in different contexts. Let me put down a definition before I move forward using these terms.
Design
Design defines the purpose and objectives of something being built: what is being built, why it is being built (the need for it), and which non-functional aspects (apart from the obvious functional ones) it should serve.
Architecture
An architecture defines the various structural elements of something being built, and ensures that the structural blocks work with each other very well (conceptual integrity) to serve the purpose and objectives laid down by the design. It also ensures that the various non-functional aspects are served well. While these non-functional aspects are intangible, they are the most important aspects an architecture should focus on. Architecture is about making fundamental structural choices which are costly to change once implemented, and hence have durability. It is about deciding on one approach among the many possible, with the right trade-offs.
My definition above definitely places Design above Architecture. While the terms are used interchangeably, as in "code design" or "system architecture", I do not want to introduce any new controversy here. In my terms, Design focuses on the whole system, primarily on the What and the Why (and an abstract part of the How, to arrive at key concepts), while Architecture focuses on detailing the How, keeping in mind the What and Why defined by the Design.
A rule of thumb: the higher the level of abstraction of the software (e.g. application software), the more its design focuses on functional objectives; the lower the level of abstraction (e.g. technology stack, operating systems), the more it should focus on non-functional aspects. While both aspects are important, the ratio is not the same for software at different layers of abstraction. Our Tally ERP application is an example of application software; the Technology Platform is an example of the technology stack, or system software. Hence, their design focus must be in the appropriate ratio.
The capabilities provided by the platform or technology can still be treated as 'functional' from a technology-delivery perspective, while the non-functional characteristics (scale, volume, performance and so on) are intangible and difficult to perceive, even though the end-user still benefits from them indirectly.
Another important aspect to understand: the lower the layer of abstraction, the more fundamental the problem the software solves, and the longer it will sustain (Endurance). As an example, the x86 instruction set architecture for microprocessors has been around for 30+ years and is still going strong; the Windows and Linux OS kernels have been there for decades. Software at a higher level of abstraction, solving a specific problem, typically has a shorter life-cycle (e.g. mobile apps that come and go). This is again a rule of thumb, and the ratio will vary with the complexity, nature and domain for which the software is built. Hence, at lower levels of abstraction, Endurance becomes an important non-functional aspect to consider.
Conceptual integrity (coherency)
Wiki definition: It is the principle that anywhere you look in your system, you can tell that the design is part of the same overall design. This includes low-level issues such as formatting and identifier naming, but also issues such as how modules and classes are designed, elegance, and how well the various structural blocks fit with each other.
And my favourite example: when you pour a cup of water into a jug of water, it is impossible later to distinguish which part of the water was already in the jug and which came from the cup. It just becomes water with higher mass! That is what I mean when I say changes should gel with the existing system from all perspectives, i.e. be coherent.
Endurance
Belady and Lehman, writing on software evolution dynamics, say:
Software keeps on functionally evolving but structurally deteriorating, which eventually terminates the life of the software.
Hence, preserving the design structure and reducing complexity should be given high priority in the maintenance phase. The causes of a short software life-cycle are not restricted to structural deterioration. Changes in hardware, operating systems and other surrounding environments can be fatal if the software or the vendor cannot adapt quickly, or did not envision them during the initial architectural phase.
Also note that longevity is not necessarily an essential property of all software. It depends on multiple aspects, such as the specificity or generality of the solution, and the diversity and size of the customer base it serves.
Longevity typically becomes a more and more essential property of a software product:
- The lower the abstraction of the software
- The more fundamental the problem it solves, or the more general the solution
- The higher the diversity or size of the customer base
Belady and Lehman also set out their eight laws of software evolution. I will highlight a few of them with respect to the endurance aspect we are talking about:
Continuing Change - A system must be continually adapted, or it becomes progressively less satisfactory
Increasing Complexity - As a system evolves, its complexity increases unless work is done to maintain or reduce it
Conservation of Familiarity - As the system evolves, all those associated with it (developers, sales personnel and users, for example) must maintain mastery of its content and behaviour to achieve satisfactory evolution. Excessive growth diminishes that mastery. Hence the average incremental growth remains invariant as the system evolves
Declining Quality - The quality of a system will appear to be declining unless it is rigorously maintained and adapted to operational environment changes
While all software is subject to mortality, the question is how long it lives. Software is born, lives, dies, and reincarnates. Surprised? A survey paper, Software Lifetime and its Evolution Process over Generations, talks about this. It says the software evolution process does not end at the death of an individual software system, but usually continues over generations, the old system being replaced by newly built software. Now it is the software vendor's choice to ensure this newly built software is not built by their competitors!
This aspect of software quality, Endurance, is what I call the non-functional of the non-functionals. It drives the internal quality and sustainability of the software, and adds long-term value to all its stakeholders: the customers, the organisation that built it, and its ecosystem.
Designed for Endurance – Built to last
In my village Narasimharajapura, where I was born and brought up, there is the Tadsa bridge, built in Sir M Vishveshwaraiah's time. The bridge, which carried a railway track and a road, was submerged 60 years ago in the backwater of the Bhadra Dam, which was constructed later. About three years back, due to drought, the submerged bridge became visible and accessible, and it was found to be perfectly strong: it could take the load of new-generation vehicles even after being submerged for 60 years! This brings me to the point of endurance. Was that bridge designed for endurance? Yes, no, maybe so? I say a strong YES. It cannot be an accidental outcome; it must have been designed for that objective. And so it served its design purpose for 60 years!
Coming back to software products, consider the life-cycle: building the software, getting its first few customers, growing the customer base to a large scale, and then sustaining those customers for long years, loyal to the software and still using it. Every stage becomes more challenging than the previous one. The design and architecture of the software must be thought through very well and made future-proof. They must cater to an ever-changing world and still make the software sustain!
One of the important things to understand from Belady and Lehman's theory is that, while a system must continuously evolve and adapt to change, its complexity increases as it evolves. Systems functionally evolve, but structurally deteriorate! This brings us to a classical deadlock. If we want to maintain the structure of the software nonetheless, we need to understand the science behind it, which brings us to an important concept: Entropy!
Entropy (Disorder)
Getting to thermodynamics: the second law demands that the entropy of an isolated system cannot decrease. The analogous notion for software, called software entropy, states that a closed system's disorder cannot be reduced; it can only increase or remain the same. As a system is modified, its disorder will increase by default unless work is done to control it.
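As a loose illustration (not a formal model; every name and constant below is made up for the sketch), the idea that disorder accumulates with each modification unless work is done against it can be written as a toy simulation:

```python
import random

def simulate_entropy(changes: int, refactor_every: int, seed: int = 42) -> list:
    """Toy model of software entropy: each change adds some disorder,
    and only periodic refactoring work removes part of it.
    The constants are arbitrary and purely illustrative."""
    random.seed(seed)
    entropy = 0.0
    history = []
    for i in range(1, changes + 1):
        entropy += random.uniform(0.5, 1.5)   # every modification adds disorder
        if refactor_every and i % refactor_every == 0:
            entropy *= 0.7                    # refactoring pays down part of it
        history.append(entropy)
    return history

# With no refactoring, disorder grows monotonically;
# with periodic refactoring, it is repeatedly pulled back down.
unmaintained = simulate_entropy(50, refactor_every=0)
maintained = simulate_entropy(50, refactor_every=5)
print(unmaintained[-1] > maintained[-1])  # True
```

The point of the sketch is only the shape of the two curves: left alone, disorder only accumulates, while deliberate clean-up work keeps it bounded.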
The entropy of a software system typically increases due to multiple factors, as below:
Designing for endurance vs. keeping the software alive on a life-support system
E.g. in the initial software design phases, due to an early go-to-market strategy or a push for first-mover advantage, enough time and effort may not have been put into the design to consider various aspects. As the number of customers increases, these neglected aspects start blocking you from getting more customers!
Accumulation of technical debt (a slow poison that accelerates entropy)
Certain refactors are postponed as post-release activities due to release pressure, but are never proactively taken up later. Any such debt typically carries compound interest of entropy, which grows the longer it is delayed. Debts need not be only 'technical' in nature: if customers continuously raise defects on your product beyond your ability to solve them, your customers become your liabilities, and this starts to have a customer-demonetisation impact on your organisation.
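The compound-interest metaphor can be made concrete with a back-of-the-envelope calculation. The 10% "interest rate" per release below is an assumed, purely illustrative figure:

```python
def debt_cost(initial_effort_days: float, rate_per_release: float,
              releases_deferred: int) -> float:
    """Effort needed to pay off a deferred refactor after N releases,
    assuming the cost compounds like interest (the rate is hypothetical)."""
    return initial_effort_days * (1 + rate_per_release) ** releases_deferred

# A 10-day refactor deferred for 10 releases at an assumed 10% per release:
cost = debt_cost(10, 0.10, 10)
print(round(cost, 1))  # 25.9 -- more than double the original effort
```

However arbitrary the rate, the shape of the curve is the point: the cost of a postponed refactor grows multiplicatively, not linearly, with every release it is deferred.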
Changing external factors to which the software no longer adapts
Changes happen across the landscape: the market, customer behaviours, technologies, deployment environments, the people who use, sell, service, maintain and enhance the software, generational changes, and so on. If the software no longer adapts to this changing environment, entropy sets in and slowly degrades the software's fit.
Bug-fixes start hiding structural (architectural) disorders or flaws
Nothing can be more disastrous than this! When bug fixing is done, it is said that a fix can lift the system up dramatically or bring it down dramatically; it all depends on how the fix was done. Typically it requires a lot of analysis to understand the structure and the defect, debugging and analysing until a root-cause analysis is complete. The root cause must then be fixed in alignment with the architectural, or conceptual, integrity of the system. If a fixed defect comes back again, opens up more defects, or re-opens earlier defects, that is an indication of a serious increase in the entropy of the system.
Many defects also indicate flaws in the fundamental design or architecture. Such flaws cannot simply be patched away: the software must be re-designed, or the design modified and the system re-engineered accordingly, to achieve a significant reduction in entropy.
An unhealthy ratio of feature addition to restructuring or refactoring
When certain modules or code look structurally deteriorated, they are quick-fixed instead of being replaced or refactored, due to release pressures. Typical examples include:
- Refactoring is always postponed as an after-release exercise, but never actually happens.
- Programs are written in a proof-of-concept / "just for demo" mindset, and production work is then mounted on top of them.
- The organisation culture formally defines the term "known issues".
- A brand-new feature is prioritised over an issue that invalidates a feature, or over incomplete features released earlier.
- A fresher's first department is issue-fixing, or you have a dedicated team for issue fixing! (I will talk a little more about this later.)
Ever-increasing addition of features
- The 80/20 principle says 80% of your customers use only 20% of your product's features, yet we still build feature after feature under continuous business and market pressure, leaving little or no bandwidth to maintain the features already developed. With each added feature, the complexity of the software's structure increases, which accelerates its deterioration.
- While a system is designed for a purpose, no system can sustain infinite requirements. If a feature that is architecturally infeasible is implemented by forcibly fitting it onto the existing structure, the structure will start deteriorating. It is important to add new features to the product, since they open up new market segments for the organisation to grow, but this must be done with the utmost diligence to keep the conceptual integrity of the architecture.
Symptoms of entropy
How do you know if a software system is hit by entropy, or deteriorating? Ask yourself the following questions. If the answers are mostly yes, it is a clear indication of entropy!
- Are too many features being added to the software, and is the organisation feature-focused?
- Is it becoming more and more difficult for people to understand the structure of the system?
- Does the software lack the telemetry to understand how released features are being used and what their usage trend in the market is? Are released features forgotten by all stakeholders in the continuous feature marathon?
- Is in-depth knowledge of the product confined to a few people, rather than encouraged across the organisation? Most importantly, does the collective knowledge of the organisation about the product, its design and its architecture total less than a hundred percent?
- When you read the code, do you feel there are unspoken words you cannot make out? That is, does the code fail to reflect its form, and the form fail to reflect its function?
- Is the number of defects in the software on an increasing trend?
- Does fixing a defect tend to introduce more defects, especially when defects are reported a significant time after release?
- When a software document or the code is read, does it look like an assembly of multiple discrete thoughts, with little or no coherence, from which it is difficult to derive a concept?
The solution
Whether it is the structure of the software, people's mindsets, skills and attitudes, or the rigour of quality processes, all of them need to adapt. Below I talk about these aspects and what changes can help:
Understand the purpose, properties, structure and flow of the existing system before making changes
A few approaches that never work:
- Luck will typically not be in your favour
- Trial and error is a sure disaster
Last time, my Vento refused to start in my office's basement parking. A few drivers came to help. There are only two solutions we typically use: if it is a scooter, tilt it 45 degrees and try starting again; if it is a car, push/jerk-start it :). So a push start was attempted, and after a few iterations, when the car did not respond, they gave up.
The next day I called the service engineer. When he checked, the problem was a drained battery, but even a jump-start with another battery did not work! After the vehicle was towed to the service centre and checked, I was told that the Vento is not designed to be push/jerk-started. Because it was force push-started, too much fuel had flowed into the engine and flooded it, which did not allow even a jump start!
Basically: either understand the engine you are dealing with, or approach an engineer who understands it.
Here is another example. You buy a new car; you have been driving it, and it is all perfect German engineering. Now you want to add a music system or a camera. Say a semi-skilled or self-taught mechanic comes to fit those accessories. He enters your car with a screwdriver and cutter in hand, opens screws, and cuts and skins multiple wires to find out what works (a classic trial-and-error method), tracing the flow of current with a light bulb connected to the ground on one side and a skinned wire on the other. Finally, he joins some wires and wraps them in cellophane tape (bandaged!).
You will easily understand that the entropy of your car's electrical circuits has now increased, and so has your heart's :)
It takes a knowledgeable, skilled technician, one who uses connectors to join wires (it's called dressing, by the way :)) and helps you select a music system in alignment with the technical specification of your car (battery capacity, amps, etc.). This allows the entropy to be retained, if not decreased.
Follow the 50:30:20 rule – Enhance, Maintain & Refactor, Document & Knowledge-Share
A software team can allocate roughly 50% of its time to new enhancements, 30% to clearing technical debts (like defects) and continuous refactoring, and 20% to documentation and continuous knowledge sharing. This ratio may vary with the size and complexity of the software, the team size, and so on. Remember, the only way to keep your house hygienic is to keep cleaning, and the only way to keep your body's entropy in check is to keep exercising. For software, it is refactor, refactor, refactor.
The last aspect, documentation and continuous knowledge sharing, is also very important. Otherwise, over time your software will develop blind-spot areas that no one knows about. Nothing can be more disastrous than that.
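The split itself is simple arithmetic. Here is a small sketch of how a team's capacity per iteration divides under the suggested ratio (the 200 person-days figure and the bucket names are hypothetical, and the ratios are meant to be tuned per team):

```python
def split_capacity(total_days: float, ratios=(0.5, 0.3, 0.2)) -> dict:
    """Split available effort into enhance / maintain-refactor / document-share
    buckets per the suggested 50:30:20 rule. Ratios are tunable per team."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    labels = ("enhance", "maintain_refactor", "document_share")
    return dict(zip(labels, (total_days * r for r in ratios)))

print(split_capacity(200))
# {'enhance': 100.0, 'maintain_refactor': 60.0, 'document_share': 40.0}
```

The useful part of writing it down is the constraint: the buckets must sum to the whole. Time given to the feature marathon is taken, explicitly, from refactoring and knowledge sharing.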
Design the change (design, structure, properties, flow etc.) so that it gels with the existing system in all possible ways
Any change starts with a 'technical feasibility' check: can this be done within the current structure, or which part of the current structure needs to evolve for the change to keep conceptual integrity?
Redesign or refactor the existing structure, and plan the required testing to ensure the sanity of the system. Note that when refactoring, always work in small iterations, unless you know the system well, are quite experienced, and have seen success in mega-refactors before. Keep asking why something is done in a particular way. It is unlikely to have been a random decision; it would be a choice someone made at some point in time.
I have talked about this more in my next article “Software Entropy - Part II: Software Reverse Engineering and Re-Engineering”
The multiplicity of software: for anything, there will be 'N' approaches. Each approach will again have 'N' sub-approaches, and so on; the road keeps opening into multiplicity. Look for the best choice: write down and evaluate the pros and cons of each approach, and take a call based on the key architectural principles, assumptions and constraints laid out.
Do not overfit features or capabilities the system is not designed for – change the form to follow the new function
This is another important aspect. With ever-increasing business pressure, various features are demanded by the market, and they carry high-to-critical development priority. Time will never be sufficient.
It is very tempting at such times to mount things on top of the existing architecture without studying their fitment enough. This dramatically increases entropy, can lead to the failure of the feature as well as of the software architecture, and costs significant time and effort later in stabilising the system and regaining market confidence.
Modular systems vs. monolithic systems
Generalised code, less branching, and a higher probability of each path being executed lead to lower complexity. Complexity increases if:
- The software has a diverse set of features that rarely get used
- Highly specialised code is written for each of the features
Software has the property that the more it gets used, the more robust it becomes; the more it is exercised, the more it will endure.
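The contrast between specialised, branch-heavy code and a generalised equivalent is easy to see in miniature. In the contrived sketch below, every new output format forces another rarely-exercised branch into the first function, while the second has a single code path that every caller exercises:

```python
# Branch-heavy: every new format means another branch, each rarely exercised.
def render_specialised(kind: str, data: list) -> str:
    if kind == "csv":
        return ",".join(map(str, data))
    elif kind == "tsv":
        return "\t".join(map(str, data))
    elif kind == "pipe":
        return "|".join(map(str, data))
    else:
        raise ValueError(kind)

# Generalised: one code path, hardened by every caller; new formats are data.
SEPARATORS = {"csv": ",", "tsv": "\t", "pipe": "|"}

def render_general(kind: str, data: list) -> str:
    return SEPARATORS[kind].join(map(str, data))

print(render_general("csv", [1, 2, 3]))  # 1,2,3
```

Both behave identically today, but adding a tenth format leaves the generalised version's logic untouched, while the specialised version grows another branch that will sit mostly untested.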
Monolithic systems offer many benefits on various non-functional characteristics, but they are highly resistant to change. Note also that such systems are far more complex and time-consuming to design and maintain.
In modular systems, the individual pieces can be designed for high endurance and other non-functional aspects, but when they are assembled together, the endurance decreases, primarily because the individual pieces can change or improve independently, which leads to a decline in conceptual integrity unless strict interfacing is maintained and constraints are defined.
It is not that one is better than the other; the right balance of both, applied in the design and architecture, helps greatly.
A rule of thumb:
- At lower layers of abstraction, monolithic design helps; at higher levels of abstraction, modularisation helps
- Rarely changed modules like core frameworks (e.g. the kernel of an OS) can be monolithic, with modular design reserved for ever-changing feature requirements
People – Skillset and Mindset (Attitude)
It is said that product industries are driven by great people with adequate processes, while service industries are driven by great processes with adequate people and skillsets. While knowledge is easy to acquire in today's internet era, skillsets come with experience. The most challenging thing to change is the mindset, or attitude, that people exhibit. When it comes to non-functional aspects, people's mindset becomes paramount. Knowledge and skillset are equally important, but the right attitude allows one to acquire them easily.
The culture of the industry and of the organisation also plays a major role. As they say, people's behaviour and attitude make culture, and culture drives people's behaviour. So it is important for leaders to model this behaviour and reinforce it in every possible conversation, which builds a culture of quality. In service-heavy geographies like India, people with a 'product mindset' or 'platform mindset' are scarce, so it becomes even more important for technology leaders to participate in the cultural shift required.
The current industry trend is that experienced, senior people work on the new architecture of software systems, while less-experienced, junior engineers are put onto coding, maintenance or bug-fixes. Engineering an existing system requires equal, if not greater, experience to keep it continuously at quality and sustainable. It takes a great deal of experience, effort, patience and decision-making skill to reverse-engineer (in the case of undocumented legacy software), design and engineer new capabilities, re-design and re-engineer parts of the existing system, refactor the code, and build changes such that technical debts do not creep in along the way. In fact, each such rigour applied has the potential to decrease entropy.
It is a hard truth that the architecture and the code of a software system never quite match up; the code never clearly reflects the concepts, abstractions, components, connectors and so on. An architecture does not end with documents and views; it ends with the delivery of the software. Hence it is important for the architect to accept the code against the architecture via code reviews, active participation in code design and test design, and by ensuring that non-functional tests are done and accepted.
Process, Methodologies & Tools
It is equally important to have the right processes, methodologies and tools to simplify and ensure quality delivery of software maintenance. Rigorous automated testing with high coverage can help improve quality, and checks and balances placed in the core structural design can flag additions that are not coherent.
Here is an example of a check and balance: a lift is designed for 10 people. If overloaded, it can break down, shortening its life and endangering the people in it. Newer lifts check the total weight of the people inside, and warn, or refuse to operate, until the load is brought within limits. It is a simple piece of automation that improves the life and quality of the lift and the safety of its passengers.
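The same kind of check and balance can be designed into software: a guard that refuses to operate outside the envelope the system was designed for, rather than silently degrading. A minimal sketch (the class and the limit are illustrative, not from any real system):

```python
class Lift:
    """Toy analogue of the lift's overload check: refuse to operate
    outside the designed envelope instead of silently degrading."""

    def __init__(self, max_load_kg: float = 800.0):
        self.max_load_kg = max_load_kg

    def start(self, load_kg: float) -> str:
        if load_kg > self.max_load_kg:
            # Fail loudly at the boundary the design guarantees.
            raise ValueError(
                f"overloaded: {load_kg} kg exceeds the designed "
                f"limit of {self.max_load_kg} kg")
        return "moving"

lift = Lift()
print(lift.start(600))      # moving
try:
    lift.start(950)
except ValueError as e:
    print("refused:", e)
```

The design choice is that the limit lives in the core structure, not in each caller: any addition that pushes the system beyond its designed envelope is rejected at the boundary, which is exactly the incoherent-addition check described above.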
Will agile come to rescue?
Project management defines three axes for any project: cost, quality and schedule (or sometimes scope, quality and schedule). You can choose any two; all three are not always possible. Agile's 4 values and 12 principles drive towards quality, excellence, self-organised and highly motivated teams, welcoming change, and so on. These are significant in building quality, sustainable software.
In traditional methods, where scope (functional and non-functional requirements) is paramount, delivery often depends on single-man heroic efforts and overruns on schedule. While such heroic efforts are appreciated, by Lehman's law of Conservation of Familiarity it becomes non-trivial to keep all stakeholders familiar with what was built, which in fact adds to entropy. The best way to build familiarity is to participate in the making!
Agile's paranoia about the schedule and about working software can lead to the typical 'just for demo' outcome, which cannot actually be delivered. The same paranoia forces individual disjoint pieces to work, which can lead to an architecturally inconsistent outcome and a serious hit to endurance.
So it is not about any particular methodology; it is about the attitude of the people who adopt these methodologies, basically the practices. Organisations must ensure that the values and principles are ingrained and become part of the team and organisation culture.
Conclusion
Entropy hits everyone, including Rudolf Clausius: he died at the age of 66. While entropy will hit software too, one should appreciate the need for software longevity, design for it from version 0, and take adequate measures during maintenance. A great example is Tally's product architecture itself! Originally designed and architected in 1995, it has sustained for over 20 years and is still going strong!
A (quite old) survey paper, Software Lifetime and its Evolution Process over Generations, showed that the average life-cycle of a software system was 10 years, with a minimum of 2 years and a maximum of 30!
I want to leave you with a provoking thought: are you an Engineer or a Re-engineer, an Architect or a Re-architect? How about new ladders and titles in the organisation, like Software Re-engineer, Senior Software Re-engineer, Principal Re-engineer, Re-architect?
Wish you a Merry Christmas and a wonderful New Year 2017! May God grant your life high endurance, and the energy to fight entropy.
Don't forget to read: “Software Entropy - Part II: Software Reverse Engineering and Re-Engineering”