Modernization Project Planning
If you are planning a modernization project, there are several concepts that are not commonly discussed but are nevertheless critical to an optimal result:
- Code – writing code is the ultimate source of code-level technical debt; if we can eliminate or sharply minimize the amount of code we write, technical debt will recur very slowly if at all; perhaps more importantly, the agility and flexibility of the application can be dramatically improved, which provides a substantial competitive advantage in a digital marketplace
- Technical logic versus business logic - by definition, business logic controls a change of state in persistent data, either by validation failure which prevents a change or by a decision to change one or more data elements which are subsequently persisted; everything else is technical logic
- Complexity – there is an almost universal failure to grasp the mathematical complexity of some legacy applications, a failure which results in cost overruns, functionality shortfalls and sometimes even project failure, but there is a technical solution to this problem
- Complexity is distributed differently in technical logic versus business logic; it is business logic that drives the exponentially shaped curve shown in the Complexity section below, whereas technical logic, while complicated at times, never reaches in legacy code the astronomical realms that business logic can
- Risk – we have technical risk which translates into business risk; this is related to complexity but focuses on testing which is our primary risk mitigation strategy; risk is also distributed differently between technical and business logic - we will in most cases find errors in technical logic because it is well specified, but the biggest risk in business logic is omissions - conventionally, we can't test for what we don't know (though dynamic business rule extraction provides a way to do so, discussed below)
- Agile – emergent project designs do have significant advantages over waterfall in most cases, but share one fatal flaw with waterfall: people can’t tell you what they don’t know when defining user stories, but they will try anyway; agile will generally work very well in defining technical logic, but attempting to extract business logic from subject matter experts as part of storyboarding will exhibit both errors and omissions when applied to re-developing a legacy system
- Logistics – the logistical problem of managing a large, complex modernization can be mind-boggling, with the sheer number of artifacts to be managed running to 6 or even 7 figures; this applies very specifically to business rules - the vast majority of business rules are simple or even trivial, creating a management problem through sheer numbers; because most analysis time will be spent on the few most complex rules, it is easy to overlook this logistical challenge in project planning
Code
Since the dawn of the computer age in the 1950s, all programming outside of academic computer science departments has been with procedural languages[1], i.e., languages such as COBOL, C and Java in which the order of operations matters. But it is the intertwining of business logic with technical logic in procedural language programs that leads to the accretion of technical debt over time. We’ve seen 5-year-old Java applications written with the latest agile techniques that were groaning with technical debt as bad as many 30-year-old COBOL applications.
By contrast, non-procedural languages bypass this problem completely, by specifying the results desired rather than how those results are to be obtained. For example, SQL, HTML, XML and Microsoft Excel are non-procedural. You don’t tell a relational database how to retrieve a row of data, but only define the row of data to be retrieved. Other examples are LISP, Prolog, Haskell, and SWRL (the Semantic Web Rules Language), plus some of the low code/no code platforms which are gaining commercial acceptance across the IT industry.
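To make the distinction concrete, here is a minimal sketch in Python using the standard library’s sqlite3 module; the table, field names and values are invented for illustration. The first function spells out how to find a row; the second only declares what row is wanted and leaves the how to the database engine.

```python
import sqlite3

# Illustrative in-memory table; the schema and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, status TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "OPEN", 100.0), (2, "CLOSED", 0.0), (3, "OPEN", 250.0)])

# Procedural style: spell out HOW to find the row, step by step.
def find_account_procedurally(target_id):
    for row in conn.execute("SELECT id, status, balance FROM account"):
        if row[0] == target_id:          # walk every row and test each one
            return row
    return None

# Non-procedural style: state only WHAT is wanted; the engine decides how.
def find_account_declaratively(target_id):
    return conn.execute(
        "SELECT id, status, balance FROM account WHERE id = ?",
        (target_id,)).fetchone()

print(find_account_procedurally(3))    # (3, 'OPEN', 250.0)
print(find_account_declaratively(3))   # (3, 'OPEN', 250.0)
```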
A closely related concept is the distinction of business logic from technical logic. Business logic consists primarily of decisions and their resulting modifications to persistent data and of validations that prevent those modifications from going forward. Technical logic is everything else: message handling, data retrieval/update, control flow, orchestration of business and technical logic, etc.
The definition of business logic is derived from the formal definition for business rules: the logic that controls a change of state in persistent data. Pure decisions and the resulting calculations are inherently non-procedural – the order in which a properly defined decision table is executed does not matter since there will be one and only one result for each set of input conditions.
Technical logic is inherently procedural – you must receive a message before you can process it, and you must retrieve data before you can update it. Nevertheless, technical logic tends to fall into well understood patterns. The low code/no code platforms, like fourth generation languages (4GLs) before them, take advantage of these patterns to minimize the effort of defining a given instance of a familiar pattern – including the business rules to be executed.
As a practical matter, if we can separate the business logic from the technical logic and keep it separated, we can slow or even eliminate the recurrence of technical debt. The result is an application that enjoys both the lowest cost and time for the initial build and the least effort and time for subsequent maintenance.
Separation is most easily achieved if all business decisions utilize a business rule management system, commonly called a “rules engine,” or some other form of decision management. The best systems will include a mathematical analysis of decision tables to identify inconsistencies, redundancies, and logical “holes” which allow a null result for certain sets of input conditions.
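As an illustration of the kind of logical analysis such systems automate, here is a minimal sketch in Python; the field names, values and rules are invented. It enumerates every combination of input values and flags combinations that no rule covers (logical “holes”) as well as combinations where rules disagree (inconsistencies).

```python
from itertools import product

# A hypothetical decision table: each rule maps a condition on two inputs
# to an outcome.  Field names and values are invented.
RULES = [
    # (payor_type, claim_status) -> outcome
    (("GOV",  "OPEN"),   "ROUTE_A"),
    (("GOV",  "CLOSED"), "ROUTE_B"),
    (("PRIV", "OPEN"),   "ROUTE_A"),
    # note: ("PRIV", "CLOSED") is deliberately missing -> a logical "hole"
]

DOMAINS = {"payor_type": ["GOV", "PRIV"], "claim_status": ["OPEN", "CLOSED"]}

def analyze(rules, domains):
    holes, conflicts = [], []
    for combo in product(*domains.values()):
        outcomes = {out for cond, out in rules if cond == combo}
        if not outcomes:
            holes.append(combo)            # no rule fires: null result
        elif len(outcomes) > 1:
            conflicts.append(combo)        # rules disagree: inconsistency
    return holes, conflicts

holes, conflicts = analyze(RULES, DOMAINS)
print("holes:", holes)          # [('PRIV', 'CLOSED')]
print("conflicts:", conflicts)  # []
```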
Coupled with a low code/no code platform, programming of core transactional logic – the most critical and most difficult of programming tasks in applications with complex business logic – can be reduced to the theoretical minimum. It is possible to achieve the same results in code given a highly disciplined programming environment, but any attempt to do so is likely to degrade over time.
Nevertheless, it is possible to construct one’s own application management system without adopting some language that no one has ever heard of, much less is experienced in using, or adopting a low code/no code platform. This calls for a strategy of defining only atomic data that map to atomic decisions, but a discussion of this strategy is outside the scope of this essay and will be examined in a planned future essay.
Complexity
Complexity means more than simply complicated – it is a quantitative measurement rather than a description. Furthermore, the mathematics behind metrics such as cyclomatic complexity will defeat most people who are not professional mathematicians.
This must be overcome to a degree because a basic understanding of mathematical complexity is necessary for risk management in a legacy modernization context. A failure in this understanding is by far the most common underlying reason for the spectacular software project failures that make it into published articles. But outright failures are actually rare – what is common is cost/delivery overruns and functionality shortfalls: the "death by a thousand cuts" rather than one massive explosion.
The concept of complexity includes “essential complexity” – the complexity of the problem we are trying to solve – and “accidental complexity” – the complexity of the software that we write to solve the stated problem. An axiom of complexity theory is that the accidental complexity of software can never be less than the essential complexity of the problem. That said, clearly we should want the accidental complexity to be as close to the essential complexity as possible.
However, cyclomatic complexity applies only to procedural programs, since it is a sophisticated way to count branch paths, which only occur in procedural languages. If we have a non-procedural decision table, there is no commonly understood way to calculate formal, mathematical complexity. What we want to capture in a complexity metric for a decision table is the permutation among the values of the input variables.
Fortunately, we can achieve a satisfactory practical metric without employing combinatorial mathematics, since it is the rough order of magnitude of the complexity that is needed for risk mitigation in project management, not rigorous precision. When we derive a decision table from legacy program code, we multiply the number of columns (i.e., input variables) by the number of rows (i.e., outcomes). A simple decision table might look like this, where the second column is the outcome and the third and fourth columns are the input values:
(Field names are redacted because of non-disclosure agreements.) The duplicated fifth column allows for logical double negation of the two Payor Type values. In this case 3 input values x 4 outcomes = a complexity index of 12.
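For project planning, the metric can be computed and used to rank extracted rules with nothing more than arithmetic. The sketch below is in Python; the rule names and most of the counts are invented for illustration, with the 3 × 4 example above included for reference.

```python
# A minimal sketch of the complexity index described above: the number of
# input variables (columns) multiplied by the number of outcomes (rows).
# Rule names and the larger counts are invented.
def complexity_index(num_inputs: int, num_outcomes: int) -> int:
    return num_inputs * num_outcomes

extracted_rules = [
    ("payor-routing rule", 3, 4),      # 3 inputs x 4 outcomes = 12 (as above)
    ("simple validation",  1, 2),      # trivial
    ("pricing monster",   40, 250),    # the rare, absurdly complex rule
]

for name, inputs, outcomes in sorted(
        extracted_rules, key=lambda r: complexity_index(r[1], r[2])):
    print(f"{name}: {complexity_index(inputs, outcomes)}")
```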
The project problem becomes more obvious when we look at a summary from two modules in an actual project:
The vast majority of business rules are as simple as the decision table above, but a few are moderately complex and a very few are absurdly complex. The 3 business rules flagged in this summary took significantly more time than all the rest of the extracted rules put together. If your project scope includes massively complex rules like these and you don’t plan for them, you can be in big trouble, especially when the business risk of incorrect decisions demands absolute accuracy.
To get more of a project perspective, we took all the extracted rules from the project and ordered them by increasing complexity. The result was summarized in a graph:
The shape of the curve conveys the most important information: it is a rapidly rising exponential following a long stretch where the curve is essentially linear. It’s not even a case of an 80-20 rule – it’s more like a 90-10 rule, or a 95-5 rule, or worse. Conversely, the great majority of your rules will be simple or even trivial, leading to the management problem mentioned above.
Complexity is rarely even considered when designing testing, even in test-driven development. Without thinking it through, people design tests as if all the complexity fell into that near-linear region at the bottom of the curve. With requirements-based testing, the tests will be designed solely on what we do know, not what we don’t know.
You can safely predict that these projects will face difficult challenges in production parallel testing, because intermittent discrepancies between the new business logic and the legacy system will become apparent for the first time. Without production parallel testing, you will face those same challenges in production, and indeed modernization projects are famous for getting to this point but never qualifying to go into production.
Risk
Standard approaches to risk management say to start with a risk assessment in which each identified risk is assessed against three estimated quantities:
1) Probability of Occurrence: The risks can be categorized as likely, probable and unlikely based on the chance of each risk manifesting itself in production, where
- Likely means a 70-100% chance of its occurrence
- Probable means a 30-70% chance, and
- Unlikely is used for a risk which has a less than 30% chance of occurrence.
2) Impact Intensity: The impact intensity of the risk can be categorized as High, Medium and Low depending on how critical the risk and its effects can be.
3) Mitigation Strategy: After analyzing all the aspects of the risks and the existing preventive measures that can be used, the project team needs to decide on the mitigation strategy to deal with the risk. There can be three different categories of mitigation strategies:
- Deflection: the risk is managed by transferring its handling to a third party or agency, such as an insurance company
- Control: devise a technical plan to prevent, minimize or bypass the risk.
- Avoidance: This strategy is used when the risk factor does not pose any considerable threat. The basic idea here is to ignore the risk, do nothing and accept the consequences.
When it comes to software risk assessment, the probability in question is that of encountering the specific set or sets of data conditions that will cause a specific erroneous result. This is problematic because, if we could make that determination, we could test the code and fix any problems. So, in fact, the risk probability is a largely meaningless quantity when it comes to software. It could even be misleading, because any assessment might be based on estimates of whether it has ever occurred in the past, which is almost impossible to do accurately. If it can happen, sooner or later it will happen.
Impact is more directly measurable. In one case, an erroneous electronic funds transfer could result in multiple billions of dollars going awry. In another case, an error might require writing an apologetic letter, but no actual cash damages were conceivable. Because of this reality, we recommend that risk impact be analyzed financially, not technically. We ask, for each identified risk, what are you willing to spend in hard dollars today for an insurance policy that pays off if the risk manifests itself in the future? In the first case, millions of dollars were no problem, but that notional premium becomes the budget for technical risk mitigation, since you are unlikely to find an insurance company willing to take that bet. In the second case, there was no willingness to spend any money at all, and arguably rightly so.
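One way to keep that discipline is to record, for each identified risk, the notional premium the business would pay today, and to treat the total as the ceiling for technical risk mitigation spending. A minimal sketch in Python, with figures invented for illustration:

```python
# For each identified risk, record what the business would pay today to be
# made whole if the risk materializes.  All figures below are invented.
risks = {
    "erroneous funds transfer": {"impact": "High", "premium_usd": 5_000_000},
    "apology letter required":  {"impact": "Low",  "premium_usd": 0},
}

# The sum of the notional premiums becomes the budget for technical
# risk mitigation (chiefly testing).
mitigation_budget = sum(r["premium_usd"] for r in risks.values())
print(f"Technical risk mitigation budget: ${mitigation_budget:,}")
```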
Since you are unlikely to find an agent both willing and financially able to assume the risk, deflection is rarely relevant as a mitigation strategy. The first example was one of control as mitigation, where testing is our primary strategy, and the second was one of avoidance as mitigation.
Requirements Based Testing
The standard testing methodology for functionality is requirements based testing (RBT), in which all the functional requirements and associated business rules are tested for correspondence with the specifications. However, RBT is based on testing for what we know. It cannot find what we don’t know.
Testing in a modernization context has to seek both errors and omissions. RBT will find the errors, if applied with sufficient diligence, but by definition it cannot find omitted requirements.
Returning to our distinction between technical logic and business logic, we find some important relevance to testing. Business logic ranges from the trivial to literally astronomical levels of complexity. Technical logic ranges from the simple to perhaps moderately complex, with the occasional exception for something like very complex SQL commands.
What this means in practical terms is that it is rare for an omitted requirement to be missed in testing technical code. By and large technical logic just won’t work if something is missing, and it’s not that complex to begin with.
Business logic – which mostly means business rules – is a completely different story. We may think we have all the business rules tested, but it can be operationally difficult even to determine whether or not an error occurred.
Worse, business logic has the combinatorial problem of permuting input variables, a problem which rarely arises in technical logic. If we assume the simple case of each input variable having only two values, 0 and 1, then with 10 inputs we will need 2^10 = 1,024 test cases. For the case of 40 input variables given above, we would need over 1 trillion test cases. In other words, it is effectively impossible to test completely, and of course few inputs are restricted to 0 or 1.
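The arithmetic is easy to verify directly; this short Python loop just prints the exhaustive test case counts for 10, 20 and 40 binary inputs.

```python
# With binary inputs, the number of distinct input combinations (and thus
# exhaustive test cases) doubles with every additional variable.
for num_inputs in (10, 20, 40):
    print(f"{num_inputs} binary inputs -> {2 ** num_inputs:,} test cases")
# 10 binary inputs -> 1,024 test cases
# 20 binary inputs -> 1,048,576 test cases
# 40 binary inputs -> 1,099,511,627,776 test cases (over 1 trillion)
```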
If RBT is insufficient, what else could we do? The problem is establishing a standard of truth – if the requirements can’t do it to the necessary standard of accuracy, what will?
What makes modernization fundamentally different from greenfield development is that there is an established business process that depends on the precise accuracy of the system. When the legacy system was new, there was no computerized business process, and so there was no choice but to work out the kinks over time. Plus, the limited computer systems of the day did not allow for massive complexity.
But since those early days, the business process has come to depend on the computer. The sometimes astonishing complexity of the business logic results from 20, 30, or even 40 years of adapting to changes in the business – many of which would never have been practical with a fully manual process. But we can’t afford to start all over again and go through another 20+ years of debugging and adapting to changes. Unless the business process is going to be dramatically altered, a near impossibility for very complex and business-critical applications, we have to do the best we can and expect a long period of debugging, including production debugging and production errors.
Regression Testing
If production errors are unacceptable, then there is only one standard of truth to be applied: the legacy system itself. For all its admitted faults, it does produce the results people expect and the business demands. The strategy of regression testing against the legacy system is a sound approach, but it too has a fault: coverage. If you were to run the legacy system for a year, it is a near certainty that substantial swaths of the code would never be executed. In one memorable case, we brought an extracted business rule to our client and asked whether or not it was valid. He leaned back in his chair and said that, in the 25 years the system had been running, he was sure that situation had never occurred. But – if that situation did occur – then what we found would be the correct course to take.
In practice, any complex program is unlikely to see even 70% coverage of its business logic ever executed. So, production parallel testing will have to run for a very long time before the new system can be said to be more or less functionally equivalent to the legacy. Even then, expect defects to surface periodically.
Dynamic Business Rule Extraction (DBRE)
When we were challenged to develop a process to extract and prove 100% of the active business rules in a legacy system, the result was dynamic business rule extraction[2]. The goal was to find the omitted business rules. DBRE differs from static business rule extraction (SBRE) in that program execution is a critical part of the process. In SBRE, the static source code is analyzed on a workstation but no execution occurs.
DBRE starts with SBRE to discover candidate business rules, which are used to generate executable test cases. After the test cases pass validation to eliminate any errors, the complete set is executed under cumulative code coverage analysis. In the resulting coverage report, the omitted business rules will be found in the code that was not executed.
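The control flow can be pictured with a toy example. The Python sketch below is only a conceptual illustration of the loop, with invented branch paths standing in for real legacy code and real coverage tooling; it is not the patented process itself.

```python
# A toy, runnable sketch of the iteration just described.  The legacy
# "program" is reduced to named branch paths, and a test case simply lists
# the paths it exercises; all names are invented.
BRANCH_PATHS = {
    "validate-payor":   "business",   # rule already found by static extraction
    "compute-discount": "business",   # rule missed statically (the omission)
    "read-message":     "technical",
    "legacy-tape-io":   "obsolete",
}

# Candidate rules from static extraction (SBRE) and their generated test cases.
rules = ["validate-payor"]
test_cases = [{"rule": "validate-payor",
               "paths": {"validate-payor", "read-message"}}]

while True:
    executed = set().union(*(t["paths"] for t in test_cases))  # cumulative coverage
    unexecuted = set(BRANCH_PATHS) - executed
    # Stop when everything left unexecuted is technical logic or obsolete code.
    remaining = [p for p in unexecuted if BRANCH_PATHS[p] == "business"]
    if not remaining:
        break
    for path in remaining:                 # unexecuted business logic points at
        rules.append(path)                 # an omitted rule: derive it and
        test_cases.append({"rule": path, "paths": {path}})  # generate its test

print("extracted rules:", rules)           # includes 'compute-discount'
print("regression test cases:", len(test_cases))
```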
We turn RBT on its head and use the test executions to determine whether the business rules correspond to the executing code, not the other way around. We continue to iterate through the unexecuted code until all that remain are branch paths that correspond to purely technical logic or to obsolete functionality. At that point, we have 100%.
SBRE will never get you to 100% when dealing with complex code, despite what product vendors will tell you. After a while, even the best technicians have to throw up their hands and concede defeat.
As a byproduct, DBRE yields business rules in a non-procedural format that can be loaded into a rules engine or decision management system, and you get a complete set of regression test cases. Those test cases allow us to close the paradox in modernization project planning, because only in this case will you have 100% coverage of the business logic.
The Conundrum
In order to get funding for a significant modernization effort, one usually has to promise to provide new functionality. Just getting the same functionality but without all the strictures of the legacy implementation rarely gets the go-ahead. However, as soon as we change any existing functionality, we lose our standard of truth that allows us to perform production parallel regression testing.
But there is a solution. We take advantage of the time it is going to take to design the new and changed functionality to perform DBRE and then convert those regression test cases from the legacy to the new system. On a special minimalist testbed which we call a reference implementation, we exhibit functional equivalence for all update transactions using those regression test cases. We define two swim lanes to do this:
The business/enterprise architecture swim lane is where the new system is defined. While that process is occurring, the business re-architecture swim lane extracts the business rules and the meaningful technical design elements from the legacy system and creates the reference implementation. Once the reference implementation successfully processes the test cases from DBRE, then we can use the reference implementation as the foundation of the new system. We add new functionality and modify functionality, especially rationalizing business logic that failed logical analysis for redundancy, inconsistency and logical “holes.”
Agile
People invoke agile methodologies as the be-all and end-all of software projects. Even when we have a complete waterfall specification (which we can extract from a legacy code base), the preference is still for implementation in digestible chunks.
While we broadly agree with this preference for agile, we are also aware that agile shares a critical deficiency with waterfall: when gathering requirements/user stories, people can’t tell you what they don’t know – but they’ll try anyway. The results are errors and especially omissions in the user stories and associated business rules. Testing will usually find the errors, if sufficiently thorough, but it won't find the omissions (unless coupled with DBRE).
Just like the shortcomings of RBT with complex applications, and for similar reasons, the user story approach breaks down with complex business logic. Agile does very well with technical logic, defining work flows, transactions, queries, user interfaces, etc., but don’t expect to accurately and completely extract business logic that has more than a handful of inputs and outcomes.
Our recommendation is to use agile for defining the technical logic of the new system, and for the implementation of the system, but to use a business rule approach – whether business analysis, business rule extraction or a combination – for defining and especially rationalizing the business logic.
Logistics
A modernization project, using our recommendations, will generate a huge number of artifacts from analyzing the legacy system. Managing the legacy system components and the equivalent components of the new system – not to mention maintaining the bridging between the two – can become overwhelming. A legacy system can have tens or hundreds of thousands of logical branch paths for analysis. Each test case for DBRE needs to be archived with traceability to the original code. Renormalizing a legacy data model into a new relational or NoSQL design adds yet another layer of artifacts and mappings to manage.
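A hedged sketch of the bookkeeping involved: the record type and identifiers below are invented, but they show the minimum traceability each artifact needs to carry, from legacy code through extracted rule and test case to the new system.

```python
# Every extracted rule and DBRE test case is archived with a pointer back to
# the legacy code it came from and forward to its new-system equivalent.
# All identifiers below are invented.
from dataclasses import dataclass

@dataclass
class TraceRecord:
    rule_id: str            # extracted business rule
    legacy_module: str      # e.g., a COBOL program
    branch_path: str        # branch path within that module
    test_case_id: str       # DBRE regression test case exercising it
    new_component: str      # equivalent component in the new system

catalog = [
    TraceRecord("BR-00142", "CLM0023", "para-2100-edit",
                "TC-00981", "ClaimEditService"),
]
# With rule counts running to six or seven figures, this catalog -- not the
# handful of complex rules -- is where the management burden accumulates.
print(len(catalog), "traceable artifacts")
```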
For similar reasons, we start data conversion on day one of the project, since data can hide many unforeseen horrors that can derail a project that waits too long to address them. Projects without DBRE should likewise start test design on day one, which can create test cases for test-driven development, our preferred approach to agile development. The cutover plans for moving into production need to be drawn up early, as they can fundamentally alter the design of the whole project. And so it goes.
Modernization, because of the fundamental requirement for matching the results of legacy processing, requires a mindset that is far more obsessive than standard greenfield projects. If you are prepared, and infected with a strong dose of fear, you will probably do fine. On the other hand, if you are serene and confident, then I would be scared.
[1] Also known as “imperative languages,” which is useful since the term “procedural language” is often used as an antonym to an object-oriented language. This usage of procedural can be confusing, particularly when Java, which is an object-oriented language, is described as procedural.
[2] USPTO Patent #9588871, Dynamic Business Rule Extraction, issued March 7, 2017