Design for Mission Criticality
Technology is all around us and today, more than ever, it is taking over tasks and activities that were previously entrusted to us humans. Technology is becoming the norm and there is hardly an area where it is not yet in use. This increased reliance comes with increased expectations; we expect technology to be available. However, recent outages, like the CrowdStrike outage on 19 July 2024 or the outage that hit WhatsApp and Facebook in early 2024, have shown how IT is now inextricably interwoven into our everyday life. More and more everyday applications are hosted in a public cloud, and an outage affecting a component hosted in the US can cause an airport in Switzerland to cancel all takeoffs and landings, a hospital in Germany to cancel all non-urgent surgeries or a retailer in the UK to be unable to sell items because the POS[1] is not working.
Technology is everywhere, and concepts like public cloud will further increase our dependency on globally connected systems. This means that we (across the IT industry) need to consider the unintended consequences of a failure even for non-mission critical systems. We must ensure that we apply techniques similar to those we use when we design mission critical systems, like an in-flight control system or the software of a self-driving car. Of course, there are different levels of availability requirements; not being able to sell a pint of milk has a different customer impact from the in-flight control system of a passenger plane becoming “unavailable” due to an outage mid-flight, or our hands-free in-car autopilot shutting down because of a failure whilst driving at 70mph in the middle lane of a busy motorway.
Time to consider Mission Criticality more widely
Designing mission critical systems is nothing new. Nuclear power stations, submarines and spacecraft are only a small number of examples where technology has delivered mission critical services since the 1970s. In the military, technology has been playing an integral part in any operation. For instance, modern military vehicles rely heavily on mission-critical systems that enhance and guarantee successful mission capabilities. These systems include sensors, actuators, effectors, radars, and processing resources controlling unmanned vehicles like the ASW Continuous Trail Unmanned Vessel (ACTUV).
However, designing and developing mission critical systems has always been a niche field. As an IT Architect during the 1990s and early 2000s, it was special to be working on a mission critical system. What is clear is that we cannot see mission criticality as a niche field anymore; we (across the IT sector) must consider all levels of criticality when designing IT solutions.
What is mission criticality?
Mission criticality is a term that refers to the fact that the service of the overall IT system (user interface, integration, data as well as infrastructure components) is critical. Service criticality in this context is a measure of the business importance of the service: a failure of a mission critical service may cause an immediate, serious, widespread and lasting impact on business operations and cannot be tolerated for any reasonable length of time. In other words, a mission critical system is absolutely vital to the operation of the business.
In addition, a mission critical service is typically constantly available (24 hours a day, 7 days a week). These services typically also deliver an extremely responsive service (end-to-end response time typically less than 1 second) with a demand profile that is random and unpredictable, typically supporting thousands of transactions per hour, some of them of a secure or highly secure nature.
Examples are an in-flight control system for an airplane, the control system in a self-driving car or the overall command and control system in a nuclear power plant. But financial systems, like a core retail banking system allowing customers to withdraw money, or an online ordering and distribution system for a global retailer, can also be seen as mission critical. Whether or not a service is mission critical is mainly down to the client, and sometimes it can be a challenge to establish the level of criticality in the first place.
Other types of criticalities
Various papers differentiate between safety, mission, business, and security criticality. A system is safety critical when a failure “may lead to loss of life, serious personal injury, or damage to the natural environment”. A failure in a mission critical system “may lead to an inability to complete the overall system or project objectives, e.g., loss of critical infrastructure or data.” A failure in a business-critical system “may lead to significant tangible or intangible economic costs, e.g., loss of business or damage to reputation” and a failure in a security-critical system “may lead to loss of sensitive data through theft or accidental loss.”
As technology is becoming more and more the “business” of an organization, I do not differentiate between business and mission criticality; it is one and the same.
Business Criticality Framework
Awareness and understanding of the service characteristics of each business service is important to ensure that systems and technology services are designed to deliver all business critical requirements. As referred to above, the criticality of a business service and/or an IT application is often only expressed as a level of availability in terms of an absolute percentage, for instance, “my application has to be 99.99% available”. However, the level of criticality of an IT application depends on several characteristics covering 7 different categories and not just one or two (the list below is an example; variations are possible):
Platinum – Mission Critical
Gold – Highly Critical
Silver – Critical
Bronze – Important
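To make a statement like “my application has to be 99.99% available” more tangible, it can help to translate the percentage into the downtime it actually permits. The following is a minimal sketch in Python; the function name and the list of sample targets are purely illustrative and are not tied to the Platinum, Gold, Silver or Bronze levels above, which are not defined by a single availability figure.

```python
# Illustrative only: converts an availability target into the downtime
# it permits over a period. The sample targets below are assumptions
# for demonstration and are not mapped to the criticality levels.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_minutes(availability_percent: float,
                             period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) permitted by a target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return period_hours * 60 * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% available -> about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% corresponds to roughly 53 minutes of downtime per year. The number alone says nothing about when those minutes fall or what they cost the business, which is why availability is only one of the characteristics to consider.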
Summary
IT is now everywhere, and when we design IT solutions, we must understand all service characteristics to be able to cater for any relevant potential outage scenarios. Mission criticality is not a special case, but a characteristic that must be considered holistically.
Thanks for Reading
[1] POS = point of sale