Beyond Robustness: Closing the Gap in Cyber-Physical Risk Management
Sinclair Koelemij
Cyber-Physical Risk Expert | Founder Cyber-Physical Risk Academy | Consultant, Speaker, Trainer, Publisher | Operational Technology | Masterclasses | Training | 45+ years in process automation. OT security focus.
Once upon a time, life was simpler for process automation engineers. Technology was proprietary, functionality was limited, and product and service prices were sky-high. Failures were mostly predictable, often the result of wear and tear—a natural touch of unpredictability to an otherwise steady routine.
Then, the world of automation underwent a dramatic transformation, as if evolution had some unfinished business. Open technology emerged, bringing a surge of functionality, plummeting costs, and a troubling new challenge: cyberattacks. Suddenly, failures were no longer random—they could be deliberate, orchestrated by unseen hands to exploit vulnerabilities. This ushered process automation into a wild and unpredictable new era.
Yet, amidst all these changes, one thing remained constant: the need for safe and reliable systems. The release of the IEC 61508 standard in 1998 marked a major milestone in managing process safety. This groundbreaking standard introduced formal methods to enhance the functional safety and reliability of electrical, electronic, and programmable electronic (E/E/PE as we used to call them) systems. By establishing a structured approach to risk management, IEC 61508 played an important role in making production processes both safer and more reliable. One of its key innovations was the introduction of Safety Integrity Levels (SIL), which set clear, quantifiable targets for reliability and risk reduction based on the probability that a safety function fails when it is needed. The higher the SIL, the greater the reliability and risk reduction. For example, for a safety function operating in low-demand mode, SIL 3 reduces the risk associated with a specific hazard by a factor of at least 1,000 (up to 10,000), while SIL 2 achieves a risk reduction factor of at least 100 (up to 1,000).
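The relationship between SIL and risk reduction can be sketched in a few lines of Python. This is a minimal illustration, assuming the low-demand-mode bands of IEC 61508; the function names and the example frequency are hypothetical and not part of any standard.

```python
# Risk reduction factor (RRF) bands per SIL for low-demand mode.
# RRF is the inverse of the average probability of failure on demand (PFDavg).
SIL_RRF_BANDS = {
    1: (10, 100),
    2: (100, 1_000),
    3: (1_000, 10_000),
    4: (10_000, 100_000),
}


def rrf_range(sil):
    """Return the (minimum, maximum) risk reduction factor for a SIL."""
    return SIL_RRF_BANDS[sil]


def worst_case_residual_frequency(unmitigated_per_year, sil):
    """Residual event frequency if the function only just meets its SIL."""
    min_rrf, _ = rrf_range(sil)
    return unmitigated_per_year / min_rrf


if __name__ == "__main__":
    # Hypothetical hazard that would occur once every 10 years unprotected.
    for sil in (2, 3):
        freq = worst_case_residual_frequency(0.1, sil)
        print(f"SIL {sil}: at most {freq:.0e} events per year")
```

Using the minimum of each band is a deliberately conservative choice: it shows the risk reduction a SIL guarantees, not what a particular design may actually achieve.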
Recognizing that a “one-size-fits-all” approach was insufficient for different industries, IEC 61511 was introduced in 2003 to adapt the principles of IEC 61508 to the specific needs of the process sector. This standard focuses on the design, implementation, and operation of Safety Instrumented Systems (SIS), which manage process-related hazards and step in to prevent catastrophic failures when needed.
However, control systems operate under a different mandate. Unlike safety systems, control systems are not governed by frameworks like SIL that define explicit targets for reliability and risk reduction. Their role is to ensure process stability and efficiency during normal operations. While failures in control systems can be disruptive, they rarely pose immediate safety hazards because safety systems are there to "save the day" if necessary. Instead, control systems rely on engineering practices—such as redundancy, diagnostics, and robust design—rather than standardized risk reduction targets.
This distinction highlights the complementary roles of control and safety systems. Safety systems handle critical hazards caused by random failures in the control system, preventing catastrophic outcomes when control and process equipment malfunctions. Their reliability is essential because any failure could have severe consequences. Control systems, meanwhile, focus on maintaining smooth operations, with safety systems stepping in when required.
Safety systems are designed to address all critical process hazards identified through risk assessments like HAZOP (Hazard and Operability Study) and LOPA (Layer of Protection Analysis). These systems implement Safety Instrumented Functions (SIFs) to manage hazards such as overpressure, thermal runaway, or toxic releases, ensuring processes remain within safe limits. However, safety systems are not designed to handle deliberate failures, such as cyberattacks, which are often overlooked in traditional risk analysis methods like HAZOP. Managing these more complex risks requires additional measures, including resilience-focused strategies, cybersecurity protections, and a broader layers-of-protection approach. Such an approach integrates process controls, operator interventions, mechanical safeguards, and robust cybersecurity measures to ensure comprehensive hazard coverage and adaptability in the face of deliberate threats. A defense-in-depth strategy would additionally incorporate various management processes as layers of protection.
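To make the LOPA idea mentioned above concrete, here is a minimal sketch of the usual calculation: the initiating event frequency is multiplied by the probability of failure on demand (PFD) of each independent protection layer, and the result is compared with a tolerable frequency. All numbers, layer names, and function names are assumptions chosen for illustration.

```python
def intermediate_frequency(initiating_per_year, layer_pfds):
    """Event frequency after crediting the existing independent layers."""
    freq = initiating_per_year
    for pfd in layer_pfds:
        freq *= pfd
    return freq


def required_rrf(initiating_per_year, layer_pfds, tolerable_per_year):
    """Risk reduction factor the SIF must still deliver to close the gap."""
    return intermediate_frequency(initiating_per_year, layer_pfds) / tolerable_per_year


if __name__ == "__main__":
    # Hypothetical scenario: control loop failure once per 10 years,
    # credited layers are an operator alarm response and a relief valve.
    layers = {"operator response to alarm": 0.1, "relief valve": 0.01}
    gap = required_rrf(0.1, layers.values(), tolerable_per_year=1e-6)
    print(f"Required SIF risk reduction factor: about {gap:.0f}")
```

In this example the remaining gap is a factor of about 100, which corresponds to the lower edge of the SIL 2 band. Note that the multiplication is only valid if the layers fail independently, which is exactly the assumption a deliberate attack can break.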
Risk reduction in process safety is a quantitative process, where higher SILs correspond to greater risk reduction by addressing the reliability of systems in the face of random component failures. However, addressing cyber-physical risks is more complex, as it extends beyond random failures to include deliberate, orchestrated failures caused by cyberattacks. SILs are not designed to address intentional failures: an attack can manipulate systems to create simultaneous or sequential failures, scenarios that are highly improbable when failures are random and independent. These new risk scenarios require entirely different mitigation and risk estimation approaches tailored to the intentional and coordinated nature of cyber threats.
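The difference between random and orchestrated failures can be illustrated with a small probability sketch. It assumes, purely for illustration, that random failures are independent and that a successful attack causes all targeted components to fail together; the probabilities and function names are hypothetical.

```python
def coincident_random_failure(p_failures):
    """Probability that all components fail together by chance alone,
    assuming the failures are independent."""
    prob = 1.0
    for p in p_failures:
        prob *= p
    return prob


def coincident_failure_with_attack(p_failures, p_attack_success):
    """Probability that all components fail together when a successful
    attack forces every one of them to fail (law of total probability)."""
    random_only = coincident_random_failure(p_failures)
    return p_attack_success + (1.0 - p_attack_success) * random_only


if __name__ == "__main__":
    # Hypothetical values: two components, each failing with probability 1e-3,
    # and an attack that succeeds with probability 1e-2 in the same period.
    components = [1e-3, 1e-3]
    print(f"random only: {coincident_random_failure(components):.1e}")
    print(f"with attack: {coincident_failure_with_attack(components, 1e-2):.1e}")
```

With these assumed numbers the attack term dominates by several orders of magnitude, which is why estimates built on independent random failures do not carry over to deliberate threats.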
The IEC 62443 standard serves a similar role in cybersecurity as IEC 61508 does in functional safety. Like IEC 61508, IEC 62443 is a cross-industry standard that provides a comprehensive framework for managing cybersecurity risks in industrial automation and control systems. However, the security levels (SL 1, SL 2, SL 3, SL 4) defined in IEC 62443 differ fundamentally from the SILs in IEC 61508.
SLs in IEC 62443 are qualitative and describe the system’s capability to withstand specific threats by meeting progressively stricter security requirements. In contrast, SILs are quantitative and directly tied to risk reduction, providing a quantitative decrease in the probability of a hazardous event. While a higher SL enhances the system's robustness against cyberattacks by addressing vulnerabilities and strengthening defenses, it does not inherently provide a direct, quantitative measure of overall risk reduction as SILs do.
Reducing overall risk depends not only on achieving a higher SL to reduce likelihood but also on implementing additional measures, such as resilience strategies, to mitigate the impact of attacks.
Figure 1 illustrates this concept. If we apply security measures that focus solely on increasing robustness, we only reduce the likelihood of the inherent risk (1), in this example to 1E-04 events per annum. However, this reduction may still not bring the risk down to an acceptable level. The figure further shows that by also enhancing resilience, and thus lowering impact, we could reach an acceptable residual risk (2) at a higher likelihood of 1E-03 events per annum. This means that a balanced approach, combining resilience and robustness, could deliver effective risk reduction while potentially requiring fewer resources than pursuing robustness alone.
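One simple way to read this trade-off is to treat risk as likelihood multiplied by impact. The sketch below does that using the two likelihoods quoted from the figure; the impact values and the acceptance threshold are assumptions made up for the example, and the figure itself may rest on a risk matrix rather than a product.

```python
# Hypothetical tolerable expected loss per year (assumption, not from the figure).
ACCEPTABLE_RISK_PER_YEAR = 5_000.0


def annual_risk(likelihood_per_year, impact):
    """Expected loss per year for one scenario."""
    return likelihood_per_year * impact


def is_acceptable(likelihood_per_year, impact, threshold=ACCEPTABLE_RISK_PER_YEAR):
    """True if the expected loss does not exceed the tolerable level."""
    return annual_risk(likelihood_per_year, impact) <= threshold


if __name__ == "__main__":
    # Likelihoods are taken from the text; impacts are illustrative assumptions.
    scenarios = {
        "robustness only (impact unchanged)": (1e-4, 100e6),
        "robustness plus resilience (impact lowered)": (1e-3, 1e6),
    }
    for name, (likelihood, impact) in scenarios.items():
        verdict = "acceptable" if is_acceptable(likelihood, impact) else "not acceptable"
        print(f"{name}: {verdict}")
```

Under these assumptions the second scenario is acceptable even though its likelihood is ten times higher, because the impact has dropped by a larger factor.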
Thus, while IEC 62443 significantly improves cybersecurity robustness, it complements, but does not replace, a holistic approach to risk management that considers both likelihood and impact. For the process industry, achieving this holistic approach requires integrating process safety principles, as outlined in standards like IEC 61508 and IEC 61511, to address process-specific hazards and ensure that safety systems effectively mitigate risks from random failures alongside deliberate cyber threats.
The primary role of safety systems is to bring the process to a safe state and prevent hazardous events from escalating. As mentioned above, these systems are designed to detect hazardous conditions and take predefined actions, such as shutting down equipment, isolating processes, or activating relief mechanisms, to prevent catastrophic outcomes, including loss of life, harm to health, environmental damage, and financial losses. While safety systems are highly effective in reducing risks for specific process hazards, using semi-quantitative risk estimation methods like LOPA, they are not designed to address risk scenarios such as deliberate cyberattacks. Achieving an optimal risk reduction requires combining process safety with additional measures, such as cybersecurity protections and resilience strategies, to address the full spectrum of potential risks.
However, both IEC 61508 and IEC 62443 only indirectly support resilience by improving system reliability and robustness. Neither standard explicitly defines or comprehensively addresses resilience as a distinct goal. Resilience, defined as the ability to detect, respond to, recover from, and adapt to failures or attacks, requires additional frameworks, such as the NIST Cybersecurity Framework (CSF) and ISO 22301, and strategies beyond what these standards provide.
IEC 61511 builds on IEC 61508 for the process industry by offering process-specific guidance and integrating methods like LOPA to design and manage SIS. This industry-sector-tailored approach enhances the ability to address process-specific hazards and ensures robust safety systems through a structured safety lifecycle. In contrast, IEC 62443 is a cross-industry standard that focuses on cybersecurity risks in industrial automation and control systems but does not provide sector-specific guidance.
While the process-specific enhancements of IEC 61511 indirectly contribute to resilience by reducing the likelihood of failures and managing risks more effectively, neither IEC 61511 nor IEC 61508 explicitly addresses resilience. IEC 62443, while including monitoring requirements that enhance early detection and help reduce the impact of certain events, still does not provide comprehensive guidance for resilience, particularly in areas such as recovery and operational continuity. These gaps require additional measures to achieve a holistic approach to risk management.
Conclusion:
To achieve a truly holistic approach to cyber-physical risk reduction, there is a clear need for an industry-sector-specific standard within IEC 62443 (ISA 99) or developed in close alignment with IEC 61511 (ISA 84) to address the unique needs of the process industry. My preference would be for this initiative to originate from the ISA 84 workgroup, as process safety is the most critical factor in ensuring the protection of people, the environment, and assets.
Cybersecurity plays a critical role in supporting and facilitating the process safety function by protecting the systems, data, and actions that enable safety-related operations. However, the primary task remains process safety, which focuses on preventing hazardous events, safeguarding lives, protecting the environment, and ensuring the reliability of industrial processes. Cybersecurity enhances process safety by mitigating threats that could compromise the integrity, availability, or reliability of safety systems, but it operates as a supporting discipline to the overarching goal of ensuring safe and reliable process operations.
A siloed approach, where cybersecurity standards and process safety standards are developed in parallel without integration, would fail to provide a comprehensive solution. Such fragmentation risks leaving critical gaps in risk management, as it does not adequately address the interplay between cybersecurity and process safety risks. Without integration, an exclusive focus on robustness could lead to higher impact and overspending on OT security measures, as the lack of resilience strategies would potentially require additional investments to achieve acceptable risk levels.
Aligning efforts to incorporate both resilience and robustness within a unified framework would ensure a more balanced, efficient, and effective approach to managing cyber-physical risks comprehensively.