DORA - what, why, how - The Digital Operational Resilience Act
Kamil Kurowski
Helping clients to elevate tech platforms to new level of performance, security, reliability and TCO for transactional and AI systems
What.
Security and risk management professionals face an increasingly difficult task as the threat environment grows and diversifies. In response, the European Union aims to strengthen the IT security of financial entities through the Digital Operational Resilience Act (DORA). The legislation reaches beyond banks, insurance companies, and financial markets to include organizations that do business with these financial entities.
The proposed Digital Operational Resilience Act is a regulation consisting of 56 articles and is structured around five key pillars.
The ICT risk management framework should incorporate a business continuity policy and a disaster recovery procedure to uphold a strong level of digital operational resilience. The framework must be reviewed at least annually, and also after documented major ICT-related incidents or findings from relevant digital operational resilience testing or audit procedures. The management body of the financial entity is responsible for approving and overseeing all measures associated with the ICT risk management framework, and is accountable for their execution.
Why
The rationale behind DORA is clear. It mandates businesses to demonstrate to auditors that their technological infrastructure possesses the capabilities to recover from various issues, such as cyber-attacks, data breaches, or accidental data deletions. Moreover, DORA adopts a comprehensive perspective. It places the responsibility on businesses for their entire IT environment covering on-prem, hybrid and cloud. Workloads may be distributed across entire environments. In the eyes of DORA, all aspects are treated equally, and businesses are held accountable for each piece. Irrespective of the nature of the incident, businesses must be capable of providing evidence that they have the necessary components and procedures in place to ensure a prompt recovery.
How
Preparing for IT resiliency can seem like a daunting challenge: creating an always-on service in the face of these threats is hard, and no single technology solves every problem. IT resiliency is a choice. So, what should be considered when tackling this problem? First, consider points of failure. Any component of your system that does not have a backup can lead to reduced throughput, reduced processing power, or a service outage. Basic assumptions for building a resilient IT system:
· Build the approach, test the approach. Include the following: redundancy, early detection, automation and failover, and test, practice, improve. It is not a plan if you have never tried it; it is just an idea.
· Deploy resiliency best practices
· Measure how the actions taken address the challenges: financial (cost-benefit analysis of downtime versus technical investment) and regulatory (cost of failing to comply).
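The financial measurement mentioned above can be sketched as a simple calculation. This is only an illustration: the function names and every figure below are hypothetical assumptions, not values from DORA or from any vendor, and a real analysis would use the organization's own outage history and cost data.

```python
def avoided_downtime_cost(hours_before: float, hours_after: float,
                          cost_per_hour: float) -> float:
    """Yearly downtime cost avoided by reducing expected outage hours."""
    return (hours_before - hours_after) * cost_per_hour


def net_annual_benefit(hours_before: float, hours_after: float,
                       cost_per_hour: float, annual_investment: float,
                       avoided_regulatory_cost: float = 0.0) -> float:
    """Avoided downtime cost plus avoided regulatory cost, minus the investment.

    A positive result means the resiliency investment pays for itself
    under the stated (hypothetical) assumptions.
    """
    avoided = avoided_downtime_cost(hours_before, hours_after, cost_per_hour)
    return avoided + avoided_regulatory_cost - annual_investment


if __name__ == "__main__":
    # Hypothetical inputs: 8 h of outage per year reduced to 0.5 h,
    # 100,000 per outage hour, 400,000 per year spent on resiliency.
    print(net_annual_benefit(8.0, 0.5, 100_000.0, 400_000.0))
```

The regulatory side is kept as a single optional number here because the cost of failing to comply depends on the specific case and cannot be derived from downtime alone.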
To resolve most challenges related to hardware infrastructure platforms, it is recommended to invest in technologies with built-in resiliency. A good example of such technology is LinuxONE, which helps minimize downtime and simplify tasks and infrastructure.
How does IBM LinuxONE help? (Read more in the new Redbook: IBM LinuxONE Resiliency)
LinuxONE has an impressive track record of maintaining system uptime. The inherent reliability of the underlying server hardware therefore contributes significantly to the uptime of the applications running on it. It is crucial to note, however, that relying on LinuxONE hardware does not resolve all issues. System resiliency is an ongoing commitment: every potential point of failure across storage, computing, networking, virtual infrastructure, physical site maintenance, and other aspects must be addressed as an integral part of overall resiliency and disaster recovery (DR).
Layers of Resiliency based on IBM LinuxONE
When building an environment for top resiliency on LinuxONE machines, it is essential to address every layer that plays a role in, and could influence, the resilience of the operating environment, as depicted in the diagram below. It is also necessary to identify the layers that can establish the most reliable environment and ensure the highest availability of application services, regardless of planned or unplanned outages in any of the system layers. The illustration below outlines these layers and presents, on the right side, the options that are vital for top resiliency with LinuxONE.
LinuxONE - built-in Hardware Resiliency – The foundation of the LinuxONE system is rooted in decades of extensive research and development undertaken by IBM in hardware solutions designed to support mission-critical applications across a diverse array of industries. The system achieves remarkable overall resiliency through features such as built-in redundancy and concurrent replacement, repair, and upgrade functions for the Central Processor Complex (CPC), processing units, memory, I/O drawers, and components for storage and networking. Furthermore, the resiliency of IBM LinuxONE extends to non-disruptive updates of its firmware, known as Licensed Internal Code (LIC). Capacity upgrades on LinuxONE can almost always be executed concurrently, eliminating the need for a system restart.
Reliability, Availability, and Serviceability (RAS) is the basis for maintaining continuous service in the IT infrastructure. To achieve the highest levels of availability for crucial business applications and data, it is essential to establish a resilient environment based on strong fundamentals, such as providing critical IT components with backup (redundant) capacity, redundant power sources, and duplicated connections along critical paths to storage, networks, and other systems. Additionally, multiple instances of software, including operating systems, middleware, and applications, are needed for top resiliency. It is crucial to note that redundancy alone does not guarantee increased availability; technologies capable of leveraging redundancy and responding to failures with minimal impact on application availability are also necessary.
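The point that redundancy alone does not guarantee availability can be illustrated with a small model. This is a simplified sketch under stated assumptions: component failures are independent, and "failover coverage" (a hypothetical parameter) is the probability that a failure of the primary is detected and the standby takes over successfully. None of the numbers are LinuxONE measurements.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n independent redundant components with perfect failover.

    The system is down only when all n components are down at once.
    """
    return 1.0 - (1.0 - a) ** n


def pair_with_failover_coverage(a: float, coverage: float) -> float:
    """Availability of a primary/standby pair with imperfect failover.

    The service is up if the primary is up, or if the primary is down,
    the failover succeeds (probability `coverage`), and the standby is up.
    """
    return a + (1.0 - a) * coverage * a


if __name__ == "__main__":
    a = 0.99  # hypothetical availability of a single component
    print(parallel_availability(a, 2))          # redundancy with perfect failover
    print(pair_with_failover_coverage(a, 0.9))  # failover succeeds 90% of the time
    print(pair_with_failover_coverage(a, 0.0))  # redundancy that is never used
```

With a coverage of zero the pair is no more available than a single component, which is the reason technologies that detect failures and switch over automatically matter as much as the spare capacity itself.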
Considering redundancy and resiliency, the design of the IBM LinuxONE platform (comprising both hardware and software) incorporates principles rooted in Reliability, Availability, and Serviceability (RAS). These principles are guided by a set of high-level program objectives aimed at achieving continuous reliable operation (CRO) at the system level. The primary goals of IBM LinuxONE encompass ensuring data integrity, computational integrity, diminishing or eliminating both planned and unplanned outages, and minimizing the need for repair actions.
The RAS strategy is centered around adapting to change by leveraging insights gained from previous generations of LinuxONE. This involves ongoing investments in new RAS functions designed to eradicate or reduce all potential sources of system outages.
Infrastructure Layer – The latest iteration of the LinuxONE platform incorporates improvements to the existing Reliability, Availability, and Serviceability (RAS) designs. These enhancements are introduced through the integration of new technology, structural elements, and updated requirements in the next LinuxONE platform. The ongoing evolution of RAS is linked to the introduction of novel features and functionalities, ensuring that LinuxONE platforms consistently deliver outstanding resiliency. The RAS capabilities of LinuxONE include concurrent replacement, repair, and upgrade functions for processing units, memory, Central Processor Complex (CPC), and I/O drawers. Additionally, I/O features related to storage, network, and clustering connectivity are also subject to these advanced RAS functions.
Compute – LinuxONE hardware platform consists of one or more frames depending on the specific model. For instance, the LinuxONE Rockhopper 4 models utilize a single frame, while the LinuxONE Emperor 4 models offer flexibility, allowing configuration with one to four frames. Importantly, even in instances where only one frame is used, numerous redundant components and features are integrated as a key design principle for resiliency of the machines.
LinuxONE CPCs - To ensure the highest level of resiliency, the IBM LinuxONE CPC incorporates redundant components to prevent any potential outages, utilizing redundancy for self-healing. Each CPC drawer in a LinuxONE houses processing units, memory, and I/O interconnects. The CPC drawer design aims to minimize, and in some cases eliminate, both planned and unplanned outages by offering concurrent repair, replacement, and upgrade functions. The process through which a CPC drawer takes over for a failed CPC drawer is known as Enhanced (CPC) Drawer Availability (EDA). EDA enables a single CPC drawer in a multi-drawer configuration to be removed and reinstalled concurrently for an upgrade or repair.
Processing Units (PUs) - All cores in the LinuxONE are physically identical but can be characterized in advance or dynamically based on their features. Certain core characterizations are well-suited for specific tasks or as accelerators. In the rare event of a permanent core failure, each core can be individually replaced by one of the available spares. Core sparing operates transparently to the operating system and applications.
System Clocking - LinuxONE features two oscillator cards (OSCs) for system clocking: one primary and one secondary. If the primary OSC fails, the secondary detects the failure and takes over, continuing to provide the clock signal to the system without interruption.
Power - Resiliency capabilities for power include transparent failover and concurrent repair of all power components, as well as redundant AC inputs. The power supplies for LinuxONE follow an N+1 design: one more power supply is installed than the load requires, so operations continue and unplanned outages are avoided if a single supply fails.
Cooling - LinuxONE offers an N+1 cooling function for the radiator-based, air-cooled model suitable for typical business computing needs. The N+1 redundant cooling function for the fluid-cooled model caters to the requirements of enterprise computing. Resiliency capabilities for cooling include transparent fail-over and concurrent repair of cooling pumps, blowers, fans, and related components.
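The N+1 principle described for power and cooling can be expressed as a simple check: the load must still be covered after any one unit fails. The sketch below is generic; the function name, unit capacities, and load figures are hypothetical and do not describe actual LinuxONE components.

```python
def survives_single_failure(unit_capacity: float, units_installed: int,
                            required_load: float) -> bool:
    """Return True if the remaining units can carry the load after one fails.

    This is the N+1 condition: N units are enough for the load,
    and at least one extra unit is installed.
    """
    if units_installed < 1:
        return False
    return (units_installed - 1) * unit_capacity >= required_load


if __name__ == "__main__":
    # Hypothetical example: units rated at 2.5 kW each, load of 7.0 kW.
    print(survives_single_failure(2.5, 4, 7.0))  # 3 remaining units carry 7.5 kW
    print(survives_single_failure(2.5, 3, 7.0))  # 2 remaining units carry only 5.0 kW
```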
System Control Structure - The system control structure encompasses redundant sideband control access to all units in the platform, along with redundant network switches. Support Elements (SEs) are connected to support processors in the CPC drawer, I/O drawers, power supplies, and cooling units, where Hardware Management Appliances (HMA) operate.
Memory - LinuxONE platforms implement a Redundant Array of Independent Memory (RAIM), detecting and recovering from failures of dynamic random access memory (DRAM), sockets, memory channels, or Dual Inline Memory Module (DIMM).
LinuxONE Firmware - Provides flexibility for dynamic configuration updates.
PCIe Fanout - In the CPC drawer, it offers redundant paths for data between memory and I/O drawers, housing I/O features. The PCIe fanout is hot-pluggable, enabling concurrent repair without loss of access if a PCIe fanout fails.
Capacity on Demand - To address unpredictable market opportunities and changing business needs, LinuxONE follows the basic principle of Capacity on Demand offerings. These offerings allow access to needed resources through permanent and temporary upgrades activated by one or more LICCC records. Upgrades occur without disrupting the server's operation.
How to achieve “eight nines” (99.999999%) availability (approximately 1/3 second of downtime per year) - IBM LinuxONE Emperor 4 systems with GDPS, IBM DS8000 series storage with HyperSwap, and a Red Hat OpenShift Container Platform environment are designed to deliver 99.999999% availability.
To learn how to achieve the top resiliency level, read the newly published IBM LinuxONE Resiliency.
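The "1/3 second per year" figure follows directly from the availability percentage. The short sketch below shows the arithmetic, assuming a 365-day year; the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Maximum yearly downtime for an availability with the given number of nines.

    For example, nines=8 corresponds to 99.999999% availability,
    i.e. an unavailability of 10^-8.
    """
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (3, 5, 8):
        print(nines, "nines:", round(downtime_seconds_per_year(nines), 3), "seconds per year")
```

For eight nines this gives about 0.315 seconds per year, which matches the approximation of one third of a second quoted above.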
Leader in: Compliance EU and UK, Outsourcing, ITIL, ISO Standards and ICT Benchmarking (all towers)
9 months ago
A useful insight, thanks. There is a lot more to DORA than just the Regulation. There are the RTS and ITS guidelines plus the many other prior EU Regulations referenced by DORA, e.g., for Licenced Activities. In addition, there is the register of information covering all business functions and all third-party contractual agreements. Martin @ DORACompliant.com
LinuxONE lead, Northern, Central & Eastern Europe
9 months ago
Important message, Kamil!
DORA will emphasize the importance of enterprise-grade infrastructure, security and resilience solutions. The regulations are clear; our demos can explain them and show the path to the next level of resilient infrastructure.
Organizational Alchemist & Catalyst for Operational Excellence: Turning Team Dynamics into Pure Gold | Sales & Business Trainer @ UEC Business Consulting
9 months ago
Staying informed on digital regulations like DORA is essential for maintaining security and resilience in the financial services sector.