High availability strategy for Oracle Cloud regions

Different customers’ workloads require different levels of business continuity. Whatever the need, Oracle Cloud Infrastructure (OCI) offers cloud regions with multiple fault domains to address most outages at a metropolitan scale. These fault domains are similar to the availability zones offered by other cloud service providers. For customers that require a higher level of protection or disaster recovery, a dual-region deployment can offer superior protection for about the same effort as deploying across multiple availability zones.

OCI’s foundational high availability building block within a cloud region is the fault domain. Each fault domain represents a virtual data center with power and network redundancy at the physical server, rack power distribution unit (PDU), and top of rack (TOR) switch level. Software deployment to a fault domain’s servers and smartNICs is staggered to limit the impact of a faulty release. Fault domains also let customers avoid single points of failure when deploying resources, which helps keep applications available.
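As a rough illustration of avoiding a single point of failure in resource deployment, the sketch below spreads a set of instances evenly across three fault domains in round-robin order. It is a planning aid only and makes no OCI API calls. The fault domain labels and the instance names (web-1 and so on) are assumptions chosen for the example, and the function name is hypothetical.

```python
from collections import Counter
from itertools import cycle

# Assumed labels for the three fault domains in one availability domain.
FAULT_DOMAINS = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"]


def spread_across_fault_domains(instance_names, fault_domains=FAULT_DOMAINS):
    """Assign each instance to a fault domain in round-robin order.

    Returns a dict mapping instance name -> fault domain label.
    """
    return dict(zip(instance_names, cycle(fault_domains)))


# Hypothetical instance names, for illustration only.
placement = spread_across_fault_domains([f"web-{i}" for i in range(1, 6)])
counts = Counter(placement.values())

for name, fault_domain in placement.items():
    print(f"{name} -> {fault_domain}")
```

With five instances, no fault domain holds more than two of them, so losing any one fault domain still leaves at least three instances running.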

Fault domains, outages, and high availability

In Gartner’s words, fault domains are “groupings of hardware that effectively form logical data centers within the AD (Availability Domain), thus providing AZ-like (Availability Zone) options for resilience.” OCI services are either natively replicated (like block volumes) or distributed across the fault domains (like load balancers) to address the most common causes of downtime: individual hardware component failures and software configuration, patch, or update errors.

According to Gartner’s research, the availability of cloud services depends on the following variables:

  • Logical (software) design
  • Implementation quality
  • Deployment processes
  • Operational processes
  • Physical design

Cloud service resilience is different from that of on-premises systems. According to Gartner, “you’re dealing with systems where software issues are almost always the root cause.” For software issues, the protection offered by three OCI fault domains is comparable to three availability zones, thanks to staggered software deployment at the fault domain level. Perhaps unintuitively, physical data center downtime is a less common scenario. Uptime’s 2021 annual survey on data center outages and their causes shows that only 6 percent of respondents said their facilities experienced severe (“Uptime category 5”) outages in the past three years, while software, IT, and network issues accounted for almost three in four of the publicly reported outages. According to the Uptime research, “the underlying data center infrastructure is becoming less of a focus or a single point of failure.”

So, the use of fault domains can mitigate most planned and unplanned outages inside a single region and enables OCI to offer the same service level agreements (SLAs) on availability, performance, and manageability for most OCI services, regardless of the number of availability domains in a specific region. The only exceptions are Compute and VMware, where the monthly availability SLA is 99.95% versus 99.99% depending on the region’s availability domains, which corresponds to roughly 22 minutes versus 4 minutes of allowable downtime per month.
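The downtime figures follow directly from the SLA percentages. The short sketch below works through the arithmetic, assuming a 30-day month (the source does not state the month length it uses); the function name is illustrative.

```python
def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Return the downtime, in minutes, that a monthly availability SLA permits.

    days_in_month=30 is an assumption made for this example.
    """
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per month")
# 99.95% -> 21.6 minutes per month
# 99.99% -> 4.3 minutes per month
```

Rounded to whole minutes, these values match the 22 and 4 minutes quoted above.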

If your application supports a clustered architecture across three availability domains to provide high availability, it is protected against physical data center downtime but not against a region-wide event. For customers that need to protect against data center-level unavailability, or when the highest availability is required, OCI recommends replication across two Oracle Cloud regions. To enable geographic disaster recovery, see Cloud Architecture Framework - Extreme Reliability. Oracle Cloud regions reside in different cities to provide geographical distance and rely on different power grids, network infrastructure, and flood plains, offering resiliency against data center failures and natural disasters.

OCI services in an Oracle Cloud region are independent of other regions, except for a handful of cross-regional services (IAM and network firewalls), and software deployment is staggered from region to region to limit the impact of a faulty release. As of September 2022, OCI has 40 cloud regions around the world, with at least two regions in each of 10 countries and in the European Union. Oracle plans to deploy at least two Oracle Cloud regions in each country so that customers can keep data within a specific country to meet data residency regulations.

Conclusion

Many core services offer native replication inside and across regions. Using it, a deployment across fault domains in one Oracle Cloud region, with replication to a second region, provides the highest level of regional business continuity for about the same amount of effort as a multi-availability-zone deployment.

Learn more about Extreme Availability, or try OCI for free!

Originally appeared on the Oracle Cloud Infrastructure Blog.

Author: Francesco Burruano – Oracle Cloud Regions Product Marketing Manager
