Raft Replication in Oracle Database 23ai: High Availability and Scalability Made Simple
In today's ever-evolving business landscape, data integrity, fault tolerance, and consistency in distributed systems are paramount. Oracle Database 23ai implements the Raft consensus algorithm for data replication and leader election, ensuring high availability and scalability for mission-critical distributed database deployments. In this article, I discuss the architecture of Raft replication in Oracle Database 23ai and how it provides a simple yet robust solution to the problems that distributed systems (and, by extension, NoSQL databases) aim to address. The Raft consensus algorithm is described by Ongaro and Ousterhout in the USENIX paper “In Search of an Understandable Consensus Algorithm”. The architecture described below is a high-level overview; readers are invited to consult the original paper for the detailed algorithm.
Introduction to Raft Replication
Raft replication is a built-in Oracle Database 23ai capability that integrates data replication with transaction execution in a distributed Oracle Database. Raft replication enables fast automatic failover with zero data loss. If all shards are in the same data center, sub-second failover is possible. This capability provides a uniform configuration with no primary or standby shards.
Raft Replication Terminologies
The Raft consensus protocol is used to maintain consistency between the replicas in case of failures, network partitioning, message loss, or delay. When Raft replication is enabled, a sharded database contains multiple replication units.
Replication Unit (RU) - A replication unit (RU) is a set of chunks that have the same replication topology. Each RU has multiple replicas placed on different shards.
Replication Factor (RF) - Determines the number of participants in a Raft group. This number includes the leader and its followers. An RU needs a majority of its replicas to be available for writes.
In Oracle Globally Distributed Database, the replication factor is specified for the entire database cluster; that is, all replication units in the database have the same RF. The number of followers is limited to two, so the replication factor is three.
Raft Group - Each replication unit contains exactly one chunk set and has a leader and a set of followers; together they are called a Raft group. The leader and its followers for a replication unit hold replicas of the same chunk set on different shards, as shown below. A shard can be the leader for some replication units and a follower for others. All DML statements for a particular subset of data are executed on the leader first and are then replicated to its followers.
Replication Role - Nodes (shards) in a Raft group can be in one of three states: leader, follower, or candidate. The leader is responsible for accepting client requests and managing the replication of the log to the other servers (followers). Data flows in one direction only: from the leader to the followers.
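To make the terminology concrete, here is a minimal Python sketch of a three-shard layout in which every shard leads one replication unit and follows the other two. The class, function, and shard names are hypothetical and purely illustrative; this is not Oracle's internal data model. The write check is also a simplification: it only counts available replicas and ignores the election that has to happen first when the missing replica is the leader.

```python
from dataclasses import dataclass


def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


@dataclass
class ReplicationUnit:
    ru_id: int
    leader: str            # shard holding the leader replica
    followers: list[str]   # shards holding the follower replicas

    @property
    def replication_factor(self) -> int:
        # RF counts the leader plus its followers.
        return 1 + len(self.followers)

    def writable(self, healthy_shards: set[str]) -> bool:
        """Writes need a majority of this RU's replicas to be available."""
        replicas = [self.leader, *self.followers]
        available = sum(1 for shard in replicas if shard in healthy_shards)
        return available >= quorum(self.replication_factor)


def roles_for(shard: str, units: list[ReplicationUnit]) -> dict[int, str]:
    """Map each RU id to the role the given shard plays in it."""
    roles = {}
    for ru in units:
        if ru.leader == shard:
            roles[ru.ru_id] = "leader"
        elif shard in ru.followers:
            roles[ru.ru_id] = "follower"
    return roles


# Hypothetical layout: RF = 3 (one leader, two followers) for every RU.
units = [
    ReplicationUnit(1, "shard1", ["shard2", "shard3"]),
    ReplicationUnit(2, "shard2", ["shard3", "shard1"]),
    ReplicationUnit(3, "shard3", ["shard1", "shard2"]),
]

print(quorum(3))                                  # majority size for RF = 3
print(roles_for("shard1", units))                 # shard1 leads RU 1, follows RU 2 and 3
print(units[0].writable({"shard1", "shard2"}))    # two of three replicas available
```

With a replication factor of three, the majority is two, so each replication unit can tolerate the loss of one replica and still accept writes.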
High Level Architecture
Raft replication in Oracle Database 23ai integrates data replication with transaction execution in a sharded database. This enables fast automatic failover with zero data loss. If all shards are in the same data center, it is possible to achieve sub-second failover. There is no need to configure and manage replication tools to achieve high availability.
Raft replication automatically reconfigures replication in case of shard host failures or when shards are added or removed from the sharded database. When Raft replication is enabled, a sharded database contains multiple replication units. A replication unit (RU) is a set of chunks that have the same replication topology. Each RU has multiple replicas placed on different shards. The Raft consensus protocol is used to maintain consistency between the replicas in case of failures, network partitioning, message loss, or delay.
Node failure and recovery are handled automatically with minimal impact on the application. Failover completes in under 3 seconds when network latency between availability zones (AZs) is below 10 milliseconds. This includes failure detection, shard failover, the change of leadership, and the application reconnecting to the new leader and resuming business transactions. The impact of a failure on the application can be further masked by configuring retries in the JDBC driver, so that the end user sees a request that took longer rather than an error.
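The effect of client-side retries can be illustrated with a small, driver-agnostic Python sketch. It does not use the Oracle JDBC driver or its actual settings; the exception class, function, and parameter names are assumptions made for illustration only. The point is simply that a request issued during failover is delayed rather than failed.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical error raised while a replication unit has no reachable leader."""


def run_with_retry(operation, retry_count=3, retry_delay_s=0.5, sleep=time.sleep):
    """Run operation(), retrying on transient errors up to retry_count times."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientConnectionError:
            if attempt >= retry_count:
                raise  # retries exhausted: surface the error to the caller
            attempt += 1
            sleep(retry_delay_s)  # wait before trying again


# Demo: the first two calls fail as if a leader election were in progress.
attempts = {"count": 0}


def flaky_commit():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientConnectionError("leader election in progress")
    return "committed"


waits = []  # record the delays instead of actually sleeping
result = run_with_retry(flaky_commit, retry_count=5, retry_delay_s=0.5, sleep=waits.append)
print(result, attempts["count"], waits)  # committed 3 [0.5, 0.5]
```

From the caller's point of view the transaction succeeded on the third attempt, about one second later than it otherwise would have.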
As shown in the diagram above, when the leader for a replication unit becomes unavailable, the followers initiate a new leader election using the Raft protocol. As long as a majority of the nodes (a quorum) are still healthy, the Raft protocol ensures that a new leader is elected from the available nodes. Once a new leader is elected, routing clients (e.g., UCP) are notified through ONS notifications to update their shard and chunk mapping, so that they route requests to the new leader. During this failover and reconnection period, the application can be configured to wait and retry, using the retry interval and retry count settings in the JDBC driver configuration. This is very similar to the Oracle RAC (Real Application Clusters) instance failover configuration. After connecting to the new leader, the application continues to function as before.
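The majority rule behind leader election can be sketched in a few lines of Python. This is a deliberately simplified model of the voting step in the Raft paper, not Oracle's implementation: it has no timeouts or networking, and it compares only the last log index, whereas the paper compares the last log term first and then the index. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    last_log_index: int
    healthy: bool = True
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, term: int, candidate: str, candidate_last_index: int) -> bool:
        """Grant at most one vote per term, and only to an up-to-date candidate."""
        if not self.healthy or term < self.current_term:
            return False
        if term > self.current_term:
            # A newer term resets this replica's vote.
            self.current_term = term
            self.voted_for = None
        up_to_date = candidate_last_index >= self.last_log_index
        if up_to_date and self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False


def run_election(candidate: Replica, group: list) -> bool:
    """Return True if the candidate collects votes from a majority of the group."""
    term = candidate.current_term + 1
    votes = sum(
        1 for replica in group
        if replica.request_vote(term, candidate.name, candidate.last_log_index)
    )
    return votes >= len(group) // 2 + 1


# Demo: the old leader (shard1) is down; shard2 has the most complete log.
group = [
    Replica("shard1", last_log_index=10, healthy=False),
    Replica("shard2", last_log_index=10),
    Replica("shard3", last_log_index=9),
]
print(run_election(group[1], group))  # True: shard2 gets its own vote plus shard3's
```

In this model shard3 could not win the same election, because shard2 would refuse to vote for a candidate whose log is behind its own. That rule is what prevents committed changes from being lost when leadership changes.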
When the original leader comes back online after a failure, it first tries to identify the current leader and attempts to rejoin the cluster as a follower. Once the failed shard has rejoined the cluster, the recovered follower asks the leader for the log entries after its current index in order to sync up with the leader. Raft log retention is configurable using the shardsize_per_raft_logfile_gb parameter. When the recovered follower is synchronized with the leader, it continues as a follower.
Leaders can be rebalanced among the shards by issuing the GDSCTL command SWITCHOVER RU -REBALANCE. You can build automation to monitor replication unit status and perform failovers to maintain the desired configuration. If the current leader no longer retains enough Raft logs for a recovering follower to catch up, the follower must be repopulated from one of the good followers using the GDSCTL command COPY RU.
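The choice between replaying logs and copying the replication unit can be expressed as a small decision function. The Python sketch below is only a model of the reasoning described above, under the assumption that the leader retains a contiguous range of log entries; the class, field, and return values are hypothetical and do not correspond to any Oracle API.

```python
from dataclasses import dataclass


@dataclass
class LeaderLog:
    first_retained_index: int  # oldest entry still kept in the leader's Raft logs
    last_index: int            # newest entry in the leader's Raft logs


def catch_up_plan(leader: LeaderLog, follower_last_index: int) -> str:
    """Decide how a recovered follower gets back in sync with the leader."""
    next_needed = follower_last_index + 1
    if next_needed > leader.last_index:
        return "in sync"
    if next_needed >= leader.first_retained_index:
        # The leader still has every entry the follower is missing.
        return f"replay entries {next_needed}..{leader.last_index}"
    # The missing entries were already purged, so log replay is not possible;
    # the follower has to be repopulated (the COPY RU case described above).
    return "copy replication unit"


leader = LeaderLog(first_retained_index=100, last_index=250)
print(catch_up_plan(leader, 250))  # in sync
print(catch_up_plan(leader, 180))  # replay entries 181..250
print(catch_up_plan(leader, 50))   # copy replication unit
```

A longer log retention setting widens the window in which a recovering follower can catch up from logs alone, at the cost of more storage on the leader.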
Benefits of Raft Replication Architecture
The Business Problems Solved with Raft Replication in Oracle Database 23ai
1. Data Consistency
Consistency is the lifeblood of many business-critical applications, from financial systems to e-commerce platforms. Ensuring that all nodes in a distributed database have the same, up-to-date data is a non-negotiable requirement. Raft replication in Oracle Database 23ai guarantees data consistency, enabling businesses to operate with confidence.
2. Fault Tolerance
System outages, network partitions, and node failures are unfortunate realities in the world of distributed systems. Oracle Database 23ai leverages Raft to provide the fault tolerance needed to keep your business operations running smoothly, even after the loss of an entire data center or availability zone.
3. Leader Election
In a distributed environment, coordinating writes and ensuring a linearizable order of commands is essential. Raft in Oracle Database 23ai elects a leader node that manages this coordination, preventing data conflicts and inconsistencies. Business applications that require coordinated, ordered operations benefit immensely from this feature.
4. Simplicity
Complexity can be a significant hurdle when working with distributed systems. Oracle Database 23ai's implementation of Raft offers simplicity and understandability, making it easier to design, deploy, and troubleshoot your systems. Your technical teams can spend more time innovating and less time deciphering the intricacies of managing replication tools and software.
A Glimpse into the Future
In conclusion, the incorporation of the Raft consensus algorithm in Oracle Database 23ai is not just an upgrade; it's a leap forward. It signifies a deep commitment to delivering data management solutions that empower businesses to navigate the challenges of today and seize new opportunities. Along with Raft replication, we also launched other cool features in Oracle Database 23ai, such as Directory Based Sharding Method, Automatic Data Move on Sharding Key Update, Fine-Grained Refresh Rate Control for Duplicated Tables, and Synchronous Duplicated Tables. With Raft replication in Oracle Database 23ai, businesses can trust in the consistency, reliability, and scalability of their data, enabling them to focus on what truly matters: innovation, growth, and success.
Disclaimer
The information provided in this article is intended for information purposes only. The views and opinions shared are solely mine, and this article may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.
2 weeks ago: Follow-up questions, if anyone cares to answer. 1- How does Oracle decide which subset of data (DML) belongs to which replication unit within a shard? Is it defined at table creation time? 2- Can we say that replication units are basically partitions (horizontally scaled) within a shard that hold multiple tables? 3- What if a particular replication unit becomes a hot spot? Is it possible to split the replication unit into two or more replication units (without downtime)? 4- When a dead node comes back online, how do we determine that the node has synced all the changes from the other nodes and is ready to become operational? Is it a manual process, or is it handled automatically within the Oracle cluster?
8 months ago: Hi Kehinde Otubamowo. Nice article. Could you please clarify the following: How will licensing work for Raft replication? Will it be separate from the existing licensing options?
9 months ago: Hey Kehinde Otubamowo. I can't believe I'm just seeing this article now! Really great and encompassing review of Raft replication in Oracle Database 23c. I especially liked how you detailed the architecture and the logical flow of Raft replication, shedding light on its importance for data integrity, fault tolerance, and consistency in distributed systems. This is particularly pertinent to my current focus as I'm actively working on getting AWS certified, and understanding these concepts is invaluable for designing resilient cloud architectures. Thanks for sharing, Kehinde Otubamowo
Looking forward to seeing more posts on this awesome new feature Kehinde Otubamowo. Good stuff!
1 year ago: Would love to see this in an Oracle LiveLab so we can all try it out! Good info!