Oracle RAC 12c Release 2 Node Weighting
This blog post covers the changes to the node eviction algorithm in Oracle RAC 12c Release 2 based on the new Node Weighting feature. Node Weighting works alongside the Oracle RAC Buddy Instance feature (https://www.dhirubhai.net/pulse/oracle-rac-introducing-recovery-buddy-anil-nair) to further improve database availability.
Introduction
The Oracle Grid Infrastructure component CSSD (Cluster Synchronization Services daemon) provides "node membership" functionality. At a very high level, this translates to "CSSD being the final authority on how many nodes are in the cluster at any given time".
The node membership for "mycluster" consists of nodes {1,2,3,4}
During normal node boot, the CSSD process starts as part of the Grid Infrastructure stack startup. It communicates with the CSSD processes running on the other nodes and generates a "node membership" map. It then maintains that node membership map by continuously monitoring the health of the nodes in the cluster.
If the CSSD process determines that some node(s) are not heartbeating, it evicts the unhealthy node(s) from the cluster and updates its node membership map. Oracle RAC 12c Release 2 introduces a change in the algorithm by which the candidate node(s) to be evicted are chosen, for the specific case where a split brain results in sub clusters with an equal number of nodes.
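The current node membership map can be inspected from any cluster node. A minimal check, assuming the Grid Infrastructure `bin` directory is in the PATH and hypothetical node names, might look like:

```shell
# List the cluster nodes known to CSSD, with their node numbers
# and status (olsnodes ships with Grid Infrastructure)
olsnodes -n -s
# Illustrative output for "mycluster":
# node1  1  Active
# node2  2  Active
# node3  3  Active
# node4  4  Active
```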
Split Brain
Figure 2 above depicts Cluster mycluster experiencing a split brain such that
- nodes 1,2 can communicate with each other
- nodes 3,4 can communicate with each other
- nodes 1,3 cannot communicate with each other
- nodes 2,4 cannot communicate with each other
resulting in two sub clusters with 2 nodes each as shown below
Oracle Pre-12c Release 2 behavior
In releases prior to Oracle RAC 12c Release 2, a split brain resulting in two sub clusters with an equal number of nodes causes the sub cluster containing the lowest node number to survive, while the other sub cluster is evicted.
Note: This behavior may change in future releases/patch sets, so customers are advised against basing their application logic on it.
Oracle 12c Release 2 behavior
Oracle 12c Release 2 introduces a new algorithm to choose the candidate node(s) to be evicted. Instead of blindly relying on the node number, the new algorithm attempts to intelligently choose the candidate node(s) while ensuring that critical database workload can continue without interruption. Some of the aspects considered are:
- Number of services
- Singleton services
- Flex ASM instance
- Public network failure
- Type of Node (Hub or Leaf)
- Etc.
Additional aspects may be added, removed, or refined in future releases/patch sets.
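Conceptually, the tie-break can be thought of as a weighting function over the two sub clusters. The following plain-shell sketch is a hypothetical illustration of the idea, not Oracle's actual implementation: each sub cluster's weight counts the critical components it hosts, and the lowest node number breaks a remaining tie, as in pre-12.2 behavior.

```shell
#!/bin/sh
# Hypothetical illustration of a node-weighting tie-break; NOT Oracle's code.
# Arguments: weight_A lowest_node_A weight_B lowest_node_B
# Prints the surviving sub cluster ("A" or "B").
pick_survivor() {
  weight_a=$1; low_a=$2; weight_b=$3; low_b=$4
  if [ "$weight_a" -gt "$weight_b" ]; then
    echo "A"
  elif [ "$weight_b" -gt "$weight_a" ]; then
    echo "B"
  elif [ "$low_a" -lt "$low_b" ]; then
    # Equal weights: fall back to pre-12.2 behavior
    # (sub cluster with the lowest node number survives)
    echo "A"
  else
    echo "B"
  fi
}

pick_survivor 2 1 0 3   # A hosts 2 critical components -> prints A
pick_survivor 1 1 1 2   # tie on weight; node 1 is in A -> prints A
```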
What has not changed?
A split brain that results in sub clusters with an unequal number of nodes results in the eviction of all the nodes in the smaller sub cluster. This behavior is the same as in previous releases and has not changed.
Figure 4 shows a split brain resulting in two sub clusters with a different number of nodes: sub cluster A contains nodes {1,2,3}, while sub cluster B contains node {4}.
In this case, the node(s) of sub cluster B {4} will be evicted, with the intent of letting the majority of the work survive in sub cluster A, as it contains more nodes.
Manual Override
Database administrators can optionally define a node, database, or service that they consider critical from a business point of view, to be evaluated by the CSSD daemon while choosing the candidate node(s) to be evicted.
Define a node as critical
$crsctl set server css_critical yes
$crsctl get server css_critical
CRS-5092: Current value of the server attribute CSS_CRITICAL is yes.
Note: Setting a server as css_critical requires restarting the Grid Infrastructure stack on that node.
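As a concrete sequence, assuming `$GRID_HOME` points at the Grid Infrastructure home, the restart on that node might look like:

```shell
# Stop and restart the Grid Infrastructure stack on the local node
# so that the new CSS_CRITICAL value takes effect (run as root)
$GRID_HOME/bin/crsctl stop crs
$GRID_HOME/bin/crsctl start crs
```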
Define a database or service as critical
$srvctl modify database -db <dbname> -css_critical yes
$srvctl modify service -db <dbname> -service <service_name> -css_critical yes
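The setting can be verified afterwards. Assuming 12.2 syntax, `srvctl config` reports the CSS critical flag among the database's configured attributes:

```shell
# Check whether the database is flagged as CSS critical
# (<dbname> is a placeholder for your database unique name)
srvctl config database -db <dbname>
# The output should include a "CSS critical" attribute among the
# database's configuration settings.
```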
Caveats
It is possible to define multiple nodes, databases, or services as css_critical. This may not be a good idea, as under certain conditions it can result in unexpected behavior.
For example:
- Consider a 4 node cluster with nodes 3 and 4 defined as CSS_CRITICAL YES
- If there is a split brain that results in sub cluster A with nodes {1,2} and sub cluster B with nodes {3,4}, then the nodes in sub cluster A {1,2} will be evicted, as both nodes {3,4} are marked as CSS_CRITICAL
- However, if there is a split brain that results in sub cluster A with nodes {1,3} and sub cluster B with nodes {2,4}, then the CSSD process will fall back to the Oracle RAC pre-12c Release 2 behavior and the nodes in sub cluster B {2,4} will be evicted, even though node 4 was defined as CSS_CRITICAL
Therefore, setting the CSS_CRITICAL flag on multiple nodes should either be avoided or done only after careful consideration.
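Before relying on the flag, it is worth auditing where it is already set across the cluster. One way, assuming the 12.2 `crsctl status server` full-attribute output includes CSS_CRITICAL, is:

```shell
# Show the full attribute list for every server in the cluster,
# then filter for the server name and the CSS_CRITICAL attribute
crsctl status server -f | grep -E 'NAME=|CSS_CRITICAL='
```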