Oracle RAC 12c Release 2 Node Weighting

This blog post covers the changes to the node eviction algorithm in Oracle RAC 12c Release 2 based on the new Node Weighting feature. Node weighting works alongside the Oracle RAC Buddy Instance feature (https://www.dhirubhai.net/pulse/oracle-rac-introducing-recovery-buddy-anil-nair) to further improve database availability.

Introduction

The Cluster Synchronization Services daemon (CSSD), a component of Oracle Grid Infrastructure, provides "node membership" services. At a very high level, this means CSSD is the final authority on which nodes are in the cluster at any given time.


Figure 1: The node membership for "mycluster" consists of nodes {1,2,3,4}



During normal node boot, the CSSD process starts up as part of the Grid Infrastructure stack. It communicates with the CSSD processes running on the other nodes and generates a "node membership" map, which it then maintains by continuously monitoring the health of the cluster's nodes.

If the CSSD process determines that some node(s) are not heartbeating, it evicts the unhealthy node(s) from the cluster and updates its node membership map. Oracle RAC 12c Release 2 changes the algorithm by which the candidate node(s) to be evicted are chosen, for the specific case where a split brain produces sub clusters with an equal number of nodes.
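The membership-maintenance loop described above can be sketched as follows. This is an illustrative simplification only: the threshold name `MISSCOUNT_SECS` and the data structures are assumptions for the sketch, not CSSD internals (the real CSS misscount is a configurable Clusterware setting).

```python
import time

MISSCOUNT_SECS = 30  # illustrative stand-in for the CSS misscount threshold

def update_membership(membership: set, last_heartbeat: dict, now: float) -> set:
    """Evict any node whose last heartbeat is older than the threshold."""
    healthy = {n for n in membership
               if now - last_heartbeat.get(n, 0) <= MISSCOUNT_SECS}
    for n in sorted(membership - healthy):
        print(f"evicting node {n}: no heartbeat for > {MISSCOUNT_SECS}s")
    return healthy

now = time.time()
membership = {1, 2, 3, 4}
heartbeats = {1: now, 2: now, 3: now - 60, 4: now}  # node 3 stopped heartbeating
membership = update_membership(membership, heartbeats, now)
print(membership)  # node 3 has been evicted from the map
```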


Split Brain



Figure 2 above depicts cluster "mycluster" experiencing a split brain such that:

  • nodes 1,2 can communicate with each other
  • nodes 3,4 can communicate with each other
  • nodes 1,3 cannot communicate with each other
  • nodes 2,4 cannot communicate with each other

resulting in two sub clusters with two nodes each: sub cluster A {1,2} and sub cluster B {3,4}.






Oracle Pre-12c Release 2 behavior

In releases prior to Oracle RAC 12c Release 2, a split brain resulting in two sub clusters with an equal number of nodes causes the sub cluster containing the lowest node number to survive, while the other sub cluster is evicted.

Note: This behavior may change in future releases/patch sets, so customers are advised against basing their application logic on it.
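In sketch form, the pre-12.2 tie-break described above amounts to nothing more than comparing node numbers (a simplified illustration, not actual CSSD code):

```python
def pre_122_survivor(subcluster_a: set, subcluster_b: set) -> set:
    """Equal-sized split: the sub cluster containing the lowest node number survives."""
    assert len(subcluster_a) == len(subcluster_b)
    return subcluster_a if min(subcluster_a) < min(subcluster_b) else subcluster_b

# Split brain from Figure 2: {1,2} vs {3,4}
print(pre_122_survivor({1, 2}, {3, 4}))  # {1, 2} survives; {3, 4} is evicted
```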

Oracle 12c Release 2 behavior

Oracle 12c Release 2 introduces a new algorithm to choose the candidate node(s) to be evicted. Instead of blindly relying on the node number, the new algorithm attempts to intelligently choose the candidate node(s) to be evicted while ensuring that critical database workloads can continue without interruption. Some of the aspects considered are:

  • Number of services
  • Singleton services
  • Flex ASM instance
  • Public network failure
  • Type of Node (Hub or Leaf)
These and other aspects may be enhanced, added, or removed in future releases/patch sets.
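As a rough illustration only (the actual weighting inputs and their relative importance are internal to CSSD and undocumented), the idea can be sketched as a scoring function over sub clusters, with the pre-12.2 lowest-node-number rule as the tie-break. All field names and weights below are assumptions made for the sketch:

```python
from dataclasses import dataclass

@dataclass
class Node:
    number: int
    services: int = 0            # services running on the node
    singleton_services: int = 0  # singleton services weigh more
    has_flex_asm: bool = False
    public_net_ok: bool = True
    is_hub: bool = True          # Hub vs Leaf node

def weight(subcluster: list) -> tuple:
    """Illustrative score: more services, singletons, Flex ASM instances,
    and healthy Hub nodes make a sub cluster more valuable."""
    score = sum(
        n.services + 2 * n.singleton_services
        + (1 if n.has_flex_asm else 0)
        + (1 if n.public_net_ok else 0)
        + (1 if n.is_hub else 0)
        for n in subcluster
    )
    lowest = min(n.number for n in subcluster)
    return (score, -lowest)  # higher score wins; lower node number breaks ties

def survivor(a: list, b: list) -> list:
    return a if weight(a) > weight(b) else b

a = [Node(1, services=1), Node(2, services=1)]
b = [Node(3, services=4, has_flex_asm=True), Node(4, services=3)]
print([n.number for n in survivor(a, b)])  # [3, 4]
```

Under this toy scoring, sub cluster {3,4} survives despite having higher node numbers, because it carries more of the workload.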

What has not changed?

A split brain that results in an unequal number of nodes in the sub clusters causes the eviction of all the nodes in the smaller sub cluster. This behavior is the same as in previous releases and has not changed.

Figure 4 shows a split brain resulting in two sub clusters with a different number of nodes: sub cluster A contains nodes {1,2,3}, while sub cluster B contains node {4}.


In this case, the node(s) of sub cluster B {4} will be evicted, with the intent of letting the majority of the workload survive in sub cluster A, as it contains more nodes.
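The unequal-split rule is simply a majority vote, which can be written in the same sketch style (assuming, hypothetically, that nothing else differentiates the sub clusters):

```python
def unequal_split_survivor(subclusters: list) -> set:
    """The larger sub cluster survives; all nodes of the smaller one are evicted."""
    return max(subclusters, key=len)

# Figure 4: sub cluster A {1,2,3} vs sub cluster B {4}
print(unequal_split_survivor([{1, 2, 3}, {4}]))  # {1, 2, 3}
```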

Manual Override

Database Administrators can optionally mark a node, database, or service that they consider critical from a business point of view; the CSSD daemon then takes this into account while choosing the candidate node(s) to be evicted.

Define a node as critical

$ crsctl set server css_critical yes

$ crsctl get server css_critical

CRS-5092: Current value of the server attribute CSS_CRITICAL is yes.
Note: Setting CSS_CRITICAL on a server requires a restart of the Grid Infrastructure stack on that node for the new value to take effect.

Define a database or service as critical

$ srvctl modify database -db <dbname> -css_critical yes

$ srvctl modify service -db <dbname> -service <service_name> -css_critical yes

Caveats

It is possible to define multiple nodes, databases, or services as css_critical. This may not be a good idea, as under certain conditions it can result in unexpected behavior.

For example:

  • Consider a 4 node cluster with node 3 and 4 defined as CSS_CRITICAL YES
  • If a split brain results in sub cluster A with nodes {1,2} and sub cluster B with nodes {3,4}, the nodes in sub cluster A {1,2} will be evicted, as both nodes {3,4} are marked CSS_CRITICAL
  • However, if a split brain results in sub cluster A with nodes {1,3} and sub cluster B with nodes {2,4}, the CSSD process will fall back to the pre-12c Release 2 behavior, and the nodes in sub cluster B {2,4} will be evicted even though node 4 was defined as CSS_CRITICAL

Therefore, setting the CSS_CRITICAL flag on multiple nodes should either be avoided or done only after careful consideration.
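The caveat above can be made concrete in the same sketch style. This is hypothetical logic, assuming CSS_CRITICAL decides the outcome only when exactly one sub cluster holds critical nodes; otherwise the pre-12.2 lowest-node-number rule applies:

```python
def equal_split_survivor(a: set, b: set, critical: set) -> set:
    """Equal-sized split with CSS_CRITICAL markings.
    If exactly one sub cluster contains critical nodes, it survives;
    otherwise fall back to the pre-12.2 lowest-node-number rule."""
    a_crit, b_crit = bool(a & critical), bool(b & critical)
    if a_crit != b_crit:
        return a if a_crit else b
    return a if min(a) < min(b) else b

critical = {3, 4}
# All critical nodes land on one side: CSS_CRITICAL decides.
print(equal_split_survivor({1, 2}, {3, 4}, critical))  # {3, 4}
# Critical nodes split across both sides: fall back to lowest node number.
print(equal_split_survivor({1, 3}, {2, 4}, critical))  # {1, 3}
```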
