Unable to switch over with far sync involved (12.2)

I think I have run into something with my Data Guard (DG) configuration.

My primary is a RAC One Node, my AWS standbys are single instances.

Here is my configuration:

DGMGRL> show configuration;

Configuration - cellar_dg_t

  Protection Mode: MaxAvailability
  Members:
  cellarmxt    - Primary database
    cellar1xt    - Physical standby database 
    cellar2xt    - Physical standby database 
    cellarmxtfs1 - Far sync instance 
      cellarmwt    - Physical standby database 
      cellar1wt    - Physical standby database 
      cellar2wt    - Physical standby database 

  Members Not Receiving Redo:
  cellarmxtfs2 - Far sync instance (alternate of cellarmxtfs1)
  cellarmwtfs2 - Far sync instance 
  cellarmwtfs1 - Far sync instance 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 49 seconds ago)

I have far sync configured for high availability: if cellarmxtfs1 goes down for any reason, cellarmxtfs2 takes over.
This works as expected, and I have the same setup on the AWS side (cellarmwtfs1/cellarmwtfs2).

Changes are shipped to all standbys on the AWS side.

Now I want to switch over to cellarmwt on the AWS side.

I first ran validate database 'cellarmwt':

DGMGRL> validate database 'cellarmwt';

  Database Role:     Physical standby database
  Primary Database:  cellarmxt

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Managed by Clusterware:
    cellarmxt:  YES            
    cellarmwt:  YES

OK, that looks good.

I also checked the v$archived_log and v$log_history views on the primary and the standby to confirm that they were in sync.

Then I tried the SQL command, just to be sure:

alter database switchover to 'cellarmwt' verify;
*
ERROR at line 1:
ORA-16467: switchover target is not synchronized

What? Not in sync?
Hmm, there is nothing in the alert.log of the primary, the standby, or even the far sync.

I decided to set LOG_ARCHIVE_TRACE to 10240 on the far sync, since the switchover goes through it.

edit far_sync 'cellarmxtfs1' set property 'LogArchiveTrace'=10240;

Why 10240?
Because v$archive_dest_status on the far sync reported an unresolvable gap.

So gap resolution is involved (2048), plus MRP (8192): 2048 + 8192 = 10240.
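
The arithmetic can be checked with a short Python sketch. LOG_ARCHIVE_TRACE levels combine by addition, so a value can be split back into its individual bits. The function name `split_trace_level` is hypothetical, and the labels are only the two used in this post, not a complete list of trace levels.

```python
def split_trace_level(level):
    """Return the power-of-two bits that add up to a LOG_ARCHIVE_TRACE level."""
    bits = []
    bit = 1
    while bit <= level:
        if level & bit:
            bits.append(bit)
        bit <<= 1
    return bits


# Labels as used in this post (assumption: not an exhaustive list).
LABELS = {2048: "gap resolution", 8192: "MRP"}

for bit in split_trace_level(10240):
    print(bit, LABELS.get(bit, "unlabelled"))
```

The same helper can be used to read back a trace level that someone else has set on a destination.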

I retried the command:

alter database switchover to 'cellarmwt' verify;
*
ERROR at line 1:
ORA-16467: switchover target is not synchronized

On the far sync I now have a nice rmi trace file.

And surprise:

RMI startup request received from PID 19348 

kcv_recv_so_to_directconn: Received SWITCHOVER request(PING) to switchover target 1606120691 
kcv_recv_so_to_directconn: found direct connection, get gap status 
*** 2019-10-16 12:10:15.537972 1067 krsg.c 

krsg_ping: Performing PING on thread 1 
krsg_gap_ping: Setting LE entry as invalid 
krsg_check_curseq_state: Checking AL for sequence 19349 
krsg_check_connection: Establishing link for LOG_ARCHIVE_DEST_3 to standby cellarmwt 
SUCCESS: retrieved DB password file location:+DATA3/orapwCELLARMXT 
*** 2019-10-16 12:10:16.033641 3207 krsg.c 
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1) 
stop log B-985533585.T-1.S-19349 
krsg_ping_by_dest: Target recovery incomplete 
*** 2019-10-16 12:10:16.079231 1067 krsg.c 

krsg_ping: Performing PING on thread 2 <--------------------- where does thread 2 come from? cellarmwt, cellar1wt and cellar2wt are single instances. 

krsg_gap_ping: Setting LE entry as invalid 
krsg_check_curseq_state: Checking AL for sequence 31 
krsg_check_connection: Establishing link for LOG_ARCHIVE_DEST_3 to standby cellarmwt 
*** 2019-10-16 12:10:16.100581 3207 krsg.c 
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1) 
stop log B-985533585.T-2.S-31 
*** 2019-10-16 12:10:16.184806 3540 krsg.c 
krsg_gap_ping: Gap ping detects gap for cellarmwt 
GAP - SCN range: 0x0000001626f9fac8 - 0x0000001626f9fac8 
DBID 1018469905 branch 985533585 
...Attempt to queue request 
krsg_ping_by_dest: Discovered a gap <------------------------------ This is the problem 

krsg_ping_by_dest: Target recovery incomplete 

I get the same trace with cellarmwt, cellar1wt and cellar2wt.
It checks thread 2 for archived log (AL) sequence 31, although cellarmwt, cellar1wt and cellar2wt are single instances.
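
To compare traces across the three standbys, a small Python sketch like the one below can summarize which threads were pinged and where a gap was reported. The line patterns are copied from the excerpt above; the trace format is not documented, so both the regular expressions and the choice to attach a GAP line to the most recently pinged thread are assumptions. The function name `summarize_pings` is hypothetical.

```python
import re

# Lines copied from the far sync trace excerpt in this post.
TRACE = """\
krsg_ping: Performing PING on thread 1
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1)
stop log B-985533585.T-1.S-19349
krsg_ping: Performing PING on thread 2
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1)
stop log B-985533585.T-2.S-31
krsg_gap_ping: Gap ping detects gap for cellarmwt
GAP - SCN range: 0x0000001626f9fac8 - 0x0000001626f9fac8
"""


def summarize_pings(text):
    """Map each pinged thread to its stop-log sequence and any gap SCN range."""
    result = {}
    current = None
    for line in text.splitlines():
        m = re.search(r"Performing PING on thread (\d+)", line)
        if m:
            current = int(m.group(1))
            result[current] = {"stop_seq": None, "gap_scn": None}
            continue
        if current is None:
            continue  # ignore anything before the first PING line
        m = re.search(r"stop log B-\d+\.T-\d+\.S-(\d+)", line)
        if m:
            result[current]["stop_seq"] = int(m.group(1))
        m = re.search(r"GAP - SCN range: (0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+)", line)
        if m:
            # Convert the hex SCNs to decimal so they can be compared with SCNs from SQL.
            result[current]["gap_scn"] = (int(m.group(1), 16), int(m.group(2), 16))
    return result


for thread, info in summarize_pings(TRACE).items():
    print(thread, info)
```

On this excerpt the gap is attached to thread 2 only, which matches what I read in the trace by eye.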

My primary is RAC One Node:

SQL> select THREAD#,STATUS,ENABLED,INSTANCE,SEQUENCE#,LAST_REDO_SEQUENCE#,LAST_REDO_TIME
  2  from gv$thread
  3  ;

   THREAD# STATUS ENABLED  INSTANCE     SEQUENCE# LAST_REDO_SEQUENCE# LAST_REDO_TIME
---------- ------ -------- ----------- ---------- ------------------- -------------------
         1 OPEN   PUBLIC   CELLARMXT_1      19356               19356 16/10/2019 14:52:21
         2 CLOSED PUBLIC   CELLARMXT_2         31                  30 22/11/2018 11:00:54


On the standbys:
	 
SQL> select THREAD#,STATUS,ENABLED,INSTANCE,SEQUENCE#,LAST_REDO_SEQUENCE#,LAST_REDO_TIME
  2  from gv$thread
  3  ;

   THREAD# STATUS ENABLED  INSTANCE     SEQUENCE# LAST_REDO_SEQUENCE# LAST_REDO_TIME
---------- ------ -------- ----------- ---------- ------------------- -------------------
         1 OPEN   PUBLIC   CELLARMWT        19356               19356 14/08/2019 10:43:08
         2 CLOSED PUBLIC   CELLARMXT_2         31                  31 22/11/2018 11:00:54

Thread 2 is closed.

But the far sync checks every thread, even a closed one.

That is surely the problem.

Without the far sync in the loop (removed or disabled), I am able to switch over.

I have opened an SR for this.

