Unable to switchover with farsync involved (12.2)
Stéphane MAURIZIO
Deputy Head of Department / Database Administrator (Oracle/Postgres) - Centre des technologies de l'information de l'état (CTIE)
I think I have run into something with my Data Guard configuration. My primary is a RAC One Node database; my AWS standbys are single instances. Here is my configuration:

    DGMGRL> show configuration;

    Configuration - cellar_dg_t

    Protection Mode: MaxAvailability
    Members:
    cellarmxt    - Primary database
    cellar1xt    - Physical standby database
    cellar2xt    - Physical standby database
    cellarmxtfs1 - Far sync instance
    cellarmwt    - Physical standby database
    cellar1wt    - Physical standby database
    cellar2wt    - Physical standby database

    Members Not Receiving Redo:
    cellarmxtfs2 - Far sync instance (alternate of cellarmxtfs1)
    cellarmwtfs2 - Far sync instance
    cellarmwtfs1 - Far sync instance

    Fast-Start Failover: DISABLED

    Configuration Status:
    SUCCESS (status updated 49 seconds ago)

I have far sync configured in HA (meaning that if mxtfs1 is down for some reason, mxtfs2 takes over). This works as expected, and I have the same setup on the AWS side (mwtfs1/mwtfs2). Changes are sent to all standbys on the AWS side.

Now I want to switch over to cellarmwt on the AWS side. I first ran validate database on 'cellarmwt':

    DGMGRL> validate database 'cellarmwt';

    Database Role:     Physical standby database
    Primary Database:  cellarmxt

    Ready for Switchover:  Yes
    Ready for Failover:    Yes (Primary Running)

    Managed by Clusterware:
    cellarmxt:  YES
    cellarmwt:  YES

OK, that looks good. I also checked the v$archived_log and v$log_history views on the primary and the standby to confirm that they were in sync. Then I tried the SQL command to be sure:

    alter database switchover to 'cellarmwt' verify;
    *
    ERROR at line 1:
    ORA-16467: switchover target is not synchronized

What? Not in sync? Hmm, there is nothing in the alert.log of the primary or the standby, or even of the far sync. Since the switchover goes through the far sync, I decided to set LOG_ARCHIVE_TRACE to 10240 on it:

    edit far_sync 'cellarmxtfs1' set property 'LogArchiveTrace'=10240;

Why 10240? Because v$archive_dest_status on the far sync complained about an unresolvable gap.
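LOG_ARCHIVE_TRACE is a bitmask, so a value such as 10240 is simply a sum of individual power-of-two trace levels. As a quick check, here is a minimal Python sketch that splits a value into its components (the helper name split_trace_level is illustrative, not an Oracle tool):

```python
def split_trace_level(level):
    """Return the power-of-two flags that add up to `level`."""
    flags = []
    bit = 1
    while bit <= level:
        if level & bit:          # this bit is set in the combined value
            flags.append(bit)
        bit <<= 1
    return flags


print(split_trace_level(10240))  # [2048, 8192]
```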
So gap resolution is involved (2048), plus MRP (8192), which adds up to 10240. I retried the command:

    alter database switchover to 'cellarmwt' verify;
    *
    ERROR at line 1:
    ORA-16467: switchover target is not synchronized

and on the far sync I now have a nice rmi trace file. And surprise:

    RMI startup request received from PID 19348
    kcv_recv_so_to_directconn: Received SWITCHOVER request(PING) to switchover target 1606120691
    kcv_recv_so_to_directconn: found direct connection, get gap status
    *** 2019-10-16 12:10:15.537972 1067 krsg.c
    krsg_ping: Performing PING on thread 1
    krsg_gap_ping: Setting LE entry as invalid
    krsg_check_curseq_state: Checking AL for sequence 19349
    krsg_check_connection: Establishing link for LOG_ARCHIVE_DEST_3 to standby cellarmwt
    SUCCESS: retrieved DB password file location:+DATA3/orapwCELLARMXT
    *** 2019-10-16 12:10:16.033641 3207 krsg.c
    krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1) stop log B-985533585.T-1.S-19349
    krsg_ping_by_dest: Target recovery incomplete
    *** 2019-10-16 12:10:16.079231 1067 krsg.c
    krsg_ping: Performing PING on thread 2    <--- where did it find thread 2? cellarmwt, cellar1wt and cellar2wt are single instances
    krsg_gap_ping: Setting LE entry as invalid
    krsg_check_curseq_state: Checking AL for sequence 31
    krsg_check_connection: Establishing link for LOG_ARCHIVE_DEST_3 to standby cellarmwt
    *** 2019-10-16 12:10:16.100581 3207 krsg.c
    krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1) stop log B-985533585.T-2.S-31
    *** 2019-10-16 12:10:16.184806 3540 krsg.c
    krsg_gap_ping: Gap ping detects gap for cellarmwt
    GAP - SCN range: 0x0000001626f9fac8 - 0x0000001626f9fac8
    DBID 1018469905 branch 985533585
    ...Attempt to queue request
    krsg_ping_by_dest: Discovered a gap    <--- this is the problem
    krsg_ping_by_dest: Target recovery incomplete

I get the same traces with cellarmwt, cellar1wt and cellar2wt.
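When the trace file gets long, it can help to extract just the thread on which the gap was reported. The following is a rough Python sketch, assuming the trace keeps the "Performing PING on thread N" and "Discovered a gap" messages seen above; the function name threads_with_gap is made up for illustration, and the sample text is an excerpt of the trace from this post:

```python
import re

# Excerpt of the far sync trace shown in the post (timestamps omitted).
TRACE = """\
krsg_ping: Performing PING on thread 1
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1)
krsg_ping_by_dest: Target recovery incomplete
krsg_ping: Performing PING on thread 2
krsg_gap_ping: Pinging LOG_ARCHIVE_DEST_3 at cellarmwt (ping iteration 1)
krsg_gap_ping: Gap ping detects gap for cellarmwt
krsg_ping_by_dest: Discovered a gap
krsg_ping_by_dest: Target recovery incomplete
"""


def threads_with_gap(trace_text):
    """Return the thread numbers whose ping section reports a gap."""
    current = None   # thread currently being pinged
    found = []
    for line in trace_text.splitlines():
        match = re.search(r"Performing PING on thread (\d+)", line)
        if match:
            current = int(match.group(1))
        elif "Discovered a gap" in line and current is not None:
            if current not in found:
                found.append(current)
    return found


print(threads_with_gap(TRACE))  # [2]
```

On this excerpt only thread 2 is reported, which matches the annotation in the trace.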
It is looking for AL sequence 31 on thread 2, yet cellarmwt, cellar1wt and cellar2wt are single instances. My primary is RAC One Node:

    SQL> select THREAD#,STATUS,ENABLED,INSTANCE,SEQUENCE#,LAST_REDO_SEQUENCE#,LAST_REDO_TIME
      2  from gv$thread
      3  ;

       THREAD# STATUS ENABLED  INSTANCE     SEQUENCE# LAST_REDO_SEQUENCE# LAST_REDO_TIME
    ---------- ------ -------- ----------- ---------- ------------------- -------------------
             1 OPEN   PUBLIC   CELLARMXT_1      19356               19356 16/10/2019 14:52:21
             2 CLOSED PUBLIC   CELLARMXT_2         31                  30 22/11/2018 11:00:54

On the standbys:

    SQL> select THREAD#,STATUS,ENABLED,INSTANCE,SEQUENCE#,LAST_REDO_SEQUENCE#,LAST_REDO_TIME
      2  from gv$thread
      3  ;

       THREAD# STATUS ENABLED  INSTANCE     SEQUENCE# LAST_REDO_SEQUENCE# LAST_REDO_TIME
    ---------- ------ -------- ----------- ---------- ------------------- -------------------
             1 OPEN   PUBLIC   CELLARMWT        19356               19356 14/08/2019 10:43:08
             2 CLOSED PUBLIC   CELLARMXT_2         31                  31 22/11/2018 11:00:54

Thread 2 is closed, but the far sync checks every thread, even a closed one. That is surely the problem. Without the far sync in the loop (removed or disabled), I am able to switch over. I have opened an SR for this.
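To spot at a glance which threads fit the pattern described above (still enabled but closed on the primary), the gv$thread rows can be filtered. Here is a small Python sketch with the primary's rows from this post hard-coded; ThreadRow and closed_enabled_threads are illustrative names, and the idea that such threads are what trips the far sync ping is still only the hypothesis above, not a confirmed diagnosis:

```python
from collections import namedtuple

ThreadRow = namedtuple(
    "ThreadRow",
    "thread status enabled instance sequence last_redo_sequence",
)

# Rows copied from the primary's gv$thread output in this post.
PRIMARY = [
    ThreadRow(1, "OPEN", "PUBLIC", "CELLARMXT_1", 19356, 19356),
    ThreadRow(2, "CLOSED", "PUBLIC", "CELLARMXT_2", 31, 30),
]


def closed_enabled_threads(rows):
    """Return thread numbers that are enabled but currently closed."""
    return [
        row.thread
        for row in rows
        if row.status == "CLOSED" and row.enabled != "DISABLED"
    ]


print(closed_enabled_threads(PRIMARY))  # [2]
```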