How to Failover to a Standby Database Manually and Rebuild Data Guard - By Example

  • by
Since a fast-start failover (automatic failover) could become a false failover when the observer or the standby database cannot connect to the primary database within a specific time, which may cost the database to lose some transactions followed by reinstating or recreating the standby database (the former primary database).

Therefore, it could be a bad idea to enable fast-start failover under an unreliable network environment. In case we hit a false failover, we should disable the fast-start failover, instead, we can design a manual failover procedure to reduce database down time if MTTR is more tolerant than normal.

In this type of failover, we can analysis and distinguish all symptoms of database environment to avoid a false failover, moreover, we have more options to choose during a failover process.

Now, let's see a demonstration of a manual failover, and then rebuild Data Guard envrionment.
  1. Check current standby database status.
  2. [oracle@standby01 ~]$ sqlplus / as sysdba
    SQL> SELECT inst_id, open_mode, database_role, switchover_status FROM gv$database;

       INST_ID OPEN_MODE            DATABASE_ROLE    SWITCHOVER_STATUS
    ---------- -------------------- ---------------- --------------------
             1 MOUNTED              PHYSICAL STANDBY NOT ALLOWED
             2 MOUNTED              PHYSICAL STANDBY NOT ALLOWED

    It's a normal standby database.
  3. Shutdown abort the current primary database.
  4. [oracle@primary01 ~]$ srvctl stop database -d compdb -o abort
    This step is to simulate a blackout of the primary database.
  5. Check applying status of current standby database.
  6. SQL> SELECT inst_id, process, status, thread#, sequence#, block# FROM gv$managed_standby WHERE process IN ('RFS','LNS','MRP0');

       INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#
    ---------- --------- ------------ ---------- ---------- ----------
             1 MRP0      WAIT_FOR_LOG          1        364         00

    Only "WAIT_FOR_LOG" is left on MRP0.
  7. Check Data Guard Broker status before failover.
  8. [oracle@standby01 ~]$ dgmgrl
    ...
    DGMGRL> connect sys@standb1
    Password:
    Connected.
    DGMGRL> show configuration

    Configuration - COMPDR

      Protection Mode: MaxPerformance
      Databases:
        primdb - Primary database
        standb - Physical standby database

    Fast-Start Failover: DISABLED

    Configuration Status:
    ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
    ORA-16625: cannot reach database "primdb"
    DGM-17017: unable to determine configuration status

    DGMGRL> show database primdb

    Database - primdb

      Role:            PRIMARY
      Intended State:  TRANSPORT-ON
      Instance(s):
        primdb1
        primdb2

    Database Status:
    DGM-17016: failed to retrieve status for database "primdb"
    ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
    ORA-16625: cannot reach database "primdb"

    DGMGRL> show database standb

    Database - standb

      Role:            PHYSICAL STANDBY
      Intended State:  APPLY-ON
      Transport Lag:   0 seconds
      Apply Lag:       0 seconds
      Real Time Query: OFF
      Instance(s):
        standb1 (apply instance)
        standb2

    Database Status:
    SUCCESS

    You can see the database primdb cannot be connected.
  9. Do a complete failover.
  10. DGMGRL> failover to standb;
    Performing failover NOW, please wait...
    Failover succeeded, new primary is "standb"

    If the time is very demanding or a complete failover cannot be done, you need a immediate failover.
    DGMGRL> failover to standb immediate;
  11. Check current primary database (former standby database).
  12. [oracle@standby01 ~]$ sqlplus / as sysdba

    SQL> SELECT inst_id, open_mode, database_role, switchover_status FROM gv$database;


       INST_ID OPEN_MODE            DATABASE_ROLE    SWITCHOVER_STATUS
    ---------- -------------------- ---------------- --------------------
             1 READ WRITE           PRIMARY          NOT ALLOWED
             2 READ WRITE           PRIMARY          NOT ALLOWED

    Former standby database is now primary, it shows the failover is successful.
  13. Check Data Guard Broker status after failover.
  14. DGMGRL> show configuration

    Configuration - COMPDR

      Protection Mode: MaxPerformance
      Databases:
        standb - Primary database
        primdb - Physical standby database (disabled)
          ORA-16661: the standby database needs to be reinstated

    Fast-Start Failover: DISABLED

    Configuration Status:
    SUCCESS

    DGMGRL> show database primdb

    Database - primdb

      Role:            PHYSICAL STANDBY
      Intended State:  APPLY-ON
      Transport Lag:   (unknown)
      Apply Lag:       (unknown)
      Real Time Query: OFF
      Instance(s):
        primdb1
        primdb2

    Database Status:
    ORA-16661: the standby database needs to be reinstated

    DGMGRL> show database standb

    Database - standb

      Role:            PRIMARY
      Intended State:  TRANSPORT-ON
      Instance(s):
        standb1
        standb2

    Database Status:
    SUCCESS

    The current standby database (former primary database) is currently disabled, and the Data Guard environment must be back to an operable status, so we needs to reinstate or recreate the current standby database.
  15. Reinstate the current standby database.
  16. Reinstating process will flashback database to the latest synchronized status, thus, it will discard any data changes that have not been synchronized with current primary database before failover. After reinstating, the former primary database will play a standby role.
    [oracle@primary01 ~]$ srvctl start database -d compdb -o mount
    ...
    DGMGRL> reinstate database primdb;
    Reinstating database "primdb", please wait...
    Operation requires shutdown of instance "primdb2" on database "primdb"
    Shutting down instance "primdb2"...
    ORA-01109: database not open

    Database dismounted.
    ORACLE instance shut down.
    Continuing to reinstate database "primdb" ...
    Error: ORA-16653: failed to reinstate database

    Failed.
    Reinstatement of database "primdb" failed

    Since we had never enabled flashback database, the reinstating failed with ORA-16653. It's time to recreate a standby database.
  17. Recreate a standby database.
    1. Restart current standby database to nomount state.
    2. [oracle@primary01 ~]$ srvctl stop database -d compdb
      [oracle@primary01 ~]$ srvctl start database -d compdb -o nomount

    3. Duplicate a physical standby database.
    4. [oracle@primary01 ~]$ rman target sys@standb1 auxiliary sys@primdb1
      ...
      target database Password:
      connected to target database: COMPDB (DBID=841830157)
      auxiliary database Password:
      connected to auxiliary database: COMPDB (not mounted)

      RMAN> RUN
      {
      allocate channel c1 device type disk;
      allocate auxiliary channel a1 device type DISK;
      DUPLICATE TARGET DATABASE
      FOR STANDBY FROM ACTIVE DATABASE
      NOFILENAMECHECK;
      }

      2> 3> 4> 5> 6> 7> 8>
      using target database control file instead of recovery catalog
      allocated channel: c1
      channel c1: SID=118 instance=standb1 device type=DISK

      allocated channel: a1
      channel a1: SID=34 instance=primdb1 device type=DISK

      Starting Duplicate Db at 06-AUG-13

      contents of Memory Script:
      {
         backup as copy reuse
         targetfile  '/u01/app/oracle/product/11.2.0/db_1/dbs/orapwstandb1' auxiliary format
       '/u01/app/oracle/product/11.2.0/db_1/dbs/orapwprimdb1'   ;
      }
      executing Memory Script

      Starting backup at 06-AUG-13
      Finished backup at 06-AUG-13

      contents of Memory Script:
      {
         sql clone "alter system set  control_files =
        ''+DATA/compdb/controlfile/current.260.761506189'' comment=
       ''Set by RMAN'' scope=spfile";
         backup as copy current controlfile for standby auxiliary format  '+DATA/compdb/controlfile/current.260.761506189';
         sql clone "alter system set  control_files =
        ''+DATA/compdb/controlfile/current.260.761506189'' comment=
       ''Set by RMAN'' scope=spfile";
         shutdown clone immediate;
         startup clone nomount;
      }
      executing Memory Script

      sql statement: alter system set  control_files =   ''+DATA/compdb/controlfile/current.260.761506189'' comment= ''Set by RMAN'' scope=spfile

      Starting backup at 06-AUG-13
      channel c1: starting datafile copy
      copying standby control file
      output file name=/u01/app/oracle/product/11.2.0/db_1/dbs/snapcf_standb1.f tag=TAG20130806T193214 RECID=9 STAMP=797943439
      channel c1: datafile copy complete, elapsed time: 00:00:15
      Finished backup at 06-AUG-13

      sql statement: alter system set  control_files =   ''+DATA/compdb/controlfile/current.260.761506189'' comment= ''Set by RMAN'' scope=spfile

      Oracle instance shut down

      connected to auxiliary database (not started)
      Oracle instance started

      Total System Global Area     839282688 bytes

      Fixed Size                     2217992 bytes
      Variable Size                792725496 bytes
      Database Buffers              41943040 bytes
      Redo Buffers                   2396160 bytes
      allocated channel: a1
      channel a1: SID=33 instance=primdb1 device type=DISK

      contents of Memory Script:
      {
         sql clone 'alter database mount standby database';
      }
      executing Memory Script

      sql statement: alter database mount standby database
      RMAN-05529: WARNING: DB_FILE_NAME_CONVERT resulted in invalid ASM names; names changed to disk group only.

      contents of Memory Script:
      {
         set newname for tempfile  1 to
       "+data";
         switch clone tempfile all;
         set newname for datafile  1 to
       "+data";
         set newname for datafile  2 to
       "+data";
         set newname for datafile  3 to
       "+data";
         set newname for datafile  4 to
       "+data";
         set newname for datafile  5 to
       "+data";
         set newname for datafile  6 to
       "+data";
         backup as copy reuse
         datafile  1 auxiliary format
       "+data"   datafile
       2 auxiliary format
       "+data"   datafile
       3 auxiliary format
       "+data"   datafile
       4 auxiliary format
       "+data"   datafile
       5 auxiliary format
       "+data"   datafile
       6 auxiliary format
       "+data"   ;
         sql 'alter system archive log current';
      }
      executing Memory Script

      executing command: SET NEWNAME

      renamed tempfile 1 to +data in control file

      executing command: SET NEWNAME

      executing command: SET NEWNAME

      executing command: SET NEWNAME

      executing command: SET NEWNAME

      executing command: SET NEWNAME

      executing command: SET NEWNAME

      Starting backup at 06-AUG-13
      channel c1: starting datafile copy
      input datafile file number=00001 name=+DATA/compdb/datafile/system.268.795098881
      output file name=+DATA/primdb/datafile/system.897.797943475 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:01:35
      channel c1: starting datafile copy
      input datafile file number=00002 name=+DATA/compdb/datafile/sysaux.269.795098973
      output file name=+DATA/primdb/datafile/sysaux.898.797943569 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:01:55
      channel c1: starting datafile copy
      input datafile file number=00003 name=+DATA/compdb/datafile/undotbs1.270.795099057
      output file name=+DATA/primdb/datafile/undotbs1.899.797943687 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:00:25
      channel c1: starting datafile copy
      input datafile file number=00005 name=+DATA/compdb/datafile/example.271.795099083
      output file name=+DATA/primdb/datafile/example.900.797943711 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:00:26
      channel c1: starting datafile copy
      input datafile file number=00006 name=+DATA/compdb/datafile/undotbs2.272.795099097
      output file name=+DATA/primdb/datafile/undotbs2.901.797943737 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:00:15
      channel c1: starting datafile copy
      input datafile file number=00004 name=+DATA/compdb/datafile/users.273.795099105
      output file name=+DATA/primdb/datafile/users.902.797943753 tag=TAG20130806T193254
      channel c1: datafile copy complete, elapsed time: 00:00:07
      Finished backup at 06-AUG-13

      sql statement: alter system archive log current

      contents of Memory Script:
      {
         switch clone datafile all;
      }
      executing Memory Script

      datafile 1 switched to datafile copy
      input datafile copy RECID=9 STAMP=797943765 file name=+DATA/primdb/datafile/system.897.797943475
      datafile 2 switched to datafile copy
      input datafile copy RECID=10 STAMP=797943766 file name=+DATA/primdb/datafile/sysaux.898.797943569
      datafile 3 switched to datafile copy
      input datafile copy RECID=11 STAMP=797943766 file name=+DATA/primdb/datafile/undotbs1.899.797943687
      datafile 4 switched to datafile copy
      input datafile copy RECID=12 STAMP=797943766 file name=+DATA/primdb/datafile/users.902.797943753
      datafile 5 switched to datafile copy
      input datafile copy RECID=13 STAMP=797943766 file name=+DATA/primdb/datafile/example.900.797943711
      datafile 6 switched to datafile copy
      input datafile copy RECID=14 STAMP=797943766 file name=+DATA/primdb/datafile/undotbs2.901.797943737
      Finished Duplicate Db at 06-AUG-13
      released channel: c1
      released channel: a1

      RMAN> exit;

    5. Restart current standby database to mount state
    6. [oracle@primary01 ~]$ srvctl stop database -d compdb
      [oracle@primary01 ~]$ srvctl start database -d compdb -o mount

    7. Check current standby database status
    8. [oracle@primary01 ~]$ sqlplus / as sysdba
      ...
      SQL> select inst_id, open_mode, database_role, switchover_status from gv$database;


         INST_ID OPEN_MODE            DATABASE_ROLE    SWITCHOVER_STATUS
      ---------- -------------------- ---------------- --------------------
               1 MOUNTED              PHYSICAL STANDBY SESSIONS ACTIVE
               2 MOUNTED              PHYSICAL STANDBY TO PRIMARY

      The current standby database is rebuilt and startup to mount state successfully, which also means the controlfile is built for standby currently, no bother to get and apply another copy of standby controlfile.
  18. Enable current standby database by Data Guard Broker.
  19. DGMGRL> show database primdb

    Database - primdb

      Role:            PHYSICAL STANDBY
      Intended State:  APPLY-ON
      Transport Lag:   (unknown)
      Apply Lag:       (unknown)
      Real Time Query: OFF
      Instance(s):
        primdb1
        primdb2

    Database Status:
    ORA-16795: the standby database needs to be re-created

    DGMGRL> enable database primdb
    Enabled.
    DGMGRL> show database primdb

    Database - primdb

      Role:            PHYSICAL STANDBY
      Intended State:  APPLY-ON
      Transport Lag:   12 minutes 14 seconds
      Apply Lag:       12 minutes 14 seconds
      Real Time Query: OFF
      Instance(s):
        primdb1
        primdb2 (apply instance)

    Database Status:
    SUCCESS

    The result shows Data Guard Broker enables database primdb successfully.
  20. Check current standby database status again.
  21. [oracle@primary01 ~]$ sqlplus / as sysdba
    ...
    SQL> select inst_id, open_mode, database_role, switchover_status from gv$database;

       INST_ID OPEN_MODE            DATABASE_ROLE    SWITCHOVER_STATUS
    ---------- -------------------- ---------------- --------------------
             1 MOUNTED              PHYSICAL STANDBY NOT ALLOWED
             2 MOUNTED              PHYSICAL STANDBY NOT ALLOWED

So far, the Data Guard environment has been rebuilt, you can switchback to production environment again.
DGMGRL> connect sys@primdb1
Password:
Connected.
DGMGRL> switchover to primdb
Performing switchover NOW, please wait...
Operation requires a connection to instance "primdb2" on database "primdb"
Connecting to instance "primdb2"...
Connected.
Operation requires shutdown of instance "standb2" on database "standb"
Shutting down instance "standb2"...
Database closed.
Database dismounted.
ORACLE instance shut down.
New primary database "primdb" is opening...
Operation requires shutdown of instance "standb1" on database "standb"
Shutting down instance "standb1"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "standb1" on database "standb"
Starting instance "standb1"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "primdb"
DGMGRL>


Now, we are back to the original environment. Everyone is pleased and can sleep well at night.

Since the primary and the standby sites could be in different buildings with a considerable distance, using RMAN to duplicate a standby database online could be unrealistic when the network resource is limited. DBA should design and compare various recreation plans in advance to reduce down time as much as possible.

Leave a Reply

Your email address will not be published. Required fields are marked *