How to Set Server-Side Transparent Application Failover (TAF)

  • by

I have talked about how to set client-side Transparent Application Failover (TAF) pretty much. This time, we turn to server-side TAF.

There're 2 ways to implement server-side TAF in this post.

  1. Make a TAF Service by DBMS_SERVICE
  2. Make a TAF Service by SRVCTL

A. Using DBMS_SERVICE

First of all, we get into the right PDB. If your database is a dedicated one, you can skip this step.

SQL> conn / as sysdba
Connected.
SQL> alter session set container=ERPAPP;

Session altered.

SQL> show con_name

CON_NAME
------------------------------
ERPAPP

1. Create a Service

Then we create a failover service for this database.

SQL> begin
  2     dbms_service.create_service(
  3             service_name => 'ERPAPP2',
  4             network_name => 'ERPAPP2',
  5             failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
  6             failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
  7             failover_retries => 180,
  8             failover_delay => 5
  9     );
 10  end;
 11  /

PL/SQL procedure successfully completed.

2. Start the Service

We start the newly created service on all nodes.

SQL> exec dbms_service.start_service('ERPAPP2', DBMS_SERVICE.ALL_INSTANCES);

PL/SQL procedure successfully completed.

3. Check Service Configuration

We can query ALL_SERVICES or DBA_SERVICES to get the service configuration.

SQL> column network_name format a10;
SQL> column failover_method format a10;
SQL> column failover_type format a10;
SQL> column failover_retries format 9999;
SQL> column failover_delay format 9999;
SQL> select network_name, failover_method, failover_type, failover_retries, failover_delay from all_services where name = 'ERPAPP2';

NETWORK_NA FAILOVER_M FAILOVER_T FAILOVER_RETRIES FAILOVER_DELAY
---------- ---------- ---------- ---------------- --------------
ERPAPP2    BASIC      SELECT                  180              5

The result shows what we just set.

A more complete test will be described in the next section.

B. Using SRVCTL

1. Create a Service

Since server-side TAF is for failing over user connections to any other available instance of a RAC database, we should use Server Control (SRVCTL) to add a TAF service for it. In our case, we'd especially like to add a TAF service on a Pluggable Database (PDB), which is ORCLPDB.

[oracle@primary02 ~]$ srvctl add service -d orclcdb -s CRM -pdb ORCLPDB -preferred ORCLCDB1,ORCLCDB2 -failovertype select -failovermethod basic -failoverretry 180 -failoverdelay 5

In fact, there're 2 ways to add services to a PDB.

As you can see, the new service name is CRM and preferred instances are all included. Additionally, we choose the type of failover to SELECT level.

Please note that, we use uppercase service name and instance names in the above command to prevent any letter case problem.

2. Start the Service

We need to start the service before using it.

[oracle@primary02 ~]$ srvctl start service -d orclcdb -s crm
[oracle@primary02 ~]$ srvctl status service -d orclcdb -s crm
Service CRM is running on instance(s) ORCLCDB1,ORCLCDB2

The service is running.

3. Check Service Configuration

We may check the configuration of the service.

[oracle@primary01 ~]$ srvctl config service -d orclcdb -s crm
Service name: CRM
Server pool:
Cardinality: 2
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: SELECT
Failover method: BASIC
Failover retries: 180
Failover delay: 5
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: ORCLPDB
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: ORCLCDB1,ORCLCDB2
Available instances:
CSS critical: no
Service uses Java: false

4. Add a TNS Name

We add an entry CRM in tnsnames.ora before we connect to the database.

CRM =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = PRIMARY-CLUSTER-SCAN)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = CRM)
      (SERVER = DEDICATED)
    )
  )

5. Check Current Instance

We connect the database from a remote client.

C:\Users\edchen>sqlplus sh/sh@crm
...

Since we use Single Client Access Name (SCAN) to guide the connection to one of the available instance, we don't know where the connection go. So we need to check what instance we are in currently.

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
ORCLCDB2

The current instance is ORCLCDB2.

6. Normal Query

We need to know how much time a normal would be in order to compare a failover query. Here we query 100,000 rows from a table.

SQL> set timing on;
SQL> select * from (select * from sales) where rownum <= 100000;
...
100000 rows selected.

Elapsed: 00:00:34.98

As we can see, the query took about 35 seconds to output all rows.

7. Failover Query

Now we trigger the query first, then shutdown node 2 to make the query fail over to the other instance and watch the behavior of the query.

SQL> select * from (select * from sales) where rownum <= 100000;
...

While the output is displaying, we power off node 2 completely to simulate a blackout.

After a while, the query seems stopped and hang about 15 seconds or so, then continue to output the result.
...

100000 rows selected.

Elapsed: 00:00:50.27

Eventually, the query consumed 50 seconds, a little longer than a normal run. The best thing is that the query is not interrupted.

8. Check Current Instance Again

Let's check what instance we are currently in.

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
ORCLCDB1

We are in node 1. The TAF service is actually workable.

Leave a Reply

Your email address will not be published. Required fields are marked *