How to Speed Up Data Pump

Data Pump is a more flexible and powerful utility than legacy exp or imp, which has no NLS_LANG conversion issue bothering us, so we like the utility in our country which is a non-English nation.

But the performance is an issue. I had ever compared the speed of those two types of utility to export a Tetra-Bytes grade database under Oracle database 10.2, the legacy exp won the race with direct path and parallel execution. But after a fine tune, the speed of expdp can almost catch up with exp. So our topic is about how to make expdp or impdp faster in oracle?

For 10.2 Data Pump, there is no DIRECT=Y like legacy exp, Data Pump will choose a right path to do the job, which cannot be controlled by DBA.

Here are tips to speed up export job when table mode is applied:

Parallelize executions.

With human intervention, split thousands of tables manually into several par files with same or even sized, e.g. 8 par files for parallel executions on OS-level. Don't mix up with the parameter PARALLEL in Data Pump, we trust our human intervention more.

More parameters comparison of export and import between releases can be found here:

Decompose a partitioned table into partitions in par files.

If you have a very large table with partitioned, don't export it in one line like this:

TABLES=(

SCHEMA.VERY_BIG_TABLE,

...

Data Pump will do it as your wish with the above par file, but there's a better way instead to do it:

TABLES=(

SCHEMA.VERY_BIG_TABLE:PARTITION01,

SCHEMA.VERY_BIG_TABLE:PARTITION02,

SCHEMA.VERY_BIG_TABLE:PARTITION03,

...

With setting some degrees of parallelism via PARALLEL, we can benefit from this.

Set the parameter ESTIMATE to STATISTICS

The default value of ESTIMATE is BLOCKS, it will take a very long time to estimate when Data Pump faces a very big table in the par file if the NETWORK_LINK parameter is also specified. Sometimes, it seems stuck in the process.

Turn on Asynchronous Disk IO

With full capacity during Data Pump, the bottleneck of resource is always IO, you should apply your IO to asynchronous mode or other buffer technology on OS-level to facilitate Data Pump to access disks efficiently.

For 11.2 and later, set ACCESS_METHOD to DIRECT_PATH

Since 11g release 2, Oracle introduces a new parameter ACCESS_METHOD of Data Pump, which can be set to DIRECT_PATH, DBA will get more control and choices while data manipulation and migration.

For really big tables

Big tables are really time-consuming at data loading (impdp), this is because they scan the whole table to align with constraints before inserting every new row. In other words, more data has been loaded, more data should be scanned. The last new row to be inserted would be the most time-consuming one.

How about take constraints off, then add them back after data loading? For a 5TB grade of database in my experience, it reduces time from several days to several hours overall.

Here's the steps to remove constraints of big tables temporarily.

Drop their indexes.

You should keep DDL of those indexes for adding them back later.

Disable their constraints

You should keep DDL of those constraints for enabling them back later.

Data loading.

Now you can import data now.

Enable constraints with NOVALIDATE.

You may choose VALIDATE or NOVALIDATE, VALIDATE takes longer and NOVALIDATE returns immediately.

Create indexes back.

It takes time to build indexes, but it's worth it.

The above tips can accelerate Data Pump, except you are allowed to use transportable tablespaces mode, it'll be faster, but the drawback is that transportable tablespace mode is more inflexible when migrating data between different databases.

For more information about data migration, you may refer to the following post: A Systematic Approach to Migrate Databases

How to Speed Up Data Pump

Leave a Reply Cancel reply