Liu Maclean(刘相兵 发表于 2013-12-26 15:16:42

IMPDP 相关问题

IMPDP 相关问题


客户生产系统投产前导入数据,发现相关导入速度并不理想。
客户在导入10张大数据表时发现速度并不理想,工程师到厂后发现客户开启10个会话操作,每个会话针对一张表操作,并且每个会话impdp parfile 中指定了 parallel=4 。
经过当场查询 v$px_session 视图没有发现结果说明当时并行并没有启用,查询相关系统参数如
parallel_degree_policy
parallel_max_servers
parallel_min_percent
等并无异常,参数设置良好。
当场查询相关alert 日志并没有发现ORA 错误产生,相关等待事件也并无特别enqueue 类等待或者日志写入,后台I/O 类等待。
随后开启SR SR 3-6302755511 : poor performance for impdp 跟踪并且收集了相关信息。

至10月11日下午,客户重新开启impdp 测试并且尝试导入3张表,通过查询文档
How To Relate Datapump PARALLEL Parameter to Parallel Query Slaves
使用其中 impdp 建议参数status=10 不停顿输出状态结果
IMPORT STATEMENT

impdp job_name=I1 scott/<pwd> full=y file=exptest log=implog.log parallel=4 status=10


The impdp statement is requesting the following:
1.        A full import of whatever was in the export dump file (full=y).
2.        A requested degree of parallelism of 4 (parallel=4).
3.        A request to show the status of the job every 10 seconds (status=10)
4.        A job name of I1 (job_name=I1).

While the import is running, it will write the status to the screen every 10 seconds (status=10).
We will show two of status snapshots here, the first of the table import phase, and the second of the index import phase.
These status reports are for a healthy Datapump import.
后发现 此次测试中 IMPDP 会起用3个 worker 进程同时操作,但是每个worker 进程只针对一张表做导入无论此表是否为分区表,随后此次测试过程中其中一分区大表耗时较长,说明 设置了parallel 参数只对表级进行了并行操作(同时对多张表导入),但是在导入某张表并没有启用并行。
文档中的一段描述也验证了此结果,并且同时也指出了当使用External Table mode 作为 access method 时可以启用并行,但是一般导入导出使用的access method  是 Direct path read 方式
SNAPSHOT STATUS DURING TABLE IMPORT PHASE
Job: I1
Operation: IMPORT
Mode: FULL
State: EXECUTING
Bytes Processed: 0
Current Parallelism: 4
Job Error Count: 1
Dump File: <dumpfile name>

Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: SCOTT
Object Name: TESTTABLE
Object Type: SCHEMA_EXPORT/TABLE/TABLE
Total Objects: 1
Worker Parallelism: 1
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA


Summary of the above status snapshot:

1.        Job name is I1
2.        It is an import, and is currently in 'EXECUTING' phase.
3.        The requested degree of parallelism is 4
4.        There is only 1 worker, DW00. Your import may have multiple DW processes.
5.        The worker is importing a table (Object Type: SCHEMA_EXPORT/TABLE/TABLE)
6.        The table is SCOTT.TESTTABLE
7.        The table is being imported with an actual degree of parallelism of 1.

If the actual degree of parallelism is 1 on the table load, it means DataPump is doing a Direct Path load.
Parallel execution on a table load can only happen with External Table mode of loading.

随后查询文档
Parallel Capabilities of Oracle Data Pump
文档中描述如下
ACCESS METHODS: DIRECT PATH AND EXTERNAL TABLES
Data Pump supports two access methods to load and unload table row data: direct path and external tables. The access methods play a key role in determining how much parallelism will be possible. Because both methods support the same external data representation, data that is unloaded with one method can be loaded using the other method. Data Pump automatically chooses the appropriate method for each table data object.

Data Pump provides an external tables access driver that reads and writes files. The format of the files is the same format used with the direct path method. This allows for high-speed loading and unloading of database tables as an alternative to direct path. Although external tables single stream performance is slower than direct path, it uses the Oracle parallel execution engine for very large tables and partitions, which the Direct Path API cannot do.
Here are some common situations where Data Pump uses external tables as the data access method:
•        Loading and unloading very large tables and partitions in situations where parallel SQL can be used and a parallel Data Pump operation was requested
•        Loading and unloading tables that contain one or more columns of type BFILE or opaque, or any object type containing opaque columns
•        Loading and unloading tables with encrypted columns
•        Unloading table and using the QUERY parameter
•        Loading tables with active triggers
•        Loading clustered tables
•        Loading tables with fine-grained access control enabled for inserts
•        Loading tables with a global index on a partitioned table
说明了 impdp 支持2种access method 。其中虽然 单个数据流 Direct path read 较 external table 方式快,但是仅external table 方式可以启用并行,反而提高了速度,这个是 Direct path read 方式所无法使用的,
并且oracle 也建议在需要迁移大的数据表或者分区表时使用external table 的access method 从而提高速度。
至此也说明了开始时候为何并没有并行产生,为何v$px_session 没有结果。


Liu Maclean(刘相兵 发表于 2013-12-26 15:16:51

基于客户无法对导入导出方案做调整
建议客户在下次导入和导出时可以做如下调整,由于客户下次导入表较多但是每个表数据量并不特别大,
建议用户选出较大的表单独导入,以使大表有充分时间可以运行导入操作,小表可以放一起使用parallel 来同时操作。

或者使用external table 的access method 来做大表或者分区表的导入导出。
页: [1]
查看完整版本: IMPDP 相关问题