the same trail prefix not purging trail files解决一例
1. 故障描述在某系统中,OGG生产端的MGR进程中配置了PURGEOLDEXTRACTS参数,但是trail文件不能自动删除,此现象持续了一段时间,每当发现/goldengate分区占用率超过90%后,被迫人为手工删除trail文件。
2. 故障原因
通过查看两端trail文件发现GGSCI (pms_db1) 2> info rmttrail *
Extract Trail: ./dirdat/sa
Extract: DPESA
Seqno: 871
RBA: 92501414
File Size: 200M
Extract Trail: ./dirdat/sa
Extract: EXTSA
Seqno: 1452
RBA: 92305744
File Size: 200M
GGSCI (pms_db1) 3> info exttrail *
Extract Trail: ./dirdat/sa
Extract: DPESA
Seqno: 871
RBA: 92600729
File Size: 200M
Extract Trail: ./dirdat/sa
Extract: EXTSA
Seqno: 1452
RBA: 92410591
File Size: 200M
生产端和灾备端trail的sequnce number相差很多,并且两端的trail文件prefix name相同,通过MOS文档Goldengate Manager not purging trail files 得知,如果生产和灾备端trail prefix name相同,生产端的PURGEOLDEXTRACTS参数删除的trail sequnce number依据的是灾备端的trail sequnce number,灾备端的trail sequnce number为871,OGG的PURGEOLDEXTRACTS参数就会将生产端trail sequnce number为871以前的trail文件删除,而保留了trail sequnce number为871以后的trail file,进而造成了这种未PURGE现象。
3. 故障重现
测试环境概述
源端 目标端
系统 Linux version 2.6.32-100.26.2.el5 Linux version 2.6.32-100.26.2.el5
IP地址 192.168.56.6 192.168.56.7
hostname dg1 dg2
数据库名 test2 test
数据库版本 11.2.0.3 10.2.0.1
OGG user ogg ogg
process EORA_1,PORA_1 replicat RORA_1
参数文件
源端:GGSCI (dg1) 2> view params pora_1
EXTRACT PORA_1
SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
PASSTHRU
RMTHOST 192.168.56.7, MGRPORT 7809
RMTTRAIL ./dirdat/pa
TABLE scott.*;
TABLEEXCLUDE scott.emp2
TABLEEXCLUDE scott.emp -----这样会过滤掉一部分的trail内容,拉开seqno号
GGSCI (dg1) 3> view params EORA_1
EXTRACT EORA_1
SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
USERID ogg, PASSWORD ogg
TRANLOGOPTIONS EXCLUDEUSER ogg
DBOPTIONS DECRYPTPASSWORD AACAAAAAAAAAAAIALFYGAHVENIMJLFPH ENCRYPTKEY DEFAULT
TRANLOGOPTIONS INTEGRATEDPARAMS (max_sga_size 200, parallelism 3)
EXTTRAIL ./dirdat/pa
DDL INCLUDE ALL
DDLOPTIONS ADDTRANDATA RETRYOP RETRYDELAY 10 MAXRETRIES 10, REPORT
TABLE scott.*;
GGSCI (dg1) 7> edit param mgr
PORT 7809
AUTOSTART ER *
PURGEOLDEXTRACTS /u01/app/ggs/dirdat/*, USECHECKPOINTS, minkeephours 1
PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45目标端:GGSCI (dg2) 17> view params rora_1
REPLICAT RORA_1
SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
USERID ogg, PASSWORD ogg
HANDLECOLLISIONS
ASSUMETARGETDEFS
DISCARDFILE ./dirrpt/RORA_aa.DSC, PURGE
DDL INCLUDE ALL
DDLERROR DEFAULT IGNORE RETRYOP MAXRETRIES 3 RETRYDELAY 5
DDLERROR DEFAULT DISCARD
DDLERROR DEFAULT IGNORE RETRYOP
MAP scott.*, TARGET scott.*;
GGSCI (dg2) 18> view params mgr
PORT 7809
AUTOSTART ER *
PURGEOLDEXTRACTS ./dirdat/*, usecheckpoints, minkeephours 1
PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45恢复场景
在源端scott用户下执行以下:SQL> begin
2 for i in 1 ..10000
3 loop
4 insert into emp2 select * from emp;
5 commit;
6 delete from emp2;
7 commit;
8 end loop;
9 end;
10 /
SQL> begin
2 for i in 1 ..10000
3 loop
4 insert into emp1 select * from emp;
5 commit;
6 delete from emp1;
7 commit;
8 end loop;
9 end;
10 /重复执行几次,当两端trail大量生成后,会出现源端trail的seqno 和目标端trail的seqno不一致,并且seqno差距拉大。
通过PURGEOLDEXTRACTS参数MINKEEPHOURS子句的设置,一个小时后,两端seqno一致的trail文件都自动删除,但是源端trail seqno超过目标端最大seqno的trail文件无法删除。
尝试通过更改源端trail命名,将源端trail开头pa为aa,来实现源端trail文件的自动删除。
本帖最后由 Francis鹏鹏 于 2013-3-13 10:54 编辑
4. 故障解决
更改源端的trail命名规则
检查两端trail文件路径以及命名规则GGSCI (dg1) 22> info exttrail *
Extract Trail: ./dirdat/pa
Extract: EORA_1
Seqno: 838
RBA: 323977
File Size: 1M
Extract Trail: ./dirdat/pa
Extract: PORA_1
Seqno: 382
RBA: 880563
File Size: 1M两端的trail文件都是以pa开头的。
重新配置Extract进程
1. 修改Extract进程的参数文件GGSCI (dg1) 141> edit params eora_1
EXTRACT EORA_1
SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
USERID ogg, PASSWORD ogg
TRANLOGOPTIONS EXCLUDEUSER ogg
DBOPTIONS DECRYPTPASSWORD AACAAAAAAAAAAAGAIFAAUDVHCFUGFIYF ENCRYPTKEY DEFAULT
TRANLOGOPTIONS INTEGRATEDPARAMS (max_sga_size 200, parallelism 3)
--EXTTRAIL ./dirdat/pa
EXTTRAIL ./dirdat/aa ---原来是pa
DDL INCLUDE ALL
DDLOPTIONS ADDTRANDATA RETRYOP RETRYDELAY 10 MAXRETRIES 10, REPORT
TABLE scott.*;2. 为Extract进程添加新的trail文件格式:GGSCI (dg1) 142> ADD EXTTRAIL ./dirdat/aa, extract eora_1, megabytes 13. 重新启动Extract进程:GGSCI (dg1) 143> stop eora_1
Sending STOP request to EXTRACT EORA_1 ...
Request processed.
GGSCI (dg1) 144> start eora_1
Sending START request to MANAGER ...
EXTRACT EORA_1 starting此时,Extract进程会重新生成一个以aa开头的trail文件,通过查看dirdat-rw-rw-rw- 1 oracle oinstall 1006 Mar 11 15:46 aa000000重新配置Datapump进程
1. 删除原Datapump进程GGSCI (dg1) 165> dblogin userid ogg, password ogg
Successfully logged into database.
GGSCI (dg1) 166> delete pora_1
Deleted EXTRACT PORA_1.2. 配置Datapump进程参数GGSCI (dg1) 2> view params pora_2
EXTRACT PORA_2
SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
PASSTHRU
RMTHOST 192.168.56.7, MGRPORT 7809
RMTTRAIL ./dirdat/pa
TABLE scott.*;
TABLEEXCLUDE scott.emp2
TABLEEXCLUDE scott.emp3. 重新添加Datapump进程GGSCI (dg1) 168> add extract pora_2, exttrailsource ./dirdat/aa
EXTRACT added.4. 为Datapump进程增加trail文件GGSCI (dg1) 171> add rmttrail ./dirdat/pa, extract pora_2, megabytes 1
RMTTRAIL added.5. 定义读取源端trail文件时间点GGSCI (dg1) 172> alter extract pora_2, extseqno 0, extrba 0
EXTRACT altered.6. 启动Datapump进程GGSCI (dg1) 42> start pora_2
Sending START request to MANAGER ...
EXTRACT PORA_2 starting测试PURGEOLDEXTRACTS
在源端scott用户下执行以下操作:SQL> begin
2 for i in 1 ..10000
3 loop
4 insert into emp2 select * from emp;
5 commit;
6 delete from emp2;
7 commit;
8 end loop;
9 end;
10 /
SQL> begin
2 for i in 1 ..10000
3 loop
4 insert into emp1 select * from emp;
5 commit;
6 delete from emp1;
7 commit;
8 end loop;
9 end;
10 /在源端生成aa开头的trail文件,通过Datapump进程投递到目标端的trail文件继续以pa开头生成,并且Extract、Datapump和Replicat进程都工作正常。
通过PURGEOLDEXTRACTS参数MINKEEPHOURS子句的设置,一个小时后,源端aa开头trail文件开始自动删除。
5. 测试结论:
通过修改trail file prefix name的方式,可以避免生产、灾备两端因为trail prefix same而造成生产端trail无法正常purge;并且此方法并不修改灾备端OGG,不会对灾备端造成任何影响。
参考文档
Goldengate Manager not purging trail files
How to change the name of trail files in Oracle GoldenGate OGG
页:
[1]