- 最后登录
- 2013-5-13
- 在线时间
- 6 小时
- 威望
- 18
- 金钱
- 40
- 注册时间
- 2013-3-5
- 阅读权限
- 10
- 帖子
- 11
- 精华
- 0
- 积分
- 18
- UID
- 943
|
1#
发表于 2013-3-13 10:41:00
|
查看: 4393 |
回复: 1
1. 故障描述
在某系统中,OGG生产端的MGR进程中配置了PURGEOLDEXTRACTS参数,但是trail文件不能自动删除,此现象持续了一段时间,每当发现/goldengate分区占用率超过90%后,被迫人为手工删除trail文件。
2. 故障原因
通过查看两端trail文件发现- GGSCI (pms_db1) 2> info rmttrail *
- Extract Trail: ./dirdat/sa
- Extract: DPESA
- Seqno: 871
- RBA: 92501414
- File Size: 200M
- Extract Trail: ./dirdat/sa
- Extract: EXTSA
- Seqno: 1452
- RBA: 92305744
- File Size: 200M
- GGSCI (pms_db1) 3> info exttrail *
- Extract Trail: ./dirdat/sa
- Extract: DPESA
- Seqno: 871
- RBA: 92600729
- File Size: 200M
- Extract Trail: ./dirdat/sa
- Extract: EXTSA
- Seqno: 1452
- RBA: 92410591
- File Size: 200M
复制代码 生产端和灾备端trail的sequnce number相差很多,并且两端的trail文件prefix name相同,通过MOS文档Goldengate Manager not purging trail files [ID 1460097.1]得知,如果生产和灾备端trail prefix name相同,生产端的PURGEOLDEXTRACTS参数删除的trail sequnce number依据的是灾备端的trail sequnce number,灾备端的trail sequnce number为871,OGG的PURGEOLDEXTRACTS参数就会将生产端trail sequnce number为871以前的trail文件删除,而保留了trail sequnce number为871以后的trail file,进而造成了这种未PURGE现象。
3. 故障重现
测试环境概述
| | | 系统 | Linux version 2.6.32-100.26.2.el5 | Linux version 2.6.32-100.26.2.el5 | IP地址 | | | hostname | | | 数据库名 | | | 数据库版本 | | | OGG user | | | process | | | 参数文件
源端:- GGSCI (dg1) 2> view params pora_1
- EXTRACT PORA_1
- SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
- PASSTHRU
- RMTHOST 192.168.56.7, MGRPORT 7809
- RMTTRAIL ./dirdat/pa
- TABLE scott.*;
- TABLEEXCLUDE scott.emp2
- TABLEEXCLUDE scott.emp -----这样会过滤掉一部分的trail内容,拉开seqno号
- GGSCI (dg1) 3> view params EORA_1
- EXTRACT EORA_1
- SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
- USERID ogg, PASSWORD ogg
- TRANLOGOPTIONS EXCLUDEUSER ogg
- DBOPTIONS DECRYPTPASSWORD AACAAAAAAAAAAAIALFYGAHVENIMJLFPH ENCRYPTKEY DEFAULT
- TRANLOGOPTIONS INTEGRATEDPARAMS (max_sga_size 200, parallelism 3)
- EXTTRAIL ./dirdat/pa
- DDL INCLUDE ALL
- DDLOPTIONS ADDTRANDATA RETRYOP RETRYDELAY 10 MAXRETRIES 10, REPORT
- TABLE scott.*;
- GGSCI (dg1) 7> edit param mgr
- PORT 7809
- AUTOSTART ER *
- PURGEOLDEXTRACTS /u01/app/ggs/dirdat/*, USECHECKPOINTS, minkeephours 1
- PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
- PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
- LAGREPORTHOURS 1
- LAGINFOMINUTES 30
- LAGCRITICALMINUTES 45
复制代码 目标端:- GGSCI (dg2) 17> view params rora_1
- REPLICAT RORA_1
- SETENV (NLS_LANG=JAPANESE_JAPAN.JA16EUC)
- USERID ogg, PASSWORD ogg
- HANDLECOLLISIONS
- ASSUMETARGETDEFS
- DISCARDFILE ./dirrpt/RORA_aa.DSC, PURGE
- DDL INCLUDE ALL
- DDLERROR DEFAULT IGNORE RETRYOP MAXRETRIES 3 RETRYDELAY 5
- DDLERROR DEFAULT DISCARD
- DDLERROR DEFAULT IGNORE RETRYOP
- MAP scott.*, TARGET scott.*;
- GGSCI (dg2) 18> view params mgr
- PORT 7809
- AUTOSTART ER *
- PURGEOLDEXTRACTS ./dirdat/*, usecheckpoints, minkeephours 1
- PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
- PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
- LAGREPORTHOURS 1
- LAGINFOMINUTES 30
- LAGCRITICALMINUTES 45
复制代码 恢复场景
在源端scott用户下执行以下:- SQL> begin
- 2 for i in 1 ..10000
- 3 loop
- 4 insert into emp2 select * from emp;
- 5 commit;
- 6 delete from emp2;
- 7 commit;
- 8 end loop;
- 9 end;
- 10 /
- SQL> begin
- 2 for i in 1 ..10000
- 3 loop
- 4 insert into emp1 select * from emp;
- 5 commit;
- 6 delete from emp1;
- 7 commit;
- 8 end loop;
- 9 end;
- 10 /
复制代码 重复执行几次,当两端trail大量生成后,会出现源端trail的seqno 和目标端trail的seqno不一致,并且seqno差距拉大。
通过PURGEOLDEXTRACTS参数MINKEEPHOURS子句的设置,一个小时后,两端seqno一致的trail文件都自动删除,但是源端trail seqno超过目标端最大seqno的trail文件无法删除。
尝试通过更改源端trail命名,将源端trail开头pa为aa,来实现源端trail文件的自动删除。
|
|