Goldengate Extract异常hang住
本人在某省电力处理Goldengate问题2013年03月4日晚21:43分,Extract进程abend,ggserr.log中发现相关信息如下
2013-03-04 21:43:07 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p15156_Redo Thread 1: start=SeqNo: 22960, RBA: 75026448, SCN: 2518.404229881 (10815131881209), Timestamp: 2013-03-04 17:54:06.000000, end=SeqNo: 22962, RBA: 276682752, SCN: 2518.405122896 (10815132774224), Timestamp: 2013-03-04 21:43:01.000000.
2013-03-04 21:43:07 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p15156_Redo Thread 2: start=SeqNo: 19996, RBA: 49753104, SCN: 2518.404227591 (10815131878919), Timestamp: 2013-03-04 17:53:46.000000, end=SeqNo: 19998, RBA: 261558272, SCN: 2518.405122859 (10815132774187), Timestamp: 2013-03-04 21:42:59.000000.
2013年03月05日早9点上班发现此问题,雾水,故start extract_name,大约半小时后,在ggserr.log发现Goldengate WARNING,如下
2013-03-05 09:02:25 INFO OGG-00987 Oracle GoldenGate Command Interpreter for Oracle: GGSCI command (oracle): start extsa.
2013-03-05 09:02:25 INFO OGG-00963 Oracle GoldenGate Manager for Oracle, mgr.prm: Command received from GGSCI on host 10.223.1.91 (START EXTRACT EXTSA ).
2013-03-05 09:02:25 INFO OGG-00975 Oracle GoldenGate Manager for Oracle, mgr.prm: EXTRACT EXTSA starting.
2013-03-05 09:02:26 INFO OGG-00992 Oracle GoldenGate Capture for Oracle, extsa.prm: EXTRACT EXTSA starting.
2013-03-05 09:11:06 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: STATS daily totalsonly *.
2013-03-05 09:11:12 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: GETLAG.
2013-03-05 09:16:06 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: STATS daily totalsonly *.
2013-03-05 09:16:11 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: GETLAG.
2013-03-05 09:21:07 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: STATS daily totalsonly *.
2013-03-05 09:21:12 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: GETLAG.
2013-03-05 09:26:06 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: STATS daily totalsonly *.
2013-03-05 09:26:11 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: GETLAG.
2013-03-05 09:31:06 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: STATS daily totalsonly *.
2013-03-05 09:31:11 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpesa.prm: Command received from GGSCI: GETLAG.
2013-03-05 09:31:28 WARNING OGG-01573 Oracle GoldenGate Capture for Oracle, extsa.prm: br_validate_bcp: failed in call to: ggcrc64valid.
2013-03-05 09:31:29 WARNING OGG-01573 Oracle GoldenGate Capture for Oracle, extsa.prm: br_validate_bcp: failed in call to: br_validate_bcp.
2013-03-05 09:32:33 INFO OGG-01639 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: ACTIVE: for object pool 1: p15156_Redo Thread 1.
2013-03-05 09:32:33 INFO OGG-01640 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start XID: 10.42.14909855.
2013-03-05 09:32:33 INFO OGG-01641 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start position: SeqNo: 22954, RBA: 232610832, SCN: 2518.401588360 (10815129239688), Timestamp: 2013-03-04 13:53:58.000000.
2013-03-05 09:32:33 INFO OGG-01642 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery end position: SeqNo: 22960, RBA: 33750016, SCN: 2518.404155695 (10815131807023), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 09:32:33 INFO OGG-01643 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CANCELED: for object pool 1: p15156_Redo Thread 1.
2013-03-05 09:32:33 INFO OGG-01639 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: ACTIVE: for object pool 2: p15156_Redo Thread 2.
2013-03-05 09:32:33 INFO OGG-01640 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start XID: 193.16.5141468.
2013-03-05 09:32:33 INFO OGG-01641 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start position: SeqNo: 19989, RBA: 178910224, SCN: 2518.401593739 (10815129245067), Timestamp: 2013-03-04 13:54:34.000000.
2013-03-05 09:32:33 INFO OGG-01642 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery end position: SeqNo: 19996, RBA: 25753600, SCN: 2518.404155713 (10815131807041), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 09:32:33 INFO OGG-01643 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CANCELED: for object pool 2: p15156_Redo Thread 2.
此时未做理睬,Extract进程依旧为RUNNING状态,3月5日中午11时,Extract进程abend,依旧雾水,依旧start extract_name,大约半小时后再次发现WARNING,如下
2013-03-05 11:31:34 WARNING OGG-01573 Oracle GoldenGate Capture for Oracle, extsa.prm: br_validate_bcp: failed in call to: ggcrc64valid.
2013-03-05 11:31:34 WARNING OGG-01573 Oracle GoldenGate Capture for Oracle, extsa.prm: br_validate_bcp: failed in call to: br_validate_bcp.
2013-03-05 11:32:28 INFO OGG-01639 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: ACTIVE: for object pool 1: p15156_Redo Thread 1.
2013-03-05 11:32:28 INFO OGG-01640 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start XID: 10.42.14909855.
2013-03-05 11:32:28 INFO OGG-01641 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start position: SeqNo: 22954, RBA: 232610832, SCN: 2518.401588360 (10815129239688), Timestamp: 2013-03-04 13:53:58.000000.
2013-03-05 11:32:28 INFO OGG-01642 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery end position: SeqNo: 22960, RBA: 33750016, SCN: 2518.404155695 (10815131807023), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 11:32:28 INFO OGG-01643 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CANCELED: for object pool 1: p15156_Redo Thread 1.
2013-03-05 11:32:28 INFO OGG-01639 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: ACTIVE: for object pool 2: p15156_Redo Thread 2.
2013-03-05 11:32:28 INFO OGG-01640 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start XID: 193.16.5141468.
2013-03-05 11:32:28 INFO OGG-01641 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery start position: SeqNo: 19989, RBA: 178910224, SCN: 2518.401593739 (10815129245067), Timestamp: 2013-03-04 13:54:34.000000.
2013-03-05 11:32:28 INFO OGG-01642 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: recovery end position: SeqNo: 19996, RBA: 25753600, SCN: 2518.404155713 (10815131807041), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 11:32:28 INFO OGG-01643 Oracle GoldenGate Capture for Oracle, extsa.prm: BOUNDED RECOVERY: CANCELED: for object pool 2: p15156_Redo Thread 2.
警觉后。通过metalink文档1293772.1和版主blog的文章OGG-01738 BOUNDED RECOVERY找到解决办法
但是,通过start extract_name brreset后,extract进程状态为RUNNING,并没有继续抽取,trail文件依旧停留在第一次Extract abend的时间点,2013年03月04日21时43分
相关信息收集如下
生产端
操作系统版本 HP-UX B.11.31 U ia64
数据库版本 10.2.0.4
系统架构 RAC
CPU数量 16
内存大小 130810MB
数据库容量 934606MB
归档模式 是
Goldengate版本 V11.1.1.0.30
Goldengate参数
view params dpesa
EXTRACT dpesa
PASSTHRU
RMTHOST xx.xx.xx.xx, MGRPORT 7839, compress
RMTTRAIL ./dirdat/sa
TABLE mw_sys.*;
TABLE mw_rtm.*;
TABLE mw_app.*;
view params extsa
EXTRACT extsa
SETENV (NLS_LANG="AMERICAN_AMERICA.ZHS16GBK")
USERID goldengate, PASSWORD AACAAAAAAAAAAAMACISIACJGEHDDTGUGRDFCQFQBVDUDACIA, ENCRYPTKEY default
TRANLOGOPTIONS ASMUSER sys@ASM, ASMPASSWORD AACAAAAAAAAAAALARBUGSJBIYFYEDGZDXCZIGIUBGBZFWDKJ, ENCRYPTKEY default
DDL &
INCLUDE MAPPED OBJTYPE 'table' &
INCLUDE MAPPED OBJTYPE 'index' &
EXCLUDE OPTYPE COMMENT
DDLOPTIONS NOCROSSRENAME REPORT
REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/extsa.dsc,APPEND,MEGABYTES 1024
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 60000 IOLATENCY 60000
DBOPTIONS ALLOWUNUSEDCOLUMN
WARNLONGTRANS 2h,CHECKINTERVAL 3m
EXTTRAIL ./dirdat/sa
FETCHOPTIONS NOUSESNAPSHOT
TRANLOGOPTIONS CONVERTUCS2CLOBS
tableexclude MW_APP.MWT_UD_FD_FJSB_SPWYGCD;
tableexclude MW_SYS.MWT_OM_FSDATA;
TABLE MW_SYS.*;
TABLE MW_RTM.*;
TABLE MW_APP.*;
view params mgr
PORT 7839
DYNAMICPORTLIST 7840-7914
PURGEOLDEXTRACTS ./dirdat/*, usecheckpoints, minkeephours 4
AUTOSTART ER *
USERID goldengate, PASSWORD AACAAAAAAAAAAAMACISIACJGEHDDTGUGRDFCQFQBVDUDACIA, ENCRYPTKEY default
PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
另外,在mgr中添加了PURGEOLDEXTRACTS 参数,但是并未生效,trail文件无法自动清除
恳请高人指点迷津 问题已经解决,在等待了一段时间后,Extract进程恢复正常,start extract_name BRRESET可解决此问题
report如下
2013-03-05 17:36:33 INFO OGG-00546 Default thread stack size: 262144.
2013-03-05 17:36:33 INFO OGG-00547 Increasing thread stack size from 262144 to 1048576.
2013-03-05 17:36:33 INFO OGG-01513 Positioning to (Thread 1) Sequence 22962, RBA 276674576, SCN 2518.405122888.
2013-03-05 17:36:33 INFO OGG-01516 Positioned to (Thread 1) Sequence 22962, RBA 276674576, SCN 2518.405122888, Mar 4, 2013 9:43:01 PM.
2013-03-05 17:36:33 INFO OGG-01513 Positioning to (Thread 2) Sequence 19998, RBA 261557264, SCN 2518.405122859.
2013-03-05 17:36:34 INFO OGG-01516 Positioned to (Thread 2) Sequence 19998, RBA 261557264, SCN 2518.405122859, Mar 4, 2013 9:42:59 PM.
2013-03-05 17:36:34 INFO OGG-01055 Recovery initialization completed for target file ./dirdat/sa001435, at RBA 21962968.
2013-03-05 17:36:34 INFO OGG-01478 Output file ./dirdat/sa is using format RELEASE 10.4/11.1.
2013-03-05 17:36:34 INFO OGG-01517 Position of first record processed for Thread 2, Sequence 19998, RBA 261557264, SCN 2518.405122859, Mar 4, 2013 9:42:59 PM.
2013-03-05 17:36:34 INFO OGG-01517 Position of first record processed for Thread 1, Sequence 22962, RBA 276674576, SCN 2518.405122883, Mar 4, 2013 9:43:01 PM.
2013-03-05 17:36:34 INFO OGG-01026 Rolling over remote file ./dirdat/sa001435.
2013-03-05 17:36:34 INFO OGG-01053 Recovery completed for target file ./dirdat/sa001436, at RBA 950.
2013-03-05 17:36:34 INFO OGG-01057 Recovery completed for all targets.
现在Extract进程可正常抽取
但是PURGEOLDEXTRACTS依旧未能生效 Francis鹏鹏 发表于 2013-3-6 09:59 static/image/common/back.gif
问题已经解决,在等待了一段时间后,Extract进程恢复正常,start extract_name BRRESET可解决此问题
report ...
袁鹏,这个是什么平台出的错误? FYI
Symptoms
When running Oracle Golden Gate 11.1.1.0.6 or higher, extract is "abending" every 4 hours on the hour. This approximates the same time or interval that Bounded Recovery is set to by default.
Extract can be restarted and continues to work but then fails again after 4 hours with the same errors as shown below.
ERROR
---------
2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: ggcrc64valid.
2011-02-06 05:15:38 WARNING OGG-01573 br_validate_bcp: failed in call to: br_validate_bcp.
2011-02-06 05:15:38 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p7186_Redo Thread 1.
2011-02-06 05:15:38 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 0.0.0.
...
2011-02-06 09:15:46 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p7186_Redo Thread 1: start=SeqNo: 21659, RBA: 117520912, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000, end=SeqNo: 21659, RBA: 117602816, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000.
Cause
Under these conditions, this may be a problem with the Bounded Recovery Checkpoint file. It is likely corrupted.
Solution
The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:
GGSCI> start <extract> BRRESET noko_jj 发表于 2013-3-6 19:19 static/image/common/back.gif
袁鹏,这个是什么平台出的错误?
HP-UX平台下的,很奇怪,这个系统以前也有报错,但是这个错误是第一次出现
另外,who are you?你怎么知道我的名字? 我现在有两套RAC库,隔几天就要出现OGG-01738的问题,每次都要手动拉起来,挺郁闷的 tiejiang2009 发表于 2013-4-3 13:15 static/image/common/back.gif
我现在有两套RAC库,隔几天就要出现OGG-01738的问题,每次都要手动拉起来,挺郁闷的 ...
把错误日志贴出来嘛 本帖最后由 yiyufeng1986 于 2013-4-12 11:29 编辑
2013-04-11 17:17:25 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p16046_extr.
2013-04-11 17:17:25 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 14.30.24583.
2013-04-11 17:17:25 INFO OGG-01641 BOUNDED RECOVERY: recovery start position: SeqNo: 6321, RBA: 2731536, SCN: 21.3696124530 (93890437746), Timestamp: 20
13-04-11 11:58:41.000000.
2013-04-11 17:17:25 INFO OGG-01642 BOUNDED RECOVERY: recovery end position: SeqNo: 6321, RBA: 47456204, SCN: 21.3696127689 (93890440905), Timestamp: 201
3-04-11 11:58:43.000000.
2013-04-11 17:17:25 INFO OGG-01643 BOUNDED RECOVERY: CANCELED: for object pool 1: p16046_extr.
2013-04-11 17:17:25 INFO OGG-01579 BOUNDED RECOVERY: VALID BCP: CP.EXTODS.000000084.
2013-04-11 17:17:25 INFO OGG-01629 BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED:.
2013-04-11 17:17:25 INFO OGG-01513 Positioning to Sequence 6321, RBA 41683472.
2013-04-11 17:17:25 INFO OGG-01516 Positioned to Sequence 6321, RBA 41683472, Apr 11, 2013 11:58:43 AM.
2013-04-11 17:17:25 INFO OGG-01055 Recovery initialization completed for target file ./dirdat/od003153, at RBA 953.
2013-04-11 17:17:25 INFO OGG-01478 Output file ./dirdat/od is using format RELEASE 10.4/11.1.
2013-04-11 17:17:25 INFO OGG-01026 Rolling over remote file ./dirdat/od003153.
2013-04-11 17:17:25 INFO OGG-01053 Recovery completed for target file ./dirdat/od003154, at RBA 953.
2013-04-11 17:17:25 INFO OGG-01057 Recovery completed for all targets.
2013-04-11 17:17:25 INFO OGG-01517 Position of first record processed Sequence 6321, RBA 42570768, SCN 21.3696127376, Apr 11, 2013 11:58:43 AM.
这段时间ext进程经常莫名的hang住,处理方法也是start ext***,BRRESET 之前也碰到过rac双节点ext进程经常hang问题,如果是rac抽取进程莫名的hang住,需要考虑rac中scn延迟的问题,需要在ogg中添加一个参数,添加scn延时时间和io延时时间参数(数据库一般默认是7秒)。 哦,之前我碰到的是replicat端的问题,不是ext源端。:——) F.k-Tao 发表于 2013-4-12 15:24 static/image/common/back.gif
哦,之前我碰到的是replicat端的问题,不是ext源端。:——)
那你之前是怎么处理的呢。 在replicat进程添加scn延时时间和io延时时间参数(数据库一般默认是7秒)。
页:
[1]