Francis鹏鹏 发表于 2013-3-6 00:07:38

Goldengate Extract异常hang住

本人在某省电力处理Goldengate问题
2013年03月4日晚21:43分,Extract进程abend,ggserr.log中发现相关信息如下
2013-03-04 21:43:07  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p15156_Redo Thread 1: start=SeqNo: 22960, RBA: 75026448, SCN: 2518.404229881 (10815131881209), Timestamp: 2013-03-04 17:54:06.000000, end=SeqNo: 22962, RBA: 276682752, SCN: 2518.405122896 (10815132774224), Timestamp: 2013-03-04 21:43:01.000000.
2013-03-04 21:43:07  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p15156_Redo Thread 2: start=SeqNo: 19996, RBA: 49753104, SCN: 2518.404227591 (10815131878919), Timestamp: 2013-03-04 17:53:46.000000, end=SeqNo: 19998, RBA: 261558272, SCN: 2518.405122859 (10815132774187), Timestamp: 2013-03-04 21:42:59.000000.

2013年03月05日早9点上班发现此问题,雾水,故start extract_name,大约半小时后,在ggserr.log发现Goldengate WARNING,如下
2013-03-05 09:02:25  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): start extsa.
2013-03-05 09:02:25  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from GGSCI on host 10.223.1.91 (START EXTRACT EXTSA ).
2013-03-05 09:02:25  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  EXTRACT EXTSA starting.
2013-03-05 09:02:26  INFO    OGG-00992  Oracle GoldenGate Capture for Oracle, extsa.prm:  EXTRACT EXTSA starting.
2013-03-05 09:11:06  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
2013-03-05 09:11:12  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: GETLAG.
2013-03-05 09:16:06  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
2013-03-05 09:16:11  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: GETLAG.
2013-03-05 09:21:07  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
2013-03-05 09:21:12  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: GETLAG.
2013-03-05 09:26:06  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
2013-03-05 09:26:11  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: GETLAG.
2013-03-05 09:31:06  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
2013-03-05 09:31:11  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpesa.prm:  Command received from GGSCI: GETLAG.
2013-03-05 09:31:28  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, extsa.prm:  br_validate_bcp: failed in call to: ggcrc64valid.
2013-03-05 09:31:29  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, extsa.prm:  br_validate_bcp: failed in call to: br_validate_bcp.
2013-03-05 09:32:33  INFO    OGG-01639  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY:  ACTIVE: for object pool 1: p15156_Redo Thread 1.
2013-03-05 09:32:33  INFO    OGG-01640  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start XID: 10.42.14909855.
2013-03-05 09:32:33  INFO    OGG-01641  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start position: SeqNo: 22954, RBA: 232610832, SCN: 2518.401588360 (10815129239688), Timestamp: 2013-03-04 13:53:58.000000.
2013-03-05 09:32:33  INFO    OGG-01642  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery end position: SeqNo: 22960, RBA: 33750016, SCN: 2518.404155695 (10815131807023), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 09:32:33  INFO    OGG-01643  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CANCELED: for object pool 1: p15156_Redo Thread 1.
2013-03-05 09:32:33  INFO    OGG-01639  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY:  ACTIVE: for object pool 2: p15156_Redo Thread 2.
2013-03-05 09:32:33  INFO    OGG-01640  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start XID: 193.16.5141468.
2013-03-05 09:32:33  INFO    OGG-01641  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start position: SeqNo: 19989, RBA: 178910224, SCN: 2518.401593739 (10815129245067), Timestamp: 2013-03-04 13:54:34.000000.
2013-03-05 09:32:33  INFO    OGG-01642  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery end position: SeqNo: 19996, RBA: 25753600, SCN: 2518.404155713 (10815131807041), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 09:32:33  INFO    OGG-01643  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CANCELED: for object pool 2: p15156_Redo Thread 2.

此时未做理睬,Extract进程依旧为RUNNING状态,3月5日中午11时,Extract进程abend,依旧雾水,依旧start extract_name,大约半小时后再次发现WARNING,如下

2013-03-05 11:31:34  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, extsa.prm:  br_validate_bcp: failed in call to: ggcrc64valid.
2013-03-05 11:31:34  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, extsa.prm:  br_validate_bcp: failed in call to: br_validate_bcp.
2013-03-05 11:32:28  INFO    OGG-01639  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY:  ACTIVE: for object pool 1: p15156_Redo Thread 1.
2013-03-05 11:32:28  INFO    OGG-01640  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start XID: 10.42.14909855.
2013-03-05 11:32:28  INFO    OGG-01641  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start position: SeqNo: 22954, RBA: 232610832, SCN: 2518.401588360 (10815129239688), Timestamp: 2013-03-04 13:53:58.000000.
2013-03-05 11:32:28  INFO    OGG-01642  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery end position: SeqNo: 22960, RBA: 33750016, SCN: 2518.404155695 (10815131807023), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 11:32:28  INFO    OGG-01643  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CANCELED: for object pool 1: p15156_Redo Thread 1.
2013-03-05 11:32:28  INFO    OGG-01639  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY:  ACTIVE: for object pool 2: p15156_Redo Thread 2.
2013-03-05 11:32:28  INFO    OGG-01640  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start XID: 193.16.5141468.
2013-03-05 11:32:28  INFO    OGG-01641  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery start position: SeqNo: 19989, RBA: 178910224, SCN: 2518.401593739 (10815129245067), Timestamp: 2013-03-04 13:54:34.000000.
2013-03-05 11:32:28  INFO    OGG-01642  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: recovery end position: SeqNo: 19996, RBA: 25753600, SCN: 2518.404155713 (10815131807041), Timestamp: 2013-03-04 17:42:57.000000.
2013-03-05 11:32:28  INFO    OGG-01643  Oracle GoldenGate Capture for Oracle, extsa.prm:  BOUNDED RECOVERY: CANCELED: for object pool 2: p15156_Redo Thread 2.

警觉后。通过metalink文档1293772.1和版主blog的文章OGG-01738 BOUNDED RECOVERY找到解决办法
但是,通过start extract_name brreset后,extract进程状态为RUNNING,并没有继续抽取,trail文件依旧停留在第一次Extract abend的时间点,2013年03月04日21时43分

相关信息收集如下
生产端       
操作系统版本        HP-UX B.11.31 U ia64
数据库版本        10.2.0.4
系统架构        RAC
CPU数量        16
内存大小        130810MB
数据库容量        934606MB
归档模式        是
Goldengate版本        V11.1.1.0.30
Goldengate参数       

view params dpesa

EXTRACT dpesa
PASSTHRU
RMTHOST xx.xx.xx.xx, MGRPORT 7839, compress
RMTTRAIL ./dirdat/sa
TABLE mw_sys.*;
TABLE mw_rtm.*;
TABLE mw_app.*;


view params extsa

EXTRACT extsa
SETENV (NLS_LANG="AMERICAN_AMERICA.ZHS16GBK")
USERID goldengate, PASSWORD AACAAAAAAAAAAAMACISIACJGEHDDTGUGRDFCQFQBVDUDACIA, ENCRYPTKEY default
TRANLOGOPTIONS ASMUSER sys@ASM, ASMPASSWORD AACAAAAAAAAAAALARBUGSJBIYFYEDGZDXCZIGIUBGBZFWDKJ, ENCRYPTKEY default

DDL &
INCLUDE MAPPED OBJTYPE 'table' &
INCLUDE MAPPED OBJTYPE 'index' &
EXCLUDE OPTYPE COMMENT
DDLOPTIONS  NOCROSSRENAME  REPORT

REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/extsa.dsc,APPEND,MEGABYTES 1024
THREADOPTIONS  MAXCOMMITPROPAGATIONDELAY 60000 IOLATENCY 60000
DBOPTIONS  ALLOWUNUSEDCOLUMN
WARNLONGTRANS 2h,CHECKINTERVAL 3m
EXTTRAIL ./dirdat/sa
FETCHOPTIONS NOUSESNAPSHOT
TRANLOGOPTIONS  CONVERTUCS2CLOBS
tableexclude MW_APP.MWT_UD_FD_FJSB_SPWYGCD;
tableexclude MW_SYS.MWT_OM_FSDATA;
TABLE MW_SYS.*;
TABLE MW_RTM.*;
TABLE MW_APP.*;


view params mgr

PORT 7839
DYNAMICPORTLIST 7840-7914
PURGEOLDEXTRACTS ./dirdat/*, usecheckpoints, minkeephours 4
AUTOSTART ER *
USERID goldengate, PASSWORD AACAAAAAAAAAAAMACISIACJGEHDDTGUGRDFCQFQBVDUDACIA, ENCRYPTKEY default
PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
PURGEMARKERHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
       
另外,在mgr中添加了PURGEOLDEXTRACTS 参数,但是并未生效,trail文件无法自动清除
       
恳请高人指点迷津

Francis鹏鹏 发表于 2013-3-6 09:59:45

问题已经解决,在等待了一段时间后,Extract进程恢复正常,start extract_name BRRESET可解决此问题
report如下
2013-03-05 17:36:33  INFO    OGG-00546  Default thread stack size: 262144.

2013-03-05 17:36:33  INFO    OGG-00547  Increasing thread stack size from 262144 to 1048576.

2013-03-05 17:36:33  INFO    OGG-01513  Positioning to (Thread 1) Sequence 22962, RBA 276674576, SCN 2518.405122888.

2013-03-05 17:36:33  INFO    OGG-01516  Positioned to (Thread 1) Sequence 22962, RBA 276674576, SCN 2518.405122888, Mar 4, 2013 9:43:01 PM.

2013-03-05 17:36:33  INFO    OGG-01513  Positioning to (Thread 2) Sequence 19998, RBA 261557264, SCN 2518.405122859.

2013-03-05 17:36:34  INFO    OGG-01516  Positioned to (Thread 2) Sequence 19998, RBA 261557264, SCN 2518.405122859, Mar 4, 2013 9:42:59 PM.

2013-03-05 17:36:34  INFO    OGG-01055  Recovery initialization completed for target file ./dirdat/sa001435, at RBA 21962968.

2013-03-05 17:36:34  INFO    OGG-01478  Output file ./dirdat/sa is using format RELEASE 10.4/11.1.

2013-03-05 17:36:34  INFO    OGG-01517  Position of first record processed for Thread 2, Sequence 19998, RBA 261557264, SCN 2518.405122859, Mar 4, 2013 9:42:59 PM.

2013-03-05 17:36:34  INFO    OGG-01517  Position of first record processed for Thread 1, Sequence 22962, RBA 276674576, SCN 2518.405122883, Mar 4, 2013 9:43:01 PM.

2013-03-05 17:36:34  INFO    OGG-01026  Rolling over remote file ./dirdat/sa001435.

2013-03-05 17:36:34  INFO    OGG-01053  Recovery completed for target file ./dirdat/sa001436, at RBA 950.

2013-03-05 17:36:34  INFO    OGG-01057  Recovery completed for all targets.

现在Extract进程可正常抽取

但是PURGEOLDEXTRACTS依旧未能生效

noko_jj 发表于 2013-3-6 19:19:46

Francis鹏鹏 发表于 2013-3-6 09:59 static/image/common/back.gif
问题已经解决,在等待了一段时间后,Extract进程恢复正常,start extract_name BRRESET可解决此问题
report ...

袁鹏,这个是什么平台出的错误?

Maclean Liu(刘相兵 发表于 2013-3-7 11:11:08

FYI

Symptoms

When running Oracle Golden Gate 11.1.1.0.6 or higher,  extract is "abending" every 4 hours on the hour. This approximates the same time or interval that Bounded Recovery is set to by default.
Extract can be restarted and continues to work but then fails again after 4 hours with the same errors as shown below.

ERROR
---------
2011-02-06 05:15:38  WARNING OGG-01573  br_validate_bcp: failed in call to: ggcrc64valid.

2011-02-06 05:15:38  WARNING OGG-01573  br_validate_bcp: failed in call to: br_validate_bcp.

2011-02-06 05:15:38  INFO    OGG-01639  BOUNDED RECOVERY:  ACTIVE: for object pool 1: p7186_Redo Thread 1.

2011-02-06 05:15:38  INFO    OGG-01640  BOUNDED RECOVERY: recovery start XID: 0.0.0.
...
2011-02-06 09:15:46 INFO OGG-01738 BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p7186_Redo Thread 1: start=SeqNo: 21659, RBA: 117520912, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000, end=SeqNo: 21659, RBA: 117602816, SCN: 0.2984644709 (2984644709), Timestamp: 2011-02-06 09:15:44.000000.

Cause

Under these conditions, this may be a problem with the Bounded Recovery Checkpoint file. It is likely corrupted.

Solution

The solution is to reset the Bounded Recovery Checkpoint file when restarting the extract like:

GGSCI> start <extract> BRRESET

Francis鹏鹏 发表于 2013-3-7 21:21:31

noko_jj 发表于 2013-3-6 19:19 static/image/common/back.gif
袁鹏,这个是什么平台出的错误?

HP-UX平台下的,很奇怪,这个系统以前也有报错,但是这个错误是第一次出现
另外,who are you?你怎么知道我的名字?

tiejiang2009 发表于 2013-4-3 13:15:19

我现在有两套RAC库,隔几天就要出现OGG-01738的问题,每次都要手动拉起来,挺郁闷的

Francis鹏鹏 发表于 2013-4-10 09:05:48

tiejiang2009 发表于 2013-4-3 13:15 static/image/common/back.gif
我现在有两套RAC库,隔几天就要出现OGG-01738的问题,每次都要手动拉起来,挺郁闷的 ...

把错误日志贴出来嘛

yiyufeng1986 发表于 2013-4-12 10:40:37

本帖最后由 yiyufeng1986 于 2013-4-12 11:29 编辑

2013-04-11 17:17:25  INFO    OGG-01639  BOUNDED RECOVERY:  ACTIVE: for object pool 1: p16046_extr.

2013-04-11 17:17:25  INFO    OGG-01640  BOUNDED RECOVERY: recovery start XID: 14.30.24583.

2013-04-11 17:17:25  INFO    OGG-01641  BOUNDED RECOVERY: recovery start position: SeqNo: 6321, RBA: 2731536, SCN: 21.3696124530 (93890437746), Timestamp: 20
13-04-11 11:58:41.000000.

2013-04-11 17:17:25  INFO    OGG-01642  BOUNDED RECOVERY: recovery end position: SeqNo: 6321, RBA: 47456204, SCN: 21.3696127689 (93890440905), Timestamp: 201
3-04-11 11:58:43.000000.

2013-04-11 17:17:25  INFO    OGG-01643  BOUNDED RECOVERY: CANCELED: for object pool 1: p16046_extr.

2013-04-11 17:17:25  INFO    OGG-01579  BOUNDED RECOVERY: VALID BCP: CP.EXTODS.000000084.

2013-04-11 17:17:25  INFO    OGG-01629  BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED:.

2013-04-11 17:17:25  INFO    OGG-01513  Positioning to Sequence 6321, RBA 41683472.

2013-04-11 17:17:25  INFO    OGG-01516  Positioned to Sequence 6321, RBA 41683472, Apr 11, 2013 11:58:43 AM.

2013-04-11 17:17:25  INFO    OGG-01055  Recovery initialization completed for target file ./dirdat/od003153, at RBA 953.

2013-04-11 17:17:25  INFO    OGG-01478  Output file ./dirdat/od is using format RELEASE 10.4/11.1.

2013-04-11 17:17:25  INFO    OGG-01026  Rolling over remote file ./dirdat/od003153.

2013-04-11 17:17:25  INFO    OGG-01053  Recovery completed for target file ./dirdat/od003154, at RBA 953.

2013-04-11 17:17:25  INFO    OGG-01057  Recovery completed for all targets.

2013-04-11 17:17:25  INFO    OGG-01517  Position of first record processed Sequence 6321, RBA 42570768, SCN 21.3696127376, Apr 11, 2013 11:58:43 AM.

yiyufeng1986 发表于 2013-4-12 10:42:28

这段时间ext进程经常莫名的hang住,处理方法也是start ext***,BRRESET

F.k-Tao 发表于 2013-4-12 15:23:52

之前也碰到过rac双节点ext进程经常hang问题,如果是rac抽取进程莫名的hang住,需要考虑rac中scn延迟的问题,需要在ogg中添加一个参数,添加scn延时时间和io延时时间参数(数据库一般默认是7秒)。

F.k-Tao 发表于 2013-4-12 15:24:38

哦,之前我碰到的是replicat端的问题,不是ext源端。:——)

yiyufeng1986 发表于 2013-4-12 17:17:15

F.k-Tao 发表于 2013-4-12 15:24 static/image/common/back.gif
哦,之前我碰到的是replicat端的问题,不是ext源端。:——)

那你之前是怎么处理的呢。

F.k-Tao 发表于 2013-4-14 12:22:59

在replicat进程添加scn延时时间和io延时时间参数(数据库一般默认是7秒)。
页: [1]
查看完整版本: Goldengate Extract异常hang住