goldengate抽取延迟3-4小时
请教刘大及各位高手:目前我们生产库的一套ogg进程,抽取进程延迟达到3-4个小时,请问有何办法可以减少延迟?
环境介绍如下:
现在共三套数据库,复制关系从A->B->C,A&B为同机房,C为异地。A为IBM小机P750,B为P550,C为PCSERVER,数据库的版本皆为10.2.0.5,OGG版本为112101.
目前A->B的extract进程无延迟,B的replicat进程(REPB&REPC&REPD)延迟为3-5S,从B机配置了抽取进程exsb,转发进程pumsb,C端有replicat,目前B端的exsb延迟比较大,显示如下:
GGSCI (lmsdb2) 1> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXSB 00:59:28 00:00:03
EXTRACT RUNNING PUMSB 00:00:00 00:00:09
REPLICAT RUNNING REPB 00:00:00 00:00:00
REPLICAT RUNNING REPC 00:00:00 00:00:00
REPLICAT RUNNING REPD 00:00:00 00:00:01
GGSCI (lmsdb2) 2> lag exsb
Sending GETLAG request to EXTRACT EXSB ...
Last record lag: 3575 seconds.
GGSCI (lmsdb2) 3> lag pumsb
Sending GETLAG request to EXTRACT PUMSB ...
Last record lag: 3582 seconds.
B数据库的awr如附件所示,谢谢各位了^_^
另外参数eofdelay&flushsecs都已经调整过了,改进效果不大 本帖最后由 ellengan 于 2013-4-17 17:00 编辑
抽取进程exsb参数:
===
Extract exsb
UserId goldengate, Password xxxxx
CacheMgr CacheDirectory /goldengate/dirtmp 120G, CacheDirectory /goldengate/
extTrail ./dirdat/sB
ThreadOptions MaxCommitPropagationDelay 10000 IOLatency 3000
TranLogOptions AltArchivedLogFormat Instance xxxDB %t_%s_%r.dbf
TranLogOptions AltArchiveLogDest Instance xxxDB /goldengate/archive
DDL Include Mapped
eofdelay 5
flushsecs 3
DDLOptions AddTranData, Report
DDLOptions NoCrossRename, Report
Sequence xxx.*;
TableExclude xxx.SYS_EXPORT_SCHEMA*
Table xxx.*;
===
转发进程pumsb参数:
===
Extract pumsb
UserId goldengate, Password xxx
RmtHost 172.30.1xx.xx, MgrPort 7809
RmtTrail ./dirdat/pb
DDL Include Mapped
DDLOptions AddTranData, Report
DDLOptions NoCrossRename, Report
TableExclude xxx.SYS_EXPORT*
Sequence xxx.*;
Table xxx.*;
=== 麻烦各位帮忙看看啦,谢谢大家^_^ 1. 先评估一下B机器上extract的redo抽取速率.
方法:
send ...., status
过10分钟,
再次运行send ......, status
看10分钟内处理了多少M redo日志,评估extract的抽取速率。
2. dd 一下 redo,看dd读取的速率. 延迟比较大,所以一直读取的是归档日志,目前归档跟redo都在同一个文件目录下。目前共有6个500M大小的redo,白天50分钟左右完成一个500M的归档日志。
现在读取的是30907的归档日志,请教以下所示如何计算出extract在10分钟内读取了多少m的日志?
===
GGSCI (lmsdb2) 22> send exsb,status
Sending STATUS request to EXTRACT EXSB ...
EXTRACT EXSB (PID 8651724)
Current status: Recovery complete: Processing data
Current read position:
Redo thread #: 1
Sequence #: 30907
RBA: 79566408
Timestamp: 2013-04-18 15:32:01.000000
SCN: 4.2968808899
Current write position:
Sequence #: 665
RBA: 107578387
Timestamp: 2013-04-18 16:17:03.240695
Extract Trail: ./dirdat/sB
GGSCI (lmsdb2) 25> send exsb,status
Sending STATUS request to EXTRACT EXSB ...
EXTRACT EXSB (PID 8651724)
Current status: Recovery complete: Processing data
Current read position:
Redo thread #: 1
Sequence #: 30907
RBA: 171782416
Timestamp: 2013-04-18 15:41:13.000000
SCN: 4.2971116008
Current write position:
Sequence #: 666
RBA: 18267276
Timestamp: 2013-04-18 16:27:04.135832
Extract Trail: ./dirdat/sB
====
另外dd了一下该归档日志,速度约为23M/s,比较慢 该归档日志大小为500M
bash-3.2$ date;dd if=/goldengate/1_xxx_734201040.dbf bs=64k| dd of=/dev/null;date
Thu Apr 18 17:10:20 GMT+08:00 2013
7768+1 records in.
7768+1 records out.
994385+0 records in.
994385+0 records out.
Thu Apr 18 17:10:34 GMT+08:00 2013 GGSCI> stats extract <extract_name>, totalsonly *, reportrate sec
GGSCI> stats extract <extract_name>, totalsonly *, reportrate min
看下呢? GGSCI (lmsdb2) 9> stats extract exsb,totalsonly *,reportrate sec
Sending STATS request to EXTRACT EXSB ...
Start of Statistics at 2013-04-19 12:13:25.
DDL replication statistics (for all trails):
*** Total statistics since extract started ***
Operations 79061.00
Mapped operations 79061.00
Unmapped operations 0.00
Other operations 0.00
Excluded operations 0.00
Output to ./dirdat/sB:
Cumulative totals for specified table(s):
*** Total statistics since 2013-04-18 11:15:01 ***
No database operations have been performed.
*** Daily statistics since 2013-04-19 00:00:00 ***
No database operations have been performed.
*** Hourly statistics since 2013-04-19 12:00:00 ***
No database operations have been performed.
*** Latest statistics since 2013-04-18 11:15:01 ***
No database operations have been performed.
End of Statistics.
GGSCI (lmsdb2) 10> stats extract exsb,totalsonly *,reportrate ,min
Sending STATS request to EXTRACT EXSB ...
Start of Statistics at 2013-04-19 12:13:33.
DDL replication statistics (for all trails):
*** Total statistics since extract started ***
Operations 79071.00
Mapped operations 79071.00
Unmapped operations 0.00
Other operations 0.00
Excluded operations 0.00
Output to ./dirdat/sB:
Cumulative totals for specified table(s):
*** Total statistics since 2013-04-18 11:15:01 ***
No database operations have been performed.
*** Daily statistics since 2013-04-19 00:00:00 ***
No database operations have been performed.
*** Hourly statistics since 2013-04-19 12:00:00 ***
No database operations have been performed.
*** Latest statistics since 2013-04-18 11:15:01 ***
No database operations have been performed.
End of Statistics. 谢谢刘大
stats extract exsb 输出比较多,被截断了,如附件所示 extract的IO慢 不仅仅可能是读的慢 还可能是写trail的慢, 归档+online redo 和 trail file所在 路径是否是同一个文件系统? 归档+online redo 和 trail file所在路径都是在同一个文件系统,不是存储,都是P550自带的本地盘 巧妇难为无米之炊啊 艾。刘大是说我们系统硬盘的IO太差么? 不过我们的系统工程师看了,说这个文件系统不是很忙,使用率不高,所以我也搞不懂了 在中间库的复制进程添加参数 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH之后,问题解决,现在延迟一下子降到10s之内.多谢刘大的协助!! _NOAUTOMATICSEQUENCEFLUSH
Oracle GoldenGate - Version 11.2.1.0.4 and later
Information in this document applies to any platform.
Goal
I am using GoldenGate to replicate sequence last_number. As this is for failover purpose, the DDL setup (including DDL trigger) is enabled in target database. during sequence replication by replicat, I saw lots of following DDLs captured in ggs_ddl_trace.log:
alter sequence .... cycle; ---- for the sequence with cycle option
alter sequence ... nocycle; ---- for the sequence without cycle option
What is this DDL for, and can I disable them to improve the purpose?
Fix
The replicat changes the last_number in target sequence by querying the nextval, but this change is not secured if the target database goes down. The observed DDL will move the high water mark (HWM) of target sequence and make sure sequences replicate reliably for each sequence record.
The DDL may be skipped with replicat parameter:
DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH
However, if target db goes down, the sequence number may be out of sync with source. then you will need to use FLUSH to make sure they are in sync (or FLUSH it before the target goes down, for scheduled shutdown).
Other options to reduce the overhead from target DDL trigger:
1. disable the target DDL trigger if possible, and enable it only before the failover happens.
2. increase the cach size (e.g., from default 20 to 2000), which will reduce the number of sequence records to be replicated.
3. modify ddl_filter.sql in target to filter out the DDL (note 1420197.1 shows how to use ddl_filter.sql).
这个问题可能还是和 source库该sequence 的cache较小有关
但加了 _NOAUTOMATICSEQUENCEFLUSH 可能导致目标库终止时可能 产生sequence 与source库不一致的情况。
页:
[1]