goldengate抽取延迟3-4小时

ellengan 发表于 2013-4-17 16:49:04

请教刘大及各位高手：

目前我们生产库的一套ogg进程，抽取进程延迟达到3-4个小时，请问有何办法可以减少延迟？

环境介绍如下：
现在共三套数据库，复制关系从A->B->C，A&B为同机房，C为异地。A为IBM小机P750，B为P550,C为PCSERVER，数据库的版本皆为10.2.0.5，OGG版本为112101.
目前A->B的extract进程无延迟，B的replicat进程（REPB&REPC&REPD）延迟为3-5S，从B机配置了抽取进程exsb，转发进程pumsb，C端有replicat，目前B端的exsb延迟比较大，显示如下：
GGSCI (lmsdb2) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING EXSB 00:59:28 00:00:03
EXTRACT RUNNING PUMSB 00:00:00 00:00:09
REPLICAT RUNNING REPB 00:00:00 00:00:00
REPLICAT RUNNING REPC 00:00:00 00:00:00
REPLICAT RUNNING REPD 00:00:00 00:00:01

GGSCI (lmsdb2) 2> lag exsb

Sending GETLAG request to EXTRACT EXSB ...
Last record lag: 3575 seconds.

GGSCI (lmsdb2) 3> lag pumsb

Sending GETLAG request to EXTRACT PUMSB ...
Last record lag: 3582 seconds.

B数据库的awr如附件所示,谢谢各位了^_^

ellengan 发表于 2013-4-17 16:51:38

另外参数eofdelay&flushsecs都已经调整过了，改进效果不大

ellengan 发表于 2013-4-17 16:59:35

本帖最后由 ellengan 于 2013-4-17 17:00 编辑

抽取进程exsb参数:
===
Extract exsb
UserId goldengate, Password xxxxx
CacheMgr CacheDirectory /goldengate/dirtmp 120G, CacheDirectory /goldengate/
extTrail ./dirdat/sB
ThreadOptions MaxCommitPropagationDelay 10000 IOLatency 3000
TranLogOptions AltArchivedLogFormat Instance xxxDB %t_%s_%r.dbf
TranLogOptions AltArchiveLogDest Instance xxxDB /goldengate/archive
DDL Include Mapped
eofdelay 5
flushsecs 3
DDLOptions AddTranData, Report
DDLOptions NoCrossRename, Report
Sequence xxx.*;
TableExclude xxx.SYS_EXPORT_SCHEMA*
Table xxx.*;
===
转发进程pumsb参数：
===
Extract pumsb
UserId goldengate, Password xxx
RmtHost 172.30.1xx.xx, MgrPort 7809
RmtTrail ./dirdat/pb
DDL Include Mapped
DDLOptions AddTranData, Report
DDLOptions NoCrossRename, Report
TableExclude xxx.SYS_EXPORT*
Sequence xxx.*;
Table xxx.*;
===

ellengan 发表于 2013-4-17 17:02:51

麻烦各位帮忙看看啦，谢谢大家^_^

syhnd 发表于 2013-4-18 16:13:16

1. 先评估一下B机器上extract的redo抽取速率.
方法:
send ...., status
过10分钟,
再次运行send ......, status
看10分钟内处理了多少M redo日志，评估extract的抽取速率。

2. dd 一下 redo，看dd读取的速率.

ellengan 发表于 2013-4-18 17:12:11

延迟比较大，所以一直读取的是归档日志，目前归档跟redo都在同一个文件目录下。目前共有6个500M大小的redo，白天50分钟左右完成一个500M的归档日志。
现在读取的是30907的归档日志，请教以下所示如何计算出extract在10分钟内读取了多少m的日志？
===
GGSCI (lmsdb2) 22> send exsb,status

Sending STATUS request to EXTRACT EXSB ...

EXTRACT EXSB (PID 8651724)
  Current status: Recovery complete: Processing data

  Current read position:
  Redo thread #: 1
  Sequence #: 30907
  RBA: 79566408
  Timestamp: 2013-04-18 15:32:01.000000
  SCN: 4.2968808899
  Current write position:
  Sequence #: 665
  RBA: 107578387
  Timestamp: 2013-04-18 16:17:03.240695
  Extract Trail: ./dirdat/sB

GGSCI (lmsdb2) 25>  send exsb,status

Sending STATUS request to EXTRACT EXSB ...

EXTRACT EXSB (PID 8651724)
  Current status: Recovery complete: Processing data

  Current read position:
  Redo thread #: 1
  Sequence #: 30907
  RBA: 171782416
  Timestamp: 2013-04-18 15:41:13.000000
  SCN: 4.2971116008
  Current write position:
  Sequence #: 666
  RBA: 18267276
  Timestamp: 2013-04-18 16:27:04.135832
  Extract Trail: ./dirdat/sB
====
另外dd了一下该归档日志，速度约为23M/s，比较慢

ellengan 发表于 2013-4-18 22:19:10

该归档日志大小为500M
bash-3.2$ date;dd if=/goldengate/1_xxx_734201040.dbf bs=64k| dd of=/dev/null;date
Thu Apr 18 17:10:20 GMT+08:00 2013
7768+1 records in.
7768+1 records out.
994385+0 records in.
994385+0 records out.
Thu Apr 18 17:10:34 GMT+08:00 2013

Maclean Liu(刘相兵 发表于 2013-4-19 12:03:55

GGSCI> stats extract <extract_name>, totalsonly *, reportrate sec
GGSCI> stats extract <extract_name>, totalsonly *, reportrate min

看下呢?

ellengan 发表于 2013-4-19 12:14:03

GGSCI (lmsdb2) 9> stats extract exsb,totalsonly *,reportrate sec

Sending STATS request to EXTRACT EXSB ...

Start of Statistics at 2013-04-19 12:13:25.

DDL replication statistics (for all trails):

*** Total statistics since extract started    ***
      Operations                                  79061.00
      Mapped operations                            79061.00
      Unmapped operations                               0.00
      Other operations                                  0.00
      Excluded operations                               0.00

Output to ./dirdat/sB:

Cumulative totals for specified table(s):

*** Total statistics since 2013-04-18 11:15:01 ***

      No database operations have been performed.

*** Daily statistics since 2013-04-19 00:00:00 ***

      No database operations have been performed.

*** Hourly statistics since 2013-04-19 12:00:00 ***

      No database operations have been performed.

*** Latest statistics since 2013-04-18 11:15:01 ***

      No database operations have been performed.

End of Statistics.

GGSCI (lmsdb2) 10> stats extract exsb,totalsonly *,reportrate ,min

Sending STATS request to EXTRACT EXSB ...

Start of Statistics at 2013-04-19 12:13:33.

DDL replication statistics (for all trails):

*** Total statistics since extract started    ***
      Operations                                  79071.00
      Mapped operations                            79071.00
      Unmapped operations                               0.00
      Other operations                                  0.00
      Excluded operations                               0.00

Output to ./dirdat/sB:

Cumulative totals for specified table(s):

*** Total statistics since 2013-04-18 11:15:01 ***

      No database operations have been performed.

*** Daily statistics since 2013-04-19 00:00:00 ***

      No database operations have been performed.

*** Hourly statistics since 2013-04-19 12:00:00 ***

      No database operations have been performed.

*** Latest statistics since 2013-04-18 11:15:01 ***

      No database operations have been performed.

End of Statistics.

ellengan 发表于 2013-4-19 12:16:52

谢谢刘大
stats extract exsb 输出比较多，被截断了，如附件所示

Maclean Liu(刘相兵 发表于 2013-4-19 15:34:29

extract的IO慢不仅仅可能是读的慢还可能是写trail的慢，归档+online redo 和 trail file所在路径是否是同一个文件系统？

ellengan 发表于 2013-4-19 16:06:11

归档+online redo 和 trail file所在路径都是在同一个文件系统，不是存储，都是P550自带的本地盘

Maclean Liu(刘相兵 发表于 2013-4-19 16:37:20

巧妇难为无米之炊啊

ellengan 发表于 2013-4-19 16:39:52

艾。刘大是说我们系统硬盘的IO太差么？不过我们的系统工程师看了，说这个文件系统不是很忙，使用率不高，所以我也搞不懂了

ellengan 发表于 2013-4-22 16:38:57

在中间库的复制进程添加参数 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH之后，问题解决，现在延迟一下子降到10s之内.多谢刘大的协助!!

Maclean Liu(刘相兵 发表于 2013-4-22 18:55:09

_NOAUTOMATICSEQUENCEFLUSH

Oracle GoldenGate - Version 11.2.1.0.4 and later
Information in this document applies to any platform.
Goal

I am using GoldenGate to replicate sequence last_number.  As this is for failover purpose, the DDL setup (including DDL trigger) is enabled in target database.  during sequence replication by replicat, I saw lots of following DDLs captured in ggs_ddl_trace.log:
alter sequence .... cycle; ---- for the sequence with cycle option

alter sequence ... nocycle;  ---- for the sequence without cycle option

What is this DDL for, and can I disable them to improve the purpose?
Fix

The replicat changes the last_number in target sequence by querying the nextval, but this change is not secured if the target database goes down.  The observed DDL will move the high water mark (HWM) of target sequence and make sure sequences replicate reliably for each sequence record.

The DDL may be skipped with replicat parameter:

DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH

However, if target db goes down, the sequence number may be out of sync with source.  then you will need to use FLUSH to make sure they are in sync (or FLUSH it before the target goes down, for scheduled shutdown).

Other options to reduce the overhead from target DDL trigger:

1. disable the target DDL trigger if possible, and enable it only before the failover happens.

2. increase the cach size (e.g., from default 20 to 2000), which will reduce the number of sequence records to be replicated.

3. modify ddl_filter.sql in target to filter out the DDL (note 1420197.1 shows how to use ddl_filter.sql).

Maclean Liu(刘相兵 发表于 2013-4-22 18:58:04

这个问题可能还是和 source库该sequence 的cache较小有关

但加了 _NOAUTOMATICSEQUENCEFLUSH 可能导致目标库终止时可能产生sequence 与source库不一致的情况。

页: [1]

Oracle数据库数据恢复、性能优化's Archiver

goldengate抽取延迟3-4小时