ccshu

73 积分	0 好友	0 主题

发消息

OGG的奇怪等待

1^#

发表于 2012-2-24 12:45:52 | 查看: 10095| 回复: 8

好像最近群里面讨论OGG比较热烈，我就发一个OGG的问题吧。

我们的OGG是基于RAC -》单实例复制的，昨天晚上因为维护重启了一下，漫长的等待由此开始。

OGG首先要求寻找一个2012-02-22 15:10:00的归档，而他的实际checkpoint时间是2012-02-23 23:01:54

满足它的这个无理要求后，OGG开始漫长的等待，平均一小时推进一小时的归档时间。

我们曾经在以前遇到过类似的现象，当时发现有个事物一直没有commit或rollback，kill该session后一般可以解决，但今天未发现此类事物。

SQL> select min(start_time) from gv$transaction;  MIN(START_TIME)--------------------02/24/12 12:29:59

依照过去的经验，OGG必须以1:1的时间等待到它真实的checkpoint时间点后才会恢复正常，期间任何中断都会导致计时的重新开始。我正在寻找原因，不知道各位有没有兴趣一起来参与一下呢？

GGSCI > info extract ext_full showch

EXTRACT EXT_FULL  Last Started 2012-02-24 10:54 Status RUNNING
Checkpoint Lag    12:56:35 (updated 00:00:09 ago)
Log Read Checkpoint  Oracle Redo Logs
                  2012-02-23 23:01:54  Thread 1, Seqno 41124, RBA 131803900
Log Read Checkpoint  Oracle Redo Logs
                  2012-02-23 23:01:54  Thread 2, Seqno 99721, RBA 152595944

Current Checkpoint Detail:

Read Checkpoint #1

  Oracle RAC Redo Log

  Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 40851
RBA: 172443664
Timestamp: 2012-02-22 15:18:40.000000
SCN: 2951.1993440757 (12676441931253)
Redo File: /crmarc/SIEBELDB/archivelog/2012_02_24/o1_mf_1_40851_1MZgQa2gU_.arc

  Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 40851
RBA: 172443664
Timestamp: 2012-02-22 15:18:40.000000
SCN: 2951.1993440757 (12676441931253)
Redo File: /crmarc/SIEBELDB/archivelog/2012_02_24/o1_mf_1_40851_1MZgQa2gU_.arc

  Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 41124
RBA: 131803900
Timestamp: 2012-02-23 23:01:54.000000
SCN: 2951.2768914013 (12677217404509)
Redo File: Not Available

Read Checkpoint #2

  Oracle RAC Redo Log

  Startup Checkpoint (starting position in the data source):
Thread #: 2
Sequence #: 99721
RBA: 152580624
Timestamp: 2012-02-23 23:01:54.000000
SCN: 2951.2768914090 (12677217404586)
Redo File: /crmarc/SIEBELDB/archivelog/2012_02_23/o1_mf_2_99721_1MZ8heKt3_.arc

  Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 99721
RBA: 152580624
Timestamp: 2012-02-23 23:01:54.000000
SCN: 2951.2768914090 (12677217404586)
Redo File: /crmarc/SIEBELDB/archivelog/2012_02_23/o1_mf_2_99721_1MZ8heKt3_.arc

  Current Checkpoint (position of last record read in the data source):
Thread #: 2
Sequence #: 99721
RBA: 152595944
Timestamp: 2012-02-23 23:01:54.000000
SCN: 2951.2768914106 (12677217404602)
Redo File: /crmarc/SIEBELDB/archivelog/2012_02_23/o1_mf_2_99721_1MZ8heKt3_.arc

Write Checkpoint #1

  GGS Log Trail

  Current Checkpoint (current write position):
Sequence #: 10074
RBA: 1031
Timestamp: 2012-02-24 11:58:29.984520
Extract Trail: /crmarc/ggs_src/dirdat/p8

Header:
  Version = 2
  Record Source = A
  Type = 6
  # Input Checkpoints = 2
  # Output Checkpoints = 1

File Information:
  Block Size = 2048
  Max Blocks = 100
  Record Length = 4096
  Current Offset = 0

Configuration:
  Data Source = 3
  Transaction Integrity = 1
  Task Type = 0

Status:
  Start Time = 2012-02-24 10:54:56
  Last Update Time = 2012-02-24 11:58:29
  Stop Status = A
  Last Result = 0

EXT_FULL.txt

30.38 KB, 下载次数: 2094

分享0

收藏0 回复只看该作者道具举报

ccshu

2^#

发表于 2012-2-24 15:35:20

我自己找到一篇文档，但不确定是不是能用？

How to Exclude Empty Transactions From GoldenGate Extract Capture? [ID 1389353.1]

回复只看该作者道具举报

teapot

3^#

发表于 2012-2-24 18:06:50

Log Read Checkpoint  Oracle Redo Logs
                  2012-02-23 23:01:54  Thread 1, Seqno 41124, RBA 131803900
Log Read Checkpoint  Oracle Redo Logs
                  2012-02-23 23:01:54  Thread 2, Seqno 99721, RBA 152595944

这两个就是GG的读取日志的检查点，最主要看seqno和RBA吧

OGG首先要求寻找一个2012-02-22 15:10:00的归档，而他的实际checkpoint时间是2012-02-23 23:01:54

满足它的这个无理要求后，OGG开始漫长的等待，平均一小时推进一小时的归档时间。

不明白什么意思，GG的checkpoint不刷新？  有redo写吗？
可以拿ML的redo写的位置的SQL看一下

还有检查一下lag

回复只看该作者道具举报

teapot

4^#

发表于 2012-2-24 18:40:17

另外如果想要确定ext是否hang，可以使用
REPORTCOUNT EVERY 10 MINUTES, RATE
来看一下report的情况

回复只看该作者道具举报

teapot

5^#

发表于 2012-2-24 20:39:37

OGG Extract RBA Not Moving And LAG Increasing And Appears Hung [ID 964705.1]

Issue:
The GGSCI command INFO EXTRACT <group> shows that the RBA is not increasing and the LAG is increasing. Extract seems to be hanging.

Solution Summary:
Extract may be processing a long transaction.

Solution details:
To check, issue this command:

GGSCI> SEND EXTRACT <group>, STATUS

It should show that Extract is processing a transaction with a large number of records. To verify, issue this command:

GGSCI> SEND EXTRACT <group>, REPORT

View the report file (VIEW REPORT <group>). If WARNLONGTRANS has been implemented in the params then the report file should show messages like "Repositioning for Long Transaction" and "Finished Long Transaction, Processed nnnnn records." This indicates that Extract is running and processing long transactions.

回复只看该作者道具举报

ccshu

6^#

发表于 2012-2-24 20:51:43

extract 应该没有hang，我在之前提供的ext_full.txt文件中提供了extract的输出日志，大概每隔一个小时，extract会输出一段，让我知道，其实内部的checkpoint正在以大概1:1的速度推进。

GGSCI > send extract ext_full, status

Sending STATUS request to EXTRACT EXT_FULL ...

EXTRACT EXT_FULL (PID 2875426)
  Current status: In recovery[1]: Processing data with empty data queue

  Current read positions:
  Redo thread #: 1
  Sequence #: 40984
  RBA: 152630512
  Timestamp: 2012-02-23 11:01:38.000000
  SCN: 2951.2431164110

  Redo thread #: 2
  Sequence #: 99721
  RBA: 152595944
  Timestamp: 2012-02-23 23:01:54.000000
  SCN: 2951.2768914106

  Current write position:
  Sequence #: 10074
  RBA: 1031
  Timestamp: 2012-02-24 20:49:08.438460
  Extract Trail: /crmarc/ggs_src/dirdat/p8

回复只看该作者道具举报

teapot

7^#

发表于 2012-2-24 20:52:58

2012-02-24 10:54:56  INFO OGG-01517  Position of first record processed for Thread 1, Sequence 40851, RBA 172443664, SCN 2951.1993440757, Feb 22, 2012 3:18:40 PM.

2012-02-24 10:56:18  WARNING OGG-00723  Record with class# 88, slt# 5, at seqno 40864, rba 107356176 SCN 2951.2002356992 (12676450847488) has secondary transaction ID that is duplicate of existing open uncommitted transaction.

2012-02-24 10:56:18  WARNING OGG-00715  [Thread #1] Purging transaction (transaction id: 36.5.777249, start time: 2012-02-22 16:01:36, start seqno: 40863, start RBA: 377455632).

2012-02-24 10:56:18  WARNING OGG-00712  Updating I/O checkpoint after purging orphaned transactions on thread 1 with current position (Seq#: 40864, RBA: 107369064).

2012-02-24 11:50:20  WARNING OGG-00723  Record with class# 178, slt# 18, at seqno 40879, rba 383272464 SCN 2951.2014095363 (12676462585859) has secondary transaction ID that is duplicate of existing open uncommitted transaction.

2012-02-24 11:50:20  WARNING OGG-00715  [Thread #1] Purging transaction (transaction id: 81.18.86283, start time: 2012-02-22 17:01:26, start seqno: 40879, start RBA: 334827024).

2012-02-24 11:50:20  WARNING OGG-00712  Updating I/O checkpoint after purging orphaned transactions on thread 1 with current position (Seq#: 40879, RBA: 383275216).

https://forums.oracle.com/forums/thread.jspa?threadID=2338443
从这边讲是分布式事务，这个具体没有碰到过

Does Oracle Goldengate extract support distributed transactions? [ID 1235986.1]
GG当前不支持分布式事务

回复只看该作者道具举报

teapot

8^#

发表于 2012-2-24 21:53:12

文档上建议，分布式事务，通过一个用户去处理，GG不去捕获那个用户的事务

回复只看该作者道具举报

回到原点

9^#

发表于 2012-2-27 17:00:04

高，学习了

高手，学习了

回复只看该作者道具举报

返回列表

		自动登录	找回密码
密码			注册