Oracle数据库数据恢复、性能优化»论坛 › Oracle › Oracle数据库管理 › OGG-01028 not greater than the highest SCN 的原理是 ...

316 积分	0 好友	0 主题

发消息

OGG-01028 not greater than the highest SCN 的原理是什么？

1^#

发表于 2011-12-29 15:21:28 | 查看: 19222| 回复: 16

当前使用的版本是 11.1.1.0.0Build 078

GGSCI (p560b) 13>"p560b:/home/oracle/ogg>"./ggsci

Oracle GoldenGateCommand Interpreter for Oracle
Version 11.1.1.0.0Build 078
AIX 5L, ppc, 64bit(optimized), Oracle 11 on Jul 28 2010 15:53:00

Copyright (C) 1995,2010, Oracle and/or its affiliates. All rights reserved.

多次遇到OGG-01028 not greater than the highest SCN，很烦人
2011-12-2108:20:13
ERROR
OGG-01028
encountered commit SCN 0.234946815 (234946815) that is not greater thanthe highest SCN already processed 0.234946819

(234946819) Redo Thread 2 (2) xid 17.27.112675(0x0011.001b.0001b823), starting seq.rba 3763.67596304, scn 0.234946814(234946814), commit seq.rba 3763.6759
6916 committimestamp 2011-12-21 08:20:12.000000.

2011-12-2614:30:47
ERROR
OGG-01028
encountered commit SCN 0.237693580 (237693580) that is not greater thanthe highest SCN already processed 0.237693616 (237693616) Redo Thread 2 (2) xid18.15.115041 (0x0012.000f.0001c161), starting seq.rba 3806.12705296, scn0.237693569 (237693569), commit seq.rba 3806.12710664 commit timestamp2011-12-26 14:30:46.000000.

NOTE参考见987379.1，http://blog.sina.com.cn/s/blog_4e7cf89d0100htow.html 也给出了分析，但是本人愚钝，仍然没有理解到底为什么会发生，为什么加大延时可能可以解决该问题。

请不吝赐教！

分享0

收藏0 回复只看该作者道具举报

Maclean Liu(刘相兵

2^#

发表于 2011-12-29 18:54:20

Document searched :

OGG Encountered SCN That Is Not Greater Than The Highest SCN Already Processed [ID 957112.1]

Applies to:
Oracle GoldenGate - Version: 6.0.0 and later [Release: 6.0.0 and later ]
Information in this document applies to any platform.
Solution

Issue:
Extract ERROR 180 encountered commit SCN <scn> that is not greater than the highest SCN already processed

Solution Overview:
This error occurs shortly after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN than the previous transaction

Solution Details:
In an Oracle RAC environment, Extract has a coordinator thread. This thread assembles the operations that are generated on each of the RAC nodes into SCN order. Before starting to assemble the transactions, Extract waits for a short time after Oracle writes the transaction to the redo log. If a node has been idle with no transactions, Extract waits the number of milliseconds specified in the Extract parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY (default 3 seconds). Waiting ensures that there are no new transactions on the idle node before writing the current transactions to the trail.

Possible Reasons for error:
-- One of the threads is slower than the other.
-- The redo logs are not flushed on time due to latency on Log writes.
-- Network issues between Extract and one of the RAC nodes, if Extract is running on a system separate from RAC nodes.
-- Long log write times due to a standby configuration, if any.
-- Log file I/Os are taking unusually long times to complete.
-- Time imperfections between the cluster nodes. All nodes in the RAC cluster must have synchronized system clocks. If Extract is running on a system other than any of the RAC nodes, that system clock must be in sync, too, because we compare the local system's time to the commit timestamp to make critical decisions. For information about synchronizing system clocks, consult www.ntp.org or your systems administrator
-- The DB timezone setting is different than the OS timezone setting

Case example:
The following shows a time sequence and explains what happens at each step in time:
TIME WHAT HAPPENED

------- ---------

2008-03-31 14:08:21.964 Starting from this timestamp, thread 2 is reporting that it is at the end of the file (EOF).

...... Thread 2 keeps reporting EOF.

2008-03-31 14:25:33.021 Thread 1 reports it is at EOF

2008-03-31 14:25:33.110 Thread 2 reports it is at EOF

2008-03-31 14:25:34.040 Thread 1 gets the commit record, with timestamp of 2008-03-31 14:10:52. It commits the transaction with an SCN of 1.2943495430.

no timestamp The coordinator thread processes the committed transaction from thread 1, because thread 2 kept reporting EOF for the last 17 minutes.

2008-03-31 14:25:34.188 Thread 2 got an update record and a commit record with timestamp of 2008-03-31 14:10:52. It commits transaction with an SCN of 1.2943495406

2008-03-31 14:25:34.xxx The main thread processes the committed transaction from thread 2, and detects an out of order SCN. Extract abends with this error: 2008-03-31 14:25:34 GGS ERROR 180 encountered commit SCN 1.2943495406 (0001.af722cee) that is not greater than the highest SCN already processed 1.2943495430 (0001.af722d06) Redo Thread 2 (2) xid 29.27.131432 (0x001d.01b.00020168), starting seq.rba 3725.98984464, scn 1.2943495403 (0001.af722ceb), commit seq.rba 3725.99011828 commit timestamp 2008-03-31 14:10:52.000000.

n this case, the out of order SCN was caused by the huge gap in the time between when the record was generated to the log buffer and when it was captured by Extract.

Troubleshooting steps:

-- Check to see if the clocks on different nodes in the same RAC are in sync

-- Check to see if there are any unusual time gaps between log data on threads during the SCN error.

-- Check how the 'remote'  log files, if any, are shared with the system on which Extract is running (e.g., NFS, SAN, shared SCSI, etc.) and see if there are any noticeable problems there.

-- Check to see if there are messages in the Oracle alert logs, lgwr tracefiles or net logs during the problematic time, to see if they provide any clues to point to one of the 'Possible Reasons' discussed above.
-- Check to see if any analysis been done on the I/O performance on the drives that contain the log files and whether they are "real Disk";, NFS, SAN, etc.
-- Check to see if the DB timezone is setup correctly.  Issue 'SELECT dbtimezone FROM DUAL', the timezone should match the OS.  For example is the result is "+00:00", that means DB is using GMT/UTC timezone.

Recommendation:-
The MAXCOMMITPROPAGATIONDELAY parameter can be used to set the delay time well above the max_commit_propogation_delay setting in the database, plus the default extra padding that Extract adds (2000 milliseconds).

In Oracle RAC, the max_commit_propogation_delay specifies the maximum amount of time allowed before the system change number (SCN) held in the SGA of an instance is refreshed by the log writer process (LGWR). Units are in hundredths of seconds.

To check Oracle's value:

Connect as a user with dba privileges and issue:
SQL> show parameter max_commit NAME TYPE VALUE

------------------------------------ ----------- ------------------------------ max_commit_propagation_delay integer 0

To set MAXCOMMITPROPAGATIONDELAY : The value of MAXCOMMITPROPAGATIONDELAY must be greater than zero and less than 90000 milliseconds.

This is how the line should look in the Extract parameter file:

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 2700

Starting from GGS Version 9.x and above, an additional parameter IOLATENCY can be used if Extract abends with 'encountered SCN XXXXX' too often. IOLATENCY adjusts the delta between the database-configured max commit propogation delay and the internal value that Extract uses. By default IOLATENCY is set to 1.5 seconds 

Note: Valid values for IOLATENCY are between 0 and 180000 milliseconds (3 minutes).

The combined parameters should look like this:

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY <msec> IOLATENCY <msec>

The combination of MAXCOMMITPROPAGATIONDELAY and IOLATENCY can be used to ensure that:

1) the Oracle threads have written their most recent SCN data to the logs
2) the I/O processes have had time to complete, considering the various factors that increase I/O latency, such as hardware contention, file locking, long seek and queue times, etc.

The key is in adjusting IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters to an optimum value based on the frequency of hitting the error.

Hint: If the problem happens too often, you can start with high values for IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters. Once the error stops happening, you can gradually decrease the values to see the SCN number where this error starts appearing again. This would give you an idea of the boundary values specific to that environment.

Furthermore, the reason Extract successfully restarts after this error is that, on restart, it re-reads the operations, and this time they are all on disk and can be processed in correct SCN order. The side effect is that Extract rewrites operations into the trail. This may cause the data pump or Replicat to abend with incompatible record errors.

Note: From Oracle 10.2, parameter max_commit_propagation_delay has been deprecated. So "SQL> show parameter max_commit NAME TYPE VALUE " is no longer viable for Oracle 10.2 upward.

Final Fix of this issue in V11.1.1+

Oracle DB HEARTBEAT is utilized to coordinate the read from multiple nodes to totally eliminate this issue.  This is being tracked in BugDB 10356426, which is implemented for V11.1.1.1, also backported back to V11.1.1.0.x code line.  If you are running V11.1.1.0.x GoldenGate, please request a build that has this fix.

		自动登录	找回密码
密码			注册

OGG-01028 not greater than the highest SCN 的原理是什么？

回复 8# 的帖子

回复 9# 的帖子

回复 12# 的帖子