- 最后登录
- 2023-8-16
- 在线时间
- 1686 小时
- 威望
- 2135
- 金钱
- 50532
- 注册时间
- 2011-10-12
- 阅读权限
- 200
- 帖子
- 5207
- 精华
- 39
- 积分
- 2135
- UID
- 2
|
2#
发表于 2011-12-29 18:54:20
Document searched :
OGG Encountered SCN That Is Not Greater Than The Highest SCN Already Processed [ID 957112.1]
Applies to:
Oracle GoldenGate - Version: 6.0.0 and later [Release: 6.0.0 and later ]
Information in this document applies to any platform.
Solution
Issue:
Extract ERROR 180 encountered commit SCN <scn> that is not greater than the highest SCN already processed
Solution Overview:
This error occurs shortly after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN than the previous transaction
Solution Details:
In an Oracle RAC environment, Extract has a coordinator thread. This thread assembles the operations that are generated on each of the RAC nodes into SCN order. Before starting to assemble the transactions, Extract waits for a short time after Oracle writes the transaction to the redo log. If a node has been idle with no transactions, Extract waits the number of milliseconds specified in the Extract parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY (default 3 seconds). Waiting ensures that there are no new transactions on the idle node before writing the current transactions to the trail.
Possible Reasons for error:
-- One of the threads is slower than the other.
-- The redo logs are not flushed on time due to latency on Log writes.
-- Network issues between Extract and one of the RAC nodes, if Extract is running on a system separate from RAC nodes.
-- Long log write times due to a standby configuration, if any.
-- Log file I/Os are taking unusually long times to complete.
-- Time imperfections between the cluster nodes. All nodes in the RAC cluster must have synchronized system clocks. If Extract is running on a system other than any of the RAC nodes, that system clock must be in sync, too, because we compare the local system's time to the commit timestamp to make critical decisions. For information about synchronizing system clocks, consult www.ntp.org or your systems administrator
-- The DB timezone setting is different than the OS timezone setting
Case example:
The following shows a time sequence and explains what happens at each step in time:
TIME WHAT HAPPENED
------- ---------
2008-03-31 14:08:21.964 Starting from this timestamp, thread 2 is reporting that it is at the end of the file (EOF).
...... Thread 2 keeps reporting EOF.
2008-03-31 14:25:33.021 Thread 1 reports it is at EOF
2008-03-31 14:25:33.110 Thread 2 reports it is at EOF
2008-03-31 14:25:34.040 Thread 1 gets the commit record, with timestamp of 2008-03-31 14:10:52. It commits the transaction with an SCN of 1.2943495430.
no timestamp The coordinator thread processes the committed transaction from thread 1, because thread 2 kept reporting EOF for the last 17 minutes.
2008-03-31 14:25:34.188 Thread 2 got an update record and a commit record with timestamp of 2008-03-31 14:10:52. It commits transaction with an SCN of 1.2943495406
2008-03-31 14:25:34.xxx The main thread processes the committed transaction from thread 2, and detects an out of order SCN. Extract abends with this error: 2008-03-31 14:25:34 GGS ERROR 180 encountered commit SCN 1.2943495406 (0001.af722cee) that is not greater than the highest SCN already processed 1.2943495430 (0001.af722d06) Redo Thread 2 (2) xid 29.27.131432 (0x001d.01b.00020168), starting seq.rba 3725.98984464, scn 1.2943495403 (0001.af722ceb), commit seq.rba 3725.99011828 commit timestamp 2008-03-31 14:10:52.000000.
n this case, the out of order SCN was caused by the huge gap in the time between when the record was generated to the log buffer and when it was captured by Extract.
Troubleshooting steps:
-- Check to see if the clocks on different nodes in the same RAC are in sync
-- Check to see if there are any unusual time gaps between log data on threads during the SCN error.
-- Check how the 'remote' log files, if any, are shared with the system on which Extract is running (e.g., NFS, SAN, shared SCSI, etc.) and see if there are any noticeable problems there.
-- Check to see if there are messages in the Oracle alert logs, lgwr tracefiles or net logs during the problematic time, to see if they provide any clues to point to one of the 'Possible Reasons' discussed above.
-- Check to see if any analysis been done on the I/O performance on the drives that contain the log files and whether they are "real Disk";, NFS, SAN, etc.
-- Check to see if the DB timezone is setup correctly. Issue 'SELECT dbtimezone FROM DUAL', the timezone should match the OS. For example is the result is "+00:00", that means DB is using GMT/UTC timezone.
Recommendation:-
The MAXCOMMITPROPAGATIONDELAY parameter can be used to set the delay time well above the max_commit_propogation_delay setting in the database, plus the default extra padding that Extract adds (2000 milliseconds).
In Oracle RAC, the max_commit_propogation_delay specifies the maximum amount of time allowed before the system change number (SCN) held in the SGA of an instance is refreshed by the log writer process (LGWR). Units are in hundredths of seconds.
To check Oracle's value:
Connect as a user with dba privileges and issue:
SQL> show parameter max_commit NAME TYPE VALUE
------------------------------------ ----------- ------------------------------ max_commit_propagation_delay integer 0
To set MAXCOMMITPROPAGATIONDELAY : The value of MAXCOMMITPROPAGATIONDELAY must be greater than zero and less than 90000 milliseconds.
This is how the line should look in the Extract parameter file:
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 2700
Starting from GGS Version 9.x and above, an additional parameter IOLATENCY can be used if Extract abends with 'encountered SCN XXXXX' too often. IOLATENCY adjusts the delta between the database-configured max commit propogation delay and the internal value that Extract uses. By default IOLATENCY is set to 1.5 seconds
Note: Valid values for IOLATENCY are between 0 and 180000 milliseconds (3 minutes).
The combined parameters should look like this:
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY <msec> IOLATENCY <msec>
The combination of MAXCOMMITPROPAGATIONDELAY and IOLATENCY can be used to ensure that:
1) the Oracle threads have written their most recent SCN data to the logs
2) the I/O processes have had time to complete, considering the various factors that increase I/O latency, such as hardware contention, file locking, long seek and queue times, etc.
The key is in adjusting IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters to an optimum value based on the frequency of hitting the error.
Hint: If the problem happens too often, you can start with high values for IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters. Once the error stops happening, you can gradually decrease the values to see the SCN number where this error starts appearing again. This would give you an idea of the boundary values specific to that environment.
Furthermore, the reason Extract successfully restarts after this error is that, on restart, it re-reads the operations, and this time they are all on disk and can be processed in correct SCN order. The side effect is that Extract rewrites operations into the trail. This may cause the data pump or Replicat to abend with incompatible record errors.
Note: From Oracle 10.2, parameter max_commit_propagation_delay has been deprecated. So "SQL> show parameter max_commit NAME TYPE VALUE " is no longer viable for Oracle 10.2 upward.
Final Fix of this issue in V11.1.1+
Oracle DB HEARTBEAT is utilized to coordinate the read from multiple nodes to totally eliminate this issue. This is being tracked in BugDB 10356426, which is implemented for V11.1.1.1, also backported back to V11.1.1.0.x code line. If you are running V11.1.1.0.x GoldenGate, please request a build that has this fix. |
|