Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

316

积分

0

好友

0

主题
1#
发表于 2011-12-29 15:21:28 | 查看: 19223| 回复: 16
当前使用的版本是 11.1.1.0.0Build 078

GGSCI (p560b) 13>"p560b:/home/oracle/ogg>"./ggsci

Oracle GoldenGateCommand Interpreter for Oracle
Version 11.1.1.0.0Build 078
AIX 5L, ppc, 64bit(optimized), Oracle 11 on Jul 28 2010 15:53:00

Copyright (C) 1995,2010, Oracle and/or its affiliates. All rights reserved.

多次遇到OGG-01028 not greater than the highest SCN,很烦人
2011-12-2108:20:13
ERROR
OGG-01028
encountered commit SCN 0.234946815 (234946815) that is not greater thanthe highest SCN already processed 0.234946819


(234946819) Redo Thread 2 (2) xid 17.27.112675(0x0011.001b.0001b823), starting seq.rba 3763.67596304, scn 0.234946814(234946814), commit seq.rba 3763.6759

6916 committimestamp 2011-12-21 08:20:12.000000.

2011-12-2614:30:47
ERROR
OGG-01028
encountered commit SCN 0.237693580 (237693580) that is not greater thanthe highest SCN already processed 0.237693616 (237693616) Redo Thread 2 (2) xid18.15.115041 (0x0012.000f.0001c161), starting seq.rba 3806.12705296, scn0.237693569 (237693569), commit seq.rba 3806.12710664 commit timestamp2011-12-26 14:30:46.000000.



NOTE参考见987379.1,http://blog.sina.com.cn/s/blog_4e7cf89d0100htow.html 也给出了分析,但是本人愚钝,仍然没有理解到底为什么会发生,为什么加大延时可能可以解决该问题。

请不吝赐教!
17#
发表于 2012-3-6 22:07:00
Please add THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 80000 IOLATENCY 160000 to the extract parameter file and restart it. If you continue to encounter the extract abend again with same error then please try to increase parameter values and retry starting the extract.

Also please follow the below steps whenever extract abends again with same error.
1.Add below parameter to the extract parameter file
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 90000 IOLATENCY 180000

2.Do a ETROLLOVER on Extract, and take a note of the new sequence number of the trail file.
GGSCI> ALTER EXTRACT <extract_name>, ETROLLOVER
GGSCI> INFO EXTRACT <extract_name>, DETAIL

3. Start the extract
GGSCI> START EXTRACT <extract_name>

3. Send PUMP, LOGEND, to see if it's at the end of the trail.
SEND EXTRACT [pump_name], LOGEND

4. Once it is at the end of the trail file, You must stop the pump, and do an ETROLLOVER for it too. Take note of the new trail file sequence number that is created from this step.
GGSCI> STOP EXTRACT [pump_name]
GGSCI> ALTER EXTRACT [pump_name], ETROLLOVER

5. Alter the pump to SEQNO to the new trail file created from step #1.
GGSCI> ALTER EXTRACT [pump_name], EXTSEQNO xxx EXTRBA 0

6. Restart pump
START EXTRACT [pump_name]

7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT [name], LOGEND
STOP REPLICAT [name]

8. If replicat is not at end of trail, generate a report and forcestop replicat
SEND REPLICAT [name], REPORT
STOP REPLICAT [name]!

9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction
GROUPTRANSOPS 1
MAXTRANSOPS 1

11. Restart the replicat
GGSCI> START REPLICAT [name]

12. Once replicat has completely processed the trail, stop the replicat
GGSCI> STOP REPLICAT [name]

13. Add below parameter to the Replicat parameter file
HANDLECOLLISIONS

14.Comment out below parameters
GROUPTRANSOPS 1
MAXTRANSOPS 1

15. ALTER REPLICAT, SEQNO to the new trail file created by pump extract in step #4.
GGSCI> ALTER REPLICAT [name], EXTSEQNO XXX EXTRBA 0

16. Restart the replicats
GGSCI> START REPLICAT [name]

17.Check Replicat statistics
GGSCI> SEND REPLICAT [name], TOTALSONLY *

18. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. Edit the replicat parameter file and comment out the HANDLECOLLISIONS parameter.

19.Stop the replicats
GGSCI> STOP REPLICAT [name]

20.Restart the replicats
GGSCI> START REPLICAT [name]


Brief Description on the Issue:
-------------------
This error normally happens only in a RAC environment after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN then the previous transaction. Normally on restart we will be able to get the extract running as on restart, it re-reads the operations, and this time they are all on disk and can be processed in correct SCN order. We can avoid this error by adding the parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY to the extract . The MAXCOMMITPROPAGATIONDELAY parameter can be used to set the delay time well above the max_commit_propagation_delay setting in the database, plus the default extra padding that Extract adds (2000 milliseconds).

In an Oracle RAC environment, Extract has a coordinator thread. This thread assembles the operations that are generated on each of the RAC nodes into SCN order. Before starting to assemble the transactions, Extract waits for a short time after Oracle writes the transaction to the redo log. If a node has been idle with no transactions, Extract waits the number of milliseconds specified in the Extract parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY (default 3 secs). Waiting ensures that there are no new transactions on the idle node before writing the current transactions to the trail.

Possible Reasons for error:
---------------------------
1.One of the threads is slower than the other.
2.The redo logs are not flushed on time due to latency on Log writes.
3.Network issues between Extract and one of the RAC nodes, if Extract is running on a system separate from RAC nodes.
4.Long log write times due to a standby configuration, if any.
5.Log file I/Os are taking unusually long times to complete.
6.Time imperfections between the cluster nodes. All nodes in the RAC cluster must have synchronized system clocks. If Extract is running on a system other than any of the RAC nodes, that system clock must be in sync, too, because we compare the local system's time to the commit timestamp to make critical decisions. For information about synchronizing system clocks, consult www.ntp.org or your systems administrator

More info and the possible reasons for this error is available in the KM added to the SR.

The reason Extract successfully restarts after this error is that, on restart, it re-reads the operations, and this time they are all on disk and can be processed in correct SCN order. The side effect is that Extract rewrites operations into the new trail file.

when HANDLECOLLISIONS parameter is used in replicat, it may possible to have duplicate rows inserted to target tables with no PK/UI as it mostly handles UPDATE/DELETE operations only, not Inserts. Insert during overlapping time may cause the duplicate records.

If you have any target tables without PK/UI, before starting the replicat to read form new trail file, it is better to use Logdump utility and scan the new trail file and previous trail file, check and compare the records re-written and alter the replicat to read from a new record RBA. Make sure you have grouptransops 1 and maxtransops 1 in the replicat parameter file and ran to the end of the previous trail file and then run current trail file after altering it to the new record RBA. This way can make sure we have not missed any of the records and can avoid inserting duplicate rows in to the target tables without PK/UI.

As a best practice we always recommend to have PK/UI configured at table level to enforce uniqueness at database level to avoid duplicate rows in the target tables. You could always contact us for any technical help for following above procedure.

回复 只看该作者 道具 举报

16#
发表于 2011-12-30 23:19:10
一个例子

instance 1 :

create table inc_scn (t1 int);

begin
loop
insert into inc_scn values(1);
commit;
end loop;
end;
/



instance 2 idle:



02:17:50 SQL> select THREAD# , LAST_REDO_CHANGE# from v$thread;

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          12095639
         2          11951058

02:17:56 SQL> /

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          12105768
         2          11951058

02:17:57 SQL> /

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          12114511
         2          11951058

02:18:00 SQL> /

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          12114511
         2          11951058

02:18:01 SQL> /

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          12124139
         2          11951058

回复 只看该作者 道具 举报

15#
发表于 2011-12-30 22:48:06

回复 12# 的帖子

对,我没有理解你所说的增进SCN会有什么用。

难道是说相对事物较少的节点,它的SCN比事物较多的那个节点总是小一些吗?所以,它上面一有事物,就会碰上这个问题?

回复 只看该作者 道具 举报

14#
发表于 2011-12-30 22:39:31
比如有一个事物在19:00 提交,SCN 为 1900,也就是说CSCN为1900,但是这个事物直到 19:05 才写完,
那么OGG 是从19:00 开始计算 wait 的时间,还是从19:05开始计算?
如果是从19:00开始计算,那么默认等待3秒,也就是说19:03,在还没有写入完成的时候就开始读取LOG文件了。

OGG是只读LOGFILE,不读LOGBUFFER的吧?!

回复 只看该作者 道具 举报

13#
发表于 2011-12-30 22:34:45
2008-03-31 14:25:33.021 Thread 1 reports it is at EOF

这里的EOF到底是说TRAIL的EOF,还是说REDOLOG 的EOF?

回复 只看该作者 道具 举报

12#
发表于 2011-12-30 22:34:42

回复 9# 的帖子

这只是一种手段, 貌似shareplex 在RAC 主备用环境中 也会出现类似的"encountered commit SCN 0.234946815 (234946815) that is not greater thanthe highest SCN already processed " , 也是通过 每秒做一些DML来解决的

回复 只看该作者 道具 举报

11#
发表于 2011-12-30 22:34:14
6楼的问题也没回答,我总是觉得NOTE里面的例子有问题,既这样讲:
If a node has been idle with no transactions, Extract waits the number of milliseconds specified in the Extract parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY (default 3 seconds). Waiting ensures that there are no new transactions on the idle node before writing the current transactions to the trail.

在例子里却没有WAIT 3 seconds,仅仅是因为 thread 2 kept reporting EOF for the last 17 minutes ??

回复 只看该作者 道具 举报

10#
发表于 2011-12-30 22:23:33
--Final Fix of this issue in V11.1.1+

Oracle DB HEARTBEAT is utilized to coordinate the read from multiple nodes to totally eliminate this issue.  This is being tracked in BugDB 10356426, which is implemented for V11.1.1.1, also backported back to V11.1.1.0.x code line.  If you are running V11.1.1.0.x GoldenGate, please request a build that has this fix.

哎,不知道这里是如何利用HEARTBEAT 来避免的!

讲OGG INTERNAL的文章太少了!

回复 只看该作者 道具 举报

9#
发表于 2011-12-30 22:22:34

回复 8# 的帖子

--可以在空闲节点上配置 类似每秒种插入一条记录这样的自动作业,以激励scn的更新

如果不这样,会怎样呢?

ML顺便讲一下RAC中,SCN的原理好吧?

回复 只看该作者 道具 举报

8#
发表于 2011-12-30 18:51:29
补充:

对于 主备用类型的RAC: 即2节点RAC,其中一个空闲备用,另一个接入应用; 可能存在空闲节点的scn 在短时间内 远远落后于繁忙节点的情况。

针对这种场景,可以在空闲节点上配置 类似每秒种插入一条记录这样的自动作业,以激励scn的更新。

回复 只看该作者 道具 举报

7#
发表于 2011-12-30 18:32:07
试想 当Extract capture 的Interval 变长后 , 那么本身 因 max_commit_propagation_delay或 IO 延迟 而 小概率出现的"encountered commit SCN 0.234946815 (234946815) that is not greater thanthe highest SCN already processed "
问题 , 不也会因此大大减少出现的概率吗?

当然如果有 RAC 一方 thread scn明显滞后 或者 节点间 时钟不一致的话 是没有办法通过 IOLATENCY and MAXCOMMITPROPAGATIONDELAY 解决的。

回复 只看该作者 道具 举报

6#
发表于 2011-12-30 00:50:04
In this case, the out of order SCN was caused by the huge gap in the time between when the record was generated to the log buffer and when it was captured by Extract.

这个举例里面不正好符合增大读取LOG延时的说法吗?为什么说这是导致的原因呢?

回复 只看该作者 道具 举报

5#
发表于 2011-12-30 00:46:56
问题是为什么增大等待读取log时间,可能可以避免这个问题?

回复 只看该作者 道具 举报

4#
发表于 2011-12-29 19:14:26
1.
该问题的成因是 ogg 在RAC环境中通过 coordinator thread 来汇集每一个节点上的redo vector 以便scn 有序,
Extract 会在oracle 写入redo log后 等待一段时间才开始汇集事务。

在以下几种情形中会引发该OGG-01028
encountered commit SCN 0.234946815 (234946815) that is not greater thanthe highest SCN already processed 错误:

RAC中 一个thread 的scn明显滞后于别的thread
  redo log 因为写出延迟导致没有及时刷新等

如果你的环境是RAC系统 , 那么建议

比对  各thread 之间的 scn
比对 各节点的时间
检查告警日志

如:
select thread#,LAST_REDO_CHANGE# from v$thread;

   THREAD# LAST_REDO_CHANGE#
---------- -----------------
         1          11698506
         2          11698595



SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';

Session altered.

SQL> select sysdate,dbtimezone from dual;

SYSDATE             DBTIME
------------------- ------
2011-12-29 22:11:10 +00:00


2. 可以通过配置IOLATENCY and MAXCOMMITPROPAGATIONDELAY 参数减少该问题发生的概率

The key is in adjusting IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters to an optimum value based on the frequency of hitting the error.

回复 只看该作者 道具 举报

3#
发表于 2011-12-29 18:54:47
How to recover from Extract ERROR 180 encountered commit SCN that is not greater than the highest SCN already processed (GG Version 10) [ID 987379.1]

Applies to:
Oracle GoldenGate - Version: 10.0.0.0 and later   [Release: 10.0.0 and later ]
Information in this document applies to any platform.
Goal
How to recover from Extract ERROR 180 encountered commit SCN<n.xxx> that is not greater than the highest SCN already processed (GG Version 10)
Solution
This applies for Oracle GoldenGate Version 10 and above only using Version 10 formatted trail and NOT using the RMTTRAIL or EXTTRAIL FORMAT RELEASE parameter.

Description:
This error occurs in an Oracle RAC environment after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN then the previous transaction.

For a discussion on how to avoid this issue, please view Document 957112.1.

Solution Overview:
Verify that the Extract Pump and Replicat have completely processed the records in trail then alter Extract Pump and Replicat to process new trail created by the ETROLLOVER from the main Extract.

Solution Details:
1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
ALTER EXTRACT [name], ETROLLOVER

2. Start extract
START EXTRACT [name]

3. Send PUMP, LOGEND, to see if it's at the end of the previous trail.
SEND EXTRACT [pump_name], LOGEND

4. Once it is at the end of the trail file, You must stop the pump, and do an ETROLLOVER for it too. Take note of the new trail file sequence number that is created from this step.
STOP EXTRACT [pump_name]
ALTER EXTRACT [pump_name], ETROLLOVER

5. Alter the pump to SEQNO to the new trail file created from step #1.
ALTER EXTRACT [pump_name], EXTSEQNO ##### EXTRBA 0

6. Restart pump
START EXTRACT [pump_name]

7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT [name], LOGEND
STOP REPLICAT [name]

8. If replicat is not at end of trail, generate a report and forcestop replicat
SEND REPLICAT [name], REPORT
STOP REPLICAT [name]!

9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction, or set them to 1 if you have any of these parameters:
GROUPTRANSOPS 1
MAXTRANSOPS 1

10. Restart replicat
START REPLICAT [name]

11. Once replicat has completely processed the trail, stop the replicat
STOP REPLICAT [name]

12. Edit the replicat parameter file:
     - Add parameter HANDLECOLLISIONS to Replicat parameter file
     - Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.

Note: There are pre-conditions for using  HANDLECOLLISIONS . Ther emust be either

a. no pkupdates

or
b. extract  has "FETCHOPTIONS FETCHPKUPDATECOLS"

Also all the tables should have Primary key or unique index on the table to avoid data integrity issues when using handlecollisions.

13. ALTER REPLICAT, SEQNO to the new trail file created in step #4.
ALTER REPLICAT [name], EXTSEQNO ###### EXTRBA 0

14. Start Replicat
START REPLICAT [name]

15. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. You could also look for the CSN and wait for Replicat to checkpoint past it.
SEND REPLICAT [name], NOHANDLECOLLISIONS.

16.Edit the replicat parameter and comment out the HANDLECOLLISIONS parameter. You do not need to stop/restart replicat. This will ensure that on any subsequent replicat restarts the parameter is disabled.

Note:
If the out of order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file find out that particular record (the primary key value). Then add handlecollisions, continue with the steps below. Later, once the Replicat caught up, that particular record needs to be manually synced.
However the chance of an out of order transactions happening to the SAME record on SAME table is rare. Given an example, this will mean that the user application insert record A from node 1, then immediately it updated this record A on node 2. Out of order transactions usually happens to different tables or different records on same table.
In other words, if you do see collisions been handled in this recovery procedure, that means you have to re-sync ONLY those particular records that have collisions.




References
BUG:9219767 - RFE: PUMP EXTRACT STRUCK DUE TO IMPROPER MESSAGE PROVIDED
BUG:9259123 - RFE: AUTOMATIC RECOVERY OF THE EXTRACT WHICH GOT ABENDED WITH OUT OF ORDER SCN

回复 只看该作者 道具 举报

2#
发表于 2011-12-29 18:54:20
Document searched :



OGG Encountered SCN That Is Not Greater Than The Highest SCN Already Processed [ID 957112.1]

Applies to:
Oracle GoldenGate - Version: 6.0.0 and later   [Release: 6.0.0 and later ]
Information in this document applies to any platform.
Solution

Issue:
Extract ERROR 180 encountered commit SCN <scn> that is not greater than the highest SCN already processed

Solution Overview:
This error occurs shortly after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN than the previous transaction

Solution Details:
In an Oracle RAC environment, Extract has a coordinator thread. This thread assembles the operations that are generated on each of the RAC nodes into SCN order. Before starting to assemble the transactions, Extract waits for a short time after Oracle writes the transaction to the redo log. If a node has been idle with no transactions, Extract waits the number of milliseconds specified in the Extract parameter THREADOPTIONS MAXCOMMITPROPAGATIONDELAY (default 3 seconds). Waiting ensures that there are no new transactions on the idle node before writing the current transactions to the trail.

Possible Reasons for error:
-- One of the threads is slower than the other.
-- The redo logs are not flushed on time due to latency on Log writes.
-- Network issues between Extract and one of the RAC nodes, if Extract is running on a system separate from RAC nodes.
-- Long log write times due to a standby configuration, if any.
-- Log file I/Os are taking unusually long times to complete.
-- Time imperfections between the cluster nodes. All nodes in the RAC cluster must have synchronized system clocks. If Extract is running on a system other than any of the RAC nodes, that system clock must be in sync, too, because we compare the local system's time to the commit timestamp to make critical decisions. For information about synchronizing system clocks, consult www.ntp.org or your systems administrator
-- The DB timezone setting is different than the OS timezone setting


Case example:
The following shows a time sequence and explains what happens at each step in time:
TIME WHAT HAPPENED

------- ---------

2008-03-31 14:08:21.964 Starting from this timestamp, thread 2 is reporting that it is at the end of the file (EOF).

...... Thread 2 keeps reporting EOF.

2008-03-31 14:25:33.021 Thread 1 reports it is at EOF

2008-03-31 14:25:33.110 Thread 2 reports it is at EOF

2008-03-31 14:25:34.040 Thread 1 gets the commit record, with timestamp of 2008-03-31 14:10:52. It commits the transaction with an SCN of 1.2943495430.

no timestamp The coordinator thread processes the committed transaction from thread 1, because thread 2 kept reporting EOF for the last 17 minutes.

2008-03-31 14:25:34.188 Thread 2 got an update record and a commit record with timestamp of 2008-03-31 14:10:52. It commits transaction with an SCN of 1.2943495406

2008-03-31 14:25:34.xxx The main thread processes the committed transaction from thread 2, and detects an out of order SCN. Extract abends with this error: 2008-03-31 14:25:34 GGS ERROR 180 encountered commit SCN 1.2943495406 (0001.af722cee) that is not greater than the highest SCN already processed 1.2943495430 (0001.af722d06) Redo Thread 2 (2) xid 29.27.131432 (0x001d.01b.00020168), starting seq.rba 3725.98984464, scn 1.2943495403 (0001.af722ceb), commit seq.rba 3725.99011828 commit timestamp 2008-03-31 14:10:52.000000.

n this case, the out of order SCN was caused by the huge gap in the time between when the record was generated to the log buffer and when it was captured by Extract.

Troubleshooting steps:

-- Check to see if the clocks on different nodes in the same RAC are in sync

-- Check to see if there are any unusual time gaps between log data on threads during the SCN error.

-- Check how the 'remote'  log files, if any, are shared with the system on which Extract is running (e.g., NFS, SAN, shared SCSI, etc.) and see if there are any noticeable problems there.

-- Check to see if there are messages in the Oracle alert logs, lgwr tracefiles or net logs during the problematic time, to see if they provide any clues to point to one of the 'Possible Reasons' discussed above.
-- Check to see if any analysis been done on the I/O performance on the drives that contain the log files and whether they are "real Disk";, NFS, SAN, etc.
-- Check to see if the DB timezone is setup correctly.  Issue 'SELECT dbtimezone FROM DUAL', the timezone should match the OS.  For example is the result is "+00:00", that means DB is using GMT/UTC timezone.


Recommendation:-
The MAXCOMMITPROPAGATIONDELAY parameter can be used to set the delay time well above the max_commit_propogation_delay setting in the database, plus the default extra padding that Extract adds (2000 milliseconds).

In Oracle RAC, the max_commit_propogation_delay specifies the maximum amount of time allowed before the system change number (SCN) held in the SGA of an instance is refreshed by the log writer process (LGWR). Units are in hundredths of seconds.

To check Oracle's value:

Connect as a user with dba privileges and issue:
SQL> show parameter max_commit NAME TYPE VALUE

------------------------------------ ----------- ------------------------------ max_commit_propagation_delay integer 0

To set MAXCOMMITPROPAGATIONDELAY : The value of MAXCOMMITPROPAGATIONDELAY must be greater than zero and less than 90000 milliseconds.

This is how the line should look in the Extract parameter file:

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 2700

Starting from GGS Version 9.x and above, an additional parameter IOLATENCY can be used if Extract abends with 'encountered SCN XXXXX' too often. IOLATENCY adjusts the delta between the database-configured max commit propogation delay and the internal value that Extract uses. By default IOLATENCY is set to 1.5 seconds&nbsp;

Note: Valid values for IOLATENCY are between 0 and 180000 milliseconds (3 minutes).

The combined parameters should look like this:

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY <msec> IOLATENCY <msec>

The combination of MAXCOMMITPROPAGATIONDELAY and IOLATENCY can be used to ensure that:

1) the Oracle threads have written their most recent SCN data to the logs
2) the I/O processes have had time to complete, considering the various factors that increase I/O latency, such as hardware contention, file locking, long seek and queue times, etc.

The key is in adjusting IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters to an optimum value based on the frequency of hitting the error.

Hint: If the problem happens too often, you can start with high values for IOLATENCY and MAXCOMMITPROPAGATIONDELAY parameters. Once the error stops happening, you can gradually decrease the values to see the SCN number where this error starts appearing again. This would give you an idea of the boundary values specific to that environment.

Furthermore, the reason Extract successfully restarts after this error is that, on restart, it re-reads the operations, and this time they are all on disk and can be processed in correct SCN order. The side effect is that Extract rewrites operations into the trail. This may cause the data pump or Replicat to abend with incompatible record errors.

Note: From Oracle 10.2, parameter max_commit_propagation_delay has been deprecated. So "SQL> show parameter max_commit NAME TYPE VALUE " is no longer viable for Oracle 10.2 upward.

Final Fix of this issue in V11.1.1+

Oracle DB HEARTBEAT is utilized to coordinate the read from multiple nodes to totally eliminate this issue.  This is being tracked in BugDB 10356426, which is implemented for V11.1.1.1, also backported back to V11.1.1.0.x code line.  If you are running V11.1.1.0.x GoldenGate, please request a build that has this fix.

回复 只看该作者 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-12-24 03:57 , Processed in 0.051155 second(s), 22 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569