Oracle Database Data Recovery and Performance Tuning


1#
Posted on 2011-12-20 14:53:47 | Views: 9094 | Replies: 4
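Hi,
Environment: 10.2.0.5 two-node RAC on AIX 6.1
Symptom: the instance on node 1 crashed; see the log from the last startup below for details.
Developers had killed some user processes beforehand, but I did not see exactly what they did.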

Log:
Mon Dec 19 15:49:10 GMT+08:00 2011
Errors in file /oracle/admin/epmsc/bdump/epmsc1_pmon_20120358.trc:
ORA-00480: LCK* process terminated with error
Mon Dec 19 15:49:10 GMT+08:00 2011
PMON: terminating instance due to error 480
Mon Dec 19 15:49:10 GMT+08:00 2011
System state dump is made for local instance
System State dumped to trace file /oracle/admin/epmsc/bdump/epmsc1_diag_20972238.trc
Mon Dec 19 15:49:12 GMT+08:00 2011
Shutting down instance (abort)

There is very little on ORA-00480 in MetaLink either. Has anyone found anything?

Attachment: Copy of ora_log.rar (1.87 MB, downloaded 1342 times): log and trace

2#
Posted on 2011-12-20 15:16:29
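cd /oracle/admin/epmsc/bdump/
ls -ltr *22479184*    # trace files written by OS pid 22479184 (LCK0 in the system state dump)
ls -ltr *lck*         # all LCK background-process trace files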

Please upload the relevant LCK trace files.


3#
Posted on 2011-12-20 16:02:03
Analysis:


LCK0's most recent wait event was 'ksxr poll remote instances'. Its Oracle PID is 65, its SPID (OS pid) is 22479184, and its state is DEAD.


          SO: 700001ae7555ae0, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=65, calls cur/top: 7000019d7f0e428/7000019d7f0e428, flag: (6) SYSTEM
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 21
              last post received-location: ksbria
              last process to post me: 700001ae7555ae0 1 6
              last post sent: 0 0 167
              last post sent-location: kqrbtm
              last process posted by me: 700001ae6581b50 31 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 700001ae561ca38
    O/S info: user: oracle, term: UNKNOWN, ospid: 22479184 (DEAD)
    OSD pid info: Unix process pid: 22479184, image: oracle@pmscpdba (LCK0)
    SO: 700001ae48bac70, type: 4, owner: 700001ae7555ae0, flag: INIT/-/-/0x00
    (session) sid: 3211 trans: 0, creator: 700001ae7555ae0, flag: (51) USR/- BSY/-/-/-/-/-
              DID: 0000-0041-00000002, short-term DID: 0000-0000-00000000
              txn branch: 0
              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
    service name: SYS$BACKGROUND
    waiting for 'rdbms ipc message' wait_time=0, seconds since wait started=3
                timeout=129, =0, =0
                blocking sess=0x0 seq=17047
    Dumping Session Wait History
     for 'ksxr poll remote instances' count=1 wait_time=0.000005 sec
                =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0.000358 sec
                timeout=129, =0, =0
     for 'ksxr poll remote instances' count=1 wait_time=0.000006 sec
                =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0.000137 sec
                timeout=129, =0, =0
     for 'ksxr poll remote instances' count=1 wait_time=0.000007 sec
                =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0.000260 sec
                timeout=129, =0, =0
     for 'ksxr poll remote instances' count=1 wait_time=0.000007 sec
                =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0.000323 sec
                timeout=129, =0, =0
     for 'ksxr poll remote instances' count=1 wait_time=0.000005 sec
                =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0.000145 sec


The last process to post LCK0 was 700001ae7555ae0, which is LCK0's own process state object: LCK0 posted itself.
The last process LCK0 posted was 700001ae6581b50.
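The "LCK0 posted itself" reading comes from matching two addresses in the dump by eye. A minimal Python sketch of that check follows; it assumes the text passed in is a single process block whose first `SO: <addr>, type: 2` line is the process state object. The function name `posted_by_self` is hypothetical, not part of any Oracle tool.

```python
import re


def posted_by_self(dump: str) -> bool:
    """True if the process state object's last poster is the process itself.

    Assumes `dump` holds one process block from a system state dump, whose
    first 'SO: <addr>, type: 2' line is the process state object.
    """
    so = re.search(r"SO:\s*([0-9a-f]+),\s*type:\s*2\b", dump)
    poster = re.search(r"last process to post me:\s*([0-9a-f]+)", dump)
    if so is None or poster is None:
        raise ValueError("process SO line or post info not found")
    return so.group(1) == poster.group(1)


# Lines copied from the LCK0 process block above.
sample = """
          SO: 700001ae7555ae0, type: 2, owner: 0, flag: INIT/-/-/0x00
              last process to post me: 700001ae7555ae0 1 6
              last process posted by me: 700001ae6581b50 31 0
"""
print(posted_by_self(sample))  # True
```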

At the time, LCK0 held a large number of enqueue lock resources (several hundred), a dozen or so of which were of type CI (Cross Instance Call Invocation enqueue).

My guess is that the LCK process had been running under heavy load the whole time, consuming a lot of CPU.
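One way to look at the wait history in LCK0's dump above is to total it per event. The minimal Python sketch below does that; it only shows that LCK0 was cycling through very short 'rdbms ipc message' and 'ksxr poll remote instances' waits, and does not by itself prove CPU saturation. The regex assumes the `for '<event>' count=N wait_time=X` layout seen in this dump, and `summarize_wait_history` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches history lines such as: for 'rdbms ipc message' count=1 wait_time=0.000358 sec
WAIT_RE = re.compile(r"for '([^']+)' count=(\d+) wait_time=([\d.]+)")


def summarize_wait_history(dump: str) -> dict[str, tuple[int, float]]:
    """Aggregate 'Dumping Session Wait History' lines into {event: (waits, seconds)}."""
    totals: dict[str, list] = defaultdict(lambda: [0, 0.0])
    for event, count, seconds in WAIT_RE.findall(dump):
        totals[event][0] += int(count)
        totals[event][1] += float(seconds)
    return {event: (n, secs) for event, (n, secs) in totals.items()}


# Wait history copied from the LCK0 session above.
history = """
     for 'ksxr poll remote instances' count=1 wait_time=0.000005 sec
     for 'rdbms ipc message' count=1 wait_time=0.000358 sec
     for 'ksxr poll remote instances' count=1 wait_time=0.000006 sec
     for 'rdbms ipc message' count=1 wait_time=0.000137 sec
     for 'ksxr poll remote instances' count=1 wait_time=0.000007 sec
     for 'rdbms ipc message' count=1 wait_time=0.000260 sec
     for 'ksxr poll remote instances' count=1 wait_time=0.000007 sec
     for 'rdbms ipc message' count=1 wait_time=0.000323 sec
     for 'ksxr poll remote instances' count=1 wait_time=0.000005 sec
     for 'rdbms ipc message' count=1 wait_time=0.000145 sec
"""
for event, (n, secs) in summarize_wait_history(history).items():
    print(f"{event}: {n} waits, {secs * 1e6:.0f} us")
```

Ten consecutive waits adding up to roughly 1.25 ms suggest a process that is woken up almost continuously, which is consistent with the heavy-load guess but not conclusive.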


    ------------process 0x700001ae9aff608--------------------
    proc version      : 0
    Local node        : 0
    pid               : 22479184
    lkp_node          : 0
    svr_mode          : 0
    proc state        : KJP_NORMAL
    Last drm hb acked : 0
    Total accesses    : 7592925
    Imm.  accesses    : 7592506
    Locks on ASTQ     : 0
    Locks Pending AST : 0
    Granted locks     : 59683
    AST_Q:
    PENDING_Q:
    GRANTED_Q:
    lp 700001ae9e902a0 gl KJUSERPR rp 7000019bb8b9f78 [0x6][0x2],[CI]

0x6 => Test call
      master 0 pid 22479184 bast 1 rseq 20 mseq 0 history 0x9a5
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e90540 gl KJUSERPR rp 7000019bb8b9d68 [0x1a][0x2],[CI]

0x1a => 26: Purge dictionary object number cache

      master 0 pid 22479184 bast 1 rseq 19 mseq 0 history 0x9a5
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e90a80 gl KJUSERPR rp 7000019bb8b9b58 [0x1e][0x2],[CI]  

Here 0x1e => 30: process waiters after row cache requeue
         0x2 => used to invoke the function in a background process



      master 0 pid 22479184 bast 1 rseq 19 mseq 0 history 0x9a5
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e90fd8 gl KJUSERPR rp 7000019b96217e8 [0x31][0x2],[CI]
      master 1 pid 22479184 bast 1 rseq 16 mseq 0 history 0x95
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e91518 gl KJUSERPR rp 7000019b6357810 [0x35][0x2],[CI]
      master 1 pid 22479184 bast 1 rseq 17 mseq 0 history 0x95
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e917d0 gl KJUSERPR rp 7000019cacf75f8 [0x39][0x2],[CI]
      master 1 pid 22479184 bast 1 rseq 22 mseq 0 history 0x95
      open opt  KJUSERPROCESS_OWNED
    lp 700001ae9e91a70 gl KJUSERPR rp 7000019becd45b8 [0x41][0x2],[CI]
......................
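The lock list above was read by hand, but this process reports `Granted locks : 59683`, so a tally is easier than scrolling. Below is a minimal Python sketch that counts lock entries per (resource type, id1). It assumes every entry sits on one `lp ...` line ending in a resource name of the form `[id1][id2],[TYPE]`; the id1 labels are only the ones annotated in this thread, not a complete list, and `count_locks` is a hypothetical helper name.

```python
import re
from collections import Counter

# Resource names look like [0x1e][0x2],[CI] : [id1][id2],[type]
RES_RE = re.compile(r"\[(0x[0-9a-f]+)\]\[(0x[0-9a-f]+)\],\[([A-Z]{2})\]")

# CI id1 meanings as annotated in this thread (not exhaustive).
CI_ID1 = {
    0x06: "Test call",
    0x1A: "Purge dictionary object number cache",
    0x1E: "Process waiters after row cache requeue",
}


def count_locks(dump: str) -> Counter:
    """Count 'lp ...' lock entries per (resource type, id1)."""
    counts: Counter = Counter()
    for line in dump.splitlines():
        if not line.lstrip().startswith("lp "):
            continue  # skip 'master ...' / 'open opt ...' detail lines
        m = RES_RE.search(line)
        if m:
            id1, _id2, rtype = m.groups()
            counts[(rtype, int(id1, 16))] += 1
    return counts


# Entries copied from the LCK0 lock list above.
lock_list = """
    lp 700001ae9e902a0 gl KJUSERPR rp 7000019bb8b9f78 [0x6][0x2],[CI]
      master 0 pid 22479184 bast 1 rseq 20 mseq 0 history 0x9a5
    lp 700001ae9e90540 gl KJUSERPR rp 7000019bb8b9d68 [0x1a][0x2],[CI]
    lp 700001ae9e90a80 gl KJUSERPR rp 7000019bb8b9b58 [0x1e][0x2],[CI]
    lp 700001ae9e90fd8 gl KJUSERPR rp 7000019b96217e8 [0x31][0x2],[CI]
    lp 700001ae9e91518 gl KJUSERPR rp 7000019b6357810 [0x35][0x2],[CI]
    lp 700001ae9e917d0 gl KJUSERPR rp 7000019cacf75f8 [0x39][0x2],[CI]
    lp 700001ae9e91a70 gl KJUSERPR rp 7000019becd45b8 [0x41][0x2],[CI]
"""
for (rtype, id1), n in sorted(count_locks(lock_list).items()):
    label = CI_ID1.get(id1, "") if rtype == "CI" else ""
    print(f"{rtype} id1=0x{id1:x} x{n} {label}")
```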



    SO: 700001ae58922c0, type: 4, owner: 700001ae75552f0, flag: INIT/-/-/0x00
    (session) sid: 3245 trans: 0, creator: 700001ae75552f0, flag: (100051) USR/- BSY/-/-/-/-/-
              DID: 0001-003A-00000002, short-term DID: 0001-003A-00000003
              txn branch: 0
              oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
    service name: SYS$BACKGROUND
    waiting for 'smon timer' wait_time=0, seconds since wait started=1
                sleep time=12c, failed=0, =0
                blocking sess=0x0 seq=64966
    Dumping Session Wait History
     for 'DFS lock handle' count=1 wait_time=0.000950 sec
                type|mode=43490005, id1=1, id2=2
     for 'DFS lock handle' count=1 wait_time=0.000376 sec
                type|mode=43490005, id1=1, id2=1
     for 'enq: TT - contention' count=1 wait_time=0.000296 sec
                name|mode=54540004, tablespace ID=1, operation=10
     for 'enq: US - contention' count=1 wait_time=0.000428 sec
                name|mode=55530006, undo segment #=29, 0=0
     for 'smon timer' count=1 wait_time=3.119856 sec
                sleep time=12c, failed=0, =0
     for 'smon timer' count=1 wait_time=4.882208 sec
                sleep time=12c, failed=0, =0
     for 'smon timer' count=1 wait_time=2.254375 sec
                sleep time=12c, failed=0, =0
     for 'DFS lock handle' count=1 wait_time=0.001051 sec
                type|mode=43490005, id1=1, id2=2
     for 'DFS lock handle' count=1 wait_time=0.000274 sec
                type|mode=43490005, id1=1, id2=1
     for 'enq: US - contention' count=1 wait_time=0.000496 sec
                name|mode=55530006, undo segment #=3, 0=0
    Sampled Session History of session 3245 serial 1

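Before that, the SMON process had been waiting on 'DFS lock handle':

type|mode => 0x43490005: enqueue CI (0x43 = 'C', 0x49 = 'I'), requested in mode 5
id1       => 1: Reuse (checkpoint and invalidate) block range
id2       => 2 and 1 in the two waits; as noted above, 0x2 => used to invoke the function in a background process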
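The first wait parameter of 'DFS lock handle' and 'enq: ...' waits packs the two-letter enqueue name into the top two bytes and the requested lock mode into the low 16 bits. A minimal Python sketch of that decoding, applied to the three values in SMON's history above, is below; `decode_name_mode` is an illustrative name, and the mode labels are the standard Oracle lock-mode numbers (0 to 6).

```python
LOCK_MODES = {
    0: "none",
    1: "null (N)",
    2: "row share (SS)",
    3: "row exclusive (SX)",
    4: "share (S)",
    5: "share row exclusive (SSX)",
    6: "exclusive (X)",
}


def decode_name_mode(p1_hex: str) -> tuple[str, int]:
    """Decode a 'type|mode' / 'name|mode' wait parameter printed in hex."""
    p1 = int(p1_hex, 16)
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)  # two ASCII letters
    mode = p1 & 0xFFFF                                      # requested mode
    return name, mode


for raw in ("43490005", "54540004", "55530006"):
    name, mode = decode_name_mode(raw)
    print(raw, name, mode, LOCK_MODES.get(mode, "?"))
# 43490005 CI 5 share row exclusive (SSX)
# 54540004 TT 4 share (S)
# 55530006 US 6 exclusive (X)
```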


4#
Posted on 2011-12-20 16:07:22
Related bug information:

Type: B - Defect
Fixed in Product Version: -
Severity: 2 - Severe Loss of Service
Product Version: 10.2.0.4
Status: 33 - Suspended, Req'd Info not Avail
Platform: 212 - IBM AIX on POWER Systems (64-bit)
Created: 17-May-2011
Platform Version: 6.1
Updated: 16-Nov-2011
Base Bug: -
Database Version: 10.2.0.4.7
Affects Platforms: Generic
Product Source: Oracle

Related Products
Line: Oracle Database Products
Family: Oracle Database
Area: Oracle Database
Product: 5 - Oracle Server - Enterprise Edition

Hdr: 12564483 10.2.0.4.7 RDBMS 10.2.0.4 RAC PRODID-5 PORTID-212
Abstract: DATABASE HANG WITH CI-0X1E-0X2 LOCK

*** 05/17/11 06:35 pm *** (ADD: Impact/Symptom->DATABASE HANG )
*** 05/17/11 06:35 pm ***

  BUG TYPE CHOSEN
  ===============
  Code

  SubComponent: Real Application Clusters
  =======================================
  DETAILED PROBLEM DESCRIPTION
  ============================
  2 node RAC 10.2.0.4.7, Database has been hanging for few times. Alert log
  report many:
  WARNING: inbound connection timed out (ORA-3136)

  Latest occurrence is at: May 12 starting from 8:15, till instance shutdown
  abort around 9:04.

  Proper system state dump collected for the last occurrence.

  DIAGNOSTIC ANALYSIS
  ===================
  per system state dump cvalle1_ora_897314.trc:

  PROCESS 16
    OSD pid info: Unix process pid: 483334, image: oracle@sdbsora01 (MMON)

    waiting for 'DFS lock handle' blocking sess=0x0 seq=4637 wait_time=0
  seconds since wait started=213
              type|mode=43490005, id1=1e, id2=2

      ----------resource 0x700000413c74e78----------------------
      resname       : [0x1e][0x2],[CI]
      Local node    : 0
      dir_node      : 0
      master_node   : 0

      GRANTED_Q :
      lp 7000004350d38f0 gl KJUSERPR rp 700000413c74e78 [0x1e][0x2],[CI]     ==> the same CI resource
         master 0 owner 1  bast 0 rseq 3 mseq 0x2 history 0xd4977d8d
         open opt  KJUSERNO_XID
      CONVERT_Q:
      lp 7000004340e3340 gl KJUSERNL rl KJUSEREX rp 700000413c74e78
  [0x1e][0x2],[CI]
         master 0 pid 483334 bast 0 rseq 4 mseq 0 history 0x49ab549a
         convert opt KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK

  MMON is waiting for CI-0x1e-0x2 lock. This lock is held on inst 2. There is
  no resource dump in inst 2 system state cvalle2_ora_8700252.trc. But LCK0
  process dump shows it is holding CI-0x1e-0x2:

  PROCESS 20:
    OSD pid info: Unix process pid: 328170, image: oracle@sdbsora02 (LCK0)
    waiting for 'rdbms ipc message' blocking sess=0x0 seq=3266 wait_time=0
  seconds since wait started=0
              timeout=fd, =0, =0
    Dumping Session Wait History
     for 'ksxr poll remote instances' count=1 wait_time=0     => the same wait event
              =0, =0, =0
     for 'rdbms ipc message' count=1 wait_time=0
              timeout=106, =0, =0
     for 'ksxr poll remote instances' count=1 wait_time=0
              =0, =0, =0

    lp 70000043741bb08 gl KJUSERPR rp 7000004131d6588 [0x1e][0x2],[CI]
      master 0 pid 328170 bast 1 rseq 51 mseq 0 history 0x495149a5

  Its waiting status seems normal.

  There are also many user sessions waiting for 'reliable messages' and the   => MMON is waiting for it
  number of sessions waiting for this is increasing.

  But ct is on 10.2.0.4.7 already and bug 6148054 - supercede by Bug:7801939,
  the fix has already been applied.

  Need to find out why LCK0 not releasing the lock and why database hang.

  WORKAROUND?
  ===========
  No

  TECHNICAL IMPACT
  ================
  Causing downtime.

  RELATED ISSUES (bugs, forums, RFAs)
  ===================================
  None. Most bug related with CI-0x1e-0x2 where LCK0 is holder, LCK0 is
  waiting for latch etc, not like this case.

  HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
  ====================================================
  Intermittent

  DOES THE ISSUE REPRODUCE INTERNALLY?
  ====================================
  Not attempted

  EXPLAIN WHY THE ISSUE WAS NOT TESTED INTERNALLY.
  ================================================
  not feasible

  IS A TESTCASE AVAILABLE?
  ========================
  No

Oracle considers Bug 7801939 fixed in 10.2.0.5.

Bug 7801939  Contention for "channel operations parent latch" child latch
This note gives a brief overview bug 7801939.
The content was last updated on: 18-NOV-2010

Affects:
  Product (Component): Oracle Server (Rdbms)
  Range of versions believed to be affected: Versions BELOW 11.2
  Versions confirmed as being affected: 10.2.0.4
  Platforms affected: Generic (all / most platforms affected)

Fixed:
  This issue is fixed in:
    11.2.0.1 (Base Release)
    10.2.0.5 (Server Patch Set)
    10.2.0.4.4 (Patch Set Update)
    10.2.0.4 Patch 34 on Windows Platforms

Symptoms: Latch Contention
Related To: Advanced Queuing

Description
  The interprocess messaging mechanism may exhibit high latch contention under certain workloads. This shows as contention for a particular "channel operations parent latch". In particular this can affect AQ. Note: This fix also addresses the issue in bug 6148054.

Bug 6148054 - RAC hang waiting for "reliable message" [ID 6148054.8]
Modified 30-MAR-2011   Type PATCH   Status PUBLISHED

Bug 6148054  RAC hang waiting for "reliable message"
This note gives a brief overview of bug 6148054.
The content was last updated on: 30-MAR-2011

Affects:
  Product (Component): Oracle Server (Rdbms)
  Range of versions believed to be affected: Versions >= 10.2.0.4 but BELOW 10.2.0.5
  Versions confirmed as being affected: 10.2.0.4
  Platforms affected: Generic (all / most platforms affected)
  It is believed to be a regression in default behaviour thus:
    Regression introduced in 10.2.0.4
  Note that this fix has been superceded by the fix in Bug:7801939

Fixed:
  This issue is fixed in:
    11.1.0.6 (Base Release)

Symptoms:
  Hang (Process Hang)
  Waits for "reliable message"
  Waits for "wait for unread message on broadcast channel"
Related To: RAC (Real Application Clusters) / OPS

Description
  In a RAC environment which has the fix for bug 4605569 present some processes may periodically hang waiting on "reliable message" and "wait for unread message on broadcast channel"
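The dump in the bug report shows why MMON cannot proceed: its request sits in CONVERT_Q converting the [0x1e][0x2],[CI] resource from KJUSERNL to KJUSEREX, while GRANTED_Q holds the same resource in KJUSERPR. The minimal Python sketch below checks that conflict. It assumes Oracle's KJUSER* modes follow the classic VMS-style distributed lock manager compatibility matrix (NL, CR, CW, PR, PW, EX); `COMPAT` and `grantable` are illustrative names only.

```python
# Classic VMS-style DLM lock modes, from weakest to strongest (assumed to map
# onto KJUSERNL, KJUSERCR, KJUSERCW, KJUSERPR, KJUSERPW, KJUSEREX).
MODES = ["NL", "CR", "CW", "PR", "PW", "EX"]

# COMPAT[held] = requested modes that can be granted while `held` is granted.
COMPAT = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def grantable(requested: str, granted: list[str]) -> bool:
    """True if `requested` is compatible with every currently granted mode."""
    return all(requested in COMPAT[held] for held in granted)


# MMON converts NL -> EX while LCK0 on instance 2 holds the resource in PR.
print(grantable("EX", ["PR"]))  # False: MMON waits until LCK0 releases or downgrades
print(grantable("PR", ["PR"]))  # True: another PR holder would not block
```

Under this model, as long as LCK0 keeps its PR lock on [0x1e][0x2],[CI], MMON's EX conversion stays queued, which matches the hang described in the bug report.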

[Last edited by maclean at 2011-12-20 16:08]


5#
Posted on 2011-12-20 16:14:26
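Specific recommendations:

MetaLink has not yet pinned down the root cause of this bug. It was originally believed to be fixed by an existing patch, but in practice that fix appears not to resolve it. As a result, Oracle currently provides no usable one-off patch and no workaround.

If the problem keeps recurring, I recommend:

1. Upgrade to the latest release, 11.2.0.3.

2. Tune the database to reduce its workload.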

