Oracle Database Data Recovery and Performance Tuning

1#
Posted 2013-10-12 00:02:36 | Views: 2673 | Replies: 1
ocssd.log shows "clsc_disc_orphans: protocol exchange timed out" messages, each followed by a node crash:

[    CSSD]2012-08-16 23:24:15.702 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121276 ms
[    CSSD]2012-09-09 23:24:25.620 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121399 ms
[    CSSD]2012-10-01 13:56:41.962 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121150 ms
[    CSSD]2012-10-11 11:05:40.298 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121321 ms
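
To compare such events across a long ocssd.log, the entries above can be pulled out with a small script. This is only a sketch: the regular expression is inferred from the four lines quoted here, not from any documented log format, and the function name `find_orphan_timeouts` is made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this post (an assumption,
# not an official ocssd.log format definition).
CSSD_TIMEOUT = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"clsc_disc_orphans: protocol exchange timed out, "
    r"time limit (?P<limit>\d+) sec, time since connect (?P<elapsed>\d+) ms"
)

def find_orphan_timeouts(lines):
    """Return (timestamp, limit_sec, elapsed_ms) for each matching line."""
    events = []
    for line in lines:
        m = CSSD_TIMEOUT.search(line)
        if m:
            events.append((m.group("ts"), int(m.group("limit")), int(m.group("elapsed"))))
    return events

sample = [
    "[    CSSD]2012-08-16 23:24:15.702 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121276 ms",
    "[    CSSD]2012-10-11 11:05:40.298 [3086] >TRACE:   clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121321 ms",
    "some unrelated line",
]

for ts, limit, elapsed in find_orphan_timeouts(sample):
    # Report how far past the time limit each exchange ran.
    print(ts, f"exceeded the {limit} s limit by {elapsed - limit * 1000} ms")
```

In practice the `sample` list would be replaced by the lines of the real log file, for example `open(path)` on each node's ocssd.log.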


Over the same period, node 2 also logged a large number of RACGVIP check timeouts:

2012-10-12 10:03:24.950: [  CRSEVT][11040]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)
...............
2012-10-16 09:14:07.334: [  CRSEVT][11066]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)




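Analysis based on the logs and trace information currently available: when the RAC GM client listener thread is processing "clsc_disc_orphans", "clsc_disc_orphans" messages appear in the CSSD log. While clsc_disc attempts to disconnect a connection, this function is responsible for acquiring and holding thread information. A known bug (Bug 9132429: LNX64-10205-CRS:NODE CRASH AFTER 5 MINUTES OF HANG/RESUME OCSSD.BIN) can cause multiple sessions to deadlock, eventually leaving the node hung or evicted and down. The same bug may also cause the VIP to go OFFLINE unexpectedly.

Because neither a core dump nor call stack information for the CSSD process at the time of the crash is available, Bug 9132429 cannot be confirmed as the root cause of the crash.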



3.        Recommendations

1.  Apply Patch 9132429

Patch 9132429 (LNX64-10205-CRS:NODE CRASH AFTER 5 MINUTES OF HANG/RESUME OCSSD.BIN) for 10.2.0.4 is still waiting to be built by development. We recommend applying it once it becomes available.
2#
Posted 2013-10-12 00:08:13
The problem here is that while the GM client listener thread is processing clsc_disc_orphans(), the function obtains the ugblm mutex and holds it while descending into clsc_disc() to attempt to disconnect the connection.

clsc_disc() eventually triggers clscidisc() to disconnect the internal connection. Here clscidisc() tries to grab the ugblm mutex again, thereby causing the thread to deadlock on itself.

Since the ugblm mutex is never released, when the NM sending thread eventually attempts to take the same mutex, it is also blocked waiting for it, which causes the issue that this bug observes.

This is a CLSC issue. The fix will involve preventing clsc_disc_orphans() from deadlocking on itself with the ugblm mutex.
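
The self-deadlock pattern described above can be reproduced in miniature. The sketch below is not Oracle code: `disconnect_orphans` and `inner_disconnect` are hypothetical stand-ins for clsc_disc_orphans() and clscidisc(), and a timeout is used on the inner acquire so that the demo reports the failure rather than hanging forever as the real thread would. The re-entrant lock in the second call is only one way to show the contrast. How the actual patch avoids the deadlock is not stated in this thread.

```python
import threading

class Connection:
    """Toy connection guarded by a single lock (stand-in for the ugblm mutex)."""
    def __init__(self, lock):
        self.lock = lock
        self.open = True

def inner_disconnect(conn, timeout):
    # Hypothetical stand-in for clscidisc(): takes the same lock again.
    if not conn.lock.acquire(timeout=timeout):
        return False  # would block forever without the timeout
    try:
        conn.open = False
        return True
    finally:
        conn.lock.release()

def disconnect_orphans(conn, timeout=0.2):
    # Hypothetical stand-in for clsc_disc_orphans(): holds the lock
    # while calling down into the disconnect path.
    with conn.lock:
        return inner_disconnect(conn, timeout)

# Non-recursive lock: the thread cannot re-acquire what it already holds.
plain = disconnect_orphans(Connection(threading.Lock()))
# Re-entrant lock: the same thread may acquire it a second time.
reentrant = disconnect_orphans(Connection(threading.RLock()))
print(plain, reentrant)  # False True
```

With the plain lock, any other thread that needs the same lock (the NM sending thread in the description above) would queue up behind a holder that can never make progress.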
