Posted on 2013-10-12 00:02:36
The message "clsc_disc_orphans: protocol exchange timed out" appears in ocssd.log, and the node crashes afterwards:
[ CSSD]2012-08-16 23:24:15.702 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121276 ms
[ CSSD]2012-09-09 23:24:25.620 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121399 ms
[ CSSD]2012-10-01 13:56:41.962 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121150 ms
[ CSSD]2012-10-11 11:05:40.298 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121321 ms
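The quoted lines can be parsed to check how far each protocol exchange ran past its limit. The Python below is an illustrative sketch only. Its regular expression is inferred from the four lines above, not from any documented ocssd.log format, and the names `ORPHAN_RE` and `parse_orphan_timeouts` are hypothetical. For each match it reports the overrun past the 120-second limit. For the first line that is 121276 ms - 120000 ms = 1276 ms.

```python
import re
from datetime import datetime

# Hypothetical pattern, inferred only from the ocssd.log sample lines in this
# post; other CSSD versions or platforms may format these lines differently.
ORPHAN_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<tid>\d+)\] "
    r">TRACE: clsc_disc_orphans: protocol exchange timed out, "
    r"time limit (?P<limit>\d+) sec, time since connect (?P<elapsed>\d+) ms"
)


def parse_orphan_timeouts(lines):
    """Return (timestamp, thread_id, limit_ms, elapsed_ms, overrun_ms) per matching line."""
    events = []
    for line in lines:
        m = ORPHAN_RE.search(line)
        if m is None:
            continue  # not a clsc_disc_orphans timeout line
        limit_ms = int(m["limit"]) * 1000  # the limit is logged in seconds
        elapsed_ms = int(m["elapsed"])     # elapsed time is logged in milliseconds
        events.append((
            datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
            int(m["tid"]),
            limit_ms,
            elapsed_ms,
            elapsed_ms - limit_ms,  # how far past the limit the exchange ran
        ))
    return events


if __name__ == "__main__":
    # Two of the sample lines quoted in the post; in practice, pass an open ocssd.log instead.
    sample = [
        "[ CSSD]2012-08-16 23:24:15.702 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121276 ms",
        "[ CSSD]2012-10-11 11:05:40.298 [3086] >TRACE: clsc_disc_orphans: protocol exchange timed out, time limit 120 sec, time since connect 121321 ms",
    ]
    for ts, tid, limit_ms, elapsed_ms, overrun_ms in parse_orphan_timeouts(sample):
        print(f"{ts}  thread={tid}  limit={limit_ms}ms  elapsed={elapsed_ms}ms  over={overrun_ms}ms")
```

Applied to all four lines above, the overruns range from 1150 ms to 1399 ms. Every one of these events was logged by the same thread, 3086.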
At the same time, node 2 shows a large number of RACGVIP check timeouts:
2012-10-12 10:03:24.950: [ CRSEVT][11040]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)
...............
2012-10-16 09:14:07.334: [ CRSEVT][11066]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)
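A similar sketch can put a number on "a large number" of VIP check timeouts by tallying the racgwrap(check) timeout lines per day and per resource. It rests on several assumptions. The pattern is inferred only from the two lines above. The post does not say which log file those lines come from. The helper `count_vip_check_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern, inferred only from the two racgwrap(check) sample lines
# in this post; it captures the date and the resource name (e.g. a *.vip resource).
VIP_TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}\.\d{3}: \[\s*CRSEVT\]"
    r".*racgwrap\(check\) timed out for (?P<res>[^!\s]+)! \(timeout=(?P<timeout>\d+)\)"
)


def count_vip_check_timeouts(lines):
    """Count racgwrap(check) timeouts per (date, resource name)."""
    counts = Counter()
    for line in lines:
        m = VIP_TIMEOUT_RE.search(line)
        if m is not None:
            counts[(m["date"], m["res"])] += 1
    return counts


if __name__ == "__main__":
    # The two sample lines quoted in the post; in practice, pass the open log file.
    sample = [
        "2012-10-12 10:03:24.950: [ CRSEVT][11040]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)",
        "2012-10-16 09:14:07.334: [ CRSEVT][11066]32CAAMonitorHandler :: 0:Action Script /u01/app/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.pdcmdb04.vip! (timeout=60)",
    ]
    for (day, res), n in sorted(count_vip_check_timeouts(sample).items()):
        print(day, res, n)
```

Keying the count by date makes it easy to line the VIP timeouts up against the clsc_disc_orphans dates and the crash times.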
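Analysis based on the available logs and trace information: "clsc_disc_orphans" messages appear in the CSSD log (ocssd.log) when the RAC GM client listener thread is processing "clsc_disc_orphans". When clsc_disc tries to disconnect, this function is responsible for obtaining and holding the thread information. A known bug (Bug 9132429: LNX64-10205-CRS:NODE CRASH AFTER 5 MINUTES OF HANG/RESUME OCSSD.BIN) can cause several sessions to deadlock. The node then eventually hangs, or is evicted and goes down. The same bug can also take the VIP OFFLINE unexpectedly.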
No core dump or call stack of the CSSD process from the time of the crash is available. Bug 9132429 therefore cannot be confirmed as the root cause of the crashes.
3. Recommendations
1. APPLY PATCH 9132429
The 10.2.0.4 version of "Patch 9132429: LNX64-10205-CRS:NODE CRASH AFTER 5 MINUTES OF HANG/RESUME OCSSD.BIN" is still waiting for a build from Oracle Development. Apply it once it becomes available.