Posted on 2013-6-7 09:53:31
This post was last edited by huiwenshu on 2013-6-7 10:07
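Environment:
AIX 6.1
HACMP 6.1
RAC 10.2.0.5
Symptom: after node 1 went down, CRS on node 2 could no longer start either. The interconnect is a direct (back-to-back) connection, and netstat -in still shows both the public and priv IPs. Could anyone help take a look? Attached are node 2's cssd.log and crsd.log from that time. Node 1 went down shortly after 3 o'clock on 06-06. Below is part of cssd.log: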
$ tail -n 30000 ocssd.l01 | more
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: set curtime (815061278) for my node
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: scan 5 nodes
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: node(3), state(0), cont (1), sleep (0), diskHB 1, diskinfo 110951850
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: node(3), LAT (1)
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: node(4), state(1), cont (0), sleep (0), diskHB 1, diskinfo 110951850
[ CSSD]2013-06-06 05:42:08.502 [4885] >TRACE: clssnmLocalJoinEvent: No sleeping for mynode(4)
[ CSSD]2013-06-06 05:42:08.502 [4885] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2013-06-06 05:42:09.283 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:10.283 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:11.283 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:11.364 [4628] >TRACE: clssnmSendingThread: sending join msg to all nodes
[ CSSD]2013-06-06 05:42:11.364 [4628] >TRACE: clssnmSendingThread: sent 5 join msgs to all nodes
[ CSSD]2013-06-06 05:42:12.283 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:13.283 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:14.284 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:15.284 [4114] >TRACE: clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: begin on node(4), waittime 193000
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: curTime (815068280) - LAT (814983171) = 85109, for node (3), waittime 193000
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: curTime (815068280) - LAT (814983427) = 84853, for node (4), waittime 193000
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: set curtime (815068280) for my node
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: scan 5 nodes
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: node(3), state(0), cont (1), sleep (0), diskHB 1, diskinfo 110951850
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: node(3), LAT (1)
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: node(4), state(1), cont (0), sleep (0), diskHB 1, diskinfo 110951850
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: No sleeping for mynode(4)
[ CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.
[ CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmReconfigThread: state(1) clusterState(0) exit
[ CSSD]2013-06-06 05:42:15.504 [4885] >ERROR: ###################################
[ CSSD]2013-06-06 05:42:15.504 [4885] >ERROR: clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
[ CSSD]2013-06-06 05:42:15.504 [4885] >ERROR: ###################################
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]--- END OF GROCK STATE DUMP ---
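To cross-check the numbers in the excerpt, here is a minimal Python sketch that pulls the `curTime`, `LAT` and `waittime` values out of the `clssnmLocalJoinEvent` lines and recomputes the difference. The helper name `join_checks`, the regular expression and the `SAMPLE` string are my own illustration, not part of any Oracle tool; the two sample lines are copied from the log above, one of them left in its wrapped form. It only redoes the arithmetic the log already prints and makes no claim about how CSSD uses `waittime` internally.

```python
import re

# Two lines copied from the ocssd log excerpt; the second keeps the
# wrapped ", waittime ..." continuation as it appears in the paste.
SAMPLE = """\
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: curTime (815068280) - LAT (814983171) = 85109, for node (3), waittime 193000
[ CSSD]2013-06-06 05:42:15.504 [4885] >TRACE: clssnmLocalJoinEvent: curTime (815068280) - LAT (814983427) = 84853, for node (4)
, waittime 193000
"""

# \s* before ", waittime" lets the pattern match wrapped and unwrapped lines.
PATTERN = re.compile(
    r"curTime \((\d+)\) - LAT \((\d+)\) = (\d+), "
    r"for node \((\d+)\)\s*, waittime (\d+)"
)


def join_checks(text):
    """Return one dict per matching log entry with the recomputed difference."""
    rows = []
    for match in PATTERN.finditer(text):
        cur, lat, logged, node, wait = map(int, match.groups())
        elapsed = cur - lat
        rows.append({
            "node": node,
            "elapsed": elapsed,              # recomputed curTime - LAT
            "logged": logged,                # value printed in the log
            "waittime": wait,
            "below_waittime": elapsed < wait,
        })
    return rows


if __name__ == "__main__":
    for row in join_checks(SAMPLE):
        print(row)
```

For both node 3 and node 4 the recomputed difference matches the logged one and is still below the logged `waittime` of 193000. The counters look like milliseconds, since `curtime` advances by 7002 between the 05:42:08.502 and 05:42:15.504 entries, but that is an inference from the timestamps rather than something the log states.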