4#
Posted 2012-6-11 14:42:29
KEY WORD 10.2.0.1 + OCFS
NODE 2
- [ CSSD]2012-06-10 23:06:22.507 [3054844816] >TRACE: clssgmReconfigThread: completed for reconfig(5), with status(1)
- [ CSSD]2012-06-10 23:06:22.945 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x948f690) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:06:23.498 [3065334672] >TRACE: clssnmWaitForAcks: done, msg type(15)
- [ CSSD]2012-06-10 23:06:23.498 [3065334672] >TRACE: clssnmDoSyncUpdate: Sync Complete!
- [ CSSD]2012-06-10 23:06:24.298 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94b8700) proc(0x949d480) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:07:29.241 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x9487758) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:07:34.770 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94935c0) proc(0x949d480) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:08:35.623 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948fb98) proc(0x94878c8) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:08:38.701 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94935c0) proc(0x949d480) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:08:45.646 [68623248] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(1) wrtcnt(1) LATS(378924) Disk lastSeqNo(1)
- [ CSSD]2012-06-10 23:08:47.014 [89602960] >TRACE: clssnmConnComplete: probe from node 1
- [ CSSD]2012-06-10 23:08:47.014 [89602960] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
- [ CSSD]2012-06-10 23:08:47.015 [89602960] >TRACE: clssnmConnComplete: connected to node 1 (con 0x948fb98), state 1 birth 0, unique 1339384124/1339384124 prevConuni(0)
- [ CSSD]2012-06-10 23:08:47.712 [3065334672] >TRACE: clssnmDoSyncUpdate: Initiating sync 6
- [ CSSD]2012-06-10 23:08:47.712 [3065334672] >TRACE: clssnmSetupAckWait: Ack message type (11)
- [ CSSD]2012-06-10 23:08:47.713 [3065334672] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
- [ CSSD]2012-06-10 23:08:47.713 [3065334672] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
- [ CSSD]2012-06-10 23:08:47.713 [3065334672] >TRACE: clssnmSendSync: syncSeqNo(6)
- [ CSSD]2012-06-10 23:08:47.713 [3065334672] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
- [ CSSD]2012-06-10 23:08:47.713 [89602960] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[rac2] seq[5] sync[6]
- [ CSSD]2012-06-10 23:08:47.789 [1480512] >USER: NMEVENT_SUSPEND [00][00][00][04]
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmWaitForAcks: done, msg type(11)
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmDoSyncUpdate: node(0) missCount(755) state(0)
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmDoSyncUpdate: node(1) is transitioning from joining state to active state
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmSetupAckWait: Ack message type (13)
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
- [ CSSD]2012-06-10 23:08:48.716 [3065334672] >TRACE: clssnmSendVote: syncSeqNo(6)
- [ CSSD]2012-06-10 23:08:48.717 [3065334672] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2)
- [ CSSD]2012-06-10 23:08:48.717 [89602960] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(6)
- [ CSSD]2012-06-10 23:08:49.718 [3065334672] >TRACE: clssnmWaitForAcks: done, msg type(13)
- [ CSSD]2012-06-10 23:08:49.718 [3065334672] >TRACE: clssnmCheckDskInfo: Checking disk info...
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmEvict: Start
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmWaitOnEvictions: Start
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(383994)
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmSetupAckWait: Ack message type (15)
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
- [ CSSD]2012-06-10 23:08:50.719 [3065334672] >TRACE: clssnmSendUpdate: syncSeqNo(6)
- [ CSSD]2012-06-10 23:08:50.721 [3065334672] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(2)
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >TRACE: clssnmDeactivateNode: node 0 () left cluster
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >TRACE: clssnmUpdateNodeState: node 1, state (2/2) unique (1339384124/1339384124) prevConuni(0) birth (6/6) (old/new)
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1339383371/1339383371) prevConuni(0) birth (4/4) (old/new)
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >USER: clssnmHandleUpdate: SYNC(6) from node(2) completed
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >USER: clssnmHandleUpdate: NODE 1 (rac1) IS ACTIVE MEMBER OF CLUSTER
- [ CSSD]2012-06-10 23:08:50.722 [89602960] >USER: clssnmHandleUpdate: NODE 2 (rac2) IS ACTIVE MEMBER OF CLUSTER
- [ CSSD]2012-06-10 23:08:50.731 [3054844816] >TRACE: clssgmReconfigThread: started for reconfig (6)
- [ CSSD]2012-06-10 23:08:50.731 [3054844816] >USER: NMEVENT_RECONFIG [00][00][00][06]
- [ CSSD]2012-06-10 23:08:50.731 [3054844816] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 6
- [ CSSD]2012-06-10 23:08:50.819 [125442960] >TRACE: clssgmInitialRecv: (0x94935c0) accepted a new connection from node 1 born at 6 active (2, 2), vers (10,3,1,2)
- [ CSSD]2012-06-10 23:08:50.819 [125442960] >TRACE: clssgmInitialRecv: conns done (2/2)
- [ CSSD]2012-06-10 23:08:50.819 [3054844816] >TRACE: clssgmEstablishMasterNode: MASTER for 6 is node(2) birth(4)
- [ CSSD]2012-06-10 23:08:50.819 [3054844816] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
- [ CSSD]2012-06-10 23:08:50.828 [3054844816] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
- [ CSSD]CLSS-3000: reconfiguration successful, incarnation 6 with 2 nodes
- [ CSSD]CLSS-3001: local node number 2, master node number 2
- [ CSSD]2012-06-10 23:08:50.832 [3054844816] >TRACE: clssgmReconfigThread: completed for reconfig(6), with status(1)
- [ CSSD]2012-06-10 23:08:51.723 [3065334672] >TRACE: clssnmWaitForAcks: done, msg type(15)
- [ CSSD]2012-06-10 23:08:51.723 [3065334672] >TRACE: clssnmDoSyncUpdate: Sync Complete!
- [ CSSD]2012-06-10 23:08:54.141 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x949c380) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:09:42.122 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:10:48.531 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:11:55.014 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:12:50.928 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:13:01.353 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fdb8) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:13:04.177 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x94b6948) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:14:07.876 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:15:14.465 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:16:21.076 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:17:27.628 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:18:34.073 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:19:40.592 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:20:47.143 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:21:53.814 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x94b9358) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:22:51.812 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:23:00.323 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fdb8) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:23:04.915 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x94b6948) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:24:06.618 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:25:13.000 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:26:19.550 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:27:26.027 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x94a6e80) proc(0x949fe20) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:28:16.154 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x93386c0) proc(0x949a138) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:28:16.244 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x9389c20) proc(0x94ae2c8) pid() proto(10:2:1:1)
- [ CSSD]2012-06-10 23:28:18.580 [104463248] >TRACE: clssgmClientConnectMsg: Connect from con(0x948e4b8) proc(0x9389468) pid() proto(10:2:1:1)
At 23:08:45.646 NODE 2 logged "[68623248] >TRACE: clssnmReadDskHeartbeat: node(1) is down.": it detected that NODE 1's disk heartbeat had failed and initiated the eviction of NODE 1.
Similarly, NODE 1 did the same thing (see its own clssnmReadDskHeartbeat: node(2) is down entry at 23:08:47.212):
- [ CSSD]2012-06-10 23:08:47.185 [90983312] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
- [ CSSD]2012-06-10 23:08:47.185 [90983312] >TRACE: clssnmClusterListener: Probing node(2)
- [ CSSD]2012-06-10 23:08:47.188 [90983312] >TRACE: clssnmConnComplete: connected to node 2 (con 0x8c006c8), state 3 birth 0, unique 1339383371/1339383371 prevConuni(0)
- [ CSSD]2012-06-10 23:08:47.212 [55368592] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(6) wrtcnt(693) LATS(0) Disk lastSeqNo(693)
- [ CSSD]2012-06-10 23:08:47.234 [3065797520] >TRACE: clssnmPollingThread: Connection complete
- [ CSSD]2012-06-10 23:08:47.234 [3055307664] >TRACE: clssnmSendingThread: Connection complete
- [ CSSD]2012-06-10 23:08:47.234 [3044817808] >TRACE: clssnmRcfgMgrThread: Connection complete
- [ CSSD]2012-06-10 23:08:47.234 [3044817808] >TRACE: clssnmRcfgMgrThread: Local Join
- [ CSSD]2012-06-10 23:08:47.234 [3044817808] >TRACE: clssnmLocalJoinEvent: set node(2) inactive
- [ CSSD]2012-06-10 23:08:47.234 [3044817808] >WARNING: clssnmLocalJoinEvent: takeover aborted due to UNKNOWN nodes
- [ CSSD]2012-06-10 23:08:47.234 [101473168] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_mycrs_1))
- [ CSSD]2012-06-10 23:08:47.234 [101473168] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_mycrs))
- [ CSSD]2012-06-10 23:08:47.887 [90983312] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[rac2] seq[5] sync[6]
- [ CSSD]2012-06-10 23:08:48.235 [3044817808] >TRACE: clssnmRcfgMgrThread: lastleader(2) unique(1339384124)
- [ CSSD]2012-06-10 23:08:48.890 [90983312] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(6)
- [ CSSD]2012-06-10 23:08:50.895 [90983312] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
- [ CSSD]2012-06-10 23:08:50.895 [90983312] >TRACE: clssnmDeactivateNode: node 0 () left cluster
- [ CSSD]2012-06-10 23:08:50.895 [90983312] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1339384124/1339384124) prevConuni(0) birth (0/6) (old/new)
- [ CSSD]2012-06-10 23:08:50.895 [90983312] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1339383371/1339383371) prevConuni(0) birth (0/4) (old/new)
- [ CSSD]2012-06-10 23:08:50.896 [90983312] >USER: clssnmHandleUpdate: SYNC(6) from node(2) completed
- [ CSSD]2012-06-10 23:08:50.896 [90983312] >USER: clssnmHandleUpdate: NODE 1 (rac1) IS ACTIVE MEMBER OF CLUSTER
- [ CSSD]2012-06-10 23:08:50.896 [90983312] >USER: clssnmHandleUpdate: NODE 2 (rac2) IS ACTIVE MEMBER OF CLUSTER
- [ CSSD]2012-06-10 23:08:50.987 [2058448] >USER: NMEVENT_SUSPEND [00][00][00][00]
- [ CSSD]2012-06-10 23:08:50.988 [3034327952] >TRACE: clssgmReconfigThread: started for reconfig (6)
- [ CSSD]2012-06-10 23:08:50.988 [3034327952] >USER: NMEVENT_RECONFIG [00][00][00][06]
- [ CSSD]2012-06-10 23:08:50.989 [3034327952] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 6
- [ CSSD]2012-06-10 23:08:50.992 [3076287376] >TRACE: clssgmInitialRecv: (0x8d4d7b8) accepted a new connection from node 2 born at 4 active (2, 2), vers (10,3,1,2)
- [ CSSD]2012-06-10 23:08:50.992 [3076287376] >TRACE: clssgmInitialRecv: conns done (2/2)
- [ CSSD]2012-06-10 23:08:50.993 [3034327952] >TRACE: clssgmEstablishMasterNode: MASTER for 6 is node(2) birth(4)
- [ CSSD]2012-06-10 23:08:50.993 [3034327952] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
- [ CSSD]2012-06-10 23:08:51.001 [3076287376] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 6
- [ CSSD]CLSS-3000: reconfiguration successful, incarnation 6 with 2 nodes
- [ CSSD]CLSS-3001: local node number 1, master node number 2
The disk heartbeat failure reported by clssnmReadDskHeartbeat caused a split brain on 10.2.0.1, which triggered the STONITH algorithm, and both hosts rebooted.
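To check both nodes' ocssd logs for this pattern, something like the following rough Python sketch may help. It only assumes the line layout seen in the excerpts above; the function name find_heartbeat_failures and the variables node1_log / node2_log are made up for illustration and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Matches CSSD trace lines of the form seen in the excerpts above, e.g.
#   [ CSSD]2012-06-10 23:08:45.646 [68623248] >TRACE: clssnmReadDskHeartbeat: node(1) is down. ...
HEARTBEAT_DOWN = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<thread>\d+)\]\s+>TRACE:\s+clssnmReadDskHeartbeat:\s+"
    r"node\((?P<node>\d+)\) is down"
)


def find_heartbeat_failures(lines):
    """Return (timestamp, node_reported_down) for each disk-heartbeat 'down' line."""
    events = []
    for line in lines:
        m = HEARTBEAT_DOWN.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, int(m.group("node"))))
    return events


# Sample lines copied from the two log excerpts in this post.
node2_log = [
    "- [ CSSD]2012-06-10 23:08:45.646 [68623248] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(1) wrtcnt(1) LATS(378924) Disk lastSeqNo(1)",
    "- [ CSSD]2012-06-10 23:08:47.014 [89602960] >TRACE: clssnmConnComplete: probe from node 1",
]
node1_log = [
    "- [ CSSD]2012-06-10 23:08:47.212 [55368592] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(6) wrtcnt(693) LATS(0) Disk lastSeqNo(693)",
]

events_node2 = find_heartbeat_failures(node2_log)
events_node1 = find_heartbeat_failures(node1_log)

for name, events in (("rac2", events_node2), ("rac1", events_node1)):
    for ts, node in events:
        print(f"{name} reported node({node}) down at {ts.time()}")

# Seconds between the two nodes declaring each other's disk heartbeat down.
gap = (events_node1[0][0] - events_node2[0][0]).total_seconds()
print(f"gap between the two reports: {gap:.3f}s")
```

If each node's log contains a "node(N) is down" entry for the other node within a few seconds, as here, that is consistent with the mutual eviction described above. For real use, read the lines from each node's ocssd.log instead of the inline samples.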
Recommendations
Do not use CRS version 10.2.0.1, whether for testing or production!
Do not use OCFS as a shared storage solution!