Oracle 11gR2 RAC High Availability Functional Testing
RAC Node Failure Simulation Tests
Single RAC node reboot simulation test (same effect as killing ocssd)
Simulation steps
Start the Oracle RAC database (including the listeners).
Put client application workload on the database.
On the node carrying the most client connections, execute a 'shutdown -r' operation:
AIX, HP-UX, Windows: "shutdown -r"
Linux: "shutdown -r now"
Solaris: "reboot"
Expected results
The database instance processes and the CRS processes on the node disappear.
The surviving instance in the RAC database performs instance recovery.
The SCAN IP, VIP, services, and client connections all fail over to the surviving node (SQL*Plus connections can switch to the surviving node automatically; see the TAF sketch after this list).
The surviving node continues to provide service after a brief pause.
After the rebooted system comes back up, the cluster services and the database on that node start automatically and rejoin the cluster. The VIP also fails back to the original node.
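Automatic reconnection of SQL*Plus sessions depends on Transparent Application Failover being enabled in the client connect descriptor. A minimal tnsnames.ora sketch, assuming a SCAN name crm-scan (illustrative) and the oracrm service visible in the resource listings below:

ORACRM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = crm-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = oracrm)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
    )
  )

With TYPE = SELECT, in-flight queries resume on the surviving instance after failover; uncommitted transactions are still rolled back.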
Test procedure record
Because killing the ocssd process reboots the server immediately, refer to the ocssd process crash simulation test section for this test.
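For reference, that test is typically driven by killing the CSS daemon directly; a sketch (run as root, and expect the node to reboot almost immediately):

# ps -ef | grep ocssd.bin | grep -v grep   # locate the CSS daemon PID
# kill -9 <pid_of_ocssd.bin>               # node reboots at once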
Network Failure Simulation Tests
This part simulates RAC public network outages, interconnect outages, and listener failures to model faults that can occur in production, and uses the tests to establish the expected responses and the relevant metrics.
Public network failure simulation test
Simulation steps
On one RAC node, remove the cable from the node's public NIC to simulate a public network outage.
Expected results
In a properly configured production environment the public network should use APA dual-NIC redundancy, so pulling any single cable has no effect on the node or on database connections.
Pulling both cables does have an impact: the node's listener stops and its VIP fails over to the other node (a quick way to watch this is shown below).
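Listener and VIP state can be watched from a surviving node while the cables are pulled (a sketch using the node and resource names from the listings below):

$ srvctl status vip -n crm01
$ srvctl status listener -n crm01
$ crsctl stat res ora.crm01.vip -t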
Test procedure record
A. Test: pull the node's only remaining public network cable (simulated by disabling the corresponding port on the switch).
Configuration capture:
### Public ###
192.168.1.100 crm01
192.168.1.103 crm02
### Private ###
192.168.1.132 crm01-priv
192.168.1.133 crm02-priv
Node 1:
$ /u03/oracle/app/11.2.0/grid/bin/oifcfg getif
lan901 192.168.1.128 global cluster_interconnect
lan900 192.168.1.96 global public
crm01:/# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lo0 32808 127.0.0.0 127.0.0.1 534464040 0 534465314 0 0
lan900:1 1500 192.168.1.96 192.168.1.101 20302 0 20058 0 0
lan900:2 1500 192.168.1.96 192.168.1.102 641 0 0 0 0
lan901 1500 192.168.1.128 192.168.1.132 35145101 0 32736598 0 0
lan900 1500 192.168.1.96 192.168.1.100 3015542162 0 5849105425 0 0
lan901:8 1500 169.254.0.0 169.254.124.177 218940954 0 94982382 0 0
Warning: The above name 'lan901:801' is truncated, use -w to show the output in wide format
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm02
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE ONLINE crm01
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 2:
$ /u03/oracle/app/11.2.0/grid/bin/oifcfg getif
lan901 192.168.1.128 global cluster_interconnect
lan900 192.168.1.96 global public
$ netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lo0 32808 127.0.0.0 127.0.0.1 9573254 0 9572904 0 0
lan900:1 1500 192.168.1.96 192.168.1.104 10691 0 10443 0 0
lan901 1500 192.168.1.128 192.168.1.133 33986831 0 32906011 0 0
lan900 1500 192.168.1.96 192.168.1.103 15345874 0 14256166 0 0
lan901:8 1500 169.254.0.0 169.254.135.57 209273222 0 104554939 0 0
Warning: The above name 'lan901:801' is truncated, use -w to show the output in wide format
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm02
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm02 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm02
ora.crsd
1 ONLINE ONLINE crm02
ora.cssd
1 ONLINE ONLINE crm02
ora.cssdmonitor
1 ONLINE ONLINE crm02
ora.ctssd
1 ONLINE ONLINE crm02 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm02
ora.gipcd
1 ONLINE ONLINE crm02
ora.gpnpd
1 ONLINE ONLINE crm02
ora.mdnsd
1 ONLINE ONLINE crm02
Downing node 1's public NICs:
(1) Down one of node 1's public NICs
The OS log shows the link going down at 11:19:53:
May 23 11:19:53 crm01 sshd[9445]: Accepted publickey for grid from 192.168.1.103 port 32669 ssh2
May 23 11:19:53 crm01 cmnetd[4301]: Link level address on network interface lan900 has been changed from 0x00237d3e3303 to 0x0023
7d3e3146.
(2) Down both of node 1's public NICs
The OS log shows the network going down at 11:24:36:
May 23 11:24:36 crm01 cmnetd[4301]: lan900 is down at the data link layer.
May 23 11:24:36 crm01 cmnetd[4301]: lan900 failed.
May 23 11:24:36 crm01 cmnetd[4301]: Subnet 192.168.1.96 down
May 23 11:24:37 crm01 cmviputil[9924]: Issuing cmmodnet to remove 192.168.1.101 address in 192.168.1.96
Pinging node 1's VIP 192.168.1.101: 9 packets lost.
Node 1 crsd log:
vi /u03/oracle/app/11.2.0/grid/log/crm0*/crsd/crsd.log
2013-05-23 11:24:37.171: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:20421266
2013-05-23 11:24:37.171: [ AGFW][45] {0:2:16224} Received state change for ora.net1.network crm01 1 [old state = ONLINE, new s
tate = OFFLINE]
2013-05-23 11:24:37.172: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Inval
id|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:553824:Ver:2]
2013-05-23 11:24:37.172: [ AGFW][45] {0:2:16224} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:20421
266
2013-05-23 11:24:37.172: [ CRSPE][50] {0:2:16224} State change received from crm01 for ora.net1.network crm01 1
2013-05-23 11:24:37.173: [ CRSPE][50] {0:2:16224} Processing PE command id=45808. Description: [Resource State Change (ora.net1.ne
twork crm01 1) : 6000000002b63630]
2013-05-23 11:24:37.173: [ CRSPE][50] {0:2:16224} RI [ora.net1.network crm01 1] new external state [OFFLINE] old value: [ONLINE
] on crm01 label = []
2013-05-23 11:24:37.174: [ CRSPE][50] {0:2:16224} Resource Resource Instance ID[ora.net1.network crm01 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=crm01
CURRENT_RCOUNT=1
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=2
STATE_CHANGE_VERS=1
LAST_FAULT=1369279055
LAST_STATE_CHANGE=1369279477
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.net1.network crm01 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 11:24:37.175: [ CRSPE][50] {0:2:16224} Processing unplanned state change for [ora.net1.network crm01 1]
2013-05-23 11:24:37.176: [ CRSRPT][51] {0:2:16224} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.net1.network
2013-05-23 11:24:37.177: [ CRSPE][50] {0:2:16224} Scheduled local recovery for [ora.net1.network crm01 1]
2013-05-23 11:24:37.177: [ CRSPE][50] {0:2:16224} Op 60000000030c81a0 has 3 WOs
2013-05-23 11:24:37.181: [ CRSPE][50] {0:2:16224} RI [ora.net1.network crm01 1] new internal state: [STARTING] old value: [STAB
LE]
2013-05-23 11:24:37.181: [ CRSPE][50] {0:2:16224} Sending message to agfw: id = 553826
2013-05-23 11:24:37.182: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_START[ora.net1.network crm01
1] ID 4098:553826
2013-05-23 11:24:37.182: [ CRSPE][50] {0:2:16224} CRS-2672: Attempting to start 'ora.net1.network' on 'crm01'
2013-05-23 11:24:37.182: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.net1.network crm
01 1] ID 4098:553826 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.184: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.LISTENER.lsnr crm0
1 1] ID 4097:553827
2013-05-23 11:24:37.184: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.LISTENER.lsnr oraqdc
s01 1] ID 4097:553827 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.184: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.ons crm01 1] ID 40
97:553828
2013-05-23 11:24:37.185: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.ons crm01 1] ID
4097:553828 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.185: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.crm01.vip 1 1] ID
4097:553830
2013-05-23 11:24:37.185: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.crm01.vip 1 1] I
D 4097:553830 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.224: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_PROBE[ora.crm01.vip 1 1] ID 4097:
553833 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.224: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_PROBE[ora.crm01.vip 1 1] ID 4097:
553833 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.224: [ AGFW][45] {0:2:16224} ora.crm01.vip 1 1 received state from probe request. Old state = ONLINE, New
state = ONLINE
2013-05-23 11:24:37.224: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.ora
qdcs01.vip 1 1] ID 4097:553830
2013-05-23 11:24:37.330: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_START[ora.net1.network crm01 1] I
D 4098:553829 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.330: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.net1.ne
twork crm01 1] ID 4098:553826
2013-05-23 11:24:37.331: [ CRSPE][50] {0:2:16224} Received reply to action [Start] message ID: 553826
2013-05-23 11:24:37.358: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_PROBE[ora.LISTENER.lsnr crm01 1]
ID 4097:553831 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.358: [ AGFW][45] {0:2:16224} ora.LISTENER.lsnr crm01 1 received state from probe request. Old state = ONLI
NE, New state = ONLINE
2013-05-23 11:24:37.358: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.LI
STENER.lsnr crm01 1] ID 4097:553827
2013-05-23 11:24:37.391: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_START[ora.net1.network crm01 1] I
D 4098:553829 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.392: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.ne
t1.network crm01 1] ID 4098:553826
2013-05-23 11:24:37.392: [ CRSPE][50] {0:2:16224} Received reply to action [Start] message ID: 553826
2013-05-23 11:24:37.392: [ CRSPE][50] {0:2:16224} RI [ora.net1.network crm01 1] new internal state: [STABLE] old value: [STARTI
NG]
2013-05-23 11:24:37.392: [ CRSPE][50] {0:2:16224} CRS-2674: Start of 'ora.net1.network' on 'crm01' failed
2013-05-23 11:24:37.393: [ CRSPE][50] {0:2:16224} Sequencer for [ora.net1.network crm01 1] has completed with error: CRS-0215:
Could not start resource 'ora.net1.network'.
2013-05-23 11:24:37.394: [ CRSPE][50] {0:2:16224} RIs of this resource are not relocatable:ora.net1.network: failover impossible
2013-05-23 11:24:37.395: [ CRSPE][50] {0:2:16224} Failover cannot be completed for [ora.net1.network crm01 1]. Stopping it and
the resource tree
2013-05-23 11:24:37.399: [ CRSPE][50] {0:2:16224} Op 60000000030d31e0 has 14 WOs
2013-05-23 11:24:37.401: [ CRSPE][50] {0:2:16224} ICE has queued an operation. Details: Operation [STOP of [ora.net1.network oraqdc
s01 1] on [crm01] : 60000000030d31e0] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.ons crm01 1] server
[] target states [OFFLINE ], locked by op [Resource Check: [ora.ons crm01 1]]. Owner: CRS-2683: It is locked by 'SYSTEM' for comm
and 'Unplanned Resource State Change : ora.net1.network'
2013-05-23 11:24:37.469: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_PROBE[ora.ons crm01 1] ID 4097:55
3832 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.469: [ AGFW][45] {0:2:16224} ora.ons crm01 1 received state from probe request. Old state = ONLINE, New st
ate = ONLINE
2013-05-23 11:24:37.470: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.on
s crm01 1] ID 4097:553828
2013-05-23 11:24:37.470: [ CRSPE][50] {0:2:16224} Failover cannot be completed for [ora.net1.network crm01 1]. Stopping it and
the resource tree
2013-05-23 11:24:37.473: [ CRSPE][50] {0:2:16224} Operation [STOP of [ora.net1.network crm01 1] on [crm01] : 60000000030d31e
0] has been replaced with [STOP of [ora.net1.network crm01 1] on [crm01] : 6000000002ffe1f0
2013-05-23 11:24:37.474: [ CRSPE][50] {0:2:16224} Op 6000000002ffe1f0 has 14 WOs
2013-05-23 11:24:37.475: [ CRSPE][50] {0:2:16224} RI [ora.ons crm01 1] new internal state: [STOPPING] old value: [STABLE]
2013-05-23 11:24:37.476: [ CRSPE][50] {0:2:16224} Sending message to agfw: id = 553839
2013-05-23 11:24:37.476: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_STOP[ora.ons crm01 1] ID 409
9:553839
2013-05-23 11:24:37.476: [ CRSPE][50] {0:2:16224} CRS-2673: Attempting to stop 'ora.ons' on 'crm01'
2013-05-23 11:24:37.476: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_STOP[ora.ons crm01 1] ID 4
099:553839 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.477: [ CRSPE][50] {0:2:16224} RI [ora.LISTENER.lsnr crm01 1] new internal state: [STOPPING] old value: [STA
BLE]
2013-05-23 11:24:37.477: [ CRSPE][50] {0:2:16224} Sending message to agfw: id = 553841
2013-05-23 11:24:37.478: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_STOP[ora.LISTENER.lsnr crm01
1] ID 4099:553841
2013-05-23 11:24:37.478: [ CRSPE][50] {0:2:16224} CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'crm01'
2013-05-23 11:24:37.478: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_STOP[ora.LISTENER.lsnr crm
01 1] ID 4099:553841 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.607: [UiServer][52] CS(6000000001aab940)set Properties ( grid,60000000019f83f0)
2013-05-23 11:24:37.626: [UiServer][51] {1:41988:45869} Container [ Name: UI_STOP
API_HDR_VER:
TextMessage[2]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[/u03/oracle/app/11.2.0/grid/bin/tnslsnr]
CLIENT_PID:
TextMessage[8626]
CLIENT_PRIMARY_GROUP:
TextMessage[dba]
EVENT_TAG:
TextMessage[1]
FILTER:
TextMessage[(^A(^A(^ANAME^A==^Aora.LISTENER.lsnr^A)^A&&^A(^ALAST_SERVER^A==^Acrm01^A)^A)^A&&^A(^ASTATE^A!=^AOFFLINE^A)^A)
^AUSR_ORA_OPI=true]
FILTER_TAG:
TextMessage[1]
LOCALE:
TextMessage[AMERICAN_AMERICA.WE8ISO8859P1]
NO_WAIT_TAG:
TextMessage[1]
]
2013-05-23 11:24:37.626: [UiServer][51] {1:41988:45869} Sending message to PE. ctx= 60000000018e1ab0, Client PID: 8626
2013-05-23 11:24:37.627: [ CRSPE][50] {1:41988:45869} Cmd : 6000000001d2c9d0 : flags: EVENT_TAG
2013-05-23 11:24:37.627: [ CRSPE][50] {1:41988:45869} Processing PE command id=45809. Description: [Stop Resource : 6000000001d2c9
d0]
2013-05-23 11:24:37.628: [ CRSPE][50] {1:41988:45869} Expression Filter : (((NAME == ora.LISTENER.lsnr) AND (LAST_SERVER == crm
01)) AND (STATE != OFFLINE))
2013-05-23 11:24:37.629: [ CRSPE][50] {1:41988:45869} Expression Filter : (((NAME == ora.LISTENER.lsnr) AND (LAST_SERVER == crm
01)) AND (STATE != OFFLINE))
2013-05-23 11:24:37.632: [ CRSPE][50] {1:41988:45869} Attribute overrides for the command: USR_ORA_OPI = true;
2013-05-23 11:24:37.633: [ CRSPE][50] {1:41988:45869} Filtering duplicate ops: server [] state [OFFLINE]
2013-05-23 11:24:37.633: [ CRSPE][50] {1:41988:45869} Op 60000000030c3180 has 5 WOs
2013-05-23 11:24:37.636: [ CRSPE][50] {1:41988:45869} PE Command [ Stop Resource : 6000000001d2c9d0 ] has completed
2013-05-23 11:24:37.636: [ CRSPE][50] {1:41988:45869} UI Command [Stop Resource : 6000000001d2c9d0] is replying to sender.
2013-05-23 11:24:37.636: [UiServer][51] {1:41988:45869} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2545: Cannot operate on 'instance of ora.LISTENER.lsnr assigned to crm01'. It is locked by 'SYSTEM' for c
ommand 'Unplanned Resource State Change : ora.net1.network']
MSGTYPE:
TextMessage[1]
OBJID:
TextMessage[ora.LISTENER.lsnr crm01 1]
WAIT:
TextMessage[0]
]
2013-05-23 11:24:37.637: [UiServer][51] {1:41988:45869} Container [ Name: UI_DATA
ora.LISTENER.lsnr crm01 1:
TextMessage[233]
]
2013-05-23 11:24:37.637: [UiServer][51] {1:41988:45869} Done for ctx=60000000018e1ab0
2013-05-23 11:24:37.776: [UiServer][52] CS(6000000001aab940)set Properties ( root,60000000019f83f0)
2013-05-23 11:24:37.778: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.LISTENER.lsnr crm01 1] I
D 4099:553842 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.779: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.LISTENER
.lsnr crm01 1] ID 4099:553841
2013-05-23 11:24:37.779: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553841
2013-05-23 11:24:37.796: [UiServer][51] {1:41988:45870} Sending message to PE. ctx= 60000000018e18d0, Client PID: 6056
2013-05-23 11:24:37.797: [ CRSPE][50] {1:41988:45870} Processing PE command id=45810. Description: [Stat Resource : 6000000001d2c9
d0]
2013-05-23 11:24:37.809: [UiServer][51] {1:41988:45870} Done for ctx=60000000018e18d0
2013-05-23 11:24:37.836: [UiServer][52] CS(6000000001aab940)set Properties ( root,60000000019f83f0)
2013-05-23 11:24:37.856: [UiServer][51] {1:41988:45871} Sending message to PE. ctx= 60000000018e1740, Client PID: 6056
2013-05-23 11:24:37.857: [ CRSPE][50] {1:41988:45871} Processing PE command id=45811. Description: [Stat Resource : 6000000001d2c9
d0]
2013-05-23 11:24:37.859: [UiServer][51] {1:41988:45871} Done for ctx=60000000018e1740
2013-05-23 11:24:37.900: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.LISTENER.lsnr crm01 1] I
D 4099:553842 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.900: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.LIS
TENER.lsnr crm01 1] ID 4099:553841
2013-05-23 11:24:37.900: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553841
2013-05-23 11:24:37.900: [ CRSPE][50] {0:2:16224} RI [ora.LISTENER.lsnr crm01 1] new internal state: [STABLE] old value: [STOPP
ING]
2013-05-23 11:24:37.900: [ CRSPE][50] {0:2:16224} RI [ora.LISTENER.lsnr crm01 1] new external state [OFFLINE] old value: [ONLIN
E] label = []
2013-05-23 11:24:37.901: [ CRSPE][50] {0:2:16224} CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'crm01' succeeded
2013-05-23 11:24:37.903: [ CRSRPT][51] {0:2:16224} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.LISTENER.lsnr
2013-05-23 11:24:37.903: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new internal state: [STOPPING] old value: [STABLE]
2013-05-23 11:24:37.904: [ CRSPE][50] {0:2:16224} Sending message to agfw: id = 553852
2013-05-23 11:24:37.904: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID 4
099:553852
2013-05-23 11:24:37.904: [ CRSPE][50] {0:2:16224} CRS-2673: Attempting to stop 'ora.crm01.vip' on 'crm01'
2013-05-23 11:24:37.904: [ AGFW][45] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID
4099:553852 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:37.950: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.ons crm01 1] ID 4099:553
840 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:37.951: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.ons oraqd
cs01 1] ID 4099:553839
2013-05-23 11:24:37.951: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553839
2013-05-23 11:24:38.053: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID 4099:5
53853 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:24:38.054: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.crm01
.vip 1 1] ID 4099:553852
2013-05-23 11:24:38.054: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553852
2013-05-23 11:24:38.060: [UiServer][52] CS(6000000001aab940)set Properties ( root,60000000019f83f0)
2013-05-23 11:24:38.076: [UiServer][51] {1:41988:45872} Sending message to PE. ctx= 60000000018e1ab0, Client PID: 6056
2013-05-23 11:24:38.076: [ CRSPE][50] {1:41988:45872} Processing PE command id=45812. Description: [Stat Resource : 6000000002cc32
f0]
2013-05-23 11:24:38.089: [UiServer][51] {1:41988:45872} Done for ctx=60000000018e1ab0
2013-05-23 11:24:38.203: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.ons crm01 1] ID 4099:553
840 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:24:38.204: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.ons
crm01 1] ID 4099:553839
2013-05-23 11:24:38.204: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553839
2013-05-23 11:24:38.204: [ CRSPE][50] {0:2:16224} RI [ora.ons crm01 1] new internal state: [STABLE] old value: [STOPPING]
2013-05-23 11:24:38.204: [ CRSPE][50] {0:2:16224} RI [ora.ons crm01 1] new external state [OFFLINE] old value: [ONLINE] label =
[]
2013-05-23 11:24:38.205: [ CRSPE][50] {0:2:16224} CRS-2677: Stop of 'ora.ons' on 'crm01' succeeded
2013-05-23 11:24:38.206: [ CRSRPT][51] {0:2:16224} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.ons
2013-05-23 11:25:18.492: [ AGFW][45] {0:2:16224} Received the reply to the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID 4099:5
53853 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:25:18.493: [ AGFW][45] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora.oraq
dcs01.vip 1 1] ID 4099:553852
2013-05-23 11:25:18.494: [ CRSPE][50] {0:2:16224} Received reply to action [Stop] message ID: 553852
2013-05-23 11:25:18.494: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new internal state: [STABLE] old value: [STOPPING]
2013-05-23 11:25:18.494: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new external state [OFFLINE] old value: [ONLINE] label
= []
2013-05-23 11:25:18.495: [ CRSPE][50] {0:2:16224} CRS-2677: Stop of 'ora.crm01.vip' on 'crm01' succeeded
2013-05-23 11:25:18.497: [ CRSRPT][51] {0:2:16224} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.crm01.vip
2013-05-23 11:25:18.501: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new internal state: [STARTING] old value: [STABLE]
2013-05-23 11:25:18.501: [ CRSPE][50] {0:2:16224} Sending message to agfw: id = 553866
2013-05-23 11:25:18.502: [ CRSPE][50] {0:2:16224} CRS-2672: Attempting to start 'ora.crm01.vip' on 'crm02'
2013-05-23 11:25:18.507: [UiServer][52] CS(6000000001aab940)set Properties ( root,60000000019f83f0)
2013-05-23 11:25:18.526: [UiServer][51] {1:41988:45873} Sending message to PE. ctx= 60000000018e18d0, Client PID: 6056
2013-05-23 11:25:18.526: [ CRSPE][50] {1:41988:45873} Cmd : 6000000002c77140 : flags: HOST_TAG
2013-05-23 11:25:18.527: [ CRSPE][50] {1:41988:45873} Processing PE command id=45813. Description: [Stat Resource : 6000000002c771
40]
2013-05-23 11:25:18.529: [UiServer][51] {1:41988:45873} Done for ctx=60000000018e18d0
2013-05-23 11:25:19.730: [ CRSPE][50] {0:2:16224} Received reply to action [Start] message ID: 553866
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} Received reply to action [Start] message ID: 553866
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new internal state: [STABLE] old value: [STARTING]
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} RI [ora.crm01.vip 1 1] new external state [INTERMEDIATE] old value: [OFFLINE]
on crm02 label = [FAILED OVER]
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} Set LAST_SERVER to crm02 for [ora.crm01.vip 1 1]
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} Set State Details to [FAILED OVER] from [ ] for [ora.crm01.vip 1 1]
2013-05-23 11:25:19.733: [ CRSPE][50] {0:2:16224} CRS-2676: Start of 'ora.crm01.vip' on 'crm02' succeeded
2013-05-23 11:25:19.735: [ CRSOCR][47] {0:2:16224} Multi Write Batch processing...
2013-05-23 11:25:19.736: [ CRSRPT][51] {0:2:16224} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.crm01.vip
2013-05-23 11:25:19.746: [ CRSOCR][47] {0:2:16224} Multi Write Batch done.
2013-05-23 11:25:19.748: [ CRSPE][50] {0:2:16224} Op 60000000030cc820 has 3 WOs
2013-05-23 11:25:19.751: [ CRSPE][50] {0:2:16224} PE Command [ Resource State Change (ora.net1.network crm01 1) : 6000000002b63
630 ] has completed
2013-05-23 11:25:19.752: [ AGFW][45] {0:2:16224} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:553873
2013-05-23 11:25:19.752: [ AGFW][45] {0:2:16224} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:553873
2013-05-23 11:25:19.753: [ AGFW][45] {0:2:16224} Agfw received reply from PE for resource state change for ora.net1.network oraqdc
s01 1
2013-05-23 11:25:20.385: [ CRSPE][50] {2:55972:4308} Processing PE command id=4291. Description: [Stat Resource : 6000000001978ed0
]
2013-05-23 11:25:20.385: [ CRSPE][50] {2:55972:4308} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
2013-05-23 11:26:20.396: [ CRSPE][50] {2:55972:4309} Processing PE command id=4292. Description: [Stat Resource : 6000000001978ed0
]
2013-05-23 11:26:20.397: [ CRSPE][50] {2:55972:4309} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
2013-05-23 11:27:20.405: [ CRSPE][50] {2:55972:4310} Processing PE command id=4293. Description: [Stat Resource : 6000000001978ed0
]
2013-05-23 11:27:20.406: [ CRSPE][50] {2:55972:4310} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
2013-05-23 11:27:22.726: [UiServer][52] CS(6000000001aab940)set Properties ( root,60000000019f83f0)
2013-05-23 11:27:22.744: [UiServer][51] {1:41988:45874} Sending message to PE. ctx= 60000000018e1740, Client PID: 5911
2013-05-23 11:27:22.745: [ CRSPE][50] {1:41988:45874} Processing PE command id=45814. Description: [Stat Resource : 6000000001978e
d0]
2013-05-23 11:27:22.747: [UiServer][51] {1:41988:45874} Done for ctx=60000000018e1740
2013-05-23 11:28:20.414: [ CRSPE][50] {2:55972:4311} Processing PE command id=4294. Description: [Stat Resource : 6000000001971e40
]
2013-05-23 11:28:20.415: [ CRSPE][50] {2:55972:4311} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
2013-05-23 11:29:20.404: [ CRSPE][50] {2:55972:4312} Processing PE command id=4295. Description: [Stat Resource : 6000000001978ed0
]
2013-05-23 11:29:20.405: [ CRSPE][50] {2:55972:4312} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
2013-05-23 11:30:20.424: [ CRSPE][50] {2:55972:4313} Processing PE command id=4296. Description: [Stat Resource : 6000000001978ed0
]
2013-05-23 11:30:20.425: [ CRSPE][50] {2:55972:4313} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == crm02))
Node 2 crsd log:
2013-05-23 11:24:20.390: [UiServer][47] {2:55972:4307} Done for ctx=60000000026ff4c0
2013-05-23 11:25:18.513: [ AGFW][41] {0:2:16224} Agfw Proxy Server received the message: RESOURCE_START[ora.crm01.vip 1 1] ID
4098:553866
2013-05-23 11:25:18.514: [ AGFW][41] {0:2:16224} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.crm01.vip 1 1] I
D 4098:553866 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:25:19.724: [ AGFW][41] {0:2:16224} Received the reply to the message: RESOURCE_START[ora.crm01.vip 1 1] ID 4098:
563912 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:25:19.725: [ AGFW][41] {0:2:16224} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.crm0
1.vip 1 1] ID 4098:553866
2013-05-23 11:25:19.727: [ AGFW][41] {0:2:16224} Received the reply to the message: RESOURCE_START[ora.crm01.vip 1 1] ID 4098:
563912 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:25:19.728: [ AGFW][41] {0:2:16224} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.ora
qdcs01.vip 1 1] ID 4098:553866
2013-05-23 11:25:19.754: [ AGFW][41] {0:2:16224} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:553874
2013-05-23 11:25:19.755: [ AGFW][41] {0:2:16224} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:553874
2013-05-23 11:25:20.362: [UiServer][48] CS(60000000029217f0)set Properties ( grid,60000000027d3290)
2013-05-23 11:25:20.382: [UiServer][47] {2:55972:4308} Sending message to PE. ctx= 60000000026ff4c0, Client PID: 27951
2013-05-23 11:25:20.397: [UiServer][47] {2:55972:4308} Done for ctx=60000000026ff4c0
2013-05-23 11:26:20.372: [UiServer][48] CS(60000000029217f0)set Properties ( grid,60000000027d3290)
2013-05-23 11:26:20.391: [UiServer][47] {2:55972:4309} Sending message to PE. ctx= 60000000026ff5b0, Client PID: 27951
(3) Restore node 1's public NICs (UP)
The OS log shows the link coming back up at 11:48:45:
May 23 11:48:45 crm01 cmnetd[4301]: lan900 is up at the data link layer.
May 23 11:48:45 crm01 cmnetd[4301]: lan900 recovered.
May 23 11:48:45 crm01 cmnetd[4301]: Subnet 192.168.1.96 up
May 23 11:49:19 crm01 cmviputil[10929]: Issuing cmmodnet to add 192.168.1.101 address in 192.168.1.96
May 23 11:49:19 crm01 cmviputil[10929]:
May 23 11:49:19 crm01 syslog: /usr/sbin/cmmodnet -a -i 192.168.1.101 192.168.1.96
May 23 11:49:49 crm01 cmnetd[4301]: Link level address on network interface lan900 has been changed from 0x00237d3e3146 to 0x0023
7d3e3303.
May 23 11:53:05 crm01 xntpd[3128]: synchronized to 10.70.213.132, stratum=2
Pinging node 1's VIP 192.168.1.101: 8 packets lost.
Node 1 crsd log:
2013-05-23 11:48:39.519: [ CRSRPT][51] {0:2:16228} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.ons
2013-05-23 11:48:39.520: [ CRSPE][50] {0:2:16228} PE Command [ Resource State Change (ora.net1.network crm01 1) : 6000000002b63
630 ] has completed
2013-05-23 11:48:39.521: [ AGFW][45] {0:2:16228} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:554204
2013-05-23 11:48:39.521: [ AGFW][45] {0:2:16228} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:554204
2013-05-23 11:48:39.521: [ AGFW][45] {0:2:16228} Agfw received reply from PE for resource state change for ora.net1.network oraqdc
s01 1
2013-05-23 11:49:18.375: [ CRSPE][50] {1:41988:45880} Received reply to action [Stop] message ID: 554197
2013-05-23 11:49:18.375: [ CRSPE][50] {1:41988:45880} RI [ora.crm01.vip 1 1] new internal state: [STABLE] old value: [STOPPING]
2013-05-23 11:49:18.375: [ CRSPE][50] {1:41988:45880} RI [ora.crm01.vip 1 1] new external state [OFFLINE] old value: [INTERMEDI
ATE] label = []
2013-05-23 11:49:18.375: [ CRSPE][50] {1:41988:45880} Set State Details to [] from [ FAILED OVER] for [ora.crm01.vip 1 1]
2013-05-23 11:49:18.376: [ CRSPE][50] {1:41988:45880} CRS-2677: Stop of 'ora.crm01.vip' on 'crm02' succeeded
2013-05-23 11:49:18.377: [ CRSRPT][51] {1:41988:45880} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.crm01.vip
2013-05-23 11:49:18.380: [ CRSPE][50] {1:41988:45880} RI [ora.crm01.vip 1 1] new internal state: [STARTING] old value: [STABLE]
2013-05-23 11:49:18.380: [UiServer][51] {1:41988:45880} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2677: Stop of 'ora.crm01.vip' on 'crm02' succeeded]
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.crm01.vip]
WAIT:
TextMessage[0]
]
2013-05-23 11:49:18.381: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
2013-05-23 11:49:18.381: [UiServer][51] {1:41988:45880} Communication exception sending reply back to client.FatalCommsException : F
ailed to send response to client.
(File: clsMessageStream.cpp, line: 275
2013-05-23 11:49:18.381: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
2013-05-23 11:49:18.381: [UiServer][51] {1:41988:45880} Communication exception sending reply back to client.FatalCommsException : F
ailed to send response to client.
(File: clsMessageStream.cpp, line: 275
2013-05-23 11:49:18.381: [ CRSPE][50] {1:41988:45880} Sending message to agfw: id = 554213
2013-05-23 11:49:18.382: [ AGFW][45] {1:41988:45880} Agfw Proxy Server received the message: RESOURCE_START[ora.crm01.vip 1 1]
ID 4098:554213
2013-05-23 11:49:18.382: [ CRSPE][50] {1:41988:45880} CRS-2672: Attempting to start 'ora.crm01.vip' on 'crm01'
2013-05-23 11:49:18.382: [ AGFW][45] {1:41988:45880} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.crm01.vip 1
1] ID 4098:554213 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:49:18.385: [UiServer][51] {1:41988:45880} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2672: Attempting to start 'ora.crm01.vip' on 'crm01']
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.crm01.vip]
WAIT:
TextMessage[0]
]
2013-05-23 11:49:18.385: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
2013-05-23 11:49:18.386: [UiServer][51] {1:41988:45880} Communication exception sending reply back to client.FatalCommsException : F
ailed to send response to client.
(File: clsMessageStream.cpp, line: 275
2013-05-23 11:49:19.646: [ AGFW][45] {1:41988:45880} Received the reply to the message: RESOURCE_START[ora.crm01.vip 1 1] ID 4
098:554214 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:49:19.647: [ AGFW][45] {1:41988:45880} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.oraq
dcs01.vip 1 1] ID 4098:554213
2013-05-23 11:49:19.647: [ CRSPE][50] {1:41988:45880} Received reply to action [Start] message ID: 554213
2013-05-23 11:49:19.649: [ AGFW][45] {1:41988:45880} Received the reply to the message: RESOURCE_START[ora.crm01.vip 1 1] ID 4
098:554214 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:49:19.650: [ AGFW][45] {1:41988:45880} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[or
a.crm01.vip 1 1] ID 4098:554213
2013-05-23 11:49:19.650: [ CRSPE][50] {1:41988:45880} Received reply to action [Start] message ID: 554213
2013-05-23 11:49:19.650: [ CRSPE][50] {1:41988:45880} RI [ora.crm01.vip 1 1] new internal state: [STABLE] old value: [STARTING]
2013-05-23 11:49:19.650: [ CRSPE][50] {1:41988:45880} RI [ora.crm01.vip 1 1] new external state [ONLINE] old value: [OFFLINE] o
n crm01 label = []
2013-05-23 11:49:19.650: [ CRSPE][50] {1:41988:45880} Set LAST_SERVER to crm01 for [ora.crm01.vip 1 1]
2013-05-23 11:49:19.651: [ CRSPE][50] {1:41988:45880} CRS-2676: Start of 'ora.crm01.vip' on 'crm01' succeeded
2013-05-23 11:49:19.653: [ CRSOCR][47] {1:41988:45880} Multi Write Batch processing...
2013-05-23 11:49:19.654: [ CRSRPT][51] {1:41988:45880} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.crm01.vip
2013-05-23 11:49:19.654: [UiServer][51] {1:41988:45880} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2676: Start of 'ora.crm01.vip' on 'crm01' succeeded]
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.crm01.vip]
WAIT:
TextMessage[0]
]
2013-05-23 11:49:19.654: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
......
......
2013-05-23 11:49:21.557: [ AGFW][45] {1:41988:45880} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LIS
TENER.lsnr crm01 1] ID 4098:554219
2013-05-23 11:49:21.558: [ CRSPE][50] {1:41988:45880} Received reply to action [Start] message ID: 554219
2013-05-23 11:49:21.667: [ AGFW][45] {1:41988:45880} Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr crm01
1] ID 4098:554220 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 11:49:21.668: [ AGFW][45] {1:41988:45880} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[or
a.LISTENER.lsnr crm01 1] ID 4098:554219
2013-05-23 11:49:21.668: [ CRSPE][50] {1:41988:45880} Received reply to action [Start] message ID: 554219
2013-05-23 11:49:21.668: [ CRSPE][50] {1:41988:45880} RI [ora.LISTENER.lsnr crm01 1] new internal state: [STABLE] old value: [S
TARTING]
2013-05-23 11:49:21.668: [ CRSPE][50] {1:41988:45880} RI [ora.LISTENER.lsnr crm01 1] new external state [ONLINE] old value: [OF
FLINE] on crm01 label = []
2013-05-23 11:49:21.669: [ CRSPE][50] {1:41988:45880} CRS-2676: Start of 'ora.LISTENER.lsnr' on 'crm01' succeeded
2013-05-23 11:49:21.671: [ CRSRPT][51] {1:41988:45880} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.LISTENER.lsnr
2013-05-23 11:49:21.673: [ CRSPE][50] {1:41988:45880} PE Command [ Relocate Resource : 6000000001d2c9d0 ] has completed
2013-05-23 11:49:21.673: [UiServer][51] {1:41988:45880} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2676: Start of 'ora.LISTENER.lsnr' on 'crm01' succeeded]
MSGTYPE:
TextMessage[3]
OBJID:
TextMessage[ora.crm01.vip]
WAIT:
TextMessage[0]
]
2013-05-23 11:49:21.673: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
2013-05-23 11:49:21.674: [UiServer][51] {1:41988:45880} Communication exception sending reply back to client.FatalCommsException : F
ailed to send response to client.
(File: clsMessageStream.cpp, line: 275
2013-05-23 11:49:21.674: [ CRSPE][50] {1:41988:45880} UI Command [Relocate Resource : 6000000001d2c9d0] is replying to sender.
2013-05-23 11:49:21.674: [ AGFW][45] {1:41988:45880} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:554249
2013-05-23 11:49:21.674: [ AGFW][45] {1:41988:45880} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:554
249
2013-05-23 11:49:21.676: [UiServer][51] {1:41988:45880} Container [ Name: UI_DATA
ora.crm01.vip:
TextMessage[0]
]
2013-05-23 11:49:21.676: [UiServer][51] {1:41988:45880} CS(6000000001aab810)No connection to client.6
Node 2 crsd log:
2013-05-23 11:48:37.842: [ AGFW][41] {1:41988:45880} Agfw Proxy Server received the message: RESOURCE_STOP[ora.crm01.vip 1 1]
ID 4099:554197
2013-05-23 11:48:37.843: [ AGFW][41] {1:41988:45880} Agfw Proxy Server forwarding the message: RESOURCE_STOP[ora.crm01.vip 1 1
] ID 4099:554197 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:48:37.968: [ AGFW][41] {1:41988:45880} Received the reply to the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID 40
99:564355 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:48:37.969: [ AGFW][41] {1:41988:45880} Agfw Proxy Server sending the reply to PE for message:RESOURCE_STOP[ora.oraqd
cs01.vip 1 1] ID 4099:554197
2013-05-23 11:49:18.369: [ AGFW][41] {1:41988:45880} Received the reply to the message: RESOURCE_STOP[ora.crm01.vip 1 1] ID 40
99:564355 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 11:49:18.370: [ AGFW][41] {1:41988:45880} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_STOP[ora
.crm01.vip 1 1] ID 4099:554197
2013-05-23 11:49:20.584: [UiServer][48] CS(60000000029217f0)set Properties ( grid,60000000027d3290)
2013-05-23 11:49:20.603: [UiServer][47] {2:55972:4332} Sending message to PE. ctx= 6000000001233b20, Client PID: 27951
2013-05-23 11:49:20.619: [UiServer][47] {2:55972:4332} Done for ctx=6000000001233b20
2013-05-23 11:49:21.677: [ AGFW][41] {1:41988:45880} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:554250
2013-05-23 11:49:21.677: [ AGFW][41] {1:41988:45880} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:554
250
After the cable is plugged back in, the cluster detects that the fault is cleared, automatically restarts the listener and ONS, and the VIP fails back to node 1 automatically.
Test result summary
1. The public network is configured with a redundancy policy, so a single cable failure has no availability impact on the cluster or the database.
2. After all public network cables (both of them) are pulled, CRS detects the network failure and attempts to recover it, then fails the VIP over to a node that can still serve external clients.
Once the public network cables are reconnected, CRS detects it immediately and tries to restart the listener and ONS; the VIP switches back to the original node automatically (a verification sketch follows).
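The failback can be confirmed from either node (a sketch; crsctl status resource accepts a list of resource names):

$ crsctl stat res ora.crm01.vip ora.LISTENER.lsnr ora.ons -t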
Interconnect NIC failure simulation test
Simulation steps
On one RAC node, remove the cable from the node's private NIC to simulate a private network outage.
Expected results
GI should detect the cluster split; the node and its database instance go down, and the node is evicted from the cluster.
In a two-node RAC cluster, the node with the lowest instance number survives.
In a multi-node RAC cluster, the largest sub-cluster survives.
If the sub-clusters are of equal size, the sub-cluster containing the lowest instance number survives.
Before 11.2.0.2, Oracle simply reboots the evicted node.
From 11.2.0.2 onward, Oracle uses the "reboot-less restart" feature: on the evicted node, Clusterware attempts a graceful shutdown. The process is as follows:
All I/O-capable client processes are terminated and all resources are cleaned up. If this does not complete cleanly, the evicted node is rebooted; otherwise it is not.
If the cleanup completes cleanly and the private network is restored, the OHASD process attempts to restart Clusterware and RAC on that node.
So in 11.2.0.2 and later, an evicted node may not reboot at all.
Check the following logs for the detailed sequence:
$/u03/oracle/app/11.2.0/grid/log/orajdcpt*/alertorajdcpt*.log
$/u03/oracle/app/11.2.0/grid/log/orajdcpt*/cssd/ocssd.log
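To pull the eviction and cleanup markers out of those logs quickly, something like the following helps (a sketch; the CRS message numbers are the ones that appear in the alert log excerpts below):

$ grep -E 'CRS-16(07|08|09|52|54)' /u03/oracle/app/11.2.0/grid/log/*/alert*.log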
Test procedure record
These CSS settings can be changed online without taking any node down.
1) Execute crsctl as root to confirm the settings:
/u03/oracle/app/11.2.0/grid/bin/crsctl get css misscount
600
/u03/oracle/app/11.2.0/grid/bin/crsctl get css reboottime
3
/u03/oracle/app/11.2.0/grid/bin/crsctl get css disktimeout
597
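These values explain the eviction timeline recorded below: with misscount = 600 seconds, CSS warns at 50%, 75%, and 90% of the timeout, i.e. removal countdowns of roughly 300 s (CRS-1612), 150 s (CRS-1611), and 60 s (CRS-1610), matching the 299.997 / 149.617 / 59.986-second figures in the node 1 alert log.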
Downing node 2's private NICs:
(1) Down one of node 2's private NICs
Check the OS log:
vi /var/adm/syslog/syslog.log
May 23 16:02:31 crm02 syslog: /usr/sbin/cmmodnet -a -i 192.168.1.104 192.168.1.96
May 23 16:14:02 crm02 cmnetd[7016]: Link level address on network interface lan901 has been changed from 0x00237d3e3306 to 0x00237d3e3301.
GI status is unaffected.
(2) Down both of node 2's private NICs
The OS log shows the private network failing at 16:23:21:
vi /var/adm/syslog/syslog.log
May 23 16:23:21 crm02 cmnetd[7016]: lan901 is down at the data link layer.
May 23 16:23:21 crm02 cmnetd[7016]: lan901 failed.
May 23 16:23:21 crm02 cmnetd[7016]: Subnet 192.168.1.128 down
After both of node 2's private network links went down, the cluster suffered a split-brain and the database became unusable. Five minutes later, the ocssd log reported that the faulty node would be evicted from the cluster after another 300 seconds, at which point the healthy node resumed service. After those 300 seconds, the faulty node (node 2) was evicted from the cluster and entered the cleanup and graceful-shutdown phase.
As described above, if the cluster can complete the cleanup, the ohasd process keeps polling for interconnect recovery; once the private network is back, it automatically starts all the other components, rejoins the cluster, and recovers the database.
If the cleanup cannot complete, the node server is rebooted.
The ocssd, crsd, and alert logs are shown below:
Node 1 alert log:
2013-05-23 16:28:23.029
[cssd(5898)]CRS-1612:Network communication with node crm02 (2) missing for 50% of timeout interval. Removal of this node from cluster in 299.997 seconds
2013-05-23 16:29:02.450
[cssd(5898)]CRS-1663:Member kill issued by PID 20001 for 1 members, group DBoraCRM. Details at (:CSSGM00044:) in /u03/oracle/app/11.2.0/grid/log/crm01/cssd/ocssd.log.
2013-05-23 16:30:53.408
[cssd(5898)]CRS-1611:Network communication with node crm02 (2) missing for 75% of timeout interval. Removal of this node from cluster in 149.617 seconds
2013-05-23 16:32:00.530
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5874)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:32:00.624
[ohasd(5846)]CRS-2765:Resource 'ora.asm' has failed on server 'crm01'.
2013-05-23 16:32:00.647
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5874)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:32:00.711
[crsd(20117)]CRS-2765:Resource 'ora.asm' has failed on server 'crm01'.
2013-05-23 16:32:00.769
[crsd(20117)]CRS-2765:Resource 'ora.CRM1GIDG.dg' has failed on server 'crm01'.
2013-05-23 16:32:23.038
[cssd(5898)]CRS-1610:Network communication with node crm02 (2) missing for 90% of timeout interval. Removal of this node from cluster in 59.986 seconds
2013-05-23 16:33:23.028
[cssd(5898)]CRS-1607:Node crm02 is being evicted in cluster incarnation 261665312; details at (:CSSNM00007:) in /u03/oracle/app/11.2.0/grid/log/crm01/cssd/ocssd.log.
2013-05-23 16:34:01.325
[cssd(5898)]CRS-1625:Node crm02, number 2, was manually shut down
2013-05-23 16:34:01.958
[cssd(5898)]CRS-1601:CSSD Reconfiguration complete. Active nodes are crm01 .
2013-05-23 16:34:01.965
[crsd(20117)]CRS-5504:Node down event reported for node 'crm02'.
2013-05-23 16:34:01.991
[crsd(20117)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log.
2013-05-23 16:34:01.991
[crsd(20117)]CRS-1006:The OCR location is inaccessible. Details in /u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log.
2013-05-23 16:34:03.345
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(20148)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:3:14} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 16:34:03.346
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(20143)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:2:38} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2013-05-23 16:34:03.347
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(20138)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:12} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-05-23 16:34:03.347
[/u03/oracle/app/11.2.0/grid/bin/scriptagent.bin(20152)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:4:4} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2013-05-23 16:34:03.360
[ohasd(5846)]CRS-2765:Resource 'ora.crsd' has failed on server 'crm01'.
2013-05-23 16:34:19.853
[crsd(21467)]CRS-1012:The OCR service started on node crm01.
2013-05-23 16:34:20.344
[crsd(21467)]CRS-1201:CRSD started on node crm01.
Node 2 alert log:
2013-05-23 16:28:20.497
[cssd(5537)]CRS-1612:Network communication with node crm01 (1) missing for 50% of timeout interval. Removal of this node from cluster in 299.695 seconds
2013-05-23 16:28:50.833
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5768)]CRS-5011:Check of resource "oracrm" failed: details at "(:CLSN00007:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2013-05-23 16:30:50.877
[cssd(5537)]CRS-1611:Network communication with node crm01 (1) missing for 75% of timeout interval. Removal of this node from cluster in 149.314 seconds
2013-05-23 16:31:38.210
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(5543)]CRS-5818:Aborted command 'check' for resource 'ora.crsd'. Details at (:CRSAGF00113:) {0:0:47160} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-05-23 16:32:20.487
[cssd(5537)]CRS-1610:Network communication with node crm01 (1) missing for 90% of timeout interval. Removal of this node from cluster in 59.704 seconds
2013-05-23 16:33:20.198
[cssd(5537)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:33:20.199
[cssd(5537)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log
2013-05-23 16:33:20.284
[cssd(5537)]CRS-1652:Starting clean up of CRSD resources.
2013-05-23 16:33:21.817
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5670)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:33:22.347
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5670)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:33:23.266
[cssd(5537)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:33:23.267
[cssd(5537)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:33:23.268
[cssd(5537)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:33:45.550
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3916)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:33:45.589
[ohasd(27763)]CRS-2765:Resource 'ora.asm' has failed on server 'crm02'.
2013-05-23 16:33:45.609
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3916)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:34:00.857
[cssd(5537)]CRS-1654:Clean up of CRSD resources finished successfully.
2013-05-23 16:34:00.858
[cssd(5537)]CRS-1655:CSSD on node crm02 detected a problem and started to shutdown.
2013-05-23 16:34:00.866
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5670)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:9} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 16:34:00.874
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(5680)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:8} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-05-23 16:34:00.927
[ohasd(27763)]CRS-2765:Resource 'ora.crsd' has failed on server 'crm02'.
2013-05-23 16:34:00.976
[cssd(5537)]CRS-1660:The CSS daemon shutdown has completed
2013-05-23 16:34:01.011
[ohasd(27763)]CRS-2765:Resource 'ora.ctssd' has failed on server 'crm02'.
2013-05-23 16:34:01.942
[ohasd(27763)]CRS-2765:Resource 'ora.evmd' has failed on server 'crm02'.
2013-05-23 16:34:02.117
[ctssd(7243)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 16:34:06.246
[ctssd(7251)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 16:34:10.377
[ctssd(7255)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 16:34:16.822
[ohasd(27763)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'crm02'.
2013-05-23 16:34:16.848
[ohasd(27763)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'crm02'.
2013-05-23 16:34:17.343
[ohasd(27763)]CRS-2765:Resource 'ora.cssd' has failed on server 'crm02'.
2013-05-23 16:34:20.627
[cssd(7274)]CRS-1713:CSSD daemon is started in clustered mode
2013-05-23 16:34:37.020
[cssd(7274)]CRS-1707:Lease acquisition for node crm02 number 2 completed
2013-05-23 16:34:49.438
[cssd(7274)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk1; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:34:49.472
[cssd(7274)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk2; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 16:34:49.504
[cssd(7274)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk3; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
Node 1 ocssd log:
2013-05-23 16:28:23.030: [ CSSD][37]clssnmPollingThread: node crm02 (2) at 50% heartbeat fatal, removal in 299.997 seconds
2013-05-23 16:28:23.030: [ CSSD][37]clssnmPollingThread: node crm02 (2) is impending reconfig, flag 2294796, misstime 300003
2013-05-23 16:28:23.030: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177587, LATS 3561217541, lastSeqNo 7172837, uniqueness 1369296047, timestamp 1369297702/3559378581
2013-05-23 16:28:23.051: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561217562/1369297703
2013-05-23 16:28:23.051: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561217562/1369297703
2013-05-23 16:28:23.051: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561217562/1369297703
2013-05-23 16:28:23.051: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177589, LATS 3561217562, lastSeqNo 7172818, uniqueness 1369296047, timestamp 1369297703/3559379050
2013-05-23 16:28:23.140: [ CSSD][38]clssnmSendingThread: sending status msg to all nodes
2013-05-23 16:28:23.140: [ CSSD][38]clssnmSendingThread: sent 4 status msgs to all nodes
2013-05-23 16:28:23.559: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218070/1369297703
2013-05-23 16:28:23.559: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177592, LATS 3561218070, lastSeqNo 7177589, uniqueness 1369296047, timestamp 1369297703/3559379560
2013-05-23 16:28:23.559: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218070/1369297703
2013-05-23 16:28:23.560: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218070/1369297703
2013-05-23 16:28:24.059: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218570/1369297704
2013-05-23 16:28:24.060: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218570/1369297704
2013-05-23 16:28:24.060: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177594, LATS 3561218570, lastSeqNo 7172744, uniqueness 1369296047, timestamp 1369297704/3559380040
2013-05-23 16:28:24.060: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561218570/1369297704
2013-05-23 16:28:24.569: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219080/1369297704
2013-05-23 16:28:24.569: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219080/1369297704
2013-05-23 16:28:24.570: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219080/1369297704
2013-05-23 16:28:24.599: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177596, LATS 3561219110, lastSeqNo 7177587, uniqueness 1369296047, timestamp 1369297704/3559380121
2013-05-23 16:28:24.599: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177598, LATS 3561219110, lastSeqNo 7177592, uniqueness 1369296047, timestamp 1369297704/3559380550
2013-05-23 16:28:25.079: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219590/1369297705
2013-05-23 16:28:25.079: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219590/1369297705
2013-05-23 16:28:25.080: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561219590/1369297705
2013-05-23 16:28:25.100: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177600, LATS 3561219610, lastSeqNo 7177594, uniqueness 1369296047, timestamp 1369297704/3559380990
2013-05-23 16:28:25.569: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220080/1369297705
2013-05-23 16:28:25.569: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220080/1369297705
2013-05-23 16:28:25.570: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220080/1369297705
2013-05-23 16:28:25.619: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177602, LATS 3561220130, lastSeqNo 7177596, uniqueness 1369296047, timestamp 1369297705/3559381151
2013-05-23 16:28:25.619: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177604, LATS 3561220130, lastSeqNo 7177598, uniqueness 1369296047, timestamp 1369297705/3559381500
2013-05-23 16:28:26.079: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220590/1369297706
2013-05-23 16:28:26.079: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220590/1369297706
2013-05-23 16:28:26.080: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561220590/1369297706
2013-05-23 16:28:26.120: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177607, LATS 3561220630, lastSeqNo 7177600, uniqueness 1369296047, timestamp 1369297705/3559381860
2013-05-23 16:28:26.549: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221060/1369297706
2013-05-23 16:28:26.549: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221060/1369297706
2013-05-23 16:28:26.550: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221060/1369297706
2013-05-23 16:28:26.649: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177608, LATS 3561221160, lastSeqNo 7177602, uniqueness 1369296047, timestamp 1369297706/3559382180
2013-05-23 16:28:26.649: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177610, LATS 3561221160, lastSeqNo 7177604, uniqueness 1369296047, timestamp 1369297706/3559382370
2013-05-23 16:28:27.059: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221570/1369297707
2013-05-23 16:28:27.059: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221570/1369297707
2013-05-23 16:28:27.060: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221570/1369297707
2013-05-23 16:28:27.150: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177612, LATS 3561221660, lastSeqNo 7177607, uniqueness 1369296047, timestamp 1369297706/3559382701
2013-05-23 16:28:27.171: [ CSSD][38]clssnmSendingThread: sending status msg to all nodes
2013-05-23 16:28:27.171: [ CSSD][38]clssnmSendingThread: sent 4 status msgs to all nodes
2013-05-23 16:28:27.479: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221990/1369297707
2013-05-23 16:28:27.479: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221990/1369297707
2013-05-23 16:28:27.480: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561221990/1369297707
2013-05-23 16:28:27.689: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177614, LATS 3561222200, lastSeqNo 7177608, uniqueness 1369296047, timestamp 1369297707/3559383190
2013-05-23 16:28:27.689: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177616, LATS 3561222200, lastSeqNo 7177610, uniqueness 1369296047, timestamp 1369297707/3559383210
2013-05-23 16:28:27.989: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222500/1369297707
2013-05-23 16:28:27.989: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222500/1369297707
2013-05-23 16:28:27.990: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222500/1369297707
2013-05-23 16:28:28.189: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177619, LATS 3561222700, lastSeqNo 7177612, uniqueness 1369296047, timestamp 1369297707/3559383710
2013-05-23 16:28:28.201: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177620, LATS 3561222712, lastSeqNo 7177614, uniqueness 1369296047, timestamp 1369297708/3559384200
2013-05-23 16:28:28.300: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222810/1369297708
2013-05-23 16:28:28.309: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222820/1369297708
2013-05-23 16:28:28.309: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561222820/1369297708
2013-05-23 16:28:28.709: [ CSSD][27]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177622, LATS 3561223220, lastSeqNo 7177616, uniqueness 1369296047, timestamp 1369297708/3559384220
2013-05-23 16:28:28.709: [ CSSD][30]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177623, LATS 3561223220, lastSeqNo 7177620, uniqueness 1369296047, timestamp 1369297708/3559384710
2013-05-23 16:28:28.810: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3561223320/1369297708
2013-05-23 16:28:28.819: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3561223330/1369297708
2013-05-23 16:28:28.819: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3561223330/1369297708
2013-05-23 16:28:29.249: [ CSSD][33]clssnmvDHBValidateNcopy: node 2, crm02, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7177628, LATS 3561223760, lastSeqNo 7177619, uniqueness 1369296047, timestamp 1369297709/3559385230
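A note on the long countdown in these entries: removal is announced roughly 300 s out at the 50% mark, which implies a CSS misscount of about 600 s on this cluster (the usual value when vendor clusterware such as Serviceguard is present, rather than the 30 s Linux default). The effective timeouts can be read directly (a sketch; same GRID_HOME paths as used throughout this test):
# network heartbeat timeout (seconds)
/u03/oracle/app/11.2.0/grid/bin/crsctl get css misscount
# disk heartbeat timeout, for comparison
/u03/oracle/app/11.2.0/grid/bin/crsctl get css disktimeout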
Node 2 ocssd log:
2013-05-23 16:28:20.498: [ CSSD][37]clssnmPollingThread: node crm01 (1) at 50% heartbeat fatal, removal in 299.695 seconds
2013-05-23 16:28:20.498: [ CSSD][37]clssnmPollingThread: node crm01 (1) is impending reconfig, flag 2491406, misstime 300305
2013-05-23 16:28:20.499: [ CSSD][27]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195497, LATS 3559376501, lastSeqNo 7190683, uniqueness 1366965579, timestamp 1369297700/3561214780
2013-05-23 16:28:20.499: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195499, LATS 3559376501, lastSeqNo 7190930, uniqueness 1366965579, timestamp 1369297700/3561214800
2013-05-23 16:28:20.518: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559376521/1369297700
2013-05-23 16:28:20.518: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559376521/1369297700
2013-05-23 16:28:20.519: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559376521/1369297700
2013-05-23 16:28:21.027: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377030/1369297701
2013-05-23 16:28:21.027: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377030/1369297701
2013-05-23 16:28:21.028: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377031/1369297701
2013-05-23 16:28:21.528: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377530/1369297701
2013-05-23 16:28:21.528: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377530/1369297701
2013-05-23 16:28:21.528: [ CSSD][27]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195500, LATS 3559377530, lastSeqNo 7195497, uniqueness 1366965579, timestamp 1369297701/3561215790
2013-05-23 16:28:21.528: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195502, LATS 3559377530, lastSeqNo 7195499, uniqueness 1366965579, timestamp 1369297701/3561215810
2013-05-23 16:28:21.548: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559377551/1369297701
2013-05-23 16:28:22.037: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378040/1369297702
2013-05-23 16:28:22.037: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378040/1369297702
2013-05-23 16:28:22.068: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378071/1369297702
2013-05-23 16:28:22.547: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378550/1369297702
2013-05-23 16:28:22.548: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378550/1369297702
2013-05-23 16:28:22.558: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195504, LATS 3559378560, lastSeqNo 7190931, uniqueness 1366965579, timestamp 1369297702/3561216800
2013-05-23 16:28:22.558: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195505, LATS 3559378560, lastSeqNo 7195502, uniqueness 1366965579, timestamp 1369297702/3561216820
2013-05-23 16:28:22.578: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559378581/1369297702
2013-05-23 16:28:23.047: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379050/1369297703
2013-05-23 16:28:23.047: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379050/1369297703
2013-05-23 16:28:23.087: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195508, LATS 3559379090, lastSeqNo 7195504, uniqueness 1366965579, timestamp 1369297703/3561217562
2013-05-23 16:28:23.088: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379091/1369297703
2013-05-23 16:28:23.227: [ CSSD][38]clssnmSendingThread: sending status msg to all nodes
2013-05-23 16:28:23.227: [ CSSD][38]clssnmSendingThread: sent 4 status msgs to all nodes
2013-05-23 16:28:23.557: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379560/1369297703
2013-05-23 16:28:23.557: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379560/1369297703
2013-05-23 16:28:23.588: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195510, LATS 3559379590, lastSeqNo 7195505, uniqueness 1366965579, timestamp 1369297703/3561218070
2013-05-23 16:28:23.608: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559379611/1369297703
2013-05-23 16:28:23.968: [GIPCHALO][7] gipchaLowerProcessNode: no valid interfaces found to node for 300939 ms, node 60000000019c838
0 { host 'crm01', haName 'CSS_crm-cluster', srcLuid 9fbcd492-a378553f, dstLuid 2d2ace2c-3b06ccaf numInf 0, contigSeq 2185, las
tAck 2198, lastValidAck 2184, sendSeq [2199 : 2942], createTime 3557752586, sentRegister 1, localMonitor 1, flags 0x2408 }
2013-05-23 16:28:24.037: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380040/1369297704
2013-05-23 16:28:24.037: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380040/1369297704
2013-05-23 16:28:24.117: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195513, LATS 3559380120, lastSeqNo 7195508, uniqueness 1366965579, timestamp 1369297704/3561218570
2013-05-23 16:28:24.118: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380121/1369297704
2013-05-23 16:28:24.547: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380550/1369297704
2013-05-23 16:28:24.547: [ CSSD][25]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380550/1369297704
2013-05-23 16:28:24.618: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261
665312, wrtcnt, 7195516, LATS 3559380620, lastSeqNo 7195510, uniqueness 1366965579, timestamp 1369297704/3561219080
2013-05-23 16:28:24.638: [ CSSD][28]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380641/1369297704
2013-05-23 16:28:24.987: [ CSSD][31]clssnmvDiskPing: Writing with status 0x3, timestamp 3559380990/1369297704
Node 1 crsd log:
Node 2 crsd log:
2013-05-23 16:28:50.842: [ AGFW][41] {0:5:61} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:4473
2013-05-23 16:28:50.842: [ AGFW][41] {0:5:61} Received state change for ora.oracrm.db 2 1 [old state = ONLINE, new state = FAILED]
2013-05-23 16:28:50.842: [ AGFW][41] {0:5:61} Received state LABEL change for ora.oracrm.db 2 1 [old label = Open, new label = Ab
normal Termination]
2013-05-23 16:28:50.843: [ AGFW][41] {0:5:61} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|
Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:692:Ver:2]
2013-05-23 16:28:50.843: [ AGFW][41] {0:5:61} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:4473
2013-05-23 16:28:50.844: [ AGFW][41] {2:15516:2} Agfw Proxy Server received the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:4
474
2013-05-23 16:28:50.844: [ AGFW][41] {2:15516:2} Shutdown request received from /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 16:28:50.844: [ AGFW][41] {2:15516:2} Agfw Proxy Server replying to the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 2048
6:4474
2013-05-23 16:28:50.846: [GIPCHALO][10] gipchaLowerProcessNode: no valid interfaces found to node for 437338 ms, node 60000000018bb9
80 { host 'crm01', haName '4e8b-e8b7-afb4-d585', srcLuid 6f4332f6-9d3c9bbe, dstLuid 0ff67092-226bdc1f numInf 1, contigSeq 204, la
stAck 152, lastValidAck 203, sendSeq [153 : 153], createTime 3557774841, sentRegister 1, localMonitor 0, flags 0x8 }
2013-05-23 16:28:50.956: [ CRSCOMM][38][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
2013-05-23 16:28:50.956: [ CRSCOMM][38] Ipc: Client disconnected.
2013-05-23 16:28:50.956: [ CRSCOMM][38][FFAIL] IpcL: Listener got clsc error 11 for memNum. 5
2013-05-23 16:28:50.956: [ CRSCOMM][38] IpcL: connection to member 5 has been removed
2013-05-23 16:28:50.956: [CLSFRAME][38] Removing IPC Member:{Relative|Node:0|Process:5|Type:3}
2013-05-23 16:28:50.956: [CLSFRAME][38] Disconnected from AGENT process: {Relative|Node:0|Process:5|Type:3}
2013-05-23 16:28:50.962: [ AGFW][41] {2:15516:122} Agfw Proxy Server received process disconnected notification, count=1
2013-05-23 16:28:50.962: [ AGFW][41] {2:15516:122} /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle disconnected.
2013-05-23 16:28:50.963: [ AGFW][41] {2:15516:122} Agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle[5768] stopped!
2013-05-23 16:28:50.963: [ CRSCOMM][41] {2:15516:122} IpcL: removeConnection: Member 5 does not exist.
2013-05-23 16:28:55.858: [GIPCHALO][10] gipchaLowerProcessNode: no valid interfaces found to node for 442350 ms, node 60000000018bb9
80 { host 'crm01', haName '4e8b-e8b7-afb4-d585', srcLuid 6f4332f6-9d3c9bbe, dstLuid 0ff67092-226bdc1f numInf 1, contigSeq 204, la
stAck 152, lastValidAck 203, sendSeq [153 : 153], createTime 3557774841, sentRegister 1, localMonitor 0, flags 0x8 }
2013-05-23 16:29:00.878: [GIPCHALO][10] gipchaLowerProcessNode: no valid interfaces found to node for 447370 ms, node 60000000018bb9
80 { host 'crm01', haName '4e8b-e8b7-afb4-d585', srcLuid 6f4332f6-9d3c9bbe, dstLuid 0ff67092-226bdc1f numInf 1, contigSeq 204, la
stAck 152, lastValidAck 203, sendSeq [153 : 153], createTime 3557774841, sentRegister 1, localMonitor 0, flags 0x8 }
2013-05-23 16:29:05.898: [GIPCHALO][10] gipchaLowerProcessNode: no valid interfaces found to node for 452390 ms, node 60000000018bb9
80 { host 'crm01', haName '4e8b-e8b7-afb4-d585', srcLuid 6f4332f6-9d3c9bbe, dstLuid 0ff67092-226bdc1f numInf 1, contigSeq 204, la
stAck 152, lastValidAck 203, sendSeq [153 : 153], createTime 3557774841, sentRegister 1, localMonitor 0, flags 0x8 }
2013-05-23 16:29:10.918: [GIPCHALO][10] gipchaLowerProcessNode: no valid interfaces found to node for 457410 ms, node 60000000018bb9
80 { host 'crm01', haName '4e8b-e8b7-afb4-d585', srcLuid 6f4332f6-9d3c9bbe, dstLuid 0ff67092-226bdc1f numInf 1, contigSeq 204, la
stAck 152, lastValidAck 203, sendSeq [153 : 153], createTime 3557774841, sentRegister 1, localMonitor 0, flags 0x8 }
Node 1 GI status:
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
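This failure is expected while CRSD is down: the plain resource query talks to CRSD, whereas the -init form queries the OHASD-managed lower stack, which is still up. Side by side (annotated sketch):
# full resource tree -- needs a running CRSD, hence the CRS-4535 above
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
# core daemons managed by OHASD -- still answers while CRSD is down
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init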
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 2 GI status:
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE
ora.cssdmonitor
1 ONLINE OFFLINE
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE crm02
ora.gpnpd
1 ONLINE ONLINE crm02
ora.mdnsd
1 ONLINE ONLINE crm02
(3). Restore node 2's two private network interfaces (UP)
After the interconnect links were restored by hand, the ohasd process brought the cluster stack back up and restarted the database processes.
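In this test the stack recovered on its own; had it not, the usual fallback would be a manual restart of GI on the affected node (a sketch, run as root):
# stop whatever remains of the GI stack on this node, then start it again
/u03/oracle/app/11.2.0/grid/bin/crsctl stop crs -f
/u03/oracle/app/11.2.0/grid/bin/crsctl start crs
# verify the three core services (CRS, CSS, EVM)
/u03/oracle/app/11.2.0/grid/bin/crsctl check crs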
The interface came back up at 16:47:56:
May 23 16:47:56 crm02 cmnetd[7016]: lan901 is up at the data link layer.
May 23 16:47:56 crm02 cmnetd[7016]: lan901 recovered.
May 23 16:47:56 crm02 cmnetd[7016]: Subnet 192.168.1.128 up
May 23 16:49:04 crm02 cmviputil[7921]: Issuing cmmodnet to add 192.168.1.104 address in 192.168.1.96
May 23 16:49:04 crm02 cmviputil[7921]:
May 23 16:49:04 crm02 syslog: /usr/sbin/cmmodnet -a -i 192.168.1.104 192.168.1.96
Node 1: alert`hostname`.log
Node 2: alert`hostname`.log
2013-05-23 16:47:57.629
[ctssd(7686)]CRS-2403:The Cluster Time Synchronization Service on host crm02 is in observer mode.
2013-05-23 16:47:58.015
[ctssd(7686)]CRS-2407:The new Cluster Time Synchronization Service reference node is host crm01.
2013-05-23 16:47:58.018
[ctssd(7686)]CRS-2401:The Cluster Time Synchronization Service started on host crm02.
2013-05-23 16:47:59.620
[ohasd(27763)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2013-05-23 16:48:15.408
[crsd(7782)]CRS-1012:The OCR service started on node crm02.
2013-05-23 16:48:15.441
[evmd(7695)]CRS-1401:EVMD started on node crm02.
2013-05-23 16:48:18.033
[crsd(7782)]CRS-1201:CRSD started on node crm02.
2013-05-23 16:48:19.714
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(7838)]CRS-5011:Check of resource "oracrm" failed: details at "(:CLSN00007:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2013-05-23 16:48:20.043
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(7833)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 16:48:22.991
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(7833)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
Status after recovery was as follows:
Node 1 GI status:
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm01
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
crm01:/#
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE ONLINE crm01
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 2 GI status:
crm02:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm01
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
crm02:/#
crm02:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm02 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm02
ora.crsd
1 ONLINE ONLINE crm02
ora.cssd
1 ONLINE ONLINE crm02
ora.cssdmonitor
1 ONLINE ONLINE crm02
ora.ctssd
1 ONLINE ONLINE crm02 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm02
ora.gipcd
1 ONLINE ONLINE crm02
ora.gpnpd
1 ONLINE ONLINE crm02
ora.mdnsd
1 ONLINE ONLINE crm02
Taking down node 1's private network interfaces:
(1). Take down one of node 1's private NICs (DOWN)
At 14:25:13 one private NIC on node 1 went down:
May 23 14:25:13 crm01 cmnetd[4301]: Link level address on network interface lan901 has been changed from 0x00237d3e3394 to 0x00237d3e32dc.
May 23 14:25:38 crm01 sshd[16413]: SSH: Server;Ltype: Version;Remote: 10.70.213.133-61140;Protocol: 2.0;Client: OpenSSH_5.6p1+sftpfilecontrol-v1.3-hpn13v7
May 23 14:25:38 crm01 sshd[16413]: Accepted publickey for root from 10.70.213.133 port 61140 ssh2
GI status was unaffected: APA failed traffic over to the surviving port in the lan901 aggregate (note the link-level address change in the syslog above), so the interconnect subnet stayed up.
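To confirm which networks the cluster is registered to use, and that the surviving port is still presenting the interconnect subnet, the registered interfaces can be compared with what the node currently sees (a quick sketch):
# networks registered with GI (public vs cluster_interconnect)
/u03/oracle/app/11.2.0/grid/bin/oifcfg getif
# interfaces the node can see right now, with their subnets
/u03/oracle/app/11.2.0/grid/bin/oifcfg iflist -p -n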
(2). Take down both of node 1's private NICs (DOWN)
At 14:32:01 node 1's private network went down:
May 23 14:32:01 crm01 cmnetd[4301]: lan901 is down at the data link layer.
May 23 14:32:01 crm01 cmnetd[4301]: lan901 failed.
May 23 14:32:01 crm01 cmnetd[4301]: Subnet 192.168.1.128 down
Node 1 GI alert log:
2013-05-23 14:36:12.020
[cssd(5898)]CRS-1663:Member kill issued by PID 20001 for 1 members, group DBoraCRM. Details at (:CSSGM00044:) in /u03/oracle/app/11.2.0/grid/log/crm01/cssd/ocssd.log.
2013-05-23 14:37:00.408
[cssd(5898)]CRS-1612:Network communication with node crm02 (2) missing for 50% of timeout interval. Removal of this node from cluster in 299.856 seconds
2013-05-23 14:39:30.786
[cssd(5898)]CRS-1611:Network communication with node crm02 (2) missing for 75% of timeout interval. Removal of this node from cluster in 149.476 seconds
2013-05-23 14:40:20.581
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5874)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:40:20.680
[ohasd(5846)]CRS-2765:Resource 'ora.asm' has failed on server 'crm01'.
2013-05-23 14:40:20.708
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(5874)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:40:20.758
[crsd(5996)]CRS-2765:Resource 'ora.asm' has failed on server 'crm01'.
2013-05-23 14:40:20.830
[crsd(5996)]CRS-2765:Resource 'ora.CRM1GIDG.dg' has failed on server 'crm01'.
2013-05-23 14:41:00.396
[cssd(5898)]CRS-1610:Network communication with node crm02 (2) missing for 90% of timeout interval. Removal of this node from cluster in 59.865 seconds
2013-05-23 14:42:00.267
[cssd(5898)]CRS-1607:Node crm02 is being evicted in cluster incarnation 261665310; details at (:CSSNM00007:) in /u03/oracle/app/11.2.0/grid/log/crm01/cssd/ocssd.log.
2013-05-23 14:42:41.064
[cssd(5898)]CRS-1625:Node crm02, number 2, was manually shut down
2013-05-23 14:42:41.348
[cssd(5898)]CRS-1601:CSSD Reconfiguration complete. Active nodes are crm01 .
2013-05-23 14:42:41.359
[crsd(5996)]CRS-5504:Node down event reported for node 'crm02'.
2013-05-23 14:42:41.459
[crsd(5996)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log.
2013-05-23 14:42:41.460
[crsd(5996)]CRS-1006:The OCR location is inaccessible. Details in /u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log.
2013-05-23 14:42:42.043
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(6047)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:29} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:42:42.045
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(6056)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:2:16234} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-05-23 14:42:42.046
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(20085)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:16:3170} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2013-05-23 14:42:42.055
[ohasd(5846)]CRS-2765:Resource 'ora.crsd' has failed on server 'crm01'.
2013-05-23 14:42:42.061
[/u03/oracle/app/11.2.0/grid/bin/scriptagent.bin(17356)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:17:3} in /u03/oracle/app/11.2.0/grid/log/crm01/agent/crsd/scriptagent_grid/scriptagent_grid.log.
The ASM instance on node 1 was restarted, and the CRSD resource went OFFLINE: the OCR is stored in the CRM1GIDG disk group, so CRSD lost access to it while ASM was down (the CRS-1013/CRS-1006 messages above).
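Since the OCR and voting files live in the same disk group, a quick way to confirm whether access has returned once ASM remounts (a sketch, as root):
# verify OCR integrity and accessibility
/u03/oracle/app/11.2.0/grid/bin/ocrcheck
# voting files should be listed again once the disk group is mounted
/u03/oracle/app/11.2.0/grid/bin/crsctl query css votedisk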
## Node 1 ASM instance alert log:
Thu May 23 14:32:30 2013
SKGXP: ospid 5958: network interface with IP address 169.254.124.177 is DOWN
Thu May 23 14:37:21 2013
IPC Send timeout detected. Sender: ospid 5974 [oracle@crm01 (RBAL)]
Receiver: inst 2 binc 122731746 ospid 27864
IPC Send timeout to 2.0 inc 4 for msg type 16 from opid 18
Thu May 23 14:37:25 2013
Communications reconfiguration: instance_number 2
Thu May 23 14:37:25 2013
IPC Send timeout detected. Sender: ospid 5952 [oracle@crm01 (PING)]
Receiver: inst 2 binc 122731727 ospid 27858
Thu May 23 14:40:11 2013
IPC Send timeout detected. Sender: ospid 5958 [oracle@crm01 (LMD0)]
Receiver: inst 2 binc 122731746 ospid 27864
Thu May 23 14:40:13 2013
LMON (ospid: 5956) waits for event 'CSS operation: data update' for 168 secs.
IPC Send timeout to 2.0 inc 4 for msg type 72 from opid 10
Errors in file /u03/oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_lmhb_5962.trc (incident=124897):
ORA-29770: global enqueue process LMON (OSID 5956) is hung for more than 150 seconds
Incident details in: /u03/oracle/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_124897/+ASM1_lmhb_5962_i124897.trc
Thu May 23 14:40:19 2013
Sweep [inc][124897]: completed
Sweep [inc2][124897]: completed
ERROR: Some process(s) is not making progress.
LMHB (ospid: 5962) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 5962): terminating the instance due to error 29770
Thu May 23 14:40:20 2013
ORA-1092 : opitsk aborting process
Thu May 23 14:40:20 2013
License high water mark = 8
Thu May 23 14:40:23 2013
System state dump requested by (instance=1, osid=5962 (LMHB)), summary=[abnormal instance termination].
System State dumped to trace file /u03/oracle/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_diag_5950.trc
Dumping diagnostic data in directory=[cdmp_20130523144020], requested by (instance=1, osid=5962 (LMHB)), summary=[abnormal instance termination].
Thu May 23 14:40:24 2013
Instance terminated by LMHB, pid = 5962
USER (ospid: 17152): terminating the instance
Instance terminated by USER, pid = 17152
Thu May 23 14:42:42 2013
NOTE: No asm libraries found in the system
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Public Interface 'lan900:1' configured from GPnP for use as a public interface.
[name='lan900:1', type=1, ip=192.168.1.101, mac=00-23-7d-3e-33-03, net=192.168.1.96/27, mask=255.255.255.224, use=public/1]
Public Interface 'lan900' configured from GPnP for use as a public interface.
[name='lan900', type=1, ip=192.168.1.100, mac=00-23-7d-3e-33-03, net=192.168.1.96/27, mask=255.255.255.224, use=public/1]
WARNING: No cluster interconnect has been specified. Depending on
the communication driver configured Oracle cluster traffic
may be directed to the public interface of this machine.
Oracle recommends that RAC clustered databases be configured
with a private interconnect for enhanced security and
performance.
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u03/oracle/app/11.2.0/grid/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u03/oracle/app/11.2.0/grid
System name: HP-UX
Node name: crm01
Release: B.11.31
Version: U
Machine: ia64
Using parameter settings in server-side spfile +CRM1GIDG/crm-cluster/asmparameterfile/registry.253.812983713
System parameters with non-default values:
large_pool_size = 16M
instance_type = "asm"
memory_target = 1536M
memory_max_target = 1536M
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "/asmdisks/asmdisk*"
asm_power_limit = 1
diagnostic_dest = "/u03/oracle/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
192.168.1.100
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Thu May 23 14:42:44 2013
PMON started with pid=2, OS id=17380
Thu May 23 14:42:44 2013
PSP0 started with pid=3, OS id=17382
Thu May 23 14:42:45 2013
VKTM started with pid=4, OS id=17384 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Thu May 23 14:42:45 2013
GEN0 started with pid=5, OS id=17386
Thu May 23 14:42:45 2013
DIAG started with pid=6, OS id=17388
Thu May 23 14:42:45 2013
PING started with pid=7, OS id=17390
Thu May 23 14:42:45 2013
DIA0 started with pid=8, OS id=17392
Thu May 23 14:42:45 2013
LMON started with pid=9, OS id=17394
Thu May 23 14:42:45 2013
LMD0 started with pid=10, OS id=17396
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [30720 - 40960]
Thu May 23 14:42:45 2013
LMS0 started with pid=11, OS id=17398 at elevated priority
Thu May 23 14:42:45 2013
LMHB started with pid=12, OS id=17400
Thu May 23 14:42:45 2013
MMAN started with pid=13, OS id=17402
Thu May 23 14:42:45 2013
DBW0 started with pid=14, OS id=17404
Thu May 23 14:42:46 2013
LGWR started with pid=15, OS id=17406
Thu May 23 14:42:46 2013
CKPT started with pid=16, OS id=17408
Thu May 23 14:42:46 2013
SMON started with pid=17, OS id=17410
Thu May 23 14:42:46 2013
RBAL started with pid=18, OS id=17412
Thu May 23 14:42:46 2013
GMON started with pid=19, OS id=17414
Thu May 23 14:42:46 2013
MMON started with pid=20, OS id=17416
Thu May 23 14:42:46 2013
MMNL started with pid=21, OS id=17418
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Thu May 23 14:42:47 2013
LCK0 started with pid=22, OS id=17422
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Thu May 23 14:42:49 2013
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:5:5} */
NOTE: Diskgroup used for Voting files is:
CRM1GIDG
Diskgroup with spfile:CRM1GIDG
Diskgroup used for OCR is:CRM1GIDG
NOTE: cache registered group CRM1GIDG number=1 incarn=0x28b05028
NOTE: cache began mount (first) of group CRM1GIDG number=1 incarn=0x28b05028
NOTE: Assigning number (1,0) to disk (/asmdisks/asmdisk1)
NOTE: Assigning number (1,1) to disk (/asmdisks/asmdisk2)
NOTE: Assigning number (1,2) to disk (/asmdisks/asmdisk3)
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 3 for pid 24, osid 17442
NOTE: cache opening disk 0 of grp 1: CRM1GIDG_0000 path:/asmdisks/asmdisk1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: CRM1GIDG_0001 path:/asmdisks/asmdisk2
NOTE: F1X0 found on disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 of grp 1: CRM1GIDG_0002 path:/asmdisks/asmdisk3
NOTE: F1X0 found on disk 2 au 2 fcn 0.0
NOTE: cache mounting (first) normal redundancy group 1/0x28B05028 (CRM1GIDG)
Thu May 23 14:42:56 2013
* allocate domain 1, invalid = TRUE
Thu May 23 14:42:56 2013
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=8.180 group=1 (CRM1GIDG)
NOTE: starting recovery of thread=2 ckpt=9.14 group=1 (CRM1GIDG)
NOTE: advancing ckpt for group 1 (CRM1GIDG) thread=1 ckpt=8.180
NOTE: advancing ckpt for group 1 (CRM1GIDG) thread=2 ckpt=9.14
NOTE: cache recovered group 1 to fcn 0.1435
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu May 23 14:42:56 2013
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (CRM1GIDG)
NOTE: LGWR found thread 1 closed at ABA 8.179
NOTE: LGWR mounted thread 1 for diskgroup 1 (CRM1GIDG)
NOTE: LGWR opening thread 1 at fcn 0.1435 ABA 9.180
NOTE: cache mounting group 1/0x28B05028 (CRM1GIDG) succeeded
NOTE: cache ending mount (success) of group CRM1GIDG number=1 incarn=0x28b05028
Thu May 23 14:42:56 2013
GMON querying group 1 at 4 for pid 18, osid 17412
Thu May 23 14:42:56 2013
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup CRM1GIDG was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:5:5} */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:5:5} */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent *//* {0:5:5} */
Thu May 23 14:42:58 2013
WARNING: failed to online diskgroup resource ora.CRM1GIDG.dg (unable to communicate with CRSD/OHASD)
NOTE: Attempting voting file refresh on diskgroup CRM1GIDG
NOTE: Voting file relocation is required in diskgroup CRM1GIDG
NOTE: Attempting voting file relocation on diskgroup CRM1GIDG
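Two details worth noting in the ASM restart above: with the private network down, the instance came up using the public interface for cluster traffic (hence the "No cluster interconnect has been specified" warning), and CSS refreshed and relocated the voting files in CRM1GIDG once the group mounted. The interconnect a running ASM instance is actually using can be checked (a sketch; assumes the grid user's environment with ORACLE_SID=+ASM1):
# which interface(s) the instance is using for cluster traffic
echo "select name, ip_address, is_public from v\$cluster_interconnects;" | sqlplus -s / as sysasm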
## Node 1 crsd log:
2013-05-23 14:40:20.671: [UiServer][51] {1:41988:45930} Done for ctx=60000000018e18d0
2013-05-23 14:40:20.737: [UiServer][52] CS(6000000001aab940)set Properties ( grid,60000000019f83f0)
2013-05-23 14:40:20.755: [ AGFW][45] {0:1:23} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:557982
2013-05-23 14:40:20.755: [ AGFW][45] {0:1:23} Received state change for ora.asm crm01 1 [old state = ONLINE, new state = OFFLI
NE]
2013-05-23 14:40:20.755: [ AGFW][45] {0:1:23} Received state LABEL change for ora.asm crm01 1 [old label = Started, new label
= Abnormal Termination]
2013-05-23 14:40:20.756: [ AGFW][45] {0:1:23} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|
Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:556532:Ver:2]
2013-05-23 14:40:20.756: [ AGFW][45] {0:1:23} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:557982
2013-05-23 14:40:20.756: [ CRSPE][50] {0:1:23} State change received from crm01 for ora.asm crm01 1
2013-05-23 14:40:20.756: [ AGFW][45] {1:41988:45930} Received the reply to the message: RESOURCE_PROBE[ora.asm crm01 1] ID 409
7:556530 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 14:40:20.756: [ AGFW][45] {1:41988:45930} ora.asm crm01 1 received state from probe request. Old state = OFFLINE, N
ew state = OFFLINE
2013-05-23 14:40:20.756: [ CRSPE][50] {0:1:23} Processing PE command id=45872. Description: [Resource State Change (ora.asm crm
01 1) : 6000000002b63630]
2013-05-23 14:40:20.757: [ AGFW][45] {1:41988:45930} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[or
a.asm crm01 1] ID 4097:556529
2013-05-23 14:40:20.757: [ CRSPE][50] {0:1:23} RI [ora.asm crm01 1] new external state [OFFLINE] old value: [ONLINE] on crm0
1 label = [Abnormal Termination]
2013-05-23 14:40:20.757: [ CRSPE][50] {0:1:23} Set State Details to [Abnormal Termination] from [ Started] for [ora.asm crm01 1
]
2013-05-23 14:40:20.758: [UiServer][51] {1:41988:45931} Sending message to PE. ctx= 60000000018e1830, Client PID: 6047
2013-05-23 14:40:20.758: [ CRSPE][50] {0:1:23} Resource Resource Instance ID[ora.asm crm01 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=crm01
CURRENT_RCOUNT=0
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=Abnormal Termination
INCARNATION=1
STATE_CHANGE_VERS=0
LAST_FAULT=0
LAST_STATE_CHANGE=1369291220
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.asm crm01 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 14:40:20.760: [ CRSPE][50] {0:1:23} Processing unplanned state change for [ora.asm crm01 1]
2013-05-23 14:40:20.760: [ CRSRPT][51] {0:1:23} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.asm
2013-05-23 14:40:20.761: [ CRSPE][50] {0:1:23} Scheduled local recovery for [ora.asm crm01 1]
2013-05-23 14:40:20.761: [ CRSPE][50] {0:1:23} Op 6000000001979220 has 3 WOs
2013-05-23 14:40:20.763: [ CRSPE][50] {0:1:23} ICE has queued an operation. Details: Operation [START of [ora.asm crm01 1] on [
crm01] : local=1, unplanned=06000000001979220] cannot run cause it needs W lock for: WO for Placement Path RI:[ora.asm crm01 1
] server [crm01] target states [ONLINE INTERMEDIATE ], locked by op [Resource Check: [ora.asm crm01 1]]. Owner: CRS-2682: It i
s locked by 'grid' for command 'Check Resource State' issued from 'crm01'
2013-05-23 14:40:20.763: [ AGFW][45] {0:1:23} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.CRM1GIDG.dg crm01 1]
ID 4097:556537
2013-05-23 14:40:20.763: [ AGFW][45] {0:1:23} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.CRM1GIDG.dg crm01 1
] ID 4097:556537 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 14:40:20.764: [ AGFW][45] {0:1:23} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:55
6539
2013-05-23 14:40:20.765: [ AGFW][45] {0:1:23} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:
556539 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.766: [ AGFW][45] {0:1:23} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:55
6541
2013-05-23 14:40:20.766: [ AGFW][45] {0:1:23} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:
556541 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.767: [ CRSPE][50] {0:1:23} Operation [START of [ora.asm crm01 1] on [crm01] : local=1, unplanned=0600000
0001979220] has been replaced with [START of [ora.asm crm01 1] on [crm01] : local=1, unplanned=06000000001d30e80
2013-05-23 14:40:20.768: [ CRSPE][50] {0:1:23} Op 6000000001d30e80 has 3 WOs
2013-05-23 14:40:20.771: [ CRSPE][50] {0:1:23} RI [ora.asm crm01 1] new internal state: [STARTING] old value: [STABLE]
2013-05-23 14:40:20.772: [ CRSPE][50] {0:1:23} Sending message to agfw: id = 556543
2013-05-23 14:40:20.772: [ AGFW][45] {0:1:23} Agfw Proxy Server received the message: RESOURCE_START[ora.asm crm01 1] ID 4098:
556543
2013-05-23 14:40:20.772: [ CRSPE][50] {0:1:23} CRS-2672: Attempting to start 'ora.asm' on 'crm01'
2013-05-23 14:40:20.773: [ AGFW][45] {0:1:23} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.asm crm01 1] ID 409
8:556543 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 14:40:20.774: [ CRSPE][50] {1:41988:45931} Processing PE command id=45873. Description: [Stat Resource : 60000000019792
20]
2013-05-23 14:40:20.782: [ AGFW][45] {0:1:23} Received the reply to the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:556540
from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.782: [ AGFW][45] {0:1:23} ora.oracrm.db 1 1 received state from probe request. Old state = ONLINE, New state =
ONLINE
2013-05-23 14:40:20.783: [ AGFW][45] {0:1:23} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.oracrm
.db 1 1] ID 4097:556539
2013-05-23 14:40:20.783: [ AGFW][45] {0:1:23} Received the reply to the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:556542
from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.783: [UiServer][51] {1:41988:45931} Done for ctx=60000000018e1830
2013-05-23 14:40:20.784: [ AGFW][45] {0:1:23} ora.oracrm.db 1 1 received state from probe request. Old state = ONLINE, New state =
ONLINE
2013-05-23 14:40:20.784: [ AGFW][45] {0:1:23} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.oracrm
.db 1 1] ID 4097:556541
2013-05-23 14:40:20.798: [UiServer][52] CS(6000000001aab940)set Properties ( grid,60000000019f83f0)
2013-05-23 14:40:20.817: [UiServer][51] {1:41988:45932} Container [ Name: UI_REGISTER
API_HDR_VER:
TextMessage[2]
API_REGUPDATE_TAG:
TextMessage[1]
ASYNC_TAG:
TextMessage[1]
ATTR_LIST:
TextMessage[GEN_USR_ORA_INST_NAME@SERVERNAME(crm01)=+ASM1^B^A]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin]
CLIENT_PID:
TextMessage[6047]
CLIENT_PRIMARY_GROUP:
TextMessage[AMERICAN_AMERICA.WE8ISO8859P1]
QUEUE_TAG:
TextMessage[1]
RESOURCE:
TextMessage[ora.asm]
UPDATE_TAG:
TextMessage[1]
]
2013-05-23 14:40:20.817: [UiServer][51] {1:41988:45932} Sending message to PE. ctx= 60000000018e1740, Client PID: 6047
2013-05-23 14:40:20.818: [ CRSPE][50] {1:41988:45932} Cmd : 6000000001979220 : flags: API_REGUPDATE_TAG | QUEUE_TAG | UPDATE_TAG
2013-05-23 14:40:20.820: [ CRSPE][50] {1:41988:45932} Processing PE command id=45874. Description: [Register|Update Resource : :
6000000001979220]
2013-05-23 14:40:20.821: [ CRSPE][50] {1:41988:45932} Update attr: GEN_USR_ORA_INST_NAME@SERVERNAME(crm01)
2013-05-23 14:40:20.821: [ CRSPE][50] {1:41988:45932} Update attr: NAME
2013-05-23 14:40:20.825: [ AGFW][45] {0:1:24} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:557993
2013-05-23 14:40:20.825: [ AGFW][45] {0:1:24} Received state change for ora.CRM1GIDG.dg crm01 1 [old state = ONLINE, new state
= OFFLINE]
2013-05-23 14:40:20.825: [ AGFW][45] {0:1:24} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|
Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:556551:Ver:2]
2013-05-23 14:40:20.825: [ AGFW][45] {0:1:24} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:557993
2013-05-23 14:40:20.827: [ AGFW][45] {0:1:23} Received the reply to the message: RESOURCE_PROBE[ora.CRM1GIDG.dg crm01 1] ID 40
97:556538 from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_grid
2013-05-23 14:40:20.827: [ AGFW][45] {0:1:23} ora.CRM1GIDG.dg crm01 1 received state from probe request. Old state = OFFLINE,
New state = OFFLINE
2013-05-23 14:40:20.828: [ AGFW][45] {0:1:23} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.CRM1G
IDG.dg crm01 1] ID 4097:556537
2013-05-23 14:40:20.828: [ CRSPE][50] {1:41988:45932} ICE has queued an operation. Details: Operation [Update Resource Op [ora.asm
]] cannot run cause it needs W lock for: IWaitObject wrapper for:ora.asm, locked by op [START of [ora.asm crm01 1] on [crm01]
: local=1, unplanned=06000000001d30e80]. Owner: CRS-2683: It is locked by 'SYSTEM' for command 'Unplanned Resource State Change : or
a.asm'
2013-05-23 14:40:20.829: [ CRSPE][50] {0:1:24} State change received from crm01 for ora.CRM1GIDG.dg crm01 1
2013-05-23 14:40:20.829: [ CRSPE][50] {0:1:24} Processing PE command id=45875. Description: [Resource State Change (ora.CRM1GIDG.d
g crm01 1) : 6000000002b63840]
2013-05-23 14:40:20.829: [ CRSPE][50] {0:1:24} RI [ora.CRM1GIDG.dg crm01 1] new external state [OFFLINE] old value: [ONLINE] on
crm01 label = []
2013-05-23 14:40:20.830: [ CRSPE][50] {0:1:24} Resource Resource Instance ID[ora.CRM1GIDG.dg crm01 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=crm01
CURRENT_RCOUNT=0
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=1
STATE_CHANGE_VERS=0
LAST_FAULT=0
LAST_STATE_CHANGE=1369291220
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.CRM1GIDG.dg crm01 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 14:40:20.830: [ CRSPE][50] {0:1:24} Processing unplanned state change for [ora.CRM1GIDG.dg crm01 1]
2013-05-23 14:40:20.831: [ CRSRPT][51] {0:1:24} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.CRM1GIDG.dg
2013-05-23 14:40:20.832: [ CRSPE][50] {0:1:24} Scheduled local recovery for [ora.CRM1GIDG.dg crm01 1]
2013-05-23 14:40:20.832: [ CRSPE][50] {0:1:24} Op 60000000030ca860 has 5 WOs
2013-05-23 14:40:20.834: [ CRSPE][50] {0:1:24} ICE has queued an operation. Details: Operation [START of [ora.CRM1GIDG.dg crm01
1] on [crm01] : local=1, unplanned=060000000030ca860] cannot run cause it needs R lock for: IWaitObject wrapper for:ora.asm, loc
ked by op [Update Resource Op [ora.asm]]. Owner: CRS-2682: It is locked by 'grid' for command 'Modify Resource : ora.asm' issued fro
m 'crm01'
2013-05-23 14:40:20.834: [ AGFW][45] {0:1:24} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:55
6554
2013-05-23 14:40:20.834: [ AGFW][45] {0:1:24} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:
556554 to the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.861: [ AGFW][45] {0:1:24} Received the reply to the message: RESOURCE_PROBE[ora.oracrm.db 1 1] ID 4097:556555
from the agent /u03/oracle/app/11.2.0/grid/bin/oraagent_oracle
2013-05-23 14:40:20.861: [ AGFW][45] {0:1:24} ora.oracrm.db 1 1 received state from probe request. Old state = ONLINE, New state =
ONLINE
2013-05-23 14:40:20.861: [ AGFW][45] {0:1:24} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.oracrm
.db 1 1] ID 4097:556554
2013-05-23 14:40:20.926: [UiServer][52] CS(6000000001aab810)set Properties ( oracle,60000000019f7eb0)
2013-05-23 14:40:20.947: [UiServer][51] {1:41988:45933} Sending message to PE. ctx= 60000000018e1a10, Client PID: 20085
2013-05-23 14:40:20.947: [ CRSPE][50] {1:41988:45933} Processing PE command id=45876. Description: [Stat Resource : 6000000002d9fd
90]
2013-05-23 14:40:20.956: [UiServer][51] {1:41988:45933} Done for ctx=60000000018e1a10
2013-05-23 14:40:20.967: [UiServer][52] CS(6000000001aab6e0)set Properties ( oracle,60000000019f7df0)
2013-05-23 14:40:20.987: [UiServer][51] {1:41988:45934} Sending message to PE. ctx= 60000000018e1880, Client PID: 20085
2013-05-23 14:43:19.394: [ CRSMAIN][1] Getting local node number
2013-05-23 14:43:19.396: [ CRSMAIN][1] Initializing OCR
[ CLWAL][1]clsw_Initialize: OLR initlevel [70000]
2013-05-23 14:43:19.529: [ OCRAPI][1]clsu_get_private_ip_addresses: no ip addresses found.
[ OCRAPI][1]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2013-05-23 14:43:19.763: [ OCRAPI][1]a_init:13!: Clusterware init unsuccessful : [44]
2013-05-23 14:43:19.764: [ CRSOCR][1] OCR context init failure. Error: PROC-44: Error in network address and interface operations
Network address and interface operations error [7]
2013-05-23 14:43:19.765: [ CRSMAIN][1] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-44: Error in network address
and interface operations Network address and interface operations error [7]
2013-05-23 14:43:19.765: [ CRSD][1][PANIC] CRSD exiting: Could not init OCR, code: 44
2013-05-23 14:43:19.765: [ CRSD][1] Done.
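The PROC-44 above lines up with clsu_get_private_ip_addresses finding no address: with lan901 down, CRSD cannot initialize its OCR context over the interconnect and exits. Once the NIC is back, the redundant-interconnect resource and its 169.254.x HAIP address should reappear (a sketch):
# is the HAIP resource back online?
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res ora.cluster_interconnect.haip -init
# the 169.254.x address should again be bound to the private NIC
netstat -in | grep 169.254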
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown,STARTING
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE ONLINE crm01
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
(Four further identical crsctl stat res -t -init polls followed, each returning the same output, with ora.asm still OFFLINE in STARTING state.)
crm01:/# ps -ef|grep ASM
root 17242 16872 0 14:41:28 pts/1 0:00 grep ASM
crm01:/# ps -ef|grep ASM
root 17244 16872 0 14:41:32 pts/1 0:00 grep ASM
crm01:/# ps -ef|grep ASM
root 17246 16872 1 14:41:33 pts/1 0:00 grep ASM
crm01:/# ps -ef|grep ASM
root 17248 16872 0 14:41:34 pts/1 0:00 grep ASM
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 1's crsd.log at this point showed the CRSD/OCR initialization failures quoted at the start of this record (PROC-44, CRSD exiting); subsequent runs of crsctl stat res -t -init continued to return the same status as above.
Node 2 GI alert log:
2013-05-23 14:37:01.306
[cssd(27814)]CRS-1612:Network communication with node crm01 (1) missing for 50% of timeout interval. Removal of this node from cluster in 299.825 seconds
2013-05-23 14:37:24.447
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3675)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER.lsnr'. Details at (:CRSAGF00113:) {0:13:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:37:24.447
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3675)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER_SCAN1.lsnr'. Details at (:CRSAGF00113:) {0:13:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:37:24.475
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3675)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:13:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:38:43.448
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(27818)]CRS-5818:Aborted command 'check' for resource 'ora.crsd'. Details at (:CRSAGF00113:) {0:0:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/orarootagent_root/orarootagent_root.log.
2013-05-23 14:38:55.236
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3724)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER.lsnr'. Details at (:CRSAGF00113:) {0:14:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:38:55.265
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3724)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER_SCAN1.lsnr'. Details at (:CRSAGF00113:) {0:14:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:39:25.295
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3724)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:14:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:39:31.694
[cssd(27814)]CRS-1611:Network communication with node crm01 (1) missing for 75% of timeout interval. Removal of this node from cluster in 149.436 seconds
2013-05-23 14:40:26.057
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3753)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER.lsnr'. Details at (:CRSAGF00113:) {0:15:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:40:26.057
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3753)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER_SCAN1.lsnr'. Details at (:CRSAGF00113:) {0:15:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:40:56.095
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3753)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:15:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:41:01.303
[cssd(27814)]CRS-1610:Network communication with node crm01 (1) missing for 90% of timeout interval. Removal of this node from cluster in 59.826 seconds
2013-05-23 14:41:56.857
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER.lsnr'. Details at (:CRSAGF00113:) {0:16:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:41:56.857
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5818:Aborted command 'check' for resource 'ora.LISTENER_SCAN1.lsnr'. Details at (:CRSAGF00113:) {0:16:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:42:00.334
[cssd(27814)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:42:00.334
[cssd(27814)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:42:00.335
[cssd(27814)]CRS-1652:Starting clean up of CRSD resources.
2013-05-23 14:42:00.335
[cssd(27814)]CRS-1608:This node was evicted by node 1, crm01; details at (:CSSNM00005:) in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:42:00.447
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:16:2} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:42:03.285
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:42:05.155
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:42:10.935
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5016:Process "/u03/oracle/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u03/oracle/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:42:28.908
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(27790)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:42:28.965
[ohasd(27763)]CRS-2765:Resource 'ora.asm' has failed on server 'crm02'.
2013-05-23 14:42:28.992
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(27790)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u03/oracle/app/11.2.0/grid/log/crm02/agent/ohasd/oraagent_grid/oraagent_grid.log"
2013-05-23 14:42:41.002
[cssd(27814)]CRS-1654:Clean up of CRSD resources finished successfully.
2013-05-23 14:42:41.003
[cssd(27814)]CRS-1655:CSSD on node crm02 detected a problem and started to shutdown.
2013-05-23 14:42:41.019
[/u03/oracle/app/11.2.0/grid/bin/orarootagent.bin(27956)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:21043} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/orarootagent_root/orarootagent_root.log.
2013-05-23 14:42:41.017
[/u03/oracle/app/11.2.0/grid/bin/oraagent.bin(3817)]CRS-5822:Agent '/u03/oracle/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:16:10} in /u03/oracle/app/11.2.0/grid/log/crm02/agent/crsd/oraagent_grid/oraagent_grid.log.
2013-05-23 14:42:41.078
[ohasd(27763)]CRS-2765:Resource 'ora.crsd' has failed on server 'crm02'.
2013-05-23 14:42:41.124
[cssd(27814)]CRS-1660:The CSS daemon shutdown has completed
2013-05-23 14:42:41.217
[ohasd(27763)]CRS-2765:Resource 'ora.ctssd' has failed on server 'crm02'.
2013-05-23 14:42:42.325
[ctssd(3920)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 14:42:42.498
[ohasd(27763)]CRS-2765:Resource 'ora.evmd' has failed on server 'crm02'.
2013-05-23 14:42:46.444
[ctssd(3928)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 14:42:50.555
[ctssd(3932)]CRS-2402:The Cluster Time Synchronization Service aborted on host crm02. Details at (:ctss_css_init1:) in /u03/oracle/app/11.2.0/grid/log/crm02/ctssd/octssd.log.
2013-05-23 14:42:50.689
[ohasd(27763)]CRS-2765:Resource 'ora.cssd' has failed on server 'crm02'.
2013-05-23 14:42:50.714
[ohasd(27763)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'crm02'.
2013-05-23 14:42:53.218
[cssd(3938)]CRS-1713:CSSD daemon is started in clustered mode
2013-05-23 14:43:09.694
[cssd(3938)]CRS-1707:Lease acquisition for node crm02 number 2 completed
2013-05-23 14:43:11.391
[ohasd(27763)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'crm02'.
2013-05-23 14:43:22.195
[cssd(3938)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk1; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:43:22.231
[cssd(3938)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk2; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:43:22.261
[cssd(3938)]CRS-1605:CSSD voting file is online: /asmdisks/asmdisk3; details in /u03/oracle/app/11.2.0/grid/log/crm02/cssd/ocssd.log.
2013-05-23 14:48:41.781
[ohasd(27763)]CRS-5828:Could not start agent '/u03/oracle/app/11.2.0/grid/bin/cssdmonitor_root'. Details at (:CRSAGF00130:) {0:0:47086} in /u03/oracle/app/11.2.0/grid/log/crm02/ohasd/ohasd.log.
Node 2 keeps looping, automatically probing to rejoin the cluster:
Node 2 GI ocssd.log:
2013-05-23 14:48:40.081: [ CSSD][39]clssnmRcfgMgrThread: Local Join
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: begin on node(2), waittime 590000
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: set curtime (3553396050) for my node
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: scanning 32 nodes
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 0 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node crm01, number 1, is in an existing cluster with disk state 3
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 3 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 4 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 5 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 6 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 7 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 8 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 9 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 10 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 11 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 12 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 13 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 14 is down according to vendor clusterware
2013-05-23 14:48:40.081: [ CSSD][39]clssnmLocalJoinEvent: Node , number 15 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 16 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 17 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 18 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 19 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 20 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 21 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 22 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 23 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 24 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 25 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 26 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 27 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 28 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 29 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 30 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: Node , number 31 is down according to vendor clusterware
2013-05-23 14:48:40.082: [ CSSD][39]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2013-05-23 14:48:40.421: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:40.422: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177734, LATS 3553396391, lastSeqNo 7177731, uniqueness 1366965579, timestamp 1369291720/3555234451
2013-05-23 14:48:40.422: [ CSSD][27]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177735, LATS 3553396391, lastSeqNo 7177732, uniqueness 1366965579, timestamp 1369291720/3555234460
2013-05-23 14:48:40.441: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177736, LATS 3553396410, lastSeqNo 7177733, uniqueness 1366965579, timestamp 1369291720/3555234560
2013-05-23 14:48:41.431: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:41.432: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177737, LATS 3553397401, lastSeqNo 7177734, uniqueness 1366965579, timestamp 1369291721/3555235461
2013-05-23 14:48:41.432: [ CSSD][27]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177738, LATS 3553397401, lastSeqNo 7177735, uniqueness 1366965579, timestamp 1369291721/3555235470
2013-05-23 14:48:41.451: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177739, LATS 3553397420, lastSeqNo 7177736, uniqueness 1366965579, timestamp 1369291721/3555235570
2013-05-23 14:48:42.353: [ CSSD][5]clssscSelect: cookie accept request 600000000068ae10
2013-05-23 14:48:42.353: [ CSSD][5]clssgmAllocProc: (6000000000cb0930) allocated
2013-05-23 14:48:42.353: [ CSSD][5]clssgmClientConnectMsg: properties of cmProc 6000000000cb0930 - 1,2,3,4,5
2013-05-23 14:48:42.353: [ CSSD][5]clssgmClientConnectMsg: Connect from con(0000000000003f27) proc(6000000000cb0930) pid(4046) version 11:2:1:4, properties: 1,2,3,4,5
2013-05-23 14:48:42.353: [ CSSD][5]clssgmClientConnectMsg: The CSSD monitor is process (6000000000cb0930), number 4
2013-05-23 14:48:42.370: [ CSSD][5]clssgmEvtInformation: reqtype (22) cmProc (6000000000cb0930) client (0000000000000000)
2013-05-23 14:48:42.370: [ CSSD][5]clssgmEvtInformation: reqtype (22) req (6000000000c47b60)
2013-05-23 14:48:42.370: [ CSSD][5]clssnmQueueNotification: type (22) 6000000000c47b60
2013-05-23 14:48:42.441: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:42.442: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177740, LATS 3553398411, lastSeqNo 7177737, uniqueness 1366965579, timestamp 1369291722/3555236471
2013-05-23 14:48:42.442: [ CSSD][27]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177741, LATS 3553398411, lastSeqNo 7177738, uniqueness 1366965579, timestamp 1369291722/3555236480
2013-05-23 14:48:42.461: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177742, LATS 3553398430, lastSeqNo 7177739, uniqueness 1366965579, timestamp 1369291722/3555236580
2013-05-23 14:48:42.891: [ CSSD][38]clssnmSendingThread: sending join msg to all nodes
2013-05-23 14:48:42.892: [ CSSD][38]clssnmSendingThread: sent 4 join msgs to all nodes
2013-05-23 14:48:43.451: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:43.452: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177744, LATS 3553399421, lastSeqNo 7177740, uniqueness 1366965579, timestamp 1369291723/3555237531
2013-05-23 14:48:43.471: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177745, LATS 3553399440, lastSeqNo 7177742, uniqueness 1366965579, timestamp 1369291723/3555237590
2013-05-23 14:48:44.461: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:44.462: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177747, LATS 3553400431, lastSeqNo 7177744, uniqueness 1366965579, timestamp 1369291724/3555238541
2013-05-23 14:48:44.481: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177748, LATS 3553400450, lastSeqNo 7177745, uniqueness 1366965579, timestamp 1369291724/3555238600
2013-05-23 14:48:45.471: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:45.472: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177750, LATS 3553401441, lastSeqNo 7177747, uniqueness 1366965579, timestamp 1369291725/3555239551
2013-05-23 14:48:45.491: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177751, LATS 3553401460, lastSeqNo 7177748, uniqueness 1366965579, timestamp 1369291725/3555239610
2013-05-23 14:48:46.481: [ CSSD][36]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2013-05-23 14:48:46.482: [ CSSD][30]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177753, LATS 3553402451, lastSeqNo 7177750, uniqueness 1366965579, timestamp 1369291726/3555240561
2013-05-23 14:48:46.501: [ CSSD][33]clssnmvDHBValidateNcopy: node 1, crm01, has a disk HB, but no network HB, DHB has rcfg 261665311, wrtcnt, 7177754, LATS 3553402470, lastSeqNo 7177751, uniqueness 1366965579, timestamp 1369291726/3555240620
Node 1 GI status:
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 2 GI status:
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE
ora.cssdmonitor
1 ONLINE OFFLINE
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE crm02
ora.gpnpd
1 ONLINE ONLINE crm02
ora.mdnsd
1 ONLINE ONLINE crm02
Node 1 d.bin processes:
crm01:/# ps -ef|grep d.bin
grid 6025 5922 0 Apr 26 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/evmlogger.bin -o /u03/oracle/app/11.2.0/grid/evm/log/evmlogger.info -l /u03/oracle/app/11.2.0/g
root 5911 1 0 Apr 26 ? 739:48 /u03/oracle/app/11.2.0/grid/bin/orarootagent.bin
grid 5922 1 0 Apr 26 ? 298:02 /u03/oracle/app/11.2.0/grid/bin/evmd.bin
root 18358 16872 1 15:11:12 pts/1 0:00 grep d.bin
root 5846 1 0 Apr 26 ? 411:25 /u03/oracle/app/11.2.0/grid/bin/ohasd.bin reboot
root 5890 1 0 Apr 26 ? 45:57 /u03/oracle/app/11.2.0/grid/bin/cssdmonitor
root 5892 1 0 Apr 26 ? 43:58 /u03/oracle/app/11.2.0/grid/bin/cssdagent
grid 5874 1 0 Apr 26 ? 493:06 /u03/oracle/app/11.2.0/grid/bin/oraagent.bin
grid 5878 1 0 Apr 26 ? 1:37 /u03/oracle/app/11.2.0/grid/bin/mdnsd.bin
grid 5882 1 0 Apr 26 ? 25:16 /u03/oracle/app/11.2.0/grid/bin/gpnpd.bin
grid 5898 1 0 Apr 26 ? 693:18 /u03/oracle/app/11.2.0/grid/bin/ocssd.bin
grid 10933 1 0 11:49:19 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
root 5915 1 0 Apr 26 ? 298:16 /u03/oracle/app/11.2.0/grid/bin/octssd.bin reboot
grid 5886 1 0 Apr 26 ? 467:15 /u03/oracle/app/11.2.0/grid/bin/gipcd.bin
Node 2 d.bin processes:
$ ps -ef|grep d.bin
root 27763 1 0 Apr 26 ? 388:47 /u03/oracle/app/11.2.0/grid/bin/ohasd.bin reboot
grid 27798 1 0 Apr 26 ? 21:13 /u03/oracle/app/11.2.0/grid/bin/gpnpd.bin
grid 3916 1 0 14:42:41 ? 0:01 /u03/oracle/app/11.2.0/grid/bin/oraagent.bin
grid 27802 1 0 Apr 26 ? 409:33 /u03/oracle/app/11.2.0/grid/bin/gipcd.bin
root 4221 1 0 14:54:14 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/orarootagent.bin
grid 27794 1 0 Apr 26 ? 1:13 /u03/oracle/app/11.2.0/grid/bin/mdnsd.bin
grid 4537 3168 0 15:11:05 pts/0 0:00 grep d.bin
(3). Restore node 1's two private-network NICs (bring them back UP)
At 15:30:02 node 1's private network came back UP:
May 23 15:30:02 crm01 cmnetd[4301]: lan901 is up at the data link layer.
May 23 15:30:02 crm01 cmnetd[4301]: lan901 recovered.
May 23 15:30:02 crm01 cmnetd[4301]: Subnet 192.168.1.128 up
May 23 15:31:32 crm01 cmnetd[4301]: Link level address on network interface lan901 has been changed from 0x00237d3e32dc to 0x00237d3e3394.
More than an hour later, the logs on node 1 and node 2 had stopped advancing (last timestamp: 2013-05-23 15:42). GI status at that point:
Node 1 GI status:
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
Node 2 GI status:
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
$ /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE
ora.cssdmonitor
1 ONLINE OFFLINE
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE crm02
ora.gpnpd
1 ONLINE ONLINE crm02
ora.mdnsd
1 ONLINE ONLINE crm02
Node 1 ohasd.log output:
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 14:43:16.166: [ CRSPE][29] {0:9:19} Processing unplanned state change for [ora.crsd 1 1]
2013-05-23 14:43:16.167: [ CRSPE][29] {0:9:19} Local Recovery picked for [ora.crsd 1 1]
2013-05-23 14:43:16.167: [ CRSPE][29] {0:9:19} Op 60000000023fcd20 has 11 WOs
2013-05-23 14:43:16.173: [ CRSPE][29] {0:9:19} RI [ora.crsd 1 1] new internal state: [STARTING] old value: [STABLE]
2013-05-23 14:43:16.173: [ CRSPE][29] {0:9:19} Sending message to agfw: id = 403447
2013-05-23 14:43:16.174: [ AGFW][24] {0:9:19} Agfw Proxy Server received the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403447
2013-05-23 14:43:16.174: [ CRSPE][29] {0:9:19} CRS-2672: Attempting to start 'ora.crsd' on 'crm01'
2013-05-23 14:43:16.174: [ AGFW][24] {0:9:19} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403447 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:17.257: [ AGFW][24] {0:9:19} Received the reply to the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403448 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:17.258: [ AGFW][24] {0:9:19} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.crsd 1 1] ID 4098:403447
2013-05-23 14:43:17.259: [ CRSPE][29] {0:9:19} Received reply to action [Start] message ID: 403447
2013-05-23 14:43:17.262: [ AGFW][24] {0:9:19} Received the reply to the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403448 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:17.263: [ AGFW][24] {0:9:19} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.crsd 1 1] ID 4098:403447
2013-05-23 14:43:17.263: [ CRSPE][29] {0:9:19} Received reply to action [Start] message ID: 403447
2013-05-23 14:43:17.263: [ CRSPE][29] {0:9:19} RI [ora.crsd 1 1] new internal state: [STABLE] old value: [STARTING]
2013-05-23 14:43:17.263: [ CRSPE][29] {0:9:19} RI [ora.crsd 1 1] new external state [INTERMEDIATE] old value: [OFFLINE] on crm01 label = []
2013-05-23 14:43:17.263: [ CRSPE][29] {0:9:19} CRS-2676: Start of 'ora.crsd' on 'crm01' succeeded
2013-05-23 14:43:17.265: [ CRSPE][29] {0:9:19} PE Command [ Resource State Change (ora.crsd 1 1) : 60000000028ffe90 ] has completed
2013-05-23 14:43:17.265: [ AGFW][24] {0:9:19} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:403451
2013-05-23 14:43:17.265: [ AGFW][24] {0:9:19} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:403451
2013-05-23 14:43:17.265: [ AGFW][24] {0:9:19} Agfw received reply from PE for resource state change for ora.crsd 1 1
2013-05-23 14:43:18.272: [ AGFW][24] {0:9:20} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:774958
2013-05-23 14:43:18.272: [ AGFW][24] {0:9:20} Received state change for ora.crsd 1 1 [old state = PARTIAL, new state = OFFLINE]
2013-05-23 14:43:18.272: [ AGFW][24] {0:9:20} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:403453:Ver:2]
2013-05-23 14:43:18.272: [ AGFW][24] {0:9:20} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:774958
2013-05-23 14:43:18.272: [ CRSPE][29] {0:9:20} State change received from crm01 for ora.crsd 1 1
2013-05-23 14:43:18.272: [ CRSPE][29] {0:9:20} Processing PE command id=47066. Description: [Resource State Change (ora.crsd 1 1) : 60000000028ffe90]
2013-05-23 14:43:18.273: [ CRSPE][29] {0:9:20} RI [ora.crsd 1 1] new external state [OFFLINE] old value: [INTERMEDIATE] on crm01 label = []
2013-05-23 14:43:18.273: [ CRSPE][29] {0:9:20} Resource Resource Instance ID[ora.crsd 1 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=crm01
CURRENT_RCOUNT=9
LAST_RESTART=1369291397
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=11
STATE_CHANGE_VERS=11
LAST_FAULT=1369291396
LAST_STATE_CHANGE=1369291398
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.crsd 1 1
CARDINALITY_ID=1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 14:43:18.273: [ CRSPE][29] {0:9:20} Processing unplanned state change for [ora.crsd 1 1]
2013-05-23 14:43:18.275: [ CRSPE][29] {0:9:20} Local Recovery picked for [ora.crsd 1 1]
2013-05-23 14:43:18.275: [ CRSPE][29] {0:9:20} Op 6000000002426570 has 11 WOs
2013-05-23 14:43:18.281: [ CRSPE][29] {0:9:20} RI [ora.crsd 1 1] new internal state: [STARTING] old value: [STABLE]
2013-05-23 14:43:18.281: [ CRSPE][29] {0:9:20} Sending message to agfw: id = 403454
2013-05-23 14:43:18.281: [ AGFW][24] {0:9:20} Agfw Proxy Server received the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403454
2013-05-23 14:43:18.281: [ CRSPE][29] {0:9:20} CRS-2672: Attempting to start 'ora.crsd' on 'crm01'
2013-05-23 14:43:18.282: [ AGFW][24] {0:9:20} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403454 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:19.395: [ AGFW][24] {0:9:20} Received the reply to the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403455 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:19.396: [ AGFW][24] {0:9:20} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.crsd 1 1] ID 4098:403454
2013-05-23 14:43:19.396: [ CRSPE][29] {0:9:20} Received reply to action [Start] message ID: 403454
2013-05-23 14:43:19.402: [ AGFW][24] {0:9:20} Received the reply to the message: RESOURCE_START[ora.crsd 1 1] ID 4098:403455 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:43:19.403: [ AGFW][24] {0:9:20} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.crsd 1 1] ID 4098:403454
2013-05-23 14:43:19.403: [ CRSPE][29] {0:9:20} Received reply to action [Start] message ID: 403454
2013-05-23 14:43:19.403: [ CRSPE][29] {0:9:20} RI [ora.crsd 1 1] new internal state: [STABLE] old value: [STARTING]
2013-05-23 14:43:19.403: [ CRSPE][29] {0:9:20} RI [ora.crsd 1 1] new external state [INTERMEDIATE] old value: [OFFLINE] on crm01 label = []
2013-05-23 14:43:19.404: [ CRSPE][29] {0:9:20} CRS-2676: Start of 'ora.crsd' on 'crm01' succeeded
2013-05-23 14:43:19.405: [ CRSPE][29] {0:9:20} PE Command [ Resource State Change (ora.crsd 1 1) : 60000000028ffe90 ] has completed
2013-05-23 14:43:19.405: [ AGFW][24] {0:9:20} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:403458
2013-05-23 14:43:19.406: [ AGFW][24] {0:9:20} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:403458
2013-05-23 14:43:19.406: [ AGFW][24] {0:9:20} Agfw received reply from PE for resource state change for ora.crsd 1 1
2013-05-23 14:43:20.414: [ AGFW][24] {0:9:21} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:774973
2013-05-23 14:43:20.414: [ AGFW][24] {0:9:21} Received state change for ora.crsd 1 1 [old state = PARTIAL, new state = OFFLINE]
2013-05-23 14:43:20.415: [ AGFW][24] {0:9:21} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:403460:Ver:2]
2013-05-23 14:43:20.415: [ AGFW][24] {0:9:21} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:774973
2013-05-23 14:43:20.415: [ CRSPE][29] {0:9:21} State change received from crm01 for ora.crsd 1 1
2013-05-23 14:43:20.415: [ CRSPE][29] {0:9:21} Processing PE command id=47067. Description: [Resource State Change (ora.crsd 1 1) : 60000000028ffe90]
2013-05-23 14:43:20.415: [ CRSPE][29] {0:9:21} RI [ora.crsd 1 1] new external state [OFFLINE] old value: [INTERMEDIATE] on crm01 label = []
2013-05-23 14:43:20.416: [ CRSPE][29] {0:9:21} Resource Resource Instance ID[ora.crsd 1 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=crm01
CURRENT_RCOUNT=10
LAST_RESTART=1369291399
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=12
STATE_CHANGE_VERS=12
LAST_FAULT=1369291398
LAST_STATE_CHANGE=1369291400
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.crsd 1 1
CARDINALITY_ID=1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||INITIAL CHECK DONE| has failed!
2013-05-23 14:43:20.416: [ CRSPE][29] {0:9:21} Processing unplanned state change for [ora.crsd 1 1]
2013-05-23 14:43:20.416: [ CRSPE][29] {0:9:21} No Local Restart of [ora.crsd 1 1], restart attempt exhausted
2013-05-23 14:43:20.417: [ CRSPE][29] {0:9:21} Recovery impossible, stopping :ora.crsd 1 1
2013-05-23 14:43:20.417: [ CRSPE][29] {0:9:21} Failover cannot be completed for [ora.crsd 1 1]. Stopping it and the resource tree
2013-05-23 14:43:20.417: [ CRSPE][29] {0:9:21} Op 6000000002421bf0 has 4 WOs
2013-05-23 14:43:20.418: [ CRSPE][29] {0:9:21} PE Command [ Resource State Change (ora.crsd 1 1) : 60000000028ffe90 ] has completed
2013-05-23 14:43:20.419: [ AGFW][24] {0:9:21} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:403461
2013-05-23 14:43:20.419: [ AGFW][24] {0:9:21} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:403461
2013-05-23 14:43:20.419: [ AGFW][24] {0:9:21} Agfw received reply from PE for resource state change for ora.crsd 1 1
2013-05-23 14:43:40.046: [UiServer][31] CS(600000000160d100)set Properties ( root,6000000001e9abd0)
2013-05-23 14:43:40.046: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=696)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 14:43:40.064: [UiServer][30] {0:0:47057} processMessage called
2013-05-23 14:43:40.064: [UiServer][30] {0:0:47057} Sending message to PE. ctx= 6000000001ade190, Client PID: 17507
2013-05-23 14:43:40.064: [UiServer][30] {0:0:47057} Sending command to PE: 46950
2013-05-23 14:43:40.066: [ CRSPE][29] {0:0:47057} Processing PE command id=47068. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 14:43:40.090: [UiServer][30] {0:0:47057} Done for ctx=6000000001ade190
2013-05-23 14:43:40.093: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 14:58:25.146: [UiServer][31] CS(600000000160d100)set Properties ( root,6000000001e9abd0)
2013-05-23 14:58:25.146: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=696)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 14:58:25.163: [UiServer][30] {0:0:47058} processMessage called
2013-05-23 14:58:25.164: [UiServer][30] {0:0:47058} Sending message to PE. ctx= 6000000001ade320, Client PID: 17946
2013-05-23 14:58:25.165: [UiServer][30] {0:0:47058} Sending command to PE: 46951
2013-05-23 14:58:25.166: [ CRSPE][29] {0:0:47058} Processing PE command id=47069. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 14:58:25.190: [UiServer][30] {0:0:47058} Done for ctx=6000000001ade320
2013-05-23 14:58:25.192: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 15:00:23.204: [UiServer][31] CS(600000000160d100)set Properties ( root,6000000001e9abd0)
2013-05-23 15:00:23.204: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=696)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 15:00:23.222: [UiServer][30] {0:0:47059} processMessage called
2013-05-23 15:00:23.223: [UiServer][30] {0:0:47059} Sending message to PE. ctx= 6000000001ade280, Client PID: 17977
2013-05-23 15:00:23.223: [UiServer][30] {0:0:47059} Sending command to PE: 46952
2013-05-23 15:00:23.224: [ CRSPE][29] {0:0:47059} Processing PE command id=47070. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 15:00:23.249: [UiServer][30] {0:0:47059} Done for ctx=6000000001ade280
2013-05-23 15:00:23.251: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 15:29:54.135: [GIPCHGEN][33]gipchaNodeAddInterface: adding interface information for inf 600000000120b950 { host '', haName 'CLSFRAME_crm-cluster', local 0000000000000000, ip '192.168.1.132', subnet '192.168.1.128', mask '255.255.255.224', mac '00-23-7d-3e-32-dc', ifname 'lan901', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2013-05-23 15:29:54.152: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'udp://192.168.1.132:65383'
2013-05-23 15:29:54.152: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'mcast://224.0.0.251:42424/192.168.1.132'
2013-05-23 15:29:54.153: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'mcast://230.0.1.0:42424/192.168.1.132'
2013-05-23 15:29:54.153: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap broadcast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'udp://192.168.1.159:42424'
2013-05-23 15:31:34.350: [GIPCHGEN][33]gipchaNodeAddInterface: adding interface information for inf 600000000120d830 { host '', haName 'CLSFRAME_crm-cluster', local 0000000000000000, ip '192.168.1.132', subnet '192.168.1.128', mask '255.255.255.224', mac '00-23-7d-3e-33-94', ifname 'lan901', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2013-05-23 15:31:34.351: [GIPCHGEN][33]gipchaInterfaceFail: marking interface failing 600000000120b950 { host '', haName 'CLSFRAME_crm-cluster', local 0000000000000000, ip '192.168.1.132:65383', subnet '192.168.1.128', mask '255.255.255.224', mac '00-23-7d-3e-32-dc', ifname 'lan901', numRef 0, numFail 0, idxBoot 0, flags 0x184d }
2013-05-23 15:31:34.370: [GIPCHGEN][32]gipchaInterfaceDisable: disabling interface 600000000120b950 { host '', haName 'CLSFRAME_crm-cluster', local 0000000000000000, ip '192.168.1.132:65383', subnet '192.168.1.128', mask '255.255.255.224', mac '00-23-7d-3e-32-dc', ifname 'lan901', numRef 0, numFail 0, idxBoot 0, flags 0x19cd }
2013-05-23 15:31:34.370: [GIPCHDEM][32]gipchaWorkerCleanInterface: performing cleanup of disabled interface 600000000120b950 { host '', haName 'CLSFRAME_crm-cluster', local 0000000000000000, ip '192.168.1.132:65383', subnet '192.168.1.128', mask '255.255.255.224', mac '00-23-7d-3e-32-dc', ifname 'lan901', numRef 0, numFail 0, idxBoot 0, flags 0x19ed }
2013-05-23 15:31:34.371: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'udp://192.168.1.132:9013'
2013-05-23 15:31:34.371: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'mcast://224.0.0.251:42424/192.168.1.132'
2013-05-23 15:31:34.372: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'mcast://230.0.1.0:42424/192.168.1.132'
2013-05-23 15:31:34.372: [GIPCHTHR][32]gipchaWorkerCreateInterface: created local bootstrap broadcast interface for node 'crm01', haName 'CLSFRAME_crm-cluster', inf 'udp://192.168.1.159:42424'
2013-05-23 15:42:56.984: [UiServer][31] CS(600000000160d100)set Properties ( root,6000000001e9abd0)
2013-05-23 15:42:56.984: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=713)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 15:42:57.002: [UiServer][30] {0:0:47060} processMessage called
2013-05-23 15:42:57.003: [UiServer][30] {0:0:47060} Sending message to PE. ctx= 6000000001ade230, Client PID: 19573
2013-05-23 15:42:57.003: [UiServer][30] {0:0:47060} Sending command to PE: 46953
2013-05-23 15:42:57.004: [ CRSPE][29] {0:0:47060} Processing PE command id=47071. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 15:42:57.028: [UiServer][30] {0:0:47060} Done for ctx=6000000001ade230
2013-05-23 15:42:57.031: [UiServer][31] Closed: remote end failed/disc.
Node 2 ohasd.log output:
2013-05-23 14:54:14.385: [ AGFW][24] {0:16:2} Agfw Proxy Server sending message: RESTYPE_ADD[ora.ctss.type] ID 8196:404018 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.385: [ AGFW][24] {0:16:2} Agfw Proxy Server sending message: RESTYPE_ADD[ora.diskmon.type] ID 8196:404019 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.386: [ AGFW][24] {0:16:2} Agfw Proxy Server sending message: RESTYPE_ADD[ora.drivers.acfs.type] ID 8196:404020 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.386: [ AGFW][24] {0:16:2} Could not find the resource type [ ora.gns.type ]
2013-05-23 14:54:14.387: [ AGFW][24] {0:16:2} Agfw Proxy Server sending message: RESTYPE_ADD[ora.haip.type] ID 8196:404021 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.387: [ AGFW][24] {0:16:2} Could not find the resource type [ ora.network.type ]
2013-05-23 14:54:14.387: [ AGFW][24] {0:16:2} Could not find the resource type [ ora.registry.acfs.type ]
2013-05-23 14:54:14.388: [ AGFW][24] {0:16:2} Agfw Proxy Server sending message: RESOURCE_ADD[ora.cluster_interconnect.haip 1 1] ID 4356:404022 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.389: [ AGFW][24] {0:9:10} Agfw Proxy Server forwarding the message: RESOURCE_CLEAN[ora.cluster_interconnect.haip 1 1] ID 4100:403998 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.392: [ AGFW][24] {0:9:10} Agfw Proxy Server sending message: RESOURCE_ADD[ora.crsd 1 1] ID 4356:404024 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.393: [ AGFW][24] {0:9:10} Agfw Proxy Server sending message: RESOURCE_ADD[ora.ctssd 1 1] ID 4356:404025 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.394: [ AGFW][24] {0:9:10} Agfw Proxy Server sending message: RESOURCE_ADD[ora.diskmon 1 1] ID 4356:404026 to the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.394: [ AGFW][24] {0:9:10} Agfw Proxy Server replying to the message: AGENT_HANDSHAKE[Proxy] ID 20484:11
2013-05-23 14:54:14.422: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.crf.type] ID 8196:404016 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.438: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.crs.type] ID 8196:404017 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.452: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.ctss.type] ID 8196:404018 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.466: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.diskmon.type] ID 8196:404019 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.482: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.drivers.acfs.type] ID 8196:404020 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.498: [ AGFW][24] {0:16:2} Received the reply to the message: RESTYPE_ADD[ora.haip.type] ID 8196:404021 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.502: [ AGFW][24] {0:16:2} Received the reply to the message: RESOURCE_ADD[ora.cluster_interconnect.haip 1 1] ID 4356:404022 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.503: [ AGFW][24] {0:9:10} Received the reply to the message: RESOURCE_ADD[ora.crsd 1 1] ID 4356:404024 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.504: [ AGFW][24] {0:9:10} Received the reply to the message: RESOURCE_CLEAN[ora.cluster_interconnect.haip 1 1] ID 4100:404023 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.505: [ AGFW][24] {0:9:10} Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.cluster_interconnect.haip 1 1] ID 4100:403998
2013-05-23 14:54:14.506: [ AGFW][24] {0:9:10} Received the reply to the message: RESOURCE_ADD[ora.ctssd 1 1] ID 4356:404025 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.506: [ CRSPE][29] {0:9:10} Received reply to action [Clean] message ID: 403998
2013-05-23 14:54:14.507: [ AGFW][24] {0:9:10} Received the reply to the message: RESOURCE_CLEAN[ora.cluster_interconnect.haip 1 1] ID 4100:404023 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.507: [ AGFW][24] {0:9:10} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.cluster_interconnect.haip 1 1] ID 4100:403998
2013-05-23 14:54:14.508: [ AGFW][24] {0:9:10} Agfw Proxy Server received the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:30
2013-05-23 14:54:14.508: [ AGFW][24] {0:9:10} Shutdown request received from /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.508: [ AGFW][24] {0:9:10} Shutdown request rejected for the agent: /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.508: [ AGFW][24] {0:9:10} Agfw Proxy Server replying to the message: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:30
2013-05-23 14:54:14.508: [ CRSPE][29] {0:9:10} Received reply to action [Clean] message ID: 403998
2013-05-23 14:54:14.509: [ CRSPE][29] {0:9:10} RI [ora.cluster_interconnect.haip 1 1] new internal state: [STABLE] old value: [CLEANING]
2013-05-23 14:54:14.509: [ CRSPE][29] {0:9:10} RI [ora.cluster_interconnect.haip 1 1] new external state [OFFLINE] old value: [UNKNOWN] label = []
2013-05-23 14:54:14.509: [ CRSPE][29] {0:9:10} CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'crm02' succeeded
2013-05-23 14:54:14.510: [ CRSPE][29] {0:9:10} Sequencer for [ora.cluster_interconnect.haip 1 1] has completed with error: CRS-0215: Could not start resource 'ora.cluster_interconnect.haip'.
2013-05-23 14:54:14.512: [ CRSPE][29] {0:9:10} Failover cannot be completed for [ora.cluster_interconnect.haip 1 1]. Stopping it and the resource tree
2013-05-23 14:54:14.513: [ AGFW][24] {0:9:10} Received the reply to the message: RESOURCE_ADD[ora.diskmon 1 1] ID 4356:404026 from the agent /u03/oracle/app/11.2.0/grid/bin/orarootagent_root
2013-05-23 14:54:14.514: [ CRSPE][29] {0:9:10} Op 6000000001fc5280 has 4 WOs
2013-05-23 14:54:14.515: [ CRSPE][29] {0:9:10} PE Command [ Resource State Change (ora.cluster_interconnect.haip 1 1) : 60000000029a02a0 ] has completed
2013-05-23 14:54:14.516: [ AGFW][24] {0:9:10} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:404040
2013-05-23 14:54:14.516: [ AGFW][24] {0:9:10} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:404040
2013-05-23 14:54:14.517: [ AGFW][24] {0:9:10} Agfw received reply from PE for resource state change for ora.cluster_interconnect.haip 1 1
2013-05-23 14:57:24.371: [UiServer][31] CS(600000000222d750)set Properties ( grid,6000000001948690)
2013-05-23 14:57:24.371: [UiServer][31] SS(600000000135d780)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=683)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 14:57:24.389: [UiServer][30] {0:0:47157} processMessage called
2013-05-23 14:57:24.391: [UiServer][30] {0:0:47157} Sending message to PE. ctx= 6000000002d06f40, Client PID: 4282
2013-05-23 14:57:24.391: [UiServer][30] {0:0:47157} Sending command to PE: 46938
2013-05-23 14:57:24.392: [ CRSPE][29] {0:0:47157} Processing PE command id=47052. Description: [Stat Resource : 6000000001b4ee60]
2013-05-23 14:57:24.418: [UiServer][30] {0:0:47157} Done for ctx=6000000002d06f40
2013-05-23 14:57:24.439: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 15:01:16.379: [UiServer][31] CS(600000000222d750)set Properties ( grid,6000000001948690)
2013-05-23 15:01:16.380: [UiServer][31] SS(600000000135d780)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=683)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 15:01:16.397: [UiServer][30] {0:0:47158} processMessage called
2013-05-23 15:01:16.399: [UiServer][30] {0:0:47158} Sending message to PE. ctx= 6000000002d07620, Client PID: 4328
2013-05-23 15:01:16.399: [UiServer][30] {0:0:47158} Sending command to PE: 46939
2013-05-23 15:01:16.400: [ CRSPE][29] {0:0:47158} Processing PE command id=47053. Description: [Stat Resource : 6000000001b4ee60]
2013-05-23 15:01:16.425: [UiServer][30] {0:0:47158} Done for ctx=6000000002d07620
2013-05-23 15:01:16.427: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 15:43:12.236: [UiServer][31] CS(600000000222d750)set Properties ( grid,6000000001948690)
2013-05-23 15:43:12.237: [UiServer][31] SS(600000000135d780)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=683)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 15:43:12.254: [UiServer][30] {0:0:47159} processMessage called
2013-05-23 15:43:12.255: [UiServer][30] {0:0:47159} Sending message to PE. ctx= 6000000002d07260, Client PID: 5200
2013-05-23 15:43:12.255: [UiServer][30] {0:0:47159} Sending command to PE: 46940
2013-05-23 15:43:12.256: [ CRSPE][29] {0:0:47159} Processing PE command id=47054. Description: [Stat Resource : 6000000001b4ee60]
2013-05-23 15:43:12.281: [UiServer][30] {0:0:47159} Done for ctx=6000000002d07260
2013-05-23 15:43:12.303: [UiServer][31] Closed: remote end failed/disc.
At this point the ora.crsd resource, left offline on node 1, has to be started manually:
/u03/oracle/app/11.2.0/grid/bin/crsctl start res ora.crsd -init
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'crm01'
CRS-2676: Start of 'ora.crsd' on 'crm01' succeeded
Then manually start the cssd resource on node 2:
/u03/oracle/app/11.2.0/grid/bin/crsctl start res ora.cssd -init
crm02:/# /u03/oracle/app/11.2.0/grid/bin/crsctl start res ora.cssd -init
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'crm02'
CRS-2676: Start of 'ora.cssdmonitor' on 'crm02' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'crm02'
CRS-2672: Attempting to start 'ora.diskmon' on 'crm02'
CRS-2676: Start of 'ora.diskmon' on 'crm02' succeeded
CRS-2676: Start of 'ora.cssd' on 'crm02' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'crm02'
CRS-2672: Attempting to start 'ora.ctssd' on 'crm02'
CRS-2676: Start of 'ora.ctssd' on 'crm02' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'crm02'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'crm02' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'crm02'
CRS-2676: Start of 'ora.evmd' on 'crm02' succeeded
CRS-2676: Start of 'ora.asm' on 'crm02' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'crm02'
CRS-2676: Start of 'ora.crsd' on 'crm02' succeeded
Test summary
With redundant NICs configured, pulling a single cable causes no problem; losing the interconnect cabling entirely does affect cluster and database availability.
When the interconnect is down, CRS detects that the cluster has split. Because of the split-brain condition the database is unusable; at 300 seconds (50% of the 600-second misscount) the cluster appears hung, and after another 300 seconds the faulty node (node 2) is evicted from the cluster and goes through the CRSD resource cleanup and graceful-shutdown phases, at which point the surviving node resumes service.
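For reference, the countdown seen in the CRS-1612/1611/1610 messages above is driven by the CSS misscount (600 seconds here, a value commonly seen when third-party clusterware such as Serviceguard is also running). As a quick check, the configured timeouts can be read back with crsctl from the GI home used throughout this test:
/u03/oracle/app/11.2.0/grid/bin/crsctl get css misscount
/u03/oracle/app/11.2.0/grid/bin/crsctl get css disktimeout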
Repeated testing showed:
(1). As long as the cabling is restored within the timeout window, the cluster detects this automatically, the node rejoins, and all resources and database instances are recovered.
(2). If the network outage lasts too long (for example, more than an hour), first start the ora.crsd resource on node 1, which may be left in the offline state, and then start the ora.cssd resource on node 2.
Note for users:
After the network is restored, check whether the database instances have started normally; if not, recover them manually.
Do not rely on srvctl status database -d oraCRM alone to determine database state; on each node also verify it through sqlplus, ps -ef |grep pmon, and the alert log.
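A minimal verification sketch along these lines, run on each node as the database owner (it assumes ORACLE_HOME and ORACLE_SID are already set in the environment and that a local bequeath connection is allowed):
srvctl status database -d oraCRM
ps -ef | grep ora_pmon | grep -v grep
echo "select instance_name, status from v\$instance;" | sqlplus -s / as sysdba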
Next steps:
(1). Carry out further analysis under 3-7277044651.
(2). Install PSU6; install PSU7 in July.
Oracle listener crash simulation test
Simulation steps
Use 'kill -9 <listener PID>' to simulate failure of the listener process.
Expected results
GI detects the listener failure and automatically attempts to restart the listener service; the listener restarts within a very short time.
Existing connections are unaffected.
New connection requests are redirected to surviving listeners on other nodes (see the sample TNS entry after this list).
Instances running on the current node can still accept new requests through the surviving listeners on other nodes.
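Note that this redirection assumes clients connect through the SCAN (or a multi-address TNS entry) rather than a single node VIP; a sample entry of the assumed form, with a hypothetical SCAN name crm-scan:
ORACRM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = crm-scan)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = oracrm))
  )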
Test process record
Kill the listener on one node:
$date
Thu May 23 17:01:19 EAT 2013
ps -ef|grep tns
$ kill -9 15208
Kill the listener on node 1:
grid@crm01> ps -ef |grep tns
grid 20248 1 0 16:01:13 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 10933 1 0 11:49:19 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
root 22582 16416 0 17:01:31 pts/0 0:00 grep tns
crm01: >kill -9 10933
The system detected the listener failure and repaired it immediately.
ps -ef |grep tns
grid 20248 1 0 16:01:13 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
root 22645 16416 0 17:02:45 pts/0 0:00 grep tns
grid 22626 1 0 17:02:26 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
At 2013-05-23 17:02, after the kill, the listener was found to be running again.
Test summary
A killed listener process is detected by the cluster, which automatically restarts the listener service; there is no noticeable impact on production. GI detects the listener failure and immediately attempts a restart, with an end-to-end time of no more than 5 seconds.
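To confirm that the clusterware agent, and not a manual start, brought the listener back, the resource state and its restart bookkeeping can be inspected; a quick check (in 11.2 the -v output includes attributes such as LAST_RESTART and RESTART_COUNT):
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res ora.LISTENER.lsnr
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res ora.LISTENER.lsnr -v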
Oracle GI failure simulation tests
OHASD process crash simulation test
Simulation steps
Use 'kill -9 <ohasd PID>' to simulate failure of the OHASD process.
Expected results
The OHASD process is restarted.
Test process record
Check the service status and the ohasd process status
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm01
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
$ps -ef |grep ohasd
root 22797 16416 1 17:05:12 pts/0 0:00 grep ohasd
root 5846 1 0 Apr 26 ? 412:36 /u03/oracle/app/11.2.0/grid/bin/ohasd.bin reboot
root 4583 1 0 Apr 26 ? 0:00 /bin/sh /sbin/init.d/init.ohasd run
Kill the init.ohasd process
$date
Thu May 23 17:05:33 EAT 2013
$kill -9 4583
crm01:/# ps -ef|grep ohasd
root 5846 1 0 Apr 26 ? 412:36 /u03/oracle/app/11.2.0/grid/bin/ohasd.bin reboot
root 22823 16416 0 17:05:54 pts/0 0:00 grep ohasd
root 22805 1 0 17:05:49 ? 0:00 /bin/sh /sbin/init.d/init.ohasd run
The process was respawned immediately. Checking ohasd.log shows ohasd up and fully recovered within a very short time.
vi /u03/oracle/app/11.2.0/grid/log/crm01/ohasd/ohasd.log
2013-05-23 17:05:49.466: [UiServer][31] CS(600000000160cfd0)set Properties ( root,6000000001eb70b0)
2013-05-23 17:05:49.466: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=715)(KEY=
OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 17:05:49.482: [UiServer][30] {0:0:47152} processMessage called
2013-05-23 17:05:49.483: [UiServer][30] {0:0:47152} Sending message to PE. ctx= 6000000001ade320, Client PID: 22819
2013-05-23 17:05:49.483: [UiServer][30] {0:0:47152} Sending command to PE: 47045
2013-05-23 17:05:49.485: [ CRSPE][29] {0:0:47152} Processing PE command id=47171. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 17:05:49.486: [ CRSPE][29] {0:0:47152} Expression Filter : (((NAME == ora.crsd) OR (NAME == ora.cssd)) OR (NAME == ora.
evmd))
2013-05-23 17:05:49.493: [UiServer][30] {0:0:47152} Done for ctx=6000000001ade320
2013-05-23 17:05:49.502: [UiServer][31] Closed: remote end failed/disc.
2013-05-23 17:06:22.794: [UiServer][31] CS(600000000160cfd0)set Properties ( grid,6000000001eb70b0)
2013-05-23 17:06:22.794: [UiServer][31] SS(600000000131a2f0)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=715)(KEY=
OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2013-05-23 17:06:22.812: [UiServer][30] {0:0:47153} processMessage called
2013-05-23 17:06:22.813: [UiServer][30] {0:0:47153} Sending message to PE. ctx= 6000000001ade280, Client PID: 21493
2013-05-23 17:06:22.813: [UiServer][30] {0:0:47153} Sending command to PE: 47046
2013-05-23 17:06:22.814: [ CRSPE][29] {0:0:47153} Processing PE command id=47172. Description: [Stat Resource : 6000000001c9f370]
2013-05-23 17:06:22.825: [UiServer][30] {0:0:47153} Done for ctx=6000000001ade280
2013-05-23 17:06:22.832: [UiServer][31] Closed: remote end failed/disc.
Test summary
OHASD restart time: < 1 s. (Strictly, the kill target here was the init.ohasd wrapper script, PID 4583, which init respawned at once; ohasd.bin itself, PID 5846, kept running throughout.)
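For repeat runs, the respawn can be timed mechanically. A rough sketch, run as root (perl stands in for date +%s, which HP-UX date does not support; the [i] in the grep pattern keeps grep from matching itself):

PID=`ps -ef | grep '[i]nit.ohasd run' | awk '{print $2}'`
T0=`perl -e 'print time'`
kill -9 $PID
# poll until init respawns the wrapper script
while ! ps -ef | grep '[i]nit.ohasd run' > /dev/null; do
    sleep 1
done
T1=`perl -e 'print time'`
echo "init.ohasd respawned after `expr $T1 - $T0` seconds"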
CRSD process crash simulation test
A CRSD failure is detected and restarted by orarootagent; the relevant logs are:
$/u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log
$/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/orarootagent_root/orarootagent_root.log
Simulation steps
Simulate a CRSD failure with 'kill -9 <crsd PID>'.
Expected results
The CRSD process is restarted immediately (expected restart time: < 1 s).
Test process record
Check the resource status and the crsd process state
$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE crm01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE crm01
ora.crsd
1 ONLINE ONLINE crm01
ora.cssd
1 ONLINE ONLINE crm01
ora.cssdmonitor
1 ONLINE ONLINE crm01
ora.ctssd
1 ONLINE ONLINE crm01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE crm01
ora.gipcd
1 ONLINE ONLINE crm01
ora.gpnpd
1 ONLINE ONLINE crm01
ora.mdnsd
1 ONLINE ONLINE crm01
$
$date
Thu Mar 14 11:37:58 EAT 2013
$ps -ef |grep crsd
crm01:/# ps -ef |grep crsd
root 8661 6356 0 15:30:47 pts/3 0:00 grep crsd
root 7223 1 0 15:05:27 ? 0:21 /u03/oracle/app/11.2.0/grid/bin/crsd.bin reboot
Kill the crsd process:
$ kill -9 7223
The process restarted immediately. Checking crsd.log shows crsd started and fully recovered within a very short time.
vi /u03/oracle/app/11.2.0/grid/log/crm01/crsd/crsd.log
crm01:/# ps -ef|grep crsd
root 9166 6356 1 15:37:01 pts/3 0:00 grep crsd
root 9059 1 0 15:35:53 ? 0:03 /u03/oracle/app/11.2.0/grid/bin/crsd.bin reboot
Test summary
When the crsd process fails, the clusterware repairs and restarts it automatically within 10 seconds; the database service to applications is unaffected throughout.
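To verify the recovery, crsd can be read back through the lower resource stack (a sketch; -init addresses the OHASD-managed resources and needs root):

/u03/oracle/app/11.2.0/grid/bin/crsctl stat res ora.crsd -init
ps -ef | grep crsd.bin | grep -v grep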
OCSSD process crash simulation test
Simulation steps
Simulate an OCSSD failure with 'kill -9 <ocssd PID>'.
Expected results
The node reboots; after the reboot, GI starts automatically and brings the database back up.
Test process record
Kill the ocssd process on node 1
$date
Thu May 23 17:16:44 EAT 2013
ps -ef|grep ocssd
crm01:/# ps -ef|grep ocssd
root 23495 16872 0 17:16:57 pts/1 0:00 grep ocssd
grid 5898 1 0 Apr 26 ? 695:24 /u03/oracle/app/11.2.0/grid/bin/ocssd.bin
$kill -9 5898
As can be seen below, node 1 rebooted immediately and its VIP failed over to node 2.
[oracle@node2]$ crsctl stat res -t
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm02
ora.cvu
1 ONLINE ONLINE crm02
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm02
ora.oracrm.db
1 ONLINE OFFLINE
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE INTERMEDIATE crm02 FAILED OVER
ora.crm02.vip
1 ONLINE ONLINE crm02
crm02:/# netstat -in
$ netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lo0 32808 127.0.0.0 127.0.0.1 9640532 0 9640181 0 0
lan900:1 1500 192.168.1.96 192.168.1.104 58 0 57 0 0
lan900:3 1500 192.168.1.96 192.168.1.102 0 0 0 0 0
lan900:2 1500 192.168.1.96 192.168.1.101 0 0 0 0 0
lan901 1500 192.168.1.128 192.168.1.133 34154630 0 33044797 0 0
lan900 1500 192.168.1.96 192.168.1.103 15398402 0 14346964 0 0
lan901:8 1500 169.254.0.0 169.254.135.57 211638715 0 105058153 0 0
Warning: The above name 'lan901:801' is truncated, use -w to show the output in wide format
The ocssd.log on node 2 records:
2013-05-23 17:17:32.772: [ CSSD][24]clssnm_skgxncheck: checking up on reconfig
2013-05-23 17:17:32.772: [ CSSD][24]clssnm_skgxncheck: node(1) is off in supermap
2013-05-23 17:17:32.772: [ CSSD][24]clssnm_skgxncheck: CSS daemon failed on node 1
2013-05-23 17:17:32.772: [ CSSD][24]clssnmMarkNodeForRemoval: node 1, crm01 marked for removal
2013-05-23 17:17:32.772: [ CSSD][24]clssnmDiscHelper: crm01, node(1) connection failed, endp (0000000000000000), probe(0000000
000000000), ninf->endp 00000000000005f5
2013-05-23 17:17:32.772: [ CSSD][24]clssnmDiscHelper: node 1 clean up, endp (00000000000005f5), init state 5, cur state 5
2013-05-23 17:17:32.772: [ CSSD][24]clssnmDiscHelper: endp_nmnode (00000000000005f5); endp(NULL).
2013-05-23 17:17:32.772: [ CSSD][24]clssnm_skgxncheck: finished skgxn check. 1 fatality
2013-05-23 17:17:32.772: [ CSSD][39]clssnmDoSyncUpdate: Initiating sync 261665314
2013-05-23 17:17:32.772: [ CSSD][39]clssscCompareSwapEventValue: changed NMReconfigInProgress val 2, from -1, changes 4
2013-05-23 17:17:32.772: [ CSSD][39]clssnmDoSyncUpdate: local disk timeout set to 597000 ms, remote disk timeout set to 597000
2013-05-23 17:17:32.772: [ CSSD][39]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effec
t when the sync is completed.
2013-05-23 17:17:32.772: [ CSSD][39]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 261665314
2013-05-23 17:17:32.772: [ CSSD][39]clssnmSetupAckWait: Ack message type (11)
2013-05-23 17:17:32.772: [ CSSD][39]clssnmSetupAckWait: node(2) is ALIVE
2013-05-23 17:17:32.772: [ CSSD][39]clssnmSendSync: syncSeqNo(261665314), indicating EXADATA fence initialization complete
2013-05-23 17:17:32.772: [ CSSD][39]List of nodes that have ACKed my sync: NULL
2013-05-23 17:17:32.772: [ CSSD][39]clssnmSendSync: syncSeqNo(261665314)
2013-05-23 17:17:32.772: [ CSSD][39]clssnmWaitForAcks: Ack message type(11), ackCount(1)
2013-05-23 17:17:32.772: [ CSSD][40]clssnmHandleSync: Node crm02, number 2, is EXADATA fence capable
2013-05-23 17:17:32.772: [ CSSD][40]clssscUpdateEventValue: NMReconfigInProgress val 2, changes 5
2013-05-23 17:17:32.772: [ CSSD][40]clssnmHandleSync: local disk timeout set to 597000 ms, remote disk timeout set to 597000
2013-05-23 17:17:32.772: [ CSSD][40]clssnmHandleSync: initleader 2 newleader 2
2013-05-23 17:17:32.772: [ CSSD][40]clssnmQueueClientEvent: Sending Event(2), type 2, incarn 261665313
2013-05-23 17:17:32.772: [ CSSD][40]clssnmQueueClientEvent: Node[1] state = 5, birth = 261665308, unique = 1366965579
2013-05-23 17:17:32.772: [ CSSD][40]clssnmQueueClientEvent: Node[2] state = 3, birth = 261665313, unique = 1369298667
2013-05-23 17:17:32.773: [ CSSD][40]clssnmHandleSync: Acknowledging sync: src[2] srcName[crm02] seq[1] sync[261665314]
2013-05-23 17:17:32.773: [ CSSD][40]clssnmSendAck: node 2, crm02, syncSeqNo(261665314) type(11)
2013-05-23 17:17:32.773: [ CSSD][1]clssgmStartNMMon: node 1 active, birth 261665308
2013-05-23 17:17:32.773: [ CSSD][1]clssgmStartNMMon: node 2 active, birth 261665313
2013-05-23 17:17:32.773: [ CSSD][1]NMEVENT_SUSPEND [00][00][00][06]
2013-05-23 17:17:32.773: [ CSSD][40]clssnmHandleAck: Received ack type 11 from node crm02, number 2, with seq 0 for sync 26166
5314, waiting for 0 acks
2013-05-23 17:17:32.773: [ CSSD][39]clssnmSendSync: syncSeqNo(261665314), indicating EXADATA fence initialization complete
2013-05-23 17:17:32.773: [ CSSD][1]clssgmUpdateEventValue: CmInfo State val 5, changes 12
2013-05-23 17:17:32.773: [ CSSD][39]List of nodes that have ACKed my sync: 2
2013-05-23 17:17:32.773: [ CSSD][1]clssgmSuspendAllGrocks: Issue SUSPEND
2013-05-23 17:17:32.773: [ CSSD][39]clssnmWaitForAcks: done, syncseq(261665314), msg type(11)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmSetMinMaxVersion:node2 product/protocol (11.2/1.4)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
6,17
2013-05-23 17:17:32.773: [ CSSD][1]clssgmQueueGrockEvent: groupName(IGoraCRMoracrm) count(2) master(2) event(2), incarn 8, mbrc 2,
to member 2, events 0x0, state 0x0
2013-05-23 17:17:32.773: [ CSSD][39]clssnmSetMinMaxVersion: min product/protocol (11.2/1.4)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmSetMinMaxVersion: max product/protocol (11.2/1.4)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmNeedConfReq: No configuration to change
2013-05-23 17:17:32.773: [ CSSD][39]clssnmDoSyncUpdate: Terminating node 1, crm01, misstime(18027) state(5)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmDoSyncUpdate: Wait for 0 vote ack(s)
2013-05-23 17:17:32.773: [ CSSD][39]clssnmCheckDskInfo: Checking disk info...
2013-05-23 17:17:32.773: [ CSSD][1]clssgmQueueGrockEvent: groupName(IG+ASMSYS$USERS) count(2) master(2) event(2), incarn 4, mbrc
2, to member 2, events 0x0, state 0x0
2013-05-23 17:17:32.773: [ CSSD][39]clssnmRemove: Start
2013-05-23 17:17:32.773: [ CSSD][39]clssnmrRemoveNode: Removing node 1, crm01, from the cluster in incarnation 261665314, node
birth incarnation 261665308, death incarnation 261665314, stateflags 0x262000 uniqueness value 1366965579
2013-05-23 17:17:32.773: [ CSSD][1]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(0) event(2), incarn 13, mbrc 3,
to member 2, events 0x0, state 0x0
2013-05-23 17:17:32.773: [ CSSD][1]clssgmQueueGrockEvent: groupName(IGoraCRMSYS$USERS) count(2) master(2) event(2), incarn 8, mbrc
2, to member 2, events 0x0, state 0x0
2013-05-23 17:17:32.773: [ CSSD][1]clssgmQueueGrockEvent: groupName(CLSN.ONSPROC.MASTER) count(2) master(2) event(2), incarn 4, m
brc 2, to member 2, events 0xa0, state 0x0
2013-05-23 17:17:32.774: [ default][39]kgzf_gen_node_reid2: generated reid cid=9a7a00e5d7a26fb5bf9c1006af836554,icin=261665308,nmn=1
,lnid=261665308,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node hdr=0xfece0100
2013-05-23 17:17:32.774: [ CSSD][1]clssgmQueueGrockEvent: groupName(DB+ASM) count(2) master(1) event(2), incarn 5, mbrc 2, to mem
ber 1, events 0x68, state 0x0
2013-05-23 17:17:32.774: [ CSSD][39]clssnmrFenceSage: Fenced node crm01, number 1, with EXADATA, handle 0
2013-05-23 17:17:32.774: [ CSSD][39]clssnmrRemoveNode: Bypassing node 1, birth 261665308, death 261665314, impendingRcfg 0, LATSv
alid 1, flags 0x2e2000m timeout 82012, joining 0
Note: the host's syslog.log entries from before the reboot had been cleaned up for this test. Once the server finished rebooting, it automatically started the relevant processes, rejoined the cluster, and recovered the database instance.
Test summary
The ocssd process is one of the core processes of the running cluster. Killing ocssd causes the host to reboot immediately. After the reboot, nodeapps restart automatically, the VIP fails back to its home node, and GI automatically starts all related processes and recovers the database instance.
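The failback can be followed from the surviving node while the rebooted node rejoins; a minimal polling sketch (resource name taken from this cluster; Ctrl-C to stop):

# STATE= flips from "ONLINE on crm02" back to "ONLINE on crm01" once failback completes
while true; do
    /u03/oracle/app/11.2.0/grid/bin/crsctl stat res ora.crm01.vip | grep STATE=
    sleep 10
done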
EVMD process crash simulation test
An EVMD failure is detected and restarted by the OHASD oraagent; the relevant logs are:
$/u03/oracle/app/11.2.0/grid/log/crm01/evmd/evmd.log
$/u03/oracle/app/11.2.0/grid/log/crm01/agent/ohasd/oraagent_grid/oraagent_grid.log
Simulation steps
Simulate an EVMD failure with 'kill -9 <evmd PID>'.
Expected results
The EVMD process is restarted.
Test process record
Check the evmd process state
$ps -ef |grep evmd
root 1007 1 0 Apr 12 ? 0:03 /usr/sbin/evmd
grid 5922 1 0 Apr 26 ? 298:57 /u03/oracle/app/11.2.0/grid/bin/evmd.bin
root 23392 16872 1 17:15:21 pts/1 0:00 grep evmd
Kill the evmd process
$date
Thu May 23 17:15:45 EAT 2013
$kill -9 5922
Checking immediately afterwards shows that the evmd process has already been restarted
crm01:/# ps -ef|grep evmd
root 1007 1 0 Apr 12 ? 0:03 /usr/sbin/evmd
grid 23439 1 0 17:16:04 ? 0:00 /u03/oracle/app/11.2.0/grid/bin/evmd.bin
root 23482 16872 0 17:16:13 pts/1 0:00 grep evmd
The evmd process recovered in under 3 seconds.
Node 2 behaved the same as node 1: evmd restarts very quickly after being killed.
Test summary
evmd is the clusterware's event daemon; when it is killed, it is repaired immediately, with no impact on the system.
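A quick spot check that the restarted evmd is serving events again is to subscribe to the event stream with evmwatch as the grid user; a sketch:

# Print each cluster event with its timestamp and name; Ctrl-C to stop
/u03/oracle/app/11.2.0/grid/bin/evmwatch -A -t "@timestamp @@"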
Oracle database instance failure simulation tests
Database instance failure simulation test
Simulation steps
Start the client application.
On the database node carrying the heavier load, abnormally terminate the pmon background process with 'kill -9 <pmon PID>'.
Expected results
The surviving database instance in the RAC cluster performs instance recovery.
Services fail over to the surviving database instance.
Depending on the client type and connection configuration, client connections fail over to the surviving instance (SQL*Plus connections can switch to the surviving node automatically; see the connect-string sketch after this list).
After a brief load surge, the surviving database instance should continue providing service.
GI should restart the failed database instance.
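The automatic SQL*Plus failover noted above presupposes a TAF-enabled connect string. A hypothetical tnsnames.ora entry for illustration: the entry name ORACRM_TAF and service name oracrm are assumptions, and 192.168.1.101 / 192.168.1.104 are the crm01 / crm02 VIPs seen in the earlier netstat output:

ORACRM_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.101)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.104)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = oracrm)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
    )
  )

With TYPE=SELECT, in-flight queries resume on the surviving instance after failover; uncommitted transactions are still rolled back.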
Test process record
Node 1:
ps -ef|grep ora_pmon
crm01:/# ps -ef|grep ora_pmon
oracle 19965 1 0 May 22 ? 3:26 ora_pmon_oracrm1
root 22892 16416 1 17:08:09 pts/0 0:00 grep ora_pmon
kill -9 19965
After pmon was killed, the alert log showed the following:
tail -f /u03/oracle/app/oracle/diag/rdbms/oracrm/oracrm*/trace/alert*.log
Thu May 23 17:08:31 2013
System state dump requested by (instance=1, osid=20273 (SMCO)), summary=[abnormal instance termination].
System State dumped to trace file /u03/oracle/app/oracle/diag/rdbms/oracrm/oracrm1/trace/oracrm1_diag_19991.trc
Thu May 23 17:08:31 2013
SMCO (ospid: 20273): terminating the instance due to error 472
Thu May 23 17:08:32 2013
ORA-1092 : opitsk aborting process
Thu May 23 17:08:32 2013
License high water mark = 44
Instance terminated by SMCO, pid = 20273
USER (ospid: 22918): terminating the instance
Instance terminated by USER, pid = 22918
Thu May 23 17:08:58 2013
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'lan901:801' configured from GPnP for use as a private interconnect.
[name='lan901:801', type=1, ip=169.254.124.177, mac=00-23-7d-3e-33-94, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'lan900:1' configured from GPnP for use as a public interface.
[name='lan900:1', type=1, ip=192.168.1.101, mac=00-23-7d-3e-33-03, net=192.168.1.96/27, mask=255.255.255.224, use=public/1]
Public Interface 'lan900:2' configured from GPnP for use as a public interface.
[name='lan900:2', type=1, ip=192.168.1.102, mac=00-23-7d-3e-33-03, net=192.168.1.96/27, mask=255.255.255.224, use=public/1]
Public Interface 'lan900' configured from GPnP for use as a public interface.
[name='lan900', type=1, ip=192.168.1.100, mac=00-23-7d-3e-33-03, net=192.168.1.96/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Autotune of undo retention is turned off.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u03/oracle/app/oracle/product/11.2
System name: HP-UX
Node name: crm01
Release: B.11.31
Version: U
Machine: ia64
Using parameter settings in server-side pfile /u03/oracle/app/oracle/product/11.2/dbs/initoracrm1.ora
System parameters with non-default values:
processes = 9000
timed_statistics = TRUE
resource_limit = FALSE
event = "10949 TRACE NAME CONTEXT FOREVER:28401 trace name context forever, level 1"
sga_max_size = 80G
_enable_NUMA_optimization= FALSE
shared_pool_size = 11520M
large_pool_size = 1G
java_pool_size = 1G
streams_pool_size = 256M
shared_pool_reserved_size= 336M
_kghdsidx_count = 3
spfile = "+CRM1GIDG/oracrm/parameterfile/spfileoracrm.ora"
disk_asynch_io = TRUE
resource_manager_plan = ""
control_files = "/dev/vgsys/rcontrol_01.dbf"
control_files = "/dev/vgsys/rcontrol_02.dbf"
control_files = "/dev/vgsys/rcontrol_03.dbf"
control_file_record_keep_time= 60
db_block_size = 8192
db_cache_size = 56G
_db_block_numa = 1
db_keep_cache_size = 8G
db_writer_processes = 24
db_cache_advice = "OFF"
compatible = "10.2.0.3.0"
log_archive_dest = "/arch"
log_archive_format = "arch_%t_%s_%r.arc"
log_archive_max_processes= 9
log_buffer = 15728640
log_checkpoint_interval = 0
log_checkpoint_timeout = 300
db_files = 3000
db_file_multiblock_read_count= 16
_db_file_optimizer_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
_gc_policy_time = 0
_gc_defer_time = 3
thread = 1
fast_start_mttr_target = 300
log_checkpoints_to_alert = TRUE
recovery_parallelism = 8
_datafile_write_errors_crash_instance= FALSE
_cleanup_rollback_entries= 400
undo_management = "AUTO"
undo_tablespace = "UNDOTBS1"
_smu_debug_mode = 33554432
_undo_autotune = FALSE
undo_retention = 16200
instance_number = 1
sec_case_sensitive_logon = FALSE
O7_DICTIONARY_ACCESSIBILITY= TRUE
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
distributed_lock_timeout = 59
instance_name = "oracrm1"
session_cached_cursors = 100
job_queue_processes = 10
_dbms_sql_security_level = 0
cursor_sharing = "FORCE"
parallel_max_servers = 640
parallel_instance_group = "oracrm1"
parallel_execution_message_size= 16384
_PX_use_large_pool = TRUE
_fix_control = "8893626:off"
_fix_control = "9195582:off"
_fix_control = "9344709:OFF"
core_dump_dest = "/u03/oracle/app/oracle/diag/rdbms/oracrm/oracrm1/cdump"
hpux_sched_noage = 178
session_max_open_files = 1000
optimizer_features_enable= "11.2.0.3"
audit_trail = "TRUE"
sort_area_size = 2000000
sort_area_retained_size = 1000000
db_name = "oracrm"
open_cursors = 1500
_optimizer_null_aware_antijoin= FALSE
_b_tree_bitmap_plans = FALSE
star_transformation_enabled= "FALSE"
_simple_view_merging = TRUE
_push_join_predicate = FALSE
parallel_adaptive_multi_user= FALSE
optimizer_index_cost_adj = 40
_nlj_batching_enabled = 0
_nlj_batching_misses_enabled= 0
query_rewrite_enabled = "TRUE"
_index_join_enabled = TRUE
_like_with_bind_as_equality= FALSE
_optim_peek_user_binds = FALSE
pga_aggregate_target = 8G
workarea_size_policy = "AUTO"
_mv_refresh_use_stats = TRUE
_mv_refresh_use_hash_sj = FALSE
_connect_by_use_union_all= "OLD_PLAN_MODE"
statistics_level = "TYPICAL"
_optimizer_mjc_enabled = FALSE
_bloom_filter_enabled = FALSE
_gby_hash_aggregation_enabled= FALSE
optimizer_secure_view_merging= FALSE
_bloom_pruning_enabled = FALSE
_optimizer_skip_scan_guess= FALSE
deferred_segment_creation= FALSE
optimizer_capture_sql_plan_baselines= FALSE
parallel_force_local = TRUE
_optimizer_use_feedback = FALSE
diagnostic_dest = "/u03/oracle/app/oracle"
max_dump_file_size = "1048576000"
trace_enabled = TRUE
Cluster communication is configured to use the following interface(s) for this instance
169.254.124.177
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Thu May 23 17:09:57 2013
PMON started with pid=2, OS id=22962
Thu May 23 17:09:57 2013
PSP0 started with pid=3, OS id=22977
Thu May 23 17:09:58 2013
VKTM started with pid=4, OS id=22979 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Thu May 23 17:09:58 2013
GEN0 started with pid=5, OS id=22981
Thu May 23 17:09:58 2013
DIAG started with pid=6, OS id=22983
Thu May 23 17:09:58 2013
DBRM started with pid=7, OS id=22985
Thu May 23 17:09:59 2013
PING started with pid=8, OS id=22987
Thu May 23 17:09:59 2013
ACMS started with pid=9, OS id=22989
Thu May 23 17:09:59 2013
DIA0 started with pid=10, OS id=22991
Thu May 23 17:09:59 2013
LMON started with pid=11, OS id=22993
Thu May 23 17:09:59 2013
LMD0 started with pid=12, OS id=22995
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [30720 - 40960]
Thu May 23 17:09:59 2013
LMS0 started with pid=13, OS id=22997 at elevated priority
Thu May 23 17:09:59 2013
LMS1 started with pid=14, OS id=22999 at elevated priority
Thu May 23 17:09:59 2013
LMS2 started with pid=15, OS id=23001 at elevated priority
Thu May 23 17:09:59 2013
RMS0 started with pid=16, OS id=23003
Thu May 23 17:09:59 2013
LMHB started with pid=17, OS id=23005
Thu May 23 17:09:59 2013
MMAN started with pid=18, OS id=23007
Thu May 23 17:09:59 2013
DBW0 started with pid=19, OS id=23009
Thu May 23 17:09:59 2013
DBW1 started with pid=20, OS id=23011
Thu May 23 17:09:59 2013
DBW2 started with pid=21, OS id=23014
Thu May 23 17:10:00 2013
DBW3 started with pid=22, OS id=23016
Thu May 23 17:10:00 2013
DBW4 started with pid=23, OS id=23018
Thu May 23 17:10:00 2013
DBW5 started with pid=24, OS id=23020
Thu May 23 17:10:00 2013
DBW6 started with pid=25, OS id=23022
Thu May 23 17:10:00 2013
DBW7 started with pid=26, OS id=23024
Thu May 23 17:10:00 2013
DBW8 started with pid=27, OS id=23026
Thu May 23 17:10:00 2013
DBW9 started with pid=28, OS id=23028
Thu May 23 17:10:00 2013
DBWa started with pid=29, OS id=23030
Thu May 23 17:10:00 2013
DBWb started with pid=30, OS id=23032
Thu May 23 17:10:00 2013
DBWc started with pid=31, OS id=23034
Thu May 23 17:10:00 2013
DBWd started with pid=32, OS id=23036
Thu May 23 17:10:00 2013
DBWe started with pid=33, OS id=23038
Thu May 23 17:10:01 2013
DBWf started with pid=34, OS id=23040
Thu May 23 17:10:01 2013
DBWg started with pid=35, OS id=23042
Thu May 23 17:10:01 2013
DBWh started with pid=36, OS id=23044
Thu May 23 17:10:01 2013
DBWi started with pid=37, OS id=23046
Thu May 23 17:10:01 2013
DBWj started with pid=38, OS id=23048
Thu May 23 17:10:01 2013
DBWk started with pid=39, OS id=23050
Thu May 23 17:10:01 2013
DBWl started with pid=40, OS id=23052
Thu May 23 17:10:01 2013
DBWm started with pid=41, OS id=23054
Thu May 23 17:10:01 2013
DBWn started with pid=42, OS id=23056
Thu May 23 17:10:01 2013
LGWR started with pid=43, OS id=23058
Thu May 23 17:10:01 2013
CKPT started with pid=44, OS id=23060
Thu May 23 17:10:01 2013
SMON started with pid=45, OS id=23062
Thu May 23 17:10:01 2013
RECO started with pid=46, OS id=23064
Thu May 23 17:10:02 2013
MMON started with pid=47, OS id=23066
Thu May 23 17:10:02 2013
MMNL started with pid=48, OS id=23068
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 20)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 2
* domain 0 valid = 1 according to instance 2
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Thu May 23 17:10:09 2013
LCK0 started with pid=49, OS id=23078
Thu May 23 17:10:09 2013
Starting background process RSMN
Thu May 23 17:10:09 2013
RSMN started with pid=50, OS id=23080
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /u03/oracle/app/oracle
Thu May 23 17:10:10 2013
ALTER SYSTEM SET local_listener=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.101)(PORT=1521))))' SCOPE=MEMORY SID='oracrm1';
ALTER DATABASE MOUNT /* db agent *//* {0:2:101} */
Thu May 23 17:10:14 2013
Successful mount of redo thread 1, with mount id 2141923922
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {0:2:101} */
ALTER DATABASE OPEN /* db agent *//* {0:2:101} */
************************************************************
Warning: The SCN headroom for this database is only 61 days!
************************************************************
Picked broadcast on commit scheme to generate SCNs
Thread 1 opened at log sequence 38
Current log# 6 seq# 38 mem# 0: /dev/vgsys/rredo_06.dbf
Successful open of redo thread 1
Thu May 23 17:10:15 2013
SMON: enabling cache recovery
Thu May 23 17:10:16 2013
Incremental checkpoint up to RBA [0x26.2.0], current log tail at RBA [0x26.3.0]
Thu May 23 17:10:17 2013
minact-scn: Inst 1 is a slave inc#:20 mmon proc-id:23066 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
[23082] Successfully onlined Undo Tablespace 1.
Undo initialization finished serial:0 start:3563730085 end:3563732325 diff:2240 (22 seconds)
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process RCBG
Thu May 23 17:10:18 2013
RCBG started with pid=52, OS id=23084
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu May 23 17:10:19 2013
QMNC started with pid=53, OS id=23088
Thu May 23 17:10:19 2013
NOTE: dependency between database oracrm and diskgroup resource ora.CRM1GIDG.dg is established
Thu May 23 17:10:20 2013
Completed: ALTER DATABASE OPEN /* db agent *//* {0:2:101} */
Final state after recovery:
/u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm01
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02
Metrics and results
Time to detect the instance failure: 9 s
Time to complete instance recovery: 38 s
Time for the client application to return to the same service level: 47 s
Time the database instance was out of service: 47 s
Time for GI to restart the failed instance: 47 s
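These figures can be reproduced from the alert log timestamps; a sketch (assumes a grep supporting -B, so the timestamp line preceding each marker is printed as well):

grep -B1 -E 'terminating the instance|Starting ORACLE instance|Completed: ALTER DATABASE OPEN' /u03/oracle/app/oracle/diag/rdbms/oracrm/oracrm*/trace/alert*.log | tail -20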
ASM instance failure simulation test
Simulation steps
Start the client application against the cluster.
On the database node carrying the heavier load, abnormally terminate the ASM pmon background process with 'kill -9 <pmon PID>'.
Expected results
Once the ASM pmon process stops, GI detects it; any database on the node that depends on ASM shuts down abnormally, so its pmon process is expected to stop as well.
GI restarts the ASM instance.
GI restarts the database instance.
The database instance performs instance recovery, the database opens, and the instance registers with the listener.
The client application reissues its connection requests and the database instance resumes service.
Metrics and results
Time for GI to detect the database instance failure: 12 s
Time for GI to restart the database instance: 38 s
Time for the instance to complete instance recovery: 50 s
Test process record
Node 1:
ps -ef|grep pmon_+ASM1
grid 21389 1 0 16:34:03 ? 0:01 asm_pmon_+ASM1
kill -9 21389
Because the database files reside on raw devices, a check at this point showed that the oracrm database was not affected at all.
ps -ef|grep pmon
crm01:/# ps -ef|grep pmon
oracle 22962 1 0 17:09:07 ? 0:50 ora_pmon_oracrm1
grid 23203 1 0 17:13:08 ? 0:00 asm_pmon_+ASM1
root 23320 16872 1 17:13:26 pts/1 0:00 grep pmon
tail -f /u03/oracle/app/oracle/diag/rdbms/oracrm/oracrm*/trace/alert*.log
Thu May 23 17:10:25 2013
Starting background process CJQ0
Thu May 23 17:10:25 2013
CJQ0 started with pid=61, OS id=23114
GI restarted the ASM instance; instances not depending on ASM were unaffected.
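That matches the parameter listing earlier: the controlfiles, datafiles, and redo logs sit on /dev/vgsys raw devices, and only the spfile lives in the +CRM1GIDG diskgroup. A sketch to verify file placement from SQL*Plus (anything returned with a '+' diskgroup prefix depends on ASM; raw device paths do not):

select name from v$controlfile
union all
select name from v$datafile
union all
select member from v$logfile
union all
select value from v$parameter where name = 'spfile';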
Final state after recovery:
crm01:/# /u03/oracle/app/11.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRM1GIDG.dg
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.LISTENER.lsnr
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.asm
ONLINE ONLINE crm01 Started
ONLINE ONLINE crm02 Started
ora.gsd
OFFLINE OFFLINE crm01
OFFLINE OFFLINE crm02
ora.net1.network
ONLINE ONLINE crm01
ONLINE ONLINE crm02
ora.ons
ONLINE ONLINE crm01
ONLINE ONLINE crm02
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE crm01
ora.cvu
1 ONLINE ONLINE crm01
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE crm01
ora.oracrm.db
1 ONLINE ONLINE crm01 Open
2 ONLINE ONLINE crm02 Open
ora.crm01.vip
1 ONLINE ONLINE crm01
ora.crm02.vip
1 ONLINE ONLINE crm02