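baolei posted on 2014-1-20 15:54:25

11g RAC instance restart

Hi ML:

This database on our production system has had its instance restart twice in a row.

Relevant error logs:

From 9:02:31 on, ncxdb11 wrote nothing to its alert log until the database started up again at 09:08. Before that, ocssd kept reporting that ncxdb12 had a disk HB but no network HB. oswbb shows some latency on the heartbeat network, but it had been at that level before as well: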

ncxdb11 oswprvtnet:

zzz ***Mon Jan 20 09:01:29 GMT+08:00 2014
trying to get source for ncxdb11-pri
source should be 172.32.204.29
traceroute to ncxdb11-pri (172.32.204.29) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb11-pri (172.32.204.29)  58 ms  0 ms  0 ms
trying to get source for ncxdb12-pri
source should be 172.32.204.29
traceroute to ncxdb12-pri (172.32.204.30) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb12-pri (172.32.204.30)  58 ms  0 ms  1 ms
zzz ***Mon Jan 20 09:02:01 GMT+08:00 2014
trying to get source for ncxdb11-pri
source should be 172.32.204.29
traceroute to ncxdb11-pri (172.32.204.29) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb11-pri (172.32.204.29)  45 ms  0 ms  0 ms
trying to get source for ncxdb12-pri
source should be 172.32.204.29
traceroute to ncxdb12-pri (172.32.204.30) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb12-pri (172.32.204.30)  46 ms  0 ms  0 ms
zzz ***Mon Jan 20 09:10:29 GMT+08:00 2014
trying to get source for ncxdb11-pri
source should be 172.32.204.29
traceroute to ncxdb11-pri (172.32.204.29) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb11-pri (172.32.204.29)  33 ms  0 ms  0 ms
trying to get source for ncxdb12-pri
source should be 172.32.204.29
traceroute to ncxdb12-pri (172.32.204.30) from 172.32.204.29 (172.32.204.29), 30 hops max
outgoing MTU = 1500
1  ncxdb12-pri (172.32.204.30)  32 ms  0 ms  0 ms
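
To compare the samples side by side, the first-hop probe times can be pulled out of the oswprvtnet output. The following is only a sketch: the `SAMPLE` text is trimmed from the output above, and the helper name `parse_snapshots` is made up for illustration. Note that the first probe to ncxdb11's own address is as slow as the probe to ncxdb12, so the first-probe figure by itself may not reflect wire latency.

```python
import re

# Lines trimmed from the oswprvtnet output above (snapshot headers and hop 1 only).
SAMPLE = """\
zzz ***Mon Jan 20 09:01:29 GMT+08:00 2014
1  ncxdb11-pri (172.32.204.29)  58 ms  0 ms  0 ms
1  ncxdb12-pri (172.32.204.30)  58 ms  0 ms  1 ms
zzz ***Mon Jan 20 09:02:01 GMT+08:00 2014
1  ncxdb11-pri (172.32.204.29)  45 ms  0 ms  0 ms
1  ncxdb12-pri (172.32.204.30)  46 ms  0 ms  0 ms
zzz ***Mon Jan 20 09:10:29 GMT+08:00 2014
1  ncxdb11-pri (172.32.204.29)  33 ms  0 ms  0 ms
1  ncxdb12-pri (172.32.204.30)  32 ms  0 ms  0 ms
"""

# hop number, host name, (address), then three probe times in ms
HOP = re.compile(r"^\s*\d+\s+(\S+)\s+\(([\d.]+)\)\s+(\d+) ms\s+(\d+) ms\s+(\d+) ms")


def parse_snapshots(text):
    """Return a list of (snapshot_time, host, [probe_ms, ...]) tuples."""
    rows, stamp = [], None
    for line in text.splitlines():
        if line.startswith("zzz ***"):
            stamp = line[len("zzz ***"):]
            continue
        m = HOP.match(line)
        if m:
            rows.append((stamp, m.group(1), [int(x) for x in m.groups()[2:]]))
    return rows


rows = parse_snapshots(SAMPLE)
for stamp, host, probes in rows:
    print(stamp, host, probes)
```

There is no snapshot between 09:02:01 and 09:10:29, which matches the window in which ncxdb11 was down.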



baolei posted on 2014-1-20 15:54:52

ncxdb11 ocssd.log:

2014-01-20 09:02:20.226: [    CSSD]clssnmPollingThread: node ncxdb12 (2) at 50% heartbeat fatal, removal in 14.903 seconds
2014-01-20 09:02:20.226: [    CSSD]clssnmPollingThread: node ncxdb12 (2) is impending reconfig, flag 2229260, misstime 15097
2014-01-20 09:02:20.227: [    CSSD]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2014-01-20 09:02:20.227: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434934, LATS 670757723, lastSeqNo 93427805, uniqueness 1382516743, timestamp 1390179739/670661093
2014-01-20 09:02:20.227: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434936, LATS 670757723, lastSeqNo 93427803, uniqueness 1382516743, timestamp 1390179740/670661392
2014-01-20 09:02:20.366: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670757862/1390179740
2014-01-20 09:02:20.736: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758232/1390179740
2014-01-20 09:02:20.776: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758272/1390179740
2014-01-20 09:02:20.876: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758372/1390179740
2014-01-20 09:02:21.227: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434939, LATS 670758723, lastSeqNo 93434936, uniqueness 1382516743, timestamp 1390179741/670662392
2014-01-20 09:02:21.236: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758732/1390179741
2014-01-20 09:02:21.277: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758772/1390179741
2014-01-20 09:02:21.376: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670758872/1390179741
2014-01-20 09:02:21.729: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434941, LATS 670759224, lastSeqNo 93405380, uniqueness 1382516743, timestamp 1390179741/670662648
2014-01-20 09:02:21.737: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434942, LATS 670759233, lastSeqNo 93434939, uniqueness 1382516743, timestamp 1390179741/670662893
2014-01-20 09:02:21.738: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759233/1390179741
2014-01-20 09:02:21.778: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759273/1390179741
2014-01-20 09:02:21.879: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759375/1390179741
2014-01-20 09:02:22.231: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434944, LATS 670759727, lastSeqNo 93434941, uniqueness 1382516743, timestamp 1390179741/670663148
2014-01-20 09:02:22.231: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434945, LATS 670759727, lastSeqNo 93434942, uniqueness 1382516743, timestamp 1390179742/670663399
2014-01-20 09:02:22.240: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759736/1390179742
2014-01-20 09:02:22.280: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759775/1390179742
2014-01-20 09:02:22.380: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670759876/1390179742
2014-01-20 09:02:22.480: [    CSSD]clssnmSendingThread: sending status msg to all nodes
2014-01-20 09:02:22.480: [    CSSD]clssnmSendingThread: sent 4 status msgs to all nodes
2014-01-20 09:02:22.734: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434947, LATS 670760230, lastSeqNo 93434944, uniqueness 1382516743, timestamp 1390179742/670663651
2014-01-20 09:02:22.734: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434948, LATS 670760230, lastSeqNo 93434945, uniqueness 1382516743, timestamp 1390179742/670663900
2014-01-20 09:02:22.744: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760240/1390179742
2014-01-20 09:02:22.781: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760277/1390179742
2014-01-20 09:02:22.881: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760377/1390179742
2014-01-20 09:02:23.232: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434951, LATS 670760727, lastSeqNo 93434948, uniqueness 1382516743, timestamp 1390179743/670664402
2014-01-20 09:02:23.245: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760741/1390179743
2014-01-20 09:02:23.282: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760777/1390179743
2014-01-20 09:02:23.386: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670760881/1390179743
2014-01-20 09:02:23.733: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434953, LATS 670761229, lastSeqNo 93434947, uniqueness 1382516743, timestamp 1390179743/670664657
2014-01-20 09:02:23.735: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434954, LATS 670761231, lastSeqNo 93434951, uniqueness 1382516743, timestamp 1390179743/670664907
2014-01-20 09:02:23.746: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670761242/1390179743
2014-01-20 09:02:23.784: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670761279/1390179743
2014-01-20 09:02:23.887: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670761382/1390179743
2014-01-20 09:02:24.235: [    CSSD]clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB, DHB has rcfg 254422787, wrtcnt, 93434956, LATS 670761730, lastSeqNo 93434953, uniqueness 1382516743, timestamp 1390179743/670665157

In the ocssd log on ncxdb11, the "no network HB" errors start at 09:02:20.
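
The first two clssnmPollingThread lines carry enough numbers to work out when the network heartbeat actually stopped and when removal was due. This is a small worked calculation, not anything taken from Oracle's code. The values are copied from the log above; reading the sum of the two figures as the CSS misscount is an assumption.

```python
from datetime import datetime, timedelta

# Values copied from the clssnmPollingThread lines logged at 09:02:20.226.
logged_at = datetime(2014, 1, 20, 9, 2, 20, 226000)
misstime_ms = 15097      # "misstime 15097"
removal_in_ms = 14903    # "removal in 14.903 seconds"

# Time at which the last network heartbeat from ncxdb12 was seen.
last_nhb = logged_at - timedelta(milliseconds=misstime_ms)
# Time at which ncxdb12 would be removed if no heartbeat arrived.
removal_at = logged_at + timedelta(milliseconds=removal_in_ms)
# Total window implied by the two figures (assumed to be the misscount).
window_s = (misstime_ms + removal_in_ms) / 1000

print(last_nhb.time(), removal_at.time(), window_s)
```

Under these assumptions the last network heartbeat was seen at about 09:02:05, roughly 15 seconds before the first "no network HB" line, and removal was due at about 09:02:35, which is the second at which the CSSNM00008 message is logged.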

baolei posted on 2014-1-20 15:57:30

2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: My cohort: 2
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: Surviving cohort: 1
2014-01-20 09:02:35.463: [    CSSD](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, ncxdb12, is smaller than cohort of 1 nodes led by node 1, ncxdb11, based on map type 2
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(IGCXDB1SYS$USERS) count(2) master(1) event(2), incarn 8, mbrc 2, to member 2, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]###################################
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(1) event(2), incarn 15, mbrc 3, to member 0, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread

baolei posted on 2014-1-20 15:59:11

ncxdb12 :

2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: My cohort: 2
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: Surviving cohort: 1
2014-01-20 09:02:35.463: [    CSSD](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, ncxdb12, is smaller than cohort of 1 nodes led by node 1, ncxdb11, based on map type 2
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(IGCXDB1SYS$USERS) count(2) master(1) event(2), incarn 8, mbrc 2, to member 2, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]###################################
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(1) event(2), incarn 15, mbrc 3, to member 0, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread  

Why would the node number of ncxdb12 be lower than that of ncxdb11, so that ncxdb11 is the one evicted?

Also, the lmon trace on ncxdb12 contains: kjxggpoll: change db group poll time to 50 ms

How should this message be interpreted?

baolei posted on 2014-1-20 15:59:52

OS version: AIX 6.1
DB: 11.2.0.3

Liu Maclean (刘相兵) posted on 2014-1-20 16:21:41

I need the ocssd.log. Please package it and upload it.

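baolei posted on 2014-1-20 17:06:10

I have uploaded the ocssd.log, the alert log, the lmon trace and so on. I hope we can find the root cause from this case. IBM is now analyzing it together with us. The second node went down once before, and the SR concluded that there was a problem on the storage access path, but the errors this time are not the same as last time.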

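baolei posted on 2014-1-20 17:08:18

This post was last edited by baolei on 2014-1-20 17:10

Here is the advice from one of our engineers. Personally I find it a bit casual. I won't post the analysis itself, as it is very long. The conclusions are:

1. Split brain occurred at 09:02:20 and node 1 was evicted from the cluster at 09:02:35, which crashed the node 1 host.
2. After the split brain at 09:02:35, node 1 having been evicted, node 2 took over from node 1 at 09:02:36. However, the ASM lmon process spent more than 50 ms on the takeover vote, so the ASM pmon process wrongly concluded that lmon was hung. pmon then terminated abnormally, which finally crashed the database on node 2.

Recommendation:
This failure was mainly a chain of problems triggered by an abnormal heartbeat network. The host team should check why the network heartbeat latency was far above its normal value.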

Liu Maclean (刘相兵) posted on 2014-1-20 20:30:09

The earliest record in the logs you provided is from January 14.

timeline:

node 1 2014-01-20 09:02:20.227: clssnmvDHBValidateNcopy: node 2, ncxdb12, has a disk HB, but no network HB,

node 2 2014-01-20 09:02:20.769  clssnmvDHBValidateNcopy: node 1, ncxdb11, has a disk HB, but no network HB

node 1 2014-01-20 09:02:31.423: [    CSSD]clssnmvDiskPing: Writing with status 0x3, timestamp 670768918/1390179751    (last log entry before the reboot)

node 2 2014-01-20 09:02:46.401: [    CSSD]clssgmClientShutdown: sending shutdown, fence_done 1    (CSSD shutdown after I/O fencing)

node 2 2014-01-20 09:02:54.091: [    CSSD]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1390179774    (CSS starts after the CRS shutdown)

node 1 2014-01-20 09:08:22.972: [    CSSD]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1390180102    (CSS starts after the reboot)
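
The "uniqueness value" on the two "Starting CSS daemon" lines looks like a Unix epoch timestamp, which gives a quick cross-check of the log clocks. The sketch below only does the conversion. Treating the value as epoch seconds is an assumption based on the numbers lining up, and `epoch_to_local` is a helper name made up for this example.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=8))  # the logs are stamped GMT+08:00

# (node, log timestamp, uniqueness value) from the two "Starting CSS daemon" lines.
starts = [
    ("node 2", "2014-01-20 09:02:54", 1390179774),
    ("node 1", "2014-01-20 09:08:22", 1390180102),
]


def epoch_to_local(seconds):
    """Interpret an integer as Unix epoch seconds and render it in GMT+08:00."""
    return datetime.fromtimestamp(seconds, TZ).strftime("%Y-%m-%d %H:%M:%S")


for node, logged, uniq in starts:
    print(node, logged, epoch_to_local(uniq))
```

Both values convert to the same second as the log line that prints them.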






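Node 1 does not have much usable information.


On node 2 you can see that it intended to abort the local node to avoid split brain ("Aborting local node to avoid splitbrain"), because the weight of this sub-cluster is lower than that of node 1: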
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: Checking disk info...
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckSplit: Node 1, ncxdb11, is alive, DHB (1390179755, 670772496) more than disk timeout of 27000 after the last NHB (1390179725, 670742947)
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: My cohort: 2
2014-01-20 09:02:35.463: [    CSSD]clssnmCheckDskInfo: Surviving cohort: 1
2014-01-20 09:02:35.463: [    CSSD](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, ncxdb12, is smaller than cohort of 1 nodes led by node 1, ncxdb11, based on map type 2
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(IGCXDB1SYS$USERS) count(2) master(1) event(2), incarn 8, mbrc 2, to member 2, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]###################################
2014-01-20 09:02:35.463: [    CSSD]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(1) event(2), incarn 15, mbrc 3, to member 0, events 0x0, state 0x0
2014-01-20 09:02:35.463: [    CSSD]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
2014-01-20 09:02:35.464: [    CSSD]###################################
2014-01-20 09:02:35.464: [    CSSD](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
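
The clssnmCheckSplit line above can be checked numerically. The sketch below extracts the two pairs of numbers and compares their difference with the disk timeout. Reading the first field of each pair as epoch seconds and the second as a millisecond tick counter is an assumption; it is consistent with the other timestamps in these logs but is not documented here.

```python
import re

# Message text copied from the clssnmCheckSplit line above.
LINE = (
    "clssnmCheckSplit: Node 1, ncxdb11, is alive, DHB (1390179755, 670772496) "
    "more than disk timeout of 27000 after the last NHB (1390179725, 670742947)"
)

PATTERN = re.compile(
    r"DHB \((\d+), (\d+)\) more than disk timeout of (\d+) "
    r"after the last NHB \((\d+), (\d+)\)"
)

m = PATTERN.search(LINE)
dhb_sec, dhb_tick, timeout_ms, nhb_sec, nhb_tick = (int(g) for g in m.groups())

gap_seconds = dhb_sec - nhb_sec    # difference of the first fields
gap_ticks = dhb_tick - nhb_tick    # difference of the second fields (assumed ms)

print(gap_seconds, gap_ticks, gap_ticks > timeout_ms)
```

The disk heartbeat from node 1 was still being seen about 30 seconds after its last network heartbeat, which exceeds the 27000 ms disk timeout, so node 2 treats node 1 as alive and goes on to the cohort check.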



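What is odd here is that node 1 rebooted without its ocssd.log showing any clssnmCheckDskInfo section. In principle, if both nodes run clssnmCheckDskInfo, node 1 wins and survives.

But node 1 simply rebooted at around 09:02:31, and at that point the two nodes had not yet negotiated through the voting disks which of them would survive.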
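
The outcome described by the CSSNM00008 message can be sketched as a simple rule: the larger cohort survives, and when the cohorts are the same size the one containing the lowest node number survives. This is an illustration of that rule as commonly described for 11.2, not Oracle's implementation, and `surviving_cohort` is a hypothetical name.

```python
def surviving_cohort(cohort_a, cohort_b):
    """Pick the cohort that should survive a split.

    Assumed rule: the larger cohort wins; on a tie, the cohort that
    contains the lowest node number wins.
    """
    if len(cohort_a) != len(cohort_b):
        return max(cohort_a, cohort_b, key=len)
    return min(cohort_a, cohort_b, key=min)


# Two-node cluster split into {1} (ncxdb11) and {2} (ncxdb12).
winner = surviving_cohort({1}, {2})
print(winner)
```

With one node on each side, the cohort led by node 1 is expected to survive, which is why node 2 logs that it is aborting itself.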


Questions: are the clocks on the two nodes in sync?

What does crsctl query css votedisk return?

Liu Maclean (刘相兵) posted on 2014-1-20 20:40:35

PS: As far as the logs you provided go, "no network HB" appears only on 2014-01-20. I do not see it at any other time.

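baolei posted on 2014-1-21 12:33:23

Quoting Liu Maclean (刘相兵), 2014-1-20 20:30: "The earliest record in the logs you provided is from January 14 ..."

I find it very strange too. In principle, with both nodes still having a disk HB, ncxdb12 should have been the one to go down. How could the node number of ncxdb11 be larger than that of ncxdb12? Also, node 1 has no log entries after 9:02:31. I don't know whether this reboot was triggered by a person or by the system. I asked around at the time and nobody would have done such a thing.
I will post the crsctl query css votedisk output a bit later. I believe I was the one who built this database.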

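baolei posted on 2014-1-21 12:33:56

Quoting Liu Maclean (刘相兵), 2014-1-20 20:40: "PS: As far as the logs you provided go, 'no network HB' appears only on 2014-01-20; I do not see it at any other time ..."

Right, that is all the information in the logs. I did not see anything else either.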

baolei posted on 2014-1-21 17:49:44

Quoting Liu Maclean (刘相兵), 2014-1-20 20:30: "The earliest record in the logs you provided is from January 14 ..."


grid@ncxdb11:/home/grid$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   5e380d62b6cb4f6ebf13846fa0e0f0c8 (/dev/rhdiskpower6001)
2. ONLINE   4397d72c00174fe7bf13acd712071024 (/dev/rhdiskpower6002)
3. ONLINE   9b1cb71a838c4ffdbfb93565d5c1cb4c (/dev/rhdiskpower6003)
Located 3 voting disk(s).


ncxdb12:[/]#crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   5e380d62b6cb4f6ebf13846fa0e0f0c8 (/dev/rhdiskpower6001)
2. ONLINE   4397d72c00174fe7bf13acd712071024 (/dev/rhdiskpower6002)
3. ONLINE   9b1cb71a838c4ffdbfb93565d5c1cb4c (/dev/rhdiskpower6003)
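
A quick way to confirm that both nodes see the same voting disks is to compare the File Universal Ids from the two outputs. This is a minimal sketch: the rows are copied from the output above, `parse_votedisks` is a made-up helper, and the majority figure relies on the general rule that a node must be able to access more than half of the voting disks.

```python
import re

NODE1 = """\
1. ONLINE   5e380d62b6cb4f6ebf13846fa0e0f0c8 (/dev/rhdiskpower6001)
2. ONLINE   4397d72c00174fe7bf13acd712071024 (/dev/rhdiskpower6002)
3. ONLINE   9b1cb71a838c4ffdbfb93565d5c1cb4c (/dev/rhdiskpower6003)
"""
NODE2 = NODE1  # the ncxdb12 output above lists the same three rows

# row number, state, 32-hex-digit File Universal Id, (device path)
ROW = re.compile(r"^\s*\d+\.\s+(\S+)\s+([0-9a-f]{32})\s+\((\S+)\)")


def parse_votedisks(text):
    """Return {file_universal_id: (state, device_path)} for each listed disk."""
    disks = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            state, fuid, path = m.groups()
            disks[fuid] = (state, path)
    return disks


d1, d2 = parse_votedisks(NODE1), parse_votedisks(NODE2)
same_view = d1 == d2
online = sum(1 for state, _ in d1.values() if state == "ONLINE")
majority = len(d1) // 2 + 1  # more than half of the voting disks

print(same_view, online, majority)
```

Both nodes report the same three disks, all ONLINE, so each node has access to more than the two disks needed for a majority.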

Liu Maclean (刘相兵) posted on 2014-1-21 19:47:54

Question: are the clocks on the two nodes in sync, as asked in post #9?

baolei posted on 2014-1-21 20:10:11

Quoting Liu Maclean (刘相兵), 2014-1-21 19:47: "Question: are the clocks on the two nodes in sync, as asked in post #9?"

oracle@ncxdb11:/home/oracle$ ssh ncxdb12 date;date;
Tue Jan 21 20:09:44 GMT+08:00 2014
Tue Jan 21 20:09:44 GMT+08:00 2014

They are in sync.

baolei posted on 2014-1-23 11:34:04

Quoting Liu Maclean (刘相兵), 2014-1-21 19:47: "Question: are the clocks on the two nodes in sync, as asked in post #9?"

ML, have you found anything new?

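Liu Maclean (刘相兵) posted on 2014-1-23 15:40:51


1. Clearly this kind of network failure has to be brought under control, including with techniques such as bind.

2. Run crsctl set log css CSSD:2 so that, the next time this happens, node 1 produces log output when it reboots.

3. I need the errpt data from both nodes.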

baolei posted on 2014-1-23 16:24:58

Node 1:
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
ECCE4018   0122181714 T S fcs4           SOFTWARE PROGRAM ERROR
ECCE4018   0122181714 T S fcs6           SOFTWARE PROGRAM ERROR
ECCE4018   0122181414 T S fcs4           SOFTWARE PROGRAM ERROR
ECCE4018   0122181314 T S fcs6           SOFTWARE PROGRAM ERROR
ECCE4018   0122181314 T S fcs6           SOFTWARE PROGRAM ERROR
ECCE4018   0122181314 T S fcs6           SOFTWARE PROGRAM ERROR
ECCE4018   0122181314 T S fcs6           SOFTWARE PROGRAM ERROR
ECCE4018   0122181314 T S fcs8           SOFTWARE PROGRAM ERROR
A6DF45AA   0120090714 I O RMCdaemon      The daemon is started.
2BFA76F6   0120090314 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0120090614 T O errdemon       ERROR LOGGING TURNED ON
E87EF1BE   0119150014 P O dumpcheck      The largest dump device is too small.
E87EF1BE   0118150014 P O dumpcheck      The largest dump device is too small.

ncxdb11:[/]#lsattr -El ent17
adapter_names   ent4           EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  ent10          Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            standard       EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True

Node 2:

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
E87EF1BE   0120150014 P O dumpcheck      The largest dump device is too small.
A924A5FC   0120090214 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
E87EF1BE   0119150014 P O dumpcheck      The largest dump device is too small.
E87EF1BE   0118150014 P O dumpcheck      The largest dump device is too small.

ncxdb12:[/]#lsattr -El ent17
adapter_names   ent4           EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  ent10          Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            standard       EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True


Nothing abnormal there.
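
The TIMESTAMP column in the errpt summary is in mmddhhmmyy form, so the entries around the incident can be decoded and put in order. This is a small sketch with rows copied from the two listings above; `errpt_time` is a made-up helper name, and the resolution is one minute, so the ordering against the ocssd.log lines is only approximate.

```python
from datetime import datetime


def errpt_time(stamp):
    """Decode an errpt summary TIMESTAMP (mmddhhmmyy) into a datetime."""
    return datetime.strptime(stamp, "%m%d%H%M%y")


# Rows copied from the errpt listings above: (node, timestamp, description).
rows = [
    ("node 1", "0120090314", "SYSTEM SHUTDOWN BY USER"),
    ("node 1", "0120090614", "ERROR LOGGING TURNED ON"),
    ("node 1", "0120090714", "The daemon is started."),
    ("node 2", "0120090214", "SOFTWARE PROGRAM ABNORMALLY TERMINATED"),
]

ordered = sorted(rows, key=lambda r: errpt_time(r[1]))
for node, stamp, desc in ordered:
    print(errpt_time(stamp).strftime("%Y-%m-%d %H:%M"), node, desc)
```

Decoded this way, node 2 logs an abnormal program termination at 09:02 and node 1 logs "SYSTEM SHUTDOWN BY USER" at 09:03 on January 20, both within about a minute of the last ocssd.log entry on node 1.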

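baolei posted on 2014-1-23 16:27:51

Quoting Liu Maclean (刘相兵), 2014-1-23 15:40: "1. Clearly this kind of network failure has to be brought under control, including with techniques such as bind. 2. Run crsctl set log css CSSD:2 so that the next time ..."

So my colleague's conclusion doesn't look right either, does it?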