Two-node Oracle RAC 10g: one node is evicted at irregular intervals, looking for the cause
Last edited by 不了峰 on 2014-11-21 09:37
Environment:
RHEL 5.2 64-bit, Oracle 10.2.0.4 RAC, 2 nodes, two instances on each node
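Symptoms:
One of the two nodes (not always the same one) is evicted at irregular intervals.
The host of the evicted node appears to be hung (possibly down); it only boots again after the power button is pressed briefly.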
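Analysis
The clocks on the two hosts are not synchronized: node db1 is 15 minutes behind node db2. Could the eviction be caused by this time skew?
Another suspicious point: the storage, an HP EVA3000 disk array with 30 disks in total, has two failed disks.
I reviewed the logs on both nodes around the time of the eviction and found essentially nothing that identifies the cause (it may also be that I lack the experience to spot the problem). Please help me analyze this.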
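Take the eviction of node db1 noticed at 2014-11-20 00:24:44 as an example: the db2 host was unresponsive and had probably hung (it was later started again by pressing the power button).
/var/log/messages, alert_db1.log, crsd.log, and cssd.log contain nothing useful for that point in time.
Only the second node has some relevant log entries; see the uploaded attachment.
Thanks!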
odm finding:
Sat Oct 11 17:37:14 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10505.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
NOTE: ASMB process exiting due to lack of ASM file activity
Mon Oct 13 10:05:37 2014
Starting ORACLE instance (normal)
--
Mon Oct 13 10:05:41 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x32084373 (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Mon Oct 13 10:05:42 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Sat Oct 18 09:57:38 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x32084373 (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Sat Oct 18 09:57:38 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Tue Oct 21 13:39:01 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x32084373 (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Tue Oct 21 13:39:01 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10494.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Trace dumping is performing id=
Thu Oct 23 10:59:58 2014
Starting ORACLE instance (normal)
--
Thu Oct 23 11:00:02 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x3208463b (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Thu Oct 23 11:00:03 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Sat Oct 25 00:35:53 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x3208463b (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Sat Oct 25 00:35:53 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Fri Nov 7 18:02:49 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x3208463b (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Fri Nov 7 18:02:49 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
--
Thu Nov 20 00:40:47 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x3208463b (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Thu Nov 20 00:40:47 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
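To see how often the RBAL errors recur, the excerpt above can be tallied per timestamp. The following is only a rough sketch, assuming the alert log keeps the layout shown here (a timestamp line followed by its error lines); the function name `count_ora600_by_timestamp` and the regular expression are illustrative and not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches alert-log timestamp lines such as "Sat Oct 11 17:37:14 2014"
# or "Fri Nov 7 18:02:49 2014" (one or more spaces before the day).
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")

def count_ora600_by_timestamp(lines):
    """Count ORA-00600 lines under the most recent timestamp line."""
    counts = Counter()
    current = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current = line
        elif line.startswith("ORA-00600") and current is not None:
            counts[current] += 1
    return counts

# Sample input: the last entries of the excerpt above, copied verbatim.
sample = """\
Thu Nov 20 00:40:47 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
ERROR: ORA-600 in COD recovery for diskgroup 2/0x3208463b (SGDB)
ERROR: ORA-600 thrown in RBAL for group number 2
Thu Nov 20 00:40:47 2014
Errors in file /oracle/app/admin/+ASM/bdump/+asm2_rbal_10487.trc:
ORA-00600: internal error code, arguments: , , , , [], [], [], []
""".splitlines()

for ts, n in count_ora600_by_timestamp(sample).items():
    print(ts, n)
```

Run against the full ASM alert log, a tally like this makes it easier to line up each ORA-600 burst with the dates on which a node was evicted.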
Have these kfdAuDealloc2 errors and the ORA-600 in COD recovery actually been diagnosed?
Liu Maclean (刘相兵) wrote on 2014-11-21 15:10:
odm finding:
Sat Oct 11 17:37:14 2014
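Have these kfdAuDealloc2 errors and the ORA-600 in COD recovery actually been diagnosed?
No, I have not.
At the time I assumed these errors occurred after the node had been evicted.
Judging from the ocssd log: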
eviction in 0.230 seconds
[ CSSD]2014-11-20 00:40:44.823 >TRACE: clssnmPollingThread: Eviction started for node jtjdb1 (1), flags 0x040d, state 3, wt4c 0
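The ASM log, by contrast, records the error at Thu Nov 20 00:40:47 2014, after the eviction had already started,
so I did not look into the cause of that error.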
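The ordering argument can be checked by computing the gap between the two timestamps quoted in this reply. This is a minimal sketch that assumes both log lines were written against the same host clock (which matters here, since the two nodes are reported to differ by about 15 minutes) and that the ASM alert log only has one-second resolution.

```python
from datetime import datetime

# Timestamps copied from the ocssd line and the ASM alert-log line quoted above.
# Note: %a and %b parse English day/month names, so this assumes an English/C locale.
cssd_eviction_start = datetime.strptime(
    "2014-11-20 00:40:44.823", "%Y-%m-%d %H:%M:%S.%f"
)
asm_ora600 = datetime.strptime(
    "Thu Nov 20 00:40:47 2014", "%a %b %d %H:%M:%S %Y"
)

# Positive value: the ASM error was logged after CSSD started the eviction.
delta = (asm_ora600 - cssd_eviction_start).total_seconds()
print(f"ASM error logged {delta:.3f} s after eviction start")
```

A gap of roughly two seconds is consistent with the ORA-600 following the eviction, but it does not by itself rule out an earlier storage or ASM problem on the evicted node.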