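WMLM posted on 2014-12-2 12:56:12

Oracle RAC 10g, two nodes: one node is evicted at irregular intervals. What is the cause?

Last edited by WMLM on 2014-12-2 12:56

Environment:
Two IBM 550 servers + one Inspur storage array
AIX 6100-08-02
Oracle 10g RAC 10.2.0.4

Symptom:
CRS is unstable; one node is evicted at irregular intervals.

Goal:
I hope Liu or another expert can find time to look at the uploaded logs and offer some advice. Much appreciated.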



Liu Maclean (刘相兵) posted on 2014-12-2 13:46:51

node1:


[    CSSD]2014-12-01 07:21:14.172 >WARNING: clssnmPollingThread: node node2 (2) at 50% heartbeat fatal, eviction in 14.435 seconds
[    CSSD]2014-12-01 07:21:14.172 >TRACE:   clssnmPollingThread: node node2 (2) is impending reconfig, flag 1, misstime 15565
[    CSSD]2014-12-01 07:21:14.172 >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2014-12-01 07:21:21.202 >WARNING: clssnmPollingThread: node node2 (2) at 75% heartbeat fatal, eviction in 7.405 seconds
[    CSSD]2014-12-01 07:21:22.202 >WARNING: clssnmPollingThread: node node2 (2) at 75% heartbeat fatal, eviction in 6.405 seconds
[    CSSD]2014-12-01 07:21:26.228 >WARNING: clssnmPollingThread: node node2 (2) at 90% heartbeat fatal, eviction in 2.379 seconds
[    CSSD]2014-12-01 07:21:27.232 >WARNING: clssnmPollingThread: node node2 (2) at 90% heartbeat fatal, eviction in 1.375 seconds
[    CSSD]2014-12-01 07:21:28.239 >WARNING: clssnmPollingThread: node node2 (2) at 90% heartbeat fatal, eviction in 0.368 seconds
[    CSSD]2014-12-01 07:21:28.612 >TRACE:   clssnmPollingThread: Eviction started for node node2 (2), flags 0x0001, state 3, wt4c 0
[    CSSD]2014-12-01 07:21:28.612 >TRACE:   clssnmDoSyncUpdate: Initiating sync 3
[    CSSD]2014-12-01 07:21:28.612 >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (27000)ms


node2:


[    CSSD]2014-12-01 09:26:52.478 >TRACE:   clssnmPollingThread: node node1 (1) is impending reconfig, flag 1039, misstime 15026
[    CSSD]2014-12-01 09:26:52.478 >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2014-12-01 09:26:53.480 >WARNING: clssnmPollingThread: node node1 (1) at 50% heartbeat fatal, eviction in 13.973 seconds
[    CSSD]2014-12-01 09:27:00.478 >WARNING: clssnmPollingThread: node node1 (1) at 75% heartbeat fatal, eviction in 6.975 seconds
[    CSSD]2014-12-01 09:27:04.482 >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 2.971 seconds
[    CSSD]2014-12-01 09:27:05.480 >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 1.973 seconds
[    CSSD]2014-12-01 09:27:06.478 >WARNING: clssnmPollingThread: node node1 (1) at 90% heartbeat fatal, eviction in 0.975 seconds
[    CSSD]2014-12-01 09:27:07.455 >TRACE:   clssnmPollingThread: Eviction started for node node1 (1), flags 0x040f, state 3, wt4c 0
[    CSSD]2014-12-01 09:27:07.456 >TRACE:   clssnmDoSyncUpdate: Initiating sync 7
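The countdown in these warnings can be checked against the timestamps. On node1, misstime 15565 ms plus the 14.435 seconds remaining adds up to 30.000 seconds, which would be consistent with a CSS misscount of 30 seconds, although the thread does not state the configured value. Below is a minimal Python sketch, not an Oracle tool, that extracts the warnings and computes the projected eviction time for each. The function name `projected_evictions` and the regular expression are illustrative assumptions; the pattern is deliberately loose about how the percent sign is rendered in the log.

```python
import re
from datetime import datetime, timedelta

# Matches the clssnmPollingThread countdown warnings shown above.
# The ".*?" after the percentage tolerates variations in how "% heartbeat" is rendered.
WARN_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) >WARNING: "
    r"clssnmPollingThread: node (?P<node>\S+) \(\d+\) at (?P<pct>\d+).*?fatal, "
    r"eviction in (?P<left>\d+\.\d+) seconds"
)

def projected_evictions(lines):
    """Return (node, percent, projected eviction time) for each warning line."""
    results = []
    for line in lines:
        match = WARN_RE.search(line)
        if match is None:
            continue  # TRACE lines and unrelated messages are skipped
        logged_at = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        remaining = timedelta(seconds=float(match.group("left")))
        results.append((match.group("node"), int(match.group("pct")), logged_at + remaining))
    return results

sample = [
    "[    CSSD]2014-12-01 07:21:14.172 >WARNING: clssnmPollingThread: node node2 (2) "
    "at 50% heartbeat fatal, eviction in 14.435 seconds",
    "[    CSSD]2014-12-01 07:21:28.239 >WARNING: clssnmPollingThread: node node2 (2) "
    "at 90% heartbeat fatal, eviction in 0.368 seconds",
]
for node, pct, when in projected_evictions(sample):
    print(node, pct, when.strftime("%H:%M:%S.%f")[:-3])
```

Both sample lines project an eviction at 07:21:28.607, a few milliseconds before the "Eviction started" trace at 07:21:28.612, so the countdown ran to completion without a heartbeat from node2 being received in between.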


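Liu Maclean (刘相兵) posted on 2014-12-2 13:47:14

A single piece of evidence proves nothing. Deploy OSW and a script that pings the private network so that the cause can be confirmed the next time it happens.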
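One possible shape for such a private-network ping script is sketched below in Python. It is only an illustration under several assumptions: Python is available on the AIX hosts, the system `ping` accepts `-c 1`, and the host name `node2-priv` and the log path are placeholders, not values from this thread. Each probe is written with a timestamp so it can later be lined up against the CSSD log.

```python
import subprocess
import time
from datetime import datetime

def format_record(when, host, ok, elapsed_s):
    """Build one log line: timestamp, target host, OK/FAIL, elapsed milliseconds."""
    status = "OK" if ok else "FAIL"
    return f"{when:%Y-%m-%d %H:%M:%S} {host} {status} {elapsed_s * 1000:.1f}ms"

def ping_once(host, timeout_s=2.0):
    """Send a single ping; return (success, elapsed seconds)."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # assumes the system ping supports -c
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        ok = False  # no reply in time, or ping could not be started
    return ok, time.monotonic() - start

def monitor(host, logfile, interval_s=1.0):
    """Ping the host forever, appending one timestamped record per probe."""
    while True:
        ok, elapsed = ping_once(host)
        with open(logfile, "a") as fh:
            fh.write(format_record(datetime.now(), host, ok, elapsed) + "\n")
        time.sleep(interval_s)
```

It would be started on each node against the other node's private address, for example `monitor("node2-priv", "/tmp/priv_ping.log")` on node1. A run of FAIL records that overlaps the heartbeat warnings would point toward the interconnect; clean ping records during an eviction would point elsewhere.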

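WMLM posted on 2014-12-2 14:37:40

I also noticed "node node2 (2) at 50% heartbeat fatal, eviction in 14.435 seconds",
but I don't know whether it points to a problem with the heartbeat disk or with the heartbeat network.
Since Liu has pointed this out, I will start deploying OSW and the private network ping right away and post the logs afterwards. Many thanks.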

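不了峰 posted on 2014-12-2 16:40:48

This looks similar to a case I ran into.

After the node is evicted, what state is the host in? Does it hang, or does it reboot automatically?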

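WMLM posted on 2014-12-8 10:03:38

After the node is evicted, the listener stops. The fault has not recurred over the past few days, so I have not collected OSW logs yet. If it happens again, I will collect the OSW logs and upload them. Thanks for following this.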

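不了峰 posted on 2014-12-10 10:34:16

WMLM posted on 2014-12-8 10:03
After the node is evicted, the listener stops. The fault has not recurred over the past few days, so I have not collected OSW logs yet. If it happens again, I will collect the OSW ...

Is it necessary to follow
http://t.askmaclean.com/thread-3551-1-1.html
and set Diagwait to 13, as advised for clusters on versions before 11gR2?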

WMLM posted on 2014-12-10 14:21:14

Setting Diagwait to 13 is part of the best-practice configuration; it was already set when the database was originally installed.