Oracle Database Data Recovery and Performance Tuning

1#
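Posted 2013-11-30 23:17:26 | Views: 4977 | Replies: 4
I have run into a RAC problem. A friend recommended that I ask you about it, and I hope you can help clear it up. Thanks in advance.
I maintain a 4-node RAC, database version 11.2.0.3.3 on AIX, and all 4 nodes restarted one after another.

This is what happened:
1. node1's private network went down for a while (about 20 minutes), and node1 was evicted from the cluster.
2. node1 restarted (at that point its private network was still down), and during startup it killed the other nodes through IMR (the parts highlighted in red are the messages showing node1 killing the other nodes):
See attachment:
node1_alert.txt (8.94 KB, downloads: 859)

What I do not understand is why node1, while restarting, would kill the other nodes, which were healthy.
Have you seen a similar problem? What causes it?
Could it be that node1's own private network was unreachable during startup, but it concluded that the other nodes were at fault and therefore killed them?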
2#
Posted 2013-12-1 10:22:21
Starting background process MARK
Thu Nov 28 10:44:07 2013
MARK started with pid=34, OS id=49873248
NOTE: MARK has subscribed
lmon registered with NM - instance number 1 (internal mem no 0)
Thu Nov 28 10:44:44 2013
Dumping diagnostic data in directory=[cdmp_20131128104444], requested by (instance=2, osid=55378232 (LGWR)), summary=[incident=312193].
Thu Nov 28 10:44:53 2013
LMON (ospid: 63439316) detects hung instances during IMR reconfiguration
LMON (ospid: 63439316) tries to kill the instance 4 in 20 seconds.
Please check instance 4's alert log and LMON trace file for more details.
Thu Nov 28 10:45:13 2013
Remote instance kill is issued with system inc 0
Remote instance kill map (size 1) : 4
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 4
Thu Nov 28 10:45:15 2013
Dumping diagnostic data in directory=[cdmp_20131128104515], requested by (instance=4, osid=12124406 (PMON)), summary=[abnormal instance termination].
Thu Nov 28 10:48:18 2013
Remote instance kill is issued with system inc 66
Remote instance kill map (size 1) : 4
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x40000000
The instance eviction map is 4
Thu Nov 28 10:48:22 2013
Dumping diagnostic data in directory=[cdmp_20131128104821], requested by (instance=4, osid=53281130 (PMON)), summary=[abnormal instance termination].
Thu Nov 28 10:49:04 2013
LMON (ospid: 63439316) detects hung instances during IMR reconfiguration
LMON (ospid: 63439316) tries to kill the instance 3 in 20 seconds.
Please check instance 3's alert log and LMON trace file for more details.
Thu Nov 28 10:49:24 2013
Remote instance kill is issued with system inc 66
Remote instance kill map (size 1) : 3
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 3
Thu Nov 28 10:50:12 2013
No connectivity to other instances in the cluster during startup. Hence, LMON is terminating the instance. Please check the LMON trace file for details. Also, please check the network logs of this instance along with clusterwide network health for problems and then re-start this instance.
LMON (ospid: 63439316): terminating the instance
Thu Nov 28 10:50:13 2013
System state dump requested by (instance=1, osid=63439316 (LMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/bjscnfzc/bjscnfzc1/trace/bjscnfzc1_diag_65536264.trc
Thu Nov 28 10:50:13 2013
ORA-1092 : opitsk aborting process
Thu Nov 28 10:50:13 2013
License high water mark = 2
Dumping diagnostic data in directory=[cdmp_20131128105012], requested by (instance=1, osid=63439316 (LMON)), summary=[abnormal instance termination].
Instance terminated by LMON, pid = 63439316
USER (ospid: 62128628): terminating the instance
Instance terminated by USER, pid = 62128628
Thu Nov 28 11:00:21 2013
Please compress and upload this trace file:

/u01/app/oracle/diag/rdbms/bjscnfzc/bjscnfzc1/trace/bjscnfzc1_diag_65536264.trc

and also the LMON trace from the same time.
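
To make the kill sequence in the alert log excerpt easier to follow, the IMR kill messages can be pulled out with a short script. This is only a sketch: the regular expressions are derived from the message texts quoted in this post, the function name `summarize_kills` is made up for illustration, and alert logs from other versions may word these messages differently.

```python
import re

# Patterns derived from the messages quoted in this thread (assumption:
# other Oracle versions may phrase them differently).
KILL_PLAN = re.compile(
    r"LMON \(ospid: (\d+)\) tries to kill the instance (\d+) in (\d+) seconds"
)
KILL_MAP = re.compile(r"Remote instance kill map \(size \d+\) : (.+)")
# Alert log timestamp line, with an optional "12. " style forum prefix.
TIMESTAMP = re.compile(
    r"^(?:\d+\.\s+)?([A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})$"
)


def summarize_kills(lines):
    """Return (timestamp, kind, target) tuples for IMR kill messages.

    kind is "planned" for the "tries to kill" warning and "issued" for
    the "Remote instance kill map" line that follows it.
    """
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        ts = TIMESTAMP.match(line)
        if ts:
            current_ts = ts.group(1)
            continue
        planned = KILL_PLAN.search(line)
        if planned:
            events.append((current_ts, "planned", planned.group(2)))
            continue
        issued = KILL_MAP.search(line)
        if issued:
            events.append((current_ts, "issued", issued.group(1).strip()))
    return events


# A few lines copied from the excerpt in this post.
sample = """Thu Nov 28 10:44:53 2013
LMON (ospid: 63439316) detects hung instances during IMR reconfiguration
LMON (ospid: 63439316) tries to kill the instance 4 in 20 seconds.
Thu Nov 28 10:45:13 2013
Remote instance kill is issued with system inc 0
Remote instance kill map (size 1) : 4
""".splitlines()

for event in summarize_kills(sample):
    print(event)
```

Running the same function over the alert logs of all four instances and sorting by timestamp would give a single timeline of which instance tried to kill which, which is usually easier to reason about than reading each log separately.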

3#
Posted 2013-12-1 10:30:11
BUG 14135323 - RAC INSTANCE CANNOT JOIN RUNNING RAC
BUG 14550939 - LMON TERMINATING INSTANCE

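I did not find a case that was confirmed as a real bug. One option is to set cluster_interconnects so that the instances do not use HAIP, because HAIP has some known problems.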
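
As a rough illustration of the cluster_interconnects suggestion, the sketch below generates one `ALTER SYSTEM` statement per instance. `cluster_interconnects` is a static initialization parameter, so it is set with `SCOPE=SPFILE` and a per-instance `SID`, and should only take effect after the instance is restarted. Everything else here is an assumption: the helper name `build_interconnect_statements` is made up, only the instance name bjscnfzc1 appears in this thread (in the trace file path), and the other instance name and all IP addresses are hypothetical placeholders.

```python
def build_interconnect_statements(private_ips):
    """Build one ALTER SYSTEM statement per instance.

    private_ips maps an instance name (SID) to the private interconnect
    IP address that instance should use.
    """
    statements = []
    for sid, ip in sorted(private_ips.items()):
        statements.append(
            f"ALTER SYSTEM SET cluster_interconnects='{ip}' "
            f"SCOPE=SPFILE SID='{sid}';"
        )
    return statements


# Hypothetical values: only bjscnfzc1 is named in the thread; the second
# instance name and both IP addresses are invented for illustration.
example = {
    "bjscnfzc1": "192.168.10.1",
    "bjscnfzc2": "192.168.10.2",
}

for stmt in build_interconnect_statements(example):
    print(stmt)
```

After a restart, the interconnect actually in use can be checked by querying `gv$cluster_interconnects`; HAIP addresses normally fall in the 169.254.x.x link-local range, so seeing the configured private IP there instead would suggest the setting has taken effect.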

4#
Posted 2013-12-1 22:22:05
Oh, what problems does using HAIP cause? That seems quite important.

5#
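Posted 2013-12-2 18:45:28
ks2000ks1 wrote on 2013-12-1 22:22:
Oh, what problems does using HAIP cause? That seems quite important.

Please follow the action plan in post 2#. The action plan is the key to answering this question!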
