Oracle Database Data Recovery and Performance Tuning

1#
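Posted 2012-12-11 00:09:03 | Views: 9495 | Replies: 12
Last edited by xinxin415415 on 2012-12-12 15:20

Environment: AIX 5.3, database version 10.2.0.4 RAC, on raw devices.
Node 2 went down at around 22:00 on Nov 30.
Attached are the alert logs of both nodes and the related trace files.
Could Liu and the other experts please help analyze this? Thanks!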

alert trace.rar

5.53 MB, downloads: 1118

ocssd clusteralert.rar

69.96 KB, downloads: 2483

errpt.rar

12.49 KB, downloads: 2444

2#
Posted 2012-12-11 12:15:30
Fri Nov 30 22:02:38 2012
LMS1 (ospid: 6320324) is not heartbeating for 203 seconds.
LMS2 (ospid: 6053930) is not heartbeating for 218 seconds.
LMS3 (ospid: 892940) is not heartbeating for 209 seconds.
LMS5 (ospid: 5898294) is not heartbeating for 206 seconds.
Fri Nov 30 22:03:44 2012
LMD0 (ospid: 1122498) is not heartbeating for 248 seconds.
LMS0 (ospid: 790670) is not heartbeating for 218 seconds.
LMS4 (ospid: 4989168) is not heartbeating for 230 seconds.
Fri Nov 30 22:04:23 2012
IPC Send timeout detected. Receiver ospid 5898294
Receiver is waiting for a latch dumping latch state for receiver -15820
Fri Nov 30 22:04:31 2012
Errors in file /oracle/admin/accdb/udump/accdb1_ora_1757348.trc:
Fri Nov 30 22:04:53 2012
Errors in file /oracle/admin/accdb/bdump/accdb1_lms5_5898294.trc:
Fri Nov 30 22:05:15 2012

AIX + 10.2.0.4.0
I suggest you first rule out the network as the cause of the abnormal heartbeat. At the very least, post the errpt log from that time.
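
As a rough reading aid for the alert log excerpt above: each "is not heartbeating for N seconds" line can be turned into an estimated time of that process's last heartbeat by subtracting N from the timestamp line that precedes it. The sketch below does this for the seven lines quoted. The function name `stall_starts` is made up for illustration, and it assumes that each message belongs to the nearest preceding timestamp line and that the default English (C) locale is in effect for parsing day and month names.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the alert log excerpt quoted in this post.
ALERT_EXCERPT = """\
Fri Nov 30 22:02:38 2012
LMS1 (ospid: 6320324) is not heartbeating for 203 seconds.
LMS2 (ospid: 6053930) is not heartbeating for 218 seconds.
LMS3 (ospid: 892940) is not heartbeating for 209 seconds.
LMS5 (ospid: 5898294) is not heartbeating for 206 seconds.
Fri Nov 30 22:03:44 2012
LMD0 (ospid: 1122498) is not heartbeating for 248 seconds.
LMS0 (ospid: 790670) is not heartbeating for 218 seconds.
LMS4 (ospid: 4989168) is not heartbeating for 230 seconds.
"""

TS_FORMAT = "%a %b %d %H:%M:%S %Y"
MSG = re.compile(r"^(\w+) \(ospid: (\d+)\) is not heartbeating for (\d+) seconds\.")


def stall_starts(text):
    """Map each process name to the estimated time of its last heartbeat."""
    current = None  # most recent timestamp line seen so far
    result = {}
    for line in text.splitlines():
        line = line.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        m = MSG.match(line)
        if m and current is not None:
            result[m.group(1)] = current - timedelta(seconds=int(m.group(3)))
    return result


for name, started in sorted(stall_starts(ALERT_EXCERPT).items(), key=lambda kv: kv[1]):
    print(name, started.strftime("%H:%M:%S"))
```

On this excerpt the estimated last heartbeats all fall between 21:59:00 and 22:00:06, which is consistent with the roughly 22:00 time frame given in the first post.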

3#
Posted 2012-12-11 12:37:32
There is no problem on the hardware side.
xjcrmdb2:oracle:/oracle>errpt -d H
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DCB47997   1210210812 T H hdisk55        DISK OPERATION ERROR
DCB47997   1210210612 T H hdisk59        DISK OPERATION ERROR
DCB47997   1123115612 T H hdisk125       DISK OPERATION ERROR
B6267342   0919171412 P H hdisk48        DISK OPERATION ERROR
B6267342   0919171312 P H hdisk48        DISK OPERATION ERROR
B6267342   0919171312 P H hdisk47        DISK OPERATION ERROR
B6267342   0919171312 P H hdisk47        DISK OPERATION ERROR
B6267342   0919171312 P H hdisk46        DISK OPERATION ERROR
B6267342   0919171312 P H hdisk46        DISK OPERATION ERROR
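
The TIMESTAMP column of the errpt summary is in MMDDhhmmYY form, which is easy to misread. The sketch below decodes the rows listed above and checks whether any of them is dated on the day of the incident (Nov 30, 2012). `decode_errpt_timestamp` is a hypothetical helper written for this post, not part of any AIX tool, and it assumes two-digit years belong to the 2000s.

```python
from datetime import datetime

# (identifier, timestamp, resource) copied from the errpt -d H output above.
ERRPT_ROWS = [
    ("DCB47997", "1210210812", "hdisk55"),
    ("DCB47997", "1210210612", "hdisk59"),
    ("DCB47997", "1123115612", "hdisk125"),
    ("B6267342", "0919171412", "hdisk48"),
    ("B6267342", "0919171312", "hdisk48"),
    ("B6267342", "0919171312", "hdisk47"),
    ("B6267342", "0919171312", "hdisk47"),
    ("B6267342", "0919171312", "hdisk46"),
    ("B6267342", "0919171312", "hdisk46"),
]


def decode_errpt_timestamp(ts):
    """Decode an errpt summary timestamp (MMDDhhmmYY) into a datetime."""
    mm, dd, hh, mi, yy = (int(ts[i:i + 2]) for i in range(0, 10, 2))
    return datetime(2000 + yy, mm, dd, hh, mi)  # assumption: years are 20YY


incident_day = datetime(2012, 11, 30).date()
decoded = [(ident, decode_errpt_timestamp(ts), res) for ident, ts, res in ERRPT_ROWS]
on_incident_day = [row for row in decoded if row[1].date() == incident_day]

for ident, when, res in decoded:
    print(ident, when.strftime("%Y-%m-%d %H:%M"), res)
print("entries dated on the incident day:", len(on_incident_day))
```

For the rows above, the hardware entries decode to Dec 10, Nov 23 and Sep 19, 2012, so none of them is dated Nov 30.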

4#
Posted 2012-12-11 12:56:49
xjcrmdb1:oracle:/oracle/admin/crmdb/bdump>errpt|more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6267342   1210170212 P H hdisk48        DISK OPERATION ERROR
B6267342   1210170212 P H hdisk46        DISK OPERATION ERROR
9D30B78E   1203141112 T S tty0           RECEIVER OVER-RUN ON INPUT
9D30B78E   1128121712 T S tty0           RECEIVER OVER-RUN ON INPUT
9D30B78E   1123135512 T S tty0           RECEIVER OVER-RUN ON INPUT
DCB47997   1123115612 T H hdisk83        DISK OPERATION ERROR
DCB47997   1123115612 T H hdisk30        DISK OPERATION ERROR
C69F5C9B   1113220112 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
C69F5C9B   1113214112 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
C69F5C9B   1113212112 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
C69F5C9B   1113204112 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
3C81E43F   0628133512 P U topsvcs        Late in sending heartbeat
3C81E43F   0701011411 P U topsvcs        Late in sending heartbeat
3C81E43F   0629222111 P U topsvcs        Late in sending heartbeat
3C81E43F   0628204011 P U topsvcs        Late in sending heartbeat
3C81E43F   0627191211 P U topsvcs        Late in sending heartbeat
3C81E43F   0625000811 P U topsvcs        Late in sending heartbeat
3C81E43F   0623203911 P U topsvcs        Late in sending heartbeat
3C81E43F   0619200511 P U topsvcs        Late in sending heartbeat
3C81E43F   0619112711 P U topsvcs        Late in sending heartbeat
3C81E43F   0608013611 P U topsvcs        Late in sending heartbeat
3C81E43F   0606202411 P U topsvcs        Late in sending heartbeat
3C81E43F   0530191911 P U topsvcs        Late in sending heartbeat
3C81E43F   0511114711 P U topsvcs        Late in sending heartbeat
3C81E43F   0426175811 P U topsvcs        Late in sending heartbeat
3C81E43F   0425192511 P U topsvcs        Late in sending heartbeat

5#
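Posted 2012-12-11 16:08:03
Last edited by xinxin415415 on 2012-12-11 16:10
Liu Maclean(刘相兵 wrote on 2012-12-11 12:15:
AIX + 10.2.0.4.0
I suggest you first rule out the network as the cause of the abnormal heartbeat. At the very least, post the errpt log from that time ...

I have replaced the alert logs in the attachment with the complete logs. Liu, please take another look at the attachment. Thanks.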

6#
Posted 2012-12-11 21:22:10
Timeline

[    CSSD]2012-11-30 09:27:51.137 [3857] >TRACE:   clssgmSendClient: Send failed rc 6, con (111f34e50), client (111f35230), proc (0)
[    CSSD]2012-11-30 21:41:16.150 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.315 seconds seedhbimpd 0
[    CSSD]2012-11-30 21:41:16.150 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15685
[    CSSD]2012-11-30 21:41:16.150 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2012-11-30 21:41:23.153 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 75% heartbeat fatal, eviction in 7.312 seconds seedhbimpd 1
[    CSSD]2012-11-30 21:41:24.153 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 75% heartbeat fatal, eviction in 6.312 seconds seedhbimpd 1
[    CSSD]2012-11-30 21:41:27.179 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (200000)ms impending reconfig status(0)
[    CSSD]2012-11-30 22:00:15.975 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.744 seconds seedhbimpd 0
[    CSSD]2012-11-30 22:00:15.975 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15256
[    CSSD]2012-11-30 22:00:15.976 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2012-11-30 22:00:16.976 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 13.743 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:00:18.976 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (200000)ms impending reconfig status(0)
[    CSSD]2012-11-30 22:06:04.262 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.753 seconds seedhbimpd 0
[    CSSD]2012-11-30 22:06:04.262 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15247
[    CSSD]2012-11-30 22:06:04.262 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2012-11-30 22:06:05.262 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 13.753 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:12.267 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 75% heartbeat fatal, eviction in 6.748 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:49:33.956 >USER:    Copyright 2012, Oracle version 10.2.0.4.0


Starting at 2012-11-30 09:27:51.137, node 2 saw heartbeat problems with node 1, continuing until the eviction at 22:06.
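
One arithmetic check that helps when reading the timeline above: at each timestamp where CSSD prints both a remaining "eviction in N seconds" value and a "misstime" value, the two can be added up. The sketch below does this for the three such pairs in node 2's log. The regular expressions and names are my own, the log lines are copied from the timeline with the percent text normalized, and reading the constant sum as the network heartbeat timeout (often called the CSS misscount) is an interpretation rather than something the log states.

```python
import re

CSSD_LINES = [
    "[    CSSD]2012-11-30 21:41:16.150 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.315 seconds seedhbimpd 0",
    "[    CSSD]2012-11-30 21:41:16.150 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15685",
    "[    CSSD]2012-11-30 22:00:15.975 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.744 seconds seedhbimpd 0",
    "[    CSSD]2012-11-30 22:00:15.975 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15256",
    "[    CSSD]2012-11-30 22:06:04.262 [4114] >WARNING: clssnmPollingThread: node xjaccdb1 (1) at 50% heartbeat fatal, eviction in 14.753 seconds seedhbimpd 0",
    "[    CSSD]2012-11-30 22:06:04.262 [4114] >TRACE:   clssnmPollingThread: node xjaccdb1 (1) is impending reconfig, flag 1039, misstime 15247",
]

# Only the timestamp and the numeric fields are matched, not the message text.
EVICT = re.compile(r"\](\d{4}-\d\d-\d\d [\d:.]+) .*eviction in ([\d.]+) seconds")
MISS = re.compile(r"\](\d{4}-\d\d-\d\d [\d:.]+) .*misstime (\d+)")


def timeout_totals(lines):
    """Return {timestamp: misstime_ms + remaining_ms} for timestamps with both values."""
    remaining, missed = {}, {}
    for line in lines:
        m = EVICT.search(line)
        if m:
            remaining[m.group(1)] = round(float(m.group(2)) * 1000)
        m = MISS.search(line)
        if m:
            missed[m.group(1)] = int(m.group(2))
    return {ts: missed[ts] + remaining[ts] for ts in missed if ts in remaining}


totals = timeout_totals(CSSD_LINES)
for ts, total_ms in sorted(totals.items()):
    print(ts, total_ms, "ms")
```

All three pairs add up to 30000 ms, which suggests a 30 second heartbeat timeout was in effect and that node 1 was roughly halfway to eviction each time the first warning was printed.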
[    CSSD]2012-11-30 22:06:19.558 [2829] >WARNING: clssnmeventhndlr: Receive failure with node 2 (xjaccdb2), state 3, con(111f1a410), probe(0), rc=11
[    CSSD]2012-11-30 22:06:29.797 [3857] >TRACE:   clssgmPeerDeactivate: node 2 (xjaccdb2), death 0, state 0x1 connstate 0xf
[    CSSD]2012-11-30 22:06:29.797 [2829] >TRACE:   clssnmDiscHelper: xjaccdb2, node(2) connection failed, con (111f1a410), probe(0)
[    CSSD]2012-11-30 22:06:29.796 [4371] >TRACE:   clscsendx: (111f1a410) Physical connection (111f18eb0) not active
[    CSSD]2012-11-30 22:06:29.801 [4371] >WARNING: clssnmsendmsg: send failed, node 2, type 3, rc 11
[    CSSD]2012-11-30 22:06:33.812 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 50% heartbeat fatal, eviction in 14.820 seconds seedhbimpd 0
[    CSSD]2012-11-30 22:06:33.812 [4114] >TRACE:   clssnmPollingThread: node xjaccdb2 (2) is impending reconfig, flag 1, misstime 15180
[    CSSD]2012-11-30 22:06:33.812 [4114] >TRACE:   clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[    CSSD]2012-11-30 22:06:34.814 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 50% heartbeat fatal, eviction in 13.818 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:41.819 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 75% heartbeat fatal, eviction in 6.813 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:45.819 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 90% heartbeat fatal, eviction in 2.813 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:46.827 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 90% heartbeat fatal, eviction in 1.805 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:47.834 [4114] >WARNING: clssnmPollingThread: node xjaccdb2 (2) at 90% heartbeat fatal, eviction in 0.797 seconds seedhbimpd 1
[    CSSD]2012-11-30 22:06:48.635 [4114] >TRACE:   clssnmPollingThread: Eviction started for node xjaccdb2 (2), flags 0x0001, state 3, wt4c 0 seedhbimpd 1
[    CSSD]2012-11-30 22:06:48.635 [4628] >TRACE:   clssnmDoSyncUpdate: Initiating sync 3
[    CSSD]2012-11-30 22:06:48.635 [4628] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[    CSSD]2012-11-30 22:06:48.635 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (11)
[    CSSD]2012-11-30 22:06:48.635 [4628] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE
[    CSSD]2012-11-30 22:06:48.635 [4628] >TRACE:   clssnmSendSync: syncSeqNo(3)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[    CSSD]2012-11-30 22:06:48.636 [2829] >TRACE:   clssnmHandleSync: diskTimeout set to (27000)ms
[    CSSD]2012-11-30 22:06:48.636 [2829] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[xjaccdb1] seq[9] sync[3]
[    CSSD]2012-11-30 22:06:48.636 [1] >USER:    NMEVENT_SUSPEND [00][00][00][06]
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, xjaccdb2, misstime(30004) state(5), seedhbimpd(1)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2012-11-30 22:06:48.636 [2829] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(3)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmEvict: Start
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmEvict: Evicting node 2, xjaccdb2, birth 2, death 3, impendingrcfg 1, stateflags 0x1
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmWaitOnEvictions: node 2, xjaccdb2, is not dead, seedhbimpd 1
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmCheckKillStatus: Node 2, xjaccdb2, down, LATS(1290002121),timeout(14822)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (15)
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2012-11-30 22:06:48.636 [4628] >TRACE:   clssnmSendUpdate: syncSeqNo(3)
[    CSSD]2012-11-30 22:06:59.438 [4628] >WARNING: CLSSNMCTX_NODEDB_UNLOCK: lock held for 10802 ms
[    CSSD]2012-11-30 22:06:59.438 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2012-11-30 22:06:59.441 [2829] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2012-11-30 22:06:59.441 [2829] >TRACE:   clssnmUpdateNodeState: node 1, state (3/3) unique (1354218233/1354218233) prevConuni(0) birth (1/1) (old/new)
[    CSSD]2012-11-30 22:06:59.441 [2829] >TRACE:   clssnmUpdateNodeState: node 2, state (5/0) unique (1354219454/1354219454) prevConuni(1354219454) birth (2/2) (old/new)
[    CSSD]2012-11-30 22:06:59.441 [2829] >TRACE:   clssnmDeactivateNode: node 2 (xjaccdb2) left cluster
[    CSSD]2012-11-30 22:06:59.441 [2829] >USER:    clssnmHandleUpdate: SYNC(3) from node(1) completed
[    CSSD]2012-11-30 22:06:59.441 [2829] >USER:    clssnmHandleUpdate: NODE 1 (xjaccdb1) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-11-30 22:06:59.441 [2829] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2012-11-30 22:06:59.442 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2012-11-30 22:06:59.442 [4628] >TRACE:   clssnmDoSyncUpdate: Sync 3 complete!
[    CSSD]2012-11-30 22:06:59.450 [4887] >TRACE:   clssgmReconfigThread:  started for reconfig (3)
[    CSSD]2012-11-30 22:06:59.450 [4887] >USER:    NMEVENT_RECONFIG [00][00][00][02]
[    CSSD]2012-11-30 22:06:59.450 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2
[    CSSD]2012-11-30 22:06:59.450 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(crs_version) birth(2/0)
[    CSSD]2012-11-30 22:06:59.453 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_accdb type 2
[    CSSD]2012-11-30 22:06:59.453 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_accdb type 3
[    CSSD]2012-11-30 22:06:59.453 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_accdb type 2
[    CSSD]2012-11-30 22:07:00.190 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_accdb) birth(2/0)
[    CSSD]2012-11-30 22:07:00.190 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_accdb type 3
[    CSSD]2012-11-30 22:07:00.190 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_accdb) birth(2/0)
[    CSSD]2012-11-30 22:07:00.190 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock DBACCDB type 2
[    CSSD]2012-11-30 22:07:00.190 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DBACCDB) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock DGACCDB type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DGACCDB) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(DAALL_DB) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(CRSDMAIN) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(EVMDMAIN) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock IGACCDBALL type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(IGACCDBALL) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(ocr_crs) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock IGACCDBACCDB1 type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock IGACCDBACCDB2 type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(IGACCDBACCDB2) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(#CSS_CLSSOMON) birth(2/0)
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_xjaccdb1 type 3
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_xjaccdb2 type 3
[    CSSD]2012-11-30 22:07:00.191 [4887] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_xjaccdb2) birth(2/0)
[    CSSD]2012-11-30 22:07:00.192 [4887] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 3
[    CSSD]2012-11-30 22:07:00.192 [3857] >TRACE:   clssgmPeerListener: connects done (1/1)
[    CSSD]2012-11-30 22:07:00.192 [4887] >TRACE:   clssgmEstablishMasterNode: MASTER for 3 is node(1) birth(1)
[    CSSD]2012-11-30 22:07:00.192 [4887] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2012-11-30 22:07:00.241 [4887] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 3 with 1 nodes
[    CSSD]CLSS-3001: local node number 1, master node number 1

Node 1 detected a heartbeat problem with node 2 at 22:06:19.558 and evicted node 2 at 22:06:59.441.
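
To put numbers on that summary, the sketch below computes the offsets between three events taken from node 1's CSSD log: the first receive failure, the start of the eviction, and the point where node 2 left the cluster. The event labels and the `offsets` helper are my own shorthand; the timestamps are copied from the log lines.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (label, timestamp) pairs taken from node 1's CSSD log.
EVENTS = [
    ("receive failure with node 2", "2012-11-30 22:06:19.558"),
    ("eviction started for node 2", "2012-11-30 22:06:48.635"),
    ("node 2 left cluster", "2012-11-30 22:06:59.441"),
]


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    parsed = [(label, datetime.strptime(ts, FMT)) for label, ts in events]
    start = parsed[0][1]
    return [(label, (when - start).total_seconds()) for label, when in parsed]


for label, secs in offsets(EVENTS):
    print(f"+{secs:7.3f}s  {label}")
```

The eviction starts about 29 seconds after the first receive failure, and node 2 is out of the cluster about 40 seconds after it. Most of the remaining gap matches the "lock held for 10802 ms" warning in the same log.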

7#
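Posted 2012-12-11 21:27:20
Given the visible IPC send timeout and network heartbeat problems, I think that if this was not caused by a network interruption, it may be related to abnormal host load.

Please provide the AWR and OSW (OSWatcher) logs from that time.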

8#
Posted 2012-12-11 21:54:06
Right now I only have the nmon monitoring reports, and they are too large to upload.

9#
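Posted 2012-12-11 22:27:49
Liu Maclean(刘相兵 wrote on 2012-12-11 21:27:
Given the visible IPC send timeout and network heartbeat problems, I think that if this was not caused by a network interruption, it may be related to abnormal ...

I have uploaded the nmon data to a network drive. nmon for node 1:
http://pan.baidu.com/share/link?shareid=188412&uk=704910983
nmon for node 2:
http://pan.baidu.com/share/link?shareid=188413&uk=704910983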

10#
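Posted 2012-12-12 14:43:19
I took a quick look: around 22:00 neither the CPU nor the disks seem to be particularly busy. If you need further diagnosis, I suggest you get an on-site engineer to look at it.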

11#
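Posted 2012-12-12 15:24:57
Liu Maclean(刘相兵 wrote on 2012-12-12 14:43:
I took a quick look: around 22:00 neither the CPU nor the disks seem to be particularly busy. If you need further diagnosis, I suggest you get an on-site engineer to look at it ...

Liu, I found the following error in errpt around 22:00. It looks like the load was too heavy, which caused the heartbeat connection to time out: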
LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Fri Nov 30 21:41:38 BEIST 2012
Sequence Number: 189005
Machine Id:      00C200A44C00
Node Id:         xjaccdb1
Class:           S
Type:            PERM
Resource Name:   topsvcs

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.24,5973
ERROR ID
6BUfAx.GS9iE/fGV/56EK00...................
REFERENCE CODE

Thread which was blocked
receive thread
Interval in seconds during which process was blocked
          32
Interface name
en1
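
A rough way to test whether this errpt entry lines up with the CSS heartbeat warnings is to compare time windows. The sketch below does that under several assumptions of mine: that the errpt Date/Time (21:41:38 on xjaccdb1) marks the end of the 32 second blocked interval, that node 2's first warning at 21:41:16.150 with misstime 15685 ms dates the last heartbeat it received from node 1, that the diskTimeout reset at 21:41:27.179 marks recovery, and that the clocks on both nodes are close enough to compare. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def window_before(end, seconds):
    """Return the (start, end) window covering `seconds` before `end`."""
    return (end - timedelta(seconds=seconds), end)


def overlap_seconds(a, b):
    """Length in seconds of the overlap of two (start, end) windows, 0 if none."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, (end - start).total_seconds())


# errpt on xjaccdb1: NIM receive thread blocked for 32 s, logged at 21:41:38.
nim_window = window_before(datetime(2012, 11, 30, 21, 41, 38), 32)

# CSSD on node 2: warning at 21:41:16.150 with misstime 15685 ms,
# diskTimeout back to 200000 ms (status 0) at 21:41:27.179.
css_warning = datetime(2012, 11, 30, 21, 41, 16, 150000)
css_window = (
    css_warning - timedelta(milliseconds=15685),
    datetime(2012, 11, 30, 21, 41, 27, 179000),
)

print("NIM blocked  :", nim_window[0].time(), "to", nim_window[1].time())
print("CSS missed HB:", css_window[0].time(), "to", css_window[1].time())
print("overlap (s)  :", overlap_seconds(nim_window, css_window))
```

Under these assumptions the two windows overlap by about 21 seconds. That would be consistent with something stalling node 1 around 21:41, though it does not by itself show whether the cause was load, memory pressure, or the network.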

12#
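Posted 2012-12-12 16:09:35
xinxin415415 wrote on 2012-12-12 15:24:
Liu, I found the following error in errpt around 22:00. It looks like the load was too heavy, which caused the heartbeat connection to time out

I took a brief look; memory and I/O were still fine during the problem period. I suggest you also get the host vendor involved in this issue.

Help over the internet ends here; this really touches on commercial interests.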

回复 只看该作者 道具 举报

13#
Posted 2012-12-12 16:56:08
OK, thanks for your help, Liu.
