Oracle Database Data Recovery and Performance Tuning

1#
Posted 2013-6-7 09:53:31 | Views: 5542 | Replies: 7
Last edited by huiwenshu on 2013-6-7 10:07

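Environment:
AIX6.1
HACMP6.1
RAC 10.2.0.5
Symptom: after node 1 went down, CRS on node 2 could not start either. The interconnect is a direct connection, and netstat -in still showed both the public and priv IPs. Could everyone help take a look? The attachment contains node 2's cssd.log and crsd.log from that time. Node 1 went down a bit past 3 o'clock on 06-06. Below is part of cssd.log: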
$ tail -n 30000 ocssd.l01 | more
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: set curtime (815061278) for my node
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: scan 5 nodes
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: node(3), state(0), cont (1), sleep (0), diskHB 1, diskinfo
110951850
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: node(3), LAT (1)
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: node(4), state(1), cont (0), sleep (0), diskHB 1, diskinfo
110951850
[    CSSD]2013-06-06 05:42:08.502 [4885] >TRACE:   clssnmLocalJoinEvent: No sleeping for mynode(4)
[    CSSD]2013-06-06 05:42:08.502 [4885] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2013-06-06 05:42:09.283 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:10.283 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:11.283 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:11.364 [4628] >TRACE:   clssnmSendingThread: sending join msg to all nodes
[    CSSD]2013-06-06 05:42:11.364 [4628] >TRACE:   clssnmSendingThread: sent 5 join msgs to all nodes
[    CSSD]2013-06-06 05:42:12.283 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:13.283 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:14.284 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:15.284 [4114] >TRACE:   clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmRcfgMgrThread: Local Join
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: begin on node(4), waittime 193000
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: curTime (815068280) - LAT (814983171) = 85109, for node (3)
, waittime 193000
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: curTime (815068280) - LAT (814983427) = 84853, for node (4)
, waittime 193000
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: set curtime (815068280) for my node
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: scan 5 nodes
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: node(3), state(0), cont (1), sleep (0), diskHB 1, diskinfo
110951850
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: node(3), LAT (1)
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: node(4), state(1), cont (0), sleep (0), diskHB 1, diskinfo
110951850
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: No sleeping for mynode(4)
[    CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.
[    CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmReconfigThread: state(1) clusterState(0) exit
[    CSSD]2013-06-06 05:42:15.504 [4885] >ERROR:   ###################################
[    CSSD]2013-06-06 05:42:15.504 [4885] >ERROR:   clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
[    CSSD]2013-06-06 05:42:15.504 [4885] >ERROR:   ###################################
[    CSSD]--- DUMP GROCK STATE DB ---
[    CSSD]--- END OF GROCK STATE DUMP ---

Attachment: debug.zip (3.56 MB, downloads: 621), cssd and crsd logs
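
For anyone reading along: the repeating clssnmLocalJoinEvent lines in the excerpt print a curTime, a LAT, and a waittime for each node. Below is a minimal Python sketch that pulls those numbers out and redoes the subtraction. The function name `join_checks` and the `remaining` field are made up for illustration. The sketch does not claim to know how CSSD itself uses these values; it only repeats the arithmetic the log already shows, and it tolerates the line wrapping seen in the paste.

```python
import re

# Two lines copied from the ocssd.log excerpt, including the wrap before ", waittime".
SAMPLE = """\
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: curTime (815068280) - LAT (814983171) = 85109, for node (3)
, waittime 193000
[    CSSD]2013-06-06 05:42:15.504 [4885] >TRACE:   clssnmLocalJoinEvent: curTime (815068280) - LAT (814983427) = 84853, for node (4)
, waittime 193000
"""

# \s* before ", waittime" lets the pattern match across the wrapped line break.
PATTERN = re.compile(
    r"curTime \((\d+)\) - LAT \((\d+)\) = (\d+), for node \((\d+)\)\s*, waittime (\d+)"
)


def join_checks(text):
    """Return one dict per 'curTime - LAT' entry found in the text."""
    rows = []
    for match in PATTERN.finditer(text):
        cur, lat, logged, node, wait = map(int, match.groups())
        elapsed = cur - lat
        rows.append(
            {
                "node": node,
                "elapsed": elapsed,           # recomputed curTime - LAT
                "logged": logged,             # value printed in the log
                "waittime": wait,
                "remaining": wait - elapsed,  # hypothetical helper field
            }
        )
    return rows


if __name__ == "__main__":
    for row in join_checks(SAMPLE):
        print(row)
```

In both sample entries the recomputed difference matches the logged one, and it is still well below the printed waittime of 193000.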

2#
Posted 2013-6-7 17:11:59
[    CSSD]2013-06-06 05:42:16.505 [4885] >TRACE:   0x110017190 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[    CSSD]2013-06-06 06:19:17.670 >USER:    Copyright 2013, Oracle version 10.2.0.5.0
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=bidb02DBG_CSSD))


At 06:19:17.670, after the node was rebooted, CSS started working.


3#
Posted 2013-6-7 17:16:04
[    CSSD]2013-06-06 06:57:20.155 [5142] >TRACE:   clssgmUpdateEventValue: CmInfo State  val 9, changes 8
[    CSSD]2013-06-06 06:57:20.155 [5142] >TRACE:   clssgmCompareSwapEventValue: changed CmInfo State  val 10, from 9, changes 9
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 2 with 2 nodes

[    CSSD]CLSS-3001: local node number 4, master node number 3



From the log, cssd did come up.


4#
Posted 2013-6-7 17:17:16
2013-06-06 06:57:34.353: [  CRSRES][11010]32Start of `ora.bidb02.LISTENER_BIDB02.lsnr` on member `bidb02` succeeded.
2013-06-06 06:57:35.308: [  CRSRES][12311]32CRS-1002: Resource 'ora.bidb02.LISTENER_BIDB02.lsnr' is already running on member 'bidb02'

2013-06-06 06:58:01.531: [  CRSRES][12318]32startRunnable: setting CLI values
2013-06-06 06:58:01.540: [  CRSRES][12318]32Attempting to start `ora.bidb02.ons` on member `bidb02`
2013-06-06 06:58:03.965: [  CRSRES][12318]32Start of `ora.bidb02.ons` on member `bidb02` succeeded.
2013-06-06 06:58:04.555: [  CRSRES][11267]32Start of `ora.GZBI.GZBI2.inst` on member `bidb02` succeeded.
2013-06-06 06:58:04.557: [  CRSRES][11299]32Skip online resource: ora.bidb02.ons
2013-06-06 06:58:04.581: [  CRSRES][11042]32startRunnable: setting CLI values
2013-06-06 06:58:04.591: [  CRSRES][11042]32Attempting to start `ora.bidb02.gsd` on member `bidb02`
2013-06-06 06:58:04.624: [  CRSRES][12581]32CRS-1002: Resource 'ora.bidb02.LISTENER_BIDB02.lsnr' is already running on member 'bidb02'

2013-06-06 06:58:09.739: [  CRSRES][11042]32Start of `ora.bidb02.gsd` on member `bidb02` succeeded.
2013-06-06 06:58:09.760: [  OCRUTL][6891]u_freem: mem passed is null



Around 06:58, LISTENER_BIDB02, gsd, ons, and GZBI2.inst all came up on bidb02. I don't know what you mean by "cannot start".
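
A quick way to get the same list out of a longer crsd.log is to filter for the "Start of ... succeeded" lines. The Python sketch below is only an illustration: the name `started_resources` is made up, the sample lines are copied from the excerpt above, and the regular expression assumes every line follows the layout seen there (including the stray "32" after the thread id).

```python
import re

# Lines copied from the crsd.log excerpt in this post.
SAMPLE = """\
2013-06-06 06:57:34.353: [  CRSRES][11010]32Start of `ora.bidb02.LISTENER_BIDB02.lsnr` on member `bidb02` succeeded.
2013-06-06 06:58:01.540: [  CRSRES][12318]32Attempting to start `ora.bidb02.ons` on member `bidb02`
2013-06-06 06:58:03.965: [  CRSRES][12318]32Start of `ora.bidb02.ons` on member `bidb02` succeeded.
2013-06-06 06:58:04.555: [  CRSRES][11267]32Start of `ora.GZBI.GZBI2.inst` on member `bidb02` succeeded.
2013-06-06 06:58:09.739: [  CRSRES][11042]32Start of `ora.bidb02.gsd` on member `bidb02` succeeded.
"""

# Timestamp, then [CRSRES][thread], optional digits, then the success message.
STARTED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*CRSRES\]\[\d+\]\d*"
    r"Start of `([^`]+)` on member `([^`]+)` succeeded",
    re.MULTILINE,
)


def started_resources(text):
    """Return (timestamp, resource, member) for each successful start."""
    return STARTED.findall(text)


if __name__ == "__main__":
    for stamp, resource, member in started_resources(SAMPLE):
        print(stamp, resource, member)
```

The "Attempting to start" line is skipped on purpose, so only completed starts are listed.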


5#
Posted 2013-6-7 17:27:31
[    CSSD]2013-06-06 05:40:51.381 [1287] >ERROR:   Internal Error Information:
  Category: 1234
  Operation: scls_block_write
  Location: fwrite_faile
  Other: fwrite unable to write buffer
  Dep: 6

[    CSSD]2013-06-06 05:40:51.381 [1287] >ERROR:   clssnmvWriteBlocks: write failed 1 at offset 149 of /dev/rvote2
[    CSSD]2013-06-06 05:40:51.381 [1287] >TRACE:   clssnmDiskStateChange: state from 4 to 3 disk (1//dev/rvote2)
[    CSSD]2013-06-06 05:40:51.381 [1030] >ERROR:   Internal Error Information:
  Category: 1234
  Operation: scls_block_write
  Location: fwrite_faile
  Other: fwrite unable to write buffer
  Dep: 6

[    CSSD]2013-06-06 05:40:51.381 [1030] >ERROR:   clssnmvWriteBlocks: write failed 1 at offset 149 of /dev/rvote1
[    CSSD]2013-06-06 05:40:51.381 [1544] >ERROR:   Internal Error Information:
  Category: 1234
  Operation: scls_block_write
  Location: fwrite_faile
  Other: fwrite unable to write buffer
  Dep: 6


On node 2, the two voting disks /dev/rvote2 and /dev/rvote1 both hit "unable to write buffer" errors.
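
To see how often each voting disk reports this across the whole ocssd.log, the write-failure lines can be counted per device. This is a small Python sketch under the assumption that every failure is logged in the "clssnmvWriteBlocks: write failed ... of <device>" form shown above; `failed_writes` is a made-up helper name, and the sample holds only lines from the excerpt.

```python
import re
from collections import Counter

# Lines copied from the ocssd.log excerpt in this post.
SAMPLE = """\
[    CSSD]2013-06-06 05:40:51.381 [1287] >ERROR:   clssnmvWriteBlocks: write failed 1 at offset 149 of /dev/rvote2
[    CSSD]2013-06-06 05:40:51.381 [1287] >TRACE:   clssnmDiskStateChange: state from 4 to 3 disk (1//dev/rvote2)
[    CSSD]2013-06-06 05:40:51.381 [1030] >ERROR:   clssnmvWriteBlocks: write failed 1 at offset 149 of /dev/rvote1
"""

# Capture the device path at the end of each write-failure message.
WRITE_FAILED = re.compile(
    r"clssnmvWriteBlocks: write failed \d+ at offset \d+ of (\S+)"
)


def failed_writes(text):
    """Count write-failure messages per voting disk device."""
    return Counter(match.group(1) for match in WRITE_FAILED.finditer(text))


if __name__ == "__main__":
    for device, count in failed_writes(SAMPLE).items():
        print(device, count)
```

The clssnmDiskStateChange line is not counted; only the explicit write failures are.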


6#
Posted 2013-6-7 17:29:17
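A bit past 6 o'clock it worked because node 1, the one that had gone down, had come back up, so CRS on node 2 could be brought up normally as well. The real issue is that node 1 went down a bit past 3 o'clock and node 2 could not be brought up. Could you help analyze the cause?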
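
To narrow the logs to the period in question, the CSSD lines can be filtered by timestamp. The Python sketch below is illustrative only: `lines_between` is a made-up helper, and the 03:00 to 06:00 window is an assumption that reads "a bit past 3" and "a bit past 6" loosely. Lines without a timestamp (such as the CLSS-3000 message) are skipped.

```python
import re
from datetime import datetime

# Sample lines copied from the excerpts in this thread.
SAMPLE = [
    "[    CSSD]2013-06-06 05:40:51.381 [1287] >ERROR:   clssnmvWriteBlocks: write failed 1 at offset 149 of /dev/rvote2",
    "[    CSSD]2013-06-06 05:42:15.504 [4885] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.",
    "[    CSSD]2013-06-06 06:19:17.670 >USER:    Copyright 2013, Oracle version 10.2.0.5.0",
    "[    CSSD]CLSS-3000: reconfiguration successful, incarnation 2 with 2 nodes",
]

# Timestamp that directly follows the [    CSSD] prefix.
STAMP = re.compile(r"\[\s*CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def lines_between(lines, start, end):
    """Return timestamped CSSD lines with start <= timestamp < end."""
    selected = []
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            continue  # no timestamp on this line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if start <= stamp < end:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Assumed window: 03:00 up to (not including) 06:00 on 2013-06-06.
    window = lines_between(
        SAMPLE, datetime(2013, 6, 6, 3, 0), datetime(2013, 6, 6, 6, 0)
    )
    for line in window:
        print(line)
```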


7#
Posted 2013-6-7 17:45:02
What do you mean by "a bit past 6" and "a bit past 3"?


