Oracle Database Data Recovery and Performance Tuning

1#
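Posted 2013-6-28 10:21:29 | Views: 4900 | Replies: 3
Environment: RAC 11.2.0.1, AIX 6.1
Symptom: a node reboots intermittently. It is not always the same node, and the reboots occur roughly once a month.

The logs from the most recent occurrence follow.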
dbs02

# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A924A5FC   0626225013 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
A924A5FC   0626225013 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
A6DF45AA   0626004413 I O RMCdaemon      The daemon is started.
2BFA76F6   0626004113 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0626004313 T O errdemon       ERROR LOGGING TURNED ON
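
The TIMESTAMP column of errpt is easy to misread, because AIX prints it as MMDDhhmmyy (the last two digits are the year, not seconds). The following is a minimal Python sketch, not part of the original post, that decodes the entries above and sorts them chronologically. The helper name `parse_errpt_timestamp` is made up for this illustration.

```python
from datetime import datetime

def parse_errpt_timestamp(stamp: str) -> datetime:
    """Decode an AIX errpt TIMESTAMP value, which is laid out as MMDDhhmmyy."""
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"unexpected errpt timestamp: {stamp!r}")
    return datetime.strptime(stamp, "%m%d%H%M%y")

# Distinct entries copied from the errpt output above.
entries = [
    ("A924A5FC", "0626225013", "SOFTWARE PROGRAM ABNORMALLY TERMINATED"),
    ("A6DF45AA", "0626004413", "The daemon is started."),
    ("2BFA76F6", "0626004113", "SYSTEM SHUTDOWN BY USER"),
    ("9DBCFDEE", "0626004313", "ERROR LOGGING TURNED ON"),
]

# Print the events oldest first with a readable timestamp.
for ident, stamp, desc in sorted(entries, key=lambda e: parse_errpt_timestamp(e[1])):
    print(parse_errpt_timestamp(stamp).strftime("%Y-%m-%d %H:%M"), ident, desc)
```

Read this way, the SYSTEM SHUTDOWN BY USER entry is stamped 2013-06-26 00:41, error logging resumes at 00:43, and the two A924A5FC entries belong to 22:50 on the same day.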

alertdbs02.log

2013-06-26 00:44:18.653
[ohasd(1966340)]CRS-2112:The OLR service started on node dbs02.
2013-06-26 00:44:19.158
[ohasd(1966340)]CRS-8017:location: /etc/oracle/lastgasp has 28 reboot advisory log files, 0 were announced and 0 errors occurred
2013-06-26 00:44:41.675
[ohasd(1966340)]CRS-2772:Server 'dbs02' has been assigned to pool 'Free'.
2013-06-26 00:45:16.970
[ctssd(4260068)]CRS-2403:The Cluster Time Synchronization Service on host dbs02 is in observer mode.
2013-06-26 00:45:17.021
[ctssd(4260068)]CRS-2407:The new Cluster Time Synchronization Service reference node is host dbs01.
2013-06-26 00:45:17.724
[ctssd(4260068)]CRS-2401:The Cluster Time Synchronization Service started on host dbs02.
2013-06-26 00:45:49.488
[crsd(3997986)]CRS-1012:The OCR service started on node dbs02.
2013-06-26 00:45:53.067
[crsd(3997986)]CRS-1201:CRSD started on node dbs02.
2013-06-26 00:50:06.051
[ctssd(4260068)]CRS-2409:The clock on host dbs02 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

alert_orcl2.log
Wed Jun 26 00:46:26 2013
Starting ORACLE instance (normal)



dbs01

alertdbs01.log
-------------------------------------
2013-06-26 00:40:42.052
[ctssd(2097634)]CRS-2407:The new Cluster Time Synchronization Service reference node is host dbs01.
2013-06-26 00:41:23.794
[crsd(5046304)]CRS-5504:Node down event reported for node 'dbs02'.
2013-06-26 00:41:53.052
[crsd(5046304)]CRS-2773:Server 'dbs02' has been removed from pool 'Generic'.
2013-06-26 00:41:53.079
[crsd(5046304)]CRS-2773:Server 'dbs02' has been removed from pool 'ora.orcl'.

----------------------------

crsd.log
2013-06-26 00:40:41.222: [  CRSCCL][9773]Reconfig event received by clssgsgrpstat

2013-06-26 00:40:41.308: [  CRSCCL][9773]cclGetMemberData called
2013-06-26 00:40:41.400: [  CRSCCL][9773]Disconnecting connection to member 2 node 192.168.60.222.
2013-06-26 00:40:41.423: [  CRSCCL][9773]clsdisc con = 115939a10.
2013-06-26 00:40:41.628: [CLSFRAME][9773] CCL MEMBER LEFT:2:1:CRSD:dbs02
2013-06-26 00:40:41.664: [   CRSSE][8744] Forwarding Node Leave to PE for: dbs02
2013-06-26 00:40:41.693: [CLSFRAME][9773] Disconnected from CRSD:dbs02 process: {Absolute|Node:2|Process:1727121070|Type:1}
2013-06-26 00:40:41.756: [  CRSCCL][9773]Reconfig handled

2013-06-26 00:40:41.806: [  OCRSRV][4371]th_not_master_change: Invoking master change callback. Master [1] Inc [427]
2013-06-26 00:40:41.949: [    AGFW][11829] Agfw Proxy Server received process disconnected notification, count=1
2013-06-26 00:40:41.995: [   CRSSE][13371] Master Change Event; New Master Node ID:1 This Node's ID:1
2013-06-26 00:40:42.030: [  OCRMAS][4371]th_master:13: I AM THE NEW OCR MASTER at incar 426. Node Number 1
2013-06-26 00:40:42.225: [   CRSPE][13114] PE Role|State Update: old role [SLAVE] new [MASTER]; old state [Running] new [Configuring]
2013-06-26 00:40:42.231: [   CRSPE][13114] PE MASTER NAME: dbs01
2013-06-26 00:40:42.232: [   CRSPE][13114] Starting to read configuration
2013-06-26 00:40:42.587: [  OCRSRV][12600]proas_amiwriter: ctx is MASTER CHANGING/CONNECTING
2013-06-26 00:40:42.591: [  OCRSRV][12857]proas_amiwriter: ctx is MASTER CHANGING/CONNECTING

ocssd.log
2013-06-26 00:39:59.072: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:39:59.072: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:03.076: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:03.076: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:07.083: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:07.083: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:10.628: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 50% heartbeat fatal, removal in 14.310 seconds
2013-06-26 00:40:10.628: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) is impending reconfig, flag 394254, misstime 15690
2013-06-26 00:40:10.628: [    CSSD][4900]clssnmPollingThread: local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
2013-06-26 00:40:11.088: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:11.088: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:15.090: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:15.090: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:16.477: [    CSSD][1286]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 926 > margin 750 cur_ms 4215208529 lastalive 4215207603
2013-06-26 00:40:16.477: [    CSSD][1286]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 926 > margin 750 cur_ms 4215208529 lastalive 4215207603
2013-06-26 00:40:16.477: [    CSSD][1286]clssnmvSchedDiskThreads: DiskPingMonitorThread sched delay 926 > margin 750 cur_ms 4215208529 lastalive 4215207603
2013-06-26 00:40:17.636: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 75% heartbeat fatal, removal in 7.298 seconds
2013-06-26 00:40:19.096: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:19.096: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:22.641: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 90% heartbeat fatal, removal in 2.293 seconds, seedhbimpd 1
2013-06-26 00:40:23.100: [    CSSD][5157]clssnmSendingThread: sending status msg to all nodes
2013-06-26 00:40:23.100: [    CSSD][5157]clssnmSendingThread: sent 4 status msgs to all nodes
2013-06-26 00:40:24.934: [    CSSD][4900]clssnmPollingThread: Removal started for node dbs02 (2), flags 0x6040e, state 3, wt4c 0
2013-06-26 00:40:24.934: [    CSSD][4900]clssnmDiscHelper: dbs02, node(2) connection failed, endp (32e), probe(0), ninf->endp 32e
2013-06-26 00:40:24.934: [    CSSD][4900]clssnmDiscHelper: node 2 clean up, endp (32e), init state 5, cur state 5
2013-06-26 00:40:24.934: [GIPCXCPT][4900]gipcInternalDissociate: obj 112dea4b0 [000000000000032e] { gipcEndpoint : localAddr 'gipc://dbs01:3182-54d3-ae85-1f02#192.168.60.221#40199', remoteAddr 'gipc://dbs02:nm_dbs-cluster#192.168.60.222#52530', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x3801fc, pidPeer 0, flags 0x2616, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2013-06-26 00:40:24.934: [GIPCXCPT][4900]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3260]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 112dea4b0 [000000000000032e] { gipcEndpoint : localAddr 'gipc://dbs01:3182-54d3-ae85-1f02#192.168.60.221#40199', remoteAddr 'gipc://dbs02:nm_dbs-cluster#192.168.60.222#52530', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x3801fc, pidPeer 0, flags 0x2616, usrFlags 0x0 }, flags 0x0
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmDoSyncUpdate: Initiating sync 239553874
2013-06-26 00:40:24.934: [    CSSD][5414]clssscUpdateEventValue: NMReconfigInProgress  val 1, changes 3
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmDoSyncUpdate: local disk timeout set to 27000 ms, remote disk timeout set to 27000
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 239553874
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmSetupAckWait: Ack message type (11)
2013-06-26 00:40:24.934: [    CSSD][5414]clssnmSetupAckWait: node(1) is ALIVE
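
The heartbeat warnings in the ocssd.log excerpt above can be turned into a small timeline. The Python sketch below is an illustration only, not something from the original post: it parses the three clssnmPollingThread warning lines and computes the removal time each one predicts. It assumes that `misstime` is the number of milliseconds since the last network heartbeat was received from dbs02; that reading is an assumption, not something the log states.

```python
import re
from datetime import datetime, timedelta

# Warning lines copied from the ocssd.log excerpt above.
LINES = [
    "2013-06-26 00:40:10.628: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 50% heartbeat fatal, removal in 14.310 seconds",
    "2013-06-26 00:40:17.636: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 75% heartbeat fatal, removal in 7.298 seconds",
    "2013-06-26 00:40:22.641: [    CSSD][4900]clssnmPollingThread: node dbs02 (2) at 90% heartbeat fatal, removal in 2.293 seconds, seedhbimpd 1",
]

PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*"
    r"node (?P<node>\S+) \(\d+\) at (?P<pct>\d+)% heartbeat fatal, "
    r"removal in (?P<left>\d+\.\d+) seconds"
)

def heartbeat_warnings(lines):
    """Yield (timestamp, node, percent, seconds_left) for each warning line."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["node"], int(m["pct"]), float(m["left"])

def predicted_removal(ts, seconds_left):
    """Time at which the node would be removed if no heartbeat arrives."""
    return ts + timedelta(milliseconds=round(seconds_left * 1000))

for ts, node, pct, left in heartbeat_warnings(LINES):
    when = predicted_removal(ts, left)
    print(f"{node}: {pct}% at {ts.time()}, predicted removal {when.time()}")

# Assumption: misstime (15690) is milliseconds since the last heartbeat.
MISSTIME_MS = 15690
window_s = (MISSTIME_MS + round(14.310 * 1000)) / 1000
print(f"implied heartbeat window: {window_s:.3f} s")
```

Under that assumption, 15.690 s already missed plus 14.310 s remaining gives a 30.000 s window, which would be consistent with a 30-second CSS misscount (the configured value can be checked with `crsctl get css misscount`). All three warnings predict removal within a few milliseconds of 00:40:24.934, which is when the log shows "Removal started for node dbs02". So dbs01 stopped receiving network heartbeats from dbs02 at roughly 00:39:55; the excerpt alone does not show whether the interconnect or dbs02 itself was at fault.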

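Judging from the logs, node 2's heartbeat timed out, which led to a split brain.
The nmon data is attached.


Please help analyze why the node rebooted, and check whether the private network is the problem.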

nmonlog.rar

8.28 MB, downloads: 656

2#
Posted 2013-6-28 11:06:02
A more detailed log follows.

log.txt

193.28 KB, downloads: 670


3#
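Posted 2013-6-28 16:49:50
There is no need to post the NMON data first.

The log you posted afterwards has been edited, which makes it even harder to read.

Please give the exact time the problem occurred, along with the cssd.log.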


4#
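Posted 2013-6-28 17:25:07
The operating system log shows that node 2 rebooted at 00:41 on 06-26 (errpt timestamp 0626004113, in MMDDhhmmyy format, so the trailing 13 is the year rather than seconds).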
dbs02
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A924A5FC   0626225013 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
A924A5FC   0626225013 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
A6DF45AA   0626004413 I O RMCdaemon      The daemon is started.
2BFA76F6   0626004113 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0626004313 T O errdemon       ERROR LOGGING TURNED ON

I am uploading the cssd logs from both nodes for the 00:38 to 00:45 window.

cssdlog.rar

45.5 KB, downloads: 715
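
For anyone who needs to cut the same kind of time window out of a large Clusterware log, here is a minimal Python sketch (an illustration, not the method used for the attachment). It assumes every entry starts with a `YYYY-MM-DD hh:mm:ss.mmm` timestamp, as in the excerpts in this thread, and treats lines without a timestamp as continuations of the previous entry. The function name `lines_in_window` is made up for this example.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of "2013-06-26 00:44:41.675"

def lines_in_window(lines, start, end):
    """Yield log lines whose entry timestamp falls within [start, end]."""
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # No leading timestamp: the line belongs to the previous entry.
            if keep:
                yield line
            continue
        keep = start <= ts <= end
        if keep:
            yield line

# Sample entries copied from alertdbs02.log earlier in this thread.
SAMPLE = [
    "2013-06-26 00:44:41.675",
    "[ohasd(1966340)]CRS-2772:Server 'dbs02' has been assigned to pool 'Free'.",
    "2013-06-26 00:45:16.970",
    "[ctssd(4260068)]CRS-2403:The Cluster Time Synchronization Service on host dbs02 is in observer mode.",
    "2013-06-26 00:50:06.051",
    "[ctssd(4260068)]CRS-2409:The clock on host dbs02 is not synchronous with the mean cluster time.",
]

start = datetime(2013, 6, 26, 0, 38)
end = datetime(2013, 6, 26, 0, 45)
selected = list(lines_in_window(SAMPLE, start, end))
for line in selected:
    print(line)
```

The same function works on an open file object, since it only iterates over lines. It handles both the alert log layout (timestamp on its own line) and the ocssd.log layout (timestamp at the start of each line).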
