Oracle Database Data Recovery and Performance Tuning

1#
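Posted 2013-11-22 12:00:43 | Views: 4432 | Replies: 4
Environment: AIX 5.3 / Oracle 10.2.0.4 64-bit
Problem description: In a two-node RAC, the DB instance on node 1 aborted shortly after 2 a.m. on November 20 and stayed down until it was started manually after staff arrived at 9 a.m. The background logs show that the ASM instance lost communication with the DB instance and restarted, which caused the instance abort. The crsd process then tried to restart both the ASM instance and the DB instance, but only the ASM instance came back; the DB instance could not be restarted.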

Diagnostic log information:
1. ASM instance alert log
Wed Nov 20 02:43:50 2013
Errors in file /home/oracle/admin/+ASM/bdump/+asm1_ckpt_6553660.trc:
ORA-15082: ASM failed to communicate with database instance
CKPT: terminating instance due to error 15082
Wed Nov 20 02:43:50 2013
System state dump is made for local instance
System State dumped to trace file /home/oracle/admin/+ASM/bdump/+asm1_diag_19988986.trc
Wed Nov 20 02:43:55 2013
Instance terminated by CKPT, pid = 6553660

2. DB instance alert log
Wed Nov 20 02:47:57 2013
Errors in file /home/oracle/admin/zljg/bdump/zljg1_asmb_27983948.trc:
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Wed Nov 20 02:47:57 2013
ASMB: terminating instance due to error 15064
Wed Nov 20 02:47:58 2013
System state dump is made for local instance
System State dumped to trace file /home/oracle/admin/zljg/bdump/zljg1_diag_29425790.trc
Wed Nov 20 02:48:00 2013
Shutting down instance (abort)
Wed Nov 20 02:48:03 2013
Instance terminated by ASMB, pid = 27983948
Wed Nov 20 02:48:05 2013
Instance terminated by USER, pid = 28967452
Wed Nov 20 10:27:16 2013
Starting ORACLE instance (normal)
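
To line up the two alert logs above, the following minimal Python sketch merges them into one timeline. It is an illustration only: the function name `parse_alert_log` and the embedded excerpts are assumptions made for the example (the excerpts are copied from the logs in this post), and it assumes an English/C locale so that `%a` and `%b` match `Wed` and `Nov`.

```python
from datetime import datetime

# Timestamp lines in 10g alert logs look like "Wed Nov 20 02:43:50 2013".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_alert_log(text, source):
    """Return (timestamp, source, message) tuples; each message line
    inherits the most recent timestamp line above it."""
    events = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current_ts = datetime.strptime(line, ALERT_TS_FORMAT)
            continue  # a timestamp line carries no message of its own
        except ValueError:
            pass
        if current_ts is not None:
            events.append((current_ts, source, line))
    return events

# Excerpts copied from the two alert logs in this post.
asm_log = """\
Wed Nov 20 02:43:50 2013
ORA-15082: ASM failed to communicate with database instance
CKPT: terminating instance due to error 15082
Wed Nov 20 02:43:55 2013
Instance terminated by CKPT, pid = 6553660
"""
db_log = """\
Wed Nov 20 02:47:57 2013
ORA-15064: communication failure with ASM instance
ORA-03135: connection lost contact
Wed Nov 20 02:47:57 2013
ASMB: terminating instance due to error 15064
Wed Nov 20 02:48:03 2013
Instance terminated by ASMB, pid = 27983948
Wed Nov 20 10:27:16 2013
Starting ORACLE instance (normal)
"""

# sorted() is stable, so events with equal timestamps keep their log order.
timeline = sorted(
    parse_alert_log(asm_log, "ASM") + parse_alert_log(db_log, "DB"),
    key=lambda event: event[0],
)

for ts, source, message in timeline:
    print(ts.strftime("%H:%M:%S"), source.ljust(3), message)
```

Read this way, the ASM instance is terminated by CKPT first, and the DB instance's ASMB process reports the lost connection about four minutes later.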

3. crsd process log
2013-11-20 02:08:10.533: [  CRSEVT][14032]32CAAMonitorHandler :: 0:Could not join /home/oracle/product/10.2.0/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2013-11-20 02:08:10.534: [  CRSEVT][14032]32CAAMonitorHandler :: 0:Action Script /home/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.yjbsdb5.vip! (timeout=60)
2013-11-20 02:08:10.534: [  CRSAPP][14032]32CheckResource error for ora.yjbsdb5.vip error code = -2

2013-11-20 02:47:59.033: [  CRSRES][15999]32In stateChanged, ora.zljg.zljg1.inst target is ONLINE
2013-11-20 02:47:59.033: [  CRSRES][15999]32ora.zljg.zljg1.inst on yjbsdb5 went OFFLINE unexpectedly
2013-11-20 02:47:59.033: [  CRSRES][15999]32StopResource: setting CLI values
2013-11-20 02:47:59.103: [  CRSRES][15999]32Attempting to stop `ora.zljg.zljg1.inst` on member `yjbsdb5`
2013-11-20 02:48:06.821: [  CRSRES][13936]32Start of `ora.yjbsdb5.ASM1.asm` on member `yjbsdb5` succeeded.
2013-11-20 02:48:06.821: [  CRSRES][13936]32Successfully restarted ora.yjbsdb5.ASM1.asm on yjbsdb5, RESTART_COUNT=1
2013-11-20 02:48:06.905: [  CRSRES][13936]32ora.yjbsdb5.ASM1.asm Updated LAST_RESTART time in ocr
2013-11-20 02:48:20.466: [  OCRSRV][11473]th_select_w_f_r: Error processing request [5]
2013-11-20 02:48:20.538: [  OCRSRV][6693]th_select_w_f_r: Error processing request [5]
2013-11-20 02:48:36.861: [  CRSRES][15999]32Stop of `ora.zljg.zljg1.inst` on member `yjbsdb5` succeeded.
2013-11-20 02:48:36.862: [  CRSRES][15999]32ora.zljg.zljg1.inst RESTART_COUNT=5 RESTART_ATTEMPTS=5
2013-11-20 02:48:36.862: [  CRSRES][15999]32ora.zljg.zljg1.inst Uptime does not exceed uptime_threshold
2013-11-20 02:48:36.862: [  CRSRES][15999]32ora.zljg.zljg1.inst ran out of restarts on yjbsdb5
2013-11-20 02:48:36.884: [  CRSRES][15999]32ora.zljg.zljg1.inst failed on yjbsdb5 relocating.
2013-11-20 02:48:37.399: [  CRSRES][15999]32Cannot relocate ora.zljg.zljg1.instStopping dependents
2013-11-20 02:48:37.424: [  CRSRES][15999]32StopResource: setting CLI values
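
The crsd log above reports `RESTART_COUNT=5 RESTART_ATTEMPTS=5` followed by "ran out of restarts", which suggests that the restart budget for the instance resource was already used up. The Python sketch below pulls those counters out of the log text so they are easy to spot. It is a rough illustration: the function name `restart_budget`, the regular expression, and the `count >= attempts` comparison are assumptions made for this example, not a description of how crsd decides internally.

```python
import re

# Matches lines such as:
#   ...32ora.zljg.zljg1.inst RESTART_COUNT=5 RESTART_ATTEMPTS=5
RESTART_RE = re.compile(
    r"(?P<resource>ora\.\S+)\s+"
    r"RESTART_COUNT=(?P<count>\d+)\s+"
    r"RESTART_ATTEMPTS=(?P<attempts>\d+)"
)

def restart_budget(log_text):
    """Map resource name -> (restart_count, restart_attempts) for every
    log line that reports both counters."""
    budget = {}
    for line in log_text.splitlines():
        match = RESTART_RE.search(line)
        if match:
            budget[match.group("resource")] = (
                int(match.group("count")),
                int(match.group("attempts")),
            )
    return budget

# Excerpt copied from the crsd log in this post.
crsd_log = """\
2013-11-20 02:48:06.821: [  CRSRES][13936]32Successfully restarted ora.yjbsdb5.ASM1.asm on yjbsdb5, RESTART_COUNT=1
2013-11-20 02:48:36.862: [  CRSRES][15999]32ora.zljg.zljg1.inst RESTART_COUNT=5 RESTART_ATTEMPTS=5
2013-11-20 02:48:36.862: [  CRSRES][15999]32ora.zljg.zljg1.inst ran out of restarts on yjbsdb5
"""

budget = restart_budget(crsd_log)

# Assumption for illustration: a resource is "exhausted" once its count
# has reached the configured number of attempts.
exhausted = [name for name, (count, attempts) in budget.items() if count >= attempts]

for name in exhausted:
    count, attempts = budget[name]
    print(f"{name}: {count}/{attempts} restarts used")
```

On this excerpt only `ora.zljg.zljg1.inst` is flagged, which is consistent with crsd restarting ASM but not the DB instance.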

My questions:
1. What caused the ASM instance to restart (excessive I/O load, a bug, or something else)?
2. Why could crsd not start the DB instance resource?

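Additional notes:
When the ASM instance restarts, the ASM alert log sometimes reports ORA-00600: internal error code, arguments: [kfgrpGetByNum02], [2], [318121198], [2], [317804538], [], [], [],
In addition, an RMAN backup starts on node 1 at 2 a.m. every day. Previously, shortly after 2 a.m., the DB instance was occasionally terminated with ORA-29740: evicted by member 1, group incarnation 28, but it restarted automatically afterwards. This time the problem looks more serious: crsd could no longer start the DB instance. The logs covering the failure window are in the attachment.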

log.rar

1.84 MB, downloads: 1019

2#
Posted 2013-11-24 20:57:12
Bug 9094013 - OERI:kfgrpGetByNum02 from 11g ASM with 10g DB instances (Doc ID 9094013.8)

3#
Posted 2013-11-25 21:21:32
lunar posted 2013-11-24 20:57
Bug 9094013 - OERI:kfgrpGetByNum02 from 11g ASM with 10g DB instances (Doc ID 9094013.8)

Thanks for the reply. One question: is this bug triggered only when an 11g ASM instance serves 10g DB instances? In my environment both ASM and the DB are 10.2.0.4.

4#
Posted 2013-11-25 22:42:41
Take a look at the note: it lists several affected versions along with the problem description, and 10.2.0.4 is among them.

5#
Posted 2013-11-25 23:16:55
lunar posted 2013-11-25 22:42
Take a look at the note: it lists several affected versions along with the problem description, and 10.2.0.4 is among them. ...

Got it, thanks!
