Oracle Database Data Recovery and Performance Tuning

1#
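Posted 2013-6-8 21:08:17 | Views: 5442 | Replies: 4
Last edited by godspeed on 2013-6-9 13:30

The failure has occurred about 9 times between April 22 and June 4.
When it happens, the applications connected to the database behave strangely: they can connect to the database, but their operations are blocked and they apparently cannot do any work for a long time.
On June 4 at 16:22 the failure occurred again. The node seemed to rejoin the cluster a few minutes later, but the application was still not working properly at 18:00. This time the DBA wanted to see whether the database would recover by itself, so nothing was touched. The next morning it was working again.
Attached are the alert file and the corresponding trace files, plus the log files under crs.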

alert_orcl2.zip (177.23 KB, downloads: 1131)

trace.zip (103.59 KB, downloads: 1107)

crs_log.zip (393.35 KB, downloads: 1031)

2#
Posted 2013-6-9 14:09:06
CRS log:
  2013-06-04 16:23:55.015: [  CRSRES][344528]32In stateChanged, ora.orcl.orcl2.inst target is ONLINE
  2013-06-04 16:23:55.015: [  CRSRES][344528]32ora.orcl.orcl2.inst on dbsvr2 went OFFLINE unexpectedly
  2013-06-04 16:23:55.015: [  CRSRES][344528]32StopResource: setting CLI values
  2013-06-04 16:23:55.031: [  CRSRES][344528]32Attempting to stop `ora.orcl.orcl2.inst` on member `dbsvr2`
  2013-06-04 16:26:57.703: [  CRSRES][344528]32Stop of `ora.orcl.orcl2.inst` on member `dbsvr2` succeeded.
  2013-06-04 16:26:57.703: [  CRSRES][344528]32ora.orcl.orcl2.inst RESTART_COUNT=2 RESTART_ATTEMPTS=5
  2013-06-04 16:26:57.703: [  CRSRES][344528]32ora.orcl.orcl2.inst Uptime does not exceed uptime_threshold
  2013-06-04 16:26:57.718: [  CRSRES][344528]32Restarting ora.orcl.orcl2.inst on dbsvr2
  2013-06-04 16:26:57.718: [  CRSRES][344528]32startRunnable: setting CLI values
  2013-06-04 16:26:57.718: [  CRSRES][344528]32Attempting to start `ora.orcl.orcl2.inst` on member `dbsvr2`
  2013-06-04 16:26:57.843: [  OCRUTL][3048]u_freem: mem passed is null
  2013-06-04 16:27:37.953: [  CRSRES][344528]32Start of `ora.orcl.orcl2.inst` on member `dbsvr2` succeeded.
  2013-06-04 16:27:37.953: [  CRSRES][344528]32Successfully restarted ora.orcl.orcl2.inst on dbsvr2, RESTART_COUNT=3
  2013-06-04 16:27:37.968: [  CRSRES][344528]32ora.orcl.orcl2.inst Updated LAST_RESTART time in ocr
  2013-06-04 16:27:42.843: [  OCRUTL][3048]u_freem: mem passed is null
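To put a number on how long the instance was down according to CRS, here is a rough Python sketch that pairs each "went OFFLINE unexpectedly" line with the next "Start of ... succeeded." line. The line layout in the regex is inferred only from the excerpt above, and the names `parse_crs_line` and `outage_windows` are made up for this illustration.

```python
import re
from datetime import datetime

# Timestamp, component, thread id, then an optional numeric tag ("32" in the
# excerpt) before the message text. This layout is an assumption based on the
# excerpt, not a documented format.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[\d+\]\d*(.*)$"
)

SAMPLE = """\
2013-06-04 16:23:55.015: [  CRSRES][344528]32ora.orcl.orcl2.inst on dbsvr2 went OFFLINE unexpectedly
2013-06-04 16:26:57.703: [  CRSRES][344528]32Stop of `ora.orcl.orcl2.inst` on member `dbsvr2` succeeded.
2013-06-04 16:27:37.953: [  CRSRES][344528]32Start of `ora.orcl.orcl2.inst` on member `dbsvr2` succeeded.
"""


def parse_crs_line(line):
    """Return (timestamp, component, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group(2), m.group(3).strip()


def outage_windows(lines):
    """Pair each unexpected OFFLINE event with the next successful start."""
    windows, offline_at = [], None
    for line in lines:
        parsed = parse_crs_line(line)
        if parsed is None:
            continue
        ts, _component, msg = parsed
        if "went OFFLINE unexpectedly" in msg:
            offline_at = ts
        elif offline_at and msg.startswith("Start of") and msg.endswith("succeeded."):
            windows.append((offline_at, ts, ts - offline_at))
            offline_at = None
    return windows


windows = outage_windows(SAMPLE.splitlines())
for down, up, gap in windows:
    print(down, up, gap)
```

Run against the full crs log, this would list every restart window since April 22, which can then be compared with the times the application was reported as stuck.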
The CRS log shows that database instance 2 of the cluster went offline unexpectedly. The alert log shows:

  Thu May 23 12:08:58 2013
  Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_asmb_341964.trc:
  ORA-15064: communication failure with ASM instance
  ORA-01092: ORACLE instance terminated. Disconnection forced

  Thu May 23 12:08:58 2013
  ASMB: terminating instance due to error 15064
  Thu May 23 12:08:58 2013
  Errors in file e:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_lmon_343704.trc:
  ORA-15064: communication failure with ASM instance
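There is no corresponding ASM log, but judging from the errors, a failure of the ASM instance most likely caused the database instance to terminate.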
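To see how often this has happened, and whether the instance really was never restarted, one could scan the whole alert log for termination and open events. The following is a minimal Python sketch. It assumes the alert log keeps the "timestamp line, then message lines" layout seen in the excerpt and that the timestamps are in English (`%a`/`%b` parsing is locale dependent). The function names are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu May 23 12:08:58 2013"

# Message fragments taken from the alert log excerpts in this thread.
NEEDLES = (
    "terminating instance due to error",
    "Completed: ALTER DATABASE OPEN",
)

SAMPLE = """\
Thu May 23 12:08:58 2013
ASMB: terminating instance due to error 15064
Tue Jun 04 16:27:33 2013
Completed: ALTER DATABASE OPEN
"""


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def find_events(lines, needles=NEEDLES):
    """Return (timestamp, text) for each line containing one of the needles."""
    events, current_ts = [], None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts  # applies to the message lines that follow
            continue
        text = line.strip()
        if any(n in text for n in needles):
            events.append((current_ts, text))
    return events


events = find_events(SAMPLE.splitlines())
for ts, text in events:
    print(ts, text)
```

For the real file, something like `find_events(open(path, errors="replace"))` should work, where `path` points at the alert log for orcl2. `errors="replace"` keeps the scan going over lines with mixed encodings.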

  Tue Jun 04 16:27:33 2013
  Incremental checkpoint up to RBA [0x16c2.2.0], current log tail at RBA [0x16c2.36.0]
  Tue Jun 04 16:27:33 2013
  Completed: ALTER DATABASE OPEN
  Tue Jun 04 16:42:34 2013
  Incremental checkpoint up to RBA [0x16c2.4187.0], current log tail at RBA [0x16c2.46e6.0]
  Tue Jun 04 16:57:34 2013
  Incremental checkpoint up to RBA [0x16c2.5134.0], current log tail at RBA [0x16c2.56dd.0]
  Tue Jun 04 17:12:34 2013
  Incremental checkpoint up to RBA [0x16c2.6a57.0], current log tail at RBA [0x16c2.6fdc.0]
  Tue Jun 04 17:27:34 2013
  Incremental checkpoint up to RBA [0x16c2.794a.0], current log tail at RBA [0x16c2.7e91.0]
  Tue Jun 04 17:42:34 2013
  Incremental checkpoint up to RBA [0x16c2.8742.0], current log tail at RBA [0x16c2.8c69.0]
  Tue Jun 04 17:55:48 2013
  Beginning log switch checkpoint up to RBA [0x16c3.2.10], SCN: 282230756
  Tue Jun 04 17:55:48 2013
  Thread 2 advanced to log sequence 5827 (LGWR switch)
    Current log# 10 seq# 5827 mem# 0: +DATA/orcl/onlinelog/group_10.597.816688219
    Current log# 10 seq# 5827 mem# 1: +DATA/orcl/onlinelog/group_10.732.816688229
  Tue Jun 04 17:57:34 2013
  Incremental checkpoint up to RBA [0x16c2.94b6.0], current log tail at RBA [0x16c3.1c0.0]
  Tue Jun 04 18:00:53 2013
  Completed checkpoint up to RBA [0x16c3.2.10], SCN: 282230756
  Tue Jun 04 18:12:35 2013
  Incremental checkpoint up to RBA [0x16c3.c2b.0], current log tail at RBA [0x16c3.cf9.0]
  Tue Jun 04 18:27:35 2013
  Incremental checkpoint up to RBA [0x16c3.e7b.0], current log tail at RBA [0x16c3.eee.0]
  Tue Jun 04 18:42:35 2013
  Incremental checkpoint up to RBA [0x16c3.f9d.0], current log tail at RBA [0x16c3.ff1.0]
  Tue Jun 04 18:57:35 2013
  Incremental checkpoint up to RBA [0x16c3.1109.0], current log tail at RBA [0x16c3.11ba.0]
  Tue Jun 04 19:12:35 2013
  Incremental checkpoint up to RBA [0x16c3.1977.0], current log tail at RBA [0x16c3.1a5e.0]
  Tue Jun 04 19:27:36 2013
  Incremental checkpoint up to RBA [0x16c3.1b6a.0], current log tail at RBA [0x16c3.1bea.0]
  Tue Jun 04 19:42:36 2013
  Incremental checkpoint up to RBA [0x16c3.1cd7.0], current log tail at RBA [0x16c3.1d21.0]
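This shows that the database's checkpoints are proceeding normally after the restart, which suggests the database itself is basically fine.

For a deeper analysis we would also need system load data, the ASM log, the OS system logs, and similar information.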
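The "checkpoints are advancing" reading can be checked mechanically. An RBA such as `0x16c2.4187.0` is a log sequence, a block number, and a byte offset, all in hex. The excerpt is consistent with that, since `0x16c3` is 5827, the sequence reported by the log switch. Below is a small Python sketch that extracts the incremental checkpoint RBAs and tests that they only move forward. The function names are mine, and only a few lines of the excerpt are used as sample input.

```python
import re

# Captures the first RBA on the line, i.e. the checkpoint position,
# not the "current log tail" RBA that follows it.
RBA_RE = re.compile(
    r"checkpoint up to RBA \[0x([0-9a-f]+)\.([0-9a-f]+)\.([0-9a-f]+)\]",
    re.IGNORECASE,
)

SAMPLE = """\
Incremental checkpoint up to RBA [0x16c2.2.0], current log tail at RBA [0x16c2.36.0]
Incremental checkpoint up to RBA [0x16c2.4187.0], current log tail at RBA [0x16c2.46e6.0]
Incremental checkpoint up to RBA [0x16c2.94b6.0], current log tail at RBA [0x16c3.1c0.0]
Incremental checkpoint up to RBA [0x16c3.c2b.0], current log tail at RBA [0x16c3.cf9.0]
"""


def checkpoint_rbas(lines):
    """Return (sequence, block, offset) tuples for 'Incremental checkpoint' lines."""
    rbas = []
    for line in lines:
        if "Incremental checkpoint" not in line:
            continue
        m = RBA_RE.search(line)
        if m:
            rbas.append(tuple(int(part, 16) for part in m.groups()))
    return rbas


def is_advancing(rbas):
    """True if every checkpoint RBA is strictly greater than the previous one."""
    # Tuples compare lexicographically: sequence first, then block, then offset.
    return all(a < b for a, b in zip(rbas, rbas[1:]))


rbas = checkpoint_rbas(SAMPLE.splitlines())
print(rbas, is_advancing(rbas))
```

Advancing checkpoints only show that the instance is writing redo and flushing buffers. They do not rule out individual sessions being blocked, which is why the additional logs are still needed.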


3#
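Posted 2013-6-9 22:57:05
Thanks. I'll go and collect the ASM log and the system logs.
Also, we have a Java application that uses the JDBC Thin driver to connect to the real IP of the primary server (192.168.0.2). When the problem occurred, this service became unusable. When it was restarted it seemed able to establish a connection, and the data it read during startup seemed fine, but it appeared to hang when writing data. This was still the case at 18:00. The on-site staff then decided to wait and see, and left. The next morning it connected without problems, and according to the DBA the database had not been touched (restarted).
The OS is Windows Server 2003 x64, and the page file is left at the default 2 GB (I don't know whether that could be a problem).
I have now set up performance counter logging on the system at 5-second intervals, recording processor, disk I/O, and network I/O. I also wrote a script that continuously pings the public and private networks.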
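For anyone wanting to set up similar ping monitoring, here is a rough Python sketch of a timestamped ping logger. It is not the script mentioned above. It assumes Python 3.7+ on a Windows machine (the `-n 1` flag is the Windows form, Linux uses `-c 1`), and the host list and log file name are placeholders.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, run=subprocess.run):
    """Send one echo request and report whether a real reply came back."""
    result = run(["ping", "-n", "1", host], capture_output=True, text=True)
    # On Windows the exit status can be 0 even for "Destination host
    # unreachable", so also require a TTL field, which only replies carry.
    return result.returncode == 0 and "ttl=" in result.stdout.lower()


def format_record(ts, host, ok):
    """One tab-separated log line: timestamp, host, OK/FAIL."""
    return f"{ts:%Y-%m-%d %H:%M:%S}\t{host}\t{'OK' if ok else 'FAIL'}"


def monitor(hosts, logfile, interval=5.0, rounds=None,
            run=subprocess.run, sleep=time.sleep):
    """Ping every host once per interval and append the results to logfile.

    rounds=None means run until interrupted.
    """
    done = 0
    with open(logfile, "a", encoding="utf-8") as fh:
        while rounds is None or done < rounds:
            for host in hosts:
                ok = ping_once(host, run)
                fh.write(format_record(datetime.now(), host, ok) + "\n")
            fh.flush()  # keep the file current if the machine hangs
            done += 1
            sleep(interval)
```

Calling `monitor(["192.168.0.2", "<interconnect address>"], "ping.log")` would log both networks every 5 seconds, matching the performance counter interval, so the two logs can be lined up with the CRS and alert log timestamps.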


4#
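Posted 2013-6-12 16:11:53
1) "According to the DBA the database had not been touched (restarted)" > Check alert.log to see whether the instance was restarted.

2) Since this is RAC, clients should connect through the VIPs. Is the "real IP of the primary server (192.168.0.2)" you mentioned a VIP?

3) Don't rely entirely on what the on-site staff tell you.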
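On point 2, a JDBC Thin client can be pointed at both VIPs through a full connect descriptor instead of a single host IP. The Python sketch below only builds such a URL string. The VIP host names, the port, and the service name `orcl` are assumptions, since none of them are given in this thread, and `rac_jdbc_url` is a made-up helper.

```python
def rac_jdbc_url(vip_hosts, service_name, port=1521):
    """Build a JDBC Thin URL whose address list names every RAC VIP."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        for host in vip_hosts
    )
    return (
        "jdbc:oracle:thin:@(DESCRIPTION="
        f"(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


# Hypothetical VIP names; substitute the real VIP host names or addresses.
url = rac_jdbc_url(["dbsvr1-vip", "dbsvr2-vip"], "orcl")
print(url)
```

With an address list like this, a new connection attempt can fall through to the surviving node when one VIP is unreachable. Whether that would have helped with the hang described here is not something the logs posted so far can confirm.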


5#
Posted 2013-6-19 21:38:41
Following along to learn.
