Oracle Database Data Recovery and Performance Tuning

1#
Posted 2012-3-7 13:54:00 | Views: 7732 | Replies: 3
Oracle version: 10.2.0.5.0, 2-node RAC
OS: Linux 2.6.9-67.ELsmp #1 SMP
Symptom: the Oracle instance on one node crashed, while ASM and clusterware kept running normally.

Alert log:
Mon Mar 05 00:58:29 MET 2012
Thread 1 advanced to log sequence 14780 (LGWR switch)

Current log# 3 seq# 14780 mem# 0: +ARCH/justin/onlinelog/group_3.3251.658322121

Mon Mar 05 00:58:47 MET 2012
Thread 1 advanced to log sequence 14781 (LGWR switch)

Current log# 1 seq# 14781 mem# 0: +ARCH/justin/onlinelog/group_1.3253.658321909

Mon Mar 05 17:16:23 MET 2012
SUCCESS: disk DATA_0012 (11.3915949927) added to diskgroup DATA
Mon Mar 05 18:24:20 MET 2012
Thread 1 advanced to log sequence 14782 (LGWR switch)

Current log# 2 seq# 14782 mem# 0: +ARCH/justin/onlinelog/group_2.3252.658322071

Mon Mar 05 19:18:33 MET 2012
Terminating process 22754, due to ORA-15061 raised in ASM I/O path
Mon Mar 05 19:18:33 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/udump/justinn1_ora_22754.trc:
ORA-15061: ASM operation not supported [40]
Mon Mar 05 19:19:12 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-15061: ASM operation not supported [40]
Mon Mar 05 19:19:12 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-15061: ASM operation not supported [40]
Mon Mar 05 19:19:12 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-15061: ASM operation not supported [40]
Mon Mar 05 19:19:15 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], []
Mon Mar 05 19:19:16 MET 2012
Trace dumping is performing id=[cdmp_20120305191916]
Mon Mar 05 19:19:16 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], []
Mon Mar 05 19:19:16 MET 2012
PMON: terminating instance due to error 472
Mon Mar 05 19:19:18 MET 2012
Dump system state for local instance only
System State dumped to trace file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_diag_5066.trc
Mon Mar 05 19:19:20 MET 2012
Shutting down instance (abort)

Maclean, could you help look into the cause?
The last few lines of the PMON trace file are:
error 600 detected in background process
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], []
ksuitm: waiting up to [5] seconds before killing DIAG(5066)

Did the instance crash because PMON killed the DIAG process? If so, what caused DIAG to be killed (or which bug), and is that related to the ORA-15061/ORA-600 errors that appeared just before?
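One way to see how the ORA-15061 and ORA-600 errors line up in time is to group the ORA codes in the alert log by the timestamp line that precedes them. The following is a minimal sketch, assuming the timestamp format shown in the alert log above; the function name `ora_timeline` and the shortened `SAMPLE` excerpt are illustrative only.

```python
import re

# Timestamp lines in this alert log look like "Mon Mar 05 19:18:33 MET 2012".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")
ORA_RE = re.compile(r"ORA-\d{5}")


def ora_timeline(alert_text):
    """Return {timestamp: [ORA codes]} in the order the log reports them."""
    timeline = {}
    current = None  # most recent timestamp line seen
    for line in alert_text.splitlines():
        line = line.strip()
        if TS_RE.match(line):
            current = line
            continue
        for code in ORA_RE.findall(line):
            timeline.setdefault(current, []).append(code)
    return timeline


# Shortened excerpt of the alert log quoted in this post.
SAMPLE = """\
Mon Mar 05 19:18:33 MET 2012
Terminating process 22754, due to ORA-15061 raised in ASM I/O path
Mon Mar 05 19:18:33 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/udump/justinn1_ora_22754.trc:
ORA-15061: ASM operation not supported [40]
Mon Mar 05 19:19:15 MET 2012
Errors in file /opt/oracle/orabase/admin/JUSTIN/bdump/justinn1_pmon_5064.trc:
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], []
Mon Mar 05 19:19:16 MET 2012
PMON: terminating instance due to error 472
"""

if __name__ == "__main__":
    for ts, codes in ora_timeline(SAMPLE).items():
        print(ts, codes)
```

On this excerpt the ORA-15061 entries are grouped under 19:18:33 and the ORA-00600 under 19:19:15, so the ASM error comes first and the internal error follows.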

Documents.zip

1.2 MB, downloads: 957

2#
Posted 2012-3-7 16:27:40
Question: Did the instance crash because PMON killed the DIAG process?

Answer: No. PMON hit a fatal error and therefore aborted the instance. DIAG was only taking a systemstate dump; before crashing the instance, PMON gives DIAG a few seconds so that it can finish the dump.

PMON call stack:
Unix process pid: 5064, image: oracle@semldslx5108 (PMON)

*** 2012-03-05 19:19:12.432
*** SERVICE NAME:(SYS$BACKGROUND) 2012-03-05 19:19:12.430
*** SESSION ID:(555.1) 2012-03-05 19:19:12.430
NOTE: unlock gn=2 disk=10 au=75303 mapid=66
*** 2012-03-05 19:19:12.432
kssxdl: error deleting SO: 0x4d169cf08, type: 83, owner: 0x4e03c0618, flag: I/-/-/0x00:
ORA-15061: ASM operation not supported [40]
NOTE: unlock gn=2 disk=10 au=75303 mapid=66
*** 2012-03-05 19:19:12.433
kssxdl: error deleting SO: 0x4d169cf08, type: 83, owner: 0x4e03c0618, flag: I/-/-/0x00:
ORA-15061: ASM operation not supported [40]
NOTE: unlock gn=2 disk=10 au=75303 mapid=66
*** 2012-03-05 19:19:12.433
kssxdl: error deleting SO: 0x4d169cf08, type: 83, owner: 0x4e03c0618, flag: I/-/-/0x00:
ORA-15061: ASM operation not supported [40]
NOTE: unlock gn=2 disk=10 au=75303 mapid=66
*** 2012-03-05 19:19:15.473
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ssd_unwind_bp: unhandled instruction at 0x3d039ad instr=f
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFC7A0 ? 7FBFFFC800 ?
                                                   7FBFFFC740 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFC7A0 ? 7FBFFFC800 ?
                                                   7FBFFFC740 ? 000000000 ?
ksfdmp()+63          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFC7A0 ? 7FBFFFC800 ?
                                                   7FBFFFC740 ? 000000000 ?
kgeriv()+176         call     ksfdmp()             0069DCA20 ? 000000003 ?
                                                   7FBFFFC7A0 ? 7FBFFFC800 ?
                                                   7FBFFFC740 ? 000000000 ?
kgesiv()+119         call     kgeriv()             0069DCA20 ? 2A972254D8 ?
                                                   000000000 ? 000000000 ?
                                                   7FBFFFC740 ? 000000000 ?
kgesic0()+152        call     kgesiv()             0069DCA20 ? 2A972254D8 ?
                                                   0000042C2 ? 000000000 ?
                                                   7FBFFFD6E0 ? 000000000 ?
kgerse()+797         call     kgesic0()            0069DCA20 ? 2A972254D8 ?
                                                   0000042C2 ? 000000000 ?
                                                   000000048 ? 000002000 ?
kffmDoDone()+306     call     kgerse()             0069DCA20 ? 2A972254D8 ?
                                                   0000042C2 ? 000000000 ?
                                                   000000048 ? 000002000 ?
kffmsoDelete()+824   call     kffmDoDone()         4A355DFA0 ? 2A972254D8 ?
                                                   0000042C2 ? 000000000 ?
                                                   000000048 ? 000002000 ?
kssxdl()+384         call     kffmsoDelete()       4D169CF08 ? 000000003 ?
                                                   0000042C2 ? 000000000 ?
                                                   000000048 ? 000002000 ?
kssdch()+1875        call     kssxdl()             4D169CF08 ? 000000003 ?
kssxdl => kffmsoDelete => kffmDoDone => kgerse: the error is raised here.
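The chain above can be read off the "Call Stack Trace" table mechanically. The following is a rough sketch, assuming the column layout shown in this trace (caller, the word "call", callee, then hex arguments); `call_chain` and the `SAMPLE` rows are illustrative, and the regular expression is not an official description of the trace format.

```python
import re

# Matches rows such as "kffmDoDone()+306     call     kgerse()   ...".
# An optional "47. " style listing number in front of the row is tolerated.
FRAME_RE = re.compile(r"^\s*(?:\d+\.\s+)?(\w+)\(\)\+\d+\s+call\s+(\w+)\(\)")


def call_chain(trace_text):
    """Return function names outermost-first from a stack trace table."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((m.group(1), m.group(2)))  # (caller, callee)
    if not frames:
        return []
    # The trace prints the innermost frame first, so reverse the order.
    frames.reverse()
    chain = [frames[0][0]]
    chain.extend(callee for _, callee in frames)
    return chain


# Four rows copied from the PMON trace; argument continuation lines omitted.
SAMPLE = """\
kgerse()+797         call     kgesic0()            0069DCA20 ? 2A972254D8 ?
kffmDoDone()+306     call     kgerse()             0069DCA20 ? 2A972254D8 ?
kffmsoDelete()+824   call     kffmDoDone()         4A355DFA0 ? 2A972254D8 ?
kssxdl()+384         call     kffmsoDelete()       4D169CF08 ? 000000003 ?
"""

if __name__ == "__main__":
    print(" => ".join(call_chain(SAMPLE)))
```

Rows that hold only argument values do not match the pattern and are skipped, so the full trace can be passed in unchanged.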

kssxdl: error deleting SO: 0x4d169cf08, type: 83, owner: 0x4e03c0618, flag: I/-/-/0x00:

PMON: fatal error while deleting s.o. 0x4d169cf08 in this tree:

  SO: 0x4d169cf08, type: 83, owner: 0x4e03c0618, flag: INIT/-/-/0x00
  freelist:[4a3556a00,4a3556af0]

    KFFMOP: hash link:[4a35569f0,4a35569f0] sobj link:[4a3556988,4d169cf28]
      map kggrp:[0x0x4e2da3ac0, 0, valid]  map id:66
      group:[2,295196583] file:[312,673463449] extent:32426
      flags:0x0001 disk:0 au:76297 lock:0 proc:0x0x4e03c0618

PMON was trying to clean up an SO (state object) freelist.


3#
Posted 2012-3-7 16:35:12
17090         generic/vos         This layer implements error management operations: signalling errors, catching errors, recovering from errors, setting error frames, etc.

There is not much usable information on ORA-600 [17090].

No similar case was found for the call stack kssxdl => kffmsoDelete => kffmDoDone.

[oracle@rh2 ~]$ oerr ora 15061
15061, 00000, "ASM operation not supported [%s]"
// *Cause:  An ASM operation was attempted that is invalid or not supported
//          by this version of the ASM instance.
// *Action: This is an internal error code that is used for maintaining
//          compatibility between software versions and should never be
//          visible to the user; contact Oracle support Services.
//

After an SR was filed, Oracle GCS confirmed this as BUG 9788316:

1. The following error indicates that it failed to resize the controlfile to 612 blocks. If the DB_BLOCK_SIZE is 8192,
then 612 blocks is not more than 5MB. According to the results of the query on V$ASM_DISKGROUP in 'results01.txt' file,
the ASM diskgroup +DATA has 108605 MB free space. So, the ASM diskgroup +DATA has enough space for the 612 blocks.
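The size estimate in point 1 can be checked with a few lines of arithmetic. This is only a sketch: it takes DB_BLOCK_SIZE to be 8192 bytes, as the support note itself assumes, and treats 1 MB as 2^20 bytes.

```python
BLOCKS = 612            # controlfile size requested, in blocks
DB_BLOCK_SIZE = 8192    # bytes per block (assumption stated in the note)
FREE_MB = 108605        # free space reported for diskgroup +DATA

size_bytes = BLOCKS * DB_BLOCK_SIZE
size_mb = size_bytes / (1024 * 1024)  # 1 MB taken as 2**20 bytes here

under_5_mb = size_mb <= 5
fits_in_diskgroup = size_mb < FREE_MB

if __name__ == "__main__":
    print(f"{size_bytes} bytes = {size_mb} MB")
    print("under 5 MB:", under_5_mb, "| fits in +DATA:", fits_in_diskgroup)
```

The request works out to 5,013,504 bytes, or 4.78125 MB, a tiny fraction of the reported free space, which supports the conclusion that a shortage of space is not what made the resize fail.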

2. By the way, please confirm whether you have recently applied PSU #1. Anyway,
please try to relink the Oracle executables, as shown here. Before you run the "relink" command,
make sure to shut down both the ASM instance and the target database.

$ORACLE_HOME/bin/relink all

3. After relinking the Oracle executables, please confirm whether you are still
experiencing the same ORA-15061 error.
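Because step 2 requires both the ASM instance and the database to be down before relinking, a quick pre-check of the process list can help. The sketch below scans `ps -ef` style output for PMON background processes, relying on the usual naming convention `ora_pmon_<SID>` for a database instance and `asm_pmon_<SID>` for an ASM instance. The function name `running_instances` and the sample process lines (including the SIDs) are hypothetical and not taken from the original system.

```python
import re

# Conventional background process names: ora_pmon_<SID> / asm_pmon_<SID>.
PMON_RE = re.compile(r"\b(ora|asm)_pmon_(\S+)")


def running_instances(ps_output):
    """Return [(kind, sid)] for every PMON process found in ps output."""
    found = []
    for line in ps_output.splitlines():
        m = PMON_RE.search(line)
        if m:
            kind = "ASM" if m.group(1) == "asm" else "database"
            found.append((kind, m.group(2)))
    return found


# Hypothetical `ps -ef` lines, for illustration only.
SAMPLE_PS = """\
oracle    4021     1  0 Mar01 ?        00:00:03 asm_pmon_+ASM1
oracle    5064     1  0 Mar05 ?        00:00:12 ora_pmon_justinn1
oracle    7710  7702  0 10:15 pts/0    00:00:00 -bash
"""

if __name__ == "__main__":
    instances = running_instances(SAMPLE_PS)
    if instances:
        print("Still running, do not relink yet:", instances)
    else:
        print("No PMON processes found.")
```

An empty result only means that no PMON process was visible in the text passed in; it does not replace a proper shutdown check with srvctl or SQL*Plus.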

ORA-15061: ASM Operation Not Supported [41] After Apply PSU #1 (Doc ID 1126113.1)
ORA-15061 reported while doing a file operation with 11.1 or 11.2 ASM after PSU applied in database home (Doc ID 1070880.1)


4#
Posted 2012-3-7 16:35:48
I suggest you try $ORACLE_HOME/bin/relink all, then watch whether the problem occurs again.
