Oracle 12c: a strange error

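chen1999 posted on 2017-7-17 16:19:21

Last edited by chen1999 on 2017-7-17 16:28

Recently the database has been going down frequently. After the operating system is rebooted, the database returns to normal.

Error messages: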
Mon Jul 17 00:43:42 2017
Errors in file /u01/app/oracle/diag/rdbms/jxcdb/JXCDB1/trace/JXCDB1_ora_46419.trc:
Mon Jul 17 00:43:42 2017
WARNING: Read Failed. group:5 disk:10 AU:300 offset:0 size:49152
path:ORCL:DISK1537
      incarnation:0x0 synchronous result:'I/O error'
      subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x7f927a7aaba8 bufp:0x7f927a74be00 osderr1:0x3 osderr2:0x2e
      IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 302 in group from disk DISK1537  allocation unit 300 reason error; if possible, will try another mirror side
Mon Jul 17 00:43:42 2017
Errors in file /u01/app/oracle/diag/rdbms/jxcdb/JXCDB1/trace/JXCDB1_ora_46419.trc:
ORA-00202: control file: '+DATA/JXCDB/CONTROLFILE/current.302.912853177'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15186: ASMLIB error function = ,  error = ,  mesg =
ORA-204 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:7:14} */...

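I previously contacted an Oracle engineer, who said a disk had dropped out, but we cannot find out why it dropped, and replacing the hard disk did not fix the problem.
Could the experts here please take a look?
Environment: Oracle Linux 6.6, Oracle 12.1 RAC, ASM disks on NetApp multipath storage.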
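One way to check whether the read failures concentrate on a single ASM disk is to tally the `WARNING: Read Failed` entries in the alert log by path. The following is a minimal sketch, assuming the two-line warning format shown in the log above; the function name `tally_read_failures` and the regular expressions are illustrative and not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches the two-line warning seen in the alert log above, e.g.
#   WARNING: Read Failed. group:5 disk:10 AU:300 offset:0 size:49152
#   path:ORCL:DISK1537
READ_FAILED = re.compile(
    r"WARNING: Read Failed\. group:(\d+) disk:(\d+) AU:(\d+) offset:(\d+) size:(\d+)"
)
PATH = re.compile(r"^\s*path:(\S+)")


def tally_read_failures(lines):
    """Count 'Read Failed' warnings per (group, disk, path)."""
    counts = Counter()
    pending = None  # (group, disk) still waiting for its "path:" line
    for line in lines:
        m = READ_FAILED.search(line)
        if m:
            # A warning that never got a path line is counted with "?".
            if pending is not None:
                counts[pending + ("?",)] += 1
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        p = PATH.match(line)
        if p and pending is not None:
            counts[pending + (p.group(1),)] += 1
            pending = None
    if pending is not None:
        counts[pending + ("?",)] += 1
    return counts


sample = [
    "WARNING: Read Failed. group:5 disk:10 AU:300 offset:0 size:49152",
    "path:ORCL:DISK1537",
]
for key, n in tally_read_failures(sample).items():
    print(key, n)
```

Run against the full alert log (for example `tally_read_failures(open(path_to_alert_log))`, where `path_to_alert_log` is whatever path the instance uses), the counts show whether one disk or many are affected.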

chen1999 posted on 2017-7-17 17:20:25

I cannot tell on my own whether this is a bug, and I do not know how to resolve the kfk_asm_ioerror error either.

Liu Maclean (刘相兵) posted on 2017-7-17 17:46:54

WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 302 in group from disk DISK1537 allocation unit 300 reason error; if possible, will try another mirror side
Mon Jul 17 00:43:42 2017
Errors in file /u01/app/oracle/diag/rdbms/jxcdb/JXCDB1/trace/JXCDB1_ora_46419.trc:
ORA-00202: control file: '+DATA/JXCDB/CONTROLFILE/current.302.912853177'
ORA-15081: failed to submit an I/O operation to a disk

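This is just an ordinary I/O error. If you do not believe it, try testing this system in a single-instance environment on local disks with a file system.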
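To see whether the same read fails outside the database, the block named in the warning can be read directly from the device. This is only a sketch under stated assumptions: the allocation unit size is taken to be the ASM default of 1 MiB (the log does not state it), the `offset` in the warning is taken to be relative to the start of the allocation unit, and `probe_read` is a hypothetical helper name.

```python
import os

AU_SIZE = 1024 * 1024  # assumption: default 1 MiB ASM allocation unit


def probe_read(device, au, offset, size, au_size=AU_SIZE):
    """Read `size` bytes at the byte position of (au, offset).

    Returns the number of bytes actually read; an I/O error from the
    operating system surfaces as OSError.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        return len(os.pread(fd, size, au * au_size + offset))
    finally:
        os.close(fd)
```

For the warning above this would be called as `probe_read("/dev/oracleasm/disks/DISK1537", au=300, offset=0, size=49152)`, assuming ASMLib exposes the disk label at that path. The read goes through the operating system page cache, so a cached block can mask a device error; a successful read here does not prove the storage is healthy.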

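"we cannot find out why it dropped, and replacing the hard disk did not fix the problem."

==> Did you replace all of the disks? Did you replace the entire storage array?

I run into this kind of storage problem all the time: the user does not believe the storage is at fault, the storage vendor does not believe the storage is at fault, and Oracle says the storage is at fault. Swap in a different storage system and the problem disappears. So in the end it is the storage after all.

chen1999 posted on 2017-7-17 17:52:50

Thank you, Mr. Liu, for taking the time to answer.

chen1999 posted on 2017-7-19 12:53:03

Liu Maclean (刘相兵) posted on 2017-7-17 17:46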
WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 302 in group

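Mr. Liu, in today's failure the operating system shows signs of a reboot and the database logged time drift warnings, but we cannot determine the real reason the instance went down.
This is a two-node RAC on Windows, and one of the nodes suddenly went down. Below is the Oracle error log from the node that went down: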

Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Wed Jul 19 11:11:20 2017
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Wed Jul 19 11:46:43 2017
Reconfiguration started (old inc 28, new inc 30)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Jul 19 11:46:45 2017
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed Jul 19 11:46:45 2017
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed Jul 19 11:46:45 2017
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed Jul 19 11:46:48 2017
minact-scn: Inst 1 is now the master inc#:30 mmon proc-id:5644 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.1d0ecabe gcalc-scn:0x0000.1d0ecad8
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:30 new-inc#:30
Wed Jul 19 11:46:48 2017
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 31 processes
Started redo scan
Completed redo scan
read 936 KB redo, 186 data blocks need recovery
Started redo application at
Thread 2: logseq 2804, block 200490
Recovery of Online Redo Log: Thread 2 Group 4 Seq 2804 Reading mem 0
  Mem# 0: +DATA/cdjcws/onlinelog/group_4.266.868297653
  Mem# 1: +FRA/cdjcws/onlinelog/group_4.260.868297655
Completed redo application of 0.29MB
Completed instance recovery at
Thread 2: logseq 2804, block 202362, scn 487528908
182 data blocks read, 189 data blocks written, 936 redo k-bytes read
Thread 2 advanced to log sequence 2805 (thread recovery)
Redo thread 2 internally disabled at seq 2805 (SMON)
minact-scn: master continuing after IR
Wed Jul 19 11:46:55 2017
Thread 1 advanced to log sequence 1873 (LGWR switch)
  Current log# 1 seq# 1873 mem# 0: +DATA/cdjcws/onlinelog/group_1.261.868297405
  Current log# 1 seq# 1873 mem# 1: +FRA/cdjcws/onlinelog/group_1.257.868297407
Wed Jul 19 11:46:56 2017
Archived Log entry 4675 added for thread 1 sequence 1872 ID 0x96266c76 dest 1:
Wed Jul 19 11:46:57 2017
Archived Log entry 4676 added for thread 2 sequence 2804 ID 0x96266c76 dest 1:
Wed Jul 19 11:46:58 2017
ARC2: Archiving disabled thread 2 sequence 2805
Archived Log entry 4677 added for thread 2 sequence 2805 ID 0x96266c76 dest 1:
Wed Jul 19 11:47:48 2017
Decreasing number of real time LMS from 3 to 0
Wed Jul 19 11:56:58 2017
db_recovery_file_dest_size of 819200 MB is 0.55% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed Jul 19 12:05:44 2017
Reconfiguration started (old inc 30, new inc 32)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Jul 19 12:05:44 2017
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed Jul 19 12:05:44 2017
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed Jul 19 12:05:44 2017
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Jul 19 12:05:46 2017
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:32 errcnt:0
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Wed Jul 19 12:06:58 2017
Increasing number of real time LMS from 0 to 3
Wed Jul 19 12:11:58 2017
db_recovery_file_dest_size of 819200 MB is 0.55% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
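To line up the time drift warnings with the cluster reconfigurations, the key events can be pulled out of the alert log with their timestamps. A minimal sketch, assuming the log format shown above and an English (C) locale for the day and month names; `key_events` is an illustrative name, not an Oracle utility.

```python
import re
from datetime import datetime

# Alert log timestamp lines look like "Wed Jul 19 11:46:43 2017".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)
RECONFIG = re.compile(r"Reconfiguration started \(old inc (\d+), new inc (\d+)\)")


def key_events(lines):
    """Return (timestamp, description) pairs for selected alert log events.

    The timestamp is the most recent timestamp line seen before the event,
    or None if the excerpt starts before any timestamp line.
    """
    events = []
    current = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        if "VKTM detected a time drift" in line:
            events.append((current, "time drift"))
        elif "dead instance detected" in line:
            events.append((current, "dead instance detected"))
        else:
            m = RECONFIG.search(line)
            if m:
                events.append(
                    (current, f"reconfiguration inc {m.group(1)} -> {m.group(2)}")
                )
    return events
```

Applied to the excerpt above, this yields a time drift warning at 11:11:20, a reconfiguration from incarnation 28 to 30 with a dead instance detected at 11:46:43, and a reconfiguration from 30 to 32 at 12:05:44, when the instance list again shows both instances.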

Archive of the forum "Oracle Database Data Recovery and Performance Tuning" (Oracle数据库数据恢复、性能优化)