Oracle error: Linux-x86_64 Error: 30: Read-only file system
Last edited by ccton at 2014-2-18 12:08
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
# uname -a
Linux gywsj.hyb210 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Database version:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
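Problem description: after running for a while, the filesystem mounted from the disk array hits a logical error and is automatically switched to read-only.
Also, I once used a loop to write large files in bulk until the disk was full, and no error was reported; after a remount, writing files works normally again.
I suspect either unstable power to the array is causing logical block errors on the disks, or an Oracle bug, but I have not found any documentation to support either theory.
Could the experts here please help me diagnose this?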
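The read-only flip described above can be caught from the mount table. A minimal shell sketch, assuming the /proc/mounts format (device, mount point, type, options, ...); `mount_is_readonly` is a hypothetical helper, and /hydata is the mount point that appears in the logs below.

```shell
#!/bin/sh
# Sketch (assumption): report whether a mount point is currently read-only.
# Reads a mount table in /proc/mounts format:
#   device mountpoint fstype options dump pass
# An alternative table file can be passed as $2 (useful for testing).
mount_is_readonly() {
    mnt=$1
    table=${2:-/proc/mounts}
    # If the mount point appears more than once, the last entry is the
    # visible one, so the flags are reset on every match.
    awk -v m="$mnt" '
        $2 == m {
            seen = 1; ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { if (seen && ro) exit 0; exit 1 }
    ' "$table"
}
```

For example, running `mount_is_readonly /hydata && echo "/hydata has gone read-only"` from cron would record roughly when the kernel remounted the filesystem, which makes it easier to line up with the array's own event log.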
Below are the relevant logs.
Database log:
Tue Feb 18 09:53:28 2014
Archived Log entry 18018 added for thread 1 sequence 565 ID 0x51475291 dest 1:
Tue Feb 18 10:19:41 2014
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/hydata/flash_recovery_area/orcl/control02.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Additional information: 3
Additional information: -1
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0xaa13a online=1
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0xa5dfd online=1
file=5 '/hydata/tablespaces/cmsservergy.dat'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '/hydata/flash_recovery_area/orcl/control02.ctl'
ORA-27072: File I/O error
Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Additional information: 3
Additional information: -1
file=5 '/hydata/tablespaces/cmsservergy.dat'
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0x21593e online=1
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
CKPT (ospid: 30996): terminating the instance due to error 221
file=10 '/hydata/tablespaces/cmsservergy4.dat'
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0x153877 online=1
Additional information: 4
Additional information: 696634
file=10 '/hydata/tablespaces/cmsservergy4.dat'
Additional information: 2185534
Additional information: 679421
Additional information: -1'
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: -1'
Additional information: -1'
Additional information: 4
Additional information: 1390711
Additional information: -1'
Tue Feb 18 10:19:41 2014
Some DDE async actions failed or were cancelled
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_lgwr_30992.trc:
ORA-00345: redo log write error block 193051 count 13
ORA-00312: online log 2 thread 1: '/hydata/orcl/redo02.log'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 193051
Additional information: -1
Tue Feb 18 10:19:42 2014
opiodr aborting process unknown ospid (11121) as a result of ORA-1092
Tue Feb 18 10:19:42 2014
ORA-1092 : opitsk aborting process
Instance terminated by CKPT, pid = 30996
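The alert log above interleaves messages from several background processes, so it helps to tally the distinct errors. A sketch, assuming GNU grep (for `-o` and `-E`); `alert_error_summary` is a hypothetical name, and the alert log path is assumed to be the usual alert_orcl.log in the trace directory shown above. On Linux, errno 30 is EROFS and errno 5 is EIO, so the tally separates writes rejected by the read-only filesystem from the redo write that failed with a plain I/O error.

```shell
#!/bin/sh
# Sketch: count ORA- codes and OS errno lines in an Oracle alert log,
# most frequent first. Assumes GNU grep for -o and -E.
alert_error_summary() {
    grep -oE 'ORA-[0-9]+|Linux-x86_64 Error: [0-9]+' "$1" | sort | uniq -c | sort -rn
}
```

Usage (path assumed): `alert_error_summary /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log`.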
OS log:
Feb 18 10:19:05 gywsj kernel: INFO: task extract:32604 blocked for more than 120 seconds.
Feb 18 10:19:05 gywsj kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 18 10:19:05 gywsj kernel: extract D ffffffff80153806 0 32604 9994 32577 (NOTLB)
Feb 18 10:19:05 gywsj kernel: ffff8101d9957b78 0000000000000082 ffff810001059800 0000000000000000
Feb 18 10:19:05 gywsj kernel: ffffffff804d3480 000000000000000a ffff8102c9a5e080 ffff81087fffb080
Feb 18 10:19:05 gywsj kernel: 00067b698e9d15a4 000000000000322f ffff8102c9a5e268 0000001a00000000
Feb 18 10:19:05 gywsj kernel: Call Trace:
Feb 18 10:19:05 gywsj kernel: [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
Feb 18 10:19:05 gywsj kernel: [<ffffffff80028ae9>] sync_page+0x0/0x43
Feb 18 10:19:05 gywsj kernel: [<ffffffff800637ca>] io_schedule+0x3f/0x67
Feb 18 10:19:05 gywsj kernel: [<ffffffff80028b27>] sync_page+0x3e/0x43
Feb 18 10:19:05 gywsj kernel: [<ffffffff800639f6>] __wait_on_bit+0x40/0x6e
Feb 18 10:19:05 gywsj kernel: [<ffffffff80035317>] wait_on_page_bit+0x6c/0x72
Feb 18 10:19:05 gywsj kernel: [<ffffffff800a28e2>] wake_bit_function+0x0/0x23
Feb 18 10:19:05 gywsj kernel: [<ffffffff80048015>] pagevec_lookup_tag+0x1a/0x21
Feb 18 10:19:05 gywsj kernel: [<ffffffff8004a17a>] wait_on_page_writeback_range+0x62/0x12e
Feb 18 10:19:05 gywsj kernel: [<ffffffff8005ac26>] do_writepages+0x29/0x2f
Feb 18 10:19:05 gywsj kernel: [<ffffffff8004fbe4>] __filemap_fdatawrite_range+0x50/0x5b
Feb 18 10:19:05 gywsj kernel: [<ffffffff800c8641>] filemap_write_and_wait+0x26/0x31
Feb 18 10:19:05 gywsj kernel: [<ffffffff800c86cd>] generic_file_direct_IO+0x81/0x122
Feb 18 10:19:05 gywsj kernel: [<ffffffff8000c603>] __generic_file_aio_read+0xb8/0x198
Feb 18 10:19:05 gywsj kernel: [<ffffffff80016e0c>] generic_file_aio_read+0x34/0x39
Feb 18 10:19:05 gywsj kernel: [<ffffffff8000cee6>] do_sync_read+0xc7/0x104
Feb 18 10:19:05 gywsj kernel: [<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
Feb 18 10:19:05 gywsj kernel: [<ffffffff8005a4a7>] hrtimer_cancel+0xc/0x16
Feb 18 10:19:05 gywsj kernel: [<ffffffff8005a394>] hrtimer_nanosleep+0x58/0x118
Feb 18 10:19:05 gywsj kernel: [<ffffffff8000b787>] vfs_read+0xcb/0x171
Feb 18 10:19:05 gywsj kernel: [<ffffffff80011c5c>] sys_read+0x45/0x6e
Feb 18 10:19:05 gywsj kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Feb 18 10:19:05 gywsj kernel:
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: timing out command, waited 360s
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: SCSI error: return code = 0x060d0000
Feb 18 10:19:41 gywsj kernel: end_request: I/O error, dev sdc, sector 662064886
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758095
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758096
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Aborting journal on device sdc5.
Feb 18 10:19:41 gywsj kernel: ext3_abort called.
Feb 18 10:19:41 gywsj kernel: EXT3-fs error (device sdc5): ext3_journal_start_sb: Detected aborted journal
Feb 18 10:19:41 gywsj kernel: Remounting filesystem read-only
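The kernel messages above carry two numbers worth decoding. The SCSI result word packs four bytes (driver, host, message, status); in the 2.6.18-era headers, driver byte 0x06 is DRIVER_TIMEOUT and host byte 0x0d is DID_REQUEUE, which fits a command that timed out on the path to the array rather than a media error reported with sense data. The failing sector on sdc can also be mapped to the ext3 block on sdc5. The partition start of 126 sectors and the 4096-byte block size are assumptions inferred from the log (they make sector 662064886 land exactly on logical block 82758095); on the real system, take the start from `fdisk -lu /dev/sdc` and the block size from `tune2fs -l /dev/sdc5`. Both helper names are hypothetical.

```shell
#!/bin/sh
# Split a SCSI result word into driver, host, message and status bytes.
decode_scsi_result() {
    r=$(( $1 ))
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (r >> 24) & 0xff )) $(( (r >> 16) & 0xff )) \
        $(( (r >> 8) & 0xff )) $(( r & 0xff ))
}

# Map a 512-byte sector on the whole disk to a filesystem block inside a
# partition. Assumed inputs: partition start (in sectors) and the
# filesystem block size in bytes (default 4096).
sector_to_fs_block() {
    sector=$1
    part_start=$2
    block_size=${3:-4096}
    echo $(( (sector - part_start) / (block_size / 512) ))
}
```

With the block number, `debugfs -R "icheck 82758095" /dev/sdc5` reports the owning inode and `debugfs -R "ncheck <inode>" /dev/sdc5` its path, which shows whether the lost page write hit a datafile, a control file, or ext3 metadata.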
-- After remounting, files can be written again.
OS log:
Feb 18 10:47:35 gywsj kernel: __journal_remove_journal_head: freeing b_frozen_data
Feb 18 10:47:35 gywsj last message repeated 2 times
Feb 18 10:47:35 gywsj kernel: ext3_abort called.
Feb 18 10:47:35 gywsj kernel: EXT3-fs error (device sdc5): ext3_put_super: Couldn't clean up the journal
Feb 18 10:47:51 gywsj kernel: kjournald starting. Commit interval 5 seconds
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Feb 18 10:47:51 gywsj kernel: EXT3 FS on sdc5, internal journal
Feb 18 10:47:51 gywsj kernel: EXT3-fs: recovery complete.
Feb 18 10:47:51 gywsj kernel: EXT3-fs: mounted filesystem with ordered data mode.
This is clearly a filesystem problem. I suggest running a validate of the whole database to rule out latent corrupt blocks.
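A full-database validate can be done with RMAN, which reads every datafile block without writing a backup and records what it finds in V$DATABASE_BLOCK_CORRUPTION. The sketch below only generates the command scripts, so they can be reviewed before running; it assumes 11g syntax and OS authentication as SYSDBA, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: generate RMAN and SQL*Plus scripts for a full-database validate.

# BACKUP VALIDATE reads every datafile block but writes no backup pieces;
# CHECK LOGICAL adds logical (intra-block) checks to the physical ones.
rman_validate_script() {
    cat <<'EOF'
BACKUP VALIDATE CHECK LOGICAL DATABASE;
EXIT;
EOF
}

# Blocks found corrupt by the validate are listed in this view.
corruption_query() {
    cat <<'EOF'
SELECT file#, block#, blocks, corruption_type
  FROM v$database_block_corruption;
EXIT;
EOF
}
```

For example: `rman_validate_script > /tmp/validate.rman && rman target / cmdfile=/tmp/validate.rman`, then `corruption_query | sqlplus -s "/ as sysdba"`. No rows from the query means RMAN found no physical or logical corruption in the blocks it read.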
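Also take backups as needed, and do not store them on that filesystem.
After I rebuilt the controlfile I ran into the same problem as you, and the system said to run e2fsck to repair it. In my case it was caused by an unexpected power loss. I haven't run fsck yet; I plan to back everything up first.
Thanks for the suggestion. Is there any way to fix this filesystem problem?
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: timing out command, waited 360s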
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: SCSI error: return code = 0x060d0000
Feb 18 10:19:41 gywsj kernel: end_request: I/O error, dev sdc, sector 662064886
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758095
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758096
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Aborting journal on device sdc5.
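Judging from the system log, the problem is in accessing the back-end storage; the cause could be the storage array, the fibre channel HBA, or the fibre cabling.
Also run an fsck on the filesystem: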
fsck.ext3 -fvy /dev/sdc5 > /tmp/fsck-fvy.log
In addition, check the disk for bad blocks:
badblocks -sv /dev/sdxX
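The commands above are only safe on an unmounted filesystem. A guarded sketch, assuming the /proc/mounts format and that a backup of the LUN already exists; `dev_is_mounted`, `check_fs_readonly` and the /tmp log paths are hypothetical. It runs badblocks in its default read-only mode and e2fsck with `-n`, which reports problems without changing anything, so the repairing `-y` run can be decided afterwards.

```shell
#!/bin/sh
# Guarded, read-only check sequence (sketch). Assumptions: the device name
# matches the first field of the mount table exactly, and a backup of the
# LUN has already been taken. Log paths under /tmp are hypothetical.

# Succeeds if $1 appears as a mounted device in the table ($2, default /proc/mounts).
dev_is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { if (found) exit 0; exit 1 }' \
        "${2:-/proc/mounts}"
}

check_fs_readonly() {
    dev=$1
    table=${2:-/proc/mounts}
    if dev_is_mounted "$dev" "$table"; then
        echo "refusing: $dev is mounted; umount it first" >&2
        return 1
    fi
    # Without -n or -w, badblocks only reads the device.
    badblocks -sv "$dev" > /tmp/badblocks.log 2>&1
    # -f forces a full check; -n opens the filesystem read-only and answers
    # "no" to every prompt, so nothing is changed. The exit status is
    # non-zero when problems are found.
    e2fsck -fn "$dev" > /tmp/e2fsck-fn.log 2>&1
}
```

If badblocks reports unreadable blocks, the fault lies below the filesystem, and repeated fsck runs will not cure it.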
e2fsck -y /dev/sdc5
My advice: make a tar backup of the whole LUN before doing this, and be sure to umount /dev/sdc5 first.
Feb 18 10:19:41 gywsj kernel: ext3_abort called.
Feb 18 10:19:41 gywsj kernel: EXT3-fs error (device sdc5): ext3_journal_start_sb: Detected aborted journal
Feb 18 10:19:41 gywsj kernel: Remounting filesystem read-only
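For this issue, take a look at the Red Hat Knowledgebase; there is a solution that deals specifically with it.
Thanks, everyone, for the help. I tried e2fsck, but the errors still occur. On the array vendor's advice, I have rebuilt the RAID, changing one option from "Database" to "File system", and will watch whether this kind of error comes back.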