ORA-00600 reported when opening the database

1^#

Posted 2012-2-7 17:21:47 | Views: 9595 | Replies: 7

Environment:
DB:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0    Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 8)

This is a clone of a production database, used for routine testing. When I logged in, I found the database in the NOMOUNT state.
oracle@wimngNB_test:~ $ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on Tue Feb 7 13:31:29 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;

STATUS
------------
STARTED

SQL> SELECT OPEN_MODE FROM V$DATABASE;
SELECT OPEN_MODE FROM V$DATABASE
                  *
ERROR at line 1:
ORA-01507: database not mounted

SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00214: control file '/orasys/flash_recovery_area/wimng2/control02.ctl'
version 140340 inconsistent with file '/data/oradata/wimng2/control01.ctl'
version 140332
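
ORA-00214 names two control file copies and their version numbers. As a minimal sketch (the helper name newest_control_file is hypothetical, not an Oracle tool), the message can be parsed to see which copy carries the higher version; here that is control02.ctl, the copy the next step points control_files at:

```python
import re
from typing import Tuple


def newest_control_file(message: str) -> Tuple[str, int]:
    """Return (path, version) of the control file copy with the highest version.

    Hypothetical helper for reading an ORA-00214 message; it only parses text
    and does not touch the database.
    """
    # The error text wraps across lines, so collapse all whitespace first.
    flat = " ".join(message.split())
    pairs = re.findall(r"file '([^']+)' version (\d+)", flat)
    if len(pairs) < 2:
        raise ValueError("expected two control file versions in ORA-00214 text")
    return max(((path, int(ver)) for path, ver in pairs), key=lambda p: p[1])


ORA_00214 = """ORA-00214: control file '/orasys/flash_recovery_area/wimng2/control02.ctl'
version 140340 inconsistent with file '/data/oradata/wimng2/control01.ctl'
version 140332"""

path, version = newest_control_file(ORA_00214)
print(path, version)
```

This only reads the message; which copy to keep is still a judgment call, and taking an OS-level copy of both files before changing control_files is a sensible precaution.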

After searching on Google, I did the following:
SQL> alter system set control_files='/orasys/flash_recovery_area/wimng2/control02.ctl' scope=spfile;

System altered.

SQL> shutdown immediate;
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2042241024 bytes
Fixed Size                1337548 bytes
Variable Size          1509951284 bytes
Database Buffers       520093696 bytes
Redo Buffers             10858496 bytes
Database mounted.

The database mounted, but opening it raised ORA-00600:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[1468], [57304], [57605], [], [], [], [], [], [], []

SQL> select status from v$instance;

STATUS
------------
MOUNTED

I googled the ORA-00600 error. Can I do the following?
SQL>recover database;
SQL>alter database open;

PS: This is the first time I have run into ORA-00600, so I don't dare change anything without advice.


無限追云

2^#

Posted 2012-2-7 17:22:21

Here is the corresponding trace file:

trace.tar

1.67 MB, downloads: 1119


Maclean Liu(刘相兵

3^#

Posted 2012-2-7 20:14:42

1. What storage environment is this?

2.

ODM finding :

ORA-600 [kcratr_nab_less_than_odr] during Instance Recovery after Database Crash [ID 1299564.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms
Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with :
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [219], [25020], [25021], []
The Database can't open at this Point. In the corresponding Tracefile we can find this Error Callstack:
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----
ALTER DATABASE OPEN
----- Call Stack Trace -----
ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- dbgexPhaseII <- dbgexProcessError <- dbgePostErrorKGE <- kgeasnmierr <- kcratr_odr_check <- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv <- opiexe <- opiosq0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- start
Cause
This Problem is caused by Storage Problem of the Database Files. The Subsystem (eg. SAN) crashed while the Database was open. The Database then crashed since the Database Files were not accessible anymore. This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.
Solution
There are two possible Solutions:
1. If you could restore your Storage Environment and the Online RedoLogs from the Time of the crash you can try a manual Recovery followed by a RESETLOGS:
SQL> startup mount;
SQL> recover database until cancel using backup controlfile;
-> manually provide Online RedoLog containing the last (current) Sequence when asked, eg.
ORA-00279: change 100000 generated at xx/xx/xxxx xx:xx:xx needed for thread 1
ORA-00289: suggestion :
/flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc
ORA-00280: change 100000 for thread 1 is in sequence #100
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/ora/oradata/dbtest/redo04_1.rdo
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
2. If step1. fails or you don't have the full Set of Files you have to restore and recover the Database from a recent Backup.
Alter database open fails with ORA-00600 kcratr_nab_less_than_odr [ID 1296264.1]
Applies to:
Oracle Server - Standard Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
After Power Fail Alter database open fails with
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]
Changes
Power failure
Cause
There was a power failure causing logical corruption in controlfile
Solution
Option a
------------
SQL>Startup mount ;
SQL>Show parameter control_files
Query 1
------------
sql>select a.member,a.group#,b.status from v$logfile a ,v$log b where a.group#=b.group# and b.status='CURRENT'
Note down the name of the redo log
SQL>Shutdown abort ;
Take an OS-level backup of the controlfile (this ensures we have a backup of the current state of the controlfile)
SQL>Startup mount ;
SQL>recover database using backup controlfile until cancel ;
Enter location of redo log shown as current in Query 1 when prompted for recovery
Hit Enter
SQL>Alter database open resetlogs ;
Option b
-----------
Recreate the controlfile using the Controlfile recreation script
With database in mount stage
rman target /
rman> spool log to '/tmp/rman.log';
Rman> list backup ;
Rman > exit
Keep this log handy
Go to sqlplus
SQL> Show parameter control_files
Keep this location handy.
SQL>oradebug setmypid
SQL>Alter session set tracefile_identifier='controlfilerecreate' ;
SQL>Alter database backup controlfile to trace ;
SQL>Oradebug tracefile_name ; --> This command will give the path and name of the trace file
Go to this location ,Open this trace file and select the controlfile recreation script with NO Resetlogs option
SQL>Shutdown immediate;
Rename the existing controlfile to <originalname>_old ---> This is Important as we need to have a backup of existing controlfile since we plan to recreate it
SQL>Startup nomount
Now run the Controlfile recreation script with NO Resetlogs mode
SQL>Alter database open ;
For database version 10g and above
Once database is opened you can recatalog the rman backup information present in the list /tmp/rman.log using
Rman> Catalog start with '<location of backupiece>' ;
Once the database has been opened using option a or option b, it is recommended to take a hot backup of the database.
The same steps apply to RAC if all instances are down with the same error.


Maclean Liu(刘相兵

4^#

Posted 2012-2-7 20:39:23

1. kcratr_nab_less_than_odr can be triggered by storage problems:

Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with

2. Trace analysis:

Dump continued from file: /orasys/diag/rdbms/wimng2/wimng2/trace/wimng2_ora_29785.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []

========= Dump for incident 16953 (ORA 600 [kcratr_nab_less_than_odr]) ========

*** 2012-02-07 13:40:54.447
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open

----- Call Stack Trace -----
calling             call    entry             argument values in hex
location          type    point             (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41       call    kgdsdst()          BFFE7388 ? 2 ?
ksedst1()+77       call    skdstdst()          BFFE7388 ? 0 ? 1 ? AB8E3A8 ?
                                                853C46E ? AB8E3A8 ?
ksedst()+33       call    ksedst1()          0 ? 1 ?
dbkedDefDump()+2699  call    ksedst()          0 ? 5AF911 ? BFFE74AC ?
                                                1007B40C ? BFFE7794 ? 0 ?
ksedmp()+47       call    dbkedDefDump()    3 ? 2 ?
ksfdmp()+59       call    ksedmp()          3EB ? BFFE92D0 ? DFBE5A3 ?
                                                106AD160 ? 3EB ? 106AD160 ?
dbgexPhaseII()+1725  call    00000000          106AD160 ? 3EB ?
dbgexProcessError()  call    dbgexPhaseII()    B7FEB598 ? B7DBC888 ?
+2089                                           BFFECBA4 ?
dbkePostKGE_kgsf()+  call    dbgePostErrorKGE() 106AD160 ? B7FDD0D4 ? 258 ?
47
kgeadse()+286       call    00000000          106AD160 ? B7FDD0D4 ? 258 ?
kgerinv_internal()+  call    kgeadse()          106AD160 ? B7FDD0D4 ? 258 ?
47                                              FD8DC58 ? 0 ? 4 ? BFFED45C ?
kgerinv()+41       call    kgerinv_internal() 106AD160 ? B7FDD0D4 ?
                                                FD8DC58 ? 258 ? 0 ? 4 ?
                                                BFFED45C ?
kgeasnmierr()+47    call    kgerinv()          106AD160 ? B7FDD0D4 ?
                                                FD8DC58 ? 4 ? BFFED45C ?
kcratr_odr_check()+  call    kgeasnmierr()       106AD160 ? B7FDD0D4 ?
204                                              FD8DC58 ? 4 ? 0 ? 1 ?
kcratr()+1806       call    kcratr_odr_check() BFFED6EC ? 0 ? F386D53 ? 0 ?
                                                9 ? F386D53 ?
kctrec()+9311       call    kcratr()          BFFED6EC ? BFFF45D0 ? 0 ?
kcvcrv()+5906       call    kctrec()          BFFF5868 ? 0 ? B7FD0BD0 ?
                                                B7FD122C ? B7E1BE00 ? 0 ?

The kernel function kcratr is the starting point of the forward recovery algorithm; kcrfr.c, Kernel Cache Redo.

Meaning of the arguments in [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []:

(a) redo thread id (here 1)
(b) redo log sequence (here 1468)
(c) NAB, the next available block (here 57304)
(d) on-disk RBA block number (here 57605)
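
To make the argument layout concrete, here is a minimal sketch that splits the ORA-00600 text into the four fields listed above. The class and function names are hypothetical, chosen only for illustration; the field order follows the (a) to (d) definitions in this post.

```python
import re
from typing import NamedTuple


class NabLessThanOdr(NamedTuple):
    thread: int     # (a) redo thread id
    sequence: int   # (b) redo log sequence
    nab: int        # (c) NAB read during the redo scan
    odr_block: int  # (d) on-disk RBA block number


def parse_ora600(message: str) -> NabLessThanOdr:
    """Parse an ORA-00600 [kcratr_nab_less_than_odr] message into named fields."""
    # The message wraps across lines; bracketed arguments are still findable.
    args = re.findall(r"\[([^\]]*)\]", message)
    if not args or args[0] != "kcratr_nab_less_than_odr":
        raise ValueError("not a kcratr_nab_less_than_odr error")
    thread, sequence, nab, odr_block = (int(a) for a in args[1:5])
    return NabLessThanOdr(thread, sequence, nab, odr_block)


MESSAGE = """ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[1468], [57304], [57605], [], [], [], [], [], [], []"""

err = parse_ora600(MESSAGE)
print(err)
print("blocks short of on-disk RBA:", err.odr_block - err.nab)
```

For the error in this thread, the scan stopped at block 57304 while the control file expected redo up to block 57605 in sequence 1468.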

This trace log contains a sequence that makes a good teaching demonstration of rolling forward:

2012-02-07 13:40:53.366569 :80000687:KFNU:kfn.c@2200:kfnPrepareASM(): kfnPrepareASM force=0 state_kfnsg=0x7
2012-02-07 13:40:53.366569*:80000688:CACHE_RCV:kcv.c@16365:kcvcrv(): kcvcrv: Calling kctrec()
2012-02-07 13:40:53.366569*:80000689:CACHE_RCV:kct.c@4163:kctrec(): kctrec: Entering kctrec()
2012-02-07 13:40:53.413557*:8000068A:CACHE_RCV:kct.c@4271:kctrec(): kctrec: thread 1 cf thread ckpt: logseq 1468, block 2,scn 25917106
2012-02-07 13:40:53.413557*:8000068B:CACHE_RCV:kct.c@4285:kctrec(): kctrec: Checkpoint progress record contents
2012-02-07 13:40:53.413557*:8000068C:CACHE_RCV:kct.c@4287:kctrec(): kctrec: kcccpsta 2, kcccpflg 0, kcccpdrt 48, kcccplrba 0x0005bc.0000dfd8.0000 kcccpodr 0x0005bc.0000e105.0000
2012-02-07 13:40:53.413557*:8000068D:CACHE_RCV:kct.c@4299:kctrec(): kctrec: kcccpods 0x0000.018be694, kcccpodt 773934914, kcccprlc 753362405, kcccprls 0x0000.00000001, kcccphbt 774572255, kcccpmid 1635578584
2012-02-07 13:40:53.413557*:8000068E:CACHE_RCV:kct.c@4311:kctrec(): kctrec: kcccpsdr 0x0005bc.00000001.0000, kcccpfbend (krfbafln 0, krfbathr 0, krfbaseq 0, krfbabno 0 krfbabof 0), kcccprsv 0
2012-02-07 13:40:53.413557*:8000068F:CACHE_RCV:kct.c@4360:kctrec(): kctrec: cache-low rba: logseq 1468, block 57304
2012-02-07 13:40:53.413557*:80000690:CACHE_RCV:kct.c@4374:kctrec(): kctrec: on-disk rba: logseq 1468, block 57605, scn 25945748
2012-02-07 13:40:53.413557*:80000691:CACHE_RCV:kct.c@4450:kctrec(): kctrec: Current ckpt RBA < cache-low RBA, adjusted ckpt RBA to cache low RBA, zeroed ckpt SCN and timestamp to 0
2012-02-07 13:40:53.413557*:80000692:CACHE_RCV:kct.c@4604:kctrec(): kctrec: Recovery starting point for thread 1 - logseq 1468, block 57304, scn 0
2012-02-07 13:40:53.449498*:80000693:CACHE_RCV:kct.c@4664:kctrec(): kctrec: Do thread recovery, calling kcratr()
2012-02-07 13:40:53.456376 :80000694:CACHE_RCV:kcra.c@1517:kcratr(): kcratr: Entering kcratr()
2012-02-07 13:40:53.458293 :80000695:CACHE_RCV:kcra.c@1541:kcratr(): kcratr: Started redo scan
2012-02-07 13:40:53.458293*:80000696:CACHE_RCV:kcra.c@1862:kcratr_scan(): kcratr_scan: Entering kcratr_scan()
2012-02-07 13:40:53.458293*:80000697:CACHE_RCV:kcra.c@2000:kcratr_scan(): kcratr_scan: Log not open, opening online log for thread 1, RBA 0x0005bc.0000dfd8.0000, SCN 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A4:CACHE_RCV:kcra.c@2036:kcratr_scan(): kcratr_scan: End of curr thread reached
2012-02-07 13:40:53.694427*:800006A5:CACHE_RCV:kcra.c@2038:kcratr_scan(): kcratr_scan: end rcv RBA 0x0005bc.0000dfd8. 0, end rcv SCN 0x0000.018b76b3 end SCN timestamp 773895659, NAB 57304
2012-02-07 13:40:53.694427*:800006A6:CACHE_RCV:kcra.c@2048:kcratr_scan(): kcratr_scan: (Previous) highest SCN seen in the redo stream 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A7:CACHE_RCV:kcra.c@2162:kcratr_scan(): kcratr_scan: Exiting kcratr_scan()
2012-02-07 13:40:53.702245 :800006A8:CACHE_RCV:kcra.c@1559:kcratr(): kcratr: Completed redo scan, read 0 KB redo, 0 data blocks need recovery


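Here we can see that kcratr_scan, which scans the redo log, read the redo logfile header and found NAB = 57304. This value is less than the ODR (on-disk RBA, redo block address) block of 57605.
That indicates the redo logfile header is corrupt, so ORA-600 [kcratr_nab_less_than_odr] is raised.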
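
The hexadecimal values in the checkpoint progress record can be cross-checked against the decimal ones printed a few lines later in the same trace. The sketch below assumes the usual sequence.block.offset layout for an RBA and wrap.base layout for an SCN; the function names are hypothetical.

```python
from typing import Tuple


def decode_rba(text: str) -> Tuple[int, int, int]:
    """Decode an RBA such as '0x0005bc.0000dfd8.0000' into (sequence, block, offset)."""
    if text.lower().startswith("0x"):
        text = text[2:]
    seq_hex, block_hex, offset_hex = text.split(".")
    return int(seq_hex, 16), int(block_hex, 16), int(offset_hex, 16)


def decode_scn(text: str) -> int:
    """Decode an SCN such as '0x0000.018be694' (wrap.base) into a single integer."""
    if text.lower().startswith("0x"):
        text = text[2:]
    wrap_hex, base_hex = text.split(".")
    return (int(wrap_hex, 16) << 32) | int(base_hex, 16)


# Values copied from the kctrec() lines in the trace above.
cache_low = decode_rba("0x0005bc.0000dfd8.0000")  # kcccplrba
on_disk = decode_rba("0x0005bc.0000e105.0000")    # kcccpodr
on_disk_scn = decode_scn("0x0000.018be694")       # kcccpods
nab = 57304                                       # NAB reported by kcratr_scan

print("cache-low rba:", cache_low)
print("on-disk rba:  ", on_disk, "scn", on_disk_scn)
print("NAB < on-disk block:", nab < on_disk[1])
```

The decoded values agree with the trace's own "cache-low rba: logseq 1468, block 57304" and "on-disk rba: logseq 1468, block 57605, scn 25945748" lines, and the final comparison is the condition named in the error.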
