ORA-00600 raised when opening the database - Oracle Database Administration: Oracle data recovery and performance tuning

Maclean Liu (刘相兵)

#1

Posted on 2012-2-7 20:14:42

1. What storage environment is this database running on?

2. ODM finding:

ORA-600 [kcratr_nab_less_than_odr] during Instance Recovery after Database Crash [ID 1299564.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms
When trying to open a database after a crash caused by storage problems, instance recovery fails with:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [219], [25020], [25021], []
The Database can't open at this Point. In the corresponding Tracefile we can find this Error Callstack:
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----
ALTER DATABASE OPEN
----- Call Stack Trace -----
ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- dbgexPhaseII <- dbgexProcessError <- dbgePostErrorKGE <- kgeasnmierr <- kcratr_odr_check <- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv <- opiexe <- opiosq0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- start
Cause
This problem is caused by a storage problem affecting the database files. The storage subsystem (e.g. SAN) crashed while the database was open, and the database then crashed because its files were no longer accessible. This caused a lost write to the online redo logs, so instance recovery is not possible and ORA-600 is raised.
Solution
There are two possible Solutions:
1. If you could restore your Storage Environment and the Online RedoLogs from the Time of the crash you can try a manual Recovery followed by a RESETLOGS:
SQL> startup mount;
SQL> recover database until cancel using backup controlfile;
-> manually provide Online RedoLog containing the last (current) Sequence when asked, eg.
ORA-00279: change 100000 generated at xx/xx/xxxx xx:xx:xx needed for thread 1
ORA-00289: suggestion : /flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc
ORA-00280: change 100000 for thread 1 is in sequence #100
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/ora/oradata/dbtest/redo04_1.rdo
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
2. If step 1 fails, or you don't have the full set of files, you have to restore and recover the database from a recent backup.
Alter database open fails with ORA-00600 kcratr_nab_less_than_odr [ID 1296264.1]
Applies to:
Oracle Server - Standard Edition - Version: 11.2.0.1 and later [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
After a power failure, ALTER DATABASE OPEN fails with:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]
Changes
Power failure
Cause
A power failure caused logical corruption in the controlfile.
Solution
Option a
------------
SQL> startup mount;
SQL> show parameter control_files
Query 1
------------
SQL> select a.member, a.group#, b.status from v$logfile a, v$log b where a.group# = b.group# and b.status = 'CURRENT';
Note down the name of the redo log member returned.
SQL> shutdown abort;
Take an OS-level backup of the controlfile (this ensures we have a backup of the controlfile in its current state).
SQL> startup mount;
SQL> recover database using backup controlfile until cancel;
When prompted for a log, enter the location of the redo log shown as CURRENT in Query 1.
Hit Enter.
SQL> alter database open resetlogs;
Option b
-----------
Recreate the controlfile using the controlfile recreation script.
With the database in mount stage:
rman target /
RMAN> spool log to '/tmp/rman.log';
RMAN> list backup;
RMAN> exit
Keep this log handy.
Go to sqlplus:
SQL> show parameter control_files
Keep this location handy.
SQL> oradebug setmypid
SQL> alter session set tracefile_identifier='controlfilerecreate';
SQL> alter database backup controlfile to trace;
SQL> oradebug tracefile_name --> this command gives the path and name of the trace file
Go to this location, open the trace file, and select the controlfile recreation script with the NORESETLOGS option.
SQL> shutdown immediate;
Rename the existing controlfile to <originalname>_old ---> this is important: we need a backup of the existing controlfile because we plan to recreate it
SQL> startup nomount
Now run the controlfile recreation script in NORESETLOGS mode.
SQL> alter database open;
For database version 10g and above:
Once the database is open, you can recatalog the RMAN backup information listed in /tmp/rman.log using
RMAN> catalog start with '<location of backup piece>';
Once the database has been opened using option a or option b, it is recommended to take a hot backup of the database.
The same steps apply to RAC if all instances are down with the same error.


Maclean Liu (刘相兵)

#2

Posted on 2012-2-7 20:39:23

1. kcratr_nab_less_than_odr may be triggered by storage problems:

When trying to open a database after a crash caused by storage problems, instance recovery fails with this error.

2. Analyzing the trace:

Dump continued from file: /orasys/diag/rdbms/wimng2/wimng2/trace/wimng2_ora_29785.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []

========= Dump for incident 16953 (ORA 600 [kcratr_nab_less_than_odr]) ========

*** 2012-02-07 13:40:54.447
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open

----- Call Stack Trace -----
calling             call    entry             argument values in hex
location          type    point             (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41       call    kgdsdst()          BFFE7388 ? 2 ?
ksedst1()+77       call    skdstdst()          BFFE7388 ? 0 ? 1 ? AB8E3A8 ?
                                                853C46E ? AB8E3A8 ?
ksedst()+33       call    ksedst1()          0 ? 1 ?
dbkedDefDump()+2699  call    ksedst()          0 ? 5AF911 ? BFFE74AC ?
                                                1007B40C ? BFFE7794 ? 0 ?
ksedmp()+47       call    dbkedDefDump()    3 ? 2 ?
ksfdmp()+59       call    ksedmp()          3EB ? BFFE92D0 ? DFBE5A3 ?
                                                106AD160 ? 3EB ? 106AD160 ?
dbgexPhaseII()+1725  call    00000000          106AD160 ? 3EB ?
dbgexProcessError()  call    dbgexPhaseII()    B7FEB598 ? B7DBC888 ?
+2089                                           BFFECBA4 ?
dbkePostKGE_kgsf()+  call    dbgePostErrorKGE() 106AD160 ? B7FDD0D4 ? 258 ?
47
kgeadse()+286       call    00000000          106AD160 ? B7FDD0D4 ? 258 ?
kgerinv_internal()+  call    kgeadse()          106AD160 ? B7FDD0D4 ? 258 ?
47                                              FD8DC58 ? 0 ? 4 ? BFFED45C ?
kgerinv()+41       call    kgerinv_internal() 106AD160 ? B7FDD0D4 ?
                                                FD8DC58 ? 258 ? 0 ? 4 ?
                                                BFFED45C ?
kgeasnmierr()+47    call    kgerinv()          106AD160 ? B7FDD0D4 ?
                                                FD8DC58 ? 4 ? BFFED45C ?
kcratr_odr_check()+  call    kgeasnmierr()       106AD160 ? B7FDD0D4 ?
204                                              FD8DC58 ? 4 ? 0 ? 1 ?
kcratr()+1806       call    kcratr_odr_check() BFFED6EC ? 0 ? F386D53 ? 0 ?
                                                9 ? F386D53 ?
kctrec()+9311       call    kcratr()          BFFED6EC ? BFFF45D0 ? 0 ?
kcvcrv()+5906       call    kctrec()          BFFF5868 ? 0 ? B7FD0BD0 ?
                                                B7FD122C ? B7E1BE00 ? 0 ?

The kernel function kcratr is the starting point of the forward recovery algorithm (kcrfr.c, Kernel Cache Redo).

The arguments in [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], [] are defined as follows:

(a) redo thread id
(b) redo log sequence
(c) NAB (next available block)
(d) on-disk RBA block number
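
To make the mapping above concrete, here is a minimal Python sketch that pulls the four numeric arguments out of the error line from this trace and labels them according to (a) through (d). The function name `parse_kcratr_args` and the field names are hypothetical, chosen only for illustration; they are not Oracle identifiers.

```python
import re

# Field names follow the argument definitions (a)-(d) given in this post.
# They are illustrative labels, not Oracle identifiers.
FIELDS = ("thread", "sequence", "nab", "on_disk_rba_block")

def parse_kcratr_args(line):
    """Extract the four numeric arguments of ORA-600 [kcratr_nab_less_than_odr]."""
    if "[kcratr_nab_less_than_odr]" not in line:
        raise ValueError("not a kcratr_nab_less_than_odr error line")
    # Collect every bracketed decimal argument; empty [] slots do not match \d+.
    numbers = [int(n) for n in re.findall(r"\[(\d+)\]", line)]
    if len(numbers) < len(FIELDS):
        raise ValueError("expected four numeric arguments")
    return dict(zip(FIELDS, numbers[:len(FIELDS)]))

line = ("ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], "
        "[1], [1468], [57304], [57605], [], [], [], [], [], [], []")
args = parse_kcratr_args(line)
print(args)
# How many blocks the NAB falls short of the on-disk RBA block number.
print(args["on_disk_rba_block"] - args["nab"])
```

For the error line in this thread, the sketch reports thread 1, sequence 1468, NAB 57304 and on-disk RBA block 57605, a gap of 301 blocks.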

One sequence in this trace log is a fairly ideal teaching demonstration of rolling forward:

2012-02-07 13:40:53.366569 :80000687:KFNU:kfn.c@2200:kfnPrepareASM(): kfnPrepareASM force=0 state_kfnsg=0x7
2012-02-07 13:40:53.366569*:80000688:CACHE_RCV:kcv.c@16365:kcvcrv(): kcvcrv: Calling kctrec()
2012-02-07 13:40:53.366569*:80000689:CACHE_RCV:kct.c@4163:kctrec(): kctrec: Entering kctrec()
2012-02-07 13:40:53.413557*:8000068A:CACHE_RCV:kct.c@4271:kctrec(): kctrec: thread 1 cf thread ckpt: logseq 1468, block 2,scn 25917106
2012-02-07 13:40:53.413557*:8000068B:CACHE_RCV:kct.c@4285:kctrec(): kctrec: Checkpoint progress record contents
2012-02-07 13:40:53.413557*:8000068C:CACHE_RCV:kct.c@4287:kctrec(): kctrec: kcccpsta 2, kcccpflg 0, kcccpdrt 48, kcccplrba 0x0005bc.0000dfd8.0000 kcccpodr 0x0005bc.0000e105.0000
2012-02-07 13:40:53.413557*:8000068D:CACHE_RCV:kct.c@4299:kctrec(): kctrec: kcccpods 0x0000.018be694, kcccpodt 773934914, kcccprlc 753362405, kcccprls 0x0000.00000001, kcccphbt 774572255, kcccpmid 1635578584
2012-02-07 13:40:53.413557*:8000068E:CACHE_RCV:kct.c@4311:kctrec(): kctrec: kcccpsdr 0x0005bc.00000001.0000, kcccpfbend (krfbafln 0, krfbathr 0, krfbaseq 0, krfbabno 0 krfbabof 0), kcccprsv 0
2012-02-07 13:40:53.413557*:8000068F:CACHE_RCV:kct.c@4360:kctrec(): kctrec: cache-low rba: logseq 1468, block 57304
2012-02-07 13:40:53.413557*:80000690:CACHE_RCV:kct.c@4374:kctrec(): kctrec: on-disk rba: logseq 1468, block 57605, scn 25945748
2012-02-07 13:40:53.413557*:80000691:CACHE_RCV:kct.c@4450:kctrec(): kctrec: Current ckpt RBA < cache-low RBA, adjusted ckpt RBA to cache low RBA, zeroed ckpt SCN and timestamp to 0
2012-02-07 13:40:53.413557*:80000692:CACHE_RCV:kct.c@4604:kctrec(): kctrec: Recovery starting point for thread 1 - logseq 1468, block 57304, scn 0
2012-02-07 13:40:53.449498*:80000693:CACHE_RCV:kct.c@4664:kctrec(): kctrec: Do thread recovery, calling kcratr()
2012-02-07 13:40:53.456376 :80000694:CACHE_RCV:kcra.c@1517:kcratr(): kcratr: Entering kcratr()
2012-02-07 13:40:53.458293 :80000695:CACHE_RCV:kcra.c@1541:kcratr(): kcratr: Started redo scan
2012-02-07 13:40:53.458293*:80000696:CACHE_RCV:kcra.c@1862:kcratr_scan(): kcratr_scan: Entering kcratr_scan()
2012-02-07 13:40:53.458293*:80000697:CACHE_RCV:kcra.c@2000:kcratr_scan(): kcratr_scan: Log not open, opening online log for thread 1, RBA 0x0005bc.0000dfd8.0000, SCN 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A4:CACHE_RCV:kcra.c@2036:kcratr_scan(): kcratr_scan: End of curr thread reached
2012-02-07 13:40:53.694427*:800006A5:CACHE_RCV:kcra.c@2038:kcratr_scan(): kcratr_scan: end rcv RBA 0x0005bc.0000dfd8. 0, end rcv SCN 0x0000.018b76b3 end SCN timestamp 773895659, NAB 57304
2012-02-07 13:40:53.694427*:800006A6:CACHE_RCV:kcra.c@2048:kcratr_scan(): kcratr_scan: (Previous) highest SCN seen in the redo stream 0x0000.00000000
2012-02-07 13:40:53.694427*:800006A7:CACHE_RCV:kcra.c@2162:kcratr_scan(): kcratr_scan: Exiting kcratr_scan()
2012-02-07 13:40:53.702245 :800006A8:CACHE_RCV:kcra.c@1559:kcratr(): kcratr: Completed redo scan, read 0 KB redo, 0 data blocks need recovery


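Here we can see that kcratr_scan, which is responsible for scanning the redo log, read the redo logfile header and found NAB = 57304. This value is less than the block number of the ODR (on-disk RBA, redo block address), which is 57605. That indicates the redo logfile header is corrupt, so ORA-600 [kcratr_nab_less_than_odr] is raised.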
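
The hex values in the checkpoint progress record can be checked against the decimal values that kctrec prints. The Python sketch below assumes that an RBA is printed as hex `sequence.block.offset` and an SCN as hex `wrap.base`, which is consistent with the decimal values shown in this trace. The helper names `decode_rba` and `decode_scn` are hypothetical.

```python
def decode_rba(text):
    """Decode an RBA printed as hex 'sequence.block.offset' into integers.

    Assumption: the three dot-separated fields are hexadecimal.
    """
    if text.lower().startswith("0x"):
        text = text[2:]
    seq, block, offset = (int(part.strip(), 16) for part in text.split("."))
    return seq, block, offset

def decode_scn(text):
    """Decode an SCN printed as hex 'wrap.base' into a single integer.

    Assumption: wrap is the high part and base is the low 32 bits.
    """
    if text.lower().startswith("0x"):
        text = text[2:]
    wrap, base = (int(part, 16) for part in text.split("."))
    return (wrap << 32) | base

cache_low = decode_rba("0x0005bc.0000dfd8.0000")  # kcccplrba from the trace
on_disk = decode_rba("0x0005bc.0000e105.0000")    # kcccpodr from the trace
on_disk_scn = decode_scn("0x0000.018be694")       # kcccpods from the trace
nab = 57304                                       # NAB reported by kcratr_scan

print(cache_low)         # (1468, 57304, 0)
print(on_disk)           # (1468, 57605, 0)
print(on_disk_scn)       # 25945748
print(nab < on_disk[1])  # True: the condition that the error name describes
```

The decoded values match the "cache-low rba" and "on-disk rba" lines in the trace. The NAB found by the redo scan equals the cache-low block and stops short of the on-disk RBA block.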


Maclean Liu (刘相兵)

#3

Posted on 2012-2-7 20:42:18

1. Try solution 1 from the MOS note to resolve the problem:

SQL> startup mount;

SQL> recover database until cancel using backup controlfile;

SQL> alter database open resetlogs;
