Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

61

积分

0

好友

0

主题
发表于 2012-2-7 17:21:47 | 查看: 9203| 回复: 7
环境:
DB:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 8)

这是一个生产库的克隆环境,作为平时测试使用。我登陆进去的时候,发现数据库处于nomount状态。
oracle@wimngNB_test:~ $ sqlplus /nolog

SQL*Plus: Release 11.2.0.1.0 Production on Tue Feb 7 13:31:29 2012

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;

STATUS
------------
STARTED

SQL> SELECT OPEN_MODE FROM V$DATABASE;
SELECT OPEN_MODE FROM V$DATABASE
                      *
ERROR at line 1:
ORA-01507: database not mounted


SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00214: control file '/orasys/flash_recovery_area/wimng2/control02.ctl'
version 140340 inconsistent with file '/data/oradata/wimng2/control01.ctl'
version 140332


我google之后,做了如下操作:
SQL> alter system set control_files='/orasys/flash_recovery_area/wimng2/control02.ctl' scope=spfile;

System altered.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2042241024 bytes
Fixed Size                  1337548 bytes
Variable Size            1509951284 bytes
Database Buffers          520093696 bytes
Redo Buffers               10858496 bytes
Database mounted.

数据库mount上了,但是open的时候报ORA-00600错误:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[1468], [57304], [57605], [], [], [], [], [], [], []


SQL> select status from v$instance;

STATUS
------------
MOUNTED

Google了下00600错误,我是不是可以做以下操作:
SQL>recover database;
SQL>alter database open;

PS:毕竟是第一次遇到ORA-00600,不敢随便乱动了..
发表于 2012-2-7 17:22:21
这是相应的trace文件

trace.tar

1.67 MB, 下载次数: 900

回复 显示全部楼层 道具 举报

发表于 2012-2-7 20:14:42
1. 什么存储环境?

2.

ODM finding :
  1. ORA-600 [kcratr_nab_less_than_odr] during Instance Recovery after Database Crash [ID 1299564.1]

  2. Applies to:
  3. Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
  4. Information in this document applies to any platform.
  5. Symptoms
  6. Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with :
  7. ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [219], [25020], [25021], []

  8. The Database can't open at this Point. In the corresponding Tracefile we can find this Error Callstack:


  9. dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
  10. ----- Current SQL Statement for this session (sql_id=1h50ks4ncswfn) -----
  11. ALTER DATABASE OPEN

  12. ----- Call Stack Trace -----
  13. ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- dbgexPhaseII <- dbgexProcessError <- dbgePostErrorKGE  <- kgeasnmierr <- kcratr_odr_check  <- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv <- opiexe <- opiosq0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- start


  14. Cause
  15. This Problem is caused by Storage Problem of the Database Files. The Subsystem (eg. SAN) crashed while the Database was open. The Database then crashed since the Database Files were not accessible anymore. This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.
  16. Solution
  17. There are two possible Solutions:

  18. 1. If you could restore your Storage Environment and the Online RedoLogs from the Time of the crash you can try a manual Recovery followed by a RESETLOGS:

  19. SQL> startup mount;

  20. SQL> recover database until cancel using backup controlfile;

  21. -> manually provide Online RedoLog containing the last (current) Sequence when asked, eg.

  22. ORA-00279: change 100000 generated at xx/xx/xxxx xx:xx:xx needed for thread 1
  23. ORA-00289: suggestion :
  24. /flash_recovery/archivelog/xxxx_xx_xx/o1_mf_1_100_%u_.arc
  25. ORA-00280: change 100000 for thread 1 is in sequence #100

  26. Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
  27. /ora/oradata/dbtest/redo04_1.rdo
  28. Log applied.
  29. Media recovery complete.

  30. SQL> alter database open resetlogs;

  31. 2. If  step1. fails or you don't have the full Set of Files you have to restore and recover the Database from a recent Backup.


  32. Alter database open fails with ORA-00600 kcratr_nab_less_than_odr [ID 1296264.1]

  33. Applies to:
  34. Oracle Server - Standard Edition - Version: 11.2.0.1 and later   [Release: 11.2 and later ]
  35. Information in this document applies to any platform.
  36. Symptoms
  37. After Power Fail Alter database open fails with

  38. ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]

  39. Changes
  40. Power failure
  41. Cause
  42. There was a power failure causing logical corruption in controlfile
  43. Solution

  44. Option a
  45. ------------

  46. SQL>Startup mount ;

  47. SQL>Show parameter control_files

  48. Query 1
  49. ------------

  50. sql>select a.member,a.group#,b.status from v$logfile a ,v$log b where a.group#=b.group# and b.status='CURRENT'

  51. Note down the name of the redo log

  52. SQL>Shutdown abort ;

  53. Take a OS Level back of the controlfile (This is to ensure we have a backup of current state of controlfile)

  54. SQL>Startup mount ;

  55. SQL>recover database using backup controlfile until cancel ;

  56. Enter location of redo log shown as current in Query 1 when prompted for recovery

  57. Hit Enter

  58. SQL>Alter database open resetlogs ;





  59. Option b
  60. -----------


  61. Recreate the controlfile using the Controlfile recreation script

  62. With database in mount stage

  63. rman target /

  64. rman> spool log to '/tmp/rman.log';

  65. Rman> list backup ;

  66. Rman > exit

  67. Keep this log handy

  68. Go to sqlplus

  69. SQL> Show parameter control_files

  70. Keep this location handy.

  71. SQL>oradebug setmypid

  72. SQL>Alter session set tracefile_identifier='controlfilerecreate' ;

  73. SQL>Alter database backup controlfile to trace ;

  74. SQL>Oradebug tracefile_name ; --> This command will give the path and name of the trace file


  75. Go to this location ,Open this trace file and select the controlfile recreation script with NO Resetlogs option


  76. SQL>Shutdown immediate;


  77. Rename the existing controlfile to <originalname>_old ---> This is Important as we need to have a backup of existing controlfile since we plan to recreate it

  78. SQL>Startup nomount

  79. Now run the Controlfile recreation script with NO Resetlogs mode

  80. SQL>Alter database open ;

  81. For database version 10g and above

  82. Once database is opened you can recatalog the rman backup information present in the list /tmp/rman.log using

  83. Rman> Catalog start with '<location of backupiece>' ;


  84. Once the database has been opened using the option a or option b its recommended to take a hot backup of the database.
  85. Same Steps are applicable to Rac if all instance are down with same error.
复制代码

回复 显示全部楼层 道具 举报

发表于 2012-2-7 20:39:23
1.kcratr_nab_less_than_odr 可能因为 存储问题引发:

Trying to open a Database after a Crash caused by Storage Problems the Instance Recovery fails with


2.  分析trace:


Dump continued from file: /orasys/diag/rdbms/wimng2/wimng2/trace/wimng2_ora_29785.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []

========= Dump for incident 16953 (ORA 600 [kcratr_nab_less_than_odr]) ========

*** 2012-02-07 13:40:54.447
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BFFE7388 ? 2 ?
ksedst1()+77         call     skdstdst()           BFFE7388 ? 0 ? 1 ? AB8E3A8 ?
                                                   853C46E ? AB8E3A8 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2699  call     ksedst()             0 ? 5AF911 ? BFFE74AC ?
                                                   1007B40C ? BFFE7794 ? 0 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 2 ?
ksfdmp()+59          call     ksedmp()             3EB ? BFFE92D0 ? DFBE5A3 ?
                                                   106AD160 ? 3EB ? 106AD160 ?
dbgexPhaseII()+1725  call     00000000             106AD160 ? 3EB ?
dbgexProcessError()  call     dbgexPhaseII()       B7FEB598 ? B7DBC888 ?
+2089                                              BFFECBA4 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   106AD160 ? B7FDD0D4 ? 258 ?
47
kgeadse()+286        call     00000000             106AD160 ? B7FDD0D4 ? 258 ?
kgerinv_internal()+  call     kgeadse()            106AD160 ? B7FDD0D4 ? 258 ?
47                                                 FD8DC58 ? 0 ? 4 ? BFFED45C ?
kgerinv()+41         call     kgerinv_internal()   106AD160 ? B7FDD0D4 ?
                                                   FD8DC58 ? 258 ? 0 ? 4 ?
                                                   BFFED45C ?
kgeasnmierr()+47     call     kgerinv()            106AD160 ? B7FDD0D4 ?
                                                   FD8DC58 ? 4 ? BFFED45C ?
kcratr_odr_check()+  call     kgeasnmierr()        106AD160 ? B7FDD0D4 ?
204                                                FD8DC58 ? 4 ? 0 ? 1 ?
kcratr()+1806        call     kcratr_odr_check()   BFFED6EC ? 0 ? F386D53 ? 0 ?
                                                   9 ? F386D53 ?
kctrec()+9311        call     kcratr()             BFFED6EC ? BFFF45D0 ? 0 ?
kcvcrv()+5906        call     kctrec()             BFFF5868 ? 0 ? B7FD0BD0 ?
                                                   B7FD122C ? B7E1BE00 ? 0 ?


Kernel function kcratr 是forward recovery algorithm的起点 kcrfr.c Kernel Cache Redo

[kcratr_nab_less_than_odr], [1], [1468], [57304], [57605], [], [], [], [], [], [], []的argument 定义

(a) redo thread id
(b) redo log sequence
(c) NAB
(d) on-disk rda  block number


这个trace 日志里有一个过程 是比较理想的 rolling forward 前滚的教学演示:

  1. 2012-02-07 13:40:53.366569 :80000687:KFNU:kfn.c@2200:kfnPrepareASM(): kfnPrepareASM force=0 state_kfnsg=0x7
  2. 2012-02-07 13:40:53.366569*:80000688:CACHE_RCV:kcv.c@16365:kcvcrv(): kcvcrv: Calling kctrec()
  3. 2012-02-07 13:40:53.366569*:80000689:CACHE_RCV:kct.c@4163:kctrec(): kctrec: Entering kctrec()
  4. 2012-02-07 13:40:53.413557*:8000068A:CACHE_RCV:kct.c@4271:kctrec(): kctrec: thread 1 cf thread ckpt: logseq 1468, block 2,scn 25917106
  5. 2012-02-07 13:40:53.413557*:8000068B:CACHE_RCV:kct.c@4285:kctrec(): kctrec: Checkpoint progress record contents
  6. 2012-02-07 13:40:53.413557*:8000068C:CACHE_RCV:kct.c@4287:kctrec(): kctrec:    kcccpsta 2, kcccpflg 0, kcccpdrt 48, kcccplrba 0x0005bc.0000dfd8.0000 kcccpodr 0x0005bc.0000e105.0000
  7. 2012-02-07 13:40:53.413557*:8000068D:CACHE_RCV:kct.c@4299:kctrec(): kctrec:    kcccpods 0x0000.018be694, kcccpodt 773934914, kcccprlc 753362405, kcccprls 0x0000.00000001, kcccphbt 774572255, kcccpmid 1635578584
  8. 2012-02-07 13:40:53.413557*:8000068E:CACHE_RCV:kct.c@4311:kctrec(): kctrec:    kcccpsdr 0x0005bc.00000001.0000, kcccpfbend (krfbafln 0, krfbathr 0, krfbaseq 0, krfbabno 0 krfbabof 0), kcccprsv 0
  9. 2012-02-07 13:40:53.413557*:8000068F:CACHE_RCV:kct.c@4360:kctrec(): kctrec: cache-low rba: logseq 1468, block 57304
  10. 2012-02-07 13:40:53.413557*:80000690:CACHE_RCV:kct.c@4374:kctrec(): kctrec: on-disk rba: logseq 1468, block 57605, scn 25945748
  11. 2012-02-07 13:40:53.413557*:80000691:CACHE_RCV:kct.c@4450:kctrec(): kctrec: Current ckpt RBA < cache-low RBA, adjusted ckpt RBA to cache low RBA, zeroed ckpt SCN and timestamp to 0
  12. 2012-02-07 13:40:53.413557*:80000692:CACHE_RCV:kct.c@4604:kctrec(): kctrec: Recovery starting point for thread 1 -  logseq 1468, block 57304, scn 0
  13. 2012-02-07 13:40:53.449498*:80000693:CACHE_RCV:kct.c@4664:kctrec(): kctrec: Do thread recovery, calling kcratr()
  14. 2012-02-07 13:40:53.456376 :80000694:CACHE_RCV:kcra.c@1517:kcratr(): kcratr: Entering kcratr()
  15. 2012-02-07 13:40:53.458293 :80000695:CACHE_RCV:kcra.c@1541:kcratr(): kcratr: Started redo scan
  16. 2012-02-07 13:40:53.458293*:80000696:CACHE_RCV:kcra.c@1862:kcratr_scan(): kcratr_scan: Entering kcratr_scan()
  17. 2012-02-07 13:40:53.458293*:80000697:CACHE_RCV:kcra.c@2000:kcratr_scan(): kcratr_scan: Log not open, opening online log for thread 1, RBA 0x0005bc.0000dfd8.0000, SCN 0x0000.00000000
  18. 2012-02-07 13:40:53.694427*:800006A4:CACHE_RCV:kcra.c@2036:kcratr_scan(): kcratr_scan: End of curr thread reached
  19. 2012-02-07 13:40:53.694427*:800006A5:CACHE_RCV:kcra.c@2038:kcratr_scan(): kcratr_scan:    end rcv RBA 0x0005bc.0000dfd8.   0, end rcv SCN 0x0000.018b76b3 end SCN timestamp 773895659, NAB 57304
  20. 2012-02-07 13:40:53.694427*:800006A6:CACHE_RCV:kcra.c@2048:kcratr_scan(): kcratr_scan:   (Previous) highest SCN seen in the redo stream 0x0000.00000000
  21. 2012-02-07 13:40:53.694427*:800006A7:CACHE_RCV:kcra.c@2162:kcratr_scan(): kcratr_scan: Exiting kcratr_scan()
  22. 2012-02-07 13:40:53.702245 :800006A8:CACHE_RCV:kcra.c@1559:kcratr(): kcratr: Completed redo scan, read 0 KB redo, 0 data blocks need recovery
复制代码


这里可以看到 kcratr_scan 负责scan redo log 读取了 redo logfile header ,发现NAB =57304 这个值小与 odr( on disk rba redo block adress) ,
说明 redo logfile header存在讹误,于是报 600 kcratr_nab_less_than_odr错误

回复 显示全部楼层 道具 举报

发表于 2012-2-7 20:42:18
1.
尝试用MOS的方案1解决该问题:


SQL> startup mount;

SQL> recover database until cancel using backup controlfile;

SQL> alter database open resetlogs;

回复 显示全部楼层 道具 举报

发表于 2012-2-7 20:52:25
谢谢Liu,明天回去后问下测试环境的存储

生产环境是HP 380服务器,5块磁盘做了RAID5

回复 显示全部楼层 道具 举报

发表于 2012-2-8 09:24:28
Liu,生产环境是HP DL388G7,4块硬盘做的RAID5.

测试环境是虚拟环境,虚拟化出来的。

回复 显示全部楼层 道具 举报

发表于 2012-2-8 10:45:18
问题解决了。
SQL> recover database until cancel using backup controlfile;
还是报错,后来我参照了惜分飞的方法,重做了controlfile
http://www.xifenfei.com/2347.html

谢谢Liu,谢谢分飞兄

回复 显示全部楼层 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-7-23 23:44 , Processed in 0.055307 second(s), 25 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569