Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

87

积分

0

好友

3

主题
1#
发表于 2012-3-13 15:05:52 | 查看: 12469| 回复: 20
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

cat alert.log
Mon Mar 12 22:02:10 CST 2012
Errors in file /app/oracle/admin/tjfxdb/bdump/tjfxdb2_j002_28238.trc:
ORA-12012: error on auto execute of job 8939
ORA-01187: cannot read from file ORA-01187: cannot read from file 154 because it failed verification tests
ORA-01110: data file 154: '/dev/raw/raw191'
ORA-06512: at "SYS.PRVT_ADVISOR", line 1709
ORA-06512: at "SYS.DBMS_ADVISOR", line 186
ORA-06512: at "SYS.DBMS_SPACE", line 1500
ORA-06512: at "SYS.DBMS_SPACE", line 1566
because it failed verification tests

[oracle@tjfxdb2 bdump]$ cat  /app/oracle/admin/tjfxdb/bdump/tjfxdb2_j002_28238.trc
[code]
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /app/oracle/product/10.2.0/db
System name:    Linux
Node name:      tjfxdb2
Release:        2.6.18-164.el5
Version:        #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:        x86_64
Instance name: tjfxdb2
Redo thread mounted by this instance: 2
Oracle process number: 37
Unix process pid: 28238, image: [email=oracle@tjfxdb2]oracle@tjfxdb2[/email] (J002)
*** ACTION NAME:(AUTO_SPACE_ADVISOR_JOB) 2012-03-12 22:02:10.157
*** MODULE NAME:(DBMS_SCHEDULER) 2012-03-12 22:02:10.157
*** SERVICE NAME:(SYS$USERS) 2012-03-12 22:02:10.157
*** SESSION ID:(2147.53861) 2012-03-12 22:02:10.157
*** 2012-03-12 22:02:10.157
ORA-12012: error on auto execute of job 8939
ORA-01187: cannot read from file ORA-01187: cannot read from file 154 because it failed verification tests
ORA-01110: data file 154: '/dev/raw/raw191'
ORA-06512: at "SYS.PRVT_ADVISOR", line 1709
ORA-06512: at "SYS.DBMS_ADVISOR", line 186
ORA-06512: at "SYS.DBMS_SPACE", line 1500
ORA-06512: at "SYS.DBMS_SPACE", line 1566
because it failed verification tests

$ ll /dev/raw/raw191
crw------- 1 oracle dba 162, 191 03-10 06:10 /dev/raw/raw191

SQL>select file#,creation_time,status,enabled,name,aux_name from v$datafile where name like '/dev/raw/raw191%'
    FILE#  CREATION_TIME          STATUS       ENABLED                 NAME AUX_NAME
    154      2011-8-19 22:01:27        ONLINE      READ WRITE           /dev/raw/raw191 NONE

请教,下一步要怎么检测,请给下思路?
2#
发表于 2012-3-13 15:12:46
http://docs.oracle.com/cd/B19306 ... torage.htm#CDECEBGH

Raw partitions: Raw partitions are disk partitions that are not mounted and written to using the Linux file system, but instead are accessed directly by the application.

Raw partitions (database files only): A raw partition is required for each database file.

On x86 and Itanium systems, although Red Hat Enterprise Linux 3 and SUSE Linux Enterprise Server provide a Logical Volume Manager (LVM), this LVM is not cluster-aware. For this reason, Oracle does not support the use of logical volumes with RAC on x86 and Itanium systems for either Oracle Clusterware or database files.On IBM zSeries based systems, Oracle supports raw logical volumes.

[ 本帖最后由 武汉-SSH 于 2012-3-13 20:00 编辑 ]

回复 只看该作者 道具 举报

3#
发表于 2012-3-13 15:18:42
Error:  ORA-1187
Text:   cannot read from file <name> because it failed verification tests
-------------------------------------------------------------------------------
Cause:  The datafile did not pass the checks to ensure it is part of the
        database.
        Reading the file is not allowed until it is verified.
Action: Make the correct file available to database.
        Then either open the database or execute ALTER SYSTEM CHECK DATAFILES.





oradebug setmypid;

oradebug unlimit;

ALTER SYSTEM CHECK DATAFILES;


oradebug dump file_hdrs 10;

alter system dump datafile '/dev/raw/raw191'  block 9;
alter system dump datafile '/dev/raw/raw191'  block 3;

oradebug tracefile_name;                         《================ 上传打包的trace文件


rman target /

list backup of datafile  154;
list copy of datafile        154;

回复 只看该作者 道具 举报

4#
发表于 2012-3-13 15:40:05
ALTER SYSTEM CHECK DATAFILES;

THEN:

ERROR at line 27:
ORA-01187: cannot read from file 152 because it failed verification tests
ORA-01110: data file 152: '/dev/raw/raw189'



RMAN>list backup of datafile 154;
using target database control file instead of recovery catalog
RMAN> list copy of datafile 154;

RMAN>

tjfxdb2_ora_21935.rar

14.94 KB, 下载次数: 1770

回复 只看该作者 道具 举报

5#
发表于 2012-3-13 16:02:29
ODM Finding:

from file_hdrs trace:


DATA FILE #154:
  (name #180) /dev/raw/raw191
creation size=1309440 block size=8192 status=0x40e head=180 tail=180 dup=1
tablespace 16, index=15 krfil=154 prev_file=153
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:125 scn: 0x0001.f3902d05 03/12/2012 08:03:10
Stop scn: 0xffff.ffffffff 08/19/2011 22:01:27
Creation Checkpointed at scn:  0x0000.c271eade 08/19/2011 22:01:27
thread:1 rba:(0x3dd.337ed.10)


以上是控制文件中的内容

Hot Backup end marker scn: 0x0000.00000000
aux_file is NOT DEFINED
File header version cannot be determined due to corruption   ==》文件头存在讹误
Dump may be suspect
V10 STYLE FILE HEADER:
        Compatibility Vsn = 169870592=0xa200500
        Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
        Activation ID=0=0x0
        Control Seq=116972=0x1c8ec, File size=1309440=0x13fb00
        File Number=161, Blksiz=8192, File Type=3 DATA           ==> 161
Tablespace #16 - FRIENDDATA1  rel_fn:161
Creation   at   scn: 0x0000.c2730f61 08/19/2011 22:08:39
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
reset logs count:0x2c5af5d0 scn: 0x0000.00000001 reset logs terminal rcv data:0x0 scn: 0x0000.00000000
prev reset logs count:0x0 scn: 0x0000.00000000 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000
recovered at 01/01/1988 00:00:00
status:0x4 root dba:0x00000000 chkpt cnt: 250 ctl cnt:249
begin-hot-backup file size: 0
Checkpointed at scn:  0x0001.f59891cb 03/13/2012 00:55:22
thread:1 rba:(0x11a1.2.10)


DATA FILE #154 在datafile header中的 File Number 居然变成了 161


其他file header 信息也与 FILE NUMBER 161 的一样



以下是 FILE NUMBER 161 的 datafile header信息:


Hot Backup end marker scn: 0x0000.00000000
aux_file is NOT DEFINED
V10 STYLE FILE HEADER:
        Compatibility Vsn = 169870592=0xa200500
        Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
        Activation ID=0=0x0
        Control Seq=116972=0x1c8ec, File size=1309440=0x13fb00
        File Number=161, Blksiz=8192, File Type=3 DATA
Tablespace #16 - FRIENDDATA1  rel_fn:161
Creation   at   scn: 0x0000.c2730f61 08/19/2011 22:08:39
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
reset logs count:0x2c5af5d0 scn: 0x0000.00000001 reset logs terminal rcv data:0x0 scn: 0x0000.00000000
prev reset logs count:0x0 scn: 0x0000.00000000 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000
recovered at 01/01/1988 00:00:00
status:0x4 root dba:0x00000000 chkpt cnt: 250 ctl cnt:249
begin-hot-backup file size: 0
Checkpointed at scn:  0x0001.f59891cb 03/13/2012 00:55:22
thread:1 rba:(0x11a1.2.10)


Creation   at   scn: 0x0000.c2730f61 08/19/2011 22:08:39
Creation   at   scn: 0x0000.c2730f61 08/19/2011 22:08:39


在看 /dev/raw/raw191  转储的第9个块的信息:

Start dump data block from file /dev/raw/raw191 minblk 9 maxblk 9
V10 STYLE FILE HEADER:
        Compatibility Vsn = 169870592=0xa200500
        Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
        Activation ID=0=0x0
        Control Seq=116972=0x1c8ec, File size=1309440=0x13fb00
        File Number=161, Blksiz=8192, File Type=3 DATA
Dump all the blocks in range:
buffer tsn: 16 rdba: 0x28400009 (161/9)                 
scn: 0x0001.f5936be1 seq: 0x02 flg: 0x04 tail: 0x6be12002
frmt: 0x02 chkval: 0xaf80 type: 0x20=FIRST LEVEL BITMAP BLOCK

rdba: 0x28400009 (161/9)   FILE NUMBER 也是 161

数据文件 create 的scn 和 时间 分秒不差, 大致可以确定是 file 161 (/dev/raw/raw201)覆盖了 DATA FILE #154: (/dev/raw/raw191), 当然也有可能是 raw 绑错了设备, 把一个block device 绑到了2个不同名的raw device 上。

回复 只看该作者 道具 举报

6#
发表于 2012-3-13 16:16:17
注意 DATA FILE# 152 /dev/raw/raw189也存在类似的情况:



DATA FILE #152:
  (name #177) /dev/raw/raw189
creation size=1309440 block size=8192 status=0x40e head=177 tail=177 dup=1
tablespace 16, index=15 krfil=152 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:125 scn: 0x0001.f3902d05 03/12/2012 08:03:10
Stop scn: 0xffff.ffffffff 08/19/2011 21:52:40
Creation Checkpointed at scn:  0x0000.c26feced 08/19/2011 21:52:40
thread:1 rba:(0x3dd.32ed0.10)

File header version cannot be determined due to corruption
Dump may be suspect
V10 STYLE FILE HEADER:
    Compatibility Vsn = 169870592=0xa200500
    Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
    Activation ID=0=0x0
    Control Seq=116828=0x1c85c, File size=1309440=0x13fb00
    File Number=157, Blksiz=8192, File Type=3 DATA                       从 152 变成了 157

回复 只看该作者 道具 举报

7#
发表于 2012-3-13 23:41:10
On x86 and Itanium systems, although Red Hat Enterprise Linux 3 and SUSE Linux Enterprise Server provide a Logical Volume Manager (LVM), this LVM is not cluster-aware. For this reason, Oracle does not support the use of logical volumes with RAC on x86 and Itanium systems for either Oracle Clusterware or database files.

On IBM zSeries based systems, Oracle supports raw logical volumes.

LVM is not cluster aware
  Be very careful doing this, LVM is not currently cluster-aware and it is very easy to lose all your data.

AlthoughRed Hat Enterprise Linux 3.0 and SLES 8 provide a Logical Volume Manager (LVM),this LVM is not cluster aware. For this reason, Oracle does not support the useof logical volumes with RAC for either CRS or database files on Linux. The useof logical volumes for raw devices is supported only for single-instancedatabases. They are not supported for RAC databases.



http://www.itpub.net/thread-634625-4-1.html

http://xzh2000.itpub.net/post/96/211690

http://tldp.org/HOWTO/LVM-HOWTO/sharinglvm1.html

查了一些信息,LV 到底是不支持做RAC呢?还是仅仅是不受厂商支持呢?

对于那些已经采用这种方式做RAC的环境,到底会存在什么隐患呢?

如果用户只是在安装时采用这种方式,生产时并不会对LV、RAW做任何改变,是否就没有问题呢?

[ 本帖最后由 武汉-SSH 于 2012-3-13 23:56 编辑 ]

回复 只看该作者 道具 举报

8#
发表于 2012-3-14 17:35:57
增加一些素材吧,看看,我们最终能否定位问题产生的原因。

实现:
/etc/rc.local
raw /dev/raw/raw161 /dev/mapper/vgdata-data118
raw /dev/raw/raw162 /dev/mapper/vgdata-data119
。。。。。。

lvm1.tar.gz (106.23 KB, 下载次数: 1779)
lvm2.tar.gz (104.39 KB, 下载次数: 1683)
tjfxdb1_ora_23294.zip (20.09 KB, 下载次数: 1690)
tjfxdb2_ora_21935.zip (20.48 KB, 下载次数: 1623)
alert_tjfxdb1.zip (639 Bytes, 下载次数: 1691)
alert_tjfxdb2.zip (40.75 KB, 下载次数: 1678)

[ 本帖最后由 武汉-SSH 于 2012-3-14 17:39 编辑 ]

回复 只看该作者 道具 举报

9#
发表于 2012-3-14 17:44:51
DATA FILE #152:
        File Number=157, Blksiz=8192, File Type=3 DATA
DATA FILE #153:
        File Number=159, Blksiz=8192, File Type=3 DATA
DATA FILE #154:
        File Number=161, Blksiz=8192, File Type=3 DATA
DATA FILE #155:
        File Number=163, Blksiz=8192, File Type=3 DATA
DATA FILE #156:
        File Number=165, Blksiz=8192, File Type=3 DATA
DATA FILE #157:
        File Number=169, Blksiz=8192, File Type=3 DATA

这是二号节点的,一号节点不存在这种状况。

回复 只看该作者 道具 举报

10#
发表于 2012-3-14 17:45:23
ORA-01157: cannot identify/lock data file 149 - see DBWR trace file
ORA-01110: data file 149: '/dev/raw/raw184'
*** Error 1157 in open/read file # 149 ***

这也是二号节点的,在一号节点,该文件拿来做了TEMP,但是一号节点是如下信息:
DATA FILE #149:
  (name #175) /dev/raw/raw184
creation size=1309440 block size=8192 status=0x81c head=175 tail=175 dup=1
tablespace 13, index=12 krfil=149 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:2 scn: 0x0000.c26fd19e 08/19/2011 20:49:36
Stop scn: 0x0000.c26fe2f7 08/19/2011 21:30:59
Creation Checkpointed at scn:  0x0000.c26fd19d 08/19/2011 20:49:36
thread:1 rba:(0x3dd.3205a.10)
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。

File header version cannot be determined due to corruption
Dump may be suspect
V10 STYLE FILE HEADER:
        Compatibility Vsn = 169870592=0xa200500
        Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
        Activation ID=0=0x0
        Control Seq=86245=0x150e5, File size=2620160=0x27fb00
        File Number=10, Blksiz=8192, File Type=6 TEMP FILE

[ 本帖最后由 武汉-SSH 于 2012-3-14 17:56 编辑 ]

回复 只看该作者 道具 举报

11#
发表于 2012-3-14 17:48:53
两边DUMP已经完全是不同的文件了。
X~]YYTAH0ZTN04]NLB[35C1.jpg

回复 只看该作者 道具 举报

12#
发表于 2012-3-14 17:55:39
Good!  加精!

回复 只看该作者 道具 举报

13#
发表于 2012-3-14 21:38:51
靠,还真有官方案例,不过是9i 时代的产物。

出处:
Configuring Raw Devices for Real Application Clusters on Linux [ID 246205.1]        

。。。。。。。。。。。。。。。。。。。。。。

The bind commands look like:

    /usr/sbin/raw /dev/raw1 /dev/oracle/db_name_raw_system_411m
    /usr/sbin/raw /dev/raw2 /dev/oracle/db_name_raw_users_120m
    /usr/sbin/raw /dev/raw3 /dev/oracle/db_name_raw_temp_41m
。。。。。。。。。。。。。。。。。。。。。。。。。。。

[ 本帖最后由 武汉-SSH 于 2012-3-14 22:03 编辑 ]

回复 只看该作者 道具 举报

14#
发表于 2012-3-14 21:48:32

回复 13# 的帖子

没有实践过, 即使可以也不推荐用LV 。

回复 只看该作者 道具 举报

15#
发表于 2012-3-14 22:24:39
出问题的是这几个文件,
BTJ8VS(%7~JEZPM{I8@0}83.jpg
但是这几个文件的绑定号确没问题
$N~K30_89MGYDWSSD@}C0_G.jpg

回复 只看该作者 道具 举报

16#
发表于 2012-3-14 22:34:47
二号节点没有 /dev/raw/raw184 /dev/raw/raw188 文件
H9%QH43PTXF_D8P)YELC58L.jpg


LZ 说 这是因为二号节点的被手工注释掉了,先忽略。。。。

[ 本帖最后由 武汉-SSH 于 2012-3-14 22:49 编辑 ]

回复 只看该作者 道具 举报

17#
发表于 2012-3-14 22:37:12
oradebug setmypid
oradebug dump controlf 10

附件为trace文件及/dev/mapper/vgdata-data*

两边的raw对应貌似也没有错。

vgdata.rar

1.38 KB, 下载次数: 1557

tjfxdb2_ora_18503.rar

107.46 KB, 下载次数: 1722

回复 只看该作者 道具 举报

18#
发表于 2012-3-14 22:41:25
oradebug dump controlf 10 ==>

CONTROLF 中的 datafile header记录应当是一致的,仅看controlfile控制文件中的记录意义不大。

回复 只看该作者 道具 举报

19#
发表于 2012-3-14 22:51:24
补rac1的 controlfile dump

请问:

文件号两边不对称,可能造成的原因是什么? 文件号是不是写在数据文件头文件呢?

如果节点2(问题节点)重启实例、重启系统,会有多大的风险,会不会造成数据覆盖?

tjfxdb1_ora_11332.rar

107.29 KB, 下载次数: 1747

回复 只看该作者 道具 举报

20#
发表于 2012-3-14 23:04:11
"文件号是不是写在数据文件头文件呢?"

数据文件头中有。

"文件号两边不对称,可能造成的原因是什么?“

我觉得最可能的原因 还是 raw设备绑定有问题,但是你的raw太多了 很难查

"会不会造成数据覆盖?"

ORA-01187 已经意识到FILE HEADER的问题,一般不会读写这些有问题的数据文件了,覆盖的可能性不大。

回复 只看该作者 道具 举报

21#
发表于 2012-3-16 22:16:46
最新动作-
将有问题的节点,系统重新启动(无其他操作)

结果喜人,表空间查询正常,无ORA-01187报错

附件为file_hds 及/dev/raw/raw191的trace文件

DATA FILE #154:
  (name #180) /dev/raw/raw191
creation size=1309440 block size=8192 status=0x40e head=180 tail=180 dup=1
tablespace 16, index=15 krfil=154 prev_file=153
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:130 scn: 0x0001.fee673b0 03/16/2012 08:08:07

Hot Backup end marker scn: 0x0000.00000000
aux_file is NOT DEFINED
V10 STYLE FILE HEADER:
Compatibility Vsn = 169870592=0xa200500
Db ID=2466222416=0x92ff9150, Db Name='TJFXDB'
Activation ID=0=0x0
Control Seq=118653=0x1cf7d, File size=1309440=0x13fb00
File Number=154, Blksiz=8192, File Type=3 DATA        ==》变为154,正常了

由此看,raw绑定lv,确实出现了问题,但是场景无法复现,找不到原因了。。。

以后我们要使用asm或者cluster filesystem了,摒弃lvm



谢谢支持!

回复 只看该作者 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-11-15 10:08 , Processed in 0.060237 second(s), 24 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569