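Failure after changing IP addresses on an 11g RAC
Database version: 11.2.0.1; operating system: OEL 5.5
Number of nodes: 4
For business reasons, the RAC IP addresses had to be changed.
After the change: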
# su - griid
su: user griid does not exist
# su - grid
$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DG1.dg ora....up.type ONLINE ONLINE bxrac01
ora.DG2.dg ora....up.type ONLINE ONLINE bxrac01
ora.DG3.dg ora....up.type OFFLINE OFFLINE
ora....ER.lsnr ora....er.type ONLINE ONLINE bxrac01
ora....N1.lsnr ora....er.type ONLINE ONLINE bxrac03
ora.OCRVDG.dg ora....up.type ONLINE ONLINE bxrac01
ora.asm ora.asm.type ONLINE ONLINE bxrac01
ora.bxrac.db ora....se.type ONLINE OFFLINE
ora....SM1.asm application ONLINE ONLINE bxrac01
ora....01.lsnr application ONLINE ONLINE bxrac01
ora....c01.gsd application ONLINE ONLINE bxrac01
ora....c01.ons application ONLINE ONLINE bxrac01
ora....c01.vip ora....t1.type ONLINE ONLINE bxrac01
ora....SM2.asm application ONLINE ONLINE bxrac02
ora....02.lsnr application ONLINE ONLINE bxrac02
ora....c02.gsd application ONLINE ONLINE bxrac02
ora....c02.ons application ONLINE ONLINE bxrac02
ora....c02.vip ora....t1.type ONLINE ONLINE bxrac02
ora....SM3.asm application ONLINE ONLINE bxrac03
ora....03.lsnr application ONLINE ONLINE bxrac03
ora....c03.gsd application ONLINE ONLINE bxrac03
ora....c03.ons application ONLINE ONLINE bxrac03
ora....c03.vip ora....t1.type ONLINE ONLINE bxrac03
ora....SM4.asm application ONLINE ONLINE bxrac04
ora....04.lsnr application ONLINE ONLINE bxrac04
ora....c04.gsd application ONLINE ONLINE bxrac04
ora....c04.ons application ONLINE ONLINE bxrac04
ora....c04.vip ora....t1.type ONLINE ONLINE bxrac04
ora.eons ora.eons.type ONLINE ONLINE bxrac01
ora.gsd ora.gsd.type ONLINE ONLINE bxrac01
ora....network ora....rk.type ONLINE ONLINE bxrac01
ora.oc4j ora.oc4j.type ONLINE ONLINE bxrac03
ora.ons ora.ons.type ONLINE ONLINE bxrac01
ora....ry.acfs ora....fs.type ONLINE ONLINE bxrac01
ora.scan1.vip ora....ip.type ONLINE ONLINE bxrac03
The database fails to start:
# ./srvctl start database -d bxrac
PRCR-1079 : Failed to start resource ora.bxrac.db
CRS-5016: Process "/u01/app/11.2.0/grid/bin/setasmgidwrap" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/bxrac03/agent/crsd/oraagent_oracle/oraagent_oracle.log"
/u01/app/11.2.0/grid/bin/setasmgidwrap: line 67: /opt/oracle/bin/setasmgid: Permission denied
CRS-5016: Process "/u01/app/11.2.0/grid/bin/setasmgidwrap" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/bxrac04/agent/crsd/oraagent_oracle/oraagent_oracle.log"
/u01/app/11.2.0/grid/bin/setasmgidwrap: line 67: /opt/oracle/bin/setasmgid: Permission denied
CRS-5016: Process "/u01/app/11.2.0/grid/bin/setasmgidwrap" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/bxrac02/agent/crsd/oraagent_oracle/oraagent_oracle.log"
/u01/app/11.2.0/grid/bin/setasmgidwrap: line 67: /opt/oracle/bin/setasmgid: Permission denied
CRS-5016: Process "/u01/app/11.2.0/grid/bin/setasmgidwrap" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "start" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/bxrac01/agent/crsd/oraagent_oracle/oraagent_oracle.log"
/u01/app/11.2.0/grid/bin/setasmgidwrap: line 67: /opt/oracle/bin/setasmgid: Permission denied
ORA-01157: cannot identify/lock data file 36 - see DBWR trace file
ORA-01110: data file 36: '+DG3/bxrac/datafile/dqsidata01'
CRS-2674: Start of 'ora.bxrac.db' on 'bxrac04' failed
CRS-2632: There are no more servers to try to place resource 'ora.bxrac.db' on that would satisfy its placement policy
ORA-01157: cannot identify/lock data file 36 - see DBWR trace file
ORA-01110: data file 36: '+DG3/bxrac/datafile/dqsidata01'
CRS-2674: Start of 'ora.bxrac.db' on 'bxrac03' failed
ORA-01157: cannot identify/lock data file 36 - see DBWR trace file
ORA-01110: data file 36: '+DG3/bxrac/datafile/dqsidata01'
CRS-2674: Start of 'ora.bxrac.db' on 'bxrac02' failed
ORA-01157: cannot identify/lock data file 36 - see DBWR trace file
ORA-01110: data file 36: '+DG3/bxrac/datafile/dqsidata01'
CRS-2674: Start of 'ora.bxrac.db' on 'bxrac01' failed
# su - oracle
$ ls -l $ORACLE_HOME/bin/oracle*
-r-sr-s--x 1 oracle asmadmin 210824720 Jan 7 2012 /u01/app/oracle/product/11.2.0/bxrac/bin/oracle
-rwxr-x--- 1 oracle oinstall 0 Aug 15 2009 /u01/app/oracle/product/11.2.0/bxrac/bin/oracleO
ora.DG3.dg ora....up.type OFFLINE OFFLINE
ORA-01157: cannot identify/lock data file 36 - see DBWR trace file
ORA-01110: data file 36: '+DG3/bxrac/datafile/dqsidata01'
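Why is +DG3 OFFLINE?
DG3 consists of two vdisks newly allocated to the database from the storage array (/dev/raw/raw10 and /dev/raw/raw11). It was created shortly before the IP change.
The creation steps at the time were: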
su - grid
SQL> conn / as sysasm
Connected.
SQL> show sga
Total System Global Area 283930624 bytes
Fixed Size 2212656 bytes
Variable Size 256552144 bytes
ASM Cache 25165824 bytes
SQL> create diskgroup DG3 external redundancy disk '/dev/raw/raw10';
SQL> alter diskgroup DG3 add disk '/dev/raw/raw11';
SQL> col path for a30
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,NAME,PATH from v$asm_disk;
SQL> set line 200
SQL> /
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU NAME PATH
------------ ----------- ------- ------------ ------------------------------ ------------------------------
0 1 CLOSED MEMBER /dev/raw/raw9
4 1 CACHED MEMBER DG3_0001 /dev/raw/raw11
1 2 CACHED MEMBER DG1_0002 /dev/raw/raw8
3 1 CACHED MEMBER OCRVDG_0001 /dev/raw/raw7
3 0 CACHED MEMBER OCRVDG_0000 /dev/raw/raw6
2 2 CACHED MEMBER DG2_0002 /dev/raw/raw5
2 1 CACHED MEMBER DG2_0001 /dev/raw/raw4
2 0 CACHED MEMBER DG2_0000 /dev/raw/raw3
1 1 CACHED MEMBER DG1_0001 /dev/raw/raw2
1 0 CACHED MEMBER DG1_0000 /dev/raw/raw1
4 0 CACHED MEMBER DG3_0000 /dev/raw/raw10
SQL> select disk_number, path, name, total_mb, free_mb from v$asm_disk where group_number =4;
DISK_NUMBER PATH NAME TOTAL_MB FREE_MB
----------- ------------------------------ ------------------------------ ---------- ----------
1 /dev/raw/raw11 DG3_0001 307196 307168
0 /dev/raw/raw10 DG3_0000 307196 307168
After changing the cluster IP addresses and rebooting the servers, the current state is:
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,NAME,PATH from v$asm_disk;
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU NAME                           PATH
------------ ----------- ------- ------------ ------------------------------ ------------------------------
           0           0 CLOSED  MEMBER                                      /dev/raw/raw11
           0           1 CLOSED  MEMBER                                      /dev/raw/raw10
           2           3 CACHED  MEMBER       DG1_0003                       /dev/raw/raw9
           2           2 CACHED  MEMBER       DG1_0002                       /dev/raw/raw8
           1           1 CACHED  MEMBER       OCRVDG_0001                    /dev/raw/raw7
           1           0 CACHED  MEMBER       OCRVDG_0000                    /dev/raw/raw6
           3           2 CACHED  MEMBER       DG2_0002                       /dev/raw/raw5
           3           1 CACHED  MEMBER       DG2_0001                       /dev/raw/raw4
           3           0 CACHED  MEMBER       DG2_0000                       /dev/raw/raw3
           2           1 CACHED  MEMBER       DG1_0001                       /dev/raw/raw2
           2           0 CACHED  MEMBER       DG1_0000                       /dev/raw/raw1
11 rows selected.
SQL>
SQL>
SQL> !
$ ls /dev/raw* -l
crw------- 1 root root 162, 0 Feb 7 15:43 /dev/rawctl
/dev/raw:
total 0
crw-rw---- 1 grid asmadmin 162, 1 Feb 7 16:14 raw1
crw-rw---- 1 grid asmadmin 162, 10 Feb 7 15:44 raw10
crw-rw---- 1 grid asmadmin 162, 11 Feb 7 15:44 raw11
crw-rw---- 1 grid asmadmin 162, 2 Feb 7 15:48 raw2
crw-rw---- 1 grid asmadmin 162, 3 Feb 7 16:14 raw3
crw-rw---- 1 grid asmadmin 162, 4 Feb 7 16:14 raw4
crw-rw---- 1 grid asmadmin 162, 5 Feb 7 16:14 raw5
crw-rw---- 1 grid asmadmin 162, 6 Feb 10 10:40 raw6
crw-rw---- 1 grid asmadmin 162, 7 Feb 7 15:47 raw7
crw-rw---- 1 grid asmadmin 162, 8 Feb 7 15:48 raw8
crw-rw---- 1 grid asmadmin 162, 9 Feb 7 16:14 raw9
$ exit
exit
SQL> ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force;
ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DG3" does not exist or is not mounted
SQL> create diskgroup DG3 external redundancy disk '/dev/raw/raw10';
create diskgroup DG3 external redundancy disk '/dev/raw/raw10'
*
ERROR at line 1:
ORA-15260: permission denied on ASM disk group
SQL> !
$ cd /dev/raw
$ ll raw*
crw-rw---- 1 grid asmadmin 162, 1 Feb 7 16:14 raw1
crw-rw---- 1 grid asmadmin 162, 10 Feb 7 15:44 raw10
crw-rw---- 1 grid asmadmin 162, 11 Feb 7 15:44 raw11
crw-rw---- 1 grid asmadmin 162, 2 Feb 7 15:48 raw2
crw-rw---- 1 grid asmadmin 162, 3 Feb 7 16:14 raw3
crw-rw---- 1 grid asmadmin 162, 4 Feb 7 16:14 raw4
crw-rw---- 1 grid asmadmin 162, 5 Feb 7 16:14 raw5
crw-rw---- 1 grid asmadmin 162, 6 Feb 10 10:43 raw6
crw-rw---- 1 grid asmadmin 162, 7 Feb 7 15:47 raw7
crw-rw---- 1 grid asmadmin 162, 8 Feb 7 15:48 raw8
crw-rw---- 1 grid asmadmin 162, 9 Feb 7 16:14 raw9
$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Mon Feb 10 10:55:31 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> conn / as sysasm
SQL> ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
2 ;
ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DG3" does not exist or is not mounted
SQL> create diskgroup DG3 external redundancy disk '/dev/raw/raw10';
create diskgroup DG3 external redundancy disk '/dev/raw/raw10'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15033: disk '/dev/raw/raw10' belongs to diskgroup "DG3"
ORA-15030: diskgroup name "DG3" is in use by another diskgroup
SQL> alter diskgroup all mount;
alter diskgroup all mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DG2" cannot be mounted
ORA-15013: diskgroup "DG2" is already mounted
ORA-15017: diskgroup "DG1" cannot be mounted
ORA-15013: diskgroup "DG1" is already mounted
ORA-15017: diskgroup "OCRVDG" cannot be mounted
ORA-15013: diskgroup "OCRVDG" is already mounted
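> DG3 consists of two vdisks newly allocated to the database from the storage array (/dev/raw/raw10 and /dev/raw/raw11). It was created shortly before the IP change.
But why does the control file already contain a data file pointing to +DG3?
I created DG3 before changing the IP addresses! After changing the IP addresses I rebooted the servers and found that all cluster components had started, but the database had not come up: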
ora.bxrac.db ora....se.type ONLINE OFFLINE
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,NAME,PATH from v$asm_disk;
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU NAME PATH
------------ ----------- ------- ------------ ------------------------------ ------------------------------
0 0 CLOSED MEMBER /dev/raw/raw11
0 1 CLOSED MEMBER /dev/raw/raw10
2 3 CACHED MEMBER DG1_0003 /dev/raw/raw9
2 2 CACHED MEMBER DG1_0002 /dev/raw/raw8
1 1 CACHED MEMBER OCRVDG_0001 /dev/raw/raw7
1 0 CACHED MEMBER OCRVDG_0000 /dev/raw/raw6
3 2 CACHED MEMBER DG2_0002 /dev/raw/raw5
3 1 CACHED MEMBER DG2_0001 /dev/raw/raw4
3 0 CACHED MEMBER DG2_0000 /dev/raw/raw3
2 1 CACHED MEMBER DG1_0001 /dev/raw/raw2
2 0 CACHED MEMBER DG1_0000 /dev/raw/raw1
I saw that raw10 and raw11 were in CLOSED state, so I ran:
SQL> ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
2 ;
ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DG3" does not exist or is not mounted
Seeing the error above, I assumed DG3 was damaged or lost, so I tried to recreate it:
SQL> create diskgroup DG3 external redundancy disk '/dev/raw/raw10';
create diskgroup DG3 external redundancy disk '/dev/raw/raw10'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15033: disk '/dev/raw/raw10' belongs to diskgroup "DG3"
ORA-15030: diskgroup name "DG3" is in use by another diskgroup
Judging from the messages above, DG3 still exists. How can I get DG3 mounted now?
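The v$asm_disk listing above can be read mechanically. A HEADER_STATUS of MEMBER means the disk still carries a disk group header, and GROUP_NUMBER 0 means it does not belong to any disk group currently mounted by this ASM instance. The following is a minimal sketch in shell, assuming the query output was saved as text in the one-row-per-line layout shown above; the function name `find_unmounted_members` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read a v$asm_disk listing (columns GROUP_NUMBER,
# DISK_NUMBER, MOUNT_STATUS, HEADER_STATUS, NAME, PATH) on stdin and print
# the paths of disks that have a MEMBER header but group number 0.
find_unmounted_members() {
    # NAME is empty for such disks, so PATH is simply the last field.
    # Header and separator lines never have "0" as their first field.
    awk '$1 == "0" && $4 == "MEMBER" { print $NF }'
}
```

Fed the listing above, for example with `find_unmounted_members < asm_disk.txt` (file name assumed), this prints /dev/raw/raw11 and /dev/raw/raw10, the two DG3 disks. That suggests the disk group is merely not mounted rather than lost.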
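> But why does the control file already contain a data file pointing to +DG3?
I created the data file that points to DG3.
Because DG3 would not come up and reported errors, I posted the steps I used to create it (I was worried that my creation steps were at fault).
So this problem is probably unrelated to my RAC IP address change, is that right?
Clearly, the current failure to open the database is related to the data file and the disk group.
Also: do not assume that others are sitting next to your computer and know all the background information.
> Clearly, the current failure to open the database is related to the data file and the disk group.
From which angles should I try to locate this fault? Can you suggest some diagnostic methods?
Why not look at the ASM alert log first?
SQL> show parameter background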
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_core_dump string partial
background_dump_dest string /u01/app/grid/diag/asm/+asm/+ASM1/trace
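As a rough illustration of the alert log suggestion, the sketch below filters an ASM alert log for ORA-15xxx errors and for lines that mention a given disk group. In 11g the text alert log normally sits in the trace directory reported by background_dump_dest and is named alert_<SID>.log, so the file name alert_+ASM1.log in the usage note is an assumption based on that convention. The function name `scan_asm_alert` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print ORA-15xxx errors and lines mentioning a disk
# group from an ASM alert log, prefixed with their line numbers.
# $1 = path to the alert log, $2 = disk group name
scan_asm_alert() {
    local log="$1" dg="$2"
    grep -n -E "ORA-15[0-9]{3}|${dg}" "$log"
}
```

A possible invocation on node 1 would be `scan_asm_alert /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log DG3`. Running it on every node is worthwhile because each ASM instance writes its own log.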
ASM alert log attached.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 270336
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 32768
cpu time (seconds, -t) unlimited
max user processes (-u) 2047
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ free
total used free shared buffers cached
Mem: 32959100 3535532 29423568 0 542228 1485312
-/+ buffers/cache: 1507992 31451108
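Swap: 18555064 0 18555064
I don't understand why you created a disk group and added a data file to it in the middle of this process.
The simplest approach, though, is to offline drop that data file and open the database first.
The reason is that the business data grew and the database needed more capacity, so I allocated two vdisks from the storage to the database. It is not that I insisted on doing this during the IP change!
DG3 holds production data. Is there another way that avoids dropping the data file?
1. After +DG3 was created successfully, did it ever come online normally? If so, when was it last online?
2. You wrote that you saw raw10 and raw11 in CLOSED state and then ran: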
SQL> ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
2 ;
ALTER DISKGROUP DG3 ADD DISK '/dev/raw/raw10' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "DG3" does not exist or is not mounted
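When was this step performed, and where are the related logs?
The ASM log above contains a detailed timeline. I hope you can take a look.
Problem resolved.
Cause: a permissions problem on /dev/raw/raw10 and /dev/raw/raw11.
The fault disappeared after running the following on all four nodes: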
# chown grid:asmadmin /dev/raw/raw*
# su - grid
$ sqlplus /nolog
SQL> conn / as sysasm
SQL> alter diskgroup DG3 mount;
SQL> alter system set asm_diskgroups = DG3 scope=both sid='*';
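Before running the chown, it can help to see which devices actually have the wrong owner on a given node. The sketch below parses `ls -l` output for /dev/raw and reports character devices whose owner or group is not the expected one. The function name `find_wrong_owner` is hypothetical, and the expected owner grid:asmadmin is taken from the listings earlier in this thread.

```bash
#!/bin/bash
# Hypothetical helper: read `ls -l` output on stdin and print every character
# device whose owner or group differs from the expected values.
# Usage: ls -l /dev/raw | find_wrong_owner grid asmadmin
find_wrong_owner() {
    local owner="$1" group="$2"
    # Lines for character devices start with "c"; field 3 is the owner,
    # field 4 the group, and the last field is the device name.
    awk -v o="$owner" -v g="$group" \
        '/^c/ && ($3 != o || $4 != g) { print $NF, $3 ":" $4 }'
}
```

No output means every raw device on that node already belongs to grid:asmadmin; otherwise each offending device is printed with its current owner and group.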
Add the following to /etc/rc.local on all four nodes:
# vi /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
touch /var/lock/subsys/local
chown grid:asmadmin /dev/raw/raw1
chown grid:asmadmin /dev/raw/raw2
chown grid:asmadmin /dev/raw/raw3
chown grid:asmadmin /dev/raw/raw4
chown grid:asmadmin /dev/raw/raw5
chown grid:asmadmin /dev/raw/raw8
chown grid:asmadmin /dev/raw/raw9
chown grid:asmadmin /dev/raw/raw6
chown grid:asmadmin /dev/raw/raw7
chown grid:asmadmin /dev/raw/raw10
chown grid:asmadmin /dev/raw/raw11
chmod 660 /dev/raw/raw*
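The per-device chown lines above have to be extended by hand whenever a new raw device is added. A loop is one possible alternative; the sketch below assumes that every device under /dev/raw is an ASM disk, as in this cluster, and the function name `fix_raw_perms` and the DRY_RUN switch are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: apply owner and mode to every raw device in a directory.
# Usage: fix_raw_perms [dir] [owner:group] [mode]
# Set DRY_RUN=1 to print the commands instead of running them.
fix_raw_perms() {
    local dir="${1:-/dev/raw}" owner="${2:-grid:asmadmin}" mode="${3:-660}"
    local dev
    for dev in "$dir"/raw*; do
        [ -e "$dev" ] || continue    # the glob matched nothing
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "chown $owner $dev"
            echo "chmod $mode $dev"
        else
            chown "$owner" "$dev" && chmod "$mode" "$dev"
        fi
    done
}
```

Running it once with `DRY_RUN=1` shows the commands it would issue, which is a cheap way to confirm the device list before the function is called from rc.local.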
swgsw wrote on 2014-2-10 19:50:
> Problem resolved.
> Cause: a permissions problem on /dev/raw/raw10 and /dev/raw/raw11.
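A: What were the original permissions?
I had only checked node 1. On the other three nodes, /dev/raw/raw10 and raw11 were both owned by root:root.
swgsw wrote on 2014-2-11 10:37:
> I had only checked node 1. On the other three nodes, /dev/raw/raw10 and raw11 were both owned by root:root.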
Thank you.
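Since the wrong ownership existed only on nodes 2 to 4, a check that covers every node would have exposed it immediately. The following is a minimal sketch, assuming passwordless ssh between the nodes and GNU stat on each of them. The node names come from the crs_stat output in this thread; the function name `check_all_nodes` and the NODES and REMOTE variables are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: show owner:group of the DG3 devices on every node.
# NODES and REMOTE can be overridden; REMOTE defaults to ssh.
NODES="${NODES:-bxrac01 bxrac02 bxrac03 bxrac04}"
REMOTE="${REMOTE:-ssh}"

check_all_nodes() {
    local node
    for node in $NODES; do
        # GNU stat: %n is the file name, %U:%G is owner:group.
        $REMOTE "$node" "stat -c '%n %U:%G' /dev/raw/raw10 /dev/raw/raw11" \
            | sed "s/^/$node: /"
    done
}
```

Each output line is prefixed with the node name, so a device still owned by root:root on one node stands out next to grid:asmadmin on the others.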