- 最后登录
- 2015-3-16
- 在线时间
- 28 小时
- 威望
- 0
- 金钱
- 490
- 注册时间
- 2013-10-18
- 阅读权限
- 10
- 帖子
- 26
- 精华
- 0
- 积分
- 0
- UID
- 1322
|
1#
发表于 2014-4-11 16:36:24
|
查看: 2890 |
回复: 1
环境描述:
aix 5.3
oracle 10.2.0.5 双节点RAC
客户反应,在10号早上重启了主机后,库起来了,但是部分业务不能办理,部分业务正常(客户描述)
他们的维护策略是:10号重启1节点主机,20号重启节点主机。
具体表现;在一个select查询时,两个节点均遭遇
SQL> select * from scxnb.ac43 where aaa027='510904';
select * from scxnb.ac43 where aaa027='510904'
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 102 - see DBWR trace file
ORA-01110: data file 102: '+DG_DATA1/cp3_94'
检查:
起初怀疑数据文件丢失
然而经过检查,数据文件存在。
检查alert日志后,之前客户比较粗暴的关闭主机,对主机进行重启的(他们目的是重启数据库相关服务,不知道为啥直接要重启主机)
相关alert日志如下:
SUCCESS: diskgroup ARCHDG was mounted
Wed Apr 09 23:46:56 BEIST 2014
SUCCESS: diskgroup ARCHDG was dismounted
Thu Apr 10 07:00:05 BEIST 2014
Shutting down instance (abort)
License high water mark = 344
Instance terminated by USER, pid = 733204
Thu Apr 10 07:13:33 BEIST 2014
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en1 172.16.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 en7 10.160.5.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
processes = 1500
sessions = 2000
resource_limit = TRUE
sga_max_size = 21474836480
__shared_pool_size = 1442840576
__large_pool_size = 16777216
__java_pool_size = 50331648
__streams_pool_size = 0
spfile = +DATADG/scnbdb/spfilescnbdb.ora
sga_target = 12884901888
control_files = +DATADG/scnbdb/controlfile/current.260.748375667
db_block_size = 8192
__db_cache_size = 11358175232
compatible = 10.2.0.5.0
log_archive_dest_1 = location=+ARCHDG
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
db_create_file_dest = +DATADG
thread = 1
instance_number = 1
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 86400
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=scnbdbXDB)
local_listener = (address=(protocol=tcp)(host=10.160.5.1)(port=1521))
remote_listener = LISTENERS_SCNBDB
utl_file_dir = /u01/app/oracle/logs_utl_file
job_queue_processes = 10
background_dump_dest = /u01/app/oracle/admin/scnbdb/bdump
user_dump_dest = /u01/app/oracle/admin/scnbdb/udump
core_dump_dest = /u01/app/oracle/admin/scnbdb/cdump
audit_file_dest = /u01/app/oracle/admin/scnbdb/adump
db_name = scnbdb
open_cursors = 2000
pga_aggregate_target = 8589934592
_awr_flush_threshold_metrics= TRUE
Cluster communication is configured to use the following interface(s) for this instance
172.16.1.1
Thu Apr 10 07:13:34 BEIST 2014
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=229432
DIAG started with pid=3, OS id=233530
PSP0 started with pid=4, OS id=229802
LMON started with pid=5, OS id=237628
LMD0 started with pid=6, OS id=241726
LMS0 started with pid=7, OS id=131618
LMS1 started with pid=8, OS id=168506
LMS2 started with pid=9, OS id=254020
MMAN started with pid=10, OS id=201020
DBW0 started with pid=11, OS id=233808
DBW1 started with pid=12, OS id=221268
LGWR started with pid=13, OS id=262218
CKPT started with pid=14, OS id=209300
SMON started with pid=15, OS id=241996
RECO started with pid=16, OS id=250192
CJQ0 started with pid=17, OS id=266322
MMON started with pid=18, OS id=270420
Thu Apr 10 07:13:34 BEIST 2014
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=19, OS id=274518
Thu Apr 10 07:13:35 BEIST 2014
starting up 1 shared server(s) ...
Thu Apr 10 07:13:38 BEIST 2014
lmon registered with NM - instance id 1 (internal mem no 0)
Thu Apr 10 07:13:38 BEIST 2014
Reconfiguration started (old inc 0, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid according to instance 1
* domain 0 valid = 1 according to instance 1
Thu Apr 10 07:13:38 BEIST 2014
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu Apr 10 07:13:38 BEIST 2014
LMS 0: 0 GCS shadows cancelled, 0 closed
Thu Apr 10 07:13:38 BEIST 2014
LMS 2: 0 GCS shadows cancelled, 0 closed
Thu Apr 10 07:13:38 BEIST 2014
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Thu Apr 10 07:13:39 BEIST 2014
LMS 2: 0 GCS shadows traversed, 0 replayed
Thu Apr 10 07:13:39 BEIST 2014
LMS 0: 0 GCS shadows traversed, 0 replayed
Thu Apr 10 07:13:39 BEIST 2014
LMS 1: 0 GCS shadows traversed, 0 replayed
Thu Apr 10 07:13:39 BEIST 2014
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=22, OS id=258128
Thu Apr 10 07:13:41 BEIST 2014
ALTER DATABASE MOUNT
Thu Apr 10 07:13:42 BEIST 2014
Starting background process ASMB
ASMB started with pid=24, OS id=282746
Starting background process RBAL
RBAL started with pid=25, OS id=245828
Thu Apr 10 07:13:46 BEIST 2014
SUCCESS: diskgroup DATADG was mounted
Thu Apr 10 07:13:51 BEIST 2014
Setting recovery target incarnation to 2
Thu Apr 10 07:13:51 BEIST 2014
Successful mount of redo thread 1, with mount id 3422982371
Thu Apr 10 07:13:51 BEIST 2014
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Thu Apr 10 07:13:52 BEIST 2014
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Thu Apr 10 07:13:58 BEIST 2014
Errors in file /u01/app/oracle/admin/scnbdb/bdump/scnbdb1_dbw0_233808.trc:
ORA-01157: cannot identify/lock data file 99 - see DBWR trace file
ORA-01110: data file 99: '+DG_DATA1/cp3_91'
ORA-17503: ksfdopn:2 Failed to open file +DG_DATA1/cp3_91
ORA-15001: diskgroup "DG_DATA1" does not exist or is not mounted
ORA-15001: diskgroup "DG_DATA1" does not exist or is not mounted
Thu Apr 10 07:13:58 BEIST 2014
Errors in file /u01/app/oracle/admin/scnbdb/bdump/scnbdb1_dbw0_233808.trc:
ORA-01157: cannot identify/lock data file 100 - see DBWR trace file
ORA-01110: data file 100: '+DG_DATA1/cp3_92'
ORA-17503: ksfdopn:2 Failed to open file +DG_DATA1/cp3_92
ORA-15001: diskgroup "DG_DATA1" does not exist or is not mounted
ORA-15001: diskgroup "DG_DATA1" does not exist or is not mounted
在两个节点
NODE1:
export ORACLE_SID=+ASM1
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ ----------------------
ARCHDG MOUNTED
DATADG MOUNTED
DG_DATA1 DISMOUNTED
NODE2
export ORACLE_SID=+ASM2
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ ----------------------
ARCHDG MOUNTED
DATADG MOUNTED
DG_DATA1 DISMOUNTED
export ORACLE_SID=+ASM1
SQL> select name,state from v$asm_diskgroup;
NAME STATE
------------------------------ ----------------------
ARCHDG MOUNTED
DATADG MOUNTED
DG_DATA1 DISMOUNTED
--asm asm_diskgroups参数指定了 DATADG 、ARCHDG 、DG_DATA1 但是,DATADG 、ARCHDG mount了,
DG_DATA1 DISMOUNTED;
--比较奇怪,如果是因为shutdown 导致的diskgroup dismount,数据库还能open?有点疑惑。。
且此时数据库正常运行,日志切换均正常。
和客户沟通,9号一切业务正常,推断是因为对主机的shutdown导致数据库的此错误。
分别在两个节点mount DG_DATA1磁盘组
alter diskgroup DG_DATA1 mount;
然后在数据库执行
SQL> alter system check datafiles;
System altered.
检查之前报错的数据文件状态
SQL> select file_id,file_name,status from dba_data_files where file_id=102;
FILE_ID FILE_NAME STATUS
------- ------------------------------ --------------------
102 +DG_DATA1/cp3_94 AVAILABLE
状态正常,
select * from scxnb.ac43 where aaa027='510904';
--返回结果,不再报错 |
|