2#
Posted on 2014-5-27 20:36:17
Last edited by 与晶之恋 on 2014-5-27 20:37
TIMEOUT ON CONTROL FILE ENQUEUE
If a process waits more than 900 seconds for the CF (control file) enqueue, the database kills it; if that process is a background process, the instance crashes.
Before the crash, the redo logs were switching rather quickly, and a "Checkpoint not complete" message also appeared.
As for why the CF enqueue was being held, you have to look at what the transactions/sessions were doing at the time.
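To confirm how quickly redo was switching before the incident, a query along these lines can be run against V$LOG_HISTORY; this is only an illustrative sketch, not something taken from the original trace:
Code:
-- Count log switches per hour over the last day to confirm rapid switching
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;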
#######################################################
*** 2014-05-13 10:36:52.588
TIMEOUT ON CONTROL FILE ENQUEUE
mode=X, type=0, wait=1, eqt=900
Dump the blocker's call stack (ospid 2425488)
*** 2014-05-13 10:37:22.614
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
######################################################
Thread 1 advanced to log sequence 681372
Current log# 1 seq# 681372 mem# 0: /oracle/oradata/hjfxdb/redo01_01.log
Current log# 1 seq# 681372 mem# 1: /oracle/oradata/hjfxdb/redo01_02.log
Tue May 13 10:14:31 2014
Thread 1 advanced to log sequence 681373
Current log# 2 seq# 681373 mem# 0: /oracle/oradata/hjfxdb/redo02_01.log
Current log# 2 seq# 681373 mem# 1: /oracle/oradata/hjfxdb/redo02_02.log
Tue May 13 10:16:56 2014
Thread 1 advanced to log sequence 681374
Current log# 3 seq# 681374 mem# 0: /oracle/oradata/hjfxdb/redo03_01.log
Current log# 3 seq# 681374 mem# 1: /oracle/oradata/hjfxdb/redo03_02.log
Tue May 13 10:21:40 2014
Thread 1 cannot allocate new log, sequence 681375
Checkpoint not complete
Current log# 3 seq# 681374 mem# 0: /oracle/oradata/hjfxdb/redo03_01.log
Current log# 3 seq# 681374 mem# 1: /oracle/oradata/hjfxdb/redo03_02.log
Tue May 13 10:37:22 2014
Errors in file /oracle/admin/hjfxdb/bdump/hjfxdb_lgwr_3276926.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Tue May 13 10:37:23 2014
Errors in file /oracle/admin/hjfxdb/bdump/hjfxdb_lgwr_3276926.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
#################################################################
See also:
http://blog.chinaunix.net/uid-22948773-id-3016011.html
Subject: Diagnosing ORA-600 [2103] issues in RAC environment
Doc ID: Note:297845.1 Type: HOWTO
Last Revision Date: 23-FEB-2005 Status: PUBLISHED
The information in this document applies to:
Oracle Server - Enterprise Edition - Version: 10.1 to 10.2
Information in this document applies to any platform.
Errors
ORA-600 "internal error code, arguments: [%s],[%s],[%s], [%s], [%s],
[2103]
Goal
This document tries to explain how to troubleshoot and find the cause of an ORA-600 [2103]. An ORA-600 [2103] is signalled when an Oracle process cannot get a CF enqueue for 900 seconds. In 10g, the process signalling the ORA-600 [2103] dumps enough information for us to understand what caused it.
Before proceeding with the troubleshooting, please be advised that an ORA-600 [2103] can also be caused by one or a combination of the reasons below:
Very slow I/O subsystem where the Controlfiles are stored.
Frequent log switching; for example, if the alert.log shows that the instances are switching logs very often, then an ORA-600 [2103] can occur (a quick check is sketched below).
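As an illustrative check (not part of the quoted note), the controlfile locations and redo log sizes can be listed from V$CONTROLFILE and V$LOG to judge whether undersized logs or a slow filesystem holding the controlfiles could be contributing:
Code:
-- Where the controlfiles live (check the I/O characteristics of these paths)
SELECT name FROM v$controlfile;
-- Redo log group sizes; very small groups lead to frequent switching
SELECT group#, thread#, bytes/1024/1024 AS size_mb, members, status
FROM   v$log
ORDER  BY group#;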
The steps below will help you understand why the process requesting the CF enqueue had to wait 900 seconds before giving up and crashing the instance.
Fix
Check the alert.log of the instance which reported the ORA-600 [2103]
Code:
Mon Feb 7 01:45:06 2005
Errors in file /app/oracle/admin/atmvs/bdump/atmvs1_lgwr_30822.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Mon Feb 7 01:45:07 2005
Errors in file /app/oracle/admin/atmvs/bdump/atmvs1_lgwr_30822.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Mon Feb 7 01:45:07 2005
LGWR: terminating instance due to error 470
So we know that the log writer (LGWR) was trying to get the CF enqueue.
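If the contention is caught while it is still in progress (rather than post-mortem from trace files), the holder and waiters of the CF enqueue can also be seen in GV$LOCK; the query below is an illustrative sketch and not part of the quoted note:
Code:
-- Sessions holding (block = 1) or requesting the CF enqueue, across all instances
SELECT l.inst_id, l.sid, s.program, l.lmode, l.request, l.ctime, l.block
FROM   gv$lock l
JOIN   gv$session s ON s.inst_id = l.inst_id AND s.sid = l.sid
WHERE  l.type = 'CF'
ORDER  BY l.block DESC, l.ctime DESC;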
Now we should check the LMD trace file in the background dump destination and try to find the same point in the timeline.
Code:
*** 2005-02-07 01:45:05.916
ENQUEUE DUMP REQUEST: from 1 on [0x0][0x0],[CF] for reason 2 mtype 0
DUMP LOCAL BLOCKER/HOLDER: block level 4 res [0x0][0x0],[CF]
Further below in the same trace file, the dump should show us the holder.
Code:
----------enqueue------------------------
lock version : 753415
Owner node : 0
grant_level : KJUSERPW
req_level : KJUSERPW
bast_level : KJUSERNL
notify_func : (nil)
resp : 0x9ad06258
procp : 0x9acc5970
pid : 6009
proc version : 650
oprocp : (nil)
opid : 0
group lock : 1
xid : 0000-0000-00000000
dd_time : 0.0 secs
dd_count : 0
timeout : 0.0 secs
On_timer_q? : N
On_dd_q? : N
lock_state : GRANTED
Open Options : KJUSERDEADLOCK
Convert options : KJUSERGETVALUE KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK KJUSERTRCCANCEL
History : 0x514514a5
Msg_Seq : 0x0
res_seq : 0
valblk : 0x50330100f4a0ffbfc82f3908dca0ffbf P3/9
DUMP LOCAL BLOCKER: initiate state dump for
possible owner[30.6009]
The interesting piece is the PID of the holder, which is 6009.
Now we should find the matching trace file in the bdump or udump destination, depending on whether the process holding the enqueue is a user process or a background process. In the example below it is a user process, and hence we search for the file in the user dump destination.
ls -l <udump_location>/*6009*.trc
In the file atmvs1_ora_6009.trc we can find KST tracing and a process state dump showing what the process is doing such that it needs to hold the CF enqueue in exclusive mode for more than 900 seconds.
In the case above, the current SQL itself shows what process 6009 is doing.
Code:
Current SQL statement for this session:
begin sys.dbms_backup_restore.createRmanStatusRow( level => :level, parent_id => :id, parent_stamp => :ts, status => :status, command_id => :command_id, operation => :operation, row_id => :row_id, row_stamp => :row_stamp ); end;
In this case, the culprit was an RMAN job that was running, which was itself a customer-specific problem. In most cases, further investigation has to be done to find out why the holder is not yielding the CF enqueue within the 900 seconds.
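If the holding process is still alive, the OS pid from the dump (6009 in this example) can also be mapped back to its session and current SQL; again, this is an illustrative sketch rather than part of the quoted note:
Code:
-- Map the OS pid from the enqueue dump back to the session and its current SQL
SELECT s.sid, s.serial#, s.username, s.program, s.sql_id
FROM   v$process p
JOIN   v$session s ON s.paddr = p.addr
WHERE  p.spid = '6009';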