- 最后登录
- 2017-5-4
- 在线时间
- 81 小时
- 威望
- 999
- 金钱
- 2391
- 注册时间
- 2013-9-11
- 阅读权限
- 150
- 帖子
- 1124
- 精华
- 5
- 积分
- 999
- UID
- 1220
|
1#
发表于 2013-9-21 10:54:07
|
查看: 5691 |
回复: 0
ALTER DATABASE BACKUP CONTROLFILE TO '/usr/mac/tmp/ctrl_macdb.dbf' REUSE
Thu Aug 30 04:26:34 BEIST 2012
Completed: ALTER DATABASE BACKUP CONTROLFILE TO '/usr/mac/tmp/ctrl_macdb.dbf' REUSE
Thu Aug 30 06:12:09 BEIST 2012
Thread 1 advanced to log sequence 6774 (LGWR switch)
Current log# 4 seq# 6774 mem# 0: /dev/rlv_redo41
Thu Aug 30 09:33:42 BEIST 2012
Thread 1 advanced to log sequence 6775 (LGWR switch)
Current log# 5 seq# 6775 mem# 0: /dev/rlv_redo51
Thu Aug 30 11:34:02 BEIST 2012
IPC Send timeout detected. Receiver ospid 336138
Thu Aug 30 11:34:02 BEIST 2012
Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:
Thu Aug 30 11:40:35 BEIST 2012
IPC Send timeout detected. Receiver ospid 336138
Thu Aug 30 11:40:35 BEIST 2012
Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:
Thu Aug 30 11:45:34 BEIST 2012
MMNL absent for 1201 secs; Foregrounds taking over
Thu Aug 30 11:45:45 BEIST 2012
IPC Send timeout detected. Receiver ospid 336138
Thu Aug 30 11:45:45 BEIST 2012
Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:
Thu Aug 30 11:56:52 BEIST 2012
Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pmon_360920.trc:
ORA-00471: DBWR process terminated with error
Thu Aug 30 11:56:52 BEIST 2012
PMON: terminating instance due to error 471
Thu Aug 30 11:56:52 BEIST 2012
System state dump is made for local instance
System State dumped to trace file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc
Thu Aug 30 11:56:55 BEIST 2012
Shutting down instance (abort)
License high water mark = 2975
Thu Aug 30 11:56:59 BEIST 2012
Instance terminated by PMON, pid = 360920
Thu Aug 30 11:57:05 BEIST 2012
Instance terminated by USER, pid = 6008998
Thu Aug 30 11:57:14 BEIST 2012
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 en11 192.168.55.0 configured from OCR for use as a cluster interconnect
Interface type 1 en10 133.0.175.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
IPC Send timeout detected.信息说明PZ99进程节点间通信超时,查看其后台TRACE文件可以发现如下信息:
/home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /home/oracle/app/oracle/product/10g
System name: AIX
Node name: macdb1
Release: 3
Version: 5
Machine: 00CAF8E64C00
Instance name: macdb1
Redo thread mounted by this instance: 1
Oracle process number: 34
Unix process pid: 336138, image: oracle@macdb1 (PZ99)
*** 2012-08-30 11:28:52.444
*** SERVICE NAME:(SYS$USERS) 2012-08-30 11:28:52.441
*** SESSION ID:(4926.15944) 2012-08-30 11:28:52.441
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
*** 2012-08-30 11:56:48.878
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes
SKGXPSEGRCV的信息与Metalink Note<Bug 7360240 - RAC sanity checks in skgxpsegrcv can cause incorrect "IPC send timeouts" / hang / eviction (Doc ID 7360240.8)>描述的问题较为吻合,但是查询其他所有ORACLE进程的后台TRACE没有再发现SKGXPSEGRCV信息,而且PZ99进程之前在等待的是'SGA: allocation forcing component growth'事件:
SO: 700000184b4fb38, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=34, calls cur/top: 70000011738e4c0/70000018240c1f8, flag: (0) -
int error: 2147494036, call error: 0, sess error: 0, txn error 0
(post info) last post received: 7629 0 4
last post received-location: kslpsr
last process to post me: 700000187b8ece8 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: 700000187b8ece8 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 700000187d89228
O/S info: user: oracle, term: UNKNOWN, ospid: 336138 (DEAD)
OSD pid info: Unix process pid: 336138, image: oracle@macdb1 (PZ99)
----------------------------------------
SO: 700000181263928, type: 4, owner: 700000184b4fb38, flag: INIT/-/-/0x00
(session) sid: 4926 trans: 0, creator: 700000184b4fb38, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-0022-000179C2, short-term DID: 0001-0022-000179C1
txn branch: 0
oct: 3, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$USERS
O/S info: user: oracle, term: , ospid: 336138, machine: macdb2
program: oracle@macdb1 (PZ99)
application name: racgimon@macdb2 (TNS V1-V3), hash value=2386761235
last wait for 'SGA: allocation forcing component growth' wait_time=0.000789 sec, seconds since wait started=3
=0, =0, =0
blocking sess=0x0 seq=3823
Dumping Session Wait History
for 'SGA: allocation forcing component growth' count=1 wait_time=0.000789 sec
=0, =0, =0
for 'SGA: allocation forcing component growth' count=1 wait_time=0.000732 sec
=0, =0, =0
for 'SGA: allocation forcing component growth' count=1 wait_time=0.000513 sec
=0, =0, =0
for 'SGA: allocation forcing component growth' count=1 wait_time=0.000521 sec
=0, =0, =0
分析RAC后台诊断进程DIAG进程生成的SYSTEMSTATE DUMP信息:
oracle @ macdb1 : /home/oracle $awk -f ass1033.awk
/home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc
Starting Systemstate 1
..............................................................................
.....
Ass.Awk Version 1.0.33
~~~~~~~~~~~~~~~~~~~~~~
Source file : /home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc
System State 1 (2012-08-30 11:56:52.566)
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
1: [DEAD]
2: last wait for 'pmon timer'
3: last wait for 'DIAG idle wait'
4: last wait for 'rdbms ipc message'
5: last wait for 'rdbms ipc message'
6: last wait for 'ges remote message' [DEAD]
7: last wait for 'gcs remote message' [DEAD]
8: last wait for 'gcs remote message' [DEAD]
9: last wait for 'gcs remote message' [DEAD]
10: last wait for 'gcs remote message' [DEAD]
11: last wait for 'gcs remote message' [DEAD]
12: last wait for 'gcs remote message' [DEAD]
13: last wait for 'gcs remote message' [DEAD]
14: last wait for 'gcs remote message' [DEAD]
15: last wait for 'gcs remote message' [DEAD]
16: last wait for 'gcs remote message' [DEAD]
17: last wait for 'gcs remote message' [DEAD]
18: last wait for 'gcs remote message' [DEAD]
19: last wait for 'rdbms ipc message' [DEAD]
20: last wait for 'rdbms ipc message' [DEAD]
21: last wait for 'rdbms ipc message' [DEAD]
22: waiting for 'rdbms ipc message' [DEAD]
23: waiting for 'rdbms ipc message' [DEAD]
24: last wait for 'rdbms ipc message' [DEAD]
25: waiting for 'rdbms ipc message' [DEAD]
26: last wait for 'rdbms ipc message' [DEAD]
27: last wait for 'rdbms ipc message' [DEAD]
28: last wait for 'SGA: allocation forcing component growth' [DEAD]
29: waiting for 'rdbms ipc message' [DEAD]
30: last wait for 'SGA: allocation forcing component growth' [DEAD]
31: last wait for 'SGA: allocation forcing component growth' [DEAD]
32: last wait for 'cursor: pin S wait on X'[Mutex 5d978f5c] [DEAD]
33: waiting for 'rdbms ipc message' [DEAD]
34: last wait for 'SGA: allocation forcing component growth' [DEAD]
Cmd: Select
35: waiting for 'SQL*Net message from client' [DEAD]
36: waiting for 'rdbms ipc message' [DEAD]
37: waiting for 'rdbms ipc message' [DEAD]
38: waiting for 'SQL*Net message from client' [DEAD]
39: waiting for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
40: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]
Cmd: Select
41: waiting for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
42: last wait for 'SGA: allocation forcing component growth' [DEAD]
Cmd: Select
43: waiting for 'Streams AQ: qmn coordinator idle wait' [DEAD]
44: last wait for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
45: waiting for 'Streams AQ: waiting for time management or cleanup tasks' [DEAD]
46: last wait for 'cursor: pin S wait on X'[Mutex bc5573b6] [DEAD]
47: waiting for 'SQL*Net message from client' [DEAD]
48: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]
Cmd: Select
49: waiting for 'SQL*Net message from client' [DEAD]
50: waiting for 'SQL*Net message from client' [DEAD]
51: waiting for 'SQL*Net message from client' [DEAD]
52: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]
Cmd: Select
53: last wait for 'SGA: allocation forcing component growth' [DEAD]
Cmd: Update
54: last wait for 'cursor: pin S wait on X'[Mutex bc5573b6] [DEAD]
Cmd: Select
55: last wait for 'cursor: pin S wait on X'[Mutex a154ba91] [DEAD]
Cmd: Select
56: last wait for 'SGA: allocation forcing component growth' [DEAD]
Cmd: Select
57: waiting for 'SQL*Net message from client' [DEAD]
58: last wait for 'SGA: allocation forcing component growth' [DEAD]
59: waiting for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
60: last wait for 'SGA: allocation forcing component growth' [DEAD]
Cmd: Select
61: last wait for 'cursor: pin S wait on X'[Mutex f3002747] [DEAD]
Cmd: Select
62: last wait for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
63: waiting for 'SQL*Net message from client' [DEAD]
64: last wait for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
65: last wait for 'row cache lock' [Rcache object=70000018ef42ea8,] [DEAD]
66: waiting for 'SQL*Net message from client' [DEAD]
67: waiting for 'SQL*Net message from client' [DEAD]
68: waiting for 'SQL*Net message from client' [DEAD]
69: waiting for 'SQL*Net message from client' [DEAD]
70: waiting for 'SQL*Net message from client' [DEAD]
71: waiting for 'SQL*Net message from client' [DEAD]
72: waiting for 'SQL*Net message from client' [DEAD]
73: waiting for 'SQL*Net message from client' [DEAD]
74: waiting for 'SQL*Net message from client' [DEAD]
75: waiting for 'SQL*Net message from client' [DEAD]
76: waiting for 'SQL*Net message from client' [DEAD]
77: waiting for 'SQL*Net message from client' [DEAD]
78: last wait for 'library cache load lock'[LOAD: 70000015c903b10] [DEAD]
Cmd: Select
79: waiting for 'SQL*Net message from client' [DEAD]
80: waiting for 'SQL*Net message from client' [DEAD]
81: waiting for 'SQL*Net message from client' [DEAD]
82: waiting for 'SQL*Net message from client' [DEAD]
83: waiting for 'SQL*Net message from client' [DEAD]
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate. (The holder may have released the resource before we
dumped the state object tree of the blocking process).
o Lines with 'Enqueue conversion' below can be ignored *unless*
other sessions are waiting on that resource too. For more, see
http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)
Resource Holder State
Mutex 5d978f5c ??? Blocker
Rcache object=70000018ef42ea8, ??? Blocker
Mutex fe59717e ??? Blocker
Mutex bc5573b6 ??? Blocker
Mutex a154ba91 ??? Blocker
Mutex f3002747 ??? Blocker
LOAD: 70000015c903b10 ??? Blocker
Blockers According to Tracefile Wait Info:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. This may not work for 64bit platforms. See bug 2902997 for details.
2. If the blocking process is shown as 0 then that session may no longer be
present.
3. If resources are held across code layers then sometimes the tracefile wait
info will not recognise the problem.
No blockers seen.
Object Names
~~~~~~~~~~~~
Mutex 5d978f5c
Rcache object=70000018ef42ea8, cid=14(dc_profiles)
Mutex fe59717e
Mutex bc5573b6
Mutex a154ba91
Mutex f3002747
LOAD: 70000015c903b10
Summary of Wait Events Seen (count>10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
No wait events seen more than 10 times
可以从对SYSTEMSTATE DUMP的分析中看到,大量进程在等待'cursor: pin S wait on X' 'SGA: allocation forcing component growth' 和 'row cache lock'。
其中'row cache lock'等待的对象是(Rcache object=70000018ef42ea8, cid=14(dc_profiles),dc_profiles字典缓存是session登陆的重要资源,该字典缓存锁被持有会导致新的会话无法登陆。
由于系统hang住的原因11:00-12:00的AWR未正常收集,但是参考10:00-11:00的AWR可以发现shared pool存在SHRINK操作:
Cache Sizes
Begin End
Buffer Cache: 4,816M 4,832M Std Block Size: 16K
Shared Pool Size: 1,264M 1,248M Log Buffer: 30,708K
Shared Pool由1264MB收缩至1248MB,共享池的收缩可能导致部分关键的字典缓存被锁住或者FLUSH OUT,进而引起严重的解析等待和row cache lock,这将阻塞新的会话登陆。
但是在DIAG进程的SYSTEM DUMP中未体现出Blocker阻塞进程的信息,这可能是DIAG进程收集信息存在时间限制,在指定时间内仅够DUMP不完整的process信息。
由于缺乏完整的SYSTEMSTATE DUMP信息将阻碍我们进一步诊断分析问题,所以我们建议在系统hang住无反映的情况下,先收集SYSTEMSTATE DUMP再重启数据库实例,收集的方法如下:
--获取SYSTEMSTATE的方法:
使用ORADEBUG做
sqlplus / as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug dump systemstate 266;
! sleep 60
oradebug dump systemstate 266;
oradebug tracefile_name;
建议
1.通过修改AUTO SGA自动内存管理模式为手动SGA管理(SGA_TARGET=0),避免shared pool的shrink行为,从而避免上述故障再次发生。
2.对于系统hang住的情况,优先收集systemstate dump再重启实例,方法如上文所述。
|
|