yehc@epsoft.com posted on 2015-1-11 13:58:12

3-node RAC: abnormal instance hang!

This post was last edited by yehc@epsoft.com on 2015-1-11 13:59

1. Environment: 3-node RAC
Database version: 10.2.0.5.0
OS: HP-UX 11i v3
Storage: Storage Foundation

2. Symptoms
---- Node 3 alert log excerpt:
Mon Dec 29 08:59:16 EAT 2014
Thread 3 advanced to log sequence 26432 (LGWR switch)
  Current log# 26 seq# 26432 mem# 0: /oradata/oradata/szyb/szyb_redo3_26-500m.log
Mon Dec 29 10:11:48 EAT 2014
Errors in file /oradata/admin/szyb/udump/szyb3_ora_7686.trc:
ORA-27300: OS system dependent operation:invalid_process_id failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpalive1
Mon Dec 29 10:14:14 EAT 2014
Errors in file /oradata/admin/szyb/udump/szyb3_ora_7686.trc:
ORA-27300: OS system dependent operation:invalid_process_id failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpalive1
Mon Dec 29 10:16:53 EAT 2014
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=16
System State dumped to trace file /oradata/admin/szyb/bdump/szyb3_mmon_2733.trc

---- Node 2 alert log excerpt:
Mon Dec 29 09:32:26 EAT 2014
Thread 2 cannot allocate new log, sequence 45567
Checkpoint not complete
  Current log# 9 seq# 45566 mem# 0: /oradata/oradata/szyb/szyb_redo2_9-500m.log
Mon Dec 29 09:54:31 EAT 2014
MMNL absent for 1201 secs; Foregrounds taking over
Mon Dec 29 10:22:06 EAT 2014
Process startup failed, error stack:
Mon Dec 29 10:22:06 EAT 2014
Errors in file /oradata/admin/szyb/bdump/szyb2_psp0_5034.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Mon Dec 29 10:22:07 EAT 2014
Process J008 died, see its trace file
Mon Dec 29 10:22:07 EAT 2014
kkjcre1p: unable to spawn jobq slave process
Mon Dec 29 10:22:07 EAT 2014
Errors in file /oradata/admin/szyb/bdump/szyb2_cjq0_5056.trc:

Mon Dec 29 10:22:07 EAT 2014
Process startup failed, error stack:
Mon Dec 29 10:22:07 EAT 2014
Errors in file /oradata/admin/szyb/bdump/szyb2_psp0_5034.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Mon Dec 29 10:22:08 EAT 2014
Process J008 died, see its trace file
Mon Dec 29 10:22:08 EAT 2014
kkjcre1p: unable to spawn jobq slave process
Mon Dec 29 10:22:08 EAT 2014
Errors in file /oradata/admin/szyb/bdump/szyb2_cjq0_5056.trc:

---- Node 1 alert log excerpt:
Mon Dec 29 08:44:49 EAT 2014
Thread 1 advanced to log sequence 35397 (LGWR switch)
  Current log# 14 seq# 35397 mem# 0: /oradata/oradata/szyb/szyb_redo1_14-500m.log
Mon Dec 29 09:15:05 EAT 2014
Thread 1 advanced to log sequence 35398 (LGWR switch)
  Current log# 15 seq# 35398 mem# 0: /oradata/oradata/szyb/szyb_redo1_15-500m.log
Mon Dec 29 10:17:34 EAT 2014
Shutting down instance (immediate)
Mon Dec 29 10:17:34 EAT 2014
Shutting down instance: further logons disabled
Mon Dec 29 10:18:22 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:27 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:32 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:37 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:42 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:47 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089
Mon Dec 29 10:18:52 EAT 2014
kkjcre1p: unable to spawn jobq slave process, error 1089

This has happened twice now and we still have no leads. Any help pinpointing the cause would be much appreciated!
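One way to line up the three alert log excerpts above is to merge them into a single timeline. The following is a minimal sketch, not a tool used in this thread. It assumes 10g-style timestamp lines such as "Mon Dec 29 10:22:06 EAT 2014" and simply drops the time zone token. The function names and the node labels are hypothetical.

```python
import re
from datetime import datetime

# Matches 10g-style alert log timestamp lines, e.g. "Mon Dec 29 10:22:06 EAT 2014".
TS_RE = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})$")


def parse_alert_log(text, node):
    """Return (timestamp, node, message) tuples from one alert log excerpt."""
    entries = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            # Drop the time zone token (EAT here) and parse the remainder.
            current_ts = datetime.strptime(
                f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y"
            )
        elif line and current_ts is not None:
            # Every non-empty line belongs to the most recent timestamp.
            entries.append((current_ts, node, line))
    return entries


def merged_timeline(logs):
    """logs maps a node label to its alert log text; result is sorted by time."""
    entries = []
    for node, text in logs.items():
        entries.extend(parse_alert_log(text, node))
    return sorted(entries, key=lambda e: e[0])


if __name__ == "__main__":
    demo = {
        "node2": "Mon Dec 29 09:32:26 EAT 2014\nCheckpoint not complete\n",
        "node1": "Mon Dec 29 10:17:34 EAT 2014\nShutting down instance (immediate)\n",
    }
    for ts, node, msg in merged_timeline(demo):
        print(ts, node, msg)
```

Feeding it the full alert logs of all three nodes would show which node reported trouble first, which is hard to see when the excerpts are read one node at a time.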

Maclean Liu (刘相兵) posted on 2015-1-11 15:03:55

odm finding:

$ awk -f ass1033.awk szyb3_mmon_2733.trc

Starting Systemstate 1
..............................................................................
...............................................................................
..............
Ass.Awk Version 1.0.33
~~~~~~~~~~~~~~~~~~~~~~
Source file : szyb3_mmon_2733.trc

System State 1  (2014-12-29 10:16:53.844)
~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~
WARNING: The following processes had a corrupted / in-flux state object tree :
Process 156: at line 176333

1:                                      
2:  waiting for 'pmon timer'            
3:  waiting for 'DIAG idle wait'        
4:  waiting for 'rdbms ipc message'     
5:  waiting for 'rdbms ipc message'     
6:  waiting for 'ges remote message'   
7:  last wait for 'gcs remote message'  
8:  last wait for 'gcs remote message'  
9:  waiting for 'rdbms ipc message'     
10: waiting for 'rdbms ipc message'     
11: waiting for 'rdbms ipc message'     
12: waiting for 'rdbms ipc message'     
13: waiting for 'enq: RO - fast object reuse'
14: waiting for 'rdbms ipc message'     
15: waiting for 'rdbms ipc message'     
16: last wait for 'ksdxexeotherwait'   
17: last wait for 'rdbms ipc message'   
18:                                    
19:                                    
20: waiting for 'rdbms ipc message'     
21: waiting for 'DFS lock handle'      
     Cmd: Insert
22: waiting for 'DFS lock handle'      
     Cmd: Insert
23: waiting for 'rdbms ipc message'     
24: waiting for 'rdbms ipc message'     
25: waiting for 'rdbms ipc message'     
26: waiting for 'Streams AQ: qmn coordinator idle wait'
27: waiting for 'SQL*Net message from client'
28: waiting for 'SQL*Net message from client'
29: waiting for 'SQL*Net message from client'
30: waiting for 'Streams AQ: waiting for time management or cleanup tasks'
31:                                    
32: waiting for 'SQL*Net message from client'
33: waiting for 'SQL*Net message from client'
34: waiting for 'gc buffer busy'      
     Cmd: Call Method
35: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
36: waiting for 'SQL*Net message from client'
37: waiting for 'gc buffer busy'      
     Cmd: Call Method
38: waiting for 'SQL*Net message from client'
39: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
40: waiting for 'DFS lock handle'      
     Cmd: Insert
41: waiting for 'DFS lock handle'      
     Cmd: Insert
42: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
43:                                    
44: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
45:                                    
46: waiting for 'gc buffer busy'      
     Cmd: Call Method
47: waiting for 'DFS lock handle'      
     Cmd: Insert
48: waiting for 'DFS lock handle'      
     Cmd: Insert
49: waiting for 'gc buffer busy'      
     Cmd: Call Method
50: waiting for 'gc current request'   
     Cmd: Call Method
51: waiting for 'row cache lock'      
     Cmd: Call Method
52: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
53: waiting for 'gc buffer busy'      
     Cmd: Call Method
54: waiting for 'row cache lock'      
     Cmd: Call Method
55: waiting for 'DFS lock handle'      
     Cmd: Select
56: waiting for 'gc buffer busy'      
     Cmd: Call Method
57: waiting for 'gc buffer busy'        
     Cmd: Call Method
58: waiting for 'SQL*Net message from client'
59: waiting for 'gc buffer busy'      
     Cmd: Call Method
60: waiting for 'SQL*Net message from client'
61: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
62:                                    
63: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
64: waiting for 'DFS lock handle'      
     Cmd: Insert
65:                                    
66: waiting for 'DFS lock handle'      
     Cmd: Select
67: waiting for 'row cache lock'      
     Cmd: Call Method
68: waiting for 'gc current request'   
     Cmd: Call Method
69: waiting for 'row cache lock'      
     Cmd: Call Method
70: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
71: waiting for 'row cache lock'      
     Cmd: Call Method
72: waiting for 'row cache lock'      
     Cmd: Call Method
73: waiting for 'row cache lock'      
     Cmd: Call Method
74: waiting for 'row cache lock'      
     Cmd: Call Method
75: waiting for 'enq: TX - row lock contention'
     Cmd: Call Method
76:                                    
77: waiting for 'Streams AQ: qmn slave idle wait'
78: waiting for 'DFS lock handle'      
     Cmd: Insert
79: waiting for 'gc buffer busy'      
     Cmd: Call Method
80: waiting for 'SQL*Net message from client'
81: waiting for 'SQL*Net message from client'
82:                                    
83: waiting for 'SQL*Net message from client'
84: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
85: waiting for 'SQL*Net message from client'
86: waiting for 'SQL*Net message from client'
87: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
88: waiting for 'SQL*Net message from client'
     Cmd: Select
89: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
90: waiting for 'enq: RO - fast object reuse'
91: waiting for 'DFS lock handle'      
     Cmd: Select
92:                                    
93: waiting for 'SQL*Net message from client'
94: waiting for 'SQL*Net message from client'
95: waiting for 'enq: SQ - contention'  
     Cmd: Call Method
96: waiting for 'gc buffer busy'      
     Cmd: Call Method
97: waiting for 'gc buffer busy'      
     Cmd: Call Method
98: waiting for 'SQL*Net message from client'
     Cmd: Select
99: waiting for 'DFS lock handle'      
     Cmd: Insert
100:waiting for 'row cache lock'      
     Cmd: Call Method
101:waiting for 'SQL*Net message from client'
102:waiting for 'row cache lock'      
     Cmd: Call Method
103:waiting for 'SQL*Net message from client'
     Cmd: Select
104:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
105:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
106:waiting for 'row cache lock'      
     Cmd: Call Method
107:waiting for 'row cache lock'      
     Cmd: Call Method
108:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
109:waiting for 'row cache lock'      
     Cmd: Call Method
110:waiting for 'gc buffer busy'      
     Cmd: Call Method
111:waiting for 'gc buffer busy'      
     Cmd: Call Method
112:waiting for 'SQL*Net message from client'
113:waiting for 'SQL*Net message from client'
114:waiting for 'SQL*Net message from client'
115:waiting for 'SQL*Net message from client'
116:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
117:waiting for 'gc buffer busy'      
     Cmd: Call Method
118:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
119:waiting for 'gc buffer busy'      
     Cmd: Call Method
120:waiting for 'SQL*Net message from client'
121:waiting for 'row cache lock'      
     Cmd: Call Method
122:waiting for 'gc buffer busy'      
     Cmd: Call Method
123:waiting for 'gc buffer busy'      
     Cmd: Call Method
124:waiting for 'SQL*Net message from client'
     Cmd: Select
125:waiting for 'gc buffer busy'      
     Cmd: Call Method
126:waiting for 'row cache lock'      
     Cmd: Call Method
127:waiting for 'gc buffer busy'      
     Cmd: Call Method
128:waiting for 'gc buffer busy'      
     Cmd: Call Method
129:waiting for 'gc buffer busy'      
     Cmd: Call Method
130:waiting for 'gc buffer busy'      
     Cmd: Call Method
131:waiting for 'gc buffer busy'      
     Cmd: Call Method
132:waiting for 'gc buffer busy'      
     Cmd: Call Method
133:waiting for 'gc current request'   
     Cmd: PL/SQL Execute
134:waiting for 'gc buffer busy'      
     Cmd: Call Method
135:waiting for 'gc buffer busy'      
     Cmd: Call Method
136:waiting for 'gc buffer busy'      
     Cmd: Call Method
137:waiting for 'gc buffer busy'      
     Cmd: Call Method
138:waiting for 'row cache lock'      
     Cmd: Call Method
139:waiting for 'row cache lock'      
     Cmd: Call Method
140:waiting for 'row cache lock'      
     Cmd: Call Method
141:waiting for 'gc buffer busy'      
     Cmd: Call Method
142:waiting for 'row cache lock'      
     Cmd: Call Method
143:waiting for 'gc buffer busy'      
     Cmd: Call Method
144:waiting for 'gc buffer busy'      
     Cmd: Call Method
145:waiting for 'gc buffer busy'      
     Cmd: Call Method
146:waiting for 'gc buffer busy'      
     Cmd: Call Method
147:waiting for 'row cache lock'      
     Cmd: Call Method
148:waiting for 'row cache lock'      
     Cmd: Call Method
149:waiting for 'gc buffer busy'      
     Cmd: Call Method
150:waiting for 'row cache lock'      
     Cmd: Call Method
151:waiting for 'gc buffer busy'      
     Cmd: Call Method
152:waiting for 'gc buffer busy'      
     Cmd: Call Method
153:waiting for 'row cache lock'      
     Cmd: Call Method
154:waiting for 'row cache lock'      
     Cmd: Call Method
155:waiting for 'row cache lock'      
     Cmd: Call Method
156:waiting for 'gc buffer busy'        
     Cmd: Call Method
161:waiting for 'row cache lock'      
     Cmd: Call Method
256:waiting for 'DFS lock handle'      
     Cmd: Insert
307:waiting for 'SQL*Net message from client'
321:                                    
441:waiting for 'SQL*Net message from client'
446:                                    
558:waiting for 'DFS lock handle'      
     Cmd: Select
572:                                    
584:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
623:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
628:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
629:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
637:waiting for 'enq: SQ - contention'  
     Cmd: Call Method
656:waiting for 'DFS lock handle'      
     Cmd: Select
699:waiting for 'SQL*Net message from client'

Blockers
~~~~~~~~

        Above is a list of all the processes. If they are waiting for a resource
        then it will be given in square brackets. Below is a summary of the
        waited upon resources, together with the holder of that resource.
        Notes:
        ~~~~~
         o A process id of '???' implies that the holder was not found in the
           systemstate. (The holder may have released the resource before we
           dumped the state object tree of the blocking process).
         o Lines with 'Enqueue conversion' below can be ignored *unless*
           other sessions are waiting on that resource too. For more, see
           http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)

                    Resource Holder State
    Enq RO-0003000D-00000001    12: waiting for 'rdbms ipc message'
    Enq RO-0003000D-00000001    13: 13: is waiting for 12: 13:
Rcache object=c0000004fdd21218,    68: waiting for 'gc current request'
           Buffer 0x07c26d24    50: waiting for 'gc current request'
    Enq TX-00070017-008921CA    ??? Blocker
    Enq TX-00130003-008C762F    ??? Blocker
    Enq RO-0003005A-00000001    12: waiting for 'rdbms ipc message'
    Enq RO-0003005A-00000001    90: 90: is waiting for 12: 90:

Blockers According to Tracefile Wait Info:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. This may not work for 64bit platforms. See bug 2902997 for details.
2. If the blocking process is shown as 0 then that session may no longer be
   present.
3. If resources are held across code layers then sometimes the tracefile wait
   info will not recognise the problem.

No blockers seen.

Object Names
~~~~~~~~~~~~
Enq RO-0003000D-00000001                                      
Rcache object=c0000004fdd21218, cid=3(dc_rollback_segments)   
Buffer 0x07c26d24                                             
Enq TX-00070017-008921CA                                      
Enq TX-00130003-008C762F                                      
Enq RO-0003005A-00000001                                    
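The process list above is long, so a tally per wait event makes the dominant waits easier to see. Below is a minimal sketch that counts the "waiting for" lines of ass.awk-style output. It assumes the line layout shown above ("NN: waiting for '...'"). The function name and the `ignore` parameter are hypothetical, and the idle-event set is only an illustrative selection.

```python
import re
from collections import Counter

# A process line starts with the process number, e.g. "51: waiting for 'row cache lock'".
# Lines in the Blockers section start with the resource name and so do not match.
PROC_RE = re.compile(r"^\s*(\d+):\s*(waiting for|last wait for) '([^']+)'")

# Illustrative set of idle waits to leave out of the tally (an assumption, not exhaustive).
IDLE_EVENTS = frozenset({
    "rdbms ipc message",
    "SQL*Net message from client",
    "pmon timer",
    "DIAG idle wait",
})


def tally_waits(ass_output, ignore=IDLE_EVENTS):
    """Count processes per wait event, using only current ('waiting for') waits."""
    counts = Counter()
    for line in ass_output.splitlines():
        m = PROC_RE.match(line)
        if m and m.group(2) == "waiting for" and m.group(3) not in ignore:
            counts[m.group(3)] += 1
    return counts


if __name__ == "__main__":
    with open("ass_output.txt") as fh:  # hypothetical file holding the output above
        for event, n in tally_waits(fh.read()).most_common():
            print(f"{n:4d}  {event}")
```

Passing `ignore=frozenset()` keeps the idle waits in the count as well.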

Maclean Liu (刘相兵) posted on 2015-1-11 15:04:24

Node 2:

ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

Has the specific cause of "Not enough space" been identified yet?

yehc@epsoft.com posted on 2015-1-11 15:39:20

This post was last edited by yehc@epsoft.com on 2015-1-11 15:41

Maclean Liu (刘相兵) posted on 2015-1-11 15:04
Node 2:

ORA-27300: OS system dependent operation:fork failed with status: 12


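Node 1 was the first to throw "Not enough space". The NBU backup that should have started around Dec 29 00:30 did not start normally. The alert log before and after compares as follows:

-- Normal case:
Sat Dec 27 22:20:14 EAT 2014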
Thread 1 advanced to log sequence 35378 (LGWR switch)
  Current log# 15 seq# 35378 mem# 0: /oradata/oradata/szyb/szyb_redo1_15-500m.log
Sun Dec 28 00:30:04 EAT 2014
Thread 1 advanced to log sequence 35379 (LGWR switch)
  Current log# 16 seq# 35379 mem# 0: /oradata/oradata/szyb/szyb_redo1_16-500m.log
Sun Dec 28 03:11:06 EAT 2014
Starting control autobackup
Sun Dec 28 03:12:06 EAT 2014
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.0.0.0',
        media 'AA02L2'
        handle 'c-2952192234-20141228-00'
Sun Dec 28 03:12:13 EAT 2014
ALTER SYSTEM ARCHIVE LOG
Sun Dec 28 03:12:17 EAT 2014
Thread 1 advanced to log sequence 35380 (LGWR switch)
  Current log# 17 seq# 35380 mem# 0: /oradata/oradata/szyb/szyb_redo1_17-500m.log
Sun Dec 28 03:12:22 EAT 2014
ALTER SYSTEM ARCHIVE LOG
Sun Dec 28 03:12:22 EAT 2014
Thread 1 advanced to log sequence 35381 (LGWR switch)
  Current log# 18 seq# 35381 mem# 0: /oradata/oradata/szyb/szyb_redo1_18-500m.log

-- Dec 29, backup not started:
Sun Dec 28 22:55:42 EAT 2014
Thread 1 advanced to log sequence 35390 (LGWR switch)
  Current log# 17 seq# 35390 mem# 0: /oradata/oradata/szyb/szyb_redo1_17-500m.log
Mon Dec 29 00:22:10 EAT 2014
Thread 1 advanced to log sequence 35391 (LGWR switch)
  Current log# 18 seq# 35391 mem# 0: /oradata/oradata/szyb/szyb_redo1_18-500m.log
Mon Dec 29 01:00:30 EAT 2014
Thread 1 advanced to log sequence 35392 (LGWR switch)
  Current log# 19 seq# 35392 mem# 0: /oradata/oradata/szyb/szyb_redo1_19-500m.log
Mon Dec 29 03:29:46 EAT 2014
Thread 1 advanced to log sequence 35393 (LGWR switch)
  Current log# 20 seq# 35393 mem# 0: /oradata/oradata/szyb/szyb_redo1_20-500m.log
Mon Dec 29 04:42:15 EAT 2014
Thread 1 advanced to log sequence 35394 (LGWR switch)
  Current log# 11 seq# 35394 mem# 0: /oradata/oradata/szyb/szyb_redo1_11-500m.log
Mon Dec 29 07:37:56 EAT 2014
……
Shutting down instance (immediate)

After that the database hung.

-- Node 2 could not switch redo logs:
  Current log# 1 seq# 45548 mem# 0: /oradata/oradata/szyb/szyb_redo2_1-500m.log
Sun Dec 28 03:12:14 EAT 2014
Thread 2 advanced to log sequence 45549 (LGWR switch)
  Current log# 2 seq# 45549 mem# 0: /oradata/oradata/szyb/szyb_redo2_2-500m.log
Sun Dec 28 03:12:23 EAT 2014
Thread 2 advanced to log sequence 45550 (LGWR switch)
  Current log# 3 seq# 45550 mem# 0: /oradata/oradata/szyb/szyb_redo2_3-500m.log
Sun Dec 28 08:35:50 EAT 2014
Thread 2 advanced to log sequence 45551 (LGWR switch)
  Current log# 4 seq# 45551 mem# 0: /oradata/oradata/szyb/szyb_redo2_4-500m.log
Sun Dec 28 09:30:54 EAT 2014
Thread 2 advanced to log sequence 45552 (LGWR switch)
  Current log# 5 seq# 45552 mem# 0: /oradata/oradata/szyb/szyb_redo2_5-500m.log
Sun Dec 28 10:28:00 EAT 2014
Thread 2 advanced to log sequence 45553 (LGWR switch)
  Current log# 6 seq# 45553 mem# 0: /oradata/oradata/szyb/szyb_redo2_6-500m.log
Sun Dec 28 11:15:37 EAT 2014
Thread 2 advanced to log sequence 45554 (LGWR switch)
  Current log# 7 seq# 45554 mem# 0: /oradata/oradata/szyb/szyb_redo2_7-500m.log
Sun Dec 28 11:49:32 EAT 2014
Thread 2 advanced to log sequence 45555 (LGWR switch)
  Current log# 8 seq# 45555 mem# 0: /oradata/oradata/szyb/szyb_redo2_8-500m.log
Sun Dec 28 14:40:21 EAT 2014
Thread 2 advanced to log sequence 45556 (LGWR switch)
  Current log# 9 seq# 45556 mem# 0: /oradata/oradata/szyb/szyb_redo2_9-500m.log
Sun Dec 28 18:23:13 EAT 2014
Thread 2 advanced to log sequence 45557 (LGWR switch)
  Current log# 10 seq# 45557 mem# 0: /oradata/oradata/szyb/szyb_redo2_10-500m.log
Mon Dec 29 00:03:29 EAT 2014
Thread 2 advanced to log sequence 45558 (LGWR switch)
  Current log# 1 seq# 45558 mem# 0: /oradata/oradata/szyb/szyb_redo2_1-500m.log
Mon Dec 29 03:29:47 EAT 2014
Thread 2 advanced to log sequence 45559 (LGWR switch)
  Current log# 2 seq# 45559 mem# 0: /oradata/oradata/szyb/szyb_redo2_2-500m.log
Mon Dec 29 06:37:10 EAT 2014
Thread 2 advanced to log sequence 45560 (LGWR switch)
  Current log# 3 seq# 45560 mem# 0: /oradata/oradata/szyb/szyb_redo2_3-500m.log
Mon Dec 29 07:17:32 EAT 2014
Thread 2 advanced to log sequence 45561 (LGWR switch)
  Current log# 4 seq# 45561 mem# 0: /oradata/oradata/szyb/szyb_redo2_4-500m.log
Mon Dec 29 07:37:54 EAT 2014
Thread 2 advanced to log sequence 45562 (LGWR switch)
  Current log# 5 seq# 45562 mem# 0: /oradata/oradata/szyb/szyb_redo2_5-500m.log
Mon Dec 29 07:58:25 EAT 2014
Thread 2 advanced to log sequence 45563 (LGWR switch)
  Current log# 6 seq# 45563 mem# 0: /oradata/oradata/szyb/szyb_redo2_6-500m.log
Mon Dec 29 08:39:49 EAT 2014
Thread 2 advanced to log sequence 45564 (LGWR switch)
  Current log# 7 seq# 45564 mem# 0: /oradata/oradata/szyb/szyb_redo2_7-500m.log
Mon Dec 29 08:58:55 EAT 2014
Thread 2 advanced to log sequence 45565 (LGWR switch)
  Current log# 8 seq# 45565 mem# 0: /oradata/oradata/szyb/szyb_redo2_8-500m.log
Mon Dec 29 09:14:05 EAT 2014
Thread 2 advanced to log sequence 45566 (LGWR switch)
  Current log# 9 seq# 45566 mem# 0: /oradata/oradata/szyb/szyb_redo2_9-500m.log
Mon Dec 29 09:32:26 EAT 2014
Thread 2 cannot allocate new log, sequence 45567
Checkpoint not complete
  Current log# 9 seq# 45566 mem# 0: /oradata/oradata/szyb/szyb_redo2_9-500m.log
Mon Dec 29 09:54:31 EAT 2014
MMNL absent for 1201 secs; Foregrounds taking over
Mon Dec 29 10:22:06 EAT 2014
Process startup failed, error stack:

The SF (Storage Foundation) log shows that swap usage stayed persistently high during this period, at 100%:
2014/12/29 10:09:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 13% and Swap usage = 99%.
2014/12/29 10:09:51 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 17% and Swap usage = 100%.
2014/12/29 10:10:20 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 13% and Swap usage = 100%.
2014/12/29 10:10:51 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 13% and Swap usage = 100%.
2014/12/29 10:11:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 15% and Swap usage = 100%.
2014/12/29 10:11:50 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 4% and Swap usage = 100%.
2014/12/29 10:12:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 6% and Swap usage = 100%.
2014/12/29 10:12:51 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 7% and Swap usage = 100%.
2014/12/29 10:13:20 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 2% and Swap usage = 100%.
2014/12/29 10:13:51 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 2% and Swap usage = 100%.
2014/12/29 10:14:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 1% and Swap usage = 100%.
2014/12/29 10:14:50 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 3% and Swap usage = 100%.
2014/12/29 10:15:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 2% and Swap usage = 100%.
2014/12/29 10:15:51 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating System attribute with CPU usage = 2% and Swap usage = 100%.
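The excerpt above starts when swap is already at 99%, so it does not show when the climb began. A minimal sketch for pulling the CPU and swap figures out of HostMonitor lines like these, and for finding the first sample at or above a threshold, could look like the following. It assumes the line format shown above, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2014/12/29 10:09:21 VCS INFO V-16-10061-14001 HostMonitor:VCShm:monitor:Updating
#   System attribute with CPU usage = 13% and Swap usage = 99%.
HM_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS INFO \S+ HostMonitor:.*"
    r"CPU usage = (\d+)% and Swap usage = (\d+)%"
)


def parse_hostmonitor(text):
    """Return (timestamp, cpu_pct, swap_pct) tuples from a VCS log excerpt."""
    samples = []
    for line in text.splitlines():
        m = HM_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            samples.append((ts, int(m.group(2)), int(m.group(3))))
    return samples


def first_time_at_or_above(samples, swap_threshold):
    """Timestamp of the first sample whose swap usage reaches the threshold, else None."""
    for ts, _cpu, swap in samples:
        if swap >= swap_threshold:
            return ts
    return None
```

Run over the complete log for Dec 28 and 29 rather than this 6-minute window, the same parsing would show whether swap climbed gradually overnight or jumped at a specific time, which can then be compared with the alert log timestamps.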

Basic host configuration and Oracle memory configuration:
-- OS
Physical memory: 32G
Swap: 8G

-- Oracle SGA
SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 16G
sga_target                           big integer 16G

SQL> show parameter pga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 4G

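Maclean Liu (刘相兵) posted on 2015-1-11 15:46:39

Other logs are needed to analyze why swap reached 100%. Otherwise, in cases like this it is ultimately very hard to say what used up the swap.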

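yehc@epsoft.com posted on 2015-1-11 15:55:07

Maclean Liu (刘相兵) posted on 2015-1-11 15:46
Other logs are needed to analyze why swap reached 100%. Otherwise, in cases like this it is ultimately very hard to say what used up the swap ...

Could you give us your recommendation? On this system, day-to-day swap usage generally stays between 70% and 80%.
Our current recommendations are:
1) Increase swap from the current 8G to 32 * 0.75 = 24G.
2) Temporarily disable automatic statistics gathering and gather statistics manually instead. Reason: around 23:00 on January 7, 2015, call dbms_stats.gather_database_stats_job_proc ( ) and call dbms_space.auto_space_advisor_job_proc ( ) stopped responding for a long time and caused a large number of I/O wait events.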
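The sizing in recommendation 1 can be set next to the memory settings posted earlier (32G physical memory, 8G swap, sga_target 16G, pga_aggregate_target 4G). The sketch below only restates that arithmetic. The function name is hypothetical, the 0.75 factor is the one used in the post, and pga_aggregate_target is treated as a budget even though in this release it is a target rather than a hard limit.

```python
def memory_summary(physical_gb, swap_gb, sga_gb, pga_target_gb, swap_factor=0.75):
    """Summarize the posted memory settings and the proposed swap size (all in GB)."""
    oracle_gb = sga_gb + pga_target_gb          # SGA plus PGA target
    proposed_swap_gb = physical_gb * swap_factor
    return {
        "oracle_gb": oracle_gb,
        "oracle_share_of_ram": oracle_gb / physical_gb,
        "ram_left_for_everything_else_gb": physical_gb - oracle_gb,
        "proposed_swap_gb": proposed_swap_gb,
        "swap_increase_gb": proposed_swap_gb - swap_gb,
    }


if __name__ == "__main__":
    for key, value in memory_summary(32, 8, 16, 4).items():
        print(f"{key}: {value}")
```

These figures say nothing about which processes actually consumed the swap, so they complement rather than replace the per-process memory data asked for above.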

Maclean Liu (刘相兵) posted on 2015-1-11 18:46:33

"On this system, day-to-day swap usage generally stays between 70% and 80%." ==> That is not normal.

sunguo40 posted on 2015-1-13 21:58:26

I haven't worked with a three-node setup yet.