Help analyzing the cause: database hang

1^#

Posted 2012-6-3 18:08:55 | Views: 10971 | Replies: 10

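One midrange UNIX server hosts four database instances. At around 4:20 PM on June 1, the system ran out of physical memory, paging space usage reached 53%, and the application servers could not connect to the databases.
A few minutes later memory usage dropped and three of the instances returned to normal. For the remaining database, the instance on node 1 could not be connected to and stopped serving, while node 2 was normal.
The attachments below are that database's alert log and other trc logs.
The connection failures turned out to be caused by exceeding the limit of 150 processes. Is it possible to see what caused such a large number of processes at the time?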
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=28625)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

trc日志.rar

431.83 KB, downloads: 1034

alert_znavls1.rar

5.89 KB, downloads: 1112

xinxin415415

2^#

Posted 2012-6-3 19:22:55

Error information from the system errpt:
LABEL:       CORE_DUMP
IDENTIFIER:    A924A5FC
Date/Time:    Fri Jun  1 16:32:07 GMT+08:00 2012
Sequence Number: 30596
Machine Id:    00C20ED54C00
Node Id:       zn_vlsdbs_01
Class:          S
Type:          PERM
WPAR:          Global
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
      Recommended Actions
      CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
      Recommended Actions
      RERUN THE APPLICATION PROGRAM
      IF PROBLEM PERSISTS THEN DO THE FOLLOWING
      CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
      11
USER'S PROCESS ID:
            12779688
FILE SYSTEM SERIAL NUMBER
      13
INODE NUMBER
         0    116517
CORE FILE NAME
/oracle/db10g/log/zn_vlsdbs_01/racg/racgimon/core
PROGRAM NAME
racgimon
STACK EXECUTION DISABLED
         0
COME FROM ADDRESS REGISTER
sltsmna 14
PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 6
ADDITIONAL INFORMATION
clscugblm B88
clscugblm B24
crs_qstat 178
crs_qstat 40
clsrcqrya 24C
clsrcinst 1F58
clsrcacti 358
clsrd_do 1FC
_pthread_ F4
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/racgimon SIG/11 FLDS/clscugblm VALU/b88 FLDS/clsrcqrya

Maclean Liu (刘相兵)

3^#

Posted 2012-6-3 19:47:03

ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=28397)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

ORA-00020 means the process slots were exhausted. The TRACE files and alert.log cannot show where all these server processes came from at the time; finding that out requires logon auditing.

This is a RAC database, 10.2.0.4+, on AIX 6.1.

swap info: free_mem = 2815.93M rsv = 96.00M
         alloc = 13846.67M avail = 24576.00M swap_free = 10729.33M

According to the log, there was still free memory.
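As a quick sanity check of the figures in that "swap info" line, here is a minimal Python sketch. The reading that swap_free equals avail minus alloc is my own assumption based on the numbers, not something the trace states, and the helper name parse_swap_info is made up for illustration.

```python
import re

# The "swap info" line quoted above, copied verbatim from the trace.
SWAP_INFO = (
    "swap info: free_mem = 2815.93M rsv = 96.00M\n"
    "         alloc = 13846.67M avail = 24576.00M swap_free = 10729.33M"
)

def parse_swap_info(text):
    """Return {field: megabytes} for every 'name = <number>M' pair."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)\s*=\s*([\d.]+)M", text)}

info = parse_swap_info(SWAP_INFO)
# Assumption: swap_free is simply avail minus alloc.
computed_free = round(info["avail"] - info["alloc"], 2)
# Share of the available paging space that was allocated, in percent.
used_pct = round(100 * info["alloc"] / info["avail"], 1)
print(computed_free, used_pct)
```

Under that assumption the three paging figures are consistent with each other, and free_mem shows some physical memory was still free when the trace was written.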

   F S    UID    PID    PPID C PRI NI ADDR SZ WCHAN STIME TTY  TIME CMD
  240001 A oracle 33095960       1 0  60 20 1130910590 96464       16:29:26    -  0:00 ora_q001_znavls1
33095960: ora_q001_znavls1
0x09000000000ecee4  thread_wait(0x12c0000012c) + 0x244
0x00000001000fcb74  sskgpwwait(??, ??, ??, ??, ??) + 0x34
0x00000001000fa15c  skgpwwait(??, ??, ??, ??, ??) + 0xbc
0x000000010011e70c  kslges(??, ??, ??, ??, ??) + 0x54c
0x000000010012253c  kslgetl(??, ??, ??, ??) + 0x33c
0x00000001049f10b8  ksfglt(??, ??, ??, ??, ??) + 0x198
0x00000001000847f4  kghfrunp(??, ??, ??, ??, ??, ??, ??) + 0x794
0x000000010007a4c8  kghfnd(??, ??, ??, ??, ??, ??) + 0x7e8
0x0000000100098484  kghalo(??, ??, ??, ??, ??, ??, ??, ??) + 0xa24
0x0000000100005948  ksp_param_handle_alloc(??) + 0x168
0x000000010001d25c  kspcrec(??) + 0x1bc
0x0000000100141348  ksucre(??) + 0x408
0x0000000101281aa8  ksvrdp() + 0x368
0x000000010430d1b4  opirip(??, ??, ??) + 0x554
0x0000000102d9b558  opidrv(??, ??, ??) + 0x458
0x000000010370c070  sou2o(??, ??, ??, ??) + 0x90
0x00000001000008b0  opimai_real(??, ??) + 0x150
0x0000000100000718  main(??, ??) + 0x98
0x0000000100000340  __start() + 0x70
*** 2012-06-01 16:30:29.510
*** 2012-06-01 16:30:39.711

q001 tried to start many times, but every attempt ended with "creation failed". Looking at its call stack:

ksp_param_handle_alloc=>kghalo => kghfnd=> kghfrunp=> ksfglt=>kslgetl=> kslges

As shown, the q001 process hangs in kslges: it is trying to acquire a latch but never gets it.
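For anyone who wants to pull such a call chain out of a stack dump without reading it by hand, a small Python sketch follows. It assumes the frame format shown in the q001 trace above (address, function name, arguments, offset); the function name call_chain is only illustrative.

```python
import re

# A few frames copied from the q001 stack above (innermost frame first).
STACK = """\
0x00000001000fa15c  skgpwwait(??, ??, ??, ??, ??) + 0xbc
0x000000010011e70c  kslges(??, ??, ??, ??, ??) + 0x54c
0x000000010012253c  kslgetl(??, ??, ??, ??) + 0x33c
0x00000001049f10b8  ksfglt(??, ??, ??, ??, ??) + 0x198
0x00000001000847f4  kghfrunp(??, ??, ??, ??, ??, ??, ??) + 0x794
0x000000010007a4c8  kghfnd(??, ??, ??, ??, ??, ??) + 0x7e8
0x0000000100098484  kghalo(??, ??, ??, ??, ??, ??, ??, ??) + 0xa24
0x0000000100005948  ksp_param_handle_alloc(??) + 0x168
"""

# One frame: hex address, whitespace, function name, opening parenthesis.
FRAME = re.compile(r"^0x[0-9a-fA-F]+\s+(\w+)\(")

def call_chain(stack_text):
    """Return function names ordered from caller to callee."""
    names = []
    for line in stack_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            names.append(m.group(1))
    # The dump lists the innermost frame first, so reverse it.
    return list(reversed(names))

chain = call_chain(STACK)
print(" => ".join(chain))
```

The printed chain runs from ksp_param_handle_alloc down to the latch wait in kslges and skgpwwait, matching the hand-written chain above.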

Looking at the TRACE files of other processes:

Oracle process number: 43

Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 15401178, image:
*** 2012-06-01 16:51:13.052
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into ArrRegieFact (LogisticsNo, REGIECODE,VIN,QUALSTATUS,REMARK) values((SELECT LogisticsNo FROM LogisticsPlan WHERE PlanStatus IN ('PO', '*PO') AND VIN = 'LJNMDV1L2BN067145' AND TRIM(OutDoorMark) = '*' AND trantype IN ('0', '1') AND TRIM(ArrMark) IS NULL),'0101','LJNMDV1L2BN067145',Coalesce(Trim('0'), '0'),'')
----- Call Stack Trace -----
calling             call    entry             argument values in hex
location          type    point             (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c       bl    ksedst1             000000000 ? 000000000 ?
ksedmp+0290       bl    ksedst             104A2CDB0 ?
ksdxfdmp+0338       bl    _ptrgl
ksdxcb+04e4       bl    _ptrgl
sspuser+0074       bl    _ptrgl
000047B8          ?       00000000
sskgpwwait+0034    bl    000FD1F0
skgpwwait+00bc    bl    sskgpwwait          0000000D6 ? 70000010915F0AC ?
                                                700000109185EA0 ? 11022A3E0 ?
                                                000000000 ?
kslges+054c       bl    skgpwwait          000000000 ? D6000000D6 ?
                                                000000000 ? 700000103A490D0 ?
                                                0000000D7 ?
kslgetl+033c       bl    kslges             10009719C ? 000000000 ?
                                                700000103A48E08 ? 000000000 ?
                                                004B3AF98 ?
kslg2c+00d8       bl    kslgetl             FFFFFFFFFFED830 ?
                                                FFFFFFFFFFEDC88 ? 000000005 ?
                                                110401DE8 ?
ksfg2c+0024       bl    03F29EC4
kglpin+1164       bl    _ptrgl
IPRA.$kkdcchs+0130 bl    kglpin             110195490 ? FFFFFFFFFFEE0F0 ?

SO: 70000010a19ba50, type: 4, owner: 700000109185db8, flag: INIT/-/-/0x00
(session) sid: 110 trans: 0, creator: 700000109185db8, flag: (41) USR/- BSY/-/-/-/-/-
            DID: 0001-002B-00009E08, short-term DID: 0001-002B-00009E09
            txn branch: 0
            oct: 2, prv: 0, sql: 7000000d6fefc28, psql: 70000010cb8ad08, user: 63/VLS
service name: znavls
O/S info: user: Administrator, term: ZN-VLSAP-01, ospid: 10980:11888, machine: WORKGROUP\ZN-VLSAP-01
            program: Logic.MQ.exe
application name: Logic.MQ.exe, hash value=2924312369
waiting for 'latch: library cache' blocking sess=0x0 seq=5846 wait_time=0 seconds since wait started=1005
            address=700000103a490d0, number=d7, tries=d19
Dumping Session Wait History
   for 'latch: library cache' count=1 wait_time=292985
            address=700000103a490d0, number=d7, tries=d18
   for 'latch: library cache' count=1 wait_time=292990
            address=700000103a490d0, number=d7, tries=d17
   for 'latch: library cache' count=1 wait_time=292987
            address=700000103a490d0, number=d7, tries=d16
   for 'latch: library cache' count=1 wait_time=292989
            address=700000103a490d0, number=d7, tries=d15
   for 'latch: library cache' count=1 wait_time=292988
            address=700000103a490d0, number=d7, tries=d14
   for 'latch: library cache' count=1 wait_time=292987
            address=700000103a490d0, number=d7, tries=d13

PID=43 is also trying to acquire a latch; it is waiting on 'latch: library cache'.

waiting for 700000103a490d0 Child library cache level=5 child#=1
      Location from where latch is held: kghfrunp: clatch: wait:
      Context saved from call: 0
      state=busy, wlstate=free
      waiters [orapid (seconds since: put on list, posted, alive check)]:
         59 (1007, 1338540673, 2)
         44 (1007, 1338540673, 2)
         60 (1007, 1338540673, 2)
         57 (1007, 1338540673, 2)
         65 (1007, 1338540673, 2)
         77 (1007, 1338540673, 2)
         41 (1007, 1338540673, 2)
         66 (1007, 1338540673, 2)
         72 (1007, 1338540673, 2)
         52 (1007, 1338540673, 2)
         51 (1007, 1338540673, 2)
         36 (1007, 1338540673, 2)
         43 (1007, 1338540673, 2)
         67 (1007, 1338540673, 2)
         54 (1007, 1338540673, 2)
         64 (1007, 1338540673, 2)
         47 (1007, 1338540673, 2)
         62 (1007, 1338540673, 2)
         50 (1007, 1338540673, 2)
         78 (1004, 1338540673, 2)
         55 (1004, 1338540673, 2)
         73 (1004, 1338540673, 2)
         58 (1004, 1338540673, 2)
         46 (1004, 1338540673, 2)
         71 (1004, 1338540673, 2)
         69 (1004, 1338540673, 2)
         76 (1004, 1338540673, 2)
         49 (1004, 1338540673, 2)
         56 (1001, 1338540673, 2)
         18 (1001, 1338540673, 2)
         32 (950, 1338540673, 2)
         82 (788, 1338540673, 2)
         84 (725, 1338540673, 2)
         86 (674, 1338540673, 2)
         87 (656, 1338540673, 2)
         89 (548, 1338540673, 2)
         100 (389, 1338540673, 2)
         103 (257, 1338540673, 2)
         waiter count=38
      gotten 41227975 times wait, failed first 7110 sleeps 8296
      gotten 2106764 times nowait, failed: 1455
      possible holder pid = 79 ospid=66453754
   on wait list for 700000103a490d0
   holding (efd=11) 700000103a48770 Child library cache level=5 child#=16
      Location from where latch is held: kglpin:
      Context saved from call: 0
      state=busy, wlstate=free
      waiters [orapid (seconds since: put on list, posted, alive check)]:
         70 (1004, 1338540673, 2)
         63 (908, 1338540673, 2)
         31 (902, 1338540673, 2)
         45 (887, 1338540673, 2)
         68 (875, 1338540673, 2)
         74 (863, 1338540673, 2)
         80 (809, 1338540673, 2)
         81 (797, 1338540673, 2)
         42 (773, 1338540673, 2)
         83 (764, 1338540673, 2)
         53 (692, 1338540673, 2)
         90 (554, 1338540673, 2)
         91 (554, 1338540673, 2)
         92 (527, 1338540673, 2)
         93 (518, 1338540673, 2)
         94 (485, 1338540673, 2)
         95 (485, 1338540673, 2)
         96 (485, 1338540673, 2)
         97 (470, 1338540673, 2)
         98 (467, 1338540673, 2)
         101 (347, 1338540673, 2)
         104 (227, 1338540673, 2)
         16 (173, 1338540673, 2)
         23 (155, 1338540673, 2)
         waiter count=24

A large number of other processes are also waiting for the Child library cache latch and have spent a long time doing so, the longest 1007 seconds.
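To summarize a waiter list like the one above, a short Python sketch can help. It assumes each waiter line has the shape "orapid (seconds on list, posted, alive check)" as in this dump; the excerpt is shortened and parse_waiters is a made-up helper name.

```python
import re

# A shortened excerpt of the waiter list above; the full dump has 38 entries.
DUMP = """\
waiting for 700000103a490d0 Child library cache level=5 child#=1
      waiters [orapid (seconds since: put on list, posted, alive check)]:
         59 (1007, 1338540673, 2)
         78 (1004, 1338540673, 2)
         32 (950, 1338540673, 2)
         103 (257, 1338540673, 2)
         waiter count=38
"""

# One waiter entry: orapid followed by three comma-separated integers.
WAITER = re.compile(r"^\s*(\d+) \((\d+), (\d+), (\d+)\)\s*$")

def parse_waiters(text):
    """Return a list of (orapid, seconds_on_wait_list) tuples."""
    waiters = []
    for line in text.splitlines():
        m = WAITER.match(line)
        if m:
            waiters.append((int(m.group(1)), int(m.group(2))))
    return waiters

waiters = parse_waiters(DUMP)
# The waiter that has been on the list the longest.
longest = max(waiters, key=lambda w: w[1])
print(len(waiters), longest)
```

Run against the full dump, the number of parsed entries can be compared with the "waiter count" line as a completeness check.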

This may be what caused the large number of processes.

      Location from where latch is held: kghfrunp: clatch: wait:

The Child library cache latch is held from the location kghfrunp: clatch.

We need to determine exactly which process holds the Child library cache latch, and that requires the systemstate dump that the diag process may have produced.

Please upload the diag process TRACE from that time.

xinxin415415

4^#

Posted 2012-6-3 21:57:56

The diag file is attached, thanks.

znavls1_diag_15401178.rar

213.17 KB, downloads: 1066

Maclean Liu (刘相兵)

5^#

Posted 2012-6-3 23:33:05

ODM DATA:

Fri Jun  1 16:30:16 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:31:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:31:38 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:32:02 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:33:01 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:09 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:09 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:14 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:57 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:34:27 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:36:49 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:39:10 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:41:31 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:43:51 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:46:13 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:48:34 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:50:55 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:51:13 2012
IPC Send timeout detected. Receiver ospid 42336460
Receiver is waiting for a latch dumping latch state for receiver -17068
Fri Jun  1 16:51:13 2012
Errors in file /oracle/db10g/admin/znavls/udump/znavls1_ora_67043510.trc:
Fri Jun  1 16:51:16 2012
Errors in file /oracle/db10g/admin/znavls/bdump/znavls1_pz97_42336460.trc:
Fri Jun  1 16:51:16 2012
Trace dumping is performing id=[cdmp_20120601165116]
Fri Jun  1 16:53:16 2012

PMON was unable to acquire the latch it needed for a long time
ORA-3136 occurred
IPC Send timeout detected. Receiver ospid 42336460 Receiver is waiting for a latch dumping latch state for receiver -17068

These three symptoms may point to a latch deadlock caused by high load.
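To see how often each of these messages appears and how the q001 failures are spaced, an alert log excerpt can be tallied with a few lines of Python. This is only a sketch: it assumes the 10g alert log layout seen above (a timestamp line followed by its message lines) and an English locale for the day and month names; parse_alert is an illustrative name.

```python
from collections import Counter
from datetime import datetime

# Excerpt of the alert log quoted above.
ALERT = """\
Fri Jun  1 16:30:16 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:31:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:31:38 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:32:02 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:33:01 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:34:27 2012
ksvcreate: Process(q001) creation failed
"""

def parse_alert(text):
    """Yield (timestamp, message); a timestamp line applies to the lines after it."""
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        try:
            current = datetime.strptime(stripped, "%a %b %d %H:%M:%S %Y")
        except ValueError:
            # Not a timestamp line, so treat it as a message.
            if current is not None and stripped:
                yield current, stripped

events = list(parse_alert(ALERT))
counts = Counter(msg for _, msg in events)
q001 = [ts for ts, msg in events if msg.startswith("ksvcreate")]
gap_seconds = (q001[1] - q001[0]).total_seconds()
print(counts.most_common())
print(gap_seconds)
```

On the full alert log the same tally shows whether the q001 creation failures recur at a regular interval while PMON is stuck.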

A related case for "IPC Send timeout detected ... Receiver is waiting for a latch":

ODM FINDING:

Hdr: 6315581 10.2.0.3 RDBMS 10.2.0.3 BUFFER CACHE PRODID-5 PORTID-46 ORA-29740 6111445
Abstract: RECEIVER IS WAITING FOR A LATCH DUMPING LATCH STATE FOR RECEIVER

PROBLEM:
--------
Series of IPC Send timeout error messages in the alert log and then
ORA-29740, and node got evicted.

Tue Jul 31 22:44:58 2007
Errors in file /app/oracle/admin/statsrac/bdump/statsrac1_psp0_31176.trc:
ORA-29740: evicted by member , group incarnation
Tue Jul 31 22:44:58 2007
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
Tue Jul 31 22:44:58 2007
Errors in file /app/oracle/admin/statsrac/bdump/statsrac1_lms0_31182.trc:
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
System state dump is made for local instance
System State dumped to trace file
/app/oracle/admin/statsrac/bdump/statsrac1_diag_31173.trc
Tue Jul 31 22:44:59 2007
Shutting down instance (abort)

Recommended to set the following parameters:

_skgxp_udp_ach_reaping_time=0
_skgxp_udp_keep_alive_ping_timer_secs=0
_enable_reliable_latch_waits=FALSE

Also recommended to set the following kernel parameters to at least 256k:

# sysctl -w net.core.rmem_max=262144
# sysctl -w net.core.wmem_max=262144
# sysctl -w net.core.rmem_default=262144
# sysctl -w net.core.wmem_default=262144

Applied Patches
==============
INFO:Interim patches (4) :

INFO:Patch  6279765    : applied on Sat Jul 28 15:39:27 CDT 2007
   Created on 27 Jul 2007, 01:49:56 hrs PST8PDT
   Bugs fixed:
      5165885, 6207951, 6013968, 5454831, 6279765

INFO:Patch  4478139    : applied on Wed Jul 18 14:42:31 CDT 2007
   Created on 13 Jul 2007, 05:28:06 hrs PST8PDT
   Bugs fixed:
      4478139

INFO:Patch  5556081    : applied on Wed Jul 18 14:35:13 CDT 2007
   Created on 9 Nov 2006, 22:20:50 hrs PST8PDT
   Bugs fixed:
      5556081

INFO:Patch  5557962    : applied on Wed Jul 18 14:34:41 CDT 2007
   Created on 9 Nov 2006, 23:23:06 hrs PST8PDT
   Bugs fixed:
      4269423, 5557962, 5528974

Received ORADEBUG command 'dump errorstack 1' from process

*** 22:41:47.856
Received ORADEBUG command 'dump errorstack 1' from process Unix

*** 22:42:17.906
Received ORADEBUG command 'dump errorstack 1' from

In the DIAg trace file we see that the dumping started
earlier:
*** 22:26:12.689
Dump requested by process [orapid=5]
Dumping process info of pid[7.31182] requested by pid[5.31178]

The good thing is that we now have a stack trace of lms0:
#4  0x08318671 in kslgess ()
#5  0x08317a7f in kslgetsl ()
#6  0x08a6a076 in kclbla ()
#7  0x088c6178 in kjblpbast ()
#8  0x088e2a4f in kjbmpbast ()
#9  0x088446c9 in kjmxmpm ()
#10 0x0883d0f6 in kjmpmsgi ()
#11 0x0883fe53 in kjmsm ()

It's waiting for a latch:

      Location from where call was made: kclbla:
waiting for 990dca78 Child cache buffers chains level=1 child#=5043
      Location from where latch is held: kcbgtcr: kslbegin excl:
      Context saved from call: 12613539
      state=busy(exclusive) (val=0x20000026) holder orapid = 38
      waiters [orapid (seconds since: put on list, posted, alive check)]:
      30 (315, 1185938775, 0)
      34 (304, 1185938775, 0)
      7 (212, 1185938775, 0)

The problem here seems to be more a latch contention problem than anything
else. There is no evidence of bug 5190596.

Unfortunately the system state dump is truncated after process 20
so it's unknown what process 38 is doing.
*** SIZE IS LIMITED TO 5242880 BYTES ***
Q1:
Please set  max_dump_file_size to unlimited.
Q2:
Are there any trace files for the user process with ORAPID = 38?
(the trace file should have a line like:
Oracle process number: 38)

The _enable_reliable_latch_waits = false setting helps for cases where
we fail to recognize that a latch has been released.
Typical for such cases was a waiting session, but no holder,
here there seems to be a valid holder orapid 38.

Really the key to the solution is what process 38 was doing,
if we do not have any information about it we will have
to wait for another occurrence.

This is at severity 1.  A warm-hand off is required and the bug
should be assigned to the engineer you contact for the warm hand-off.
This may require that you override the BAT assignment.

This looks like a clock running backwards problem:
SKGXPIWAIT: keepalive_reset elapsed 4294967295 ts 271812561 last ping
271812562 check 60000
The current timestamp is 1 lower than the last ping time.

Recommendation

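Memory exhaustion may have caused CPU run-queue congestion, which led to a library cache latch deadlock or hung processes and in turn drove more processes to connect to the instance. I suggest setting the minimum free memory (the VMO min free setting) on AIX to 50M.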
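If the vmo minfree tunable is expressed in 4 KB page frames (my assumption here; please verify against the AIX documentation for your level), the 50M figure has to be converted to a page count before it is set. A minimal Python sketch of that arithmetic, with mb_to_pages as a made-up helper name:

```python
PAGE_SIZE_BYTES = 4096  # assumption: minfree counts 4 KB page frames

def mb_to_pages(megabytes, page_size=PAGE_SIZE_BYTES):
    """Convert a size in megabytes to a whole number of pages."""
    return (megabytes * 1024 * 1024) // page_size

pages = mb_to_pages(50)
print(pages)  # 12800
```

The resulting number would then be passed to vmo's minfree tunable, for example with vmo -o minfree=<pages>. maxfree is normally reviewed together with minfree, so treat this as a starting point rather than a tested setting.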

xinxin415415

6^#

Posted 2012-6-4 11:44:33

Memory was not exhausted at the time. Is the problem related to the following statement? Could it have affected memory allocation?
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 15401178, image:
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into ArrRegieFact (LogisticsNo, REGIECODE,VIN,QUALSTATUS,REMARK) values((SELECT LogisticsNo FROM LogisticsPlan WHERE PlanStatus IN ('PO', '*PO') AND VIN = 'LJNMDV1L2BN067145' AND TRIM(OutDoorMark) = '*' AND trantype IN ('0', '1') AND TRIM(ArrMark) IS NULL),'0101','LJNMDV1L2BN067145',Coalesce(Trim('0'), '0'),'')

xinxin415415

7^#

Posted 2012-6-4 16:18:50

Another database instance logged the following at 16:14:
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:17 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:17 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:44 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:44 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:26:52 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:26:52 2012
WARNING: inbound connection timed out (ORA-3136)

xinxin415415

8^#

Posted 2012-6-4 18:19:33

I also found that CRS had problems at the time:
2012-06-01 16:02:05.046: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:02:44.933: [  CRSEVT][11920]32CAAMonitorHandler :: 0:Action Script /oracle/db10g/bin/racgwrap(check) timed out for ora.znavls.znavls1.inst! (timeout=600)
2012-06-01 16:02:44.933: [  CRSAPP][11920]32CheckResource error for ora.znavls.znavls1.inst error code = -2
2012-06-01 16:02:50.641: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:03:51.230: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:03:58.141: [  CRSEVT][11928]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.141: [  CRSAPP][11928]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.275: [  CRSEVT][12442]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.291: [  CRSAPP][12442]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.320: [  CRSEVT][11932]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.321: [  CRSAPP][11932]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.412: [  CRSEVT][11936]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.417: [  CRSAPP][11936]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.460: [  CRSEVT][12191]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.460: [  CRSAPP][12191]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.579: [  CRSEVT][11939]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.592: [  CRSAPP][11939]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.624: [  CRSEVT][12196]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.625: [  CRSAPP][12196]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.751: [  CRSEVT][11943]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.763: [  CRSAPP][11943]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.793: [  CRSEVT][12200]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.793: [  CRSAPP][12200]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.921: [  CRSEVT][11947]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.935: [  CRSAPP][11947]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.949: [  CRSEVT][12204]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.949: [  CRSAPP][12204]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:59.076: [  CRSEVT][11951]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:59.089: [  CRSAPP][11951]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:59.120: [  CRSEVT][12208]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)

Maclean Liu (刘相兵)

9^#

Posted 2012-6-4 22:14:45

WARNING: inbound connection timed out (ORA-3136)

Frequent ORA-3136 errors are one sign of insufficient system resources.

racgwrap(check) timed out for ora.znavls.db! (timeout=600)

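The CRS check script racgwrap timed out while checking the db instance. This is how the db hang shows up at the CRS layer: the db hang caused the racgwrap(check) timeout.

Both points you raised are symptoms triggered by the problem, not its cause.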
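To see at a glance which resources the CRS check timed out for, the CRS log lines can be tallied per resource. A hedged Python sketch, assuming the message format of the excerpt posted earlier in this thread (timeouts_by_resource is an illustrative name):

```python
import re
from collections import Counter

# Four lines copied from the CRS log posted earlier in this thread.
CRS_LOG = """\
2012-06-01 16:02:44.933: [  CRSEVT][11920]32CAAMonitorHandler :: 0:Action Script /oracle/db10g/bin/racgwrap(check) timed out for ora.znavls.znavls1.inst! (timeout=600)
2012-06-01 16:03:58.141: [  CRSEVT][11928]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.275: [  CRSEVT][12442]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.320: [  CRSEVT][11932]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
"""

# Capture the resource name between "timed out for " and the trailing "!".
TIMEOUT = re.compile(r"racgwrap\(check\) timed out for (\S+)! \(timeout=(\d+)\)")

def timeouts_by_resource(text):
    """Count racgwrap(check) timeouts per CRS resource name."""
    return Counter(m.group(1) for m in TIMEOUT.finditer(text))

by_resource = timeouts_by_resource(CRS_LOG)
for name, n in sorted(by_resource.items()):
    print(name, n)
```

In this excerpt both ora.znavls.db and ora.e3sdb.db appear, so the check timeouts were not limited to a single database's resources.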

xifenfei

10^#

Posted 2012-6-4 22:50:46

I ran into this before, see: http://www.xifenfei.com/2835.html

[ Last edited by xifenfei on 2012-6-4 23:06 ]

xinxin415415

11^#

Posted 2012-6-6 13:33:54

Thanks to both of you for the replies!
