Oracle Database Data Recovery and Performance Tuning

1#
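Posted 2012-6-3 18:08:55 | Views: 10970 | Replies: 10
One midrange UNIX server hosts four database instances. At around 16:20 on June 1 the system ran out of physical memory, paging space usage reached 53%, and the application servers could not connect to the database.
After a few minutes memory usage dropped and three of the database instances returned to normal. For the remaining database, the instance on node 1 could not be connected to and stopped serving requests, while node 2 was fine.
Attached are the alert log and other trace (.trc) files for that database.
We found that connections failed because the process limit of 150 was exceeded. Is it possible to tell what caused such a large number of processes at that time?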
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=28625)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded

trc日志.rar

431.83 KB, Downloads: 1034

alert_znavls1.rar

5.89 KB, Downloads: 1112

2#
Posted 2012-6-3 19:22:55
Error information from the system errpt:
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC
Date/Time:       Fri Jun  1 16:32:07 GMT+08:00 2012
Sequence Number: 30596
Machine Id:      00C20ED54C00
Node Id:         zn_vlsdbs_01
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC         
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
        Recommended Actions
        CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
              12779688
FILE SYSTEM SERIAL NUMBER
          13
INODE NUMBER
           0      116517
CORE FILE NAME
/oracle/db10g/log/zn_vlsdbs_01/racg/racgimon/core
PROGRAM NAME
racgimon
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
sltsmna 14
PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 6
ADDITIONAL INFORMATION
clscugblm B88
clscugblm B24
crs_qstat 178
crs_qstat 40
clsrcqrya 24C
clsrcinst 1F58
clsrcacti 358
clsrd_do 1FC
_pthread_ F4
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/racgimon SIG/11 FLDS/clscugblm VALU/b88 FLDS/clsrcqrya

3#
Posted 2012-6-3 19:47:03
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=28397)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded


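ORA-00020 means the process limit was exhausted. The traces and alert.log cannot show where so many server processes came from at that time; that can only be determined with logon auditing in place.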


This is a RAC database, 10.2.0.4 on AIX 6.1.

swap info: free_mem = 2815.93M rsv = 96.00M
           alloc = 13846.67M avail = 24576.00M swap_free = 10729.33M

According to the log, there was still free memory.
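
As a quick sanity check on the swap info line quoted above, the sketch below recomputes paging-space usage from its figures. It assumes that alloc and swap_free together make up avail, which the quoted numbers are consistent with; the variable names are illustrative only.

```python
# Figures copied from the "swap info" trace line (values in MB).
free_mem = 2815.93
alloc = 13846.67
avail = 24576.00
swap_free = 10729.33

# Assumption: alloc + swap_free should add up to avail.
assert abs((alloc + swap_free) - avail) < 0.01

swap_used_pct = alloc / avail * 100
print(f"paging space in use: {swap_used_pct:.1f}%")
print(f"free physical memory: {free_mem:.2f} MB")
```

With these figures paging space is about 56.3% in use, in the same range as the 53% reported in the first post, while roughly 2.8 GB of physical memory was still free when the trace was written.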


       F S      UID      PID     PPID   C PRI NI ADDR    SZ    WCHAN    STIME    TTY  TIME CMD
  240001 A   oracle 33095960        1   0  60 20 1130910590 96464          16:29:26      -  0:00 ora_q001_znavls1
33095960: ora_q001_znavls1
0x09000000000ecee4  thread_wait(0x12c0000012c) + 0x244
0x00000001000fcb74  sskgpwwait(??, ??, ??, ??, ??) + 0x34
0x00000001000fa15c  skgpwwait(??, ??, ??, ??, ??) + 0xbc
0x000000010011e70c  kslges(??, ??, ??, ??, ??) + 0x54c
0x000000010012253c  kslgetl(??, ??, ??, ??) + 0x33c
0x00000001049f10b8  ksfglt(??, ??, ??, ??, ??) + 0x198
0x00000001000847f4  kghfrunp(??, ??, ??, ??, ??, ??, ??) + 0x794
0x000000010007a4c8  kghfnd(??, ??, ??, ??, ??, ??) + 0x7e8
0x0000000100098484  kghalo(??, ??, ??, ??, ??, ??, ??, ??) + 0xa24
0x0000000100005948  ksp_param_handle_alloc(??) + 0x168
0x000000010001d25c  kspcrec(??) + 0x1bc
0x0000000100141348  ksucre(??) + 0x408
0x0000000101281aa8  ksvrdp() + 0x368
0x000000010430d1b4  opirip(??, ??, ??) + 0x554
0x0000000102d9b558  opidrv(??, ??, ??) + 0x458
0x000000010370c070  sou2o(??, ??, ??, ??) + 0x90
0x00000001000008b0  opimai_real(??, ??) + 0x150
0x0000000100000718  main(??, ??) + 0x98
0x0000000100000340  __start() + 0x70
*** 2012-06-01 16:30:29.510
*** 2012-06-01 16:30:39.711


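q001 tried to start several times, but every attempt ended in "creation failed". Looking at its call stack:

ksp_param_handle_alloc=>kghalo => kghfnd=> kghfrunp=> ksfglt=>kslgetl=> kslges  

The q001 process is hung in kslges: it is trying to acquire a latch but never gets it.


Looking at the traces of other processes: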


Oracle process number: 43

Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 15401178, image:
*** 2012-06-01 16:51:13.052
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into ArrRegieFact (LogisticsNo, REGIECODE,VIN,QUALSTATUS,REMARK) values((SELECT LogisticsNo FROM LogisticsPlan WHERE PlanStatus IN ('PO', '*PO') AND VIN = 'LJNMDV1L2BN067145' AND TRIM(OutDoorMark) = '*' AND trantype IN ('0', '1') AND TRIM(ArrMark) IS NULL),'0101','LJNMDV1L2BN067145',Coalesce(Trim('0'), '0'),'')
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 000000000 ?
ksedmp+0290          bl       ksedst               104A2CDB0 ?
ksdxfdmp+0338        bl       _ptrgl               
ksdxcb+04e4          bl       _ptrgl               
sspuser+0074         bl       _ptrgl               
000047B8             ?        00000000            
sskgpwwait+0034      bl       000FD1F0            
skgpwwait+00bc       bl       sskgpwwait           0000000D6 ? 70000010915F0AC ?
                                                   700000109185EA0 ? 11022A3E0 ?
                                                   000000000 ?
kslges+054c          bl       skgpwwait            000000000 ? D6000000D6 ?
                                                   000000000 ? 700000103A490D0 ?
                                                   0000000D7 ?
kslgetl+033c         bl       kslges               10009719C ? 000000000 ?
                                                   700000103A48E08 ? 000000000 ?
                                                   004B3AF98 ?
kslg2c+00d8          bl       kslgetl              FFFFFFFFFFED830 ?
                                                   FFFFFFFFFFEDC88 ? 000000005 ?
                                                   110401DE8 ?
ksfg2c+0024          bl       03F29EC4            
kglpin+1164          bl       _ptrgl               
IPRA.$kkdcchs+0130   bl       kglpin               110195490 ? FFFFFFFFFFEE0F0 ?

    SO: 70000010a19ba50, type: 4, owner: 700000109185db8, flag: INIT/-/-/0x00
    (session) sid: 110 trans: 0, creator: 700000109185db8, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-002B-00009E08, short-term DID: 0001-002B-00009E09
              txn branch: 0
              oct: 2, prv: 0, sql: 7000000d6fefc28, psql: 70000010cb8ad08, user: 63/VLS
    service name: znavls
    O/S info: user: Administrator, term: ZN-VLSAP-01, ospid: 10980:11888, machine: WORKGROUP\ZN-VLSAP-01
              program: Logic.MQ.exe
    application name: Logic.MQ.exe, hash value=2924312369
    waiting for 'latch: library cache' blocking sess=0x0 seq=5846 wait_time=0 seconds since wait started=1005
                address=700000103a490d0, number=d7, tries=d19
    Dumping Session Wait History
     for 'latch: library cache' count=1 wait_time=292985
                address=700000103a490d0, number=d7, tries=d18
     for 'latch: library cache' count=1 wait_time=292990
                address=700000103a490d0, number=d7, tries=d17
     for 'latch: library cache' count=1 wait_time=292987
                address=700000103a490d0, number=d7, tries=d16
     for 'latch: library cache' count=1 wait_time=292989
                address=700000103a490d0, number=d7, tries=d15
     for 'latch: library cache' count=1 wait_time=292988
                address=700000103a490d0, number=d7, tries=d14
     for 'latch: library cache' count=1 wait_time=292987
                address=700000103a490d0, number=d7, tries=d13

PID=43 is also trying to acquire a latch; it is waiting on latch: library cache.


    waiting for 700000103a490d0 Child library cache level=5 child#=1
        Location from where latch is held: kghfrunp: clatch: wait:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           59 (1007, 1338540673, 2)
           44 (1007, 1338540673, 2)
           60 (1007, 1338540673, 2)
           57 (1007, 1338540673, 2)
           65 (1007, 1338540673, 2)
           77 (1007, 1338540673, 2)
           41 (1007, 1338540673, 2)
           66 (1007, 1338540673, 2)
           72 (1007, 1338540673, 2)
           52 (1007, 1338540673, 2)
           51 (1007, 1338540673, 2)
           36 (1007, 1338540673, 2)
           43 (1007, 1338540673, 2)
           67 (1007, 1338540673, 2)
           54 (1007, 1338540673, 2)
           64 (1007, 1338540673, 2)
           47 (1007, 1338540673, 2)
           62 (1007, 1338540673, 2)
           50 (1007, 1338540673, 2)
           78 (1004, 1338540673, 2)
           55 (1004, 1338540673, 2)
           73 (1004, 1338540673, 2)
           58 (1004, 1338540673, 2)
           46 (1004, 1338540673, 2)
           71 (1004, 1338540673, 2)
           69 (1004, 1338540673, 2)
           76 (1004, 1338540673, 2)
           49 (1004, 1338540673, 2)
           56 (1001, 1338540673, 2)
           18 (1001, 1338540673, 2)
           32 (950, 1338540673, 2)
           82 (788, 1338540673, 2)
           84 (725, 1338540673, 2)
           86 (674, 1338540673, 2)
           87 (656, 1338540673, 2)
           89 (548, 1338540673, 2)
           100 (389, 1338540673, 2)
           103 (257, 1338540673, 2)
           waiter count=38
          gotten 41227975 times wait, failed first 7110 sleeps 8296
          gotten 2106764 times nowait, failed: 1455
        possible holder pid = 79 ospid=66453754
      on wait list for 700000103a490d0
      holding    (efd=11) 700000103a48770 Child library cache level=5 child#=16
        Location from where latch is held: kglpin:
        Context saved from call: 0
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           70 (1004, 1338540673, 2)
           63 (908, 1338540673, 2)
           31 (902, 1338540673, 2)
           45 (887, 1338540673, 2)
           68 (875, 1338540673, 2)
           74 (863, 1338540673, 2)
           80 (809, 1338540673, 2)
           81 (797, 1338540673, 2)
           42 (773, 1338540673, 2)
           83 (764, 1338540673, 2)
           53 (692, 1338540673, 2)
           90 (554, 1338540673, 2)
           91 (554, 1338540673, 2)
           92 (527, 1338540673, 2)
           93 (518, 1338540673, 2)
           94 (485, 1338540673, 2)
           95 (485, 1338540673, 2)
           96 (485, 1338540673, 2)
           97 (470, 1338540673, 2)
           98 (467, 1338540673, 2)
           101 (347, 1338540673, 2)
           104 (227, 1338540673, 2)
           16 (173, 1338540673, 2)
           23 (155, 1338540673, 2)
           waiter count=24


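A large number of other processes are also waiting on the Child library cache latch and have spent a long time doing so, the longest being 1007 seconds.

This is probably what caused the large number of processes.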

        Location from where latch is held: kghfrunp: clatch: wait:

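The Child library cache latch was taken in the function kghfrunp (location kghfrunp: clatch).


We need to determine exactly which process holds the Child library cache latch. That requires the systemstate dump that the diag process may have produced.


Please upload the diag process trace from that time.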
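
To quantify the waits in the latch dump quoted earlier in this post, a minimal Python sketch like the following can summarize a waiter list. Only four entries from the child#=1 list are pasted here for brevity; the function name and regular expression are illustrative and not part of any Oracle tool.

```python
import re

# A few waiter entries copied from the child#=1 latch dump;
# paste the full list to summarize all 38 waiters.
dump = """
           59 (1007, 1338540673, 2)
           78 (1004, 1338540673, 2)
           32 (950, 1338540673, 2)
           103 (257, 1338540673, 2)
"""

# Each entry is: orapid (seconds since put on list, posted, alive check)
WAITER = re.compile(r"^\s*(\d+)\s+\((\d+),\s*(\d+),\s*(\d+)\)\s*$", re.M)

def summarize_waiters(text):
    """Return (waiter count, longest wait in seconds, orapid of that waiter)."""
    waits = [(int(m.group(2)), int(m.group(1))) for m in WAITER.finditer(text)]
    if not waits:
        return 0, 0, None
    longest, orapid = max(waits)
    return len(waits), longest, orapid

print(summarize_waiters(dump))
```

Run against the complete lists, this makes it easy to see how long the oldest waiters have been queued on each child latch.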

4#
Posted 2012-6-3 21:57:56
The diag file is attached, thanks.

znavls1_diag_15401178.rar

213.17 KB, Downloads: 1065

5#
Posted 2012-6-3 23:33:05
ODM DATA:

Fri Jun  1 16:30:16 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:31:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:31:38 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:32:02 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:33:01 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:09 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:09 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:14 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:33:57 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:34:27 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:36:49 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:39:10 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:41:31 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:43:51 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:46:13 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:48:34 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:50:55 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:51:13 2012
IPC Send timeout detected. Receiver ospid 42336460
Receiver is waiting for a latch dumping latch state for receiver -17068
Fri Jun  1 16:51:13 2012
Errors in file /oracle/db10g/admin/znavls/udump/znavls1_ora_67043510.trc:
Fri Jun  1 16:51:16 2012
Errors in file /oracle/db10g/admin/znavls/bdump/znavls1_pz97_42336460.trc:
Fri Jun  1 16:51:16 2012
Trace dumping is performing id=[cdmp_20120601165116]
Fri Jun  1 16:53:16 2012






PMON was unable to acquire a required latch for a long time
ORA-3136 occurred
IPC Send timeout detected. Receiver ospid 42336460 Receiver is waiting for a latch dumping latch state for receiver -17068

These three symptoms may point to a latch deadlock caused by high load.
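
To see how these symptoms are distributed over time, the alert log can be tallied with a short script. This is only a sketch: it uses a shortened excerpt of the log above, assumes the timestamp-line-then-message layout shown there, and the helper name parse_events is made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Shortened excerpt of the alert log (timestamp line, then message line).
alert_log = """\
Fri Jun  1 16:30:16 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:31:04 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:31:38 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:32:02 2012
PMON failed to acquire latch, see PMON dump
Fri Jun  1 16:34:27 2012
ksvcreate: Process(q001) creation failed
Fri Jun  1 16:36:49 2012
ksvcreate: Process(q001) creation failed
"""

def parse_events(text):
    """Pair each timestamp line with the message line(s) that follow it."""
    events, current = [], None
    for line in text.splitlines():
        try:
            # split()/join collapses the double space in "Jun  1"
            current = datetime.strptime(" ".join(line.split()),
                                        "%a %b %d %H:%M:%S %Y")
        except ValueError:
            if current is not None and line.strip():
                events.append((current, line.strip()))
    return events

events = parse_events(alert_log)
counts = Counter(msg for _, msg in events)
q001 = [ts for ts, msg in events if "creation failed" in msg]
gaps = [int((b - a).total_seconds()) for a, b in zip(q001, q001[1:])]

print(counts.most_common())
print("seconds between q001 failures:", gaps)
```

On the full log above, the q001 creation failures from 16:34:27 onward repeat roughly every 140 seconds, which fits a process that keeps retrying and keeps blocking on the same latch.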



A related case for "IPC Send timeout detected" with a receiver waiting for a latch:

ODM FINDING:

Hdr: 6315581 10.2.0.3 RDBMS 10.2.0.3 BUFFER CACHE PRODID-5 PORTID-46 ORA-29740 6111445
Abstract: RECEIVER IS WAITING FOR A LATCH DUMPING LATCH STATE FOR RECEIVER


PROBLEM:
--------
Series of IPC Send timeout error messages in the alert log and then
ORA-29740, and node got evicted.

Tue Jul 31 22:44:58 2007
Errors in file /app/oracle/admin/statsrac/bdump/statsrac1_psp0_31176.trc:
ORA-29740: evicted by member , group incarnation
Tue Jul 31 22:44:58 2007
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
Tue Jul 31 22:44:58 2007
Errors in file /app/oracle/admin/statsrac/bdump/statsrac1_lms0_31182.trc:
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
IPC Send timeout detected. Receiver ospid 31182
Receiver is waiting for a latch dumping latch state for receiver 0
System state dump is made for local instance
System State dumped to trace file
/app/oracle/admin/statsrac/bdump/statsrac1_diag_31173.trc
Tue Jul 31 22:44:59 2007
Shutting down instance (abort)

Recommended to set the following parameters

_skgxp_udp_ach_reaping_time=0
_skgxp_udp_keep_alive_ping_timer_secs=0
_enable_reliable_latch_waits=FALSE

Also recommended to set the following kernel parameters to at least 256k:

# sysctl -w net.core.rmem_max=262144
# sysctl -w net.core.wmem_max=262144
# sysctl -w net.core.rmem_default=262144
# sysctl -w net.core.wmem_default=262144

Applied Patches
    ==============
    INFO:Interim patches (4) :
   
    INFO:Patch  6279765      : applied on Sat Jul 28 15:39:27 CDT 2007
       Created on 27 Jul 2007, 01:49:56 hrs PST8PDT
       Bugs fixed:
         5165885, 6207951, 6013968, 5454831, 6279765
   
    INFO:Patch  4478139      : applied on Wed Jul 18 14:42:31 CDT 2007
       Created on 13 Jul 2007, 05:28:06 hrs PST8PDT
       Bugs fixed:
         4478139
   
    INFO:Patch  5556081      : applied on Wed Jul 18 14:35:13 CDT 2007
       Created on 9 Nov 2006, 22:20:50 hrs PST8PDT
       Bugs fixed:
         5556081
   
    INFO:Patch  5557962      : applied on Wed Jul 18 14:34:41 CDT 2007
       Created on 9 Nov 2006, 23:23:06 hrs PST8PDT
       Bugs fixed:
         4269423, 5557962, 5528974

Received ORADEBUG command 'dump errorstack 1' from process

*** 22:41:47.856
Received ORADEBUG command 'dump errorstack 1' from process Unix

*** 22:42:17.906
Received ORADEBUG command 'dump errorstack 1' from

In the DIAG trace file we see that the dumping started
earlier:
*** 22:26:12.689
Dump requested by process [orapid=5]
Dumping process info of pid[7.31182] requested by pid[5.31178]

The good thing is that we now have a stack trace of lms0:
#4  0x08318671 in kslgess ()
#5  0x08317a7f in kslgetsl ()
#6  0x08a6a076 in kclbla ()
#7  0x088c6178 in kjblpbast ()
#8  0x088e2a4f in kjbmpbast ()
#9  0x088446c9 in kjmxmpm ()
#10 0x0883d0f6 in kjmpmsgi ()
#11 0x0883fe53 in kjmsm ()

It's waiting for a latch:

        Location from where call was made: kclbla:
    waiting for 990dca78 Child cache buffers chains level=1 child#=5043
        Location from where latch is held: kcbgtcr: kslbegin excl:
        Context saved from call: 12613539
        state=busy(exclusive) (val=0x20000026) holder orapid = 38
        waiters [orapid (seconds since: put on list, posted, alive check)]:
         30 (315, 1185938775, 0)
         34 (304, 1185938775, 0)
         7 (212, 1185938775, 0)

The problem here seems to be more a latch contention problem than anything
else. There is no evidence of bug 5190596.

Unfortunately the system state dump is truncated after process 20
so it's unknown what process 38 is doing.
*** SIZE IS LIMITED TO 5242880 BYTES ***
Q1:
Please set  max_dump_file_size to unlimited.
Q2:
Are there any trace files for the user process with ORAPID = 38?
(the trace file should have a line like:
Oracle process number: 38)

The _reliable_latch_waits = false helps for cases where
we fail to recognize that a latch has been released.
Typical for such cases was a waiting session, but no holder,
here there seems to be a valid holder orapid 38.

Really the key to the solution is what process 38 was doing,
if we do not have any information about it we will have
to wait for another occurrence.


This is at severity 1.  A warm-hand off is required and the bug
Should be assigned to the engineer you contact for the warm hand-off.
This may require that you override the BAT assignment.

This looks like a clock running backwards problem:
SKGXPIWAIT: keepalive_reset elapsed 4294967295 ts 271812561 last ping
271812562 check 60000
The current timestamp is 1 lower than the last ping time.





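Recommendation

This was possibly CPU run-queue congestion triggered by memory exhaustion, which led to a library cache latch deadlock or hung processes and in turn caused more processes to connect to the instance. I suggest setting the minimum free memory tunable of vmo on AIX to 50M.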

6#
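Posted 2012-6-4 11:44:33
Memory was not exhausted at the time. Could this problem be related to the following statement, for example by affecting memory allocation?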
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 15401178, image:
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into ArrRegieFact (LogisticsNo, REGIECODE,VIN,QUALSTATUS,REMARK) values((SELECT LogisticsNo FROM LogisticsPlan WHERE PlanStatus IN ('PO', '*PO') AND VIN = 'LJNMDV1L2BN067145' AND TRIM(OutDoorMark) = '*' AND trantype IN ('0', '1') AND TRIM(ArrMark) IS NULL),'0101','LJNMDV1L2BN067145',Coalesce(Trim('0'), '0'),'')

7#
Posted 2012-6-4 16:18:50
In addition, another database instance logged the following at 16:14:
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:13 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:17 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:14:17 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:23 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:44 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:24:44 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:26:52 2012
WARNING: inbound connection timed out (ORA-3136)
Fri Jun  1 16:26:52 2012
WARNING: inbound connection timed out (ORA-3136)

8#
Posted 2012-6-4 18:19:33
We also found that CRS had problems at that time:
2012-06-01 16:02:05.046: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:02:44.933: [  CRSEVT][11920]32CAAMonitorHandler :: 0:Action Script /oracle/db10g/bin/racgwrap(check) timed out for ora.znavls.znavls1.inst! (timeout=600)
2012-06-01 16:02:44.933: [  CRSAPP][11920]32CheckResource error for ora.znavls.znavls1.inst error code = -2
2012-06-01 16:02:50.641: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:03:51.230: [  OCRSRV][3868]th_select_handler: Failed to retrieve procctx from ht. constr = [387532880] retval lht [-27] Signal CV.
2012-06-01 16:03:58.141: [  CRSEVT][11928]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.141: [  CRSAPP][11928]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.275: [  CRSEVT][12442]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.291: [  CRSAPP][12442]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.320: [  CRSEVT][11932]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.321: [  CRSAPP][11932]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.412: [  CRSEVT][11936]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.417: [  CRSAPP][11936]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.460: [  CRSEVT][12191]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.460: [  CRSAPP][12191]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.579: [  CRSEVT][11939]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.592: [  CRSAPP][11939]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.624: [  CRSEVT][12196]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.625: [  CRSAPP][12196]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.751: [  CRSEVT][11943]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.763: [  CRSAPP][11943]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.793: [  CRSEVT][12200]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.793: [  CRSAPP][12200]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:58.921: [  CRSEVT][11947]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:58.935: [  CRSAPP][11947]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:58.949: [  CRSEVT][12204]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)
2012-06-01 16:03:58.949: [  CRSAPP][12204]32CheckResource error for ora.e3sdb.db error code = -2
2012-06-01 16:03:59.076: [  CRSEVT][11951]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
2012-06-01 16:03:59.089: [  CRSAPP][11951]32CheckResource error for ora.znavls.db error code = -2
2012-06-01 16:03:59.120: [  CRSEVT][12208]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.e3sdb.db! (timeout=600)

9#
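Posted 2012-6-4 22:14:45
WARNING: inbound connection timed out (ORA-3136)

Frequent ORA-3136 warnings are one sign of insufficient system resources.

racgwrap(check) timed out for ora.znavls.db! (timeout=600)

The CRS check script racgwrap timed out while checking the DB instance. This is how a database hang shows up at the CRS layer: the hang is what caused racgwrap(check) to time out.


Both points you raised are symptoms triggered by the problem, not its cause.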
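
One thing the CRS log can still tell us is roughly when the checks first got stuck. The sketch below subtracts the timeout from the time each message was logged. It assumes that timeout=600 is counted in seconds from the moment the check action was launched, which the log does not state explicitly; the function name is illustrative.

```python
import re
from datetime import datetime, timedelta

# Two CRSEVT lines copied from the CRS log in post 8#.
crs_log = """\
2012-06-01 16:02:44.933: [  CRSEVT][11920]32CAAMonitorHandler :: 0:Action Script /oracle/db10g/bin/racgwrap(check) timed out for ora.znavls.znavls1.inst! (timeout=600)
2012-06-01 16:03:58.141: [  CRSEVT][11928]32CAAMonitorHandler :: 0:Action Script /oracle/crs/bin/racgwrap(check) timed out for ora.znavls.db! (timeout=600)
"""

LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+: "
    r".*racgwrap\(check\) timed out for (\S+)! \(timeout=(\d+)\)",
    re.M,
)

def check_start_times(text):
    """Estimate when each timed-out check began: log time minus timeout."""
    result = []
    for ts, resource, timeout in LINE.findall(text):
        logged = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        result.append((resource, logged - timedelta(seconds=int(timeout))))
    return result

for resource, started in check_start_times(crs_log):
    print(resource, "check probably started around", started.time())
```

Under that assumption the check on ora.znavls.znavls1.inst was already stuck at about 15:52:44, well before the 16:20 time mentioned in the first post, so the instance may have been unresponsive earlier than the memory alarm suggests.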

10#
Posted 2012-6-4 22:50:46
I have run into this before, see: http://www.xifenfei.com/2835.html

[ Last edited by xifenfei on 2012-6-4 23:06 ]

11#
Posted 2012-6-6 13:33:54
Thank you both for your replies!
