- 最后登录
- 2023-8-16
- 在线时间
- 1686 小时
- 威望
- 2135
- 金钱
- 50532
- 注册时间
- 2011-10-12
- 阅读权限
- 200
- 帖子
- 5207
- 精华
- 39
- 积分
- 2135
- UID
- 2
|
3#
发表于 2012-6-3 19:47:03
ORA-00020: maximum number of processes 150 exceeded
Died during process startup with error 20 (seq=28397)
OPIRIP: Uncaught error 20. Error stack:
ORA-00020: maximum number of processes (150) exceeded
ORA-00020 process数耗尽,TRACE和alert.log无法显示 究竟当时 从哪里来了这么多server process, 这需要配合logon audit 才能具体指导
这是一套RAC 数据库 10.2.0.4+ AIX 6.1
swap info: free_mem = 2815.93M rsv = 96.00M
alloc = 13846.67M avail = 24576.00M swap_free = 10729.33M
就日志看 内存仍有空余
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A oracle 33095960 1 0 60 20 1130910590 96464 16:29:26 - 0:00 ora_q001_znavls1
33095960: ora_q001_znavls1
0x09000000000ecee4 thread_wait(0x12c0000012c) + 0x244
0x00000001000fcb74 sskgpwwait(??, ??, ??, ??, ??) + 0x34
0x00000001000fa15c skgpwwait(??, ??, ??, ??, ??) + 0xbc
0x000000010011e70c kslges(??, ??, ??, ??, ??) + 0x54c
0x000000010012253c kslgetl(??, ??, ??, ??) + 0x33c
0x00000001049f10b8 ksfglt(??, ??, ??, ??, ??) + 0x198
0x00000001000847f4 kghfrunp(??, ??, ??, ??, ??, ??, ??) + 0x794
0x000000010007a4c8 kghfnd(??, ??, ??, ??, ??, ??) + 0x7e8
0x0000000100098484 kghalo(??, ??, ??, ??, ??, ??, ??, ??) + 0xa24
0x0000000100005948 ksp_param_handle_alloc(??) + 0x168
0x000000010001d25c kspcrec(??) + 0x1bc
0x0000000100141348 ksucre(??) + 0x408
0x0000000101281aa8 ksvrdp() + 0x368
0x000000010430d1b4 opirip(??, ??, ??) + 0x554
0x0000000102d9b558 opidrv(??, ??, ??) + 0x458
0x000000010370c070 sou2o(??, ??, ??, ??) + 0x90
0x00000001000008b0 opimai_real(??, ??) + 0x150
0x0000000100000718 main(??, ??) + 0x98
0x0000000100000340 __start() + 0x70
*** 2012-06-01 16:30:29.510
*** 2012-06-01 16:30:39.711
q001尝试多次启动 但最后都 creation failed , 观察其stack call
ksp_param_handle_alloc=>kghalo => kghfnd=> kghfrunp=> ksfglt=>kslgetl=> kslges
可以看到该 q001进程hang在kslges 上,它试图 hold 一个latch 但是始终没有得到
观察其他进程的TRACE
Oracle process number: 43
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 15401178, image:
*** 2012-06-01 16:51:13.052
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into ArrRegieFact (LogisticsNo, REGIECODE,VIN,QUALSTATUS,REMARK) values((SELECT LogisticsNo FROM LogisticsPlan WHERE PlanStatus IN ('PO', '*PO') AND VIN = 'LJNMDV1L2BN067145' AND TRIM(OutDoorMark) = '*' AND trantype IN ('0', '1') AND TRIM(ArrMark) IS NULL),'0101','LJNMDV1L2BN067145',Coalesce(Trim('0'), '0'),'')
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 000000000 ?
ksedmp+0290 bl ksedst 104A2CDB0 ?
ksdxfdmp+0338 bl _ptrgl
ksdxcb+04e4 bl _ptrgl
sspuser+0074 bl _ptrgl
000047B8 ? 00000000
sskgpwwait+0034 bl 000FD1F0
skgpwwait+00bc bl sskgpwwait 0000000D6 ? 70000010915F0AC ?
700000109185EA0 ? 11022A3E0 ?
000000000 ?
kslges+054c bl skgpwwait 000000000 ? D6000000D6 ?
000000000 ? 700000103A490D0 ?
0000000D7 ?
kslgetl+033c bl kslges 10009719C ? 000000000 ?
700000103A48E08 ? 000000000 ?
004B3AF98 ?
kslg2c+00d8 bl kslgetl FFFFFFFFFFED830 ?
FFFFFFFFFFEDC88 ? 000000005 ?
110401DE8 ?
ksfg2c+0024 bl 03F29EC4
kglpin+1164 bl _ptrgl
IPRA.$kkdcchs+0130 bl kglpin 110195490 ? FFFFFFFFFFEE0F0 ?
SO: 70000010a19ba50, type: 4, owner: 700000109185db8, flag: INIT/-/-/0x00
(session) sid: 110 trans: 0, creator: 700000109185db8, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-002B-00009E08, short-term DID: 0001-002B-00009E09
txn branch: 0
oct: 2, prv: 0, sql: 7000000d6fefc28, psql: 70000010cb8ad08, user: 63/VLS
service name: znavls
O/S info: user: Administrator, term: ZN-VLSAP-01, ospid: 10980:11888, machine: WORKGROUP\ZN-VLSAP-01
program: Logic.MQ.exe
application name: Logic.MQ.exe, hash value=2924312369
waiting for 'latch: library cache' blocking sess=0x0 seq=5846 wait_time=0 seconds since wait started=1005
address=700000103a490d0, number=d7, tries=d19
Dumping Session Wait History
for 'latch: library cache' count=1 wait_time=292985
address=700000103a490d0, number=d7, tries=d18
for 'latch: library cache' count=1 wait_time=292990
address=700000103a490d0, number=d7, tries=d17
for 'latch: library cache' count=1 wait_time=292987
address=700000103a490d0, number=d7, tries=d16
for 'latch: library cache' count=1 wait_time=292989
address=700000103a490d0, number=d7, tries=d15
for 'latch: library cache' count=1 wait_time=292988
address=700000103a490d0, number=d7, tries=d14
for 'latch: library cache' count=1 wait_time=292987
address=700000103a490d0, number=d7, tries=d13
PID=43 也试图 hold 一个latch , 它在等待latch: library cache
waiting for 700000103a490d0 Child library cache level=5 child#=1
Location from where latch is held: kghfrunp: clatch: wait:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
59 (1007, 1338540673, 2)
44 (1007, 1338540673, 2)
60 (1007, 1338540673, 2)
57 (1007, 1338540673, 2)
65 (1007, 1338540673, 2)
77 (1007, 1338540673, 2)
41 (1007, 1338540673, 2)
66 (1007, 1338540673, 2)
72 (1007, 1338540673, 2)
52 (1007, 1338540673, 2)
51 (1007, 1338540673, 2)
36 (1007, 1338540673, 2)
43 (1007, 1338540673, 2)
67 (1007, 1338540673, 2)
54 (1007, 1338540673, 2)
64 (1007, 1338540673, 2)
47 (1007, 1338540673, 2)
62 (1007, 1338540673, 2)
50 (1007, 1338540673, 2)
78 (1004, 1338540673, 2)
55 (1004, 1338540673, 2)
73 (1004, 1338540673, 2)
58 (1004, 1338540673, 2)
46 (1004, 1338540673, 2)
71 (1004, 1338540673, 2)
69 (1004, 1338540673, 2)
76 (1004, 1338540673, 2)
49 (1004, 1338540673, 2)
56 (1001, 1338540673, 2)
18 (1001, 1338540673, 2)
32 (950, 1338540673, 2)
82 (788, 1338540673, 2)
84 (725, 1338540673, 2)
86 (674, 1338540673, 2)
87 (656, 1338540673, 2)
89 (548, 1338540673, 2)
100 (389, 1338540673, 2)
103 (257, 1338540673, 2)
waiter count=38
gotten 41227975 times wait, failed first 7110 sleeps 8296
gotten 2106764 times nowait, failed: 1455
possible holder pid = 79 ospid=66453754
on wait list for 700000103a490d0
holding (efd=11) 700000103a48770 Child library cache level=5 child#=16
Location from where latch is held: kglpin:
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
70 (1004, 1338540673, 2)
63 (908, 1338540673, 2)
31 (902, 1338540673, 2)
45 (887, 1338540673, 2)
68 (875, 1338540673, 2)
74 (863, 1338540673, 2)
80 (809, 1338540673, 2)
81 (797, 1338540673, 2)
42 (773, 1338540673, 2)
83 (764, 1338540673, 2)
53 (692, 1338540673, 2)
90 (554, 1338540673, 2)
91 (554, 1338540673, 2)
92 (527, 1338540673, 2)
93 (518, 1338540673, 2)
94 (485, 1338540673, 2)
95 (485, 1338540673, 2)
96 (485, 1338540673, 2)
97 (470, 1338540673, 2)
98 (467, 1338540673, 2)
101 (347, 1338540673, 2)
104 (227, 1338540673, 2)
16 (173, 1338540673, 2)
23 (155, 1338540673, 2)
waiter count=24
其他一大堆的 进程也都在等待 Child library cache latch 并耗费了大量的时间 最长的1007 秒
这可能是导致出现大量process的原因,
Location from where latch is held: kghfrunp: clatch: wait:
hold 该 Child library cache latch 的 是 kghfrunp: clatch 函数
需要具体分析 hold Child library cache latch 的是哪一个进程, 这需要 diag进程可能做出的systemstate dump
请上传 当时diag 进程的 TRACE |
|