ORA-07445: 出现异常错误: 核心转储 [kghlkremf+0044] [SIGSEGV]
环境:AIX 5.3 ORACLE RAC 10.2.0.4 实例名分别为imscmdb1/imscmdb2晚上接到电话,说应用系统异常持续了大概半个小时,时间段是22:30-22:50,让检查数据库情况。查看alert日志,发现imscmdb1有ORA-7445错误和实例重启情况。附上alert和trace文件,各位大侠有时间了帮分析一下。
8月1日 22:26:45-22:45:12,数据库后台日志alert持续报“Fri Aug 1 22:26:45 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/udump/imscmdb1_ora_1302918.trc:
ORA-07445: 出现异常错误: 核心转储 [] []”。
22:45:32-22:50:12 ,持续报“Fri Aug 1 22:45:32 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/bdump/imscmdb1_pmon_446570.trc:
ORA-00474: SMON process terminated with error”
22:50:33-22:50:34 报“Fri Aug 1 22:50:33 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/bdump/imscmdb1_lms0_344392.trc:
ORA-00472: PMON process terminated with error”
22:50:37 oracle执行了“Fri Aug 1 22:50:37 2014 Shutting down instance (abort)”
22:54:19 oracle重启完成。
ORA-07445: ≥ˆœ÷“Ï≥£¥ÌŒÛ: ∫À–ƒ◊™¥¢ [] []
No current SQL statement being executed.
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/udump/imscmdb1_ora_966820.trc:
ORA-00600: ÄÚ²¿´íÎó´úÂë, ²ÎÊý: , , , [], [], [], [], []
Sat Aug 2 06:41:46 2014
729+kghlkremf pga=3145728000
上传附近时段的awr 我们能知道这个配置的hi是否合理。 6点-7点的awr 服务器的OSWBB监控数据,日志太多,仅上传VM的监控日志 SELECT dimension_column1 AS labels,CORPORATIONID AS corpId, value as currentValue,(nvl(VALUE,0)-nvl((select value from STATDBA.T_BSRunningTime where CORPORATIONID=T.CORPORATIONID and DIMENSION_COLUMN0 = 3 and DIMENSION_COLUMN1=T.DIMENSION_COLUMN1 and time=T.time-5/24/60 and ROWNUM=1),0)) as Value FROM (SELECT CORPORATIONID,DIMENSION_COLUMN1,VALUE,time, ROW_NUMBER() OVER(PARTITION BY ST.CORPORATIONID, ST.DIMENSION_COLUMN1 ORDER BY ST.TIME DESC) rowno FROM STATDBA.T_BSRunningTime ST inner join DICTDBA.DT_CORPORATION CO on ST.CORPORATIONID=CO.CORP_CODE WHERE CO.corp_code =17 AND ST.DIMENSION_COLUMN1 IN(27,53,51,32,62,3,2,37,38,31,39,21,41,36,4,34,73,35,16,29,63,30,82,43,42,94,90,89,81,71,18) AND ST.DIMENSION_COLUMN0 = 3 AND ST.TIME BETWEEN to_date('2014/08/01 22:20:00','YYYY/MM/DD HH24:MI:SS') AND to_date('2014/08/01 22:25:00','YYYY/MM/DD HH24:MI:SS') ) T WHERE rowno = 1
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 104A2CD0D ?
ksedmp+0290 bl ksedst 104A2C690 ?
ssexhd+03e0 bl ksedmp 300001D15 ?
000044C0 ? 00000000
kghget_reserved_ext bl kghadd_reserved_ext FFFFFFFFFFF10C0 ? 00089A260 ?
ent+0214 ent 101F9CA14 ? 110440918 ?
FFFFFFFFFFF10C0 ?
9001000A00634F0 ?
kghgex+0534 bl kghget_reserved_ext 104D88528 ? 000000001 ?
ent 1105D2DA8 ? 000000168 ?
000000001 ?
kghfnd+0584 bl kghgex 7000001BC470430 ? 0000009A4 ?
000000001 ? 110000FF8 ?
000000001 ? 7000000100EDC68 ?
FFFFFFFFFFF12F0 ?
kghalo+0a24 bl kghfnd 104A25EA0 ? 11022AC50 ? 给出上面SQL的 执行计划:
SELECT dimension_column1 AS labels,CORPORATIONID AS corpId, value as currentValue,(nvl(VALUE,0)-nvl((select value from STATDBA.T_BSRunningTime where CORPORATIONID=T.CORPORATIONID and DIMENSION_COLUMN0 = 3 and DIMENSION_COLUMN1=T.DIMENSION_COLUMN1 and time=T.time-5/24/60 and ROWNUM=1),0)) as Value FROM (SELECT CORPORATIONID,DIMENSION_COLUMN1,VALUE,time, ROW_NUMBER() OVER(PARTITION BY ST.CORPORATIONID, ST.DIMENSION_COLUMN1 ORDER BY ST.TIME DESC) rowno FROM STATDBA.T_BSRunningTime ST inner join DICTDBA.DT_CORPORATION CO on ST.CORPORATIONID=CO.CORP_CODE WHERE CO.corp_code =17 AND ST.DIMENSION_COLUMN1 IN(27,53,51,32,62,3,2,37,38,31,39,21,41,36,4,34,73,35,16,29,63,30,82,43,42,94,90,89,81,71,18) AND ST.DIMENSION_COLUMN0 = 3 AND ST.TIME BETWEEN to_date('2014/08/01 22:20:00','YYYY/MM/DD HH24:MI:SS') AND to_date('2014/08/01 22:25:00','YYYY/MM/DD HH24:MI:SS') ) T WHERE rowno = 1 odm finding:
Database hangs, is not available for users, and possible to crash after a while. This condition has also been observed by pressing Control-C in a SQL*Plus session.
This is an intermittent problem
Alert log shows errors:
ORA-00600: internal error code, arguments: , , [], [], [], [], [], [].
The error can also appear as (or with) one of the following:
ORA-7445 .
ORA-7445
ORA-7445
Trace files show HEAP corruption errors, and show the errors
occurring when pinning an object in memory. Additionally, there is much Parallel Query (PQ) processing
We can see different Call stacks:
kgesic1 kghfrunp kghfnd kghalo kghsupmm kghssgai qesblAlo qesblGetFilter qerblFetch . . .
or
kghalo kspcrec ksucre kxfpProcessJoin
or
kghnerror kghadd_reserved_extent kghget_reserved_ext ent kghgex ent kghfnd kghalo kghgex kghalf kghalp kghssgai kggsmInitCompact 1360 qkssmMigrateExec
If there is a heap dump, then you will typically have memory chunks with the "qesblFilter_seg" tag in the heap dump (This indicates that bloom filter is used).
If there is an explain plan, it will often show the existence of Bloom filters :BF0000
CAUSE
This is the same issue as in
Bug 5914711 DATABASE HANG WITH MANY ORA-600 ERROR
Bug 5914711 is diagnosed as a duplicate of unpublished bug 5736850
Unpublished Bug 5736850 ABSTRACT: MULTI KILL SESSION LEADS TO SGA CORRUPTION
Fixed In Ver: 11.1 and 10.2.0.4
SOLUTION
The workaround is to set _bloom_filter_enabled=FALSE in init.ora.
It can also be disabled dynamically as follows:
SQL> alter system set "_bloom_filter_enabled"=false scope=both;
Please Note:
Although the _bloom_filter_enabled parameter can be set dynamically, doing so will not necessarily stop the errors. This is because it will not affect the existing sessions (e.g. parallel slaves already spawned). Furthermore, because the Bug causes SGA corruptions, we recommend that you set the parameter, then restart the instance. This ensures the SGA corruption is cleared up, and all sessions will have the setting.
It is sometimes possible to tell that bloom filters are being used because of the presence of qerblRop in the stack or because there are memory chunks in the trace with the label 'qesblFilter_seg'.
A bloom filter is a generic(non-Oracle) data structure/algorithm used for looking up elements in a set. You can find detailed information by searching for "bloom filter" in Google or any other search engine.
You can also review the WikiPedia page on Bloom Filters for more information. In Oracle 10.2 onwards, parallel query can use this operation internally if the parameter is true.
Disabling the bloom filter might have a slight negative impact on PQ performance but since this is a new 10gR2 feature, it may not be noticeable.
The fix for the bug will be in 10.2.0.4 and Rel. 11. This fix changes a fixed sga structure and introduces a new wait event so it cannot be backported. 执行计划附上 本帖最后由 lgang403 于 2014-8-6 14:40 编辑
是的,刘大,我也在MOS上看到了ORA-600 , ORA-600 and Cause Database To Crash (文档 ID 468456.1),主要是看到它写到已在10.2.0.4里进行了修复,现在的版本已经是10.2.0.4了,只是没有安装PSU补丁,另外我还参考了SMON Terminates With ORA-7445 (文档 ID 1189894.1),我的计划是将oracle从10.2.0.4.0升级到10.2.0.4.4,并将_bloom_filter_enabled 修改为false。 另外,我看了下22:30左右有page out 现象:
zzz ***Fri Aug 1 22:31:11 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 7848160 168348 0 4 0 516 1984 0 2422 142144 17230 41 9 44 6
2 1 7848241 168395 0 0 0 903 7537 0 2450 288726 17032 44 7 42 6
4 0 7848452 168221 0 1 1 1293 11027 0 2460 62909 16144 38 3 48 12
zzz ***Fri Aug 1 22:31:41 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
24 1 7812825 203744 0 6 0 645 4983 0 3242 373171 16472 76 11 10 3
7 1 7812739 203715 0 3 0 645 5729 0 2147 555612 17753 76 8 13 3
10 3 7812560 204021 0 17 0 1198 26900 0 2437 322212 18043 71 4 20 5
zzz ***Fri Aug 1 22:32:12 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
7 2 7790350 225879 0 0 0 774 12543 0 2283 381554 20112 53 11 31 5
5 1 7790351 225787 0 9 0 916 7591 0 1868 163619 18383 46 3 42 9
3 2 7790222 225969 0 46 0 901 14729 0 2157 170634 19392 46 4 38 12
zzz ***Fri Aug 1 22:32:43 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
4 0 7716806 312435 0 15 0 0 0 0 5020 102805 25520 37 9 48 6
2 2 7716862 312716 0 73 0 0 0 0 6640 81756 29590 37 7 50 7
3 2 7716697 313061 0 66 0 0 0 0 6223 71022 28720 35 5 53 7
zzz ***Fri Aug 1 22:33:14 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 1 7706469 308994 0 21 0 0 0 0 3203 105820 25426 37 8 46 10
4 2 7706463 308468 0 58 0 260 3183 0 2577 85534 24818 36 3 50 11
0 0 7706308 308708 0 83 0 774 13268 0 2758 83928 25808 36 4 50 10
zzz ***Fri Aug 1 22:33:44 BEIST 2014
System configuration: lcpu=16 mem=31871MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 2 7697049 316930 0 62 0 777 8870 0 3252 93905 18852 37 8 45 10
3 2 7697087 316781 0 44 0 387 4692 0 2617 76142 17687 37 4 49 10
7 3 7696903 316919 0 50 0 646 9251 0 3031 90176 19275 37 5 49 10
zzz ***Fri Aug 1 22:34:15 BEIST 2014
页:
[1]