Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

3

积分

0

好友

4

主题
1#
发表于 2014-8-6 10:25:08 | 查看: 15217| 回复: 9
环境:AIX 5.3 ORACLE  RAC 10.2.0.4  实例名分别为imscmdb1/imscmdb2
晚上接到电话,说应用系统异常持续了大概半个小时,时间段是22:30-22:50,让检查数据库情况。查看alert日志,发现imscmdb1有ORA-7445错误和实例重启情况。附上alert和trace文件,各位大侠有时间了帮分析一下。

8月1日 22:26:45-22:45:12,数据库后台日志alert持续报“Fri Aug  1 22:26:45 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/udump/imscmdb1_ora_1302918.trc:
ORA-07445: 出现异常错误: 核心转储 [kghlkremf+0044] [SIGSEGV] [Address not mapped to object] [0x1850C8B0004000D] [] []”。
22:45:32-22:50:12 ,持续报“Fri Aug  1 22:45:32 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/bdump/imscmdb1_pmon_446570.trc:
ORA-00474: SMON process terminated with error”
22:50:33-22:50:34 报“Fri Aug  1 22:50:33 2014
Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/bdump/imscmdb1_lms0_344392.trc:
ORA-00472: PMON  process terminated with error”
22:50:37 oracle执行了“Fri Aug  1 22:50:37 2014 Shutting down instance (abort)”
22:54:19 oracle重启完成。
   

0804a.rar

775.89 KB, 下载次数: 1135

2#
发表于 2014-8-6 10:50:35
ORA-07445: ≥ˆœ÷“Ï≥£¥ÌŒÛ: ∫À–ƒ◊™¥¢ [kghlkremf+0044] [SIGSEGV] [Address not mapped to object] [0x1850C8B0004000D] [] []
No current SQL statement being executed.

Errors in file /oracle/product/10.2.0/db_1/admin/imscmdb/udump/imscmdb1_ora_966820.trc:
ORA-00600: ÄÚ²¿´íÎó´úÂë, ²ÎÊý: [729], [800], [space leak], [], [], [], [], []
Sat Aug  2 06:41:46 2014


729+kghlkremf pga=3145728000

上传附近时段的awr 我们能知道这个配置的hi是否合理。

回复 只看该作者 道具 举报

3#
发表于 2014-8-6 11:11:04
6点-7点的awr

awr_20140802_imscmdb1.html

934.98 KB, 下载次数: 732

awr_20140802_imscmdb2.html

375.77 KB, 下载次数: 734

回复 只看该作者 道具 举报

4#
发表于 2014-8-6 11:22:19
服务器的OSWBB监控数据,日志太多,仅上传VM的监控日志

oswvmstat.rar

1.5 MB, 下载次数: 1104

回复 只看该作者 道具 举报

5#
发表于 2014-8-6 13:27:01
SELECT        dimension_column1 AS labels,CORPORATIONID AS corpId, value as currentValue,(nvl(VALUE,0)-nvl((select value from STATDBA.T_BSRunningTime where CORPORATIONID=T.CORPORATIONID and DIMENSION_COLUMN0 = 3 and DIMENSION_COLUMN1=T.DIMENSION_COLUMN1 and time=T.time-5/24/60 and ROWNUM=1),0)) as Value         FROM        (SELECT CORPORATIONID,DIMENSION_COLUMN1,VALUE,time,      ROW_NUMBER() OVER(PARTITION BY ST.CORPORATIONID, ST.DIMENSION_COLUMN1 ORDER BY ST.TIME DESC) rowno                 FROM                        STATDBA.T_BSRunningTime ST inner join DICTDBA.DT_CORPORATION CO on ST.CORPORATIONID=CO.CORP_CODE                WHERE                        CO.corp_code =17                 AND ST.DIMENSION_COLUMN1 IN(27,53,51,32,62,3,2,37,38,31,39,21,41,36,4,34,73,35,16,29,63,30,82,43,42,94,90,89,81,71,18)                AND ST.DIMENSION_COLUMN0 = 3                AND ST.TIME BETWEEN to_date('2014/08/01 22:20:00','YYYY/MM/DD HH24:MI:SS') AND to_date('2014/08/01 22:25:00','YYYY/MM/DD HH24:MI:SS')        ) T WHERE        rowno = 1
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 104A2CD0D ?
ksedmp+0290          bl       ksedst               104A2C690 ?
ssexhd+03e0          bl       ksedmp               300001D15 ?
000044C0             ?        00000000            
kghget_reserved_ext  bl       kghadd_reserved_ext  FFFFFFFFFFF10C0 ? 00089A260 ?
ent+0214                      ent                  101F9CA14 ? 110440918 ?
                                                   FFFFFFFFFFF10C0 ?
                                                   9001000A00634F0 ?
kghgex+0534          bl       kghget_reserved_ext  104D88528 ? 000000001 ?
                              ent                  1105D2DA8 ? 000000168 ?
                                                   000000001 ?
kghfnd+0584          bl       kghgex               7000001BC470430 ? 0000009A4 ?
                                                   000000001 ? 110000FF8 ?
                                                   000000001 ? 7000000100EDC68 ?
                                                   FFFFFFFFFFF12F0 ?
kghalo+0a24          bl       kghfnd               104A25EA0 ? 11022AC50 ?

回复 只看该作者 道具 举报

6#
发表于 2014-8-6 13:27:12
给出上面SQL的 执行计划:

SELECT        dimension_column1 AS labels,CORPORATIONID AS corpId, value as currentValue,(nvl(VALUE,0)-nvl((select value from STATDBA.T_BSRunningTime where CORPORATIONID=T.CORPORATIONID and DIMENSION_COLUMN0 = 3 and DIMENSION_COLUMN1=T.DIMENSION_COLUMN1 and time=T.time-5/24/60 and ROWNUM=1),0)) as Value         FROM        (SELECT CORPORATIONID,DIMENSION_COLUMN1,VALUE,time,      ROW_NUMBER() OVER(PARTITION BY ST.CORPORATIONID, ST.DIMENSION_COLUMN1 ORDER BY ST.TIME DESC) rowno                 FROM                        STATDBA.T_BSRunningTime ST inner join DICTDBA.DT_CORPORATION CO on ST.CORPORATIONID=CO.CORP_CODE                WHERE                        CO.corp_code =17                 AND ST.DIMENSION_COLUMN1 IN(27,53,51,32,62,3,2,37,38,31,39,21,41,36,4,34,73,35,16,29,63,30,82,43,42,94,90,89,81,71,18)                AND ST.DIMENSION_COLUMN0 = 3                AND ST.TIME BETWEEN to_date('2014/08/01 22:20:00','YYYY/MM/DD HH24:MI:SS') AND to_date('2014/08/01 22:25:00','YYYY/MM/DD HH24:MI:SS')        ) T WHERE        rowno = 1

回复 只看该作者 道具 举报

7#
发表于 2014-8-6 13:27:49
odm finding:


Database hangs, is not available for users, and possible to crash after a while. This condition has also been observed by pressing Control-C in a SQL*Plus session.

This is an intermittent problem


Alert log shows errors:
ORA-00600: internal error code, arguments: [KGHLKREM1], [0x7DA72C420], [], [], [], [], [], [].

The error can also appear as (or with) one of the following:
ORA-7445 [KGHLKREMF()].
ORA-7445 [qerblRop]
ORA-7445[kghsrch]



Trace files show HEAP corruption errors, and show the errors
occurring when pinning an object in memory. Additionally, there is much Parallel Query (PQ) processing


We can see different Call stacks:

kgesic1 kghfrunp kghfnd kghalo kghsupmm kghssgai qesblAlo qesblGetFilter qerblFetch . . .

or

kghalo  kspcrec ksucre kxfpProcessJoin

or

kghnerror kghadd_reserved_extent kghget_reserved_ext ent kghgex ent kghfnd kghalo kghgex kghalf kghalp kghssgai kggsmInitCompact 1360 qkssmMigrateExec


If there is a heap dump, then you will typically have memory chunks with the "qesblFilter_seg" tag in the heap dump (This indicates that bloom filter is used).

If there is an explain plan, it will often show the existence of Bloom filters :BF0000


CAUSE

This is the same issue as in
Bug 5914711 DATABASE HANG WITH MANY ORA-600 [KGHLKREM1] ERROR

Bug 5914711 is diagnosed as a duplicate of  unpublished bug 5736850
Unpublished Bug 5736850   ABSTRACT: MULTI KILL SESSION LEADS TO SGA CORRUPTION

Fixed In Ver: 11.1 and 10.2.0.4



SOLUTION



The workaround  is to set _bloom_filter_enabled=FALSE in init.ora.
It can also be disabled dynamically as follows:

       SQL> alter system set "_bloom_filter_enabled"=false scope=both;
Please Note:
Although the _bloom_filter_enabled parameter can be set dynamically, doing so will not necessarily stop the errors. This is because it will not affect the existing sessions (e.g. parallel slaves already spawned).   Furthermore, because the Bug causes SGA corruptions, we recommend that you set the parameter, then restart the instance. This ensures the SGA corruption is cleared up, and all sessions will have the setting.
It is sometimes possible to tell that bloom filters are being used because of the presence of qerblRop in the stack or because there are memory chunks in the trace with the label 'qesblFilter_seg'.

A bloom filter is a generic(non-Oracle) data structure/algorithm used for looking up elements in a set. You can find detailed information by searching for "bloom filter" in Google or any other search engine.
You can also review the WikiPedia page on Bloom Filters for more information. In Oracle 10.2 onwards, parallel query can use this operation internally if the parameter is true.

Disabling the bloom filter might have a slight negative impact on PQ performance but since this is a new 10gR2 feature, it may not be noticeable.

The fix for the bug will be in 10.2.0.4 and Rel. 11.  This fix changes a fixed sga structure and introduces a new wait event so it cannot be backported.

回复 只看该作者 道具 举报

8#
发表于 2014-8-6 14:10:57
执行计划附上

imscmdb1_explain.txt

3.66 KB, 下载次数: 1088

回复 只看该作者 道具 举报

9#
发表于 2014-8-6 14:15:57
本帖最后由 lgang403 于 2014-8-6 14:40 编辑

是的,刘大,我也在MOS上看到了ORA-600 [17147], ORA-600 [KGHLKREM1] and [kghlkremf()] Cause Database To Crash (文档 ID 468456.1),主要是看到它写到已在10.2.0.4里进行了修复,现在的版本已经是10.2.0.4了,只是没有安装PSU补丁,另外我还参考了SMON Terminates With ORA-7445 [Kghsrch()+128] (文档 ID 1189894.1),我的计划是将oracle从10.2.0.4.0升级到10.2.0.4.4,并将_bloom_filter_enabled 修改为false。

回复 只看该作者 道具 举报

10#
发表于 2014-8-6 14:23:00
另外,我看了下22:30左右有page out 现象:
zzz ***Fri Aug 1 22:31:11 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
3  0 7848160 168348   0   4   0 516 1984   0 2422 142144 17230 41  9 44  6
2  1 7848241 168395   0   0   0 903 7537   0 2450 288726 17032 44  7 42  6
4  0 7848452 168221   0   1   1 1293 11027   0 2460 62909 16144 38  3 48 12
zzz ***Fri Aug 1 22:31:41 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
24  1 7812825 203744   0   6   0 645 4983   0 3242 373171 16472 76 11 10  3
7  1 7812739 203715   0   3   0 645 5729   0 2147 555612 17753 76  8 13  3
10  3 7812560 204021   0  17   0 1198 26900   0 2437 322212 18043 71  4 20  5
zzz ***Fri Aug 1 22:32:12 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
7  2 7790350 225879   0   0   0 774 12543   0 2283 381554 20112 53 11 31  5
5  1 7790351 225787   0   9   0 916 7591   0 1868 163619 18383 46  3 42  9
3  2 7790222 225969   0  46   0 901 14729   0 2157 170634 19392 46  4 38 12
zzz ***Fri Aug 1 22:32:43 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
4  0 7716806 312435   0  15   0   0    0   0 5020 102805 25520 37  9 48  6
2  2 7716862 312716   0  73   0   0    0   0 6640 81756 29590 37  7 50  7
3  2 7716697 313061   0  66   0   0    0   0 6223 71022 28720 35  5 53  7
zzz ***Fri Aug 1 22:33:14 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
3  1 7706469 308994   0  21   0   0    0   0 3203 105820 25426 37  8 46 10
4  2 7706463 308468   0  58   0 260 3183   0 2577 85534 24818 36  3 50 11
0  0 7706308 308708   0  83   0 774 13268   0 2758 83928 25808 36  4 50 10
zzz ***Fri Aug 1 22:33:44 BEIST 2014

System configuration: lcpu=16 mem=31871MB

kthr    memory              page              faults        cpu   
----- ----------- ------------------------ ------------ -----------
r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
3  2 7697049 316930   0  62   0 777 8870   0 3252 93905 18852 37  8 45 10
3  2 7697087 316781   0  44   0 387 4692   0 2617 76142 17687 37  4 49 10
7  3 7696903 316919   0  50   0 646 9251   0 3031 90176 19275 37  5 49 10
zzz ***Fri Aug 1 22:34:15 BEIST 2014

回复 只看该作者 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-12-21 02:06 , Processed in 0.052193 second(s), 23 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569