- 最后登录
- 2023-8-16
- 在线时间
- 1686 小时
- 威望
- 2135
- 金钱
- 50532
- 注册时间
- 2011-10-12
- 阅读权限
- 200
- 帖子
- 5207
- 精华
- 39
- 积分
- 2135
- UID
- 2
|
1#
发表于 2013-10-11 23:59:19
|
查看: 4166 |
回复: 0
PMON ORA-00600: [kfioUnidentify01]错误导致实例重启
分析问题发生时段的ALERT.LOG内容如下:
Fri Nov 9 22:00:11 2012
Errors in file /u01/app/oracle/admin/mac/udump/mac1_ora_1868192.trc:
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
Fri Nov 9 22:00:11 2012
Errors in file /u01/app/oracle/admin/mac/udump/mac1_ora_1868192.trc:
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
Fri Nov 9 22:00:11 2012
Errors in file /u01/app/oracle/admin/mac/udump/mac1_ora_1868192.trc:
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
Fri Nov 9 22:00:11 2012
Errors in file /u01/app/oracle/admin/mac/udump/mac1_ora_1868192.trc:
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
Fri Nov 9 22:01:04 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_pmon_639420.trc:
ORA-00600: internal error code, arguments: [kfioUnidentify01], [], [], [], [], [], [], []
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_pmon_639420.trc:
ORA-00600: internal error code, arguments: [kfioUnidentify01], [], [], [], [], [], [], []
Fri Nov 9 22:01:05 2012
PMON: terminating instance due to error 472
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms3_4035054.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms0_3285032.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms1_3792918.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms5_3563948.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms2_3919900.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms4_696618.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lms6_807050.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lmon_3510366.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Trace dumping is performing id=[cdmp_20121109220105]
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_dbw0_3842486.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lmd0_2416794.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_dbw2_1515850.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:06 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lck0_1597886.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:06 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_lgwr_733640.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:06 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_dbw3_967086.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:06 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_dbw1_1593726.trc:
ORA-00472: PMON process terminated with error
Fri Nov 9 22:01:08 2012
Shutting down instance (abort)
License high water mark = 1486
Fri Nov 9 22:01:12 2012
Instance terminated by PMON, pid = 639420
Fri Nov 9 22:01:13 2012
Instance terminated by USER, pid = 4034688
Fri Nov 9 22:01:27 2012
在22:00:11时SPID=1868192的前台服务进程报ORA-600[1265]、ORA-7445[kghalf+03e4]内部错误,分析该进程的TRACE如下:
/u01/app/oracle/admin/mac/udump/mac1_ora_1868192.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db
System name: AIX
Node name: umacdb1
Release: 3
Version: 5
Machine: 00F6C3914C00
Instance name: mac1
Redo thread mounted by this instance: 1
Oracle process number: 1104
Unix process pid: 1868192, image: oracle@umacdb1
*** ACTION NAME:() 2012-11-09 22:00:11.291
*** MODULE NAME:(oracle@rh8 (TNS V1-V3)) 2012-11-09 22:00:11.291
*** SERVICE NAME:(SYS$USERS) 2012-11-09 22:00:11.291
*** SESSION ID:(1046.45822) 2012-11-09 22:00:11.291
*** 2012-11-09 22:00:11.291
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 000000001 ?
ksedmp+0290 bl ksedst 104548428 ?
ksfdmp+0018 bl 03F4D668
kgeriv+0108 bl _ptrgl
kgeasi+0118 bl kgeriv 100205BA0 ? 000000000 ?
FFFFFFFFFFF5330 ?
880180438B678958 ?
FFFFFFFFFFF5330 ?
ksuprc+03c8 bl kgeasi 110173800 ? 110410040 ?
4F1000004F1 ? 200000002 ?
100000001 ? 000000002 ?
70000048EDA3FD8 ? 000000000 ?
opiodr+0744 bl ksuprc 000000000 ?
rpidrus+01bc bl opiodr 641045D3A0 ? 300000000 ?
FFFFFFFFFFF77B8 ? 000000000 ?
skgmstack+00c8 bl _ptrgl
rpidru+0088 bl skgmstack 104543058 ? 11043A140 ?
000000000 ? 000000000 ?
000000000 ?
rpiswu2+034c bl _ptrgl
rpidrv+095c bl rpiswu2 70000048EDA3FD8 ?
7000004540D5718 ?
FFFFFFFFFFF7370 ? 1101FBCC8 ?
100205D3C ? 32540FB5A0 ?
70000046D2FC4B8 ? 000000000 ?
rpiope+0060 bl rpidrv 0FFFF7860 ? 64540D5718 ?
FFFFFFFFFFF77B8 ? 000000000 ?
tbsfnl+0100 bl 01FB68C4
ktsxtffs+01a4 bl tbsfnl 000000000 ? 11048B0B0 ?
10487771C ? 000000030 ?
000000FA0 ? 000000000 ?
000000000 ? 11063D338 ?
ktfbfef+0a60 bl ktsxtffs 000000000 ? 000000000 ?
000000000 ? 110645278 ?
FFFFFFFFFFF7A20 ? 000000000 ?
000000000 ? 11063D338 ?
qerfxFetch+0f70 bl 03F4C294
qerjotFetch+0144 bl 03F4C294
qerjotFetch+0180 bl 03F4C294
rwsfcd+0054 bl _ptrgl
qeruaFetch+013c bl 03F4C294
qervwFetch+0088 bl 03F4C294
qerghFetch+0214 bl 03F4C294
qervwFetch+0088 bl 03F4C294
SO: 70000048ecff6e0, type: 4, owner: 70000048fa356c0, flag: INIT/-/-/0x00
(session) sid: 1046 trans: 0, creator: 70000048fa356c0, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-0450-000012D8, short-term DID: 0001-0450-000012DB
txn branch: 0
oct: 3, prv: 0, sql: 700000435854b88, psql: 0, user: 50/MAINTAIN_USER
service name: SYS$USERS
O/S info: user: oracle, term: , ospid: 2777698, machine: rh8
program: oracle@rh8 (TNS V1-V3)
application name: oracle@rh8 (TNS V1-V3), hash value=4217373428
last wait for 'control file sequential read' blocking sess=0x0 seq=2437 wait_time=297 seconds since wait started=0
file#=0, block#=4c8, blocks=1
Dumping Session Wait History
for 'control file sequential read' count=1 wait_time=297
file#=0, block#=4c8, blocks=1
for 'control file sequential read' count=1 wait_time=1660
file#=0, block#=305, blocks=1
for 'control file sequential read' count=1 wait_time=359
file#=0, block#=228, blocks=1
for 'control file sequential read' count=1 wait_time=226
file#=0, block#=12, blocks=1
for 'control file sequential read' count=1 wait_time=192
file#=0, block#=10, blocks=1
for 'control file sequential read' count=1 wait_time=183
file#=0, block#=1, blocks=1
for 'control file sequential read' count=1 wait_time=273
file#=0, block#=4c8, blocks=1
for 'control file sequential read' count=1 wait_time=1679
file#=0, block#=305, blocks=1
for 'control file sequential read' count=1 wait_time=1973
file#=0, block#=228, blocks=1
for 'control file sequential read' count=1 wait_time=182
file#=0, block#=12, blocks=1
temporary object counter: 0
Cursor#4(110431a50) state=FETCH curiob=1104477d0
curflg=4e fl2=0 par=0 ses=70000048ecff6e0
sqltxt(700000435854b88)=
SELECT 'mac1','84.16.145.27','2772770333','umacdb1',"A2"."TABLESPACE_NAME",ROUND("A2"."TOTAL_SIZE",1),
ROUND((1-DECODE(TO_CHAR("A1"."FREE_SIZE"),NULL,0,"A1"."FREE_SIZE")/"A2"."TOTAL_SIZE")*100,1),'2012-11-09 22:00:10' FROM (SELECT "A3"."TABLESPACE_NAME" "TABLESPACE_NAME",SUM("A3"."BYTES")/1024/1024 "TOTAL_SIZE" FROM "DBA_DATA_FILES" "A3" WHERE "A3"."TABLESPACE_NAME" NOT LIKE '%UNDO%' GROUP BY "A3"."TABLESPACE_NAME") "A2", (SELECT "A4"."TABLESPACE_NAME" "TABLESPACE_NAME",SUM("A4"."BYTES")/1024/1024 "FREE_SIZE" FROM "DBA_FREE_SPACE" "A4" GROUP BY "A4"."TABLESPACE_NAME") "A1" WHERE "A2"."TABLESPACE_NAME"="A1"."TABLESPACE_NAME"(+) AND ROUND((1-DECODE(TO_CHAR("A1"."FREE_SIZE"),NULL,0,"A1"."FREE_SIZE")/"A2"."TOTAL_SIZE")*100,1)>75
hash=0a008fd303e45f36d27a45f127618468
parent=700000418c9a108 maxchild=01 plk=70000043d4d2f70 ppn=n
SO: 70000048c2a1220, type: 5, owner: 70000048ecff6e0, flag: INIT/-/-/0x00
(enqueue) DX-00000012-00000000 DID: 0001-0450-000012E1
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x70000048b2ef6c8, mode: S, lock_flag: 0x0
own: 0x70000048ecff6e0, sess: 0x70000048ecff6e0, proc: 0x70000048fa356c0, prv: 0x70000048b2ef6d8
该SPID=1868192的前台服务进程对应SESSION ID为1046,问题发生时期正在执行由远程节点rh8发起的远程分布式查询SQL语句如上,并且产生了分布式事务 Distributed transaction。
在版本10.2中分布式事务可能偶发性地触发BUG 5655419 <Distributed transaction hits ORA-600:[1265] or ORA-600:[k2gget:downgrade] in 10.2>,症状为分布式事务引起ORA-600[1265]或ORA-600:[k2gget:downgrade]错误。
在触发ORA-600[1265]错误后该进程尝试DUMP相关的进程信息,包括进程当前PINNED BUFFER和这些BUFFER相关的REDO,在DUMP REDO时由于使用了非法地址0xfffffffffffffff0而触发ORA-7445[kghalf+03e4] [SIGSEGV]错误:
*** timestamp before redo dump: 11/09/2012 22:00:11
***********************************************
* Dump Online Redo for Buffers in Pin History *
***********************************************
$$$$$$$ Dump Online Redo for DBA list (tsn.rdba in hex):
0x0.0040eb15 0x0.0040eb16 0x0.0040eb17 0x0.0040eb18 0x0.00400039 0x0.0040003a:
Dumping current redo log in thread 1
Exception signal: 11 (SIGSEGV), code: 50 (Address not mapped to object), addr: 0xfffffffffffffff0, PC: [0x100096e84, kghalf+03e4]
Registers:
iar: 0000000100096e84, msr: a00000000000d0b2
lr: 00000001043e3b7c, cr: 0000000082220828
r00: 0000000000000001, r01: 0fffffffffff12f0, r02: 00000001101fbcc8,
r03: 0000000000000058, r04: 0000000000000068, r05: 0000000000000000,
r06: 0000000000000060, r07: 0000000000000000, r08: 00000001050d7a50,
r09: 0000000000000001, r10: 0000000000000040, r11: 0000000000000000,
r12: 0000000044422848, r13: 00000001102372b8, r14: 0000000000000070,
r15: 0000000000000000, r16: 000000007ffffff8, r17: 0000000104543074,
r18: 0000000104543058, r19: 0000000000000000, r20: 0000000000000058,
r21: 0000000000000058, r22: fffffffffffffff0, r23: 0000000000000001,
r24: 0000000000000058, r25: 0000000000000001, r26: 0000000000000000,
r27: 0000000000000000, r28: 00000001050d7a50, r29: 0000000000000000,
r30: 0000000000000000, r31: 0000000110173800,
*** 2012-11-09 22:00:11.482
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kghalf+03e4] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFFFFFFFFF0] [] []
ORA-00600: internal error code, arguments: [1265], [0x70000048EDA3FD8], [], [], [], [], [], []
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? 104548AA5 ?
ksedmp+0290 bl ksedst 104548428 ?
ssexhd+03e0 bl ksedmp 300001D15 ?
000044F0 ? 00000000
kfupsIterInit+0038 bl 03F4C4D4
kfdIterInit+0054 bl 01FB59C0
kfgpnSetDisks+018c bl kfdIterInit FFFFFFFFFFF1610 ?
2442284000000001 ?
kfgscAcquireLcl+057 bl kfgpnSetDisks 700000470D1FF48 ?
c
kfgUseGrp+0248 bl kfgscAcquireLcl 700000470D1FF48 ?
FFFFFFFFFFF1E88 ?
100000102264A94 ? 5101FBCC8 ?
147AE140010D72C ? 000000000 ?
kfgUse+0124 bl kfgUseGrp 000000000 ? 000000000 ?
000000000 ? 000000000 ?
kfioIdentify+0344 bl kfgUse FFFFFFFFFFF2020 ?
FFFFFFFFFFF2000 ? 0000E61E4 ?
SO: 70000046d882950, type: 84, owner: 70000048ecff6e0, flag: INIT/-/-/0x00
(kfgso) flags: 00000000 clt: 1 err: 0 hint: 70d1ff48
(kfgpn) rpi: 393 itrn:0 gst:0 usrp:0
busy: 0 rep: 0 grp: 0 check: 0/0 glink: 6d8829d0 6d8829d0
----------------------------------------
SO: 700000470d1ff48, type: 85, owner: 70000046d882950, flag: INIT/-/-/0x00
(kfgsc) pwr: 0 flag: 0 op: 0 recur: 0
(kfgpn) rpi: 1245589 itrn:0 gst:0 usrp:0
busy: 0 rep: 0 grp: 6d883a50 check: 2/3652089895 glink: 6d883a98 70d200f0
kfgrp: DG_LOG number: 2/3652089895 type: 2 compat: 0.0.0.0.0 dbcompat:0.0.0.0.0
timestamp: 0 state: 7 flags: 142 gpnlist: 70d200f0 70d1ffc8
KFGPN at 70d20070 in dependent chain
KFGPN at 70d1ff48 in dependent chain
该进程在DUMP ONLINE REDOLOGFILE前调用了kfioIdentify()函数,但该进程没有成功DUMP ONLINE REDO就异常终止了。
以上SPID=1868192的前台服务进程由分布式事务触发ORA-600[1265]、ORA-7445[kghalf]的症状与Metalink Note< Bug 8587377 : ORA-00600 [K2GGET: DOWNGRADE] AND [1265] ORA-07445 [KGHALF()+58]IN ALERT.LOG>较为吻合。ORA-07445 [KGHALF()+58]是由于分布式事务触发BUG,进程对ONLINE REDO DUMP而触发的产物。而直接导致分布式事务触发ORA-600[1265]的是10.2上的BUG 5655419 <Distributed transaction hits ORA-600:[1265] or ORA-600:[k2gget:downgrade] in 10.2>。
单个进程的ORA-600:[1265]、ORA-07445 [KGHALF()+58]并不足以导致实例crash。导致Instance Crash的最直接原因是PMON进程因ORA-600[kfioUnidentify01]而异常终止。
PMON后台进程负责轮训服务进程,当发现服务进程异常终止时,PMON负责回收latch、enqueue等SO(State Object)对象。
PMON通过kfioUnidentify01回收dead process的<KFGPN>、<KFIOFIB>等内部资源时触发了ORA-00600: [kfioUnidentify01]。
SO: 70000046d882950, type: 84, owner: 70000048ecff6e0, flag: INIT/-/-/0x00
(kfgso) flags: 00000000 clt: 1 err: 0 hint: 70d1ff48
(kfgpn) rpi: 393 itrn:0 gst:0 usrp:0
busy: 0 rep: 0 grp: 0 check: 0/0 glink: 6d8829d0 6d8829d0
----------------------------------------
SO: 700000470d1ff48, type: 85, owner: 70000046d882950, flag: INIT/-/-/0x00
(kfgsc) pwr: 0 flag: 0 op: 0 recur: 0
(kfgpn) rpi: 1245589 itrn:0 gst:0 usrp:0
busy: 0 rep: 0 grp: 6d883a50 check: 2/3652089895 glink: 6d883a98 70d200f0
kfgrp: DG_LOG number: 2/3652089895 type: 2 compat: 0.0.0.0.0 dbcompat:0.0.0.0.0
timestamp: 0 state: 7 flags: 142 gpnlist: 70d200f0 70d1ffc8
KFGPN at 70d20070 in dependent chain
KFGPN at 70d1ff48 in dependent chain
Fri Nov 9 22:01:04 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_pmon_639420.trc:
ORA-00600: internal error code, arguments: [kfioUnidentify01], [], [], [], [], [], [], []
Fri Nov 9 22:01:05 2012
Errors in file /u01/app/oracle/admin/mac/bdump/mac1_pmon_639420.trc:
ORA-00600: internal error code, arguments: [kfioUnidentify01], [], [], [], [], [], [], []
关于PMON通过kfioUnidentify01的一些信息如下:
The bug causes [kfioUnidentify01] in PMON if a FG doing kfioUnidentify() is
killed after removing the dependent link from the <kfgsc> SO.
kfioUnidentify() does this before calling kfgUnuse(). Because the dlink is
removed, KFG can clean the SO, so if PMON must clean up the <kfiofib> SO
(which is not actually dependent any more), kfioUnidentify() is called again
and the <kfgsc> has been freed
@The bug can happen if a FG executing kfioUnidentify() is killed after
@removing the <kfiofib> from the <kfgpn> dependent link, but before updating
@<gpn_kfiofib>. Because the <kfiofib> is removed, PMON can delete <kfgpn>
@without deleting the now independent <kfiofib>, which in turn leads to
@[kfiofib01] when PMON gets to delete the state object encapsulating the
@<kfiofib>.
但是Note<Bug 4708822 - OERI[kfiounidentify01] from PMON [ID 4708822.8]>介绍" PMON may fail with ORA-600 [kfioUnidentify01]."的BUG已经在版本10.2.0.4中修复了。
2.2 建议
考虑安装《Patch 5655419: AQ: ORA-00600[K2SCHKSESS_1] AND [K2SCHKSESS_2] DURING AW2 TEST》,该ONE-OFF PATCH目前有AIX+10.2.0.4上的版本。通过安装补丁防止ORA-600[1265]错误发生,从而避免由PMON回收SO引发的实例异常终止。
参考文档:
Bug 5655419 - Distributed transaction hits ORA-600:[1265] or ORA-600:[k2gget: downgrade] in 10.2 [ID 5655419.8]
BUG 4708822 - PMON SIGNALS [KFIOUNIDENTIFY01]
ORA-600 [1265] (Doc ID 138374.1)
|
|