Oracle Database Data Recovery and Performance Tuning
1#
Posted 2012-4-24 17:45:22 | Views: 7516 | Replies: 4
Recently, a customer's Oracle9i RAC running on HP-UX 11.23 reported the following errors on instance hddms2:
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
Fri Apr 13 09:46:22 2012
Errors in file /u01/app/oracle/admin/hddms/udump/hddms2_ora_26010.trc:
ORA-07445: exception encountered: core dump [T_19_72f2_cl___doprnt_main()+35328] [SIGSEGV] [Address not mapped to object] [0x3FFFFFFF7EAD3740] [] []
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
Fri Apr 13 09:46:23 2012
Errors in file /u01/app/oracle/admin/hddms/udump/hddms2_ora_26010.trc:
ORA-07445: exception encountered: core dump [kghalf()+960] [SIGSEGV] [Invalid permissions for mapped object] [0x000000008] [] []
ORA-07445: exception encountered: core dump [T_19_72f2_cl___doprnt_main()+35328] [SIGSEGV] [Address not mapped to object] [0x3FFFFFFF7EAD3740] [] []
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
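On April 15 the errors below appeared, and the hddms2 instance then stopped working (judged from the fact that the alert log was no longer being written).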
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
Sun Apr 15 08:51:51 2012
Trace dumping is performing id=[cdmp_20120415085151]
Sun Apr 15 08:51:52 2012
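The attachment contains the alert logs from both nodes and the trace files that may be needed. Please help analyze the cause of these ORA-07445 and ORA-00603 errors and recommend a fix.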
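The alert-log excerpt in this post repeats a small number of distinct ORA-07445 signatures. As a rough aid for reading such logs, here is a minimal Python sketch that tallies them by failing function and signal. The function name `ora7445_signatures` and the sample text are illustrative assumptions; only the line format is taken from the excerpt.

```python
import re
from collections import Counter

# Matches the first two bracketed arguments of an ORA-07445 line:
# the failing function with offset, and the signal name.
ORA7445_RE = re.compile(
    r"ORA-07445: exception encountered: core dump \[([^\]]+)\] \[([^\]]*)\]"
)


def ora7445_signatures(alert_text):
    """Count ORA-07445 occurrences keyed by (function name, signal)."""
    counts = Counter()
    for match in ORA7445_RE.finditer(alert_text):
        # "kghalf()+960" -> "kghalf"
        function = match.group(1).split("(")[0]
        counts[(function, match.group(2))] += 1
    return counts


# Sample lines copied from the alert-log excerpt in this post.
SAMPLE = """\
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
ORA-07445: exception encountered: core dump [T_19_72f2_cl___doprnt_main()+35328] [SIGSEGV] [Address not mapped to object] [0x3FFFFFFF7EAD3740] [] []
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
ORA-07445: exception encountered: core dump [kghalf()+960] [SIGSEGV] [Invalid permissions for mapped object] [0x000000008] [] []
"""

if __name__ == "__main__":
    for (function, signal), count in ora7445_signatures(SAMPLE).most_common():
        print(f"{count} x {function} [{signal}]")
```

Every one of these lines points at the same trace file, hddms2_ora_26010.trc, so the tally describes a single server process rather than several.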

Attachment: oracle数据库日志.rar (261.76 KB, downloads: 950)

2#
Posted 2012-4-24 22:24:06
ODM DATA:

9.2.0.8 RAC ON HP-UX B.11.23  ia64

The hddms2 crash was caused by LMS, a critical RAC background process, hitting ORA-600 [kjmpmsg_1]:

Errors in file /u01/app/oracle/admin/hddms/bdump/hddms2_lms1_25126.trc:
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
Sun Apr 15 08:51:51 2012
Errors in file /u01/app/oracle/admin/hddms/bdump/hddms2_lms1_25126.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
Sun Apr 15 08:51:51 2012
Trace dumping is performing id=[cdmp_20120415085151]



hddms2_lms1_25126.trc:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp()+528         call     ksedst()             000000000 ?
                                                   C000000000000996 ?
                                                   4000000002A53E60 ?
ksfdmp()+64          call     ksedmp()             000000003 ?
kgerinv()+400        call     ksfdmp()             600000000004F280 ?
                                                   000000003 ?
                                                   C000000000000795 ?
                                                   400000000551F5D0 ?
                                                   8000000000018203 ?
                                                   6000000000524900 ?
kgeanmfe()+128       call     kgerinv()            600000000004F280 ?
                                                   60000000005246B8 ?
                                                   60000000000502F8 ?
                                                   600000000001D800 ?
                                                   600000000001D790 ?
kjmpmsg()+1456       call     kgeanmfe()           600000000004F280 ?
                                                   6000000000600558 ?
                                                   40000000009B0A50 ?
                                                   000000001 ? 000000000 ?
                                                   000000000 ?
                                                   60000000005FC2C0 ?
                                                   9FFFFFFFFFFFD5A6 ?
kjmsm()+5584         call     kjmpmsg()            000000000 ?
                                                   C000002DBA002B5E ?
                                                   40000000034156A0 ?
                                                   8000000000010E07 ?
                                                   9FFFFFFFFFFFCF20 ?
                                                   600000000051B228 ?
                                                   6000000000530318 ?
                                                   6000000000531A04 ?
ksbrdp()+3200        call     kjmsm()              000000000 ?
                                                   C000000000001F42 ?
                                                   400000000147DCA0 ?
                                                   000000000 ?
opirip()+1248        call     ksbrdp()             C000000000000DA1 ?
                                                   4000000001438640 ?
                                                   00000C269 ?
                                                   9FFFFFFFFFFFD640 ?
                                                   C0000000A40C3000 ?
                                                   C00000007D6C4948 ?


===================================================
PROCESS STATE
-------------
Process global information:
     process: c00000007d6c4570, call: c00000007d8e2038, xact: 0000000000000000, curses: c00000007d776b48, usrses: c00000007d7760b8
  ----------------------------------------
  SO: c00000007d6c4570, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
  (process) Oracle pid=7, calls cur/top: c00000007d8e2038/c00000007d8e1f78, flag: (6) SYSTEM
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 38
              last post received-location: KJCS Post snd proxy to flush msg
              last process to post me: c00000007d6c4ad0 1 6
              last post sent: 187948 0 16
              last post sent-location: ksasnd
              last process posted by me: c00000007d6c4ad0 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: c00000007d76b0c0
    O/S info: user: oracle, term: UNKNOWN, ospid: 25126
    OSD pid info: Unix process pid: 25126, image: oracle@HDDB2 (LMS1)
    ----------------------------------------
    SO: c000000083411fa8, type: 22, owner: c00000007d6c4570, flag: -/-/-/0x00
namespace [KSXP] key   = [ 32 32 30 47 45 53 52 30 30 32 00 ]
    ----------------------------------------
    SO: c00000007d7760b8, type: 4, owner: c00000007d6c4570, flag: INIT/-/-/0x00
    (session) trans: 0000000000000000, creator: c00000007d6c4570, flag: (51) USR/- BSY/-/-/-/-/-
              DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
              txn branch: 0000000000000000
              oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
    last wait for 'gcs remote message' blocking sess=0x0 seq=17432 wait_time=24


The most recent wait of LMS1 was on 'gcs remote message'.


3#
Posted 2012-4-24 22:50:09
[05]: kjmpbmsg [RAC_MLMDS]
[06]: kjmsm [RAC_MLMDS]

kjm         dlm related functionality ; associated with RAC or parallel server operation

My guess is that the kjmpbmsg function is used to receive gcs remote messages.

ODM Finding:

Bug 5587421 - LMS fails with OERI [kjmpmsg_1]

Affects:

    Product (Component)        Oracle Server (Rdbms)
    Range of versions believed to be affected         Versions BELOW 11.1
    Versions confirmed as being affected        

        10.2.0.3
        9.2.0.7

    Platforms affected        Generic (all / most platforms affected)

Fixed:

    This issue is fixed in       

        11.1.0.6 (Base Release)
        10.2.0.4 (Server Patch Set)

Symptoms:
       
Related To:

    Instance May Crash
    Internal Error May Occur (ORA-600)
    ORA-600 [kjmpmsg_1]

       

    RAC (Real Application Clusters) / OPS
    _LM_MSG_BATCH_SIZE

Description

    An LMS process may fail with ORA-600 [kjmpmsg_1].

    Workaround:
     Set _lm_msg_batch_size to a value smaller than
      MTU - (Size of IP header) - (Size of UDP header) - (Size of SKGXP header)
      = MTU - 20 - 8 - 40 = MTU - 68
      Please note size of SKGXP header is platform / version dependent.

     MTU may be obtained from the "ifconfig -a" OS command.
     eg: ifconfig -a
          ^
          ...
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ...

Hdr: 5587421 9.2.0.7 RDBMS 9.2.0.7 RAC PRODID-5 PORTID-23 ORA-600
Abstract: LMS2 PROCESS FAILED WITH ORA-600 [KJMPMSG_1]


PROBLEM:
--------
On 2-node RAC, instance down occurred on node2 because LMS2
process failed with ORA-600 [kjmpmsg_1].

Sat Oct  7 05:09:40 2006
Errors in file
/home/app/oracle/product/9.2.0/rdbms/log/so_dbs02db_lms2_14764.trc:
ORA-600: internal error code, arguments: [kjmpmsg_1], [1], [], [], [], [],
[], []

DIAGNOSTIC ANALYSIS:
--------------------
From trace file, skgxpdocon message seems to be caused by Bug4673610.
I am not sure whether Bug4673610 is related to ORA-600[kjmpmsg_1]
or not.

[so_dbs02db_lms2_14764.trc]

*** 08:22:57.767
*** ID:(9.1) 2006-10-06 08:22:57.697
skgxpdocon: warning outstanding accept handle count has reached new high
water mark 1000
*** 01:02:25.988
skgxpdocon: warning outstanding accept handle count has reached new high
water mark 2000
*** 02:11:16.709
skgxpdocon: warning outstanding accept handle count has reached new high
water mark 3000
*** 05:09:40.783
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [kjmpmsg_1], [1], [], [], [], [],
[], []

My customer could startup instance on node2 after ORA-600[kjmpmsg_1].

WORKAROUND:
-----------
None.

RELATED BUGS:
-------------
Bug4490547(Base BUG3524566)
BUG3524566 is fixed in PSR9.2.0.7.

REPRODUCIBILITY:
----------------
Once at my customer's site.

TEST CASE:
----------
NA

STACK TRACE:
------------
ksedmp kgerinv kgeanmfe kjmpmsg kjmsm ksbrdp
opirip opidrv sou2o main start
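The STACK TRACE in the bug text can be compared against the Call Stack Trace from hddms2_lms1_25126.trc. Below is a minimal Python sketch of that comparison. The helper names are hypothetical, and the sample keeps only the first line of each frame from the posted trace. The posted excerpt stops at opirip, so only the bug frames through opirip are checked, and the check is a subsequence match because the posted trace contains an extra ksfdmp frame that the bug summary omits.

```python
import re

# A frame line starts in column 0 with "name()+offset  call ...";
# continuation lines holding extra argument values are indented.
FRAME_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\(\)\+\d+\s+call\s")


def stack_functions(trace_text):
    """Return the function names of the frames, top of stack first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def is_subsequence(needle, haystack):
    """True if every item of needle appears in haystack in the same order."""
    remaining = iter(haystack)
    return all(name in remaining for name in needle)


# Stack summary as listed in the bug text.
BUG_STACK = ("ksedmp kgerinv kgeanmfe kjmpmsg kjmsm ksbrdp "
             "opirip opidrv sou2o main start").split()

# First line of each frame from the posted LMS1 trace.
SAMPLE = """\
ksedmp()+528         call     ksedst()             000000000 ?
                                                   C000000000000996 ?
ksfdmp()+64          call     ksedmp()             000000003 ?
kgerinv()+400        call     ksfdmp()             600000000004F280 ?
kgeanmfe()+128       call     kgerinv()            600000000004F280 ?
kjmpmsg()+1456       call     kgeanmfe()           600000000004F280 ?
kjmsm()+5584         call     kjmpmsg()            000000000 ?
ksbrdp()+3200        call     kjmsm()              000000000 ?
opirip()+1248        call     ksbrdp()             C000000000000DA1 ?
"""

if __name__ == "__main__":
    frames = stack_functions(SAMPLE)
    visible = BUG_STACK[: BUG_STACK.index("opirip") + 1]
    print(frames)
    print(is_subsequence(visible, frames))
```

A matching stack only suggests the same code path. It does not prove that the two failures share a root cause.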


     
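The MOS bug note recommends setting _lm_msg_batch_size to a value smaller than MTU - 68.

For example, with an MTU of 1500 the bound is 1500 - 68 = 1432, so _lm_msg_batch_size=1280 satisfies it.

_lm_msg_batch_size: GES batch message size

My personal suggestion is to observe for a while. If ORA-600 [kjmpmsg_1] recurs frequently and keeps crashing the instance, then consider setting the hidden parameter _lm_msg_batch_size.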
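The bound from the MOS note (MTU minus the IP, UDP and SKGXP header sizes) is simple arithmetic, sketched below in Python. The function names are hypothetical. The 40-byte SKGXP header is the figure quoted in the note, which also says it depends on platform and version, so treat it as an assumption to verify for HP-UX ia64 9.2.0.8.

```python
# Header sizes in bytes, as quoted in the MOS note.
IP_HEADER = 20
UDP_HEADER = 8
SKGXP_HEADER = 40  # assumption: platform / version dependent per the note


def max_lm_msg_batch_size(mtu, skgxp_header=SKGXP_HEADER):
    """Return the exclusive upper bound for _lm_msg_batch_size."""
    bound = mtu - IP_HEADER - UDP_HEADER - skgxp_header
    if bound <= 0:
        raise ValueError(f"MTU {mtu} is too small for the assumed headers")
    return bound


def is_below_bound(value, mtu):
    """True when value is strictly smaller than the bound, as the note asks."""
    return 0 < value < max_lm_msg_batch_size(mtu)


if __name__ == "__main__":
    print(max_lm_msg_batch_size(1500))
    print(is_below_bound(1280, 1500))
```

The MTU to plug in should be the one reported for the cluster interconnect interface, since that is the network carrying the LMS messages.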



The ORA-7445 [__milli_memcpy()+2401] [SIGSEGV] errors from Fri Apr 13 have no direct connection to the ORA-600 [kjmpmsg_1] on the 15th. An ORA-7445 of this kind generally does not cause a fatal error such as an instance crash.


4#
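Posted 2012-4-24 22:57:20
Thanks for the reply, Liu.
I noticed in alert_hddms2.log that nothing at all was logged on April 14. Something must be wrong there. Could it be directly related to the ORA-07445 errors raised on April 13?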
Errors in file /u01/app/oracle/admin/hddms/udump/hddms2_ora_26010.trc:
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
Fri Apr 13 09:46:22 2012
Errors in file /u01/app/oracle/admin/hddms/udump/hddms2_ora_26010.trc:
ORA-07445: exception encountered: core dump [T_19_72f2_cl___doprnt_main()+35328] [SIGSEGV] [Address not mapped to object] [0x3FFFFFFF7EAD3740] [] []
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
Fri Apr 13 09:46:23 2012
Errors in file /u01/app/oracle/admin/hddms/udump/hddms2_ora_26010.trc:
ORA-07445: exception encountered: core dump [kghalf()+960] [SIGSEGV] [Invalid permissions for mapped object] [0x000000008] [] []
ORA-07445: exception encountered: core dump [T_19_72f2_cl___doprnt_main()+35328] [SIGSEGV] [Address not mapped to object] [0x3FFFFFFF7EAD3740] [] []
ORA-07445: exception encountered: core dump [__milli_memcpy()+2401] [SIGSEGV] [Address not mapped to object] [0x9FFFFFFFBF580000] [] []
Fri Apr 13 09:55:41 2012
Thread 2 advanced to log sequence 9632
  Current log# 4 seq# 9632 mem# 0: /dev/vg02/rredo02_2.log
Fri Apr 13 10:22:25 2012
Thread 2 advanced to log sequence 9633
  Current log# 3 seq# 9633 mem# 0: /dev/vg03/rredo02_3.log
Sun Apr 15 08:51:50 2012
Errors in file /u01/app/oracle/admin/hddms/bdump/hddms2_lms1_25126.trc:
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
Sun Apr 15 08:51:51 2012
Errors in file /u01/app/oracle/admin/hddms/bdump/hddms2_lms1_25126.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmpmsg_1], [0], [], [], [], [], [], []
Sun Apr 15 08:51:51 2012


5#
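Posted 2012-4-24 23:46:53
Going by the log, node 2 probably had no active sessions. It is common here for a log switch to happen only once every few hours or even every few days.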



Tue Apr 10 17:05:40 2012
Thread 2 advanced to log sequence 9618
  Current log# 4 seq# 9618 mem# 0: /dev/vg02/rredo02_2.log

Thu Apr 12 02:06:47 2012
Thread 2 advanced to log sequence 9619
  Current log# 3 seq# 9619 mem# 0: /dev/vg03/rredo02_3.log



The log switch before the one at Thu Apr 12 02:06:47 2012 was at Tue Apr 10 17:05:40 2012, about 33 hours earlier.





Thu Apr 12 16:36:34 2012
Thread 2 advanced to log sequence 9627
  Current log# 3 seq# 9627 mem# 0: /dev/vg03/rredo02_3.log

Fri Apr 13 02:08:51 2012
Thread 2 advanced to log sequence 9628
  Current log# 4 seq# 9628 mem# 0: /dev/vg02/rredo02_2.log



One point worth noting: April 13 was a Friday, so the quiet April 14 fell on a Saturday.
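To check how often this node switches logs, the gaps between "advanced to log sequence" entries can be computed from the alert log. Below is a minimal Python sketch. The function names are hypothetical, the sample is copied from the excerpts in this reply, and the timestamp parsing assumes English day and month names under the default C locale.

```python
import re
from datetime import datetime, timedelta

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Apr 10 17:05:40 2012"
SWITCH_RE = re.compile(r"Thread \d+ advanced to log sequence (\d+)")


def log_switches(alert_text):
    """Return (timestamp, sequence) for each log switch entry."""
    switches = []
    current_ts = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        try:
            # A timestamp line sets the time for the entries that follow it.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        match = SWITCH_RE.match(line)
        if match and current_ts is not None:
            switches.append((current_ts, int(match.group(1))))
    return switches


def switch_gaps(switches):
    """Return (sequence, time since the previous listed switch)."""
    # Gaps are between consecutive entries found in the text; sequences
    # missing from an excerpt are not accounted for.
    return [(cur[1], cur[0] - prev[0])
            for prev, cur in zip(switches, switches[1:])]


SAMPLE = """\
Tue Apr 10 17:05:40 2012
Thread 2 advanced to log sequence 9618
  Current log# 4 seq# 9618 mem# 0: /dev/vg02/rredo02_2.log

Thu Apr 12 02:06:47 2012
Thread 2 advanced to log sequence 9619
  Current log# 3 seq# 9619 mem# 0: /dev/vg03/rredo02_3.log
"""

if __name__ == "__main__":
    for seq, gap in switch_gaps(log_switches(SAMPLE)):
        note = " (over a day)" if gap > timedelta(days=1) else ""
        print(f"seq {seq}: {gap} since previous switch{note}")
```

Run over the full alert_hddms2.log, the same approach would show whether a silent day such as April 14 is unusual for this node or in line with its normal switch rate.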
