Oracle数据库数据恢复、性能优化

找回密码
注册
搜索
热搜: 活动 交友 discuz
发新帖

158

积分

1

好友

8

主题
1#
发表于 2012-4-12 11:03:32 | 查看: 7842| 回复: 7
环境:9i RAC
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name:    AIX
Node name:      kmolcom02
Release:        3
Version:        5
Machine:        000C1042D600
Instance name: kmsop2
描述:今早来查看alert日志,发现一号节点出现很多cdump
节点一:
Thu Apr 12 05:25:53 2012
Trace dumping is performing id=[cdmp_20120412052553]
Thu Apr 12 05:26:55 2012
Trace dumping is performing id=[cdmp_20120412052654]
Thu Apr 12 05:27:55 2012
Trace dumping is performing id=[cdmp_20120412052755]
Thu Apr 12 05:28:39 2012
Trace dumping is performing id=[cdmp_20120412052839]
Thu Apr 12 05:29:56 2012
Trace dumping is performing id=[cdmp_20120412052956]
Thu Apr 12 05:30:39 2012
Trace dumping is performing id=[cdmp_20120412053039]
Thu Apr 12 05:31:58 2012
Trace dumping is performing id=[cdmp_20120412053158]
Thu Apr 12 05:33:58 2012
Trace dumping is performing id=[cdmp_20120412053358]
Thu Apr 12 05:35:58 2012
Trace dumping is performing id=[cdmp_20120412053558]
Thu Apr 12 05:37:58 2012
Trace dumping is performing id=[cdmp_20120412053758]
Thu Apr 12 05:38:41 2012
Trace dumping is performing id=[cdmp_20120412053841]
Thu Apr 12 05:40:20 2012
Trace dumping is performing id=[cdmp_20120412054020]
Thu Apr 12 05:41:58 2012
Trace dumping is performing id=[cdmp_20120412054158]
Thu Apr 12 05:43:01 2012
Trace dumping is performing id=[cdmp_20120412054301]
Thu Apr 12 05:43:59 2012
Trace dumping is performing id=[cdmp_20120412054358]
Thu Apr 12 09:08:58 2012

--5点25分开始,5点43分结束,所幸,没有发生宕机。

查看节点2:
Thu Apr 12 05:25:52 2012
Errors in file /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc:
ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
Thu Apr 12 05:25:52 2012
Errors in file /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
Thu Apr 12 05:25:53 2012
Trace dumping is performing id=[cdmp_20120412052553]
。。。
以下一直重复和上面一样的错误~~



---查看第一个报错信息:
Thu Apr 12 05:25:52 2012
Errors in file /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc:
ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
Thu Apr 12 05:25:52 2012
Errors in file /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
Thu Apr 12 05:25:53 2012
Trace dumping is performing id=[cdmp_20120412052553]---和节点一的cdump一样的时间点,一样的序号!

-------/u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc
kmolcom02:/home/oracle#more /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc
/u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name:    AIX
Node name:      kmolcom02
Release:        3
Version:        5
Machine:        000C1042D600
Instance name: kmsop2
Redo thread mounted by this instance: 2
Oracle process number: 192
Unix process pid: 876696, image: oracle@kmolcom02 (TNS V1-V3)

*** 2012-04-10 18:07:07.925
*** SESSION ID:(141.4742) 2012-04-10 18:07:07.903
skgxpdocon: warning outstanding accept handle count has reached new high water mark 1000
*** 2012-04-10 21:21:53.288
skgxpdocon: warning outstanding accept handle count has reached new high water mark 2000
*** 2012-04-11 00:36:35.674
skgxpdocon: warning outstanding accept handle count has reached new high water mark 3000
*** 2012-04-11 03:51:39.216
skgxpdocon: warning outstanding accept handle count has reached new high water mark 4000
*** 2012-04-11 07:07:28.269
skgxpdocon: warning outstanding accept handle count has reached new high water mark 5000
*** 2012-04-11 10:22:30.875
skgxpdocon: warning outstanding accept handle count has reached new high water mark 6000
*** 2012-04-11 13:37:37.553
skgxpdocon: warning outstanding accept handle count has reached new high water mark 7000
*** 2012-04-11 16:52:21.419
skgxpdocon: warning outstanding accept handle count has reached new high water mark 8000
*** 2012-04-11 20:07:40.614
skgxpdocon: warning outstanding accept handle count has reached new high water mark 9000
*** 2012-04-11 23:23:34.957
skgxpdocon: warning outstanding accept handle count has reached new high water mark 10000
*** 2012-04-12 02:40:37.181
skgxpdocon: warning outstanding accept handle count has reached new high water mark 11000
ORA-27506: IPC error connecting to a port
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
*** 2012-04-12 05:25:52.239
SKGXPCNH: 0x1035e8b0 SKGXPCON_CONN_SENT (3) sconno 459243771 accono 0 admno 1616406595
        Remote admin port
SSKGXPT 0x1035e8d4 flags SSKGXPT_WRITE active network 0
info for network 0
        socket no 8     IP 66.66.66.1   UDP 61718
        HACMP network_id 0      sflags SSKGXPT_UP
        Remote data port
SSKGXPT 0x1035e970 flags  active network 0
info for network 0
        socket no 0     IP 0.0.0.0      UDP 0
        HACMP network_id 0      sflags
        ERROR connect requestion should be on done q
         next seqno 32763 credits 8 ertt 64 resends on con 0
*** 2012-04-12 05:25:52.239
---trace文件里面找到相关sql
*** 2012-04-12 05:25:52.275
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [] [] [] [] [] []
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory
Current SQL statement for this session:
select 'RACMembership :' || inst_name ||
':' || inst_number dbMembers
from sys.v_$active_instances a,gv$instance b
where a.inst_number=b.instance_number  
----- Call Stack Trace -----

返回:查看1节点有trace文件,在dump目录下面cdump_***目录
主要看看这些trace文件内容
文件比较多,差不多查看了一遍。里面的内容都大部分相似:
2551691C:009A4DFE   250     0 10429   6 Free  MB: buf 7000000c9c66ab0 (SO queue 0) pool 70000000000e6b0, size 488
2551691E:009A4DFF   250     0 10429  10 IPC SODC: Exiting cleanup for 7000000c8883e48, rc: 1
25516920:009A4E00   250     0 10427   8 Disconn  : Disconnect from inst 1, receiver 0
25516921:009A4E01   250     0 10401   2 KSXPMSGCNCL: client 2 tid inst 2 ptid 1 flags 0x500
25516924:009A4E02   250     0 10401  46 KSXPMSGCNCL: could not map tid(2, 1, 0xfe0e1cda) to cnh
25516925:009A4E03   250     0 10427   8 Disconn  : Disconnect from inst 1, receiver 1
25516926:009A4E04   250     0 10401   2 KSXPMSGCNCL: client 2 tid inst 2 ptid 2 flags 0x500
2551692A:009A4E05   250     0 10427   8 Disconn  : Disconnect from inst 1, receiver 2
2551692B:009A4E06   250     0 10401   2 KSXPMSGCNCL: client 2 tid inst 2 ptid 3 flags 0x500
C864682E:009A9BBD   250     0 10280   1 0x00000000000000FA
C86468D9:009A9BBE   250     0 10401  29 KSXPUNMAP: client 1
C86468DB:009A9BBF   250     0 10401  28 KSXPMAP: client 1 base 0x7000000000b7000 size 0xf4f49000
C86A729A:009A9BC7   250   357 10429   7 MB SO Al: Allocated MBSO 7000000c8873f48
C86A729F:009A9BC8   250   357 10427   7 Connect  : Connect to inst 1, receiver 0
C86A72A0:009A9BC9   250   357 10427   7 Connect  : Connect to inst 1, receiver 1
C86A72A2:009A9BCA   250   357 10427   7 Connect  : Connect to inst 1, receiver 2

------------------------------------------------------------------------------------------------------------------------
网络资源:
http://www.itpub.net/thread-918844-1-1.html
http://www.itpub.net/thread-766945-1-1.html
2#
发表于 2012-4-12 12:52:15
上传 完整的 /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc和  alert.log

请压缩打包后上传

回复 只看该作者 道具 举报

3#
发表于 2012-4-12 13:06:35
Errors in file /u01/app/oracle/admin/sopdb/udump/kmsop2_ora_876696.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
ORA-27300: OS system dependent operation:connect failed with status: 4
ORA-27301: OS failure message: Interrupted system call
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory


==>  remote process is out of memory 远程节点上的进程 存在 memory leak , 可能是 LMD 、 LMS这类RAC关键的后台进程  或者一些RAC辅助进程



Instance name: kmsop2

==》建议在kmsop1 实例所在主机上 检查 Oracle进程的内存使用状况:


同时贴出以下命令的输出(注意是在 RAC的另一台主机上 ):

ps aux | head -10

svmon -G
svmon -U oracle
ps vg|grep ora|awk ‘{sum=sum+$7-$10} END {print sum/1024}’

回复 只看该作者 道具 举报

4#
发表于 2012-4-12 13:07:16
ODM FINDING:

MEMORY LEAK IN LMS

Hdr: 5209163 9.2.0.7 RDBMS 9.2.0.7 RAC PRODID-5 PORTID-226 4673610
Abstract: MEMORY LEAK IN LMS



PROBLEM:
--------
While trying to startup getting:

Thu Apr 27 10:33:35 2006
Errors in file /u01/app/oracle/admin/BOELV/bdump/boelv1_lmon_20455.trc:
ORA-7445: exception encountered: core dump [__kill()+17] [SIGIOT] [unknown
code] [0x4FE7] [] []
ORA-27300: OS system dependent operation:connect failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgxpdoaconr
ORA-27303: additional information: remote process is out of memory

DIAGNOSTIC ANALYSIS:
--------------------
On the other node, checked:

ps aux | grep lms,  saw that RSS of LMS is 2.5 GB, which indicates a  memory
leak within LMS.

At startup time LMS's   RSS is 60 MB,   after 7 days of running:

ps aux | head -1
USER       PID %CPU %MEM   VSZ  RSS   TTY      STAT START   TIME COMMAND

:>ps aux | grep lms

oracle   23444  0.8 23.1 2759916 1869904 ?            S    Apr27  89:46      
  
    ora_lms0_BOELV2

oracle   23448  0.9 23.2 2761112 1871232 ?            S    Apr27   96:31


RSS is at about 1.9 GB, in a few days likely to reach the maximum
and refuse new connections.

WORKAROUND:
-----------
none, killing instances is obviously unacceptable.

RELATED BUGS:
-------------
4945886.

回复 只看该作者 道具 举报

5#
发表于 2012-4-12 13:08:17
也可以参考拙作 理解Oracle在AIX平台上的内存使用  http://www.oracledatabase12g.com ... n-aix-platform.html

回复 只看该作者 道具 举报

6#
发表于 2012-4-12 15:32:08
节点一案发时刻的alertWed Apr 11 23:19:27 2012
Thread 1 advanced to log sequence 5078
  Current log# 5 seq# 5078 mem# 0: /dev/rks_log13_1g
Thu Apr 12 00:10:27 2012
Thread 1 advanced to log sequence 5079
  Current log# 2 seq# 5079 mem# 0: /dev/rks_log12_1g
Thu Apr 12 05:25:53 2012
Trace dumping is performing id=[cdmp_20120412052553]
Thu Apr 12 05:26:55 2012
Trace dumping is performing id=[cdmp_20120412052654]
Thu Apr 12 05:27:55 2012
Trace dumping is performing id=[cdmp_20120412052755]
Thu Apr 12 05:28:39 2012
Trace dumping is performing id=[cdmp_20120412052839]
Thu Apr 12 05:29:56 2012
Trace dumping is performing id=[cdmp_20120412052956]
Thu Apr 12 05:30:39 2012
Trace dumping is performing id=[cdmp_20120412053039]
Thu Apr 12 05:31:58 2012
Trace dumping is performing id=[cdmp_20120412053158]
Thu Apr 12 05:33:58 2012
Trace dumping is performing id=[cdmp_20120412053358]
Thu Apr 12 05:35:58 2012
Trace dumping is performing id=[cdmp_20120412053558]
Thu Apr 12 05:37:58 2012
Trace dumping is performing id=[cdmp_20120412053758]
Thu Apr 12 05:38:41 2012
Trace dumping is performing id=[cdmp_20120412053841]
Thu Apr 12 05:40:20 2012
Trace dumping is performing id=[cdmp_20120412054020]
Thu Apr 12 05:41:58 2012
Trace dumping is performing id=[cdmp_20120412054158]
Thu Apr 12 05:43:01 2012
Trace dumping is performing id=[cdmp_20120412054301]
Thu Apr 12 05:43:59 2012
Trace dumping is performing id=[cdmp_20120412054358]
Thu Apr 12 09:08:58 2012
Thread 1 advanced to log sequence 5080
  Current log# 1 seq# 5080 mem# 0: /dev/rks_log11_1g
Thu Apr 12 13:41:13 2012
Thread 1 advanced to log sequence 5081
  Current log# 5 seq# 5081 mem# 0: /dev/rks_log13_1g


节点2的alert.txt (16.94 KB, 下载次数: 1173) --节点2案发时刻的alert;

node1.jpg --节点1的cpu和I/O情况

node2.jpg --节点2的cpu和I/O情况
node2内存.jpg --节点2的内存使用情况
---------------------------------------------------------------------------------------------------------------------------------------------------
节点1的情况:
kmolcom01:/home/oracle/tbs_monitor#ps aux | head -10
USER         PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root       57372  5.0  0.0  384  320      - A      Mar 22 224946:22 wait
root       73764  4.7  0.0  384  320      - A      Mar 22 210808:01 wait
root       61470  4.7  0.0  384  320      - A      Mar 22 210320:47 wait
root       65568  4.5  0.0  384  320      - A      Mar 22 199612:04 wait
root       77862  4.2  0.0  384  320      - A      Mar 22 188862:32 wait
root        8196  4.2  0.0  384  320      - A      Mar 22 187842:25 wait
root       69666  4.2  0.0  384  320      - A      Mar 22 187573:26 wait
root       53274  3.9  0.0  384  320      - A      Mar 22 173727:28 wait
oracle   2093182  0.9  0.0 55228 29524      - A      Apr 09 334:11 oraclekmsop1 (L

kmolcom01:/home/oracle/tbs_monitor#svmon -G ---询问:这个命令执行失败?请问,我要做什么配置啊?估计哪里没有弄对。
ksh: svmon: 0403-006 Execute permission denied.
kmolcom01:/home/oracle/tbs_monitor#svmon -U oracle
ksh: svmon: 0403-006 Execute permission denied.
kmolcom01:/home/oracle/tbs_monitor#ps vg|grep ora|awk ‘{sum=sum+$7-$10} END {print sum/1024}’
Syntax Error The source line is 1.
The error context is
                 >>> ?<<<
awk: 0602-500 Quitting The source line is 1.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
节点2情况:
kmolcom02:/home/oracle/tbs_monitor# ps aux | head -10
USER         PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root       73764  5.7  0.0  384  384      - A      Mar 07 23620:22 wait
root       57372  5.4  0.0  384  384      - A      Mar 07 22235:07 wait
root       65568  5.3  0.0  384  384      - A      Mar 07 21823:43 wait
root        8196  5.3  0.0  384  384      - A      Mar 07 21767:38 wait
root       77862  5.1  0.0  384  384      - A      Mar 07 20976:17 wait
root       61470  5.0  0.0  384  384      - A      Mar 07 20723:24 wait
root       69666  5.0  0.0  384  384      - A      Mar 07 20523:53 wait
root       53274  4.9  0.0  384  384      - A      Mar 07 20121:16 wait
oracle   1319038  1.2  0.0 55988 29420      - A      Mar 12 4142:26 oraclekmsop2 (L

kmolcom02:/home/oracle/tbs_monitor#ps vg|grep ora|awk ‘{sum=sum+$7-$10} END {print sum/1024}’
Syntax Error The source line is 1.
The error context is
                 >>> ?<<<
awk: 0602-500 Quitting The source line is 1.


----你给我的命令我照着贴了,可惜~~好像不对~~呜呜~~~~~~~(>_<)~~~~

回复 只看该作者 道具 举报

7#
发表于 2012-4-12 15:39:37
svmon 在AIX上默认 普通用户权限无法执行

就ps aux | head -10 的输出看 oracle进程的RSS都不大 ,建议你关注节点1 上LMS、LMD等RAC fatal process的内存使用情况

回复 只看该作者 道具 举报

8#
发表于 2012-4-13 09:25:17
好的,谢谢啊!这个事故实在是不典型,一下就不见了,不好追踪,也不知道下次什么时候再现,我会持续关注+持续学习,有问题再来询问你啊,谢谢刘大!

[ 本帖最后由 ricky 于 2012-4-13 09:28 编辑 ]

回复 只看该作者 道具 举报

您需要登录后才可以回帖 登录 | 注册

QQ|手机版|Archiver|Oracle数据库数据恢复、性能优化

GMT+8, 2024-12-24 02:55 , Processed in 0.053255 second(s), 24 queries .

Powered by Discuz! X2.5

© 2001-2012 Comsenz Inc.

回顶部
TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569