Posted on 2015-5-26 10:31:22
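Version: 11.2.0.3.0, single instance
Platform: Red Hat Linux 6.5 x86-64
Hardware: 128 GB memory; CPU: 4 x 8-core E7-4820 @ 2.00GHz; storage: EMC VNX
Problem description: at 21:11 on May 13, users and applications could not connect to the database; the system returned to normal after 21:22.
The file 主要错误日志.txt (main error log) in the attachment contains the main error messages found so far:
21:08:02: the system log shows hald failing to allocate memory; at 21:09, Oracle processes fail to allocate memory.
21:08:54: the alert log shows "Fatal NI connect error 12170" and similar errors. From 21:11:17, "opiodr aborting process unknown ospid (8931) as a result of ORA-609" errors begin. At 21:23:01 the alert log returns to normal. In between, the following errors appear: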
Errors in file /opt/ora11/diag/rdbms/gnntpri/gnnt/trace/gnnt_ora_19004.trc (incident=382849):
ORA-03137: TTC protocol internal error : [12333] [19] [3] [15] [] [] [] []
Errors in file /opt/ora11/diag/rdbms/gnntpri/gnnt/trace/gnnt_cjq0_18684.trc (incident=377041):
ORA-00445: background process "J000" did not start after 120 seconds
kkjcre1p: unable to spawn jobq slave process
Errors in file /opt/ora11/diag/rdbms/gnntpri/gnnt/trace/gnnt_cjq0_18684.trc:
Wed May 13 21:20:35 2015
gnnt_cjq0_18684.trc shows memory running out:
*** 2015-05-13 21:12:37.425
loadavg : 328.48 154.27 69.98
Memory (Avail / Total) = 265.17M / 129034.95M
Swap (Avail / Total) = 27197.34M / 32767.99M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 9104' | /bin/grep -v grep timed out after 14.975 seconds
*** 2015-05-13 21:13:07.535
loadavg : 640.50 255.66 107.24
Memory (Avail / Total) = 267.29M / 129034.95M
Swap (Avail / Total) = 26915.58M / 32767.99M
*** 2015-05-13 21:13:37.173
loadavg : 633.26 291.48 123.93
Memory (Avail / Total) = 271.08M / 129034.95M
Swap (Avail / Total) = 26561.19M / 32767.99M
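To make the three snapshots above easier to compare, here is a minimal Python sketch that pulls the load average, memory and swap figures out of trace text and derives the available-memory percentage and the swap in use. The function names `parse_snapshots` and `summarize` are made up for illustration, and the regular expressions only assume the line layout visible in this excerpt; other trace files may differ.

```python
import re

# Lines copied from the gnnt_cjq0_18684.trc excerpt in this post.
TRACE = """\
*** 2015-05-13 21:12:37.425
loadavg : 328.48 154.27 69.98
Memory (Avail / Total) = 265.17M / 129034.95M
Swap (Avail / Total) = 27197.34M / 32767.99M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 9104' | /bin/grep -v grep timed out after 14.975 seconds
*** 2015-05-13 21:13:07.535
loadavg : 640.50 255.66 107.24
Memory (Avail / Total) = 267.29M / 129034.95M
Swap (Avail / Total) = 26915.58M / 32767.99M
*** 2015-05-13 21:13:37.173
loadavg : 633.26 291.48 123.93
Memory (Avail / Total) = 271.08M / 129034.95M
Swap (Avail / Total) = 26561.19M / 32767.99M
"""

STAMP = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
LOAD = re.compile(r"^loadavg : ([\d.]+)")
PAIR = re.compile(r"^(Memory|Swap) \(Avail / Total\) = ([\d.]+)M / ([\d.]+)M")


def parse_snapshots(text):
    """Return one dict per '***' timestamp block found in the trace text."""
    snapshots = []
    for line in text.splitlines():
        m = STAMP.match(line)
        if m:
            snapshots.append({"time": m.group(1)})
            continue
        if not snapshots:
            continue  # skip anything before the first timestamp
        m = LOAD.match(line)
        if m:
            snapshots[-1]["load1"] = float(m.group(1))  # 1-minute load average
            continue
        m = PAIR.match(line)
        if m:
            key = m.group(1).lower()  # "memory" or "swap"
            snapshots[-1][key + "_avail_mb"] = float(m.group(2))
            snapshots[-1][key + "_total_mb"] = float(m.group(3))
    return snapshots


def summarize(snap):
    """Derive available-memory percentage and swap in use for one snapshot."""
    return {
        "time": snap["time"],
        "load1": snap["load1"],
        "memory_avail_pct": 100.0 * snap["memory_avail_mb"] / snap["memory_total_mb"],
        "swap_used_mb": snap["swap_total_mb"] - snap["swap_avail_mb"],
    }


if __name__ == "__main__":
    for snap in parse_snapshots(TRACE):
        print(summarize(snap))
```

Lines that match none of the patterns, such as the skgpgcmdout timeout message, are simply ignored.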
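(Question: is the memory usage shown here the real memory in use, excluding the filesystem cache, or has the filesystem cache already shrunk to its minimum?)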
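The excerpt itself does not say whether the trace's "Avail" figure counts page cache. One way to check on a live Linux host is to read /proc/meminfo and look at free memory with and without buffers and cache. The sketch below is only that, a sketch: the helper names are hypothetical, the `SAMPLE` values are invented round numbers (not taken from this incident), and it relies only on the standard `MemTotal`, `MemFree`, `Buffers` and `Cached` fields.

```python
# Invented example values in /proc/meminfo layout; NOT data from the incident.
SAMPLE = """\
MemTotal:        1048576 kB
MemFree:            2048 kB
Buffers:            1024 kB
Cached:           512000 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_views_mb(info):
    """Return free memory in MB, both excluding and including buffers/cache."""
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total_mb": info["MemTotal"] / 1024,
        "free_mb": free / 1024,
        "free_plus_cache_mb": (free + cache) / 1024,
    }


if __name__ == "__main__":
    print(memory_views_mb(parse_meminfo(SAMPLE)))
```

On a real host the text would come from `open("/proc/meminfo").read()`. If `free_mb` is tiny but `free_plus_cache_mb` is large, much of the "used" memory is cache. Note that not all of `Cached` is necessarily reclaimable, since it also includes shared-memory (tmpfs) pages.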
21:18:02: the listener log shows
TNS-12518: TNS:listener could not hand off client connection
TNS-12540: TNS:internal limit restriction exceeded
21:19:40: the listener restarted
13-MAY-2015 21:19:40 * establish * 1159
TNS-01159: Internal connection limit has been reached; listener has shut down
TNS-12540: TNS:internal limit restriction exceeded
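In AWR, the main wait for this period is library cache: mutex, concentrated mostly on auditing, specifically the insert into aud$ statement; the database has logon/logoff auditing enabled.
ASH data for 9:00-9:30 PM (21:00-21:30) was sampled in separate windows. Before 21:07 the system was fairly normal and the main wait event was log file sync. From 21:07 to 21:11 the load rose and the main wait event was latch: row cache objects. There is no ASH data between 21:11 and 21:20:20. From 21:20 to 21:27 the main wait event was library cache: mutex X.
In AWR, DB time is spent mostly on parsing, yet the parse counts, the SQL ordered by parse calls, and the SQL ordered by version count all look fairly normal.
In ASH, the blocker for the library cache: mutex X waits is DIAG.
Could the experts here please help analyze this?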
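For anyone who wants to repeat this kind of window-by-window breakdown on exported ASH rows, a rough Python sketch follows. It assumes the samples have already been exported as (sample time, event) pairs. The function name `top_event_per_window` and the few `SAMPLES` rows are made up for illustration and are not the actual ASH data from this incident.

```python
from collections import Counter
from datetime import datetime


def parse_time(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def top_event_per_window(samples, boundaries):
    """Find the most frequent event in each [start, end) window.

    samples: list of (datetime, event_name) pairs.
    boundaries: sorted list of datetimes; consecutive pairs form the windows.
    Returns a list of (start, end, top_event, count); top_event is None
    and count is 0 when a window holds no samples at all.
    """
    results = []
    for start, end in zip(boundaries, boundaries[1:]):
        counts = Counter(event for ts, event in samples if start <= ts < end)
        if counts:
            event, n = counts.most_common(1)[0]
        else:
            event, n = None, 0
        results.append((start, end, event, n))
    return results


# Synthetic rows for illustration only.
SAMPLES = [
    (parse_time("2015-05-13 21:05:10"), "log file sync"),
    (parse_time("2015-05-13 21:06:20"), "log file sync"),
    (parse_time("2015-05-13 21:08:30"), "latch: row cache objects"),
    (parse_time("2015-05-13 21:09:40"), "latch: row cache objects"),
    (parse_time("2015-05-13 21:21:00"), "library cache: mutex X"),
]

# Window edges taken from the times mentioned in the post.
BOUNDARIES = [
    parse_time("2015-05-13 21:00:00"),
    parse_time("2015-05-13 21:07:00"),
    parse_time("2015-05-13 21:11:00"),
    parse_time("2015-05-13 21:20:00"),
    parse_time("2015-05-13 21:27:00"),
]

if __name__ == "__main__":
    for start, end, event, n in top_event_per_window(SAMPLES, BOUNDARIES):
        label = event if event is not None else "(no samples)"
        print(start.time(), "-", end.time(), label, n)
```

A window that comes back with no samples, like 21:11 to 21:20 here, is itself a finding: it suggests the sampler could not record anything during that period.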
Attachment: 日志.rar (355.8 KB)