clevernby 发表于 2014-2-25 15:27:14

11.2.0.2.0 RAC数据库onnn进程高CPU消耗诊断一例

本帖最后由 clevernby 于 2014-2-25 15:45 编辑

Database 11.2.0.2.0 Linux x86-64
Grid 11.2.0.2.0 Linux x86-64
opatch lsinv 12378147
2节点RAC

同事反映测试用数据库响应速度变慢,登录操作系统执行top命令发现实例1的多个onnn进程CPU占用率100%,如下
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
5916 oracle    25   0 41.2g  20m  17m R 100.3  0.0  30833:01 ora_o002_xxxx1
16727 oracle    25   0 41.2g  19m  16m R 100.3  0.0  50928:39 ora_o001_xxxx1
......
onnn系ASM连接池进程(ASM Connection Pool Process),维护ASM实例连接,这些连接被用于执行ASM元数据相关操作(Maintains a connection to the ASM instance for metadata operations)。

1. Oracle Call Stack分析
每隔5秒采样一次o002进程的call stack,结果如下:
(1)
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1900<-sspuser()+112<-__sighandler()<-FFFFFFFFFF60008E<-dbgtTrc_int()+788<-ksvreceive()+3059<-kfncPoolCb()+2323<-ksvrdp()+1779<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
(2)
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1900<-sspuser()+112<-__sighandler()<-dbgtTrc_int()+202<-ksvreceive()+3059<-kfncPoolCb()+2323<-ksvrdp()+1779<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
(3)
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1900<-sspuser()+112<-__sighandler()<-dbgtTrc_int()+75<-ksvreceive()+3059<-kfncPoolCb()+2323<-ksvrdp()+1779<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
值得注意的call stack信息为ksvrdp->kfncPoolCb->ksvreceive。此处存在明细spin迹象

2. level 10 processstate分析
经分析current wait stack为class slave wait,该事件系idle等待事件
0: waiting for 'class slave wait'
    slave id=0x0, =0x0, =0x0
    wait_id=2 seq_num=3 snap_id=1
    wait times: snap=388840 min 29 sec, exc=388840 min 29 sec, total=388840 min 29 sec
    wait times: max=infinite, heur=388840 min 29 sec
    wait counts: calls=14855 os=14855
    in_wait=1 iflags=0x15a0
3. OS call分析
使用ltrace attach ospid 5916,结果如下
gettimeofday(0x7fff1ce308c0, NULL) = 0
gettimeofday(0x7fff1ce308c0, NULL) = 0
gettimeofday(0x7fff1ce308c0, NULL) = 0
gettimeofday(0x7fff1ce308c0, NULL) = 0
gettimeofday(0x7fff1ce308c0, NULL) = 0
......
此处存在明细spin迹象

综上信息判断触发Oracle缺陷可能性大,据此在MOS搜关键字“ksvrdp kfncPoolCb ksvreceive onnn”发现与Problem 1459376.1极度相似,该问题由Bug 10285394触发。
Bug 10285394 - ORA-700 when using ASM (Doc ID 10285394.8)
补丁下载界面中有以下提示信息值得注意
DBMV2: ORA-700 SIGNALED BY CKPT AFTER EXHAUSTED FRA (10285394)
FRA耗尽,查询v$recovery_area_usage、alert日志均未发现有FRA耗尽过迹象,但alert日志中有记录发生过归档失败(未使用FRA),或许归档失败也会诱发此缺陷吧。

Maclean Liu(刘相兵 发表于 2014-2-25 16:38:48

ora_o00n Process High CPU Usage in 11.2.0.2 (Doc ID 1459376.1)

APPLIES TO:

Oracle Server - Enterprise Edition - Version 11.2.0.2 to 11.2.0.2.0
Information in this document applies to any platform.
SYMPTOMS

11.2.0.2 database on ASM, the ora_o00n process consumes high CPU:

top command output
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND         
  
2921 oracle    25   0 1715m  17m  15m R 99.5  0.1 658:32.91 ora_o001_trng


Oracle process dump:
Call stack:  .. dbgtTrc_int ksvreceive kfncPoolCb ksvrdp opirip ..

Current Wait Stack:
     0: waiting for 'class slave wait'
        slave id=0x0, =0x0, =0x0
        wait_id=139 seq_num=140 snap_id=1
        wait times: snap=358705 min 17 sec, exc=358705 min 17 sec, total=358705 min 17 sec



OS truss/strace command shows the process is spinning on OS call gettimeofday
CAUSE

Due to bug 10285394

Multiple duplicates open: bug 13767869 bug 13789248 bug 13767461 bug 13918332 bug 13952464 bug 12929268  bug 12671004 bug 13363390 bug 14044697

SOLUTION

Refer to note 10285394.8 for more info about the bug.

REFERENCES

BUG:12929268 - HIGH CPU ON ORA_O00N PROCESS
页: [1]
查看完整版本: 11.2.0.2.0 RAC数据库onnn进程高CPU消耗诊断一例