Heavy IO during NBU backup of archived logs hangs the system; advice requested
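Operating system: AIX 6.1
Database version: ORACLE 10.2.0.4 RAC
Symptom: when the NBU script reaches the archived log backup step, IO on the SWAP disk instantly hits 100%, the whole system hangs, and applications cannot connect to the database. Things return to normal only some time after the NBU backup is cancelled, so the backup job has never been able to complete.
While watching crsd.log, I found the following message: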
32CAAMonitorHandler :: 0:Action Script /oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.imsdb1.vip! (timeout=60)
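At first I suspected Bug 6196746, but there is no excessive accumulation of racgmain processes. Because IO on the SWAP disk was at 100%, I then suspected a memory shortage, yet topas shows memory usage at about 50% and SWAP usage at about 30%, while the SWAP disk still shows a fairly high PI/PO rate.
The AWR report, logs, and system parameter file are attached. Any advice would be appreciated, thank you!
Judging from the AWR, parse-related waits dominate: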
Event                     Waits       Time(s)   Avg Wait(ms)   % Total Call Time   Wait Class
latch: library cache      124,490     31,081    250            226.0               Concurrency
cursor: pin S wait on X   1,334,178   13,678    10             99.5                Concurrency
CPU time                              3,508                    25.5
latch: shared pool        9,581       2,386     249            17.4                Concurrency
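As a quick sanity check on the figures above, the Avg Wait(ms) column can be recomputed as total wait time divided by the number of waits. The sketch below only reuses the numbers from the AWR excerpt; the helper name `avg_wait_ms` is made up for illustration and is not part of any Oracle tooling.

```python
# Figures copied from the AWR excerpt above: event -> (waits, total wait time in seconds).
events = {
    "latch: library cache": (124_490, 31_081),
    "cursor: pin S wait on X": (1_334_178, 13_678),
    "latch: shared pool": (9_581, 2_386),
}


def avg_wait_ms(waits, time_s):
    """Average wait per event in milliseconds (hypothetical helper name)."""
    return time_s * 1000.0 / waits


for name, (waits, time_s) in events.items():
    print(f"{name:<26} {avg_wait_ms(waits, time_s):8.1f} ms")
```

The two latch events average roughly a quarter of a second per wait, which is very long for a latch and fits the picture of processes being stalled rather than the latches simply being busy.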
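The LCK process may have hung at that time. Possible causes of such a hang include CRS not working properly, a resource shortage, and so on.
Some SYSTEM-level dumps from that moment would be needed. You can check whether alert.log shows that such a dump was generated.
There is no dump information in alert.log. Taking a system-level dump would require reproducing the failure, which would keep the application down for too long. I will see whether a maintenance window can be requested.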
Mon Mar 24 22:57:14 2014
Thread 1 advanced to log sequence 5827 (LGWR switch)
Current log# 2 seq# 5827 mem# 0: /dev/rimsh_redo02
Mon Mar 24 23:03:24 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:03:24 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:04:57 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:09:41 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:10:10 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:10:11 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:17:04 2014
Expanded controlfile section 15 from 1302 to 2604 records
Requested to grow by 1302 records; added 6 blocks of records
Mon Mar 24 23:25:47 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:25:58 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:26:01 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:26:02 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:26:13 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:35:28 2014
Thread 1 advanced to log sequence 5828 (LGWR switch)
Current log# 1 seq# 5828 mem# 0: /dev/rimsh_redo01
Mon Mar 24 23:40:34 2014
Thread 1 advanced to log sequence 5829 (LGWR switch)
Current log# 5 seq# 5829 mem# 0: /dev/rimsh_redo03
Mon Mar 24 23:45:17 2014
Thread 1 advanced to log sequence 5830 (LGWR switch)
Current log# 2 seq# 5830 mem# 0: /dev/rimsh_redo02
Mon Mar 24 23:53:50 2014
Thread 1 advanced to log sequence 5831 (LGWR switch)
Current log# 1 seq# 5831 mem# 0: /dev/rimsh_redo01
Tue Mar 25 00:03:56 2014
Thread 1 advanced to log sequence 5832 (LGWR switch)
Current log# 5 seq# 5832 mem# 0: /dev/rimsh_redo03
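To line the ORA-3136 warnings up against the backup schedule, it may help to count them per hour. The following is a minimal sketch that assumes the alert.log layout shown above (a timestamp line followed by the message line) and an English locale for day and month names; the function name `count_ora3136_per_hour` is hypothetical, and the sample is just the first few entries of the excerpt.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout assumed from the excerpt, e.g. "Mon Mar 24 23:03:24 2014".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def count_ora3136_per_hour(lines):
    """Count ORA-3136 warnings per hour in alert.log-style lines."""
    counts = Counter()
    current_ts = None
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp dates the messages that follow it.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # Not a timestamp line, so treat it as a message line.
        if "ORA-3136" in line and current_ts is not None:
            counts[current_ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


sample = """\
Mon Mar 24 22:57:14 2014
Thread 1 advanced to log sequence 5827 (LGWR switch)
Mon Mar 24 23:03:24 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:03:24 2014
WARNING: inbound connection timed out (ORA-3136)
Mon Mar 24 23:04:57 2014
WARNING: inbound connection timed out (ORA-3136)
"""

for hour, n in sorted(count_ora3136_per_hour(sample.splitlines()).items()):
    print(hour, n)
```

On a real system the same function could be fed the lines of the full alert.log and the hourly counts compared with the NBU job start and cancel times.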