Avg message sent queue time 很高问题
本帖最后由 SKYLINE.LIU 于 2013-12-8 16:11 编辑数据库版本 11.1.0.7.0 RAC
最近正在做SQL调优,在节点1 的AWR中,Avg message sent queue time (ms) 高的有点离谱,2s~5s
% of indirect sent messages 大于50%
查看数据库服务器的网卡信息为100M 1000M自适应网卡,硬盘读写性能比较差,硬件升级正在申报中,但新硬件什么时候能下来就要看领导心情了。
基于目前的硬件环境,如何降低Avg message sent queue time?请各位专家给支支招!
节点1 的网卡信息#lsdev -Cc adapter|grep ent
ent8 Available 00-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent9 Available 00-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent10 Available 0A-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent11 Available 0A-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent12 Available 0B-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent13 Available 0B-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
# netstat -v en8 |grep -i Speed
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
# netstat -v en12 |grep -i Speed
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
节点2 的网卡信息
#lsdev -Cc adapter|grep ent
ent8 Available 01-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent9 Available 01-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent10 Available 0C-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent11 Available 0C-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent12 Available 0D-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent13 Available 0D-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
# netstat -v en8 |grep -i Speed
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
# netstat -v en12 |grep -i Speed
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
table fetch continued row 145,976,684 4,502.40 1,144.49
table fetch continued row 240,636,086 7,422.13 1,441.22
Segments by Direct Physical Reads
Total Direct Physical Reads: 81,426,283
Captured Segments account for 97.7% of Total
Owner Tablespace Name Object Name Subobject Name Obj. Type Direct Reads %Total
ZFBA ZFBA T_DOC_DOCINSTANCE TABLE 39,277,014 48.24
ZFBA ZFBA T_DOC_WSQX TABLE 33,007,280 40.54
ZFBA ZFBA T_DOC_WSQXHIS TABLE 6,004,372 7.37
ZFBA ZFBA SYS_LOB0000091928C00007$$ LOB 820,060 1.01
ZFBA ZFBA T_RY_JJRYXX TABLE 325,926 0.40
Back to Segment Statistics
Back to Top
Segments by Physical Writes
Total Physical Writes: 1,076,923
Captured Segments account for 60.2% of Total
Owner Tablespace Name Object Name Subobject Name Obj. Type Physical Writes %Total
ZFBA ZFBA SYS_LOB0000108886C00013$$ LOB 253,301 23.52
ZFBA ZFBA_IDX INDEX_DOC_PARAVALUES_IDCODE INDEX 113,754 10.56
ZFBA ZFBA SYS_LOB0000091928C00007$$ LOB 46,038 4.27
ZFBA ZFBA_IDX INDEX_DOC_SPRECORDEX_KEYVALUE INDEX 13,577 1.26
ZFBA ZFBA T_DOC_WSQX TABLE 11,220 1.04
可能是LOB 引起的table fetch continued row ,或者本身有chained/migrated rows Avg message sent queue time (ms): 2,560.6
Cluster 5,491,385 0 20,214 4 7.38
我更倾向于 这个 queue time的超长平均等待是 AWR数据讹误或者 偶发的现象 T_DOC_DOCINSTANCE 这个表中有三个CLOB字段,其中只有一个CLOB字段有数据,保存WORD内容
T_DOC_WSQX 与 T_DOC_WSQXHIS 表中没有大字段,但涉及该表的查询语句条件可选择性太差,让ORACLE工程师来调过几次,都没有什么好的办法。
新获取了周日全天的AWR,节点1的 Avg message sent queue time 还是很高。
系统中 链接行/行链接 超过1000 行的表,这些表属于基础业务表,用户及公司都不建议整理这些表。
TABLE_NAME NUM_ROWS CHAIN_CNT LAST_ANALYZED
T_RY_JJRYXX 5573006 245154 2013/10/27 12:06:33
T_AJ_JJDJ 5682656 179980 2013/10/23 22:07:48
T_DOC_PROCESSINSTANCE 2348472 145901 2013/10/29 12:24:43
T_DOC_DOCINSTANCE 5208553 10640 2013/10/28 22:34:20
TMP_SJ_JF 24466 3247 2013/9/14 12:17:39
T_RY_JCYYJXX 84941 2522 2013/10/29 12:14:16
T_AJBL_XWBL 369876 2161 2013/10/30 22:16:49
T_AJ_SSAJRYDY 188486 1184 2013/10/26 6:02:49
T_TJBB_FZCLCS 318797 1086 2013/10/29 22:12:22
T_RY_RYCLXX 833368 1048 2013/10/30 22:01:37 听你的 AWR调优鹰眼二中讲, private network 建议选用10GB带宽。
这样看来,服务器的内部网卡是否也是瓶颈之一呢?即便再快的硬盘读写速度,网卡最快也就125M/s。 Operating System Statistics - Detail
Snap Time Load %busy %user %sys %idle %iowait
08-Dec 00:00:07 0.42
08-Dec 01:00:11 0.63 3.65 1.35 2.30 2.26 96.35
08-Dec 02:00:16 1.12 3.43 1.15 2.28 2.08 96.57
08-Dec 03:00:20 1.39 3.74 1.71 2.04 0.91 96.26
08-Dec 04:00:25 1.09 3.09 0.97 2.13 0.21 96.91
08-Dec 05:00:16 0.97 3.06 0.93 2.13 0.16 96.94
08-Dec 06:00:34 1.04 2.89 0.86 2.03 0.15 97.11
08-Dec 07:00:17 0.73 3.00 0.88 2.12 0.15 97.00
08-Dec 08:00:38 1.32 3.13 1.00 2.13 0.33 96.87
08-Dec 09:00:41 1.90 4.23 2.12 2.11 1.13 95.77
08-Dec 10:00:01 1.27 5.08 4.34 0.74 1.94 94.92
08-Dec 11:00:19 1.62 3.73 3.19 0.54 2.45 96.27
08-Dec 12:00:10 1.95 2.86 2.37 0.48 2.10 97.14
08-Dec 13:00:35 1.24 1.46 1.06 0.41 0.81 98.54
08-Dec 14:00:04 0.67 1.92 1.50 0.43 1.40 98.08
08-Dec 15:00:15 0.66 1.75 1.30 0.45 1.20 98.25
08-Dec 16:00:15 0.59 1.83 1.33 0.50 0.89 98.17
08-Dec 17:00:35 1.02 2.33 1.82 0.51 0.78 97.67
08-Dec 18:00:53 1.68 4.60 4.09 0.51 2.88 95.40
08-Dec 19:00:10 0.97 1.97 1.50 0.46 1.55 98.03
08-Dec 20:00:22 1.14 1.80 1.35 0.45 0.71 98.20
08-Dec 21:00:24 1.82 3.21 2.72 0.49 0.89 96.79
08-Dec 22:00:50 0.93 3.03 2.50 0.53 3.29 96.97
08-Dec 23:00:04 1.83 2.25 1.77 0.48 1.09 97.75
09-Dec 00:00:11 0.63 1.75 1.32 0.43 1.04 98.25
还是先将IO wait降下来, 再讨论该问题, 有可能解决了 首要问题 后 ,次要问题同时被化解 Hdr: 6032232 10.2.0.2 RDBMS 10.2.0.2 SVRMAN AWR PRODID-5 PORTID-226
Abstract: AWR REPORT SHOWS INVALID OR HIGH VALUES FOR AVG MESSAGE SENT QUEUE TIME
不知道这个BUG修复了没有 Liu Maclean(刘相兵 发表于 2013-12-9 19:54 static/image/common/back.gif
Hdr: 6032232 10.2.0.2 RDBMS 10.2.0.2 SVRMAN AWR PRODID-5 PORTID-226
Abstract: AWR REPORT SHOWS INVAL ...
我在论坛搜了半天也没搜到,刘大能否给个patch number/博文的标题,明天我去单位查查补丁打了没有~ 非常感谢刘大的耐心回答,受教了!
页:
[1]