LGWR进程的trace里总是报20多秒的警告信息
-bash-3.2$ cat /etc/releaseOracle Solaris 10 8/11 s10s_u10wos_17b SPARC
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Assembled 23 August 2011
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
RAC环境,两台节点,系统已打补丁到11.2.0.2.10
SQL> select comments from dba_registry_history;
COMMENTS
------------------------------
Upgraded from 9.2.0.6.0
view invalidation
Upgraded from 10.2.0.4.0
Patchset 11.2.0.2.0
PSU 11.2.0.2.10
现在的问题是其中一台节点在commit的时候有时特别慢,另一台节点正常。
慢的节点AWR中log file sync事件在top5中,后来发现LGWR进程的trace文件中总是报20多秒的警告信息,跟业务人员一起观察发现,只要commit慢的时候,trace文件里就会出现这么长时间的警告信息。
初步判断为存储的问题,存储型号为sun 7320,存储中未发现异常日志,同时另一台节点是正常的,所以这个问题一直很困惑。
后来把redo都放在另一个新存储的卷里,还是有这个问题,新存储也是在7320这个机头下,只是不同的柜子。
我现在猜测的原因:
1.BUG,查看metalink里还未打的更新的PSU补丁里未提及相关BUG。
2.存储问题,虽然redo换存储了,但是还是在同一个机头下,机头BUG,或者光纤卡BUG。
希望各位高手能帮助解决此问题!还需要什么信息我再发上来。 本帖最后由 wushi8 于 2013-11-15 14:36 编辑
asm磁盘信息:
SQL> select group_number,os_mb,total_mb,free_mb from v$asm_disk;
GROUP_NUMBER OS_MB TOTAL_MB FREE_MB
------------ ---------- ---------- ----------
2 1484467 1484467 667684
3 1638002 1638002 620978
5 573052 573052 251989
6 716417 716417 314081
1 573052 573052 523928
4 1855 1855 1546
4 1855 1855 1546
4 1855 1855 1547
7 614004 614004 613403
asm磁盘组信息
SQL> select group_number,name,state,type,total_mb,free_mb,cold_used_mb from v$asm_diskgroup;
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB COLD_USED_MB
------------ ------------------------------ ----------- ------ ---------- ---------- ------------
1 ARCH CONNECTED EXTERN 573052 523492 49560
2 DATA CONNECTED EXTERN 1484467 667684 816783
3 IDX CONNECTED EXTERN 1638002 620978 1017024
4 OCR MOUNTED NORMAL 5565 4639 926
5 SYS CONNECTED EXTERN 573052 251989 321063
6 UNDO CONNECTED EXTERN 716417 314081 402336
7 REDO CONNECTED EXTERN 614004 613403 601 1.log file parallel write的时间如何
2.lgwr进程的优先级是否正常
3._use_adaptive_log_file_sync是否TRUE(如果true, lgwr的trace里是否有类似“Log file sync switching to polling”
信息) 补充AWR
SGA+PGA 用到了物理内存的 80%
Begin End
Host Mem (MB): 262,144.0 262,144.0
SGA use (MB): 204,800.0 204,800.0
PGA use (MB): 3,377.0 5,685.4
% Host Mem used for SGA+PGA: 79.41 80.29
Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time
db file parallel write 538,429 0 5,612 10 0.25 11.90
gcs log flush sync 583,193 63 5,158 9 0.27 10.94
gcs log flush sync 超时比例达到63%
Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 0.0
Avg global cache cr block receive time (ms): 0.8
Avg global cache current block receive time (ms): 0.7
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 0.0
Avg global cache cr block flush time (ms):
Avg global cache current block pin time (ms): 545.7
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.3
Avg global cache current block flush time (ms): 4,922.9
Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time (ms): 0.1
Avg message sent queue time on ksxp (ms): 3.3
Avg message received queue time (ms): 0.1
Avg GCS message process time (ms): 0.1
Avg GES message process time (ms): 0.0
% of direct sent messages: 55.58
% of indirect sent messages: 43.44
% of flow controlled messages: 0.98
Avg global cache current block flush time (ms): 4,922.9
Avg message sent queue time on ksxp (ms): 3.3
心跳网络的状况不好,
IOStat by Function summary
'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
ordered by (Data Read + Write) desc
Function Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Waits: Count Avg Tm(ms)
Others 34.7G 30.87 2.46551 30.7G 6.59 2.18212 434.3K 1.79
DBWR 0M 0.00 0M 50.9G 179.60 3.61591 0
Buffer Cache Reads 43.6G 348.35 3.09509 0M 0.00 0M 4998.8K 14.86
LGWR 8M 0.09 .000554 30.4G 76.55 2.15550 947.7K 6.88
Direct Writes 3M 0.00 .000207 1.6G 0.67 .116461 0
Direct Reads 410M 0.71 .028422 109M 0.09 .007556 0
TOTAL: 78.7G 380.02 5.58979 113.8G 263.51 8.07755 6380.8K 12.79
考虑:
1、关闭一个节点并观察现象
2、使用IO calibrate测一下IO
页:
[1]