- 最后登录
- 2014-2-24
- 在线时间
- 33 小时
- 威望
- 81
- 金钱
- 592
- 注册时间
- 2012-1-10
- 阅读权限
- 50
- 帖子
- 45
- 精华
- 0
- 积分
- 81
- UID
- 162
|
1#
发表于 2012-5-18 16:37:57
|
查看: 5905 |
回复: 1
OS: linux 2.6
Oracle: 10.2.0.5.0 两节点RAC
症状: 用户抱怨第一节点的CPU资源消耗殆尽,要求尽快解决;
登陆后使用TOP命令,输出如下
top - 09:54:50 up 34 days, 17:27, 13 users, load average: 10.15, 10.10,
Tasks: 286 total, 12 running, 274 sleeping, 0 stopped, 0 zombie
Cpu(s): 84.6% us, 15.4% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si
Mem: 17517944k total, 16596828k used, 921116k free, 1355284k buffers
Swap: 2097144k total, 50744k used, 2046400k free, 12154200k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20344 oracle 25 0 6287m 23m 17m R 50.2 0.1 4418:21 oracle
20309 oracle 25 0 6289m 40m 34m R 49.8 0.2 4418:45 oracle 20315 oracle 25 0 6289m 23m 17m R 49.8 0.1 4418:01 oracle
20332 oracle 25 0 6287m 23m 17m R 49.2 0.1 4412:39 oracle 10664 oracle 17 0 6477m 664m 473m R 43.8 3.9 0:06.01 oracle
20340 oracle 25 0 6287m 23m 17m R 33.5 0.1 4417:03 oracle
20348 oracle 25 0 6287m 23m 17m R 33.5 0.1 4420:16 oracle
20292 oracle 25 0 6289m 23m 17m R 33.2 0.1 4414:04 oracle 20357 oracle 25 0 6287m 23m 17m R 28.9 0.1 4418:29 oracle
20336 oracle 25 0 6287m 23m 17m R 27.2 0.1 4419:38 oracle
根据列出的top oracle进程spid,登陆数据库查看发现它们都在进行i/o slave wait等待,使用oradebug dump errorstack 3发现都在执行如下语句
select /*+ parallel(t,14) parallel_index(t,14) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */ count(*),count(distinct "ID"),sum(sys_op_o
pnsize("ID")),count(distinct "TYPE"),sum(sys_op_opnsize("TYPE")),count(distinct "TASK_ID"),sum(sys_op_opnsize("TASK_ID")),count("ATTR1"),count(distinct "ATTR1"),sum(sys_op_opnsi
ze("ATTR1")),substrb(dump(min(substrb("ATTR1",1,32)),16,0,32),1,120),substrb(dump(max(substrb("ATTR1",1,32)),16,0,32),1,120),count("ATTR2"),count(distinct "ATTR2"),sum(sys_op_op
nsize("ATTR2")),substrb(dump(min(substrb("ATTR2",1,32)),16,0,32),1,120),substrb(dump(max(substrb("ATTR2",1,32)),16,0,32),1,120),count("ATTR3"),count(distinct "ATTR3"),sum(sys_op
_opnsize("ATTR3")),substrb(dump(min(substrb("ATTR3",1,32)),16,0,32),1,120),substrb(dump(max(substrb("ATTR3",1,32)),16,0,32),1,120),count(case when "ATTR4" is null then null else
1 end),sum(sys_op_opnsize("ATTR4")),count("ATTR5"),count(distinct "ATTR5"),sum(sys_op_opnsize("ATTR5")),substrb(dump(min(substrb("ATTR5",1,32)),16,0,32),1,120),substrb(dump(max
(substrb("ATTR5",1,32)),16,0,32),1,120),count(case when "OTHER" is null then null else 1 end),sum(sys_op_opnsize("OTHER")) from "SYS"."WRI$_ADV_OBJECTS" t
SQL> select event,count(*) from v$session group by event order by 2;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
Streams AQ: waiting for messages in the queue 1
Streams AQ: qmn coordinator idle wait 1
Streams AQ: waiting for time management or cleanup tasks 1
class slave wait 1
pmon timer 1
smon timer 1
ges remote message 1
DIAG idle wait 1
ASM background timer 1
jobq slave wait 2
Streams AQ: qmn slave idle wait 2
gcs remote message 2
SQL*Net message to client 2
i/o slave wait 14
rdbms ipc message 14
SQL*Net message from client 16
SQL> select QCSID,SID ,SERVER_GROUP,SERVER_SET from V$PX_SESSION;
QCSID SID SERVER_GROUP SERVER_SET
---------- ---------- ------------ ----------
474 452 1 2
474 453 1 2
498 443 1 2
498 445 1 2
474 464 1 2
498 447 1 2
498 444 1 2
498 437 1 2
498 438 1 2
498 439 1 2
498 440 1 2
我的问题如下:
1.
Sql执行在第一节点,第二节点压力正常,但是v$session.machine均显示为第二节点的机器名,什么条件下会发生这种状况?
这些session都在执行sql,却为何v$session.sql_id/ PREV_SQL_ID均为空?
2.
我们知道,并行查询需要一个coordinator,但如何获取该进程sid?
此时貌似除了kill session,暂时没有好的解决办法,总共14个并行进程,是否需要一一kill,还是只需要kill coordinator进程?
烦劳麦克林恩解答一下
[ 本帖最后由 myownstars 于 2012-5-18 16:39 编辑 ] |
|