unora

17 积分	0 好友	0 主题

发消息

enq: TX - index contention

1^#

发表于 2012-4-25 14:13:30 | 查看: 10118| 回复: 8

OS:AIX 6.1
version:10.2.0.5.4

请参看一下附件，包含4个文件
分别为：正常情况awr/异常awr/awrdd以及oracle support

非常严重的enq: TX - index contention等待，伴随非常高的failed probes on index block reclamation。

我们已经有3个数据库出现了这个问题。

出现频率非常低，每个数据库都只出现了1次，而且持续时间很短，oracle原厂说可能是个潜在的bug，但想抓到时间点做systemstate等操作基本上不可能。

烦请分析一下原因。谢谢

index split.rar

96.24 KB, 下载次数: 1672

分享0

收藏0 回复只看该作者道具举报

Maclean Liu(刘相兵

2^#

发表于 2012-4-25 15:24:02

10.2.0.5.0

failed probes on index block reclamation 113,196 31.60 0.87
branch node splits 15 0.00 0.00
leaf node 90-10 splits 94 0.03 0.00
leaf node splits 1,540 0.43 0.01
enq: TX - index contention 2,891 7,934 2,744 93.0 Concurrency
root node splits 1 0.00 0.00

可以看到在 bad awr中发生过 1540次的 index leaf block split  ， 15次的branch block split ，和  1次的root block split

Top SQL

7,979 15 22,248 0.36 93.51 7z0xy3tqz4ch1      insert into DLog (seq, action_...

enq: TX - index contention常由大量并发INSERT 造成的 index split 引起，这里可以看到引发enq: TX - index contention的是SQL :

insert into DLog (seq, action_code, cutoffday, bip_id, activity_id, activity_type, conv_id, odomain_id, osn_duns, hdomain_id, hsn_duns, icnt, dub_flag, status, rev_type, rev_desc) values (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16)

关于 index split的更多信息可以参考 http://www.oracledatabase12g.com ... 88%86%E6%9E%90.html

建议：

1. 定期对 DLog 相关的索引执行 coalesce 操作而非 rebuild 操作。  rebuild 可能导致索引高度下降，当再有大量插入后可能会导致索引root node split

2. 考虑使用global hash index  ，global hash index可以很大程度上避免索引插入争用

回复只看该作者道具举报

anbob

3^#

发表于 2012-4-25 15:28:24

把index加到更大的块的tablespace上

回复只看该作者道具举报

不了峰

4^#

发表于 2012-4-25 18:00:04

原帖由 anbob 于 2012-4-25 15:28 发表
把index加到更大的块的tablespace上

更大块，不是会引起更严重的竞争 ?

还有，刘总，他这个系统，应该都是insert DLog ,没有对表进行del
那对索引的 coalesce 还有意义吗？

回复只看该作者道具举报

Maclean Liu(刘相兵

5^#

发表于 2012-4-25 21:47:43

“failed probes on index block reclamation”  每秒 31.60次，这个statistcs说明了在split leaf node时实际split的进程进行了较多的search space工作，它的机制是这样的如果 Server process在leaf node的临近块找不到可用的block的话，它不会一开始就分配新的空间（这会造成index segment的空间增长) ，而是到该index segment的其他地方搜索可用的Free Block，这些Free Block的要求是 bitmap block显示这些block的status是75%-100% Free的， server process会扫描这些 75%-100% Free的block 并确认这些block 实际上是空的，如果找到 FULL Free Block则使用；如果没有则继续搜索，直到所有候选block都被检查过；这个行为叫做 probes on index block reclamation，若检查了一个block发现不符合要求，则累加 “failed probes on index block reclamation”  这个statistics。

probes on index block reclamation在有大量 75%-100% Free block的索引上会造成大量的 Buffer Get，如上述链接中我演示的那样。

而造成索引存在大量 75%-100% Free block 而又不完全为空的块 (NOT FULL FREE)原因是常见的是对表上(索引)的数据执行了不完全的删除。  若没有大量删除过索引上数据，不会造成大量 5%-100% Free block 而又不完全为空的 (NOT FULL FREE)块，也就不存在大量的“failed probes on index block reclamation”

实际上在以往的案例中我曾经遇到过完全一致的问题，其临时的解决方案就是不删除表上的历史数据。

Metalink bug 8286901 note中叙述了一位用户遇到相同的问题并提交了SR,当时oracle support给出了one-off补丁，但该用户在apply了该补丁后仍未解决问题。

以下为note 原文：
It is similar to bug8286901, but after applied patch8286901, still see enq tx
contentiona with high “failed probes on index block reclamation”

Issue encountered by customer and Oracle developer (Stefan Pommerenk).

He describes is thus:

"Space search performed by the index splitter can't find space in neighboring

blocks, and then instead of allocating new space, we go and continue to

search for space elsewhere, which manifests itself in block reads from disk,

block cleanouts, and subsequent blocks written due to aggressive MTTR

setting."

"To clarify: the cleanouts are not the problem per se. The culprit seems to

be that the space search performed by the index splitter can't find space in

neighboring blocks, and then instead of allocating new space, we go and

continue to search for space elsewhere, which manifests itself in block reads

from disk, block cleanouts, and subsequent blocks written due to aggressive

MTTR setting. This action has caused other sessions to get blocked on TX

enqueue contention, blocked on the splitting session. Advice was to set 10224

trace event for the splitter for a short time only in order to get

diagnostics as to why the space search rejected most blocks.

> A secondary symptom are the bitmap level 1 block updates, which may or may

not be related to the space search; I've not seen them before, maybe because

I didn't really pay attention :P , but the symptoms seen in the ASH trace

indicate it's the same problem. Someone in space mgmt has to look at it to

confirm it is the same problem."

与该用户进行了mail私下交流，他的回复：
I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.

I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question.

I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though.

Hi ,

Thanks for your sharing .  The bug (or specific ktsp behave) is fatal in response time sensitive  OLTP env.

I would like to ask my customer to coalesce those index where massive deleted regularly.

Thanks for your help again!

Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause.

Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far.

If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below.

select SS.snap_id,
SS.stat_name,
TO_CHAR(S.BEGIN_INTERVAL_TIME, ‘DAY’) DAY,
S.BEGIN_INTERVAL_TIME,
S.END_INTERVAL_TIME,
SS.value,
SS.value – LAG(SS.VALUE, 1, ss.value) OVER (ORDER BY SS.SNAP_ID) AS DIFF
from DBA_HIST_SYSSTAT SS,
DBA_HIST_SNAPSHOT S
where S.SNAP_ID = SS.SNAP_ID
AND SS.stat_NAME = ‘failed probes on index block reclamation’
ORDER BY SS.SNAP_ID ;

我们可以猜测Oracle提供的one-off补丁中可能是为叶块分裂所会扫描的“空块”附加了一个上限，在未达到上限的情况下扫描仍会发生。而在主流的公开的发行版本中Oracle不会引入该补丁的内容。

回复只看该作者道具举报

dla001

6^#

发表于 2012-6-27 17:47:16

而造成索引存在大量 75%-100% Free block 而又不完全为空的块 (NOT FULL FREE)原因是常见的是对表上(索引)的数据执行了不完全的删除。若没有大量删除过索引上数据，不会造成大量 5%-100% Free block 而又不完全为空的 (NOT FULL FREE)块，也就不存在大量的“failed probes on index block reclamation”

这里是笔误吧 5%-100%

回复只看该作者道具举报

zhouyh139

7^#

发表于 2013-3-14 10:52:24

server process会扫描这些 75%-100% Free的block 并确认这些block 实际上是空的，如果找到 FULL Free Block则使用；如果没有则继续搜索，直到所有候选block都被检查过；这个行为叫做 probes on index block reclamation，
如果是按这样分配，sever 进程为什么不直接查找100% 的 full free block 呢，而要到75%-100%中去检索呢？

回复只看该作者道具举报

digdeep

8^#

发表于 2013-6-12 13:08:00

zhouyh139 发表于 2013-3-14 10:52
server process会扫描这些 75%-100% Free的block 并确认这些block 实际上是空的，如果找到 FULL Free Bl ...

段头的BMB，根据block空间使用情况，将block分为了几类，大致是：0~%25，%25~%50，%50~%75，75%~100%。没有特别将100%的单独出来，所以index 块split时，要去search75%~100%的块。判断是否full free block.

回复只看该作者道具举报

digdeep

9^#

发表于 2013-6-12 13:09:39

Maclean Liu(刘相兵发表于 2012-4-25 21:47
“failed probes on index block reclamation” 每秒 31.60次，这个statistcs说明了在split leaf node时 ...

第一次学习到index 块具体的split原理。感谢。

回复只看该作者道具举报

返回列表

		自动登录	找回密码
密码			注册