合 Oracle等待事件队列等待之TX - allocate ITL entry引起的死锁处理
Tags: Oracle故障处理等待事件死锁TX - allocate ITL entry案例块级死锁
前言部分
导读和注意事项
各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~:
① enq: TX - allocate ITL entry等待事件的解决
② 一般等待事件的解决办法
③ 队列等待的基本知识
④ ITL死锁解决
⑤ ITL死锁模拟
⑥ Merge语句的非关联形式的查询优化
故障分析及解决过程
故障环境介绍
项目 | source db |
---|---|
db 类型 | RAC |
db version | 11.2.0.3.0 |
db 存储 | ASM |
OS版本及kernel版本 | AIX 64位 7.1.0.0 |
故障发生现象及故障分析解决
早上刚来上班,同事就发了一个SQL过来,说是有锁,然后我就查了查系统里的锁,结果一个锁都没得。好吧,还是得干点事的,先看看SQL语句:
MERGE INTO TLHR.TLHRBOKBAL S
USING (SELECT A.BOOKACCOUNT AS BOOKACCOUNT,
(A.CURRBALANCE + NVL(B.BAL, 0.00)) AS BANKAMT
FROM TLHR.TLHRBOKBAL_TMP A,
(SELECT T1.BOOKACCOUNT AS BOOKACCOUNT,
SUM(DECODE(T1.DCFLAG, 'D', -T1.AMT, 'C', T1.AMT, 0)) AS BAL
FROM TLHR.TLHRBOKBALJN T1
WHERE T1.BOOKACCOUNT LIKE '13500000%'
AND T1.TRANDATE = '20150901'
AND (T1.REASON IN ('1', '2') OR
(T1.REASON = '0' AND T1.ONLINEFLAG = '1'))
GROUP BY T1.BOOKACCOUNT) B
WHERE A.BOOKACCOUNT = B.BOOKACCOUNT(+)
AND A.BOOKACCOUNT LIKE '13500000%') T
ON (S.BOOKACCOUNT = T.BOOKACCOUNT)
WHEN MATCHED THEN
UPDATE
SET S.LASTBALANCE = T.BANKAMT,
S.CURRBALANCE = T.BANKAMT,
S.DEBITAMT = 0.00,
S.CREDITAMT = 0.00;
看起来是一个MERGE语句,按照小麦苗以前的经验,这一类的SQL最好是修改为MERGE的非关联形式比较好,我们先看看执行计划有没有问题:
先找到SQL_ID为53qv858pwwwwb:
SELECT a.ELAPSED_TIME,a.EXECUTIONS,a.* FROM v$sql a WHERE a.SQL_TEXT LIKE '%MERGE INTO TLHRBOKBAL S%' AND A.SQL_TEXT LIKE '%13500000%' ;
查询历史执行计划:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR(SQL_ID => '53qv858pwwwwb' )) ;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | Plan hash value: 2695089823 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | MERGE STATEMENT | | | | | 155K(100)| | | 1 | MERGE | TLHRBOKBAL | | | | | | | 2 | VIEW | | | | | | | | 3 | HASH JOIN RIGHT OUTER | | 153K| 15M| | 155K (2)| 00:31:04 | | 4 | VIEW | | 1 | 31 | | 6 (0)| 00:00:01 | | 5 | SORT GROUP BY | | 1 | 41 | | 6 (0)| 00:00:01 | | 6 | TABLE ACCESS BY INDEX ROWID| TLHRBOKBALJN | 1 | 41 | | 6 (0)| 00:00:01 | | 7 | INDEX RANGE SCAN | PK_TLHRBOKBALJN | 2 | | | 4 (0)| 00:00:01 | | 8 | HASH JOIN | | 153K| 10M| 5848K| 155K (2)| 00:31:04 | | 9 | TABLE ACCESS FULL | TLHRBOKBAL_TMP | 153K| 4048K| | 85415 (2)| 00:17:05 | | 10 | TABLE ACCESS FULL | TLHRBOKBAL | 305K| 13M| | 68755 (3)| 00:13:46 | ------------------------------------------------------------------------------------------------------------ |
可以看到,该执行计划的顺序为【7-->6-->5-->4-->9-->10-->8-->3-->2-->1-->0】,而耗费性能的地方在9、10、8这3个步骤上,走的是全表扫描,我们先看看2个大表的数据量:
SELECT COUNT(*) FROM TLHR.TLHRBOKBAL_TMP A WHERE A.BOOKACCOUNT LIKE '13500000%'; --306043/38998765
SELECT COUNT(*) FROM TLHR.TLHRBOKBAL A WHERE A.BOOKACCOUNT LIKE '13500000%'; --306043/38826275
从3000万的数据里边取出30万的数据,还是比较少的,所以应该去走索引的,看了一下统计信息,也是最新收集的,好吧,算了,先修改一下SQL让其走索引扫描看看,:
MERGE INTO TLHR.TLHRBOKBAL S
USING (SELECT S.ROWID ROWIDS,
A.BOOKACCOUNT AS BOOKACCOUNT,
(A.CURRBALANCE + NVL(B.BAL, 0.00)) AS BANKAMT
FROM (SELECT /*+index(NB,PK_TLHRBOKBAL_TMP)*/NB.CURRBALANCE,NB.BOOKACCOUNT
FROM TLHR.TLHRBOKBAL_TMP NB
WHERE NB.BOOKACCOUNT LIKE '13500000%') A,
TLHR.TLHRBOKBAL S,
(SELECT T1.BOOKACCOUNT AS BOOKACCOUNT,
SUM(DECODE(T1.DCFLAG, 'D', -T1.AMT, 'C', T1.AMT, 0)) AS BAL
FROM TLHR.TLHRBOKBALJN T1
WHERE T1.BOOKACCOUNT LIKE '13500000%'
AND T1.TRANDATE = '20150901'
AND (T1.REASON IN ('1', '2') OR
(T1.REASON = '0' AND T1.ONLINEFLAG = '1'))
GROUP BY T1.BOOKACCOUNT) B
WHERE A.BOOKACCOUNT = B.BOOKACCOUNT(+)
AND S.BOOKACCOUNT = A.BOOKACCOUNT
AND S.BOOKACCOUNT LIKE '13500000%') T
ON (T.ROWIDS = S.ROWID)
WHEN MATCHED THEN
UPDATE
SET S.LASTBALANCE = T.BANKAMT,
S.CURRBALANCE = T.BANKAMT,
S.DEBITAMT = 0.00,
S.CREDITAMT = 0.00
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | Plan Hash Value : 273017430 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------------------------------------------------------------------- | 0 | MERGE STATEMENT | | 152885 | 4280780 | 283362 | 00:56:41 | | 1 | MERGE | TLHRBOKBAL | | | | | | 2 | VIEW | | | | | | | 3 | NESTED LOOPS | | 152885 | 20945245 | 283362 | 00:56:41 | | * 4 | HASH JOIN RIGHT OUTER | | 152885 | 14065420 | 130342 | 00:26:05 | | 5 | VIEW | | 124 | 3844 | 15668 | 00:03:09 | | 6 | SORT GROUP BY | | 124 | 5084 | 15668 | 00:03:09 | | * 7 | TABLE ACCESS BY INDEX ROWID | TLHRBOKBALJN | 124 | 5084 | 15668 | 00:03:09 | | * 8 | INDEX RANGE SCAN | PK_TLHRBOKBALJN | 165 | | 15501 | 00:03:07 | | * 9 | HASH JOIN | | 152885 | 9325985 | 114671 | 00:22:57 | | 10 | TABLE ACCESS BY INDEX ROWID | TLHRBOKBAL_TMP | 153563 | 4146201 | 112930 | 00:22:36 | | * 11 | INDEX RANGE SCAN | PK_TLHRBOKBAL_TMP | 153563 | | 1159 | 00:00:14 | | * 12 | INDEX RANGE SCAN | PK_TLHRBOKBAL | 152884 | 5198056 | 1117 | 00:00:14 | | 13 | TABLE ACCESS BY USER ROWID | TLHRBOKBAL | 1 | 45 | 1 | 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 4 - access("NB"."BOOKACCOUNT"="B"."BOOKACCOUNT"(+)) * 7 - filter("T1"."REASON"='0' AND "T1"."ONLINEFLAG"='1' OR "T1"."REASON"='1' OR "T1"."REASON"='2') * 8 - access("T1"."BOOKACCOUNT" LIKE '13500000%' AND "T1"."TRANDATE"='20150901') * 8 - filter("T1"."BOOKACCOUNT" LIKE '13500000%' AND "T1"."TRANDATE"='20150901') * 9 - access("S"."BOOKACCOUNT"="NB"."BOOKACCOUNT") * 11 - access("NB"."BOOKACCOUNT" LIKE '13500000%') * 11 - filter("NB"."BOOKACCOUNT" LIKE '13500000%') * 12 - access("S"."BOOKACCOUNT" LIKE '13500000%') * 12 - filter("S"."BOOKACCOUNT" LIKE '13500000%') |
执行计划中,基本都走了索引了,跑了一下,大约1分种多,但是里边有个HINTS,分析了一下表TLHRBOKBAL_TMP上的索引情况,发现是个主键索引,且有2个列(BOOKACCOUNT,CURRENCY),但是不包含列CURRBALANCE,可能是Oracle觉得回表读的耗费比较大吧,那这里可以使用虚拟索引测试一下索引的性能:
CREATE INDEX IX_VI01_ID ON TLHR.TLHRBOKBAL_TMP(CURRBALANCE, CURRENCY,BOOKACCOUNT) NOSEGMENT;
ALTER SESSION SET "_USE_NOSEGMENT_INDEXES"=TRUE;
EXPLAIN PLAN FOR MERGE INTO TLHR.TLHRBOKBAL S
USING (SELECT S.ROWID ROWIDS,
A.BOOKACCOUNT AS BOOKACCOUNT,
(A.CURRBALANCE + NVL(B.BAL, 0.00)) AS BANKAMT
FROM (SELECT NB.CURRBALANCE,NB.BOOKACCOUNT
FROM TLHR.TLHRBOKBAL_TMP NB
WHERE NB.BOOKACCOUNT LIKE '13500000%') A,
TLHR.TLHRBOKBAL S,
(SELECT T1.BOOKACCOUNT AS BOOKACCOUNT,
SUM(DECODE(T1.DCFLAG, 'D', -T1.AMT, 'C', T1.AMT, 0)) AS BAL
FROM TLHR.TLHRBOKBALJN T1
WHERE T1.BOOKACCOUNT LIKE '13500000%'
AND T1.TRANDATE = '20150901'
AND (T1.REASON IN ('1', '2') OR
(T1.REASON = '0' AND T1.ONLINEFLAG = '1'))
GROUP BY T1.BOOKACCOUNT) B
WHERE A.BOOKACCOUNT = B.BOOKACCOUNT(+)
AND S.BOOKACCOUNT = A.BOOKACCOUNT
AND S.BOOKACCOUNT LIKE '13500000%') T
ON (T.ROWIDS = S.ROWID)
WHEN MATCHED THEN
UPDATE
SET S.LASTBALANCE = T.BANKAMT,
S.CURRBALANCE = T.BANKAMT,
S.DEBITAMT = 0.00,
S.CREDITAMT = 0.00;
SELECT * FROM TABLE(DBMS_XPLAN.display);
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | Plan hash value: 983878991 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------- | 0 | MERGE STATEMENT | | 152K| 4180K| | 170K (1)| 00:34:06 | | 1 | MERGE | TLHRBOKBAL | | | | | | | 2 | VIEW | | | | | | | | 3 | NESTED LOOPS | | 152K| 19M| | 170K (1)| 00:34:06 | |* 4 | HASH JOIN RIGHT OUTER | | 152K| 13M| | 17421 (1)| 00:03:30 | | 5 | VIEW | | 124 | 3844 | | 15668 (1)| 00:03:09 | | 6 | SORT GROUP BY | | 124 | 5084 | | 15668 (1)| 00:03:09 | |* 7 | TABLE ACCESS BY INDEX ROWID| TLHRBOKBALJN | 124 | 5084 | | 15668 (1)| 00:03:09 | |* 8 | INDEX RANGE SCAN | PK_TLHRBOKBALJN | 165 | | | 15501 (1)| 00:03:07 | |* 9 | HASH JOIN | | 152K| 9107K| 5856K| 1750 (1)| 00:00:22 | |* 10 | INDEX FAST FULL SCAN | IX_VI01_ID | 153K| 4049K| | 9 (0)| 00:00:01 | |* 11 | INDEX RANGE SCAN | PK_TLHRBOKBAL | 152K| 5076K| | 1117 (1)| 00:00:14 | | 12 | TABLE ACCESS BY USER ROWID | TLHRBOKBAL | 1 | 45 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("NB"."BOOKACCOUNT"="B"."BOOKACCOUNT"(+)) 7 - filter("T1"."REASON"='0' AND "T1"."ONLINEFLAG"='1' OR "T1"."REASON"='1' OR "T1"."REASON"='2') 8 - access("T1"."BOOKACCOUNT" LIKE '13500000%' AND "T1"."TRANDATE"='20150901') filter("T1"."BOOKACCOUNT" LIKE '13500000%' AND "T1"."TRANDATE"='20150901') 9 - access("S"."BOOKACCOUNT"="NB"."BOOKACCOUNT") 10 - filter("NB"."BOOKACCOUNT" LIKE '13500000%') 11 - access("S"."BOOKACCOUNT" LIKE '13500000%') filter("S"."BOOKACCOUNT" LIKE '13500000%') |
说明创建3个列的索引是可以的。我们先将该虚拟索引删除DROP INDEX IX_VI01_ID;
ITL死锁问题解决
另外一个问题,是开发说上边的SQL语句产生了死锁,起初我还半信半疑,先去告警日志中用命令(more alert* | grep Deadlock)搜了一下:
结果发现很多的死锁,拿到相关的文件,看到如下一段:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | user session for deadlock lock 0x7000008094d14e0 sid: 332 ser: 47221 audsid: 991000 user: 84/TLHR flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40009) -/-/INC pid: 101 O/S info: user: grid, term: UNKNOWN, ospid: 6489034 image: oracle@ZFTLHRDB1 client details: O/S info: user: TLHR, term: , ospid: 34406578 machine: ZFTLHRAP1 program: bat_CheckBookBal@ZFTLHRAP1 (TNS V1-V3) application name: bat_CheckBookBal@ZFTLHRAP1 (TNS V1-V3), hash value=446537749 current SQL: MERGE INTO TLHRBOKBAL S USING (SELECT A.BOOKACCOUNT AS BOOKACCOUNT, (A.CURRBALANCE + nvl(B.BAL, 0.00)) AS BANKAMT FROM TLHRBOKBAL_TMP A, (SELECT T1.BOOKACCOUNT AS BOOKACCOUNT, SUM(DECODE(T1.DCFLAG, 'D', -T1.AMT, 'C', T1.AMT, 0)) AS BAL FROM TLHRBOKBALJN T1 WHERE T1.BOOKACCOUNT LIKE '13450000'||'%' AND T1.TRANDATE='20160901' AND (T1.REASON = '2' OR T1.REASON = '1' OR (T1.REASON = '0' AND T1.ONLINEFLAG = '1')) GROUP BY T1.BOOKACCOUNT) B WHERE A.BOOKACCOUNT = B.BOOKACCOUNT(+) AND A.BOOKACCOUNT LIKE '13450000'||'%') T ON (S.BOOKACCOUNT = T.BOOKACCOUNT) WHEN MATCHED THEN UPDATE SET S.LASTBALANCE=T.BANKAMT,S.CURRBALANCE=T.BANKAMT,S.DEBITAMT=0.00,S.CREDITAMT=0.00 DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK possible owner[101.6489034] on resource TX-00EE0009-00005EA6 *** 2016-09-01 18:30:38.014 Submitting asynchronized dump request [28]. summary=[ges process stack dump (kjdglblkrdm1)]. Global blockers dump end:----------------------------------- Global Wait-For-Graph(WFG) at ddTS[0.2fe0] : BLOCKED 0x7000008e9c8bc28 3 wq 2 cvtops x1 TX 0x159001e.0x2379(ext 0x5,0x0)[1006-0065-0019365C] inst 1 BLOCKER 0x700000809ab4b28 3 wq 1 cvtops x28 TX 0x159001e.0x2379(ext 0x5,0x0)[2005-005E-00185E15] inst 2 BLOCKED 0x700000891b48708 3 wq 2 cvtops x1 TX 0x1c2001d.0x4b82(ext 0x2,0x0)[2005-005E-00185E15] inst 2 BLOCKER 0x7000008e9c8b148 3 wq 1 cvtops x28 TX 0x1c2001d.0x4b82(ext 0x2,0x0)[1004-004D-0000C03E] inst 1 BLOCKED 0x70000089a636970 3 wq 2 cvtops x1 TX 0x1c0000b.0x18f6(ext 0x2,0x0)[1004-004D-0000C03E] inst 1 BLOCKER 0x7000008e9c8b4e8 3 wq 1 cvtops x28 TX 0x1c0000b.0x18f6(ext 0x2,0x0)[1005-0058-000DD3D9] inst 1 BLOCKED 0x700000891d5fc50 3 wq 2 cvtops x1 TX 0xee0009.0x5ea6(ext 0x2,0x0)[1005-0058-000DD3D9] inst 1 BLOCKER 0x7000008094d14e0 3 wq 1 cvtops x28 TX 0xee0009.0x5ea6(ext 0x2,0x0)[1006-0065-0019365C] inst 1 * Cancel deadlock victim lockp 0x7000008e9c8bc28 *** 2016-09-01 18:30:43.001 kjddt2vb: valblk [0.2fe1] > local ts [0.2fe0] *** 2016-09-01 18:30:47.000 kjddt2vb: valblk [0.2fe5] > local ts [0.2fe4] *** 2016-09-01 18:40:38.062 kjddt2vb: valblk [0.2ff1] > local ts [0.2ff0] *** 2016-09-01 18:42:01.084 kjddt2vb: valblk [0.2ff4] > local ts [0.2ff3] 2016-09-01 22:33:52.213848 : Setting 3-way CR grants to 0 global-lru off? 0 *** 2016-09-01 22:34:23.163 2016-09-01 22:34:23.163681 : Setting 3-way CR grants to 1 global-lru off? 0 2016-09-01 22:50:00.603305 : Setting 3-way CR grants to 0 global-lru off? 0 *** 2016-09-01 22:51:33.104 2016-09-01 22:51:33.104615 : Setting 3-way CR grants to 1 global-lru off? 0 2016-09-02 05:30:18.751891 : Setting 3-way CR grants to 0 global-lru off? 0 2016-09-02 05:49:01.360730 : Setting 3-way CR grants to 1 global-lru off? 0 2016-09-02 10:28:55.429293 : Setting 3-way CR grants to 0 global-lru off? 0 |
果然,产生死锁的SQL还是上边分析优化的SQL,其中会话信息为:(332,47221),我们去DBA_HIST_ACTIVE_SESS_HISTORY视图里查询:
SELECT D.SQL_ID, D.CURRENT_OBJ#, D.EVENT, COUNT(1)
FROM DBA_HIST_ACTIVE_SESS_HISTORY D
WHERE D.SAMPLE_TIME BETWEEN
TO_DATE('2016-09-01 18:25:00', 'YYYY-MM-DD HH24:MI:SS') AND
TO_DATE('2016-09-01 18:45:00', 'YYYY-MM-DD HH24:MI:SS')
AND D.BLOCKING_SESSION_STATUS = 'VALID'
AND D.SESSION_ID = 332