合 Oracle多列统计信息
简介
Oracle优化器对于基数值的估算是否准确关系到能否生成最优的执行计划,而基数值估算的准确性又取决于SQL中各个对象的统计信息是否完整、是否能真实反映出对象的数据分布情况。因此使用何种方法收集统计信息是很有讲究的:对于数据倾斜度较大的表需要收集直方图,在此基础上如果有多个列存在相关性,那么多列统计信息(也叫扩展统计信息)收集又是一个更好的选择。
在一般情况下,SQL语句的WHERE子句后面针对单张表都有多个条件,也就是根据多列的条件筛选得到数据。默认情况下,Oracle会把多列的选择率(Selectivity)相乘从而得到WHERE语句的选择率,但是这样有可能造成选择率不准确,从而导致优化器做出错误的判断。为了能够让优化器做出准确的判断,从而生成准确的执行计划,Oracle在11g数据库中引入了收集多列统计信息。多列统计信息包含列组统计信息(Column Group Statistics)和表达式的统计信息(Expression Statistics)。
使用程序包DBMS_STATS中的新函数CREATE_EXTENDED_STATS创建一个虚拟列,然后对表收集统计信息。如下所示,定义了两个扩展列:
1 2 3 4 5 6 7 | SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(UPPER(PAD))'), DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME => 'TEST', TABNAME => 'T', EXTENSION => '(VAL2,VAL3)') FROM DUAL; |
以上SQL是对TEST用户下的T表,分别基于表达式和基于多列创建虚拟列,下次再收集表的统计信息时,将会自动收集到多列统计信息。需要注意的是,不能对SYS用户下的表创建扩展的统计信息,否则会报错“ORA-20000: Unable to create extension: not supported for SYS owned table”。
使用Oracle自带的DBMS_STATS包提供的存储过程DROP_EXTENDED_STATS来删除扩展统计信息:
1 2 | EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(UPPER(PAD))'); EXEC DBMS_STATS.DROP_EXTENDED_STATS(OWNNAME => 'TEST',TABNAME => 'T',EXTENSION => '(VAL2,VAL3)'); |
定义扩展统计信息也可以直接在包DBMS_STATS中指定METHOD_OPT,收集统计信息时,把列组合作为单独列使用,如下所示:
1 2 3 4 5 6 7 8 9 | BEGIN DBMS_STATS.GATHER_TABLE_STATS ( OWNNAME => 'SCOTT', TABNAME => 'BOOKS', ESTIMATE_PERCENT=> 100, METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID,RATE_CATEGORY)', CASCADE => TRUE ); END; |
在视图DBA_STAT_EXTENSIONS中,可以看到在数据库中定义的扩展统计信息:
1 2 3 4 5 6 | SQL> SELECT EXTENSION_NAME, EXTENSION 2 FROM DBA_STAT_EXTENSIONS 3 WHERE TABLE_NAME='BOOKS'; EXTENSION_NAME EXTENSION ------------------------------ ------------------------------ SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ("HOTEL_ID","RATE_CATEGORY") |
当不清楚需要创建哪些列的扩展统计信息时,可以针对一个表,基于特定的工作负荷,通过使用DBMS_STATS.SEED_COL_USAGE和REPORT_COL_USAGE来确定需要哪些列组。需要注意的是,这种技术不适用于包含表达式列的统计工作。主要过程如下所示:
1 2 3 4 | EXEC DBMS_STATS.SEED_COL_USAGE(NULL,NULL,TIME_LIMIT=>100); EXPLAIN PLAN FOR SQL语句; SELECT DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL; SELECT DBMS_STATS.CREATE_EXTENDED_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR') FROM DUAL; |
示例
首先,创建测试表:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | DROP TABLE T_ES_20170601_LHR; CREATE TABLE T_ES_20170601_LHR (C1 NUMBER,C2 VARCHAR2(2),C3 VARCHAR2(20)); DECLARE BEGIN FOR I IN 1 .. 5000 LOOP INSERT INTO T_ES_20170601_LHR VALUES (1, 'AA', DBMS_RANDOM.STRING('l', 20)); INSERT INTO T_ES_20170601_LHR VALUES (2, 'BB', DBMS_RANDOM.STRING('l', 20)); INSERT INTO T_ES_20170601_LHR VALUES (3, 'CC', DBMS_RANDOM.STRING('l', 20)); INSERT INTO T_ES_20170601_LHR VALUES (4, 'DD', DBMS_RANDOM.STRING('l', 20)); END LOOP; COMMIT; END; / INSERT INTO T_ES_20170601_LHR VALUES(11,'A','AAAAAAA'); INSERT INTO T_ES_20170601_LHR VALUES(22,'B','BBBBBBB'); INSERT INTO T_ES_20170601_LHR VALUES(33,'C','CCCCCCC'); INSERT INTO T_ES_20170601_LHR VALUES(44,'D','DDDDDDD'); COMMIT; |
数据分布如下所示:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | LHR@orclasm > SELECT COUNT(1) FROM T_ES_20170601_LHR; COUNT(1) ---------- 20004 LHR@orclasm > SELECT C1,C2,COUNT(1) FROM T_ES_20170601_LHR GROUP BY C1,C2 ORDER BY C1; C1 C2 COUNT(1) ---------- -- ---------- 1 AA 5000 2 BB 5000 3 CC 5000 4 DD 5000 11 A 1 22 B 1 33 C 1 44 D 1 8 rows selected. |
接下来收集T_ES_20170601_LHR表的统计信息,但不收集直方图的信息(收集前确认默认的ESTIMATE_PERCENT为AUTO_SAMPLE_SIZE):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | LHR@orclasm > SELECT DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) FROM DUAL; DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT',NULL,NULL) ----------------------------------- DBMS_STATS.AUTO_SAMPLE_SIZE LHR@orclasm > EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'LHR',TABNAME=>'T_ES_20170601_LHR',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. LHR@orclasm > SET LINESIZE 200 LHR@orclasm > SELECT OWNER,TABLE_NAME,NUM_DISTINCT,SAMPLE_SIZE,COLUMN_NAME,HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE OWNER='LHR' AND TABLE_NAME='T_ES_20170601_LHR'; OWNER TABLE_NAME NUM_DISTINCT SAMPLE_SIZE COLUMN_NAME HISTOGRAM ------------------------------ ------------------------------ ------------ ----------- ------------------------------ --------------- LHR T_ES_20170601_LHR 8 20004 C1 NONE LHR T_ES_20170601_LHR 8 20004 C2 NONE LHR T_ES_20170601_LHR 20004 20004 C3 NONE |
下面分别执行如下2条SQL语句,然后查看预估行数:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';
SELECT * FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A';
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; COUNT(*) ---------- 5000 LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA'; Explained. LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3668985715 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 6 | | | |* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 313 | 1878 | 27 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("C1"=1 AND "C2"='AA') LHR@orclasm > SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; COUNT(*) ---------- 1 LHR@orclasm > EXPLAIN PLAN FOR SELECT COUNT(*) FROM T_ES_20170601_LHR WHERE C1=11 AND C2='A'; Explained. LHR@orclasm > SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 3668985715 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 27 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 6 | | | |* 2 | TABLE ACCESS FULL| T_ES_20170601_LHR | 313 | 1878 | 27 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("C1"=11 AND "C2"='A') |
可以看到有如下的结果:
SELECT * FROM T_ES_20170601_LHR WHERE C1=1 AND C2='AA';--实际返回5000条,预估313条