Java Freeman: Simple B-tree Access

一、测试数据准备
create table t1
nologging
as
select
    trunc(dbms_random.value(0,25))    n1,
    rpad('x',40)            ind_pad,
    trunc(dbms_random.value(0,20))    n2,
    lpad(rownum,10,'0')        small_vc,
    rpad('x',200)            padding
from
    all_objects
where
    rownum <= 10000
;

create index t1_i1 on t1(n1, ind_pad, n2) nologging pctfree 91;

A couple of oddities about this data may need a little explanation. First, the pctfree setting on the index is unusually large; I’ve done this to force the index to spread itself across a large number of leaf blocks when it is first created. But pctfree does not apply to branch blocks—which is why I have introduced ind_pad as the second column of the index. Because this holds the same value for all rows, it doesn’t affect the overall statistics and distribution, but it does stop Oracle from being able to pack lots of rows into each branch block, which conveniently pushes the index up to a blevel of 2.

二、Cost计算公式
Before quoting the formula, though, it is worth creating a mental image of what you would do to walk through a typical index access path.
• You have predicates on some of the columns that define the index.
• You locate the root block of the index.
• You descend through the branch levels of the index to the leaf block that is the only place for the first possible entry (the start key) that could match your predicates.
• You walk along a chain of leaf blocks until you overshoot the entry that is the last possible entry (the stop key) that could match your predicates.
• For each index entry, you decide whether or not to visit a table block.

So the formula for the cost of accessing a table by way of an index ought to cover three block-related components: the number of branch levels you descend through, the number of leaf blocks you traverse, and the number of table block visits you make. And as we know, the optimizer assumes that a single block visit equates to a real I/O request—so the number of block visits is the cost (if you have not enabled CPU costing).

cost =
        blevel +
        ceiling(leaf_blocks * effective index selectivity) +
        ceiling(clustering_factor * effective table selectivity)

• The first line of the formula represents the number of block visits needed to descend through the index (excluding the cost of actually hitting the first leaf block you want). Oracle implements a version of Balanced B-trees, so every leaf block is the same distance from the root block, and we can talk about the height of the index quite safely. The number of layers of branch blocks you have to descend through is the same whichever leaf block you want to get to.
• The second line of the formula represents the number of leaf blocks that you will have to walk along to acquire all the rowids matching a given set of input values. The effective index selectivity corresponds to the entry labelled ix_sel in the 10053 trace file.
• The third line represents the number of visits to table blocks that you will have to make to pick up the rows by way of the selected index. The effective table selectivity corresponds to the thing that used to be labelled tb_sel in the 10053 trace file, but ends up being labelled (more accurately) as ix_sel_with_filters in the 10g trace file. This line often generates the biggest component of the cost, and introduces the biggest error in the calculation of the cost of using a B-tree index.

三、统计信息收集
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    table_name,
    blocks,
    num_rows
from    user_tables
where    table_name = 'T1'
;

TABLE_NAME               BLOCKS   NUM_ROWS
-------------------- ---------- ----------
T1                          371      10000

select
    num_rows, distinct_keys,
    blevel, leaf_blocks, clustering_factor,
    avg_leaf_blocks_per_key, avg_data_blocks_per_key
from
    user_indexes
where    table_name = 'T1'
and    index_name = 'T1_I1'
;

NUM_ROWS ：10000
DISTINCT_KEYS ：500
BLEVEL ：2
LEAF_BLOCKS ：1111
CLUSTERING_FACTOR ：9745
AVG_LEAF_BLOCKS_PER_KEY ：2
AVG_DATA_BLOCKS_PER_KEY ：19

select
    column_name,
    num_nulls, num_distinct, density,
    low_value, high_value
from
    user_tab_columns
where    table_name = 'T1'
and    column_name in ('N1','N2','IND_PAD')
order by
    column_name
;
COLUMN_NAME           NUM_NULLS NUM_DISTINCT    DENSITY
-------------------- ---------- ------------ ----------
IND_PAD                       0            1          1
N1                            0           25        .04
N2                            0           20        .05

四、clustering_factor
The clustering_factor is a measure that compares the order of the index with the degree of disorder in the table. The optimizer appears to calculate the clustering_factor by walking the table in index order and keeping track of how many times the walk jumped from one table block to another. (Of course, it doesn’t really work like this; the code simply scans the index extracting table block addresses from rowids.) On every jump, a counter is incremented—the final value of the counter is the clustering_factor.Figure 4-1 illustrates the principle.

In Figure 4-1, we have a table with four blocks and 20 rows, and an index on the column V1, whose values are shown. If you start to walk across the bottom of the index, the first rowid points to the third row in the first block. We haven’t visited any blocks yet, so this is a new block, so we count 1. Take one step along the index, and the rowid points to the fourth row of the second block—we’ve changed block, so increment the count. Take one step along the index, and the rowid points to the second row of the first block—we’ve changed block again, so
increment the count again. Take one step along the index, and the rowid points to the fifth row of the first block—we haven’t changed blocks, so don’t increment the count.
In the diagram, I have put a number against each row of the table—this is to show the value of the counter as the walk gets to that row. By the time we get to the end of the index, we have changed table blocks ten times, so the clustering factor is 10.
Given the way the clustering_factor is calculated, you will appreciate that the smallest possible value has to be the same as the number of blocks in the table, and the largest possible value has to be the same as the number of rows in the table—provided you have computed statistics.

四、测试
实例1
select
        small_vc
from
        t1
where
        n1      = 2
and     ind_pad = rpad('x',40)
and     n2      = 3
;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    20 | 1160 |    25 |
|   1 | TABLE ACCESS BY INDEX ROWID| T1    |    20 | 1160 |    25 |
|* 2 |   INDEX RANGE SCAN          | T1_I1 |    20 |       |     5 |
---------------------------------------------------------------------

n1 = {constant} (Target: 1 row in 25, or 4% of the rows, or 0.04 * number of rows)
ind_pad = {constant} (Target: 100% of the rows)
n2 = {constant} (Target: 1 row in 20, or 5% of the rows, or 0.05 * number of rows)

利用上面公式计算

effective index selectivity = 0.04 * 1 * 0.05 = 0.002

effective table selectivity = 0.04 * 1 * 0.05 = 0.002

cost =
        blevel +
        ceiling(leaf_blocks * effective index selectivity) +
        ceiling(clustering_factor * effective table selectivity)

= 2 + ceiling(1111 * 0.002) + ceiling(9745 * 0.002)

= 2 + 3 + 20 = 25 (与Plan中的Cost相符）

实例2
select
        /*+ index(t1) */
        small_vc
from
        t1
where
        n1      = 2
and     ind_pad = rpad('x',40)
and     n2      between 1 and 3
;
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    82 | 4756 |    93 |
|   1 | TABLE ACCESS BY INDEX ROWID| T1    |    82 | 4756 |    93 |
|* 2 |   INDEX RANGE SCAN          | T1_I1 |   164 |       |    12 |
---------------------------------------------------------------------

selectivity (n2 between 1 and 3) =
required range / total range + 1/num_distinct + 1/num_distinct =

= (（3-1）/(20-1)+1/20+1/20) = 0.205263

effective index selectivity = 0.04 * 1 * 0.205263 = 0.0082105

effective table selectivity = 0.04 * 1 * 0.205263 = 0.0082105

cost = 2 + ceiling（1111 * 0.0082105） + ceiling（9745 * 0.0082105）= 93

实例3

select
   small_vc
from
   t1
where
   n1   = 2
and   ind_pad   = rpad('x',40)
and   n2   between 1 and 3
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost |
----------------------------------------------------------
|   0 | SELECT STATEMENT |      |    82 | 4756 |    58 |
|* 1 | TABLE ACCESS FULL| T1   |    82 | 4756 |    58 |
----------------------------------------------------------

因为是全表检索，所以cost = blocks/Adjusted dbf_mbrc+1=371/6.41=58

实例4

select
    /*+ index(t1) */
    small_vc
from
    t1
where
    n1    between 1 and 3
and    ind_pad    = rpad('x',40)
and    n2    = 2
;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    82 | 4756 |   264 |
|   1 | TABLE ACCESS BY INDEX ROWID| T1    |    82 | 4756 |   264 |
|* 2 |   INDEX RANGE SCAN          | T1_I1 |   163 |       |   184 |
---------------------------------------------------------------------

• The selectivity for column n1 is (3 – 1) / (24 – 0) + 2/25 = 0.1633333.
• The selectivity of ind_pad is still 1.
• The selectivity of n2 is still 0.05.

In this case, the effective index selectivity has to be calculated from the predicate on just the n1 column. Because the test on n1 is range-based, the predicates on index_pad and n2 do not restrict the number of index leaf blocks we have to walk. Of course, when we finally get to examine an index leaf row, we can use all three predicates to check whether we should go to the table, so the effective table selectivity still includes all three individual column predicates.

Cost = 2 +                                  -- blevel
                 Ceiling(0.1633333 * 1,111) +    -- 182
                 Ceiling(0.0081667 * 9745)       -- 80
         =   184 + 80 = 264                      -- as expected

实例5

select
    /*+ index(t1) */
    small_vc
from
    t1
where
    n1    between 1 and 3
and    ind_pad    = rpad('x',40)
and    n2    = 2
and    small_vc = lpad(100,10)
;

同实例4，只是table扫描的时候，把更多的不符合条件的record给过滤掉了。

实例6（Ranges Compared to In-Lists）

select
    /*+ index(t1) */
    small_vc
from
    t1
where    n1    = 5
and    ind_pad = rpad('x',40)
and    n2 in (1,6,18)
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3456773112

----------------------------------------------------------------------
| Id | Operation                    | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    60 | 3480 |    70 |
|   1 | INLIST ITERATOR             |       |       |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |    60 | 3480 |    70 |
|* 3 |    INDEX RANGE SCAN          | T1_I1 |    60 |       |    11 |
----------------------------------------------------------------------

• The selectivity for column n1 is 0.04.
• The selectivity of ind_pad is still 1.
• The selectivity of n2 is still 0.05 * 3.

cost = 2 + 1111*0.04*1*0.15 + 9745*0.04*0.15 =9 + 59 = 68

10g的时候cost就是68，但是不知道11g为什么变成70了。

alter session set optimizer_index_caching = 25;--cost=6+59=65

alter session set optimizer_index_caching = 50;--cost=6+59=65

alter session set optimizer_index_caching = 75;--cost=3+59=62

alter session set optimizer_index_caching = 100;--cost=0+59=65

OPTIMIZER_INDEX_CACHING AND IN-LISTS The optimizer_index_caching parameter was introduced in 8i to allow some room for correcting the optimizer’s assumption that all reads are physical reads. It is usually mentioned as having an impact on the cost calculation for index block accesses for the inner (second) table of nested loop joins, but it also has an effect on the cost calculation for in-list iteration. Very specifically, it does not have an impact on the cost calcu-
lation for a simple, single-table indexed access path.

具体是如何作用的，不是很清楚，待查。

实例7（部分index使用）

select
    /*+ index(t1) */
    small_vc
from
    t1
where
    n1=1
and    ind_pad    = rpad('x',40)
and    small_vc = lpad(100,10)
;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    55 |   437 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T1    |     1 |    55 |   437 |
|* 2 |   INDEX RANGE SCAN          | T1_I1 |   400 |       |    47 |
---------------------------------------------------------------------

• The selectivity for column n1 is 0.04.
• The selectivity of ind_pad is still 1.
• The selectivity of small_vc is still 1/10000. (这个只是用来过滤的条件，和Cost无关）

cost = 2 + 1111*0.04*1 + 9745*0.04* =2 + 45 * 390 = 437

实例8（Index Full Scans）

select
    /*+ index(t1) */
    small_vc
from
    t1
where    n2    = 2
order by n1
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3901968637

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 8500 | 1601 |
|   1 | TABLE ACCESS BY INDEX ROWID| T1    |   500 | 8500 | 1601 |
|* 2 |   INDEX FULL SCAN           | T1_I1 |   500 |       | 1113 |
---------------------------------------------------------------------

Informally, we have no restrictions on the index until after we have reached the leaf blocks, so we could expect the effective index selectivity to be 1.00 (100%). When we examine the leaf blocks, we can identify the entries where n2 = 2, a single predicate with a selectivity of 0.05, so we could expect this to be the effective table selectivity. So let’s put these numbers into the formula and check:

cost = 2 + 1111*1 + 9745*0.05 =2 + 1111 * 488 = 437

实例9（Index-only Queries）

select
    /*+ index(t1) */
    n2
from
    t1
where    n1    between 6 and 9
order by n1
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3836157954

----------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost |
----------------------------------------------------------
|   0 | SELECT STATEMENT |       | 2050 | 12300 |   230 |
|* 1 | INDEX RANGE SCAN| T1_I1 | 2050 | 12300 |   230 |
----------------------------------------------------------

• The selectivity for column n1 is (9 – 6) / (24 – 0) + 2/25 = 3/24 + 2/25 = 0.205.

所以数据可以再index中找到，所以不需要扫描表。

cost = 2 + 1111*0.205 =2 + 228 = 230

实例10（The Three Selectivities）

select
    /*+
        index(t1)
    */
    small_vc
from
    t1
where
    n1    between 1 and 3
and    ind_pad    = rpad('x',40)
and    n2    = 2
and    small_vc = lpad(100,10)
;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    58 |   262 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T1    |     1 |    58 |   262 |
|* 2 |   INDEX RANGE SCAN          | T1_I1 |    79 |       |   184 |
---------------------------------------------------------------------

• The selectivity for column n1 is （3-1）/24 + 1/25 + 1/25 = 0.1633333333.
• The selectivity of ind_pad is still 1.
• The selectivity of n2 is 0.05.

cost = 2 + 1111*0.16333333*1 + 9745*0.1633333333*1*0.05 =2 + 182 * 80 = 264

实例11（Index Skip Scan）
select /*+ index_ss(t1) */small_vc
from t1
where n1 between 1 and 3
and ind_pad= rpad('x',40)
and n2= 2
and small_vc = lpad(100,10);

Execution Plan
----------------------------------------------------------
Plan hash value: 2886394002

---------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    58 |    86 |
|* 1 | TABLE ACCESS BY INDEX ROWID| T1    |     1 |    58 |    86 |
|* 2 |   INDEX SKIP SCAN           | T1_I1 |    73 |       |    14 |
---------------------------------------------------------------------

计算方法待查

Java Freeman

2011年7月22日星期五

Simple B-tree Access

没有评论:

发表评论