
Sunday, July 31, 2011

Histograms

1. Getting Started
Everyone knows that histograms are supposed to be used for data where a column has a few special values that appear far more frequently than the rest; and everyone knows that you should only build histograms on indexed columns. These beliefs are so misleading that I’m going to ignore them for the moment and start my explanation of histograms with an example that has no extreme values and no indexes. 
1.1 Test Data
create table t1
as
with kilo_row as (
    select /*+ materialize */
        rownum    id
    from all_objects
    where rownum <= 1000
)
select
    trunc(7000 * dbms_random.normal)    normal
from
    kilo_row    k1,
    kilo_row    k2
where
    rownum <= 1000000
;

select
    tenth                        tenth,
    min(normal)                    low_val,
    max(normal)                    high_val,
    max(normal) - min(normal)            width,
    round(100000 / (max(normal) - min(normal)),2)     height
from (
    select
        normal,
        ntile(10) over (order by normal) tenth
    from t1
)
group by tenth
order by tenth
;
     TENTH    LOW_VAL   HIGH_VAL      WIDTH     HEIGHT
---------- ---------- ---------- ---------- ----------
         1     -32003      -8966      23037       4.34
         2      -8966      -5883       3083      32.44
         3      -5883      -3659       2224      44.96
         4      -3659      -1761       1898      52.69
         5      -1761         17       1778      56.24
         6         17       1792       1775      56.34
         7       1792       3678       1886      53.02
         8       3678       5897       2219      45.07
         9       5897       8974       3077       32.5
        10       8974      34660      25686       3.89


begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for columns normal size 10'
    );
end;
/

select
    rownum                    tenth,
    prev                    low_val,
    curr                    high_val,
    curr - prev                width,
    round(100000 / (curr - prev) , 2)    height
from
    (
    select
        endpoint_value            curr,
        lag(endpoint_value,1) over (
            order by endpoint_number
        )                 prev
    from
        user_tab_histograms
    where
        table_name = 'T1'
    and    column_name = 'NORMAL'
    )
where
    prev is not null
order by
    curr
;

The results match the previous query.
The output above shows that the data in T1 is not uniformly distributed: the density of rows varies from band to band. This is exactly the situation where a histogram helps. Without one the optimizer assumes a uniform distribution, so its cost and cardinality calculations can go badly wrong and lead to performance problems.

2. Generic Histograms
 You can actually use two different strategies: one for data sets with only a few (fewer than 255) distinct values, and one for data sets with lots of distinct values. Oracle calls the former a frequency histogram and the latter a height balanced histogram.
 After generating a histogram for a column, you will usually find that the density value reported in user_tab_columns (et al.) is no longer equal to 1/num_distinct.
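
As a quick cross-check (a sketch, not part of the original notes), you can compare the density recorded for the NORMAL column gathered above with the naive 1/num_distinct:

select
    column_name,
    num_distinct,
    density,
    round(1 / num_distinct, 8)    naive_density
from
    user_tab_columns
where
    table_name  = 'T1'
and column_name = 'NORMAL'
;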

3. When Oracle Ignores Histograms
It is quite expensive to generate histograms, so you should think very carefully before doing it. There are many situations where the optimizer cannot make use of them and gets no benefit from the extra work. In fact, there are several cases where Oracle ignores histograms when you might think they would be used.

3.1 Histograms and Joins
Oracle only uses histograms fully when there are highly visible input values—in other words, in cases where you have predicates of the form column operator constant (although the constant might come from peeking at a bind variable). Think about this carefully, and you realize that Oracle may not be able to make much use of histogram information to help it optimize joins.

3.2 Histograms and Distributed Queries
Oracle keeps on getting better at handling distributed queries—but even in Oracle 10g, the optimizer doesn't try to pull histograms from the remote site(s) to improve its execution plan, and it's not just a question of ignoring histograms on join columns. Even when you have remote columns compared to literal constants, Oracle does not attempt to use the histogram. Again, it falls back to the num_distinct (or density, depending on version).
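
For example (a hypothetical sketch: remote_db is an assumed database link name, and t1/skew are borrowed from the examples below), even this literal predicate against a remote table is costed from num_distinct or density rather than from the remote histogram:

select  *
from    t1@remote_db
where   skew = 5
;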

4. Frequency Histograms
The value 1 appears once, 2 appears twice, and so on up to the value 80 that appears 80 times for a total of 3,240 rows.

create table generator as
select
    rownum     id
from    all_objects
where    rownum <= 2000
;

create table t1 (
    skew        not null,   
    padding
)
as
select
    /*+ ordered use_nl(v2) */
    v1.id,
    rpad('x',400)
from
    generator    v1,
    generator    v2
where
    v1.id <= &m_demo_size
and    v2.id <= &m_demo_size
and    v2.id <= v1.id
order by
    v2.id,v1.id
;

create index t1_i1 on t1(skew);

Test case 1
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80      .0125           1

select skew, padding
from t1
where skew = 5
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    41 | 16564 |    32 |
|*  1 |  TABLE ACCESS FULL| T1   |    41 | 16564 |    32 |
----------------------------------------------------------
Cardinality = 0.0125 * 3240 = 40.5, reported as 41

Test case 2
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 80'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;

NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .000154321          80

density = 1 / (2 * num_rows)
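
A quick way to confirm this relationship (a sketch, not in the original notes) is to compute 1 / (2 * num_rows) from user_tables and compare it with the density reported above:

select
    t.num_rows,
    round(1 / (2 * t.num_rows), 9)    expected_density,
    c.density
from
    user_tables        t,
    user_tab_columns   c
where
    t.table_name  = 'T1'
and c.table_name  = 'T1'
and c.column_name = 'SKEW'
;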

select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              3              2
              6              3
             10              4
             ...
           3160             79
           3240             80

select
    endpoint_value                row_value,
    curr_num - nvl(prev_num, 0)        row_count
from    (
    select
        endpoint_value,
        endpoint_number            curr_num,
        lag(endpoint_number,1) over (
            order by endpoint_number
        )                prev_num
    from
        user_tab_histograms
    where
        column_name = 'SKEW'
    and    table_name = 'T1'
)
order by
    endpoint_value
;
ROW_VALUE  ROW_COUNT
---------- ----------
         1          1
         2          2
         3          3
         4          4
         5          5
        ...
         80         80

select
    skew, padding
from
    t1
where
    skew = 5
;
Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     5 |  2020 |     6 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     5 |  2020 |     6 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |     5 |       |     1 |
---------------------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 5 is 5.

select
    skew, padding
from
    t1
where
    skew = 70
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    70 | 28280 |    32 |
|*  1 |  TABLE ACCESS FULL| T1   |    70 | 28280 |    32 |
----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 70 is 70.

Predicate                   Description                              CBO    Human
skew = 40                   Column = constant                        40     40
skew = 40.5                 Column = nonexistent, but in-range       1      0
skew between 21 and 24      Between range with mapped values         90     90
skew between 20.5 and 24.5  Between range with mapped values         90     90
skew between 1 and 2        Between range at extremes                3      3
skew between 79 and 80      Between range at extremes                159    159
skew > 4 and skew < 8       Greater than/less than range             18     18
skew = -10                  Below low value                          1      0
skew = 100                  Above high value                         1      0
skew between -5 and -3      Range below low value                    1      0
skew between 92 and 94      Range above high value                   1      0
skew between 79 and 82      Range crossing boundary                  159    159
skew = :b1                  Column = :bind                           41     ???
skew between :b1 and :b2    Column between :bind1 and :bind2         8      ???

In the two examples with bind variables, we see that the CBO has used the standard
0.25% for the range, and when we check column = :bind, we find that 41 comes from
num_rows / num_distinct.
Finally, looking carefully at the density, we discover (and confirm with a few more test data sets) that density = 1 / (2 * num_rows)—which is probably why we end up seeing a cardinality of 1 whenever we go outside the low/high range, or ask for a value that is not in the histogram.

- Frequency histograms have no impact on expressions with bind variables (if they are not peeked). The selectivity on column = :bind is still 1 / num_distinct, and the selectivities on ranges are still 5% for unbounded and 0.25% for bounded ranges.
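
You can see this with explain plan, which does not peek at bind values (a sketch; the bind name is arbitrary). As the table above shows, the estimate comes out as num_rows / num_distinct = 3240 / 80 = 40.5, reported as 41:

variable b1 number

explain plan for
select skew, padding from t1 where skew = :b1;

select * from table(dbms_xplan.display);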

5. "Height Balanced" Histograms
5.1 One bucket for every distinct value
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 80'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .000154321          80

select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              3              2
              6              3
             10              4
             ...
           3160             79
           3240             80

select
    count(*)
from    t1
where    skew = 77
;
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    77 |   231 |     1 |
-----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 77 is 77.

select
    count(*)
from    t1
where    skew = 75
;
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    75 |   225 |     1 |
-----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 75 is 75.

5.2 Fewer buckets than num_distinct

begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 75'
    );
end;
/


select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .013973812          75



select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             ...
             46             63
             48             64
             49             65
             51             66
             52             67
             54             68
             56             69
             57             70
             59             71
             60             72
             62             73
             64             74
             66             75
             67             76
             69             77
             71             78

6. The Arithmetic

When the optimizer takes advantage of a height balanced histogram to work out cardinality, it adopts one of three major strategies.
 - Strategy 1 for calculating selectivity/cardinality: For column = constant, the constant may be a popular value. A popular value is one that fills at least one complete bucket, and therefore shows up as a gap in the endpoint_number values of the user_tab_histograms view. If Oracle spots such a value, it does its arithmetic by buckets. For example, in the preceding listing the value 77 appears at endpoint 69 and the previous value (76) appears at endpoint 67: consequently Oracle considers 77 to be a popular value that spans 2 buckets out of 75. The selectivity of 77 is therefore 2/75, and the cardinality is 3,240 * 2/75 = 86.4.

select
    count(*)
from    t1
where    skew = 77
;

Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    86 |   258 |     1 |
-----------------------------------------------------------
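
You can list the popular values directly from the histogram (a sketch, not part of the original notes): any endpoint whose endpoint_number is more than one greater than the previous endpoint_number spans multiple buckets.

select
    endpoint_value    popular_value,
    buckets
from    (
    select
        endpoint_value,
        endpoint_number
            - lag(endpoint_number,1,0) over (order by endpoint_number)    buckets
    from
        user_tab_histograms
    where
        table_name  = 'T1'
    and column_name = 'SKEW'
    )
where
    buckets >= 2
order by
    endpoint_value
;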

- Strategy 2 for calculating selectivity/cardinality: We have already noted that 70 is not a special value, so what happens when we execute a query to select all rows for that value? Oracle uses the density (not 1/num_distinct, note). We have the density recorded as 0.013973812, so we expect a cardinality of 3,240 * 0.013973812 = 45.27.
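
The test query (reconstructed here, since the notes only show the plan) is of the form:

select
    count(*)
from    t1
where    skew = 70
;
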
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    29 |    87 |     1 |
-----------------------------------------------------------
The actual cardinality, however, is 29; this may be a bug or a behaviour change in 11g, and needs to be verified on a later release.

In purely descriptive terms, the density is as follows:
  sum of the square of the frequency of the nonpopular values /
  (number of nonnull rows * number of nonpopular nonnull rows)
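
In SQL terms that is roughly the following (a sketch, not the book's script, using the SKEW column and treating any value that spans two or more buckets as popular, as in Strategy 1):

with freq as (
    select skew, count(*) cnt
    from   t1
    where  skew is not null
    group by skew
),
pop as (
    select endpoint_value    val
    from   (
        select
            endpoint_value,
            endpoint_number
                - lag(endpoint_number,1,0) over (order by endpoint_number)    buckets
        from
            user_tab_histograms
        where
            table_name  = 'T1'
        and column_name = 'SKEW'
        )
    where  buckets >= 2
)
select
    sum(cnt * cnt) /
    ( (select count(skew) from t1) * sum(cnt) )    derived_density
from
    freq
where
    skew not in (select val from pop)
;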

 - Strategy 3 for calculating selectivity/cardinality: After that brief digression, we can investigate the issue of range scans.

Test cases for Strategy 3 follow.
Test case 1
The script generates 10,000 rows using a scaled normal distribution centered on the value 3,000 and ranging from -5,000 to 11,000. Superimposed on this distribution, I have 500 rows each of the values 500, 1,000, 1,500, and so on up to 10,000—a list of 20 popular values. After analyzing this table with 250 buckets (80 rows per bucket, 20,000 rows in total), I get a histogram that shows the spikes from my 20 popular values. The number of distinct values is 5,626 and the column density is 0.000119361 (a convincing drop from 1 / 5,626 = 0.000177746). My first test case is a range scan across buckets with no popular values: n1 between 100 and 200. If we check the histogram, we find that the required values fall across three buckets with endpoints 17, 117, and 251.

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              8           -120
              9             17
             10            117
             11            251
             12            357
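
The plan below comes from a query of the form (reconstructed here, since the notes only show the plan; the column name n1 is taken from the description above):

select  *
from    t1
where   n1 between 100 and 200
;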

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    63 |   945 |    60 |
|*  1 |  TABLE ACCESS FULL| T1   |    63 |   945 |    60 |
----------------------------------------------------------

Selectivity = (required range) / (high value - low value) + 2 * density =
              (200-117)/(251-117) + (117-100)/(117-17) + 2 * 0.000177746 =
              0.619403 + 0.17 + 0.000355486 =
              0.789758
Cardinality = selectivity * number of rows IN A BUCKET =
              0.789758 * 80 = 63.18

测试实例2
The second test case is, of course, a range scan across buckets with a popular value: n1 between 400 and 600. (One of my spikes was at n1 = 500.) We need to check up on a little section from the histogram that will be relevant:
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             12            357
             13            450
             19            500        ***
             20            520
             21            598
             22            670
Note that 500 is a (very) popular value—with six buckets allocated from the histogram. So we expect to account for at least six buckets plus the rest of the range. Checking carefully, we see that the range 400 to 600 will extend from buckets 12 to 22—broken down as follows:
• A selection from 357 to 450
• All buckets from 450 to 598
• A selection from 598 to 670

So we have eight whole buckets (endpoint_numbers 13 to 21) plus
(450 - 400) / (450 - 357) + (600 - 598) / (670 - 598) + 2 * 0.000177746 =
        50 / 93 + 2 / 72 + 0.000355486 =
        0.537634 + 0.0277778 + 0.000355486 =
        0.565768
Remembering to add the 8 for the complete buckets, we get a cardinality of
Cardinality = selectivity * number of rows IN A BUCKET = 8.565768 * 80 = 685.3
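
Again the plan below corresponds to a query of the form (a reconstruction, the exact select list is not shown in the notes):

select  *
from    t1
where   n1 between 400 and 600
;
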
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   685 | 10275 |    60 |
|*  1 |  TABLE ACCESS FULL| T1   |   685 | 10275 |    60 |
----------------------------------------------------------

7. Data Problems Revisited
7.1 Daft Datatypes
create table t1 (
    d1        date,
    n1        number(8),
    v1        varchar2(8)
)
;

insert into t1
select
    d1,
    to_number(to_char(d1,'yyyymmdd')),
    to_char(d1,'yyyymmdd')
from    (
    select
        to_date('31-Dec-1999') + rownum    d1
    from all_objects
    where
        rownum <= 1827
    )
;

As we saw in the Selectivity Issues post, a query with a predicate of
where n1 between 20021230 and 20030105
produced a cardinality of 396 in Oracle.

The following repeats the test with a histogram in place.
begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 120'
    );
end;
/
select
    rownum                    bucket,
    prev                    low_val,
    curr                    high_val,
    curr - prev                width,
    round( (1827/120) / (curr - prev) , 4)    height
from
    (
    select
        endpoint_value                curr,
        lag(endpoint_value,1) over (
            order by endpoint_number
        )                     prev
    from
        user_tab_histograms
    where
        table_name = 'T1'
    and    column_name = 'N1'
    )
where
    prev is not null
order by
    curr
;
    BUCKET    LOW_VAL   HIGH_VAL      WIDTH     HEIGHT
---------- ---------- ---------- ---------- ----------
        68   20021028   20021112         84      .1813
        69   20021112   20021127         15      1.015
        70   20021127   20021212         85      .1791
        71   20021212   20021227         15      1.015
        72   20021227   20030111       8884      .0017
        73   20030111   20030126         15      1.015
        74   20030126   20030210         84      .1813
        75   20030210   20030225         15      1.015
        76   20030225   20030312         87       .175
        77   20030312   20030327         15      1.015
        78   20030327   20030411         84      .1813
        79   20030411   20030426         15      1.015
        80   20030426   20030511         85      .1791
        81   20030511   20030526         15      1.015
        82   20030526   20030610         84      .1813
        83   20030610   20030625         15      1.015
        84   20030625   20030710         85      .1791

select num_distinct, density, num_Buckets from user_tab_columns where table_name = 'T1' and column_name = 'N1' ;

NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
        1827 .000547345         120


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   345 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |    15 |   345 |     4 |
----------------------------------------------------------

selectivity =
        ('required range' / 'total range of bucket' ) + 2 * density =
        (20030105 - 20021230) / (20030111 - 20021227) + 2 * 0.000547345 =
           8875 / 8884 + 0.00109469 =
                1.00008
But this is the selectivity that has to be applied to the number of rows in the buckets involved—not to the entire table—and there are 1,827 / 120 = 15 rows per bucket—round(15 * 1.00008) = 15, as required.

7.2 Dangerous Defaults
Same as above: both cases suffer from the histogram problem shown in Figure 7-2.

Saturday, July 30, 2011

Selectivity Issues

1. Different Data Types
We have been using the basic formula for the selectivity of a range-based predicate, and know that this varies slightly depending on whether you have zero, one, or two closed (meaning the equality in <=, >=) ends to your range. Using N as the number of closed ends, the version of the formula that uses user_tab_columns.num_distinct can be written as

   (required range) / (column high value - column low value) + N / num_distinct

For example, in a column with 1,000 different (integer) values ranging from 1 to 1000, and the pair of predicates colX > 10 and colX <= 20, you have one closed end (<=), and the formula would give you
   (20 - 10) / (1000 - 1) + 1/1000 = 10/999 + 1/1000

1.1 Date Values
Consider a date-only column holding values from 1 January 2000 to 31 December 2004; a predicate date_col between 30th Dec 2002 and 5th Jan 2003 would then give you a selectivity of
   (5th Jan 2003 - 30th Dec 2002 ) / ( 31st Dec 2004 - 1st Jan 2000 ) +
   2/(number of different dates) = 6 / 1826 + 2 / 1827

1.2 Character Values
In a test table, t1, with two columns, one of type varchar2(10) and one of type char(10), I have inserted the strings 'Aardvark', 'Apple', 'Blueberry', and 'Zymurgy' in both columns, and then generated statistics—including histograms.

create table t1 (
    v10    varchar2(10),
    c10    char(10)
)
;

insert into t1 values('Apple','Apple');
insert into t1 values('Blueberry','Blueberry');
insert into t1 values('Aardvark','Aardvark');
insert into t1 values('Zymurgy','Zymurgy');

commit;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 10'
    );
end;
/

select
        column_name,
        endpoint_number,
        endpoint_value
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_Number
;

Col   End no                                        End Value
----- ------ ------------------------------------------------
C10        1  339,475,752,638,459,000,000,000,000,000,000,000     -- 'Aardvark  '
           2  339,779,832,781,209,000,000,000,000,000,000,000     -- 'Apple     '
           3  344,891,393,972,447,000,000,000,000,000,000,000     -- 'Blueberry '
           4  469,769,561,047,943,000,000,000,000,000,000,000     -- 'Zymurgy   '
V10        1  339,475,752,638,459,000,000,000,000,000,000,000     -- 'Aardvark'
           2  339,779,832,781,057,000,000,000,000,000,000,000     -- 'Apple'
           3  344,891,393,972,447,000,000,000,000,000,000,000     -- 'Blueberry'
           4  469,769,561,047,943,000,000,000,000,000,000,000     -- 'Zymurgy'

Oracle appears to behave as follows:
• Extract a maximum of 32 bytes from the column; this representation of the column value is how the low_value, high_value and end_point_actual values are stored.
• Extract the first 15 bytes from the 32, padding with zeros at the right if necessary.
• Convert the 15-byte hex number to decimal and round to 15 significant figures.

Let’s take a worked example—the string 'Aardvark'—to see if we can end up with the value stored in the histogram:
• 'Aardvark', when dumped in hex from the char(10) column, contains the following list of byte values (note the 20,20 space padding that is imposed by the char(10) definition):
'41,61,72,64,76,61,72,6B,20,20'.
• Since this is less than 15 bytes, we append a few zeros to produce the number
0x416172647661726B20200000000000. (If the column had been declared as char(40) the value would already have been padded with spaces (0x20) up to 40 characters, so we would stop at the fifteenth byte, and the value we would use would look like 0x416172647661726B20202020202020.)
• Converting this rather large hex number to decimal we get 339,475,752,638,459,043,065,991,628,037,554,176.
• And if we throw away everything after the first 15 digits then, as required, we get
339,475,752,638,459,000,000,000,000,000,000,000.
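
The same calculation can be reproduced in SQL (a sketch, not part of the original notes): cast the padded string to raw, keep the first 15 bytes (30 hex digits), convert to a number, and round to 15 significant figures. Since the full value has 36 digits, round(n, -21) does the rounding.

select
    round(
        to_number(
            rpad(
                substr(
                    rawtohex(utl_raw.cast_to_raw(rpad('Aardvark',10))),
                    1, 30
                ),
                30, '0'
            ),
            rpad('X', 30, 'X')
        ),
        -21
    )    endpoint_value
from
    dual
;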

Looking at the rounding, you might notice that after about the first six or seven characters of a string, the rest of the letters don’t have any impact on the numeric representation used by the optimizer—which is why the numeric values for 'Apple' are the only ones that vary when comparing the char(10) with the varchar2(10) versions. We don’t really need to go into any more detail—but you could imagine that the optimizer might have trouble coping with data consisting (for example) of URLs, when lots of them start with http://.

The problem can be much worse, in fact, because of the increasing popularity of national language support. If you pick a multibyte character set for your database character set, then Oracle will be using the first 15 bytes of the string, not the first 15 characters. So the precision gets even worse.

1.3 Daft Data Types
One of the first examples in this chapter examined the predicate date_col between 30th Dec 2002 and 5th Jan 2003 on a table holding data from 1 January 2000 to 31 December 2004, and we saw how the optimizer could work out the selectivity of such a predicate. But let’s see how things can go wrong in popular implementations of storing dates. We create and populate a table with one row per date over a five-year date range

create table t1 (
    d1        date,
    n1        number(8),
    v1        varchar2(8)
)
;

insert into t1
select
    d1,
    to_number(to_char(d1,'yyyymmdd')),
    to_char(d1,'yyyymmdd')
from    (
    select
        to_date('31-Dec-1999') + rownum    d1
    from all_objects
    where
        rownum <= 1827
    )
;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 1'
    );
end;
/

Test case 1
select    *
from    t1
where     d1 between to_date('30-Dec-2002','dd-mon-yyyy')
       and     to_date('05-Jan-2003','dd-mon-yyyy')
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     8 |   184 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |     8 |   184 |     4 |
----------------------------------------------------------
selectivity = 6/1826 + 2/1827
cardinality = selectivity * num_rows = 8 (rounded)

Test case 2

select    *
from    t1
where     n1 between 20021230 and 20030105
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   396 |  9108 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |   396 |  9108 |     4 |
----------------------------------------------------------
Selectivity = (required range) / (high value - low value) + 2/num_distinct
                 = (20030105 - 20021230) / (20041231 - 20000101) + 2/1827
                 =         8875          /         41130         + 2/1827
                 = 0.215779 + 0.001095
                 = 0.216874           -- more than 20% of the data apparently!
Cardinality = Selectivity * row_nums = 396

Test case 3
select    *
from    t1
where     v1 between '20021230' and '20030105'
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   396 |  9108 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |   396 |  9108 |     4 |
----------------------------------------------------------
Same result as test case 2.
Oracle seems to have treated the varchar2 column and its predicate values as if they were numeric. It is possible that a programmer in the CBO group has slipped in a “cunning guess” tweak to the optimizer in this very special case. The tweak disappears as soon as you format the date-like string to include nonnumeric characters (e.g., '2002-01-02').

2. Leading Zeros
create table t1
nologging
pctfree 0
as
with generator as (
    select    --+ materialize
        rownum     id
    from    all_objects
    where    rownum <= 2000
)
select
    /*+ ordered use_nl(v2) */
    trunc((rownum-1)/10000)        grp_id,
    lpad(rownum,18,'0')        id
--    'A' || lpad(rownum, 17, '0')    id
from
    generator    v1,
    generator    v2
where
    rownum <= 2000000
;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 1'
    );
end;
/

select
    *
from t1
where
    id between '000000000000060000'
       and       '000000000000070000'
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10002 |   214K|  1059 |
|*  1 |  TABLE ACCESS FULL| T1   | 10002 |   214K|  1059 |
----------------------------------------------------------
The cardinality is calculated correctly in 11g; in earlier versions (9i) the calculation went wrong.

3. Deadly Defaults
What might the average database-independent developer choose as a good value to represent a null date? How about something far in the future, like 31 December 4000?
Doing a quick calculation for the range scan selectivity, you believe the selectivity is
  ( 31 Dec 2003 - 01 Jan 2003 ) / (31 Dec 2004 - 01 Jan 2000)  + 2/1827 = 0.20044
But, given the extra, far-out, value, the optimizer thinks the selectivity is
  ( 31 Dec 2003 - 01 Jan 2003 ) / (31 Dec 4000 - 01 Jan 2000)  + 2/1828 = 0.00159

4. Function Figures
function(colx) = 'SMITH'             Fixed 1% selectivity.
not function(colx) = 'SMITH'         Fixed 5% selectivity.
function(colx) > 'SMITH'             Fixed 5% selectivity.
not function(colx) > 'SMITH'         Fixed 5% selectivity.
function(colx) >= 'SMITH' and function(colx) < 'SMITI'
                                     Derived 0.25% selectivity (5% * 5%).
function(colx) between 'SMITHA' and 'SMITHZ'
                                     Derived 0.25% selectivity (5% * 5%).
not function(colx) between 'SMITHA' and 'SMITHZ'
                                     Derived 9.75% selectivity (5% + 5% - (5% * 5%)).
function(colx) like 'SMITH%'         Fixed 5% selectivity.
not function(colx) like 'SMITH%'     Fixed 5% selectivity.
function(colx) in ('SMITH','JONES')  Derived 1.99% selectivity (1% + 1% - (1% * 1%)).
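
For example (a sketch; assume a 10,000-row table t1 with a varchar2 column v1 and no function-based index or extended statistics), the first rule means a predicate like the following gets the fixed 1% guess, i.e. an estimated cardinality of 100 regardless of the actual data:

select  *
from    t1
where   upper(v1) = 'SMITH'
;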

5. Correlated Columns
create table t1
nologging
as
select
    trunc(dbms_random.value(0,25))    n1,
    rpad('x',40)            ind_pad,
    trunc(dbms_random.value(0,20))    n2,
    lpad(rownum,10,'0')        small_vc,
    rpad('x',200)            padding
from
    all_objects
where
    rownum  <= 10000
;

update t1 set n2 = n1;

create index t1_i1 on t1(n1, ind_pad, n2)
nologging
pctfree 91
;

begin
    dbms_stats.gather_table_stats(
        ownname => user,
        tabname    => 'T1',
        cascade    => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select   
    table_name,
    blocks,
    num_rows
from    user_tables
where    table_name = 'T1'
;
TABLE_NAME               BLOCKS   NUM_ROWS
-------------------- ---------- ----------
T1                          387      10000

select
    num_rows, distinct_keys,
    blevel, leaf_blocks, clustering_factor,
    avg_leaf_blocks_per_key, avg_data_blocks_per_key
from
    user_indexes
where    table_name = 'T1'
and    index_name = 'T1_I1'
;


  NUM_ROWS DISTINCT_KEYS     BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
---------- ------------- ---------- ----------- ----------------- ----------------------- -----------------------
     10000            25          2        1107              6153                      44                     246

select
    column_name,
    num_nulls, num_distinct, density,
    low_value, high_value
from
    user_tab_columns
where    table_name = 'T1'
and    column_name in ('N1','N2','IND_PAD')
order by
    column_name
;
Col                   NUM_NULLS NUM_DISTINCT    DENSITY
-------------------- ---------- ------------ ----------
IND_PAD                       0            1          1
N1                            0           25        .04
N2                            0           25        .04

Test case 1
select
    /*+ index(t1) */
    small_vc
from
    t1
where
    n1    = 2
and    ind_pad    = rpad('x',40)
and    n2    = 2
;
Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   400 | 23200 |   294 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   400 | 23200 |   294 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |   400 |       |    47 |
---------------------------------------------------------------------
11g recognizes that n1 and n2 are correlated, and uses the index's DISTINCT_KEYS to calculate the cardinality:
Cardinality = num_rows / DISTINCT_KEYS = 10000 / 25 = 400
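
For comparison (a side note, not in the original post): if the optimizer simply multiplied the individual column selectivities it would get 10000 * 0.04 * 1 * 0.04 = 16 rows, a serious underestimate of the 400 rows that actually exist for each (n1, n2) combination; the index-based sanity check avoids this trap.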

Wednesday, July 27, 2011

The Clustering Factor

1. Baseline Example
1.1 Test Data Setup
create table t1(
    date_ord    date        constraint t1_dto_nn    not null,
    seq_ord        number(6)    constraint t1_sqo_nn    not null,
    small_vc    varchar2(10)
)
pctfree 90
pctused 10
;

drop sequence t1_seq;
create sequence t1_seq;

create or replace procedure t1_load(i_tag varchar2) as

m_date    date;   

begin
    dbms_output.put_line(
        dbms_lock.request(
            1,dbms_lock.s_mode, release_on_commit => true
        )
    );
    commit;
    for i in 0..25 loop
        m_date :=  trunc(sysdate) + i;

        for j in 1..200 loop
            insert into t1 values(
                m_date,
                t1_seq.nextval,
                i_tag || j
            );
            commit;
            dbms_lock.sleep(0.01);
        end loop;
    end loop;
end;
/


begin
    dbms_output.put_line(
        dbms_lock.request(
            1,dbms_lock.x_mode,
            release_on_commit=>true
        )
    );
end;
/

prompt
prompt    From five different sessions, execute a call to procedure
prompt    t1_load with a different tag for each session, e.g.
prompt        execute t1_LOAD('a')
prompt
prompt    These will all suspend, waiting to acquire a share lock that
prompt    this program is holding in exclusive mode. When all five sessions
prompt    have called the procedure, press return in this screen to allow them
prompt    to continue

accept x

commit;

create index t1_i1 on t1(date_ord, seq_ord);


1.2 Gathering Statistics
rem
rem    Report how many times five processes have collided on a block
rem    The first column is the number of different processes that
rem    have inserted into a block, the second is the number of blocks
rem    with that many hits.
rem
select    ct, count(*)
from
    (
    select block, count(*) ct
    from
        (
        select
            distinct dbms_rowid.rowid_block_number(rowid) block,
            substr(small_vc,1,1)
        from t1
        )
    group by block
    )
group by ct
;
        CT   COUNT(*)
---------- ----------
         5        738   738 blocks received inserts from all 5 sessions
         4          2   2 blocks received inserts from 4 sessions
         3          1   1 block received inserts from 3 sessions


begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    blocks,
    num_rows
from
    user_tables
where
    table_name = 'T1';

    BLOCKS   NUM_ROWS
---------- ----------
       744      26000

Comparing with the earlier query, only 738 + 2 + 1 = 741 blocks actually contain rows; the remaining 3 of the 744 blocks are presumably space-management overhead (segment header and bitmap/freelist blocks) that still count as blocks below the high water mark.

select
    index_name, blevel, leaf_blocks, clustering_factor
from
    user_indexes
where   
    table_name = 'T1'
;

INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1                         1          86               840

1.3 Tests
Test case 1
select
    count(small_vc)
from
    t1
where
    date_ord = trunc(sysdate) + 7
;

Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    13 |    38 |
|   1 |  SORT AGGREGATE              |       |     1 |    13 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |  1000 | 13000 |    38 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |  1000 |       |     5 |
----------------------------------------------------------------------
cost =
        blevel +
        ceil(effective index selectivity * leaf_blocks) +
        ceil(effective table selectivity * clustering_factor)
In this case, we are after 1 day out of 26—a selectivity of 3.846% or 0.03846—and the two selectivities are identical. Putting these figures into the formula:
cost =
        1 +
        ceil(0.03846 * 86) +
        ceil(0.03846 * 840)
        = 1 + 4 + 33 = 38

select
        /*+ full(t1) */
        rowid, date_ord, seq_ord, small_vc
from
        t1
where
        rownum <= 10
;

ROWID              DATE_ORD     SEQ_ORD SMALL_VC
------------------ --------- ---------- ----------
AAAMJHAAJAAAAAKAAA 26-JUL-11          1 A1
AAAMJHAAJAAAAAKAAB 26-JUL-11          2 B1
AAAMJHAAJAAAAAKAAC 26-JUL-11          3 C1
AAAMJHAAJAAAAAKAAD 26-JUL-11          4 A2
AAAMJHAAJAAAAAKAAE 26-JUL-11          5 D1
AAAMJHAAJAAAAAKAAF 26-JUL-11          6 E1
AAAMJHAAJAAAAAKAAG 26-JUL-11          7 B2
AAAMJHAAJAAAAAKAAH 26-JUL-11          8 D2
AAAMJHAAJAAAAAKAAI 26-JUL-11          9 B3
AAAMJHAAJAAAAAKAAJ 26-JUL-11         10 E2

Remember that the extended rowid is made up of the following:
• object_id First six letters (AAAMJH)
• Relative file_id Next three letters (AAJ)
• Block within file Next six letters (AAAAAK)
• Row within block Last three letters (AAA, AAB, AAC ...)
All these rows are in the same block (AAAAAK). In my test run, I populated the column small_vc with a tag that could be used to identify the process that inserted the row. All five of our processes were busy hitting the same table block at the same time. In a very busy system (in particular, one with a high degree of concurrency), we might have seen lots of buffer busy waits for blocks of class data block for whichever one block was the current focus of all the inserts.
How do we address this issue? Simple: we read the advice in the Oracle Performance Tuning Guide and Reference and (for older versions of Oracle particularly) realize that we should have created the table with multiple freelists. In this case, because we expect the typical degree of concurrency to be 5, we might go to exactly that limit, and create the table with the extra clause:
  storage (freelists 5)

2. Reducing Table Contention (Multiple Freelists)
Change the table definition from section 1.1 to the following, then repeat the whole test.
create table t1(
    date_ord    date        constraint t1_dto_nn    not null,
    seq_ord        number(6)    constraint t1_sqo_nn    not null,
    small_vc    varchar2(10)
)
pctfree 90
pctused 10
storage (freelists 5)
;

        CT   COUNT(*)
---------- ----------
         1        298
         3        445



    BLOCKS   NUM_ROWS
---------- ----------
       754      26000

INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1                         1          86             16330

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |   116 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |       |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 13000 |   116 |
-----------------------------------------------------------
Full tablescan cost = 754 / 6.41 ≈ 118 (the plan above reports 116)
If the index were used, the formula above gives
cost =
        1 +
        ceil(0.03846 * 86) +
        ceil(0.03846 * 16330)
        = 1 + 4 + 629 = 634
so Oracle chooses the full tablescan.

3. Reducing Leaf Block Contention (Reverse Key Indexes)
Create the data as in section 1.1, then convert the index to a reverse key index with the following statement.
alter index t1_i1 rebuild reverse;

A reverse key index operates by reversing the byte order of each column of the index before inserting the resulting value into the index structure. The effect of this is to turn sequential values into index entries that are randomly scattered.
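
After the rebuild (and a fresh dbms_stats call) you can re-check the index statistics with the same query as before (a sketch); because adjacent index entries now point to rows scattered throughout the table, you should expect the clustering_factor to rise sharply, which in turn pushes up the cost of any index-driven access path.

select
    index_name, blevel, leaf_blocks, clustering_factor
from
    user_indexes
where
    table_name = 'T1'
and index_name = 'T1_I1'
;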

4. Column Order
This is just one important consideration when deciding the column order of an index. Another is the possibility for improving the compressibility of an index by putting the least selective (most repetitive) columns first. Another is the option for arranging the columns so that some very popular queries can perform an order by without doing a sort (an execution mechanism that typically appears as sort (order by) nosort in the execution plan).
  Whatever your reason for deciding to change the order of columns in an index, remember that it might change the clustering_factor. The knock-on effect of this might be that the calculated cost of the index for a range scan becomes so high that Oracle ignores the index.

4.1 Test Data Setup
create table t1
pctfree 90
pctused 10
as
select
    trunc((rownum-1)/ 100)    clustered,
    mod(rownum - 1, 100)    scattered,
    lpad(rownum,10)        small_vc
from
    all_objects
where
    rownum <= 10000
;

create index t1_i1_good on t1(clustered, scattered);
create index t1_i2_bad  on t1(scattered, clustered);

4.2 Gathering Statistics
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    blocks,
    num_rows
from
    user_tables
where
    table_name = 'T1'
;
    BLOCKS   NUM_ROWS
---------- ----------
       292      10000

select
    index_name, blevel, leaf_blocks, clustering_factor
from
    user_indexes
where   
    table_name = 'T1'
;
INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1_GOOD                    1          24               278
T1_I2_BAD                     1          24             10000

4.3 Tests
select
    /*+ index(t1 t1_i1_good) */
    count(small_vc)
from
    t1
where
    scattered = 50
and    clustered between 1 and 5
;

Execution Plan
----------------------------------------------------------
Plan hash value: 2833503776

---------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    17 |     4 |
|   1 |  SORT AGGREGATE              |            |     1 |    17 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1         |     6 |   102 |     4 |
|*  3 |    INDEX RANGE SCAN          | T1_I1_GOOD |     6 |       |     3 |
---------------------------------------------------------------------------

select
    /*+ index(t1 t1_i2_bad) */
    count(small_vc)
from
    t1
where
    scattered =50
and    clustered between 1 and 5
;
Execution Plan
----------------------------------------------------------
Plan hash value: 2223336776

--------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    17 |     9 |
|   1 |  SORT AGGREGATE              |           |     1 |    17 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1        |     6 |   102 |     9 |
|*  3 |    INDEX RANGE SCAN          | T1_I2_BAD |     6 |       |     2 |
--------------------------------------------------------------------------
  Irrespective of which index we use in this example, we will visit exactly the same number of table blocks—but the order in which we visit them will be different, and this has been enough to make a big difference to the optimizer’s calculations.
  For completeness, let’s just run our statistics through the formula.
Selectivity of scattered = 50: 1 / 100 = 0.01
Selectivity of clustered between 1 and 5: (5 - 1) / (99 - 0) + 2/100 = 0.060404
Combined selectivity: 0.01 * 0.060404 = 0.00060404
cost (t1_i1_good) =
     1 +
     ceil(0.060404 * 24) +     -- range on first column, invalidates second column
     ceil(0.00060404 * 278)    -- 2nd column can be used before visiting the table
                  = 1 + 2 + 1 = 4
cost (t1_i2_bad) =
     1 +
     ceil(0.00060404 * 24) +   -- can use both columns for start/stop keys
     ceil(0.00060404 * 10000)  -- but the clustering_factor is overwhelming
                  = 1 + 1 + 7 = 9

5. Extra Columns
It’s not just a change in column order that could introduce a problem. It’s a fairly common (and often effective) practice to add a column or two to an existing index. By now I’m sure you won’t be surprised to discover that this, too, can make a dramatic difference to the clustering_factor, hence to the desirability of the index. Imagine a system that includes a table for tracking product movements. It has a fairly obvious index on the movement_date, but after some time, it might become apparent to the DBA that a number of commonly used queries would benefit from the addition of the product_id to this index (see script extra_col.sql in the online code suite).

5.1 Test Data Setup
create table t1
as
select
    sysdate + trunc((rownum-1) / 500)    movement_date,
    trunc(dbms_random.value(1,60.999))    product_id,
    trunc(dbms_random.value(1,10.000))    qty,
    lpad(rownum,10)                small_vc,
    rpad('x',100)                padding
from
    all_objects
where
    rownum <= 10000
;


5.2 Tests
5.2.1 Test with an index on movement_date only
create index t1_i1 on t1(movement_date);
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    blocks,
    num_rows
from
    user_tables
where
    table_name = 'T1';


    BLOCKS   NUM_ROWS
---------- ----------
       194      10000

select
    index_name, blevel, leaf_blocks, clustering_factor
from
    user_indexes
where   
    table_name = 'T1'
;

INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1                         1          27               184




Test queries
select
    sum(qty)
from
    t1
where
    movement_date = trunc(sysdate) + 7
and    product_id = 44
;
Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    14 |    12 |
|   1 |  SORT AGGREGATE              |       |     1 |    14 |       |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |     8 |   112 |    12 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   500 |       |     2 |
----------------------------------------------------------------------

select
    product_id, max(small_vc)
from
    t1
where
    movement_date = trunc(sysdate) + 7
group by
    product_id
;

Execution Plan
----------------------------------------------------------
Plan hash value: 2629508439

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    60 |  1320 |    40 |
|   1 |  HASH GROUP BY               |       |    60 |  1320 |    40 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   500 | 11000 |    12 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   500 |       |     2 |
----------------------------------------------------------------------
The first query is an example of the type of query that encouraged us to add the extra column to the index. The second query is an example of a query that will suffer as a consequence of the change. In both cases, Oracle will be visiting the same little clump of about ten blocks in the table—but the extra column changes the order in which the rows are visited (which is what the clustering_factor is about), so the cost changes, and in the second case the execution plan changes for the worse.

5.2.2 Test with an index on (movement_date, product_id)
drop index t1_i1;
create index t1_i1 on t1(movement_date, product_id);
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    index_name, blevel, leaf_blocks, clustering_factor
from
    user_indexes
where   
    table_name = 'T1'
;
INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1                         1          31              6645

select   
    table_name,
    blocks,
    num_rows
from    user_tables
where    table_name = 'T1'
;

TABLE_NAME               BLOCKS   NUM_ROWS
-------------------- ---------- ----------
T1                          194      10000

select
    sum(qty)
from
    t1
where
    movement_date = trunc(sysdate) + 7
and    product_id = 44
;
Execution Plan
----------------------------------------------------------
Plan hash value: 269862921

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    14 |     7 |
|   1 |  SORT AGGREGATE              |       |     1 |    14 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     8 |   112 |     7 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     8 |       |     1 |
----------------------------------------------------------------------

select
    product_id, max(small_vc)
from
    t1
where
    movement_date = trunc(sysdate) + 7
group by
    product_id
;
Execution Plan
----------------------------------------------------------
Plan hash value: 136660032

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    60 |  1320 |    59 |
|   1 |  HASH GROUP BY     |      |    60 |  1320 |    59 |
|*  2 |   TABLE ACCESS FULL| T1   |   500 | 11000 |    31 |
-----------------------------------------------------------
Full tablescan cost = 194 / 6.41 ≈ 31
Index scan cost = 1 + ceil(0.05 * 31) + ceil(0.05 * 6645) = 1 + 2 + 333 = 336
so the full tablescan is chosen.