Monday, November 28, 2011

Disadvantages of the SELECT * Statement (repost)


Disadvantages of the SELECT * statement
1. SELECT * retrieves every column in the table, whether or not the calling application needs the data. This wastes server resources and can even degrade server performance.
2. If the table structure changes later, a SELECT * statement may return incorrect data or fail outright. Naming the columns explicitly makes it easy to pinpoint what a change affects.
3. To execute SELECT *, SQL Server must first look up which columns the table contains before it can run the statement, which can cause performance problems in some situations.
4. SELECT * can never be satisfied by a covering index, which works against query optimization (see the sketch below).
5. From a documentation standpoint, SELECT * does not state which columns are being used, which is another reason to avoid it.
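
A small illustration of point 4; the table, index, and column names here are hypothetical:

-- Hypothetical table: orders(order_id, customer_id, order_date, amount, notes)
-- with a covering index for the query below:
create index ix_orders_cust_date on orders (customer_id, order_date);

-- SELECT * must visit the table rows to fetch every column,
-- so the index alone cannot satisfy the query:
select * from orders where customer_id = 42;

-- Listing only the needed columns lets the optimizer answer the query
-- from the index alone (a covering-index access):
select customer_id, order_date from orders where customer_id = 42;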

Thursday, October 27, 2011

A Letter to Hu XX and Wen XX

A Letter to Hu XX and Wen XX:
Chairman X, Premier X:
    Greetings!
    According to statistics from the relevant national authorities, over the thirty years of reform and opening up, the fastest-appreciating things have been housing, burial plots, official posts, mooncakes, and mistresses; the fastest-depreciating have been professional titles, diplomas, morality, honesty, and the renminbi.
    China has been preliminarily built into a multi-"tribe" nation made up of the moonlight tribe (who spend every paycheck), the boomerang tribe, the wage-labor tribe, the snail-dwelling tribe, the ant tribe, the grumbling tribe, the complaining tribe, the swindling tribe, the profit-chasing tribe, and the secretly-married tribe.
    This has been a "geili" (awesome) year, a tragic year, and a tangled year.
    This year, a "nail house" holdout in Jiangxi ended his own life in the most helpless way possible. His death proves nothing about the savagery and shamelessness of the authorities concerned; it only proves that gasoline ignites at a spark.
    This year, eight Hong Kong tourists died at the muzzles of Philippine police guns, teaching Hong Kongers that besides "Filipino maids" the Philippines also has "Filipino police."
    This year, a car seller and a sneaker seller fell out over who was "geili" and who was not, touching off the "3Q war"; in the end "that damned Tencent" made "a very difficult decision," and only afterward did everyone realize that these two rascals were not merely "ripping everyone off"; they were, quite simply, "ripping everyone off."
    This year, Li Gang did not become famous, nor did Li Gang's son; but the father of Li Gang's son became famous.
    Recently a certain state ministry released statistics telling people: if you are not one of the three "bigs" (big money, big official, big celebrity) and want to buy a 100-square-meter apartment in Beijing at a total price of 3 million yuan, here is the price each social stratum must pay:
1. A farmer: tilling three mu of land at a net income of 400 yuan per mu per year, you would have had to start in the Tang dynasty to save it up by now (and no bad harvests allowed);
2. A worker: at a monthly wage of 1,500 yuan, you would have had to be on the job since the Opium War (no weekends off);
3. A white-collar worker: at an annual salary of 60,000 yuan, you would have had to earn that since 1960 and spend nothing (all public holidays cancelled);
4. A robber: about 30 years of 2,500 consecutive robberies (all victims must be white-collar);
5. A prostitute: 10,000 clients, one a day for 10,000 days, from age 18 to 46 at that pace (with no breaks, not even for a period).
None of the above includes renovation, furniture, appliances, and so on.
    The state of China (a classic):
1. The ATM dispenses counterfeit notes ---> the bank bears no responsibility
2. An online banking account is robbed ---> the depositor's responsibility
3. The bank pays out too much ---> the depositor is obliged to return it
4. The bank pays out too little ---> no claims recognized once you leave the counter
5. The ATM malfunctions and pays too little ---> the user's problem
6. The ATM malfunctions and pays too much ---> the user has committed theft
7. The president of a bank in Kaiping, Guangdong embezzles 400 million ---> 2 years in prison
8. An ATM spits out an extra 70,000 to ordinary citizen Xu Ting ---> life imprisonment
    Disclaimer: forwarding this will not earn you any renminbi, but if you feel that China's 1.3 billion depositors are being treated unfairly as consumers, please forward it to however many groups you have.
    The state of China:
Can't afford to be born: a C-section starts at 5,000;
Can't afford school: a school of your choosing starts at 30,000;
Can't afford housing: twenty-odd thousand per square meter;
It's not that you can't afford a wife: with no house and no car, who would marry you?
Can't afford children: the parents laid off, the child sent to the fields;
Can't afford illness: drug markups start at ten times cost;
Can't afford to live: a month of hard work brings in a thousand or so;
Can't afford to die: cremation and burial start at thirty-odd thousand.
    Summary (in eight characters): no way to live, no way to die.
    Pass it around and let the leaders see it; surely it can reach General Secretary Hu and Premier Wen. If you have groups, forward it, and add one line: whoever sees this, forward it to at least one group; whoever forwards it to two or more, may they win 5 million in the lottery.
Education: enter hopeful, leave in despair;
Real estate: enter a snail dweller, leave a mortgage slave;
Show business: enter a maiden, leave a mistress;
Petitioning: enter as Dou E the wronged, leave a madman;
Officialdom: enter as upright Hai Rui, leave as corrupt He Shen;
Coal mines: enter crouching, leave lying down;
University: enter a campus belle, leave a faded flower!
The stock market: enter as Yang the Millionaire, leave as Yang Bailao the pauper; drive in wearing a BMW, leave on a bicycle; walk in wearing a suit, walk out in your underwear.
    Ordinary people keep wondering why they can never beat the government. Here is why:
1. You reason with them; they play the hooligan;
2. You play the hooligan; they lecture you on the rule of law;
3. You invoke the law; they talk politics;
4. You talk politics; they cite national conditions;
5. You cite national conditions; they talk about aligning with international practice;
6. You talk about aligning with international practice; they talk culture;
7. You talk culture; they quote Confucius;
8. You quote Confucius; they quote Laozi;
9. You quote Laozi; they play the fool!

Wipe out the post-00s generation with milk powder,
wipe out the post-90s with exams,
wipe out the post-80s with housing prices,
wipe out the post-70s with unemployment,
wipe out the post-60s with the chengguan (urban enforcers),
wipe out the post-50s with layoffs,
wipe out the post-40s with demolitions,
wipe out the post-30s with healthcare reform,
... and in the end, everyone still alive and living well goes off to attend the Two Sessions.
Someone asks: "What are the Two Sessions?" (In Chinese the name can also be read as "two skills.")
The farmers' delegate answers: skilled at raising pigs, skilled at breeding them.
The workers' delegate: skilled at earning, skilled at spending.
The migrant workers' delegate: skilled at demanding back pay, skilled at kneeling.
The nannies' delegate: skilled at breaking up marriages, skilled at folding quilts.
The entertainers' delegate: skilled at self-promotion, skilled at sleeping around.
The businessmen's delegate: skilled at making money, skilled at dodging taxes.
The officials' delegate: skilled at lying, skilled at taking bribes.
The stock investors' delegate: skilled at cutting losses, skilled at weeping.
Revolutionary work is hard:
the slow-witted get played to death;
the incapable sit idle to death;
the timid are frightened to death;
the light drinkers are toasted to death;
the frail are worked to death;
the blunt-spoken are purged to death;
the capable are used to death.
So remember:
never be too dedicated.
Dong Cunrui held the charge too steadily;
Liu Hulan kept her mouth shut too tightly;
Qiu Shaoyun lay too still;
Huang Jiguang threw himself too accurately;
Zhang Side ran out too late;
Bethune simply knew how to do too much.
The lesson:
a good attitude matters most!
Otherwise you end up in heaven
while your money stays in the bank.

Friday, August 19, 2011

Mounting an NTFS external drive on CentOS (repost)

1. First check for fuse. CentOS 5.5 ships with fuse; run rpm -qa | grep fuse to see whether it is installed.

[root@localhost ntfs-3g-2010.5.22]# rpm -qa | grep fuse
fuse-2.7.4-8.el5

This output shows fuse is already installed.

If nothing is printed, run:

[root@localhost ntfs-3g-2010.5.22]#yum install fuse

2. Install ntfs-3g. The default repositories do not provide ntfs-3g, so download and compile it yourself.
Download page: http://www.tuxera.com/community/ntfs-3g-download/

Installation instructions are also available at the download page.

In brief: after downloading, unpack the archive

[root@localhost packages]# tar -zxvf ntfs-3g-2010.5.22.tgz

Change into the unpacked directory

[root@localhost packages]# cd ntfs-3g-2010.5.22
[root@localhost ntfs-3g-2010.5.22]#

Then run, in order:

[root@localhost ntfs-3g-2010.5.22]#./configure
[root@localhost ntfs-3g-2010.5.22]#make
[root@localhost ntfs-3g-2010.5.22]#make install

Once that completes you can mount NTFS filesystems normally:

[root@localhost ntfs-3g-2010.5.22]# mount -t ntfs-3g /dev/sda2 /mnt
[root@localhost ntfs-3g-2010.5.22]# ls /mnt

To unmount the drive, run:

[root@localhost ntfs-3g-2010.5.22]# umount /mnt

Sunday, July 31, 2011

How to use hints in Oracle sql for performance(转)

Hints let you influence the optimizer. Using any hint (except the RULE hint) causes Oracle to use the cost-based optimizer.

The following syntax is used for hints:

select /*+ HINT */ name
from emp
where id =1;

where HINT is replaced by the hint text.
If the hint text is syntactically incorrect, it is silently ignored.


Hints for Optimization Approaches and Goals

ALL_ROWS The ALL_ROWS hint explicitly chooses the cost-based approach to optimize a statement block with a goal of best throughput (that is, minimum total resource consumption).
FIRST_ROWS The FIRST_ROWS hint explicitly chooses the cost-based approach to optimize a statement block with a goal of best response time (minimum resource usage to return the first row). In newer Oracle versions you should give a parameter with this hint: FIRST_ROWS(n) means that the optimizer will determine an execution plan that returns the first n rows quickly (see the sketch after this list).
CHOOSE The CHOOSE hint causes the optimizer to choose between the rule-based approach and the cost-based approach for a SQL statement, based on the presence of statistics for the tables accessed by the statement.
RULE The RULE hint explicitly chooses rule-based optimization for a statement block. This hint also causes the optimizer to ignore any other hints specified for the statement block. The RULE hint no longer works in Oracle 10g.
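
As an illustration, a minimal sketch of the goal hints against the emp table used above (the predicate is only illustrative):

select /*+ ALL_ROWS */ name
from emp
where id > 1;

select /*+ FIRST_ROWS(10) */ name
from emp
where id > 1;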

Hints for Access Paths

FULL The FULL hint explicitly chooses a full table scan for the specified table. The syntax of the FULL hint is FULL(table) where table specifies the alias of the table (or table name if alias does not exist) on which the full table scan is to be performed.
ROWID The ROWID hint explicitly chooses a table scan by ROWID for the specified table. The syntax of the ROWID hint is ROWID(table) where table specifies the name or alias of the table on which the table access by ROWID is to be performed. (This hint is deprecated in Oracle 10g.)
CLUSTER The CLUSTER hint explicitly chooses a cluster scan to access the specified table. The syntax of the CLUSTER hint is CLUSTER(table) where table specifies the name or alias of the table to be accessed by a cluster scan.
HASH The HASH hint explicitly chooses a hash scan to access the specified table. The syntax of the HASH hint is HASH(table) where table specifies the name or alias of the table to be accessed by a hash scan.
HASH_AJ The HASH_AJ hint transforms a NOT IN subquery into a hash anti-join to access the specified table. The syntax of the HASH_AJ hint is HASH_AJ(table) where table specifies the name or alias of the table to be accessed. (Deprecated in Oracle 10g.)
INDEX The INDEX hint explicitly chooses an index scan for the specified table. The syntax of the INDEX hint is INDEX(table index), where table specifies the name or alias of the table associated with the index to be scanned and index specifies an index on which an index scan is to be performed. This hint may optionally specify one or more indexes (see the sketch after this list).
NO_INDEX The NO_INDEX hint explicitly disallows a set of indexes for the specified table. The syntax of the NO_INDEX hint is NO_INDEX(table index).
INDEX_ASC The INDEX_ASC hint explicitly chooses an index scan for the specified table. If the statement uses an index range scan, Oracle scans the index entries in ascending order of their indexed values.
INDEX_COMBINE If no indexes are given as arguments for the INDEX_COMBINE hint, the optimizer will use on the table whatever boolean combination of bitmap indexes has the best cost estimate. If certain indexes are given as arguments, the optimizer will try to use some boolean combination of those particular bitmap indexes. The syntax of INDEX_COMBINE is INDEX_COMBINE(table index).
INDEX_JOIN Explicitly instructs the optimizer to use an index join as an access path. For the hint to have a positive effect, a sufficiently small number of indexes must exist that contain all the columns required to resolve the query.
INDEX_DESC The INDEX_DESC hint explicitly chooses an index scan for the specified table. If the statement uses an index range scan, Oracle scans the index entries in descending order of their indexed values.
INDEX_FFS This hint causes a fast full index scan to be performed rather than a full table scan.
NO_INDEX_FFS Do not use fast full index scan (from Oracle 10g)
INDEX_SS The INDEX_SS hint instructs the optimizer to use an index skip scan for the specified table (from Oracle 10g)
INDEX_SS_ASC The INDEX_SS_ASC hint instructs the optimizer to use an index skip scan, reading the index entries in ascending order (from Oracle 10g)
INDEX_SS_DESC The INDEX_SS_DESC hint instructs the optimizer to use an index skip scan, reading the index entries in descending order (from Oracle 10g)
NO_INDEX_SS The NO_INDEX_SS hint causes the optimizer to exclude a skip scan of the specified indexes on the specified table. (from Oracle 10g)
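
A minimal sketch of two of the access-path hints against the same emp table (the index name emp_idx is hypothetical):

-- force a full table scan:
select /*+ FULL(e) */ name
from emp e
where id = 1;

-- force use of a specific index:
select /*+ INDEX(e emp_idx) */ name
from emp e
where id = 1;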

Hints for Query Transformations

NO_QUERY_TRANSFORMATION Prevents the optimizer from performing query transformations. (from Oracle 10g)
USE_CONCAT The USE_CONCAT hint forces combined OR conditions in the WHERE clause of a query to be transformed into a compound query using the UNION ALL set operator. Normally, this transformation occurs only if the cost of the query using the concatenations is cheaper than the cost without them.
NO_EXPAND The NO_EXPAND hint prevents the optimizer from considering OR-expansion for queries having OR conditions or IN-lists in the WHERE clause. Usually, the optimizer considers using OR expansion and uses this method if it decides that the cost is lower than not using it.
REWRITE The REWRITE hint forces the optimizer to rewrite a query in terms of materialized views, when possible, without cost consideration. Use the REWRITE hint with or without a view list. If you use REWRITE with a view list and the list contains an eligible materialized view, then Oracle uses that view regardless of its cost.
NOREWRITE / NO_REWRITE In Oracle 10g renamed to NO_REWRITE. The NOREWRITE/NO_REWRITE hint disables query rewrite for the query block, overriding the setting of the parameter QUERY_REWRITE_ENABLED.
MERGE The MERGE hint lets you merge views in a query.
NO_MERGE The NO_MERGE hint causes Oracle not to merge mergeable views. This hint is most often used to reduce the number of possible permutations for a query and make optimization faster.
FACT The FACT hint indicates that the table should be considered as a fact table. This is used in the context of the star transformation.
NO_FACT The NO_FACT hint is used in the context of the star transformation to indicate to the transformation that the hinted table should not be considered as a fact table.
STAR_TRANSFORMATION The STAR_TRANSFORMATION hint makes the optimizer use the best plan in which the transformation has been used. Without the hint, the optimizer could make a query optimization decision to use the best plan generated without the transformation, instead of the best plan for the transformed query.
NO_STAR_TRANSFORMATION Do not use star transformation (from Oracle 10g)
UNNEST The UNNEST hint requests subquery unnesting: the subquery is merged into the body of the outer query.
NO_UNNEST Use of the NO_UNNEST hint turns off unnesting for specific subquery blocks: the subquery is executed on its own, and its result is then applied to the outer query as a FILTER (see the sketch after this list).
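
A minimal sketch of NO_UNNEST placement; note that the hint goes inside the subquery block it applies to (emp, dept, and the columns are hypothetical):

select e.name
from emp e
where e.dept_id in (
    select /*+ NO_UNNEST */ d.id
    from dept d
    where d.loc = 'NY'
);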

Hints for Join Orders

LEADING Give this hint to indicate the leading table in a join. It names only one table; if you want to specify the whole order of tables, use the ORDERED hint. Syntax: LEADING(table)
ORDERED The ORDERED hint causes Oracle to join tables in the order in which they appear in the FROM clause (see the sketch after this list). If you omit the ORDERED hint from a SQL statement performing a join, the optimizer chooses the order in which to join the tables. You may want to use the ORDERED hint to specify a join order if you know something about the number of rows selected from each table that the optimizer does not. Such information would allow you to choose an inner and outer table better than the optimizer could.
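
A minimal sketch of the two join-order hints (emp and dept are hypothetical here):

-- join in exactly the FROM-clause order, dept first:
select /*+ ORDERED */ e.name
from dept d, emp e
where e.dept_id = d.id;

-- fix only the starting table, leaving the rest of the order to the optimizer:
select /*+ LEADING(d) */ e.name
from emp e, dept d
where e.dept_id = d.id;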

Hints for Join Operations

USE_NL The USE_NL hint causes Oracle to join each specified table to another row source with a nested loops join using the specified table as the inner table. The syntax of the USE_NL hint is USE_NL(table table) where table is the name or alias of a table to be used as the inner table of a nested loops join.
NO_USE_NL Do not use nested loop (from Oracle 10g)
USE_NL_WITH_INDEX Specifies a nested loops join. (from Oracle 10g)
USE_MERGE The USE_MERGE hint causes Oracle to join each specified table with another row source with a sort-merge join. The syntax of the USE_MERGE hint is USE_MERGE(table table) where table is a table to be joined to the row source resulting from joining the previous tables in the join order using a sort-merge join.
NO_USE_MERGE Do not use merge (from Oracle 10g)
USE_HASH The USE_HASH hint causes Oracle to join each specified table with another row source with a hash join (see the sketch after this list). The syntax of the USE_HASH hint is USE_HASH(table table) where table is a table to be joined to the row source resulting from joining the previous tables in the join order using a hash join.
NO_USE_HASH Do not use hash (from Oracle 10g)
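
A minimal sketch of the join-method hints on the same hypothetical emp/dept pair:

-- request a hash join with emp as the table being joined in:
select /*+ USE_HASH(e) */ e.name
from dept d, emp e
where e.dept_id = d.id;

-- request a nested loops join with emp as the inner table:
select /*+ ORDERED USE_NL(e) */ e.name
from dept d, emp e
where e.dept_id = d.id;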

Hints for Parallel Execution

PARALLEL The PARALLEL hint allows you to specify the desired number of concurrent query servers that can be used for the query (see the sketch after this list). The syntax is PARALLEL(table number number). The PARALLEL hint must use the table alias if an alias is specified in the query. The PARALLEL hint can then take two values separated by commas after the table name. The first value specifies the degree of parallelism for the given table; the second value specifies how the table is to be split among the instances of a parallel server. Specifying DEFAULT or no value signifies that the query coordinator should examine the settings of the initialization parameters to determine the default degree of parallelism.
NOPARALLEL / NO_PARALLEL The NOPARALLEL hint allows you to disable parallel scanning of a table, even if the table was created with a PARALLEL clause. In Oracle 10g this hint was renamed to NO_PARALLEL.
PQ_DISTRIBUTE The PQ_DISTRIBUTE hint improves the performance of parallel join operations. It does this by specifying how rows of joined tables should be distributed among producer and consumer query servers. Using this hint overrides decisions the optimizer would normally make.
NO_PARALLEL_INDEX The NO_PARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.
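
A minimal sketch of the PARALLEL hint; here 4 is the requested degree of parallelism and big_tab is a hypothetical table:

select /*+ FULL(b) PARALLEL(b, 4) */ count(*)
from big_tab b;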

Additional Hints

APPEND When the APPEND hint is used with the INSERT statement, data is appended to the table; existing free space in the blocks is not used. If a table or an index is specified with NOLOGGING, this hint applied to an insert statement produces a direct path insert, which reduces the generation of redo (see the sketch after this list).
NOAPPEND Overrides the append mode.
CACHE The CACHE hint specifies that the blocks retrieved for the table in the hint are placed at the most recently used end of the LRU list in the buffer cache when a full table scan is performed. This option is useful for small lookup tables. In the following example, the CACHE hint overrides the table default caching specification.
NOCACHE The NOCACHE hint specifies that the blocks retrieved for this table are placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. This is the normal behavior of blocks in the buffer cache.
PUSH_PRED The PUSH_PRED hint forces pushing of a join predicate into the view.
NO_PUSH_PRED The NO_PUSH_PRED hint prevents pushing of a join predicate into the view.
PUSH_SUBQ The PUSH_SUBQ hint causes nonmerged subqueries to be evaluated at the earliest possible place in the execution plan.
NO_PUSH_SUBQ The NO_PUSH_SUBQ hint causes non-merged subqueries to be evaluated as the last step in the execution plan.
QB_NAME Specifies a name for a query block. (from Oracle 10g)
CURSOR_SHARING_EXACT Oracle can replace literals in SQL statements with bind variables, if it is safe to do so. This is controlled with the CURSOR_SHARING startup parameter. The CURSOR_SHARING_EXACT hint causes this behavior to be switched off. In other words, Oracle executes the SQL statement without any attempt to replace literals by bind variables.
DRIVING_SITE The DRIVING_SITE hint forces query execution to be done for the table at a different site than that selected by Oracle.
DYNAMIC_SAMPLING The DYNAMIC_SAMPLING hint lets you control dynamic sampling to improve server performance by determining more accurate predicate selectivity and statistics for tables and indexes. You can set the value of DYNAMIC_SAMPLING to a value from 0 to 10. The higher the level, the more effort the compiler puts into dynamic sampling and the more broadly it is applied. Sampling defaults to cursor level unless you specify a table.
SPREAD_MIN_ANALYSIS This hint omits some of the compile time optimizations of the rules, mainly detailed dependency graph analysis, on spreadsheets. Some optimizations such as creating filters to selectively populate spreadsheet access structures and limited rule pruning are still used. (from Oracle 10g)
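
A minimal sketch of the APPEND hint (emp_history is a hypothetical archive table):

insert /*+ APPEND */ into emp_history
select * from emp;

-- a direct-path insert must be committed before the same session
-- can query the table again:
commit;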

Hints with unknown status

MERGE_AJ The MERGE_AJ hint transforms a NOT IN subquery into a merge anti-join to access the specified table. The syntax of the MERGE_AJ hint is MERGE_AJ(table) where table specifies the name or alias of the table to be accessed. (Deprecated in Oracle 10g.)
AND_EQUAL The AND_EQUAL hint explicitly chooses an execution plan that uses an access path that merges the scans on several single-column indexes. The syntax of the AND_EQUAL hint is AND_EQUAL(table index index) where table specifies the name or alias of the table associated with the indexes to be merged, and index specifies an index on which an index scan is to be performed. You must specify at least two indexes. You cannot specify more than five. (Deprecated in Oracle 10g.)
STAR The STAR hint forces the large table to be joined last using a nested loops join on the index. The optimizer will consider different permutations of the small tables. (Deprecated in Oracle 10g.)
BITMAP Usage: BITMAP(table_name index_name). Uses a bitmap index to access the table. (Deprecated?)
HASH_SJ Use a hash semi-join to evaluate an EXISTS sub-query. Use this hint inside the sub-query, not in the main query. Use it when a high-volume sub-query is being run as a FILTER or NESTED LOOPS join. (Deprecated in Oracle 10g.)
NL_SJ Use a nested loop semi-join to evaluate an EXISTS sub-query. (Deprecated in Oracle 10g.)
NL_AJ Use a nested loop anti-join to evaluate a NOT IN sub-query. (Deprecated in Oracle 10g.)
ORDERED_PREDICATES (deprecated in Oracle 10g)
EXPAND_GSET_TO_UNION (deprecated in Oracle 10g)

Histograms

1. Getting Started
Everyone knows that histograms are supposed to be used for data where a column has a few special values that appear far more frequently than the rest; and everyone knows that you should only build histograms on indexed columns. These beliefs are so misleading that I’m going to ignore them for the moment and start my explanation of histograms with an example that has no extreme values and no indexes. 
1.1 Test Data
create table t1
as
with kilo_row as (
    select /*+ materialize */
        rownum
    from all_objects
    where rownum <= 1000
)
select
    trunc(7000 * dbms_random.normal)    normal
from
    kilo_row    k1,
    kilo_row    k2
where
    rownum <= 1000000
;

select
    tenth                        tenth,
    min(normal)                    low_val,
    max(normal)                    high_val,
    max(normal) - min(normal)            width,
    round(100000 / (max(normal) - min(normal)),2)     height
from (
    select
        normal,
        ntile(10) over (order by normal) tenth
    from t1
)
group by tenth
order by tenth
;
     TENTH    LOW_VAL   HIGH_VAL      WIDTH     HEIGHT
---------- ---------- ---------- ---------- ----------
         1     -32003      -8966      23037       4.34
         2      -8966      -5883       3083      32.44
         3      -5883      -3659       2224      44.96
         4      -3659      -1761       1898      52.69
         5      -1761         17       1778      56.24
         6         17       1792       1775      56.34
         7       1792       3678       1886      53.02
         8       3678       5897       2219      45.07
         9       5897       8974       3077       32.5
        10       8974      34660      25686       3.89


begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for columns normal size 10'
    );
end;
/

select
    rownum                    tenth,
    prev                    low_val,
    curr                    high_val,
    curr - prev                width,
    round(100000 / (curr - prev) , 2)    height
from
    (
    select
        endpoint_value            curr,
        lag(endpoint_value,1) over (
            order by endpoint_number
        )                 prev
    from
        user_tab_histograms
    where
        table_name = 'T1'
    and    column_name = 'NORMAL'
    )
where
    prev is not null
order by
    curr
;

The result is the same as above.
These results show that the data in T1 is not uniformly distributed: the density differs from band to band. This is exactly the situation where a histogram helps. Without one, the optimizer assumes a uniform distribution, so when it costs a SQL plan the estimates go wrong and performance problems follow.

2. Generic Histograms
 You can actually use two different strategies: one for data sets with only a few (fewer than 255) distinct values, and one for data sets with lots of distinct values. Oracle calls the former a frequency histogram and the latter a height balanced histogram.
 After generating a histogram for a column, you will usually find that the density value reported in user_tab_columns (et al.) is no longer equal to 1/num_distinct.

3. When Oracle Ignores Histograms
It is quite expensive to generate histograms, so you should think very carefully about doing it. Oracle doesn’t use them very much, and often won’t get any benefit from them. In fact, there are several cases where Oracle ignores histograms when you might think they would be used.

3.1 Histograms and Joins
Oracle only uses histograms fully when there are highly visible input values—in other words, in cases where you have predicates of the form column operator constant (although the constant might come from peeking at a bind variable). Think about this carefully, and you realize that Oracle may not be able to make much use of histogram information to help it optimize joins.

3.2 Histograms and Distributed Queries
Oracle keeps on getting better at handling distributed queries—but even in Oracle 10g, the optimizer doesn’t try to pull histograms from the remote site(s) to improve its execution plan,and it’s not just a question of ignoring histograms on join columns. Even when you have remote columns compared to literal constants, Oracle does not attempt to use the histogram. Again, it falls back to the num_distinct (or density, depending on version).

4. Frequency Histograms
The value 1 appears once, 2 appears twice, and so on up to the value 80 that appears 80 times for a total of 3,240 rows.

create table generator as
select
    rownum     id
from    all_objects
where    rownum <= 2000
;

create table t1 (
    skew        not null,   
    padding
)
as
select
    /*+ ordered use_nl(v2) */
    v1.id,
    rpad('x',400)
from
    generator    v1,
    generator    v2
where
    v1.id <= &m_demo_size
and    v2.id <= &m_demo_size
and    v2.id <= v1.id
order by
    v2.id,v1.id
;

create index t1_i1 on t1(skew);

Test case 1 (the tests below assume the substitution variable &m_demo_size was set to 80, which produces the 3,240-row data set described above)
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80      .0125           1

select skew, padding
from t1
where skew = 5
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    41 | 16564 |    32 |
|*  1 |  TABLE ACCESS FULL| T1   |    41 | 16564 |    32 |
----------------------------------------------------------
Cardinality = 0.0125 * 3240 = 40.5, rounded to 41

Test case 2
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 80'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;

NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .000154321          80

density = 1 / (2 * num_rows) = 1 / (2 * 3240) = .000154321

select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              3              2
              6              3
             10              4
             ...
           3160             79
           3240             80

select
    endpoint_value                row_value,
    curr_num - nvl(prev_num, 0)        row_count
from    (
    select
        endpoint_value,
        endpoint_number            curr_num,
        lag(endpoint_number,1) over (
            order by endpoint_number
        )                prev_num
    from
        user_tab_histograms
    where
        column_name = 'SKEW'
    and    table_name = 'T1'
)
order by
    endpoint_value
;
ROW_VALUE  ROW_COUNT
---------- ----------
         1          1
         2          2
         3          3
         4          4
         5          5
        ...
         80         80

select
    skew, padding
from
    t1
where
    skew = 5
;
Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     5 |  2020 |     6 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     5 |  2020 |     6 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |     5 |       |     1 |
---------------------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 5 is 5.

select
    skew, padding
from
    t1
where
    skew = 70
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    70 | 28280 |    32 |
|*  1 |  TABLE ACCESS FULL| T1   |    70 | 28280 |    32 |
----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 70 is 70.

Predicate                   Description                              CBO    Human
skew = 40                   Column = constant                        40     40
skew = 40.5                 Column = nonexistent, but in-range       1      0
skew between 21 and 24      Between range with mapped values         90     90
skew between 20.5 and 24.5  Between range with mapped values         90     90
skew between 1 and 2        Between range at extremes                3      3
skew between 79 and 80      Between range at extremes                159    159
skew > 4 and skew < 8       Greater than/less than range             18     18
skew = -10                  Below low value                          1      0
skew = 100                  Above high value                         1      0
skew between -5 and -3      Range below low value                    1      0
skew between 92 and 94      Range above high value                   1      0
skew between 79 and 82      Range crossing boundary                  159    159
skew = :b1                  Column = :bind                           41     ???
skew between :b1 and :b2    Column between :bind1 and :bind2         8      ???

In the two examples with bind variables, we see that the CBO has used the standard
0.25% for the range, and when we check column = :bind, we find that 41 comes from
num_rows / num_distinct.
Finally, looking carefully at the density, we discover (and confirm with a few more test data sets) that density = 1 / (2 * num_rows)—which is probably why we end up seeing a cardinality of 1 whenever we go outside the low/high range, or ask for a value that is not in the histogram.

- Frequency histograms have no impact on expressions with bind variables (if they are not peeked). The selectivity on column = :bind is still 1 / num_distinct, and the selectivities on ranges are still 5% for unbounded and 0.25% for bounded ranges.

5. “Height Balanced” Histograms
5.1 A histogram bucket for every value
begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 80'
    );
end;
/

select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .000154321          80

select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              3              2
              6              3
             10              4
             ...
           3160             79
           3240             80

select
    count(*)
from    t1
where    skew = 77
;
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    77 |   231 |     1 |
-----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 77 is 77.

select
    count(*)
from    t1
where    skew = 75
;
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    75 |   225 |     1 |
-----------------------------------------------------------
Based on the histogram statistics above, the cardinality for skew = 75 is 75.

5.2 A histogram with fewer buckets than num_distinct

begin
    dbms_stats.gather_table_stats(
        user,
        't1',
        cascade => true,
        estimate_percent => null,
        method_opt => 'for all columns size 75'
    );
end;
/


select
    num_distinct, density, num_Buckets
from
    user_tab_columns
where
    table_name = 'T1'
and    column_name = 'SKEW'
;
NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
          80 .013973812          75



select
    endpoint_number, endpoint_value
from
    user_tab_histograms
where
    column_name = 'SKEW'
and    table_name = 'T1'
order by
    endpoint_number
;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             ...
             46             63
             48             64
             49             65
             51             66
             52             67
             54             68
             56             69
             57             70
             59             71
             60             72
             62             73
             64             74
             66             75
             67             76
             69             77
             71             78

6. The Arithmetic

When the optimizer takes advantage of a height balanced histogram to work out cardinality, it adopts one of three major strategies.
 - Strategy 1 for calculating selectivity/cardinality: For column = constant, the constant may be a popular value. A popular value is one that fills at least one bucket, and therefore produces a gap in the endpoint numbers in the user_tab_histograms view. If Oracle spots such a value, it does its arithmetic by buckets. For example, in the preceding output the value 77 appears at endpoint 69 and the previous value (76) appears at endpoint 67: consequently Oracle considers 77 a popular value spanning 2 buckets out of 75. The selectivity of 77 is therefore 2/75, and the cardinality is 3,240 * 2/75 = 86.4. (A query that lists bucket counts per value is sketched below.)
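
Before running the test, here is a sketch of a query that makes the popular values visible, reusing the T1/SKEW histogram from above. The number of buckets a value fills is the jump in endpoint_number since the previous row, so any value with a count of 2 or more is popular:

select
    endpoint_value,
    endpoint_number - nvl(lag(endpoint_number) over (order by endpoint_number), 0) buckets
from
    user_tab_histograms
where
    table_name = 'T1'
and    column_name = 'SKEW'
order by
    endpoint_number
;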

select
    count(*)
from    t1
where    skew = 77
;

Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    86 |   258 |     1 |
-----------------------------------------------------------

- Strategy 2 for calculating selectivity/cardinality: We have already noted that 70 is not a popular value, so what happens when we execute a query to select all rows for that value? Oracle uses the density (note: not 1/num_distinct). We have the density recorded as 0.013973812, so we expect a cardinality of 3,240 * .013973812 = 45.27:
Execution Plan
----------------------------------------------------------
Plan hash value: 2432955788

-----------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     3 |     1 |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |       |
|*  2 |   INDEX RANGE SCAN| T1_I1 |    29 |    87 |     1 |
-----------------------------------------------------------
The actual cardinality is 29; this may be a bug in 11g, and remains to be tested in a later release.

In purely descriptive terms, the density is as follows:
  (sum of the squares of the frequencies of the nonpopular values) /
  (number of nonnull rows * number of nonpopular nonnull rows)

 - Strategy 3 for calculating selectivity/cardinality: After that brief digression, we can investigate the issue of range scans.

The test cases for Strategy 3 follow.
Test case 1
The script generates 10,000 rows using a scaled normal distribution centered on the value 3,000, ranging from -5,000 to 11,000. Superimposed on this distribution, I have 500 rows each of the values 500, 1,000, 1,500, and so on up to 10,000: a list of 20 popular values. After analyzing this table with 250 buckets (80 rows per bucket, 20,000 rows in total), I get a histogram that shows the spikes from my 20 popular values. The number of distinct values is 5,626 and the column density is 0.000119361 (a convincing drop from 1 / 5,626 = 0.000177746). My first test case is a range scan across buckets with no popular values: n1 between 100 and 200. If we check the histogram, we find that the required values fall across three buckets with endpoints 17, 117, and 251.

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              8           -120
              9             17
             10            117
             11            251
             12            357

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    63 |   945 |    60 |
|*  1 |  TABLE ACCESS FULL| T1   |    63 |   945 |    60 |
----------------------------------------------------------

Selectivity = (required range) / (high value - low value), summed over the buckets touched, + 2 * density =
              (200 - 117) / (251 - 117) + (117 - 100) / (117 - 17) + 2 * 0.000177746 =
              0.619403 + 0.17 + 0.000355486 =
              0.789758
Cardinality = selectivity * number of rows IN A BUCKET =
              0.789758 * 80 = 63.18

Test case 2
The second test case is, of course, a range scan across buckets with a popular value: n1 between 400 and 600. (One of my spikes was at n1 = 500.) We need to check up on a little section from the histogram that will be relevant:
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             12            357
             13            450
             19            500        ***
             20            520
             21            598
             22            670
Note that 500 is a (very) popular value—with six buckets allocated from the histogram. So we expect to account for at least six buckets plus the rest of the range. Checking carefully, we see that the range 400 to 600 will extend from buckets 12 to 22—broken down as follows:
• A selection from 357 to 450
• All buckets from 450 to 598
• A selection from 598 to 670

So we have eight whole buckets (endpoint_numbers 13 to 21) plus
(450 - 400) / (450 - 357) + (600 - 598) / (670 - 598) + 2 * 0.000177746 =
        50 / 93 + 2 / 72 + 0.000355486 =
        0.537634 + 0.0277778 + 0.000355486 =
        0.565768
Remembering to add the 8 for the complete buckets, we get a cardinality of
Cardinality = selectivity * number of rows IN A BUCKET = 8.565768 * 80 = 685.3
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   685 | 10275 |    60 |
|*  1 |  TABLE ACCESS FULL| T1   |   685 | 10275 |    60 |
----------------------------------------------------------

7. Data Problems Revisited
7.1 Daft Datatypes
create table t1 (
    d1        date,
    n1        number(8),
    v1        varchar2(8)
)
;

insert into t1
select
    d1,
    to_number(to_char(d1,'yyyymmdd')),
    to_char(d1,'yyyymmdd')
from    (
    select
        to_date('31-Dec-1999') + rownum    d1
    from all_objects
    where
        rownum <= 1827
    )
;

As we saw under Selectivity Issues, a query with a predicate of
where n1 between 20021230 and 20030105
produced a cardinality of 396 in Oracle.

Below is the same test with a histogram in place.
begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 120'
    );
end;
/
select
    rownum                    bucket,
    prev                    low_val,
    curr                    high_val,
    curr - prev                width,
    round( (1827/120) / (curr - prev) , 4)    height
from
    (
    select
        endpoint_value                curr,
        lag(endpoint_value,1) over (
            order by endpoint_number
        )                     prev
    from
        user_tab_histograms
    where
        table_name = 'T1'
    and    column_name = 'N1'
    )
where
    prev is not null
order by
    curr
;
    BUCKET    LOW_VAL   HIGH_VAL      WIDTH     HEIGHT
---------- ---------- ---------- ---------- ----------
        68   20021028   20021112         84      .1813
        69   20021112   20021127         15      1.015
        70   20021127   20021212         85      .1791
        71   20021212   20021227         15      1.015
        72   20021227   20030111       8884      .0017
        73   20030111   20030126         15      1.015
        74   20030126   20030210         84      .1813
        75   20030210   20030225         15      1.015
        76   20030225   20030312         87       .175
        77   20030312   20030327         15      1.015
        78   20030327   20030411         84      .1813
        79   20030411   20030426         15      1.015
        80   20030426   20030511         85      .1791
        81   20030511   20030526         15      1.015
        82   20030526   20030610         84      .1813
        83   20030610   20030625         15      1.015
        84   20030625   20030710         85      .1791

select num_distinct, density, num_Buckets from user_tab_columns where table_name = 'T1' and column_name = 'N1' ;

NUM_DISTINCT    DENSITY NUM_BUCKETS
------------ ---------- -----------
        1827 .000547345         120


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   345 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |    15 |   345 |     4 |
----------------------------------------------------------

selectivity =
        ('required range' / 'total range of bucket' ) + 2 * density =
        (20030105 - 20021230) / (20030111 - 20021227) + 2 * 0.000547345 =
           8875 / 8884 + 0.00109469 =
                1.00008
But this is the selectivity that has to be applied to the number of rows in the buckets involved—not to the entire table—and there are 1,827 / 120 = 15 rows per bucket—round(15 * 1.00008) = 15, as required.

7.2 Dangerous Defaults
Same as above: both versions exhibit the histogram problem shown in Figure 7-2.

Saturday, July 30, 2011

Selectivity Issues

1. Different Data Types
We have been using the basic formula for the selectivity of a range-based predicate, and know that this varies slightly depending on whether you have zero, one, or two closed (meaning the equality in <=, >=) ends to your range. Using N as the number of closed ends, the version of the formula that uses user_tab_columns.num_distinct can be written as

   (required range) / (column high value - column low value) + N / num_distinct

For example, in a column with 1,000 different (integer) values ranging from 1 to 1000, and the pair of predicates colX > 10 and colX <= 20, you have one closed end (<=), and the formula would give you
   (20 - 10) / (1000 - 1) + 1/1000 = 10/999 + 1/1000 ≈ 0.011

1.1 Date Values
Consider a date-only column holding values from 1 January 2000 to 31 December 2004; a predicate date_col between 30th Dec 2002 and 5th Jan 2003 would then give you a selectivity of
   (5th Jan 2003 - 30th Dec 2002) / (31st Dec 2004 - 1st Jan 2000) +
   2/(number of different dates) = 6 / 1826 + 2 / 1827 ≈ 0.0044

1.2 Character Values
In a test table, t1, with two columns, one of type varchar2(10) and one of type char(10), I have inserted the strings 'Aardvark', 'Apple', 'Blueberry', and 'Zymurgy' in both columns, and then generated statistics—including histograms.

create table t1 (
    v10    varchar2(10),
    c10    char(10)
)
;

insert into t1 values('Apple','Apple');
insert into t1 values('Blueberry','Blueberry');
insert into t1 values('Aardvark','Aardvark');
insert into t1 values('Zymurgy','Zymurgy');

commit;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 10'
    );
end;
/

select
        column_name,
        endpoint_number,
        endpoint_value
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_Number
;

Col   End no                                        End Value
----- ------ ------------------------------------------------
C10        1  339,475,752,638,459,000,000,000,000,000,000,000     -- 'Aardvark  '
           2  339,779,832,781,209,000,000,000,000,000,000,000     -- 'Apple     '
           3  344,891,393,972,447,000,000,000,000,000,000,000     -- 'Blueberry '
           4  469,769,561,047,943,000,000,000,000,000,000,000     -- 'Zymurgy   '
V10        1  339,475,752,638,459,000,000,000,000,000,000,000     -- 'Aardvark'
           2  339,779,832,781,057,000,000,000,000,000,000,000     -- 'Apple'
           3  344,891,393,972,447,000,000,000,000,000,000,000     -- 'Blueberry'
           4  469,769,561,047,943,000,000,000,000,000,000,000     -- 'Zymurgy'

Oracle appears to behave as follows:
• Extract a maximum of 32 bytes from the column; this representation of the column value is how the low_value, high_value, and endpoint_actual values are stored.
• Take the first 15 bytes of the 32, padding with zeros on the right if necessary.
• Convert the 15-byte hex number to decimal and round to 15 significant figures. (The DUMP sketch below shows how to inspect the raw bytes.)
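
You can inspect those bytes yourself with the DUMP function; the second argument (16) asks for hexadecimal output. For the literal below, the output should look something like this:

select dump('Aardvark', 16) from dual;

DUMP('AARDVARK',16)
------------------------------------------
Typ=96 Len=8: 41,61,72,64,76,61,72,6b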

Let’s take a worked example—the string 'Aardvark'—to see if we can end up with the value stored in the histogram:
• 'Aardvark', when dumped in hex from the char(10) column, contains the following list of byte values (note the 20,20 space padding that is imposed by the char(10) definition):
'41,61,72,64,76,61,72,6B,20,20'.
• Since this is less than 15 bytes, we append a few zeros to produce the number
0x416172647661726B20200000000000. (If the column had been declared as char(40) the value would already have been padded with spaces (0x20) up to 40 characters, so we would stop at the fifteenth byte, and the value we would use would look like 0x416172647661726B20202020202020.)
• Converting this rather large hex number to decimal we get 339,475,752,638,459,043,065,991,628,037,554,176.
• And if we throw away everything after the first 15 digits then, as required, we get
339,475,752,638,459,000,000,000,000,000,000,000.

Looking at the rounding, you might notice that after about the first six or seven characters of a string, the rest of the letters don’t have any impact on the numeric representation used by the optimizer—which is why the numeric values for 'Apple' are the only ones that vary when comparing the char(10) with the varchar2(10) versions. We don’t really need to go into any more detail—but you could imagine that the optimizer might have trouble coping with data consisting (for example) of URLs, when lots of them start with http://.

The problem can be much worse, in fact, because of the increasing popularity of national language support. If you pick a multibyte character set for your database character set, then Oracle will be using the first 15 bytes of the string, not the first 15 characters. So the precision gets even worse.

1.3 Daft Data Types
One of the first examples in this chapter examined the predicate date_col between 30th Dec 2002 and 5th Jan 2003 on a table holding data from 1 January 2000 to 31 December 2004, and we saw how the optimizer could work out the selectivity of such a predicate. But let’s see how things can go wrong in popular implementations of storing dates. We create and populate a table with one row per date over a five-year date range

create table t1 (
    d1        date,
    n1        number(8),
    v1        varchar2(8)
)
;

insert into t1
select
    d1,
    to_number(to_char(d1,'yyyymmdd')),
    to_char(d1,'yyyymmdd')
from    (
    select
        to_date('31-Dec-1999') + rownum    d1
    from all_objects
    where
        rownum <= 1827
    )
;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 1'
    );
end;
/

Test case 1
select    *
from    t1
where     d1 between to_date('30-Dec-2002','dd-mon-yyyy')
       and     to_date('05-Jan-2003','dd-mon-yyyy')
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     8 |   184 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |     8 |   184 |     4 |
----------------------------------------------------------
selectivity = 6/1826 + 2/1827
cardinality = selectivity * num_rows = 8 (rounded)

Test case 2

select    *
from    t1
where     n1 between 20021230 and 20030105
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   396 |  9108 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |   396 |  9108 |     4 |
----------------------------------------------------------
Selectivity = (required range) / (high value - low value) + 2/num_distinct
                 = (20030105 - 20021230) / (20041231 - 20000101) + 2/1827
                 =         8875          /         41130         + 2/1827
                 = 0.215779 + 0.001095
                 = 0.216874           -- more than 20% of the data apparently!
Cardinality = Selectivity * num_rows = 396

Test case 3
select    *
from    t1
where     v1 between '20021230' and '20030105'
;
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   396 |  9108 |     4 |
|*  1 |  TABLE ACCESS FULL| T1   |   396 |  9108 |     4 |
----------------------------------------------------------
Same as test case 2.
Oracle seems to have treated the varchar2 column and its predicate values as if they were numeric. It is possible that a programmer in the CBO group has slipped in a “cunning guess” tweak to the optimizer in this very special case. The tweak disappears as soon as you format the date-like string to include nonnumeric characters (e.g., '2002-01-02').

2. Leading Zeros
create table t1
nologging
pctfree 0
as
with generator as (
    select    --+ materialize
        rownum     id
    from    all_objects
    where    rownum <= 2000
)
select
    /*+ ordered use_nl(v2) */
    trunc((rownum-1)/10000)        grp_id,
    lpad(rownum,18,'0')        id
--    'A' || lpad(rownum, 17, '0')    id
from
    generator    v1,
    generator    v2
where
    rownum <= 2000000
;

begin
    dbms_stats.gather_table_stats(
        ownname            => user,
        tabname            => 't1',
        cascade            => true,
        estimate_percent    => null,
        method_opt        =>'for all columns size 1'
    );
end;
/

select
    *
from t1
where
    id between '000000000000060000'
       and       '000000000000070000'
;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10002 |   214K|  1059 |
|*  1 |  TABLE ACCESS FULL| T1   | 10002 |   214K|  1059 |
----------------------------------------------------------
In 11g this cardinality is computed correctly; in earlier releases (such as 9i) the calculation was wrong.

3. Deadly Defaults
What might the average database-independent developer choose as a good value to represent a null date? How about something far in the future, like 31 December 4000?
Doing a quick calculation for the range scan selectivity, you believe the selectivity is
  ( 31 Dec 2003 - 01 Jan 2003 ) / (31 Dec 2004 - 01 Jan 2000)  + 2/1827 = 0.20044
But, given the extra, far-out, value, the optimizer thinks the selectivity is
  ( 31 Dec 2003 - 01 Jan 2003 ) / (31 Dec 4000 - 01 Jan 2000)  + 2/1828 = 0.00159

4. Function Figures
function(colx) = 'SMITH'                                  Fixed 1% selectivity.
not function(colx) = 'SMITH'                              Fixed 5% selectivity.
function(colx) > 'SMITH'                                  Fixed 5% selectivity.
not function(colx) > 'SMITH'                              Fixed 5% selectivity.
function(colx) >= 'SMITH' and function(colx) < 'SMITI'    Derived 0.25% selectivity (5% * 5%).
function(colx) between 'SMITHA' and 'SMITHZ'              Derived 0.25% selectivity (5% * 5%).
not function(colx) between 'SMITHA' and 'SMITHZ'          Derived 9.75% selectivity (5% + 5% - (5% * 5%)).
function(colx) like 'SMITH%'                              Fixed 5% selectivity.
not function(colx) like 'SMITH%'                          Fixed 5% selectivity.
function(colx) in ('SMITH','JONES')                       Derived 1.99% selectivity (1% + 1% - (1% * 1%)).
A sketch of the first case appears below.
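
A minimal sketch of the first case (emp and ename are hypothetical names): with no statistics on the function result, the optimizer simply guesses 1% of the rows.

select *
from emp
where upper(ename) = 'SMITH';   -- estimated cardinality = 1% of num_rows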

5. Correlated Columns
create table t1
nologging
as
select
    trunc(dbms_random.value(0,25))    n1,
    rpad('x',40)            ind_pad,
    trunc(dbms_random.value(0,20))    n2,
    lpad(rownum,10,'0')        small_vc,
    rpad('x',200)            padding
from
    all_objects
where
    rownum  <= 10000
;

update t1 set n2 = n1;

create index t1_i1 on t1(n1, ind_pad, n2)
nologging
pctfree 91
;

begin
    dbms_stats.gather_table_stats(
        ownname => user,
        tabname    => 'T1',
        cascade    => true,
        estimate_percent => null,
        method_opt => 'for all columns size 1'
    );
end;
/

select   
    table_name,
    blocks,
    num_rows
from    user_tables
where    table_name = 'T1'
;
TABLE_NAME               BLOCKS   NUM_ROWS
-------------------- ---------- ----------
T1                          387      10000

select
    num_rows, distinct_keys,
    blevel, leaf_blocks, clustering_factor,
    avg_leaf_blocks_per_key, avg_data_blocks_per_key
from
    user_indexes
where    table_name = 'T1'
and    index_name = 'T1_I1'
;


  NUM_ROWS DISTINCT_KEYS     BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
---------- ------------- ---------- ----------- ----------------- ----------------------- -----------------------
     10000            25          2        1107              6153                      44                     246

select
    column_name,
    num_nulls, num_distinct, density,
    low_value, high_value
from
    user_tab_columns
where    table_name = 'T1'
and    column_name in ('N1','N2','IND_PAD')
order by
    column_name
;
Col                   NUM_NULLS NUM_DISTINCT    DENSITY
-------------------- ---------- ------------ ----------
IND_PAD                       0            1          1
N1                            0           25        .04
N2                            0           25        .04

Test case 1
select
    /*+ index(t1) */
    small_vc
from
    t1
where
    n1    = 2
and    ind_pad    = rpad('x',40)
and    n2    = 2
;
Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   400 | 23200 |   294 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   400 | 23200 |   294 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |   400 |       |    47 |
---------------------------------------------------------------------
11g recognizes that n1 and n2 are correlated, and uses the index's DISTINCT_KEYS to compute the cardinality:
Cardinality = num_rows / DISTINCT_KEYS = 10000 / 25 = 400
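
As a footnote: from 11g onward you can also declare the n1/n2 correlation explicitly with extended statistics on a column group, so that even queries that cannot use the index get the right estimate. A minimal sketch:

-- create the (N1,N2) column group, then regather statistics
select dbms_stats.create_extended_stats(user, 'T1', '(N1,N2)') from dual;

begin
    dbms_stats.gather_table_stats(
        ownname          => user,
        tabname          => 'T1',
        estimate_percent => null,
        method_opt       => 'for all columns size 1'
    );
end;
/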