multi-column indexes, statistics and selectivity [Oracle]

Prev: createing db 11gr1 on grid 11gr2
Next: Documentation

From: bugbear on 4 Dec 2009 11:48

Can anyone tell me (or point me at information) about
how statistics are generated/used for multi-column indexes?

We have a 2 column status model (major and minor status, if you like).
This is key to our application.

"Naturally" we have put a 2 column index on these two columns.

But we have "doubts" about whether the CBO is always
doing as good as job as we'd like.

The key question seems to be:

Are the columns "treated as a pair" so that the
frequency estimate for a pair of statuses depends
on both of them or...

Are the columns treated in isolation
so that the frequency estimate
is simply obtained by getting the estimate
for each column, and combining them mathematically.

This would make a big difference in our app,
since our major/minor status (in practice) have interesting
and complex correlations.

BugBear

From: bugbear on 4 Dec 2009 12:09

bugbear wrote:
> Can anyone tell me (or point me at information) about
> how statistics are generated/used for multi-column indexes?
>

Apologies: I should have said "in 10g".

I've also found a reference to a new feature in 11g;
"Multi-column histograms"

If this is new in 11g it has "unfortunate" implications
for our running on 10g...

BugBear

From: joel garry on 4 Dec 2009 12:29

On Dec 4, 9:09 am, bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
> bugbear wrote:
> > Can anyone tell me (or point me at information) about
> > how statistics are generated/used for multi-column indexes?
>
> Apologies: I should have said "in 10g".
>
> I've also found a reference to a new feature in 11g;
> "Multi-column histograms"
>
> If this is new in 11g it has "unfortunate" implications
> for our running on 10g...
>
> BugBear

Check this out:
http://richardfoote.wordpress.com/2008/09/16/index-monitoring-and-index-statistics-the-great-gig-in-the-sky/#comments

jg
--
@home.com is bogus.
"...Network hardware includes Dell servers, Dell workstations, IBM
RS6000 and P570 RISK based systems, Cisco routers, and HP ProCurve
switchs.
MINIMUM REQUIREMENTS
Qualified candidates must have a Bachelors degree in Computer Science
or a related degree with 4-6 years programming experience with HTML/
CGI/Java, ASP.Net and an understanding of internet working and
knowledge of applications available on the WEB, Business Objects XIR2,
Crystal Reports XIR2, VB6 Programming Language, Visual Basic and
Studio, in an Oracle/AIX/2003 environment, or..." - metrolink job ad.

From: Mladen Gogala on 4 Dec 2009 17:21

On Fri, 04 Dec 2009 16:48:43 +0000, bugbear wrote:

> Can anyone tell me (or point me at information) about how statistics are
> generated/used for multi-column indexes?
>
> We have a 2 column status model (major and minor status, if you like).
> This is key to our application.
>
> "Naturally" we have put a 2 column index on these two columns.
>
> But we have "doubts" about whether the CBO is always doing as good as
> job as we'd like.
>
> The key question seems to be:
>
> Are the columns "treated as a pair" so that the frequency estimate for a
> pair of statuses depends on both of them or...
>
> Are the columns treated in isolation
> so that the frequency estimate
> is simply obtained by getting the estimate for each column, and
> combining them mathematically.
>
> This would make a big difference in our app, since our major/minor
> status (in practice) have interesting and complex correlations.
>
> BugBear

10g doesn't do multi-column histograms. 10g will create histograms for
every single column of the index and will estimate the number of entries
retrieved for each column by using those histograms. After that, it will
calculate the selectivity of the conditions by multiplying the
selectivities for every single column. Selectivity is the estimated
number of retrieved values divided by the total number of values in that
column. When the CBO gets the selectivity, it will multiply it by the
index clustering factor to estimate the number of blocks that need to be
retrieved. The correlation between the number of blocks and the price of
the query is known only to the God and Jonathan, but there definitely is
one.

--
http://mgogala.byethost5.com

From: Robert Klemme on 5 Dec 2009 08:18

On 12/04/2009 11:21 PM, Mladen Gogala wrote:
> On Fri, 04 Dec 2009 16:48:43 +0000, bugbear wrote:
>
>> Can anyone tell me (or point me at information) about how statistics are
>> generated/used for multi-column indexes?
>>
>> We have a 2 column status model (major and minor status, if you like).
>> This is key to our application.
>>
>> "Naturally" we have put a 2 column index on these two columns.
>>
>> But we have "doubts" about whether the CBO is always doing as good as
>> job as we'd like.
>>
>> The key question seems to be:
>>
>> Are the columns "treated as a pair" so that the frequency estimate for a
>> pair of statuses depends on both of them or...
>>
>> Are the columns treated in isolation
>> so that the frequency estimate
>> is simply obtained by getting the estimate for each column, and
>> combining them mathematically.
>>
>> This would make a big difference in our app, since our major/minor
>> status (in practice) have interesting and complex correlations.
>
> 10g doesn't do multi-column histograms. 10g will create histograms for
> every single column of the index and will estimate the number of entries
> retrieved for each column by using those histograms. After that, it will
> calculate the selectivity of the conditions by multiplying the
> selectivities for every single column. Selectivity is the estimated
> number of retrieved values divided by the total number of values in that
> column. When the CBO gets the selectivity, it will multiply it by the
> index clustering factor to estimate the number of blocks that need to be
> retrieved. The correlation between the number of blocks and the price of
> the query is known only to the God and Jonathan, but there definitely is
> one.

Are there cases where it makes sense to use a FBI to get "multi column"
histogram information? Of course, queries then would also have to use
that concatenated value as query criteria which makes usage of this
quite nasty (especially if the SQL is generated by some kind of
persistence container).

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

| Next | Last
Pages: 1 2 3
Prev: createing db 11gr1 on grid 11gr2
Next: Documentation