Table Partitioning the way to go? [DB2]

Prev: Official Hollywood NetBook ™ Financial Professionals
Next: Insert slow in DPF environment.

From: shorti on 2 Jul 2010 13:22

I am working on a new database and am trying to determine how best to
set up a specific table (or set of tables). I did some research on
table level partitioning and was thinking this might be the way to go,
however, I am not entirely convinced and wanted to see if there was a
better solution. Here are the circumstances:

This will be set up on Linux using DB2 V9.7. There is a possibility
that HADR might be used. This particular table could contain 10's of
millions of records. The records could actually be split up into
groups to help split the table up and allow better performance for
searches. Each group could own up to 2 million records each. The
groups would not be time related...in other words, this is not a set
of daily or monthly data that would expire. Each group could start
out small and grow to the 2 million max. Each group could stay around
for an infinite amount of time (typical) or the entire group could be
removed or dropped (not so typical). Groups could also become
combined. A new group could be added at any time or removed at any
time. There will also be a high level of transactions to all the
groups so separating them may help performance. There could be a
large number of groups, however, I expect the typical number would be
around 24. It is expected that inserts will come in bulk to a group.
Also, there will be bulk updates to records within a group. Other
than potentially merging two groups, transactions to multiple groups
would not be common.

In saying all that, the objective is to be a high performance database
by reducing search times. Table partitioning seems to allow the
ability to separate common data into these groups for faster searches
and with the ability to separate bufferpools and indexing. The one
thing that bothers me is it seem table partitioning is used mainly for
time related data were the tables are separated by a 24 hour day and
possibly later merged to form a month then archived or removed
entirely. In my application, removing and merging partitions will not
be so common. Also, I am not sure how to tell DB2 where to store the
data (which partition) other than to say "Group1" or "Group2" since it
is not time related I cannot say put data in that group that is for
May 2010. Is this a problem...maybe I am just not seeing yet how to
set the groups up.

Are there any other suggestions?

From: Serge Rielau on 3 Jul 2010 08:53

On 7/2/2010 1:22 PM, shorti wrote:
> I am working on a new database and am trying to determine how best to
> set up a specific table (or set of tables). I did some research on
> table level partitioning and was thinking this might be the way to go,
> however, I am not entirely convinced and wanted to see if there was a
> better solution. Here are the circumstances:
>
> This will be set up on Linux using DB2 V9.7. There is a possibility
> that HADR might be used. This particular table could contain 10's of
> millions of records. The records could actually be split up into
> groups to help split the table up and allow better performance for
> searches. Each group could own up to 2 million records each. The
> groups would not be time related...in other words, this is not a set
> of daily or monthly data that would expire. Each group could start
> out small and grow to the 2 million max. Each group could stay around
> for an infinite amount of time (typical) or the entire group could be
> removed or dropped (not so typical). Groups could also become
> combined. A new group could be added at any time or removed at any
> time. There will also be a high level of transactions to all the
> groups so separating them may help performance. There could be a
> large number of groups, however, I expect the typical number would be
> around 24. It is expected that inserts will come in bulk to a group.
> Also, there will be bulk updates to records within a group. Other
> than potentially merging two groups, transactions to multiple groups
> would not be common.
>
> In saying all that, the objective is to be a high performance database
> by reducing search times. Table partitioning seems to allow the
> ability to separate common data into these groups for faster searches
> and with the ability to separate bufferpools and indexing. The one
> thing that bothers me is it seem table partitioning is used mainly for
> time related data were the tables are separated by a 24 hour day and
> possibly later merged to form a month then archived or removed
> entirely. In my application, removing and merging partitions will not
> be so common. Also, I am not sure how to tell DB2 where to store the
> data (which partition) other than to say "Group1" or "Group2" since it
> is not time related I cannot say put data in that group that is for
> May 2010. Is this a problem...maybe I am just not seeing yet how to
> set the groups up.
>
> Are there any other suggestions?
>
You may also look at multi-dimentional clustering (MDC).
In our case the dimension would likely be one.
Unliek range partitioning MDC is self maintaining.
That is a new "group" is added when the first row for it is added.
A group disappears when the last row disappears.
Delete's are sped up compared to regular deletes and they take less
logging (one log record per page).
You will get speed up similar to range partitioning for queries.

Teh downside of MDC compared to range partitioning is that you cannot
move a partitition (i.e. detach and archive) or attach a preloaded set
of data. Also indexes are global.

Cheers
Serge

--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

From: shorti on 6 Jul 2010 11:44

On Jul 3, 5:53 am, Serge Rielau <srie...(a)ca.ibm.com> wrote:
> On 7/2/2010 1:22 PM, shorti wrote:
>
> > I am working on a new database and am trying to determine how best to
> > set up a specific table (or set of tables). I did some research on
> > table level partitioning and was thinking this might be the way to go,
> > however, I am not entirely convinced and wanted to see if there was a
> > better solution. Here are the circumstances:
>
> > This will be set up on Linux using DB2 V9.7. There is a possibility
> > that HADR might be used. This particular table could contain 10's of
> > millions of records. The records could actually be split up into
> > groups to help split the table up and allow better performance for
> > searches. Each group could own up to 2 million records each. The
> > groups would not be time related...in other words, this is not a set
> > of daily or monthly data that would expire. Each group could start
> > out small and grow to the 2 million max. Each group could stay around
> > for an infinite amount of time (typical) or the entire group could be
> > removed or dropped (not so typical). Groups could also become
> > combined. A new group could be added at any time or removed at any
> > time. There will also be a high level of transactions to all the
> > groups so separating them may help performance. There could be a
> > large number of groups, however, I expect the typical number would be
> > around 24. It is expected that inserts will come in bulk to a group.
> > Also, there will be bulk updates to records within a group. Other
> > than potentially merging two groups, transactions to multiple groups
> > would not be common.
>
> > In saying all that, the objective is to be a high performance database
> > by reducing search times. Table partitioning seems to allow the
> > ability to separate common data into these groups for faster searches
> > and with the ability to separate bufferpools and indexing. The one
> > thing that bothers me is it seem table partitioning is used mainly for
> > time related data were the tables are separated by a 24 hour day and
> > possibly later merged to form a month then archived or removed
> > entirely. In my application, removing and merging partitions will not
> > be so common. Also, I am not sure how to tell DB2 where to store the
> > data (which partition) other than to say "Group1" or "Group2" since it
> > is not time related I cannot say put data in that group that is for
> > May 2010. Is this a problem...maybe I am just not seeing yet how to
> > set the groups up.
>
> > Are there any other suggestions?
>
> You may also look at multi-dimentional clustering (MDC).
> In our case the dimension would likely be one.
> Unliek range partitioning MDC is self maintaining.
> That is a new "group" is added when the first row for it is added.
> A group disappears when the last row disappears.
> Delete's are sped up compared to regular deletes and they take less
> logging (one log record per page).
> You will get speed up similar to range partitioning for queries.
>
> Teh downside of MDC compared to range partitioning is that you cannot
> move a partitition (i.e. detach and archive) or attach a preloaded set
> of data. Also indexes are global.
>
> Cheers
> Serge
>
> --
> Serge Rielau
> SQL Architect DB2 for LUW
> IBM Toronto Lab

Thanks Serge, I am amazed you were able to find my post among all the
junk spam posts. I will do some research on MDC and see if its a
better fit!

From: Naresh Chainani on 6 Jul 2010 12:52

On Jul 2, 10:22 am, shorti <lbrya...(a)juno.com> wrote:
> I am working on a new database and am trying to determine how best to
> set up a specific table (or set of tables). I did some research on
> table level partitioning and was thinking this might be the way to go,
> however, I am not entirely convinced and wanted to see if there was a
> better solution. Here are the circumstances:
>
> This will be set up on Linux using DB2 V9.7. There is a possibility
> that HADR might be used. This particular table could contain 10's of
> millions of records. The records could actually be split up into
> groups to help split the table up and allow better performance for
> searches. Each group could own up to 2 million records each. The
> groups would not be time related...in other words, this is not a set
> of daily or monthly data that would expire. Each group could start
> out small and grow to the 2 million max. Each group could stay around
> for an infinite amount of time (typical) or the entire group could be
> removed or dropped (not so typical). Groups could also become
> combined. A new group could be added at any time or removed at any
> time. There will also be a high level of transactions to all the
> groups so separating them may help performance. There could be a
> large number of groups, however, I expect the typical number would be
> around 24. It is expected that inserts will come in bulk to a group.
> Also, there will be bulk updates to records within a group. Other
> than potentially merging two groups, transactions to multiple groups
> would not be common.
>
> In saying all that, the objective is to be a high performance database
> by reducing search times. Table partitioning seems to allow the
> ability to separate common data into these groups for faster searches
> and with the ability to separate bufferpools and indexing. The one
> thing that bothers me is it seem table partitioning is used mainly for
> time related data were the tables are separated by a 24 hour day and
> possibly later merged to form a month then archived or removed
> entirely. In my application, removing and merging partitions will not
> be so common. Also, I am not sure how to tell DB2 where to store the
> data (which partition) other than to say "Group1" or "Group2" since it
> is not time related I cannot say put data in that group that is for
> May 2010. Is this a problem...maybe I am just not seeing yet how to
> set the groups up.
>
> Are there any other suggestions?

Few thoughts:

While table partitioning is commonly used for time-based partitioning,
we have some customers using it to partition various groups (prodID,
transactionID) into different table spaces. In your case, the likely
partitioning key will be "groupid" and at table creation time you will
specify each of the 24 partitions (aka groups) along with data and
index table spaces for each partition. Having a notion of time for
each partition is not required.

Do you need the flexibility of choosing data and/or index table space
for each group? For instance, may be some groups are more important
than others and need to be backed up more frequently. Table
partitioning provides you this flexibility.

MDC accommodates creation of new groups and deletion of old groups
automatically while with table partitioning required adding a new
table partition (ALTER TABLE ADD PARTITION) or detaching an existing
partition that is no longer required using ALTER TABLE DETACH
PARTITION.

Combining groups would be trickier with either option unless you
actually updating the "groupid" to achieve the same.

Finally, either option should provide decent search performance by
elimination unnecessary groups. With table partitioning, partition
elimination will come into play where the optimizer looks at query
predicates to determine what partitions need to be accessed.

Naresh

From: shorti on 6 Jul 2010 17:00

On Jul 6, 9:52 am, Naresh Chainani <fornar...(a)gmail.com> wrote:
>
> Few thoughts:
>
> While table partitioning is commonly used for time-based partitioning,
> we have some customers using it to partition various groups (prodID,
> transactionID) into different table spaces. In your case, the likely
> partitioning key will be "groupid" and at table creation time you will
> specify each of the 24 partitions (aka groups) along with data and
> index table spaces for each partition. Having a notion of time for
> each partition is not required.
>
> Do you need the flexibility of choosing data and/or index table space
> for each group? For instance, may be some groups are more important
> than others and need to be backed up more frequently. Table
> partitioning provides you this flexibility.
>
> MDC accommodates creation of new groups and deletion of old groups
> automatically while with table partitioning required adding a new
> table partition (ALTER TABLE ADD PARTITION) or detaching an existing
> partition that is no longer required using ALTER TABLE DETACH
> PARTITION.
>
> Combining groups would be trickier with either option unless you
> actually updating the "groupid" to achieve the same.
>
> Finally, either option should provide decent search performance by
> elimination unnecessary groups. With table partitioning, partition
> elimination will come into play where the optimizer looks at query
> predicates to determine what partitions need to be accessed.
>
> Naresh

Naresh,

Thanks for the information. It answers a few of my concerns regarding
the use of the table
partitions. Its nice to know I have the choice to backup one
partition more frequently.

|
Pages: 1
Prev: Official Hollywood NetBook ™ Financial Professionals
Next: Insert slow in DPF environment.