From: shorti on 2 Jul 2010 13:22 I am working on a new database and am trying to determine how best to set up a specific table (or set of tables). I did some research on table level partitioning and was thinking this might be the way to go, however, I am not entirely convinced and wanted to see if there was a better solution. Here are the circumstances: This will be set up on Linux using DB2 V9.7. There is a possibility that HADR might be used. This particular table could contain 10's of millions of records. The records could actually be split up into groups to help split the table up and allow better performance for searches. Each group could own up to 2 million records each. The groups would not be time related...in other words, this is not a set of daily or monthly data that would expire. Each group could start out small and grow to the 2 million max. Each group could stay around for an infinite amount of time (typical) or the entire group could be removed or dropped (not so typical). Groups could also become combined. A new group could be added at any time or removed at any time. There will also be a high level of transactions to all the groups so separating them may help performance. There could be a large number of groups, however, I expect the typical number would be around 24. It is expected that inserts will come in bulk to a group. Also, there will be bulk updates to records within a group. Other than potentially merging two groups, transactions to multiple groups would not be common. In saying all that, the objective is to be a high performance database by reducing search times. Table partitioning seems to allow the ability to separate common data into these groups for faster searches and with the ability to separate bufferpools and indexing. The one thing that bothers me is it seem table partitioning is used mainly for time related data were the tables are separated by a 24 hour day and possibly later merged to form a month then archived or removed entirely. In my application, removing and merging partitions will not be so common. Also, I am not sure how to tell DB2 where to store the data (which partition) other than to say "Group1" or "Group2" since it is not time related I cannot say put data in that group that is for May 2010. Is this a problem...maybe I am just not seeing yet how to set the groups up. Are there any other suggestions?
From: Serge Rielau on 3 Jul 2010 08:53 On 7/2/2010 1:22 PM, shorti wrote: > I am working on a new database and am trying to determine how best to > set up a specific table (or set of tables). I did some research on > table level partitioning and was thinking this might be the way to go, > however, I am not entirely convinced and wanted to see if there was a > better solution. Here are the circumstances: > > This will be set up on Linux using DB2 V9.7. There is a possibility > that HADR might be used. This particular table could contain 10's of > millions of records. The records could actually be split up into > groups to help split the table up and allow better performance for > searches. Each group could own up to 2 million records each. The > groups would not be time related...in other words, this is not a set > of daily or monthly data that would expire. Each group could start > out small and grow to the 2 million max. Each group could stay around > for an infinite amount of time (typical) or the entire group could be > removed or dropped (not so typical). Groups could also become > combined. A new group could be added at any time or removed at any > time. There will also be a high level of transactions to all the > groups so separating them may help performance. There could be a > large number of groups, however, I expect the typical number would be > around 24. It is expected that inserts will come in bulk to a group. > Also, there will be bulk updates to records within a group. Other > than potentially merging two groups, transactions to multiple groups > would not be common. > > In saying all that, the objective is to be a high performance database > by reducing search times. Table partitioning seems to allow the > ability to separate common data into these groups for faster searches > and with the ability to separate bufferpools and indexing. The one > thing that bothers me is it seem table partitioning is used mainly for > time related data were the tables are separated by a 24 hour day and > possibly later merged to form a month then archived or removed > entirely. In my application, removing and merging partitions will not > be so common. Also, I am not sure how to tell DB2 where to store the > data (which partition) other than to say "Group1" or "Group2" since it > is not time related I cannot say put data in that group that is for > May 2010. Is this a problem...maybe I am just not seeing yet how to > set the groups up. > > Are there any other suggestions? > You may also look at multi-dimentional clustering (MDC). In our case the dimension would likely be one. Unliek range partitioning MDC is self maintaining. That is a new "group" is added when the first row for it is added. A group disappears when the last row disappears. Delete's are sped up compared to regular deletes and they take less logging (one log record per page). You will get speed up similar to range partitioning for queries. Teh downside of MDC compared to range partitioning is that you cannot move a partitition (i.e. detach and archive) or attach a preloaded set of data. Also indexes are global. Cheers Serge -- Serge Rielau SQL Architect DB2 for LUW IBM Toronto Lab
From: shorti on 6 Jul 2010 11:44 On Jul 3, 5:53 am, Serge Rielau <srie...(a)ca.ibm.com> wrote: > On 7/2/2010 1:22 PM, shorti wrote: > > > I am working on a new database and am trying to determine how best to > > set up a specific table (or set of tables). I did some research on > > table level partitioning and was thinking this might be the way to go, > > however, I am not entirely convinced and wanted to see if there was a > > better solution. Here are the circumstances: > > > This will be set up on Linux using DB2 V9.7. There is a possibility > > that HADR might be used. This particular table could contain 10's of > > millions of records. The records could actually be split up into > > groups to help split the table up and allow better performance for > > searches. Each group could own up to 2 million records each. The > > groups would not be time related...in other words, this is not a set > > of daily or monthly data that would expire. Each group could start > > out small and grow to the 2 million max. Each group could stay around > > for an infinite amount of time (typical) or the entire group could be > > removed or dropped (not so typical). Groups could also become > > combined. A new group could be added at any time or removed at any > > time. There will also be a high level of transactions to all the > > groups so separating them may help performance. There could be a > > large number of groups, however, I expect the typical number would be > > around 24. It is expected that inserts will come in bulk to a group. > > Also, there will be bulk updates to records within a group. Other > > than potentially merging two groups, transactions to multiple groups > > would not be common. > > > In saying all that, the objective is to be a high performance database > > by reducing search times. Table partitioning seems to allow the > > ability to separate common data into these groups for faster searches > > and with the ability to separate bufferpools and indexing. The one > > thing that bothers me is it seem table partitioning is used mainly for > > time related data were the tables are separated by a 24 hour day and > > possibly later merged to form a month then archived or removed > > entirely. In my application, removing and merging partitions will not > > be so common. Also, I am not sure how to tell DB2 where to store the > > data (which partition) other than to say "Group1" or "Group2" since it > > is not time related I cannot say put data in that group that is for > > May 2010. Is this a problem...maybe I am just not seeing yet how to > > set the groups up. > > > Are there any other suggestions? > > You may also look at multi-dimentional clustering (MDC). > In our case the dimension would likely be one. > Unliek range partitioning MDC is self maintaining. > That is a new "group" is added when the first row for it is added. > A group disappears when the last row disappears. > Delete's are sped up compared to regular deletes and they take less > logging (one log record per page). > You will get speed up similar to range partitioning for queries. > > Teh downside of MDC compared to range partitioning is that you cannot > move a partitition (i.e. detach and archive) or attach a preloaded set > of data. Also indexes are global. > > Cheers > Serge > > -- > Serge Rielau > SQL Architect DB2 for LUW > IBM Toronto Lab Thanks Serge, I am amazed you were able to find my post among all the junk spam posts. I will do some research on MDC and see if its a better fit!
From: Naresh Chainani on 6 Jul 2010 12:52 On Jul 2, 10:22 am, shorti <lbrya...(a)juno.com> wrote: > I am working on a new database and am trying to determine how best to > set up a specific table (or set of tables). I did some research on > table level partitioning and was thinking this might be the way to go, > however, I am not entirely convinced and wanted to see if there was a > better solution. Here are the circumstances: > > This will be set up on Linux using DB2 V9.7. There is a possibility > that HADR might be used. This particular table could contain 10's of > millions of records. The records could actually be split up into > groups to help split the table up and allow better performance for > searches. Each group could own up to 2 million records each. The > groups would not be time related...in other words, this is not a set > of daily or monthly data that would expire. Each group could start > out small and grow to the 2 million max. Each group could stay around > for an infinite amount of time (typical) or the entire group could be > removed or dropped (not so typical). Groups could also become > combined. A new group could be added at any time or removed at any > time. There will also be a high level of transactions to all the > groups so separating them may help performance. There could be a > large number of groups, however, I expect the typical number would be > around 24. It is expected that inserts will come in bulk to a group. > Also, there will be bulk updates to records within a group. Other > than potentially merging two groups, transactions to multiple groups > would not be common. > > In saying all that, the objective is to be a high performance database > by reducing search times. Table partitioning seems to allow the > ability to separate common data into these groups for faster searches > and with the ability to separate bufferpools and indexing. The one > thing that bothers me is it seem table partitioning is used mainly for > time related data were the tables are separated by a 24 hour day and > possibly later merged to form a month then archived or removed > entirely. In my application, removing and merging partitions will not > be so common. Also, I am not sure how to tell DB2 where to store the > data (which partition) other than to say "Group1" or "Group2" since it > is not time related I cannot say put data in that group that is for > May 2010. Is this a problem...maybe I am just not seeing yet how to > set the groups up. > > Are there any other suggestions? Few thoughts: While table partitioning is commonly used for time-based partitioning, we have some customers using it to partition various groups (prodID, transactionID) into different table spaces. In your case, the likely partitioning key will be "groupid" and at table creation time you will specify each of the 24 partitions (aka groups) along with data and index table spaces for each partition. Having a notion of time for each partition is not required. Do you need the flexibility of choosing data and/or index table space for each group? For instance, may be some groups are more important than others and need to be backed up more frequently. Table partitioning provides you this flexibility. MDC accommodates creation of new groups and deletion of old groups automatically while with table partitioning required adding a new table partition (ALTER TABLE ADD PARTITION) or detaching an existing partition that is no longer required using ALTER TABLE DETACH PARTITION. Combining groups would be trickier with either option unless you actually updating the "groupid" to achieve the same. Finally, either option should provide decent search performance by elimination unnecessary groups. With table partitioning, partition elimination will come into play where the optimizer looks at query predicates to determine what partitions need to be accessed. Naresh
From: shorti on 6 Jul 2010 17:00 On Jul 6, 9:52 am, Naresh Chainani <fornar...(a)gmail.com> wrote: > > Few thoughts: > > While table partitioning is commonly used for time-based partitioning, > we have some customers using it to partition various groups (prodID, > transactionID) into different table spaces. In your case, the likely > partitioning key will be "groupid" and at table creation time you will > specify each of the 24 partitions (aka groups) along with data and > index table spaces for each partition. Having a notion of time for > each partition is not required. > > Do you need the flexibility of choosing data and/or index table space > for each group? For instance, may be some groups are more important > than others and need to be backed up more frequently. Table > partitioning provides you this flexibility. > > MDC accommodates creation of new groups and deletion of old groups > automatically while with table partitioning required adding a new > table partition (ALTER TABLE ADD PARTITION) or detaching an existing > partition that is no longer required using ALTER TABLE DETACH > PARTITION. > > Combining groups would be trickier with either option unless you > actually updating the "groupid" to achieve the same. > > Finally, either option should provide decent search performance by > elimination unnecessary groups. With table partitioning, partition > elimination will come into play where the optimizer looks at query > predicates to determine what partitions need to be accessed. > > Naresh Naresh, Thanks for the information. It answers a few of my concerns regarding the use of the table partitions. Its nice to know I have the choice to backup one partition more frequently.
|
Pages: 1 Prev: Official Hollywood NetBook ™ Financial Professionals Next: Insert slow in DPF environment. |