Slow Backups [DB2]

From: esmith2112 on 14 Dec 2009 10:53

In a corporate consolidation of servers, our DB2 server was migrated
to a new shared environment consisting of an IBM P6 with 2 LPARs, 9 CPUs,
25 GB of memory, and an EMC Symmetrix RAID5 SAN for storage. We're
running AIX 5.3 and DB2 9.5. Before the migration we ran on a slower
16-CPU machine with a fully mirrored disk system. In general,
performance is better on the new machine for transactional processing.
The exception is one database backup, which is now 2 to 3 times slower
for the actual "backup database" command. What's strange, to me at
least, is that two other databases in the 20-30 GB size range have
backup times similar to their times on the old server. The slow
database is just under 200 GB in size and backs up in 6 hours instead
of 2 hours and change.
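To put the slowdown in throughput terms, here is a quick back-of-the-envelope calculation. It is only a sketch that assumes exactly 200 GB, 2 hours and 6 hours (the post says "just under 200 GB" and "2 hours and change"), and counts 1 GB as 1024 MB.

```python
# Rough backup-throughput comparison using rounded figures from the post:
# ~200 GB backed up in ~2 h on the old server and ~6 h on the new one.

def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average throughput in MB/s (1 GB = 1024 MB) for a backup of size_gb taking `hours`."""
    return size_gb * 1024 / (hours * 3600)

old = throughput_mb_per_s(200, 2)
new = throughput_mb_per_s(200, 6)
print(f"old server: {old:.1f} MB/s")   # old server: 28.4 MB/s
print(f"new server: {new:.1f} MB/s")   # new server: 9.5 MB/s
print(f"slowdown:   {old / new:.1f}x") # slowdown:   3.0x
```

Under those assumptions the backup falls from roughly 28 MB/s to under 10 MB/s, low enough that the I/O path itself seems worth examining, not just DB2 settings.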

The new system was created by imaging the old system and laying down a
copy on the new box with all directory structures being identical. All
tablespace storage is SMS. We suspected this had something to do with
the SAN component and played with registry settings like
DB2_PARALLEL_IO, and also tried altering the backup command to specify
explicit PARALLELISM values, all to no avail. Since it's just the one
database, we thought there was something special about the way it was
defined, but failed to turn up any significant differences (not that
there aren't any--we just couldn't find them).

What other parameters should we be looking at? My skill set is more
toward applications than system administration, so I'm at a little
bit of a loss here.

Thanks,

Evan

From: stefan.albert on 15 Dec 2009 05:28

The first thing I would look at is the backup device and how it is
physically attached.
You don't say anything about it, so my first guess would be: if
read performance improved, the backup should also be faster, provided
you use the same backup device AND (that's my second guess) the backup
target is not the same disks where the data lives. The data disks were
mirrored and are now RAID-5; if you write your backup to the same disks,
the write rate can be slower because of parity generation and the extra
parity writes. This could explain the drop in write performance. What
is your write performance for the data itself, and where do you write
your backup to?
Anyway: for safety reasons, the backup should not be on the same
platform where the data lives.
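The parity argument can be made concrete with the textbook RAID write penalties. This sketch uses generic numbers (each logical write costs 2 back-end writes on a mirror and 4 back-end I/Os for a small RAID-5 write, each read costs 1) and a hypothetical host workload; it is not a measurement of the Symmetrix in question, whose large cache may hide much of the penalty.

```python
# Back-end I/O cost of small random writes, using textbook write penalties:
#   RAID-1 (mirroring): each logical write -> 2 physical writes (one per mirror)
#   RAID-5 (parity):    each small write  -> read old data, read old parity,
#                       write new data, write new parity = 4 physical I/Os
# Reads are counted as 1 physical I/O each. Array caching is ignored.

WRITE_PENALTY = {"raid1": 2, "raid5": 4}

def backend_iops(read_iops: float, write_iops: float, level: str) -> float:
    """Physical disk I/Os per second generated by a host read/write workload."""
    return read_iops + write_iops * WRITE_PENALTY[level]

# Hypothetical workload: 500 reads/s and 500 small writes/s from the host.
for level in ("raid1", "raid5"):
    print(level, backend_iops(500, 500, level))
# raid1 1500
# raid5 2500
```

The same host workload generates noticeably more disk activity on RAID-5, which is why writing backups to the same parity-protected disks that are being read can compete for I/O.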

From: esmith2112 on 15 Dec 2009 10:15

Oops, I guess it doesn't paint a complete picture without detailing
the target. The backup files are indeed written to disk on the same SAN
device where the data resides, then picked up by TSM and written to
tape. We find it odd that the other databases, which are backed up the
same way, don't suffer similar performance hits. We suspected it
was something particular to the instance or database itself in
relation to the SAN.

But you may be on to something with the parity generation. Since
posting, I loaded data via the IMPORT command into a table in the
database in question: 32K rows (50 bytes each), a load that takes
approximately 20 seconds on the old server but over 10 minutes
on the new server. Could the RAID overhead cause such a hit?
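Turning the IMPORT test into per-row numbers helps separate a bandwidth problem from a latency problem. This sketch assumes exactly 32,000 rows of 50 bytes, 20 s on the old server and 600 s on the new one (the post says "over 10 minutes", so the real per-row time is at least this high).

```python
# Per-row cost of the IMPORT test, using rounded figures from the post.
ROWS = 32_000
ROW_BYTES = 50

def per_row_ms(seconds: float, rows: int = ROWS) -> float:
    """Average elapsed milliseconds per imported row."""
    return seconds * 1000 / rows

total_mb = ROWS * ROW_BYTES / 1_000_000
print(f"data volume: {total_mb:.1f} MB")                              # data volume: 1.6 MB
print(f"old: {per_row_ms(20):.3f} ms/row, new: {per_row_ms(600):.3f} ms/row")
# old: 0.625 ms/row, new: 18.750 ms/row
```

Only about 1.6 MB of data is involved, so raw transfer bandwidth is an unlikely explanation on its own; the jump from roughly 0.6 ms to roughly 19 ms per row points more toward per-operation latency.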

From: darko on 15 Dec 2009 13:07

You did not state clearly whether the backup is written to the same disks
(in RAID 5) that contain the database. If it is, the slower backup
could be due to the disk layout. Regarding RAID 5, everyone
should at least look at www.baarf.com. Still, RAID 5 should not be the
performance-limiting factor for backup operations, since most writes
during a backup should be full-stripe writes, which avoid the
RAID 5 write penalty.
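The full-stripe point follows from how RAID-5 parity works: parity is the XOR of the data blocks in a stripe. The toy model below (small integers standing in for disk blocks, no real I/O) shows that a full-stripe write can compute parity from the new data alone, whereas updating a single block needs the old data and old parity read back first.

```python
# Toy model of RAID-5 parity: integers stand in for data blocks in one stripe.
from functools import reduce
from operator import xor

def full_stripe_parity(data_blocks: list[int]) -> int:
    """Full-stripe write: parity comes from the new data alone, no reads needed."""
    return reduce(xor, data_blocks, 0)

def partial_write_parity(old_parity: int, old_block: int, new_block: int) -> int:
    """Read-modify-write: old data and old parity must be read before writing."""
    return old_parity ^ old_block ^ new_block

stripe = [0x11, 0x22, 0x33, 0x44]
parity = full_stripe_parity(stripe)

# Update only block 2 via read-modify-write ...
new_parity = partial_write_parity(parity, stripe[2], 0x99)
stripe[2] = 0x99
# ... and confirm it matches recomputing parity over the whole stripe.
assert new_parity == full_stripe_parity(stripe)

# Any single lost block can be rebuilt from the remaining blocks plus parity.
lost = stripe[1]
rebuilt = reduce(xor, [stripe[0], stripe[2], stripe[3]], new_parity)
assert rebuilt == lost
```

Both paths yield the same parity; the difference is the two extra reads in the partial-write case, which is the RAID-5 small-write penalty a large sequential backup stream can largely avoid.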

It would not be good practice to put everything (especially
tablespaces and logs) on the same disks in a RAID 5 configuration, even
though Symmetrix arrays have massive caches.

However, I doubt that the disk layout alone is to blame for the data
load slowing down from 20 seconds to over 10 minutes. You will probably
have to look for additional suspects.

Darko Krstic

From: stefan.albert on 17 Dec 2009 04:49

Hmm - that's a difficult one...
One thing comes to mind: block sizes. Maybe the block sizes of the
DB / OS / EMC² don't match, so much more data is written than is
actually needed.
You can try to monitor the traffic from the DB disks to your server and
then the traffic back to the disks where the backup lives. If these
go through the same adapters, the traffic is mixed, but you might still
be able to tell the reads (DB->server) from the writes (server->backup).
Separate adapters would be easier to monitor, though.
Or you could monitor the traffic per file system.
For AIX (I don't know your OS) you could use nmon...
If you see much more writing than reading (with only the backup
active on the server), I would assume that the page sizes don't match.
I don't know whether the EMC² box has an internal traffic monitor -
that would also be a good thing to look at...
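The block-size hypothesis amounts to write amplification: if each write is expanded to whole, aligned blocks of the underlying device, more bytes hit the disks than the application asked for. This is a simplified toy model with hypothetical block sizes and offsets, not the actual DB2, AIX, or Symmetrix settings, and it ignores caching and coalescing of adjacent writes.

```python
# Toy model of write amplification from mismatched or misaligned block sizes:
# every write is rounded out to whole, aligned blocks of size `block`.

def physical_bytes(offset: int, length: int, block: int) -> int:
    """Bytes written when [offset, offset+length) is rounded out to block boundaries."""
    start = (offset // block) * block
    end = -(-(offset + length) // block) * block  # ceiling to the next block boundary
    return end - start

def amplification(offset: int, length: int, block: int) -> float:
    """Ratio of physically written bytes to requested bytes."""
    return physical_bytes(offset, length, block) / length

print(amplification(0, 4096, 4096))    # 1.0  (aligned 4 KB write, 4 KB blocks)
print(amplification(512, 4096, 4096))  # 2.0  (same write, misaligned by 512 bytes)
print(amplification(0, 4096, 65536))   # 16.0 (4 KB write, 64 KB device chunks)
```

If monitoring showed the backup target receiving far more bytes than are read from the database disks while only the backup runs, a ratio like these would be one candidate explanation.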
