Recovery Model for the Distribution Database [Replication]

Prev: Web replication
Next: Snapshot agent won't start

From: Aramid on 14 Dec 2009 11:29

It's been a heated discussion in the office now whether our Distribution
database should be on FULL or SIMPLE (default) recovery model, and I hope
anyone of you could shed some light.

Our infrastructure is composed of four (4) P2P sites, each one located in a
different geographical area. We are using SQL 2005 x64 EE, on 32GB 8-way
machines. Each site has a 2-node cluster for the production database, and
another 2-node cluster solely dedicated for the distribution db. The shared
storage for both clusters in all sites are SANs. There's a relatively large
amount of traffic between the sites, with about 100mb of data updates per
site every hour. Our application must run 24/7.

Synch with backup is enabled on the distributor, but not on the publication.
Right now, we decided to put the distribution database in FULL recovery
mode, thinking that frequent t-log backups (hourly) will help truncate the
publication logs quickly, which are also backed up hourly.

Some engineers in the organization argues that the distribution database
should be set back to SIMPLE (the default) and argues that this is better in
terms of overall performance and ease of management. Others claim that
having the distribution database in full recovery helps in keeping it trim,
with a fixed-size transaction log.

I've searched the web for days and could not see any authoritative
discussion on this, other than Hilary Cotters opinion that it should be set
in full to keep the t-log size static. What should be the recovery model for
the distribution database relative to our scenario?

Thanks for your help!

Aramid

From: hilary on 14 Dec 2009 14:53

When it comes down to it, it really depends on what your recoverability/DR
goals are.

There is little performance advantage to full over simple. I always like to
use full as it helps me to maintain the tlogs. I have run into situations
with simple where the log does grow uncontrollably and I don't feel
comfortable with that in a production environment.

What you need to focus on is what will happen if you loose one of your
nodes. When you restore you can expect a lot of write activty as each node
backfills missing transactions with the node which has just come back
online. If you can only restore this database back to the time of last
nights backup there will be a lot of read and write activity occuring.
Minimizing this window by restoring only the tlogs will help.

While your engineers are correct - simple does mean less management, in my
opinion I would rather do the minimally more management (essentially one
script which can be scheduled) and be protected from a disaster, as opposed
to using simple and having to wait hours to catch up from a disaster.

Full does not really guanrantee you a fixed size of log. Your log will vary
depending on the amount of data you log. If you do a reindexing your logs
can be huge. But over time their growth will stabilize, ie you should reach
a steady state where the log file size is generally fixed, but the log
backups can vary. You want to avoid log growth, but with both simple and
full you should reach a steady state as long as your workloads remain the
same.

The sync with backup option will not help you at all if you only have it
enabled for the distribution database. you will need to use it in
conjunction with the publication database as well.

HTH

Hilary

"Aramid" <Aramid(a)discussions.microsoft.com> wrote in message
news:F61C97B0-4544-4CFB-B4A3-987BCEE482F8(a)microsoft.com...
> It's been a heated discussion in the office now whether our Distribution
> database should be on FULL or SIMPLE (default) recovery model, and I hope
> anyone of you could shed some light.
>
> Our infrastructure is composed of four (4) P2P sites, each one located in
> a
> different geographical area. We are using SQL 2005 x64 EE, on 32GB 8-way
> machines. Each site has a 2-node cluster for the production database, and
> another 2-node cluster solely dedicated for the distribution db. The
> shared
> storage for both clusters in all sites are SANs. There's a relatively
> large
> amount of traffic between the sites, with about 100mb of data updates per
> site every hour. Our application must run 24/7.
>
> Synch with backup is enabled on the distributor, but not on the
> publication.
> Right now, we decided to put the distribution database in FULL recovery
> mode, thinking that frequent t-log backups (hourly) will help truncate the
> publication logs quickly, which are also backed up hourly.
>
> Some engineers in the organization argues that the distribution database
> should be set back to SIMPLE (the default) and argues that this is better
> in
> terms of overall performance and ease of management. Others claim that
> having the distribution database in full recovery helps in keeping it
> trim,
> with a fixed-size transaction log.
>
> I've searched the web for days and could not see any authoritative
> discussion on this, other than Hilary Cotters opinion that it should be
> set
> in full to keep the t-log size static. What should be the recovery model
> for
> the distribution database relative to our scenario?
>
> Thanks for your help!
>
> Aramid

|
Pages: 1
Prev: Web replication
Next: Snapshot agent won't start