Differential backup [PgSql]

From: Merlin Moncure on 27 Apr 2010 12:14

On Tue, Apr 27, 2010 at 11:13 AM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Merlin Moncure <mmoncure(a)gmail.com> wrote:
>
>> The proposal only seems a win to me if a fair percentage of the
>> larger files don't change, which strikes me as a relatively low
>> level case to optimize for.
>
> That's certainly a situation we face, with a relatively slow WAN in
> the middle.
>
> http://archives.postgresql.org/pgsql-admin/2009-07/msg00071.php
>
> I don't know how rare or common that is.

hm...interesting read. pretty clever. Your archiving requirements are high.

With the new stuff (HS/SR) taken into consideration, would you have
done your DR the same way if you had to do it all over again?

Part of my concern here is that manual filesystem level backups are
going to become an increasingly arcane method of doing things as the
HS/SR train starts leaving the station.

hm, it would be pretty neat to see some of the things you do pushed
into logical (pg_dump) style backups...with some enhancements so that
it can skip tables that haven't changed and are already present in a
previously supplied dump. This is more complicated but maybe more
useful for a broader audience?

Side question: is it impractical to back up a hot standby via pg_dump
because of query conflict issues?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Kevin Grittner" on 27 Apr 2010 12:24

Merlin Moncure <mmoncure(a)gmail.com> wrote:

> Your archiving requirements are high.

They are set by a Steering Committee composed of the Director of
State Courts and various District Court Administrators, Judges,
Clerks of Court, and Registers in Probate who rely on this data and
*really* want to be safe. I just work here. ;-)

> With the new stuff (HS/SR) taken into consideration, would you
> have done your DR the same way if you had to do it all over again?

When SR is available, if I can maintain the flow of WAL files while
doing so, I would feed our "warm standby" farm with SR connections.
Otherwise I'd do the same. It's pretty much mandated that we keep
those copies. It'd be ideal if SR could reconstruct the WAL file
segments on the receiving end, to avoid sending the data twice.
Dare I dream? :-)

-Kevin

From: Robert Haas on 27 Apr 2010 14:32

On Tue, Apr 27, 2010 at 10:08 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Tue, 2010-04-27 at 08:59 -0500, Kevin Grittner wrote:
>> > An explicit mechanism where Postgres could authoritatively say
>> > which files have changed would make many feel safer, especially
>> > when other databases also do this.
>>
>> Why? I must be missing something, because my feeling is that if you
>> can't trust your OS to cover something like this, how can you trust
>> any application *running* under that OS to do it?
>
> Good questions. I'm exploring a perceived need.
>
> I don't think people want this because they think the OS is flaky. It's
> more about trusting all of the configurations of all of the filesystems
> in use. An explicit mechanism would be more verifiably accurate. It
> might just be about control and blame.

What I think would be cool, though it's not what you proposed, is an
integrated base backup feature. Say your SR slave gets too far behind
and can't catch up for some reason (the system administrator
accidentally nuked the archive, or you were living on the edge and not
keeping one). It would be neat to have a way, either manually or
maybe even automatically, to tell the slave, hey, go make a new base
backup. And it would connect to the master and do pg_start_backup()
and stream down the whole database contents and do pg_stop_backup().
Of course you can do all of this with scripts, but ISTM an integrated
capability would be much easier to administer and might offer some
interesting opportunities for compression.

With respect to what you actually proposed, like Kevin, I'm not sure
what it's good for. It might make sense if we know what the use case
is but the value certainly isn't obvious.

....Robert
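
A minimal sketch of the sequence described above, which can already be scripted by hand: enter backup mode, copy the data directory, leave backup mode. The hooks run_sql and copy_data_dir are hypothetical and supplied by the caller; only pg_start_backup() and pg_stop_backup() come from the thread. This is not the integrated feature being proposed, just the manual steps it would wrap.

```python
from typing import Callable


def take_base_backup(run_sql: Callable[[str], None],
                     copy_data_dir: Callable[[], None],
                     label: str = "standby_resync") -> None:
    """Bracket a file-level copy with pg_start_backup()/pg_stop_backup().

    run_sql and copy_data_dir are hypothetical hooks: one executes a
    statement on the master, the other copies the data directory
    (for example with rsync or tar).
    """
    # Quote the label as a SQL string literal by doubling single quotes.
    quoted = label.replace("'", "''")
    run_sql(f"SELECT pg_start_backup('{quoted}')")
    try:
        copy_data_dir()
    finally:
        # Always leave backup mode, even if the copy fails.
        run_sql("SELECT pg_stop_backup()")
```

The try/finally is the one design point worth noting: a failed copy should not leave the master in backup mode.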

From: Hannu Krosing on 28 Apr 2010 11:38

On Tue, 2010-04-27 at 14:30 +0100, Simon Riggs wrote:
> Thinking about allowing a backup to tell which files have changed in the
> database since last backup. This would allow an external utility to copy
> away only changed files.
>
> Now there's a few ways of doing this and many will say this is already
> possible using file access times.
>
> An explicit mechanism where Postgres could authoritatively say which
> files have changed would make many feel safer, especially when other
> databases also do this.
>
> We keep track of which files require fsync(), so we could also keep
> track of changed files using that same information.

Would it make sense to split this in two, one for DML/"logical
changes" (insert, update, delete, truncate) and another for physical,
"non-functional", file-level changes (vacuum, setting hint bits, ...)?

BTW, is the stats-collection reliable enough for this or is it still
possible to lose some changes if we did this together with updating info
for pg_stat_user_tables/pg_statio_user_tables ?

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
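
For comparison, the timestamp-based approach that the quoted proposal says is "already possible" might look roughly like the sketch below. It relies entirely on what the OS reports (file size and modification time), which is exactly the trust question debated in this thread. The names scan, changed_files and Manifest are illustrative and not part of PostgreSQL or any existing tool.

```python
import os
from typing import Dict, List, Tuple

# relative path -> (size in bytes, modification time in nanoseconds)
Manifest = Dict[str, Tuple[int, int]]


def scan(root: str) -> Manifest:
    """Record size and modification time for every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_files(previous: Manifest, current: Manifest) -> List[str]:
    """Return paths that are new or whose size or mtime differs."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)
```

An external utility would save the manifest after each backup and copy only the paths returned by changed_files() next time. Deleted files are not reported by this sketch, and it cannot tell a DML change from a vacuum or hint-bit change.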

From: Hannu Krosing on 28 Apr 2010 11:46

On Tue, 2010-04-27 at 08:59 -0500, Kevin Grittner wrote:
> Simon Riggs <simon(a)2ndQuadrant.com> wrote:
>
> > Thinking about allowing a backup to tell which files have changed
> > in the database since last backup. This would allow an external
> > utility to copy away only changed files.
> >
> > Now there's a few ways of doing this and many will say this is
> > already possible using file access times.
>
> Who would say otherwise? Under what circumstances would PostgreSQL
> modify a file without changing the "last modified" timestamp or the
> file size? If you're concerned about the converse, with daemon-
> based rsync you can copy just the modified portions of a file on
> which the directory information has changed. Or is this targeting
> platforms which don't have rsync?

I see the main value when doing pg_dump based backups and being able to
know if the table was modified by DML (insert/update/delete/truncate) or
by something "invisible" like vacuum or setting hint bits.

Currently the only way to keep this info is by having triggers on all
tables for all DML.

> > An explicit mechanism where Postgres could authoritatively say
> > which files have changed would make many feel safer, especially
> > when other databases also do this.
>
> Why? I must be missing something, because my feeling is that if you
> can't trust your OS to cover something like this, how can you trust
> any application *running* under that OS to do it?
>
> > Is this route worthwhile?
>
> I'm not seeing it, but I could be missing something. Can you
> describe a use case where this would be beneficial?
>
> -Kevin
>

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
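
One way to picture the pg_dump use case without triggers is to compare per-table DML counters between dumps. The sketch below assumes the counters are read from pg_stat_user_tables (columns n_tup_ins, n_tup_upd, n_tup_del); the function name tables_to_dump and the snapshot format are hypothetical. It inherits the open question raised earlier in the thread: if statistics can be lost or reset, a changed table could be skipped, so this is an illustration and not a safe backup policy.

```python
from typing import Dict, List, Tuple

# table name -> (n_tup_ins, n_tup_upd, n_tup_del) from pg_stat_user_tables
Counters = Dict[str, Tuple[int, int, int]]

# Query an assumed caller would run to build a Counters snapshot.
SNAPSHOT_QUERY = """
SELECT schemaname || '.' || relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
"""


def tables_to_dump(previous: Counters, current: Counters) -> List[str]:
    """Tables that are new or whose DML counters differ from the last snapshot."""
    return sorted(t for t, c in current.items() if previous.get(t) != c)
```

Vacuum and hint-bit updates do not touch these counters, so only "logical" changes would trigger a new dump of a table.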
