Differential backup [PgSql]

From: Florian Pflug on 27 Apr 2010 10:22

On Apr 27, 2010, at 16:08 , Simon Riggs wrote:
> On Tue, 2010-04-27 at 08:59 -0500, Kevin Grittner wrote:
>> Why? I must be missing something, because my feeling is that if you
>> can't trust your OS to cover something like this, how can you trust
>> any application *running* under that OS to do it?
>
> Good questions. I'm exploring a perceived need.
>
> I don't think people want this because they think the OS is flaky. It's
> more about trusting all of the configurations of all of the filesystems
> in use. An explicit mechanism would be more verifiably accurate. It
> might just be about control and blame.

I believe one reason people (including me) don't have 100% faith in file modification times is non-monotonic system clocks. I've seen more than one system where a cron job running ntpdate every night was used as a poor man's replacement for ntpd...

So I'd say the real advantage of rolling our own solution is the ability to use LSNs instead of timestamps.
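
To make the clock concern concrete, here is a minimal sketch of a backward clock step hiding a change from a timestamp check while a monotonic counter still catches it. Everything in it is an assumption for illustration: `FileState`, the two `changed_by_*` helpers, and the plain integer standing in for an LSN are hypothetical and not part of PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    mtime: float  # wall-clock time of the last write, as the OS would record it
    lsn: int      # hypothetical monotonic change counter standing in for an LSN


def changed_by_mtime(f: FileState, last_backup_time: float) -> bool:
    """Timestamp rule: copy the file only if it was written after the last backup."""
    return f.mtime > last_backup_time


def changed_by_lsn(f: FileState, last_backup_lsn: int) -> bool:
    """Counter rule: copy the file only if its counter advanced past the backup's."""
    return f.lsn > last_backup_lsn


# The last backup ran at wall-clock time 1000.0, when the counter stood at 50.
last_backup_time, last_backup_lsn = 1000.0, 50

# Ten real seconds later the file is written, but a nightly ntpdate run has
# meanwhile stepped the clock back by 120 seconds.
f = FileState(mtime=1000.0 + 10.0 - 120.0, lsn=51)

print(changed_by_mtime(f, last_backup_time))  # False: the write is missed
print(changed_by_lsn(f, last_backup_lsn))     # True: the write is seen
```

The counter only ever moves forward, so the comparison does not depend on what the wall clock was doing at the time of the write.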

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Michael Tharp on 27 Apr 2010 10:32

On 04/27/2010 09:59 AM, Kevin Grittner wrote:
> Under what circumstances would PostgreSQL
> modify a file without changing the "last modified" timestamp or the
> file size?

Do all OSes have sub-second precision mtimes? Because otherwise I could
see a scenario such as this:

* File is modified
* Backup inspects and copies the file in the same second
* File is modified again in the same second, so the mtime doesn't change
* Backup is run again some time later and sees that the mtime has not
changed

Even with microsecond precision this kind of scenario makes me squidgy,
especially if some OSes decide that skipping frequent mtime updates is
OK. Florian's point about clock changes is also very relevant. Since
Postgres has the capability to give a better answer about what is in the
file, it would be best to use that.
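
The four steps above can be reproduced with a small sketch. It assumes a backup tool that compares whole-second mtimes only; the helper names, the file name, and the fixed timestamp are hypothetical, and `os.utime` is used to pin the mtime so the demo does not depend on timing.

```python
import os
import tempfile


def backup_thinks_unchanged(path, recorded_mtime_seconds):
    """Mimic a backup tool that compares whole-second mtimes only."""
    return int(os.stat(path).st_mtime) == recorded_mtime_seconds


def simulate_same_second_race():
    """Return True if a change slips past the mtime check."""
    same_second = 1272378720  # arbitrary fixed timestamp, for a repeatable demo
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "datafile")  # hypothetical file name

        # 1. File is modified.
        with open(path, "wb") as f:
            f.write(b"version 1")
        os.utime(path, (same_second, same_second))

        # 2. Backup inspects and copies the file in the same second.
        with open(path, "rb") as f:
            backed_up = f.read()
        recorded = int(os.stat(path).st_mtime)

        # 3. File is modified again in the same second (and keeps its size).
        with open(path, "wb") as f:
            f.write(b"version 2")
        os.utime(path, (same_second, same_second))

        # 4. A later backup run sees an unchanged mtime and skips the file.
        with open(path, "rb") as f:
            current = f.read()
        return backup_thinks_unchanged(path, recorded) and current != backed_up


print(simulate_same_second_race())  # True: the second write is never backed up
```

Both versions are nine bytes long, so a file-size check would not catch the change either.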

-- m. tharp

From: Merlin Moncure on 27 Apr 2010 11:07

On Tue, Apr 27, 2010 at 10:32 AM, Michael Tharp
<gxti(a)partiallystapled.com> wrote:
> On 04/27/2010 09:59 AM, Kevin Grittner wrote:
>>
>> Under what circumstances would PostgreSQL
>> modify a file without changing the "last modified" timestamp or the
>> file size?
>
> Do all OSes have sub-second precision mtimes? Because otherwise I could see
> a scenario such as this:
>
> * File is modified
> * Backup inspects and copies the file in the same second
> * File is modified again in the same second, so the mtime doesn't change
> * Backup is run again some time later and sees that the mtime has not
> changed
>
> Even with microsecond precision this kind of scenario makes me squidgy,
> especially if some OSes decide that skipping frequent mtime updates is OK.
> Florian's point about clock changes is also very relevant. Since Postgres
> has the capability to give a better answer about what is in the file, it
> would be best to use that.

Why not just force all files to be checked regardless of mtime? The
proposal only seems a win to me if a fair percentage of the larger
files don't change, which strikes me as a relatively low level case to
optimize for. Maybe I'm missing the objective, but it looks like the
payoff is to avoid scanning large files for checksums. If I was even
infinitesimally insecure about rsync missing files because of
clock/filesystem issues, I'd simply force it.
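
As a rough illustration of what forcing the check means, the sketch below ignores mtime entirely and compares a content digest against one recorded at the previous backup. The helper names and the choice of SHA-256 are assumptions for the example; rsync uses its own checksum scheme.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Hash the whole file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_since(path, recorded_digest):
    """Content-based rule: the file changed if its digest differs."""
    return file_digest(path) != recorded_digest


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "datafile")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"version 1")
    recorded = file_digest(path)
    before = os.stat(path)

    # Rewrite the file, then restore the old timestamps to imitate a
    # change that the mtime check cannot see.
    with open(path, "wb") as f:
        f.write(b"version 2")
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

    print(changed_since(path, recorded))  # True
```

The cost is the one mentioned above: every file has to be read in full on every run, which is exactly the scan the mtime shortcut was meant to avoid.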

One cool thing about making postgres 'aware' of last backup time is
that you could warn the user in various places that the database is
not being properly backed up (pg_dump would have to monitor
last_backup_time as well then). Good stuff, but I bet most people who
aren't backing up the database also aren't checking the log :-).

The block level case seems pretty much covered by the hot standby feature.

merlin

From: "Kevin Grittner" on 27 Apr 2010 11:13

Merlin Moncure <mmoncure(a)gmail.com> wrote:

> The proposal only seems a win to me if a fair percentage of the
> larger files don't change, which strikes me as a relatively low
> level case to optimize for.

That's certainly a situation we face, with a relatively slow WAN in
the middle.

http://archives.postgresql.org/pgsql-admin/2009-07/msg00071.php

I don't know how rare or common that is.

-Kevin

From: Csaba Nagy on 27 Apr 2010 11:28

Hi all,

On Tue, 2010-04-27 at 11:07 -0400, Merlin Moncure wrote:
> The block level case seems pretty much covered by the hot standby feature.

One use case we would have is to dump only the changes since the last
backup of a single table. This table takes 30% of the DB disk space, it
is on the order of ~400GB, and it's only inserted into, never updated; then
after ~1 year the old entries are archived. There are ~10M new entries
daily in this table. If the backup were smart enough to only read
the changed blocks (in this case only for newly inserted records), it
would be a fairly big win...
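
A minimal sketch of picking out changed blocks by LSN might look like the following. It works on synthetic pages and rests on several assumptions: an 8192-byte block size, an 8-byte LSN at the very start of each page stored as two 32-bit halves, and little-endian byte order. The function names are hypothetical; this is not how any existing PostgreSQL tool does it.

```python
import struct

BLOCK_SIZE = 8192  # assumed block size for this sketch


def make_page(lsn):
    """Build a synthetic page whose first 8 bytes hold the LSN (high, low halves)."""
    header = struct.pack("<II", lsn >> 32, lsn & 0xFFFFFFFF)
    return header + bytes(BLOCK_SIZE - len(header))


def page_lsn(page):
    """Read the LSN back from the first 8 bytes of a page."""
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def blocks_changed_since(relation_bytes, backup_lsn):
    """Return the block numbers whose page LSN is newer than the last backup's LSN."""
    changed = []
    for blkno in range(len(relation_bytes) // BLOCK_SIZE):
        page = relation_bytes[blkno * BLOCK_SIZE:(blkno + 1) * BLOCK_SIZE]
        if page_lsn(page) > backup_lsn:
            changed.append(blkno)
    return changed


# Four pages: two written before the backup at LSN 200, two appended after it.
relation = b"".join(make_page(lsn) for lsn in (100, 120, 300, 310))
print(blocks_changed_since(relation, 200))  # [2, 3]
```

Note that this version still visits every page to look at its header, so it reduces what gets copied rather than what gets read; avoiding the read as well would need some separate record of which blocks were touched.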

Cheers,
Csaba.
