From: Alvaro Herrera on
Simon Riggs wrote:
>
> Thinking about allowing a backup to tell which files have changed in the
> database since last backup. This would allow an external utility to copy
> away only changed files.
>
> Now there's a few ways of doing this and many will say this is already
> possible using file access times.
>
> An explicit mechanism where Postgres could authoritatively say which
> files have changed would make many feel safer, especially when other
> databases also do this.
>
> We keep track of which files require fsync(), so we could also keep
> track of changed files using that same information.

Why file level? Seems a bit too coarse (particularly if you have large
file support enabled). Maybe we could keep block-level last change info
in a separate fork.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Tue, 2010-04-27 at 09:50 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> >
> > Thinking about allowing a backup to tell which files have changed in the
> > database since last backup. This would allow an external utility to copy
> > away only changed files.
> >
> > Now there's a few ways of doing this and many will say this is already
> > possible using file access times.
> >
> > An explicit mechanism where Postgres could authoritatively say which
> > files have changed would make many feel safer, especially when other
> > databases also do this.
> >
> > We keep track of which files require fsync(), so we could also keep
> > track of changed files using that same information.
>
> Why file level? Seems a bit too coarse (particularly if you have large
> file support enabled). Maybe we could keep block-level last change info
> in a separate fork.

Block-level information is mostly available already via the LSN on each
page; you just need to scan the file. So block-level tracking doesn't
seem useful enough to justify the extra overhead.

File-level would be sufficient for most purposes. If you wanted to go
finer grained you can then scan just the files that have changed.
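
A minimal sketch of the scan described above, not PostgreSQL code. It
assumes the default 8192-byte block size, that every page header begins
with the 8-byte pd_lsn stored as two 32-bit halves in the server's native
byte order, and that the file contents were read while nothing was
writing to them. The function name and the since_lsn argument are
illustrative only.

```python
import struct

BLOCK_SIZE = 8192  # assumption: default BLCKSZ; a custom build may differ


def changed_blocks(data, since_lsn, block_size=BLOCK_SIZE):
    """Return the block numbers whose page LSN is greater than since_lsn."""
    changed = []
    # Walk whole pages only; a trailing partial page is ignored.
    for offset in range(0, len(data) - block_size + 1, block_size):
        # pd_lsn sits at the very start of the page: high half, then low half.
        high, low = struct.unpack_from("=II", data, offset)
        lsn = (high << 32) | low
        if lsn > since_lsn:
            changed.append(offset // block_size)
    return changed
```

Every page of the file still has to be read, which is the cost being
weighed here: file-level tracking would only narrow down which files
are worth scanning this way.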

--
Simon Riggs www.2ndQuadrant.com



From: "Kevin Grittner" on
Simon Riggs <simon(a)2ndQuadrant.com> wrote:

> Thinking about allowing a backup to tell which files have changed
> in the database since last backup. This would allow an external
> utility to copy away only changed files.
>
> Now there's a few ways of doing this and many will say this is
> already possible using file access times.

Who would say otherwise? Under what circumstances would PostgreSQL
modify a file without changing the "last modified" timestamp or the
file size? If you're concerned about the converse, with daemon-
based rsync you can copy just the modified portions of a file on
which the directory information has changed. Or is this targeting
platforms which don't have rsync?
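
As a rough illustration of the timestamp-and-size check, the sketch
below records (mtime, size) for every file under a directory and reports
the paths whose metadata differs between two snapshots. It is only an
assumption about how an external utility might do this; the function
names are made up, and deleted files are not reported.

```python
import os


def snapshot(root):
    """Map each file's path relative to root to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return state


def changed_since(previous, current):
    """Return sorted paths that are new or whose mtime or size differs."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)
```

The result depends entirely on the filesystem keeping modification times
accurate, which is exactly the trust question raised in this thread.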

> An explicit mechanism where Postgres could authoritatively say
> which files have changed would make many feel safer, especially
> when other databases also do this.

Why? I must be missing something, because my feeling is that if you
can't trust your OS to cover something like this, how can you trust
any application *running* under that OS to do it?

> Is this route worthwhile?

I'm not seeing it, but I could be missing something. Can you
describe a use case where this would be beneficial?

-Kevin


From: Simon Riggs on
On Tue, 2010-04-27 at 08:59 -0500, Kevin Grittner wrote:
> > An explicit mechanism where Postgres could authoritatively say
> > which files have changed would make many feel safer, especially
> > when other databases also do this.
>
> Why? I must be missing something, because my feeling is that if you
> can't trust your OS to cover something like this, how can you trust
> any application *running* under that OS to do it?

Good questions. I'm exploring a perceived need.

I don't think people want this because they think the OS is flaky. It's
more about trusting all of the configurations of all of the filesystems
in use. An explicit mechanism would be more verifiably accurate. It
might just be about control and blame.

--
Simon Riggs www.2ndQuadrant.com



From: Florian Pflug on
On Apr 27, 2010, at 15:50 , Alvaro Herrera wrote:
> Simon Riggs wrote:
>> Thinking about allowing a backup to tell which files have changed in the
>> database since last backup. This would allow an external utility to copy
>> away only changed files.
>>
>> Now there's a few ways of doing this and many will say this is already
>> possible using file access times.
>>
>> An explicit mechanism where Postgres could authoritatively say which
>> files have changed would make many feel safer, especially when other
>> databases also do this.
>>
>> We keep track of which files require fsync(), so we could also keep
>> track of changed files using that same information.
>
> Why file level? Seems a bit too coarse (particularly if you have large
> file support enabled). Maybe we could keep block-level last change info
> in a separate fork.

Hm, but most backup solutions work per-file and not per-block, so file-level tracking probably has more use-cases than block-level tracking.

In any case, it seems that this information could easily be extracted from the WAL. The archive_command could call a simple tool that parses the WAL and tracks the latest LSN per database file or page or whatever granularity is required. This, together with the backup label of the last backup, should be enough to compute the list of changed files, I think.
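
A sketch of the bookkeeping half of that idea, under loud assumptions:
the WAL decoding itself is not shown, and the input is taken to be
(LSN, file path) pairs already produced by some hypothetical parser.
LSNs are written in the usual "hi/lo" hex form, and the start LSN is
assumed to come from the START WAL LOCATION line of the last backup's
label. All function names are illustrative.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/9000020') to an int."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def latest_lsn_per_file(records):
    """Fold (lsn_text, file_path) pairs into {file_path: highest LSN seen}."""
    latest = {}
    for lsn_text, path in records:
        lsn = parse_lsn(lsn_text)
        if lsn > latest.get(path, -1):
            latest[path] = lsn
    return latest


def files_changed_since(latest, backup_start_lsn_text):
    """Return sorted files last touched at or after the backup's start LSN."""
    start = parse_lsn(backup_start_lsn_text)
    return sorted(p for p, lsn in latest.items() if lsn >= start)
```

Keying the dictionary by (file, block number) instead of by file would
give the finer granularity with the same logic.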

best regards,
Florian Pflug

