Prev: [HACKERS] Differential backup
From: Alvaro Herrera on 27 Apr 2010 09:50

Simon Riggs wrote:
>
> Thinking about allowing a backup to tell which files have changed in the
> database since last backup. This would allow an external utility to copy
> away only changed files.
>
> Now there's a few ways of doing this and many will say this is already
> possible using file access times.
>
> An explicit mechanism where Postgres could authoritatively say which
> files have changed would make many feel safer, especially when other
> databases also do this.
>
> We keep track of which files require fsync(), so we could also keep
> track of changed files using that same information.

Why file level? Seems a bit too coarse (particularly if you have large
file support enabled). Maybe we could keep block-level last change info
in a separate fork.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 27 Apr 2010 09:55

On Tue, 2010-04-27 at 09:50 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> >
> > Thinking about allowing a backup to tell which files have changed in the
> > database since last backup. This would allow an external utility to copy
> > away only changed files.
> >
> > Now there's a few ways of doing this and many will say this is already
> > possible using file access times.
> >
> > An explicit mechanism where Postgres could authoritatively say which
> > files have changed would make many feel safer, especially when other
> > databases also do this.
> >
> > We keep track of which files require fsync(), so we could also keep
> > track of changed files using that same information.
>
> Why file level? Seems a bit too coarse (particularly if you have large
> file support enabled). Maybe we could keep block-level last change info
> in a separate fork.

Block-level is mostly available by using LSN, you just need to scan the
file. So block level seems not useful enough for the extra overhead.

File-level would be sufficient for most purposes. If you wanted to go
finer grained you can then scan just the files that have changed.

--
Simon Riggs www.2ndQuadrant.com
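The point that block-level change information is recoverable from page LSNs could be sketched roughly as below. This is only an illustration under stated assumptions, not something posted in the thread: it assumes the default 8192-byte block size, that the page LSN occupies the first 8 bytes of each page as two 32-bit unsigned integers in the server's native byte order, and that the script runs on a machine with that same byte order. The function names `changed_blocks` and `parse_lsn` are made up for the example.

```python
import struct

BLCKSZ = 8192  # assumed default block size; a given build may differ


def parse_lsn(text):
    """Convert the textual 'X/Y' hexadecimal LSN form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def changed_blocks(path, since_lsn, block_size=BLCKSZ):
    """Return block numbers in one relation file whose page LSN exceeds since_lsn."""
    changed = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(block_size)
            if len(page) < block_size:
                break  # end of file; a partial trailing block is skipped
            # Assumption: the page LSN is stored at offset 0 as two
            # native-endian 32-bit unsigned integers (high half, low half).
            xlogid, xrecoff = struct.unpack_from("=II", page, 0)
            if ((xlogid << 32) | xrecoff) > since_lsn:
                changed.append(blkno)
            blkno += 1
    return changed
```

A caller would pass the start LSN of the previous backup, for example `changed_blocks(path, parse_lsn("0/20"))`. Note that this still reads every block of the file, which is the scanning cost mentioned above; file-level tracking would only narrow down which files need that scan.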
From: "Kevin Grittner" on 27 Apr 2010 09:59

Simon Riggs <simon(a)2ndQuadrant.com> wrote:

> Thinking about allowing a backup to tell which files have changed
> in the database since last backup. This would allow an external
> utility to copy away only changed files.
>
> Now there's a few ways of doing this and many will say this is
> already possible using file access times.

Who would say otherwise? Under what circumstances would PostgreSQL
modify a file without changing the "last modified" timestamp or the
file size? If you're concerned about the converse, with daemon-based
rsync you can copy just the modified portions of a file on which the
directory information has changed. Or is this targeting platforms
which don't have rsync?

> An explicit mechanism where Postgres could authoritatively say
> which files have changed would make many feel safer, especially
> when other databases also do this.

Why? I must be missing something, because my feeling is that if you
can't trust your OS to cover something like this, how can you trust
any application *running* under that OS to do it?

> Is this route worthwhile?

I'm not seeing it, but I could be missing something. Can you
describe a use case where this would be beneficial?

-Kevin
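The timestamp-and-size approach described in this message could look roughly like the sketch below. It is a hypothetical illustration, not a tool mentioned in the thread: `snapshot` and `changed_files` are invented names, and the code relies entirely on the filesystem reporting size and modification time correctly, which is exactly the trust question being debated here.

```python
import os


def snapshot(root):
    """Map each file under root (relative path) to its (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file was removed while the walk was in progress
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(previous, current):
    """Return sorted paths that are new, or whose size or mtime differ."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)
```

The snapshot taken at the previous backup would be stored alongside it and compared with a fresh snapshot at the next one; only the paths returned by `changed_files` would then need copying.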
From: Simon Riggs on 27 Apr 2010 10:08

On Tue, 2010-04-27 at 08:59 -0500, Kevin Grittner wrote:

> > An explicit mechanism where Postgres could authoritatively say
> > which files have changed would make many feel safer, especially
> > when other databases also do this.
>
> Why? I must be missing something, because my feeling is that if you
> can't trust your OS to cover something like this, how can you trust
> any application *running* under that OS to do it?

Good questions. I'm exploring a perceived need.

I don't think people want this because they think the OS is flaky. It's
more about trusting all of the configurations of all of the filesystems
in use. An explicit mechanism would be more verifiably accurate. It
might just be about control and blame.

--
Simon Riggs www.2ndQuadrant.com
From: Florian Pflug on 27 Apr 2010 10:14

On Apr 27, 2010, at 15:50 , Alvaro Herrera wrote:
> Simon Riggs wrote:
>> Thinking about allowing a backup to tell which files have changed in the
>> database since last backup. This would allow an external utility to copy
>> away only changed files.
>>
>> Now there's a few ways of doing this and many will say this is already
>> possible using file access times.
>>
>> An explicit mechanism where Postgres could authoritatively say which
>> files have changed would make many feel safer, especially when other
>> databases also do this.
>>
>> We keep track of which files require fsync(), so we could also keep
>> track of changed files using that same information.
>
> Why file level? Seems a bit too coarse (particularly if you have large
> file support enabled). Maybe we could keep block-level last change info
> in a separate fork.

Hm, but most backup solutions work per-file and not per-block, so
file-level tracking probably has more use-cases than block-level
tracking.

In any case, it seems that this information could easily be extracted
from the WAL. The archive_command could call a simple tool that parses
the WAL and tracks the latest LSN per database file or page or whatever
granularity is required. This, together with the backup label of the
last backup, should be enough to compute the list of changed files, I
think.

best regards,
Florian Pflug
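The bookkeeping half of this idea could be sketched as follows. Parsing WAL itself is deliberately left out: the sketch assumes some hypothetical decoder has already turned each archived segment into `(lsn, file)` pairs, with the LSN as a 64-bit integer, and that the start LSN of the last backup has been read from its backup label. The names `latest_lsn_per_file` and `files_changed_since` are invented for the example.

```python
def latest_lsn_per_file(records, latest=None):
    """Fold (lsn, file) pairs into a map of file -> highest LSN seen.

    Pass the map returned by an earlier call as `latest` to keep
    accumulating across successive archived WAL segments.
    """
    latest = {} if latest is None else latest
    for lsn, relfile in records:
        if lsn > latest.get(relfile, -1):
            latest[relfile] = lsn
    return latest


def files_changed_since(latest, backup_start_lsn):
    """Return sorted files touched at or after the given backup start LSN.

    Using >= is a conservative choice made for this sketch: a file touched
    exactly at the start LSN is copied again rather than risk missing it.
    """
    return sorted(f for f, lsn in latest.items() if lsn >= backup_start_lsn)
```

A tool invoked from archive_command would call `latest_lsn_per_file` once per archived segment and persist the map; the backup utility would then call `files_changed_since` with the previous backup's start LSN to get its copy list.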