From: Hannu Krosing on 28 Apr 2010 11:48

On Tue, 2010-04-27 at 10:32 -0400, Michael Tharp wrote:
> On 04/27/2010 09:59 AM, Kevin Grittner wrote:
> > Under what circumstances would PostgreSQL modify a file without
> > changing the "last modified" timestamp or the file size?
>
> Do all OSes have sub-second precision mtimes? Because otherwise I could
> see a scenario such as this:
>
> * File is modified
> * Backup inspects and copies the file in the same second
> * File is modified again in the same second, so the mtime doesn't change
> * Backup is run again some time later and sees that the mtime has not
>   changed
>
> Even with microsecond precision this kind of scenario makes me squidgy,
> especially if some OSes decide that skipping frequent mtime updates is
> OK.

To be on the safe side you need to record the latest table data change
time _after_ the backup anyway, so it is easy to wait a few seconds to
be sure.

> Florian's point about clock changes is also very relevant. Since
> Postgres has the capability to give a better answer about what is in
> the file, it would be best to use that.
>
> -- m. tharp

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability Services, Consulting and Training
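A minimal sketch of the race described above, assuming a naive backup tool
that trusts st_mtime alone (the function and file names are illustrative,
not anything PostgreSQL ships):

    import os
    import shutil

    def copy_if_mtime_changed(src, dst, last_seen_mtime):
        """Naive differential copy: skip the file if its mtime is unchanged.

        If the file is modified twice within the filesystem's mtime
        granularity (classically one second), the second modification is
        invisible here and the backup silently goes stale.
        """
        mtime = os.stat(src).st_mtime
        if mtime == last_seen_mtime:
            return last_seen_mtime      # looks unchanged -- possibly wrongly
        shutil.copy2(src, dst)          # copy file contents and metadata
        return mtime

Recording the latest change time only after the backup has finished, and
waiting a few seconds as suggested above, sidesteps the worst of this,
since a change landing in the same timestamp quantum is still picked up by
the next run.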
From: "Kevin Grittner" on 28 Apr 2010 11:50

Hannu Krosing <hannu(a)2ndquadrant.com> wrote:

> I see the main value when doing pg_dump based backups

Ah, now that makes more sense.

-Kevin
From: Hannu Krosing on 28 Apr 2010 11:52

On Tue, 2010-04-27 at 12:14 -0400, Merlin Moncure wrote:
> On Tue, Apr 27, 2010 at 11:13 AM, Kevin Grittner
> <Kevin.Grittner(a)wicourts.gov> wrote:
> > Merlin Moncure <mmoncure(a)gmail.com> wrote:
> >
> >> The proposal only seems a win to me if a fair percentage of the
> >> larger files don't change, which strikes me as a relatively low
> >> level case to optimize for.
> >
> > That's certainly a situation we face, with a relatively slow WAN in
> > the middle.
> >
> > http://archives.postgresql.org/pgsql-admin/2009-07/msg00071.php
> >
> > I don't know how rare or common that is.
>
> hm...interesting read. pretty clever. Your archiving requirements are high.
>
> With the new stuff (HS/SR) taken into consideration, would you have
> done your DR the same way if you had to do it all over again?
>
> Part of my concern here is that manual filesystem level backups are
> going to become an increasingly arcane method of doing things as the
> HS/SR train starts leaving the station.

Actually HS/SR speaks _for_ adding explicit change dates to files, as the
modification times on the slave side will be different, and you may still
want to know when the table really was last modified.

> hm, it would be pretty neat to see some of the things you do pushed
> into logical (pg_dump) style backups...with some enhancements so that
> it can skip tables that haven't changed and are exhibited in a
> previously supplied dump. This is more complicated but maybe more
> useful for a broader audience?

Yes, I see the main value of this for pg_dump backups, as physical files
already have this in terms of file ctime/mtime/atime.

> Side question: is it impractical to backup via pg_dump a hot standby
> because of query conflict issues?
>
> merlin

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability Services, Consulting and Training
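A sketch of the pg_dump-side idea discussed above: dump only tables whose
data changed since the previous backup. The backup.table_last_dml view
queried below is hypothetical -- core PostgreSQL does not track a per-table
last-DML timestamp -- so it stands in for whatever trigger- or stats-based
tracking you put in place:

    import subprocess
    import psycopg2

    def dump_changed_tables(dbname, last_backup_ts, outdir):
        """Run pg_dump per table, only for tables changed since last_backup_ts.

        Relies on a hypothetical backup.table_last_dml view recording the
        most recent INSERT/UPDATE/DELETE timestamp for each table.
        """
        conn = psycopg2.connect(dbname=dbname)
        cur = conn.cursor()
        cur.execute(
            "SELECT schemaname, tablename FROM backup.table_last_dml "
            "WHERE last_dml > %s", (last_backup_ts,))
        for schema, table in cur.fetchall():
            subprocess.check_call([
                "pg_dump", "--format=custom",
                "--table", f"{schema}.{table}",
                "--file", f"{outdir}/{schema}.{table}.dump",
                dbname,
            ])
        conn.close()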
From: Hannu Krosing on 28 Apr 2010 11:58

On Tue, 2010-04-27 at 17:28 +0200, Csaba Nagy wrote:
> Hi all,
>
> On Tue, 2010-04-27 at 11:07 -0400, Merlin Moncure wrote:
> > The block level case seems pretty much covered by the hot standby feature.
>
> One use case we would have is to dump only the changes from the last
> backup of a single table. This table takes 30% of the DB disk space, it
> is in the order of ~400GB, and it's only inserted, never updated, then
> after ~1 year the old entries are archived. There's ~10M new entries
> daily in this table. If the backup would be smart enough to only read
> the changed blocks (in this case only for newly inserted records), it
> would be a fairly big win...

The standard trick for this kind of table is having it partitioned by
insertion date. That way you get two benefits:

1) you already know which table to back up (the latest, and maybe the one
   before that if you have just switched to a new one)

2) archiving will be fast (copy the full latest table away and then
   truncate it) instead of slow (copy the "old enough" records out, then
   delete the same records, both of which are quite slow, and you also
   need to vacuum after that, which is also slow on large tables)

This would actually be a good sample case for tracking "latest DML",
except that in this particular corner case you can arrange for it
yourself.

--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability Services, Consulting and Training
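A sketch of the backup side of the partition-per-period arrangement
described above. The parent table name and the one-child-table-per-month
naming scheme are assumptions for illustration; in 2010 the partitions
themselves would be maintained with inheritance plus CHECK constraints:

    import datetime
    import subprocess

    def backup_current_partition(dbname, outdir, parent="big_log_table"):
        """Back up only the newest insertion-date partition of an
        append-only table.

        Older partitions are immutable, so previous dumps of them stay
        valid, and archiving reduces to dumping a whole partition and
        then truncating or dropping it.
        """
        today = datetime.date.today()
        part = f"{parent}_y{today.year}m{today.month:02d}"  # e.g. big_log_table_y2010m04
        subprocess.check_call([
            "pg_dump", "--format=custom",
            "--table", part,
            "--file", f"{outdir}/{part}.dump",
            dbname,
        ])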
From: "Kevin Grittner" on 28 Apr 2010 12:28

Hannu Krosing <hannu(a)2ndquadrant.com> wrote:
> On Tue, 2010-04-27 at 17:28 +0200, Csaba Nagy wrote:
>> One use case we would have is to dump only the changes from the
>> last backup of a single table. This table takes 30% of the DB
>> disk space, it is in the order of ~400GB, and it's only inserted,
>> never updated, then after ~1 year the old entries are archived.
>> There's ~10M new entries daily in this table. If the backup would
>> be smart enough to only read the changed blocks (in this case
>> only for newly inserted records), it would be a fairly big win...

That is covered pretty effectively in PITR-style backups with the hard
link and rsync approach cited earlier in the thread. Those 1GB table
segment files which haven't changed aren't read or written, and only
those portions of the other files which have actually changed are sent
over the wire (although the entire disk file is written on the
receiving end).

> The standard trick for this kind of table is having this table
> partitioned by insertion date

That doesn't always work. In our situation the supreme court sets
records retention rules which can be quite complex, but usually key on
*final disposition* of a case rather than insertion date; that is, the
earliest date on which the data related to a case is *allowed* to be
deleted isn't known until weeks or years after insertion. Additionally,
it is the elected clerk of court in each county who determines when and
if data for that county will be purged once it has reached the minimum
retention threshold set by supreme court rules.

That's not to say that partitioning couldn't help with some backup
strategies; just that it doesn't solve all "insert-only" (with eventual
purge) use cases. One of the nicest things about PostgreSQL is the
availability of several easy and viable backup strategies, so that you
can tailor one to fit your environment.

-Kevin
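A minimal sketch of the hard-link-and-rsync base backup referred to above,
using rsync's --link-dest option. Host and paths are illustrative, and the
pg_start_backup()/pg_stop_backup() bracketing that a real base backup needs
is omitted:

    import subprocess

    def rsync_base_backup(remote_host, previous_backup, new_backup):
        """Pull a base backup, hard-linking files unchanged since the last one.

        --link-dest makes rsync hard-link files identical to the previous
        backup instead of copying them, so unchanged relation segments cost
        no extra disk space; for changed files only the differing portions
        are sent over the wire, though the whole file is rewritten locally.
        """
        subprocess.check_call([
            "rsync", "-a", "--delete",
            f"--link-dest={previous_backup}",
            f"{remote_host}:/var/lib/pgsql/data/",
            new_backup,
        ])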