From: Joachim Wieland on 10 Feb 2010 05:36

Hi Markus,

On Fri, Feb 5, 2010 at 6:29 PM, Markus Wanner <markus(a)bluegap.ch> wrote:
> So, let's first concentrate on the intended use case: allowing parallel
> pg_dump. To me it seems like a pragmatic and quick solution; however, I'm
> not sure if requiring superuser privileges is acceptable.

http://www.postgresql.org/docs/8.4/static/backup-dump.html already states
about pg_dump: "In particular, it must have read access to all tables that
you want to back up, so in practice you almost always have to run it as a
database superuser." So I think there is no big loss here...

> Reading the code, I'm missing the part that actually acquires the snapshot
> for the transaction(s). After setting up multiple transactions with
> pg_synchronize_snapshot and pg_synchronize_snapshot_taken, they still don't
> have a snapshot, do they?

They more or less get it "by chance" :-) They acquire a snapshot when they
call pg_synchronize_snapshot_taken(), and if all the backends do it while
the other backend holds the lock in shared mode, we know that the snapshot
won't change, so they all get the same snapshot.

> Also, you should probably ensure the calling transactions don't have a
> snapshot already (let alone a transaction id).

True...

> In a similar vein, and answering your question in a comment: yes, I'd say
> you want to ensure your transactions are in SERIALIZABLE isolation mode.
> There's no other isolation level for which that kind of snapshot
> serialization makes sense, is there?

That's probably true, but I didn't want to enforce it in the first place.
As said, all backends just "happen" to get the same snapshot, but they are
still independent of each other, so they are free to do whatever they want
in their transactions.

> Using the exposed functions in a more general sense, I think it's important
> to note that the patch only intends to synchronize snapshots at the start of
> the transaction, not contiguously. Thus, normal transaction isolation
> applies for concurrent writes and each of the transactions can commit or
> roll back independently.
>
> The timeout is nice, but is it really required? Isn't the normal query
> cancellation infrastructure sufficient?

It seemed more robust and convenient to have an expiration in the backend
itself. What would happen if you called pg_synchronize_snapshots() and
right after that your network connection dropped? Without the server
noticing, it would continue to hold the lock and you could not log in
anymore...

But you are right: the proposed feature is a pragmatic and quick solution
for pg_dump and similar tools, but we might want to have a more general
snapshot cloning procedure instead. Not having a delay for other
activities at all and not requiring superuser privileges would be a big
advantage over what I have proposed.

Joachim
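A minimal sketch of the mechanism Joachim describes, using only core lwlock
and snapshot-manager calls; this is illustrative and is not the patch's
actual code:

#include "postgres.h"
#include "storage/lwlock.h"
#include "utils/snapmgr.h"

/*
 * While any backend holds ProcArrayLock in shared mode, no transaction
 * can complete its commit or abort (transaction end takes the lock
 * exclusively), so every snapshot taken inside that window sees the
 * same set of in-progress transactions.
 */
static void
take_snapshot_in_window(void)
{
    LWLockAcquire(ProcArrayLock, LW_SHARED);

    /*
     * Every backend doing this while the lock is held gets an
     * effectively identical snapshot.
     */
    (void) GetTransactionSnapshot();

    LWLockRelease(ProcArrayLock);
}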
From: Markus Wanner on 10 Feb 2010 13:05

Hi Joachim,

On Wed, 10 Feb 2010 11:36:41 +0100, Joachim Wieland <joe(a)mcknight.de> wrote:
> http://www.postgresql.org/docs/8.4/static/backup-dump.html already
> states about pg_dump: "In particular, it must have read access to all
> tables that you want to back up, so in practice you almost always have
> to run it as a database superuser." So I think there is no big loss
> here...

Hm.. I somewhat doubt that's common practice. After all, read access to
all tables is still a *lot* less than superuser privileges. But yeah, the
documentation currently states that.

> They more or less get it "by chance" :-) They acquire a snapshot when
> they call pg_synchronize_snapshot_taken()

Oh, I see, calling the function by itself already acquires a snapshot.
Even in the case of a fast-path call, it seems. Then your approach is
correct. (I'd still feel more comfortable if I had seen a
GetTransactionSnapshot() or something akin to it in there.)

> and if all the backends do
> it while the other backend holds the lock in shared mode, we know that
> the snapshot won't change, so they all get the same snapshot.

Agreed, that works. (Ab)using the ProcArrayLock for synchronization is
probably acceptable for pg_dump; however, I'd rather take another approach
for a more general implementation.

>> Also, you should probably ensure the calling transactions don't have a
>> snapshot already (let alone a transaction id).
>
> True...

Hm.. realizing that a function call per se acquires a snapshot, I fail to
see how we could check whether the transaction already had a snapshot
before the call. Consider the following (admittedly stupid) example:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT version();

... time goes by ...

SELECT pg_synchronize_snapshot_taken(..);

As it stands, your function would silently fail to "synchronize" the
snapshots if other transactions committed between the two function calls.

> It seemed more robust and convenient to have an expiration in the
> backend itself. What would happen if you called
> pg_synchronize_snapshots() and right after that your network
> connection dropped? Without the server noticing, it would continue to
> hold the lock and you could not log in anymore...

Hm.. that's a point. Given this approach uses the ProcArrayLock, it's
probably better to use an explicit timeout.

> But you are right: the proposed feature is a pragmatic and quick
> solution for pg_dump and similar tools, but we might want to have a
> more general snapshot cloning procedure instead. Not having a delay for
> other activities at all and not requiring superuser privileges would
> be a big advantage over what I have proposed.

Agreed.

Regards

Markus Wanner
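The transaction-id half of that check would at least be straightforward;
it is the snapshot half that the call's own snapshot acquisition defeats.
A hypothetical guard, not taken from the patch, might look like:

#include "postgres.h"
#include "access/transam.h"
#include "access/xact.h"

/*
 * Hypothetical precondition check: reject callers whose transaction has
 * already been assigned an xid (i.e. has already done work), and enforce
 * SERIALIZABLE isolation as suggested earlier in the thread.
 */
static void
check_sync_preconditions(void)
{
    if (TransactionIdIsValid(GetTopTransactionIdIfAny()))
        ereport(ERROR,
                (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
                 errmsg("cannot synchronize snapshots: transaction already has an xid")));

    if (!IsXactIsoLevelSerializable)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("snapshot synchronization requires SERIALIZABLE isolation")));
}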
From: Heikki Linnakangas on 10 Feb 2010 13:15

Markus Wanner wrote:
> On Wed, 10 Feb 2010 11:36:41 +0100, Joachim Wieland <joe(a)mcknight.de>
> wrote:
>> http://www.postgresql.org/docs/8.4/static/backup-dump.html already
>> states about pg_dump: "In particular, it must have read access to all
>> tables that you want to back up, so in practice you almost always have
>> to run it as a database superuser." So I think there is no big loss
>> here...
>
> Hm.. I somewhat doubt that's common practice. After all, read access to
> all tables is still a *lot* less than superuser privileges. But yeah,
> the documentation currently states that.

I think running as database owner gets you pretty far as far as pg_dump
goes. It would be good to lift the limitation that you have to be
superuser.

>> But you are right: the proposed feature is a pragmatic and quick
>> solution for pg_dump and similar tools, but we might want to have a
>> more general snapshot cloning procedure instead. Not having a delay
>> for other activities at all and not requiring superuser privileges
>> would be a big advantage over what I have proposed.
>
> Agreed.

Yeah, a big advantage of the proposed approach is that it's pretty simple
to implement as an external module, allowing you to write scripts using it
for older versions too.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
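For illustration, the external-module packaging Heikki mentions needs
little beyond the standard version-1 function boilerplate; the function
name here is made up, and the body just repeats the lock-and-snapshot idea
sketched earlier:

#include "postgres.h"
#include "fmgr.h"
#include "storage/lwlock.h"
#include "utils/snapmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sync_snapshot_taken);

/*
 * Exposed to SQL with something like:
 *   CREATE FUNCTION sync_snapshot_taken() RETURNS bool
 *       AS 'MODULE_PATHNAME' LANGUAGE C;
 */
Datum
sync_snapshot_taken(PG_FUNCTION_ARGS)
{
    LWLockAcquire(ProcArrayLock, LW_SHARED);
    (void) GetTransactionSnapshot();
    LWLockRelease(ProcArrayLock);

    PG_RETURN_BOOL(true);
}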