Prev: [HACKERS] including PID or backend ID in relpath of temp rels
Next: [HACKERS] improve plpgsql's EXECUTE 'select into' message with a hint
From: Jaime Casanova on 25 Apr 2010 22:19 On Sun, Apr 25, 2010 at 8:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > > 1. We could move the responsibility for removing the files associated > with temp rels from the background writer to the owning backend.  I > think the reason why we initially truncate the files and only later > remove them is because somebody else might have 'em open, so it > mightn't be necessary for temp rels. > what happens if the backend crash and obviously doesn't remove the file associated with temp rels? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL AsesorÃa y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 26 Apr 2010 06:17 On Sun, Apr 25, 2010 at 10:19 PM, Jaime Casanova <jcasanov(a)systemguards.com.ec> wrote: > On Sun, Apr 25, 2010 at 8:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: >> >> 1. We could move the responsibility for removing the files associated >> with temp rels from the background writer to the owning backend. I >> think the reason why we initially truncate the files and only later >> remove them is because somebody else might have 'em open, so it >> mightn't be necessary for temp rels. >> > > what happens if the backend crash and obviously doesn't remove the > file associated with temp rels? Currently, they just get orphaned. As I understand it, if the catalog entry survives the crash, autovacuum will remove them 2 BILLION transactions later (and emit warning messages in the meantime); otherwise we won't even know they're there. As I further understand it, the main point of this change is that if temporary tables have a distinctive name of some kind, then when we can run through the directory and blow away files with those names without fearing that it's *permanent* table data that somehow got orphaned. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 27 Apr 2010 21:59 On Sun, Apr 25, 2010 at 9:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > 4. We could add an additional 32-bit value to RelFileNode to identify > the backend (or a sentinel value when not temp) and create a separate > structure XLogRelFileNode or PermRelFileNode or somesuch for use in > contexts where no temp rels are allowed. I experimented with this approach and created LocalRelFileNode and GlobalRelFileNode and, for use in the buffer headers, BufferRelFileNode (same as GlobalRelFileNode, but named differently for clarity). LocallRelFileNode = GlobalRelFileNode + the ID of the owning backend for temp rels; or InvalidBackendId if referencing a non-temporary rel. These might not be the greatest names, but I think the concept is good, because it really breaks the things that need to be adjusted quite thoroughly. In the course of repairing the damage I came across a couple of things I wasn't sure about: [relcache.c] RelationInitPhysicalAddr can't initialize relation->rd_node.backend properly for a non-local temporary relation, because that information isn't available. But I'm not clear on why we would need to create a relcache entry for a non-local temporary relation. If we do need to, then we'll probably need to store the backend ID in pg_class. That seems like something that would be best avoided, all things being equal, especially since I can't see how to generalize it to global temporary tables. [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary relations? I think the only backend that can have an smgr reference to a temprel other than the owning backend is bgwriter, and AFAICS bgwriter will only have such a reference if it's responding to a request by the owning backend to unlink the associated files, in which case (I think) the owning backend will have no reference. [dbsize.c] As with relcache.c, there's a problem if we're asked for the size of a temporary relation that is not our own: we can't call relpath() without knowing the ID of the owning backend, and there's no way to acquire that information for pg_class. I guess we could just refuse to answer the question in that case, but that doesn't seem real cool. Or we could physically scan the directory for files that match a suitably constructed wildcard, I suppose. [storage.c,xact.c,twophase.c] smgrGetPendingDeletes returns via an out parameter (its second argument) a list of RelFileNodes pending delete, which we then write to WAL or to the two-phase state file. Of course, if the backend ID (or pid, but I picked backend ID somewhat arbitrarily) is part of the filename, then we need to write that to WAL, too. It seems somewhat unfortunate to have to WAL-log temprels here; as best I can tell, this is the only case where it's necessary. But if we implement a more general mechanism for cleaning up temp files, then might the need to do this go away? Not sure. [syncscan.c] It seems we pursue this optimization even for temprels; I can't think of why that would be useful in practice. If it's useless overhead, should we skip it? This is really independent of this project; just a side thought. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Alvaro Herrera on 4 May 2010 14:06 Robert Haas escribi�: > [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary > relations? I think the only backend that can have an smgr reference > to a temprel other than the owning backend is bgwriter, and AFAICS > bgwriter will only have such a reference if it's responding to a > request by the owning backend to unlink the associated files, in which > case (I think) the owning backend will have no reference. Hmm, wasn't there a proposal to have the owning backend delete the files instead of asking the bgwriter to? > [dbsize.c] As with relcache.c, there's a problem if we're asked for > the size of a temporary relation that is not our own: we can't call > relpath() without knowing the ID of the owning backend, and there's no > way to acquire that information for pg_class. I guess we could just > refuse to answer the question in that case, but that doesn't seem real > cool. Or we could physically scan the directory for files that match > a suitably constructed wildcard, I suppose. I don't very much like the wildcard idea; but I don't think it's unreasonable to refuse to provide a file size. If the owning backend has still got part of the table in local buffers, you'll get a misleading answer, so perhaps it's best to not give an answer at all. Maybe this problem could be solved if we could somehow force that backend to write down its local buffers, in which case it'd be nice to have a solution to the dbsize problem. > [syncscan.c] It seems we pursue this optimization even for temprels; I > can't think of why that would be useful in practice. If it's useless > overhead, should we skip it? This is really independent of this > project; just a side thought. Maybe recently used buffers are more likely to be in the OS page cache, so perhaps it's not good to disable it. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 4 May 2010 14:24
On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera <alvherre(a)commandprompt.com> wrote: > Robert Haas escribió: Hey, thanks for writing back! I just spent the last few hours thinking about this and beating my head against the wall. >> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary >> relations? I think the only backend that can have an smgr reference >> to a temprel other than the owning backend is bgwriter, and AFAICS >> bgwriter will only have such a reference if it's responding to a >> request by the owning backend to unlink the associated files, in which >> case (I think) the owning backend will have no reference. > > Hmm, wasn't there a proposal to have the owning backend delete the files > instead of asking the bgwriter to? I did propose that upthread; it may have been proposed previously also. This might be worth doing independently of the rest of the patch (which I'm starting to fear is doomed, cue ominous soundtrack) since it would reduce the chance of orphaning data files and possibly simplify the logic also. >> [dbsize.c] As with relcache.c, there's a problem if we're asked for >> the size of a temporary relation that is not our own: we can't call >> relpath() without knowing the ID of the owning backend, and there's no >> way to acquire that information for pg_class. I guess we could just >> refuse to answer the question in that case, but that doesn't seem real >> cool. Or we could physically scan the directory for files that match >> a suitably constructed wildcard, I suppose. > > I don't very much like the wildcard idea; but I don't think it's > unreasonable to refuse to provide a file size. If the owning backend > has still got part of the table in local buffers, you'll get a > misleading answer, so perhaps it's best to not give an answer at all. > > Maybe this problem could be solved if we could somehow force that > backend to write down its local buffers, in which case it'd be nice to > have a solution to the dbsize problem. I'm sure we could add some kind of signaling mechanism that would tell all backends to flush their local buffers, but I'm not too sure it would help this case very much, because you likely wouldn't want to wait for all the backends to complete that process before reporting results. >> [syncscan.c] It seems we pursue this optimization even for temprels; I >> can't think of why that would be useful in practice. If it's useless >> overhead, should we skip it? This is really independent of this >> project; just a side thought. > > Maybe recently used buffers are more likely to be in the OS page cache, > so perhaps it's not good to disable it. I don't get it. If the whole relation fits in the page cache, it doesn't much matter where you start a seqscan. If it doesn't, starting where the last one ended is anti-optimal. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |