From: Jaime Casanova on
On Sun, Apr 25, 2010 at 8:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>
> 1. We could move the responsibility for removing the files associated
> with temp rels from the background writer to the owning backend.  I
> think the reason why we initially truncate the files and only later
> remove them is because somebody else might have 'em open, so it
> mightn't be necessary for temp rels.
>

what happens if the backend crash and obviously doesn't remove the
file associated with temp rels?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sun, Apr 25, 2010 at 10:19 PM, Jaime Casanova
<jcasanov(a)systemguards.com.ec> wrote:
> On Sun, Apr 25, 2010 at 8:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>>
>> 1. We could move the responsibility for removing the files associated
>> with temp rels from the background writer to the owning backend.  I
>> think the reason why we initially truncate the files and only later
>> remove them is because somebody else might have 'em open, so it
>> mightn't be necessary for temp rels.
>>
>
> what happens if the backend crash and obviously doesn't remove the
> file associated with temp rels?

Currently, they just get orphaned. As I understand it, if the catalog
entry survives the crash, autovacuum will remove them 2 BILLION
transactions later (and emit warning messages in the meantime);
otherwise we won't even know they're there.

As I further understand it, the main point of this change is that if
temporary tables have a distinctive name of some kind, then when we
can run through the directory and blow away files with those names
without fearing that it's *permanent* table data that somehow got
orphaned.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sun, Apr 25, 2010 at 9:07 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> 4. We could add an additional 32-bit value to RelFileNode to identify
> the backend (or a sentinel value when not temp) and create a separate
> structure XLogRelFileNode or PermRelFileNode or somesuch for use in
> contexts where no temp rels are allowed.

I experimented with this approach and created LocalRelFileNode and
GlobalRelFileNode and, for use in the buffer headers,
BufferRelFileNode (same as GlobalRelFileNode, but named differently
for clarity). LocallRelFileNode = GlobalRelFileNode + the ID of the
owning backend for temp rels; or InvalidBackendId if referencing a
non-temporary rel. These might not be the greatest names, but I think
the concept is good, because it really breaks the things that need to
be adjusted quite thoroughly. In the course of repairing the damage I
came across a couple of things I wasn't sure about:

[relcache.c] RelationInitPhysicalAddr can't initialize
relation->rd_node.backend properly for a non-local temporary relation,
because that information isn't available. But I'm not clear on why we
would need to create a relcache entry for a non-local temporary
relation. If we do need to, then we'll probably need to store the
backend ID in pg_class. That seems like something that would be best
avoided, all things being equal, especially since I can't see how to
generalize it to global temporary tables.

[smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
relations? I think the only backend that can have an smgr reference
to a temprel other than the owning backend is bgwriter, and AFAICS
bgwriter will only have such a reference if it's responding to a
request by the owning backend to unlink the associated files, in which
case (I think) the owning backend will have no reference.

[dbsize.c] As with relcache.c, there's a problem if we're asked for
the size of a temporary relation that is not our own: we can't call
relpath() without knowing the ID of the owning backend, and there's no
way to acquire that information for pg_class. I guess we could just
refuse to answer the question in that case, but that doesn't seem real
cool. Or we could physically scan the directory for files that match
a suitably constructed wildcard, I suppose.

[storage.c,xact.c,twophase.c] smgrGetPendingDeletes returns via an out
parameter (its second argument) a list of RelFileNodes pending delete,
which we then write to WAL or to the two-phase state file. Of course,
if the backend ID (or pid, but I picked backend ID somewhat
arbitrarily) is part of the filename, then we need to write that to
WAL, too. It seems somewhat unfortunate to have to WAL-log temprels
here; as best I can tell, this is the only case where it's necessary.
But if we implement a more general mechanism for cleaning up temp
files, then might the need to do this go away? Not sure.

[syncscan.c] It seems we pursue this optimization even for temprels; I
can't think of why that would be useful in practice. If it's useless
overhead, should we skip it? This is really independent of this
project; just a side thought.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on
Robert Haas escribi�:

> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
> relations? I think the only backend that can have an smgr reference
> to a temprel other than the owning backend is bgwriter, and AFAICS
> bgwriter will only have such a reference if it's responding to a
> request by the owning backend to unlink the associated files, in which
> case (I think) the owning backend will have no reference.

Hmm, wasn't there a proposal to have the owning backend delete the files
instead of asking the bgwriter to?

> [dbsize.c] As with relcache.c, there's a problem if we're asked for
> the size of a temporary relation that is not our own: we can't call
> relpath() without knowing the ID of the owning backend, and there's no
> way to acquire that information for pg_class. I guess we could just
> refuse to answer the question in that case, but that doesn't seem real
> cool. Or we could physically scan the directory for files that match
> a suitably constructed wildcard, I suppose.

I don't very much like the wildcard idea; but I don't think it's
unreasonable to refuse to provide a file size. If the owning backend
has still got part of the table in local buffers, you'll get a
misleading answer, so perhaps it's best to not give an answer at all.

Maybe this problem could be solved if we could somehow force that
backend to write down its local buffers, in which case it'd be nice to
have a solution to the dbsize problem.

> [syncscan.c] It seems we pursue this optimization even for temprels; I
> can't think of why that would be useful in practice. If it's useless
> overhead, should we skip it? This is really independent of this
> project; just a side thought.

Maybe recently used buffers are more likely to be in the OS page cache,
so perhaps it's not good to disable it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Tue, May 4, 2010 at 2:06 PM, Alvaro Herrera
<alvherre(a)commandprompt.com> wrote:
> Robert Haas escribió:

Hey, thanks for writing back! I just spent the last few hours
thinking about this and beating my head against the wall.

>> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
>> relations?  I think the only backend that can have an smgr reference
>> to a temprel other than the owning backend is bgwriter, and AFAICS
>> bgwriter will only have such a reference if it's responding to a
>> request by the owning backend to unlink the associated files, in which
>> case (I think) the owning backend will have no reference.
>
> Hmm, wasn't there a proposal to have the owning backend delete the files
> instead of asking the bgwriter to?

I did propose that upthread; it may have been proposed previously
also. This might be worth doing independently of the rest of the patch
(which I'm starting to fear is doomed, cue ominous soundtrack) since
it would reduce the chance of orphaning data files and possibly
simplify the logic also.

>> [dbsize.c] As with relcache.c, there's a problem if we're asked for
>> the size of a temporary relation that is not our own: we can't call
>> relpath() without knowing the ID of the owning backend, and there's no
>> way to acquire that information for pg_class.  I guess we could just
>> refuse to answer the question in that case, but that doesn't seem real
>> cool.  Or we could physically scan the directory for files that match
>> a suitably constructed wildcard, I suppose.
>
> I don't very much like the wildcard idea; but I don't think it's
> unreasonable to refuse to provide a file size.  If the owning backend
> has still got part of the table in local buffers, you'll get a
> misleading answer, so perhaps it's best to not give an answer at all.
>
> Maybe this problem could be solved if we could somehow force that
> backend to write down its local buffers, in which case it'd be nice to
> have a solution to the dbsize problem.

I'm sure we could add some kind of signaling mechanism that would tell
all backends to flush their local buffers, but I'm not too sure it
would help this case very much, because you likely wouldn't want to
wait for all the backends to complete that process before reporting
results.

>> [syncscan.c] It seems we pursue this optimization even for temprels; I
>> can't think of why that would be useful in practice.  If it's useless
>> overhead, should we skip it?  This is really independent of this
>> project; just a side thought.
>
> Maybe recently used buffers are more likely to be in the OS page cache,
> so perhaps it's not good to disable it.

I don't get it. If the whole relation fits in the page cache, it
doesn't much matter where you start a seqscan. If it doesn't,
starting where the last one ended is anti-optimal.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers