From: Tom Lane
Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> In mdunlink(), we truncate the first segment of the main fork to zero
> length and actually unlink it at the next checkpoint, but the other
> segments are only unlinked, not truncated. So if another backend has
> those segments open, the disk space they occupy is not reclaimed
> until every such backend closes its file descriptors. A longer
> checkpoint timeout and connection pooling make things worse.

Truncating seems like an ugly kluge that's not fixing the real problem.
Why are there open descriptors for a dropped relation? They should all
get closed as a consequence of relcache flush.
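
The filesystem behavior at issue can be reproduced outside PostgreSQL.
The following is only a minimal sketch, assuming a POSIX system; the
function name unlink_while_open and its parameters are hypothetical and
are not taken from md.c. It unlinks a file while a second descriptor
still holds it open, optionally truncating it first, and reports what
the remaining descriptor sees.

```python
import os
import tempfile


def unlink_while_open(size=1 << 20, truncate_first=False):
    """Create a file, keep a second descriptor open on it, then drop it.

    Returns (link_count, size) as observed through the descriptor that
    is still open after the unlink. Hypothetical helper, POSIX only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        os.close(fd)

        # Stand-in for an idle backend that still has the segment open.
        holder = os.open(path, os.O_RDONLY)
        try:
            if truncate_first:
                # Release the data before removing the name.
                os.truncate(path, 0)
            os.unlink(path)
            st = os.fstat(holder)
            return st.st_nlink, st.st_size
        finally:
            os.close(holder)
    finally:
        # Only reached with the file present if something failed early.
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    print(unlink_while_open(truncate_first=False))
    print(unlink_while_open(truncate_first=True))
```

Without truncation the holder still sees the full file size after the
unlink, so the space stays allocated until it closes the descriptor.
With truncation the size seen by the holder is zero.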

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Takahiro Itagaki

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Truncating seems like an ugly kluge that's not fixing the real problem.
> Why are there open descriptors for a dropped relation? They should all
> get closed as a consequence of relcache flush.

Relcache will be flushed at the next command, but there could be some
*idle backends* kept open by connection pooling. They won't close the
dropped files until the shared cache invalidation queue is almost full,
which might take a long time.

Another possible solution is to send the PROCSIG_CATCHUP_INTERRUPT
signal not only when the queue length crosses its threshold but also
on a timeout, that is, when some messages have been sitting in the
queue for longer than 30 seconds to 1 minute.
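
The idea of two triggers can be sketched as a toy model. This is not
PostgreSQL's sinval code; the class InvalQueue, its fields, and the
threshold values are all assumptions made for illustration. A backend
is selected for a catchup signal when its backlog reaches a length
threshold or when its oldest unread message exceeds an age threshold.

```python
from dataclasses import dataclass, field


@dataclass
class InvalQueue:
    """Toy model of a shared invalidation queue (hypothetical)."""
    length_threshold: int = 4      # assumed backlog limit
    age_threshold: float = 30.0    # assumed timeout, in seconds
    messages: list = field(default_factory=list)   # (enqueue_time, payload)
    read_pos: dict = field(default_factory=dict)   # backend -> next index

    def register(self, backend):
        # A new backend starts fully caught up.
        self.read_pos[backend] = len(self.messages)

    def enqueue(self, now, payload):
        self.messages.append((now, payload))

    def catch_up(self, backend):
        # The backend has processed every queued message.
        self.read_pos[backend] = len(self.messages)

    def backends_to_signal(self, now):
        """Return the backends that should be told to catch up at `now`."""
        lagging = []
        for backend, pos in self.read_pos.items():
            backlog = len(self.messages) - pos
            if backlog == 0:
                continue
            oldest_age = now - self.messages[pos][0]
            too_long = backlog >= self.length_threshold
            too_old = oldest_age >= self.age_threshold
            if too_long or too_old:
                lagging.append(backend)
        return lagging
```

With only the length trigger, a single pending message would never
select an idle backend. The age trigger selects it once the message has
waited past the timeout.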

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From: Fujii Masao
On Tue, Jul 6, 2010 at 9:59 AM, Takahiro Itagaki
<itagaki.takahiro(a)oss.ntt.co.jp> wrote:
>
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Truncating seems like an ugly kluge that's not fixing the real problem.
>> Why are there open descriptors for a dropped relation? They should all
>> get closed as a consequence of relcache flush.
>
> Relcache will be flushed at the next command, but there could be some
> *idle backends* kept open by connection pooling. They won't close the
> dropped files until the shared cache invalidation queue is almost full,
> which might take a long time.

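Right. Since many connection poolers manage their pooled connections in
LIFO order, reusing the most recently released connection first, the
other connections can sit idle for a long time, so this problem is very
likely to happen.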
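
The effect of LIFO reuse can be sketched with a small simulation. It
assumes a pool that serves one request at a time; the function
idle_connections and its parameters are hypothetical and do not
describe any particular pooler.

```python
from collections import deque


def idle_connections(policy, pool_size=5, requests=100):
    """Serve `requests` one-at-a-time checkouts from a pool.

    Returns the set of connections that were never used.
    """
    pool = deque(range(pool_size))
    used = set()
    for _ in range(requests):
        # LIFO takes the most recently released connection,
        # FIFO takes the one that has waited longest.
        conn = pool.pop() if policy == "lifo" else pool.popleft()
        used.add(conn)
        pool.append(conn)  # release back to the pool
    return set(range(pool_size)) - used
```

Under light load the LIFO policy keeps reusing the same connection, so
every other backend in the pool stays idle and never runs a command
that would flush its relcache. The FIFO policy cycles through all of
them.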

> Another possible solution is to send the PROCSIG_CATCHUP_INTERRUPT
> signal not only when the queue length crosses its threshold but also
> on a timeout, that is, when some messages have been sitting in the
> queue for longer than 30 seconds to 1 minute.

Shouldn't REINDEX and similar commands send PROCSIG_CATCHUP_INTERRUPT
immediately?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
