From: Andrew Dunstan on


Robert Creager wrote:
> On Mar 11, 2010, at 6:00 PM, Andrew Dunstan wrote:
>
>
>> Tom Lane wrote:
>>
>>> I was looking at this recent nonrepeatable buildfarm failure:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-03-11%2021:45:10
>>> which has several instances of the known "pgstat wait timeout" problem
>>> during the parallel regression tests.
>>>
>
>
> You've got me trained well now. I'm now looking at my build machine failures. Wasn't sure what to do about that one, since no relevant files changed.
>
> Is there any value in setting "keep_error_builds => 0,"? I know Andrew was able to get the complete log file.
>
>
>

You normally want this as 0, to avoid eating up disk space. You
certainly don't want it non-zero for long unless you have many GB to spare.

I doubt keeping this particular build would have helped much. The build
was probably fine; the bug is apparently triggered by some hard-to-repeat
timing condition, from what I gather from Tom's analysis.

And from now on we will not have logs truncated by the presence of NUL
bytes.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Creager <robert(a)logicalchaos.org> writes:
> Is there any value in setting "keep_error_builds => 0,"? I know Andrew was able to get the complete log file.

The data is all uploaded to the buildfarm server, so as long as EDB
doesn't holler uncle about the amount of storage that's taking, I don't
think you need to keep 'em locally.

regards, tom lane


From: Andrew Dunstan on


Tom Lane wrote:
> Robert Creager <robert(a)logicalchaos.org> writes:
>
>> Is there any value in setting "keep_error_builds => 0,"? I know Andrew was able to get the complete log file.
>>
>
> The data is all uploaded to the buildfarm server, so as long as EDB
> doesn't holler uncle about the amount of storage that's taking, I don't
> think you need to keep 'em locally.
>
>


Er, that's CMD who host it. Credit where credit is due ;-)

One of these days they will get upset and we'll purge the back logs so
we only keep the last 6 months or a year.

cheers

andrew


From: Robert Creager on

On Mar 11, 2010, at 6:00 PM, Andrew Dunstan wrote:

>
>
> Tom Lane wrote:
>> I was looking at this recent nonrepeatable buildfarm failure:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-03-11%2021:45:10
>> which has several instances of the known "pgstat wait timeout" problem
>> during the parallel regression tests.


You've got me trained well now. I'm now looking at my build machine failures. Wasn't sure what to do about that one, since no relevant files changed.

Is there any value in setting "keep_error_builds => 0,"? I know Andrew was able to get the complete log file.

Cheers,
Rob



From: Tom Lane on
I wrote:
> Anyway it's only a guess. It could well be that that machine was simply
> so heavily loaded that the stats collector couldn't respond fast enough.
> I'm just wondering whether there's an unrecognized bug lurking here.

Still meditating on this ... and it strikes me that the pgstat.c code
is really uncommunicative about problems. In particular,
pgstat_read_statsfile_timestamp and pgstat_read_statsfile don't complain
at all about being unable to read a stats file. It seems to me that the
only "expected" case is ENOENT (and even that isn't really expected, in
normal operation). Surely we should at least elog(LOG) any other
failure condition?
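
[Illustration] The idea in the paragraph above can be sketched roughly as follows. This is not the pgstat.c code: log_msg is a hypothetical stand-in for elog(LOG), and open_stats_file and should_log_open_failure are names made up for this sketch. It only shows the rule "stay quiet for ENOENT, log everything else".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Counts log lines so the behavior is easy to observe (illustrative only). */
int log_count = 0;

/* Hypothetical stand-in for elog(LOG, ...); not the PostgreSQL API. */
void log_msg(const char *path, int err)
{
    log_count++;
    fprintf(stderr, "LOG: could not open statistics file \"%s\": %s\n",
            path, strerror(err));
}

/* Only a missing file (ENOENT) is treated as expected and left unlogged. */
int should_log_open_failure(int err)
{
    return err != ENOENT;
}

/* Opens the file for reading; returns NULL on failure, logging unless ENOENT. */
FILE *open_stats_file(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        int saved_errno = errno;    /* save before any later call can clobber it */

        if (should_log_open_failure(saved_errno))
            log_msg(path, saved_errno);
    }
    return fp;
}
```

With this shape, a permissions problem or an I/O error would leave a trace in the log, while the ordinary "file not there yet" case stays silent.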

Another place that could probably do with elog(LOG) is where
pgstat_write_statsfile resets last_statrequest in case it's in the
future. That shouldn't ever happen. While the reset is probably
a good thing for robustness, wouldn't logging it be a good idea?
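
[Illustration] A minimal sketch of "reset, but say so", under stated assumptions: timestamps are plain int64_t values rather than PostgreSQL's timestamp type, the request time is clamped to the write time, and clamp_future_request is a hypothetical helper, not a function in pgstat.c. The fprintf call stands in for elog(LOG).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * If the recorded request time is later than the time of the write being
 * performed, log the anomaly and clamp it. Returns 1 if a reset happened,
 * 0 otherwise. (Hypothetical helper for illustration.)
 */
int clamp_future_request(int64_t *last_statrequest, int64_t write_time)
{
    if (*last_statrequest > write_time)
    {
        /* Stand-in for elog(LOG, ...): make the "can't happen" case visible. */
        fprintf(stderr,
                "LOG: last_statrequest %" PRId64
                " is later than write time %" PRId64 "; resetting\n",
                *last_statrequest, write_time);
        *last_statrequest = write_time;
        return 1;
    }
    return 0;
}
```

The reset keeps the robustness benefit, and the log line would show how often the supposedly impossible case actually occurs.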

Lastly, backend_read_statsfile is designed to send an inquiry message
every time through the loop, i.e., every 10 msec. This is said to be in
case the stats collector drops one. But is this enough to flood the
collector and make things worse? I wonder if there should be some
backoff there.
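
[Illustration] One possible shape for such a backoff, as a sketch only: keep polling on every pass, but resend the inquiry only on passes 0, 1, 2, 4, 8, and so on. The doubling schedule, the function names, and the loop count of 500 used in the note below are assumptions for illustration, not taken from backend_read_statsfile.

```c
/*
 * Decide whether to resend the inquiry on a given pass through the
 * polling loop. Sends on pass 0 and on every power of two after that,
 * so the gap between inquiries doubles each time.
 * (Hypothetical helper for illustration.)
 */
int should_send_inquiry(unsigned int pass)
{
    /* pass & (pass - 1) is zero exactly when pass is 0 or a power of two */
    return (pass & (pass - 1)) == 0;
}

/* Count how many inquiries would be sent over a whole polling loop. */
int count_inquiries(unsigned int loop_count)
{
    int sent = 0;
    unsigned int pass;

    for (pass = 0; pass < loop_count; pass++)
    {
        if (should_send_inquiry(pass))
            sent++;
    }
    return sent;
}
```

Under these assumptions, a loop of 500 passes would send 10 inquiries instead of 500, while a dropped early message is still retried within a few passes.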

regards, tom lane
