From: Andrew Dunstan on


Tom Lane wrote:
> I was looking at this recent nonrepeatable buildfarm failure:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-03-11%2021:45:10
> which has several instances of the known "pgstat wait timeout" problem
> during the parallel regression tests.
>
> I was disappointed to see that the postmaster log part of the report
> is truncated near the start of the run, so there's no way to see if
> anything interesting got logged near the point of the failure.
>
> When I run "make check" on my own OS X machine, I notice that the
> postmaster.log file usually has some runs of a few dozen null bytes in
> it. I suspect this is an artifact of Apple's stdio buffering
> implementation when several backends are writing to the same log file.
> I suppose that what happened in the above case is that some nulls got
> embedded in postmaster.log, and then the file got truncated at the first
> null, perhaps during the upload to the buildfarm server, or maybe it's
> intact on the server but the web page is failing to display anything
> past that point.
>
> There's probably not much we can do about Apple's stdio (and I would bet
> that they inherited this behavior from the BSDen anyway). What we
> *could* do is
>
> (1) encourage buildfarm owners to run the tests with logging_collector
> turned on, and/or
>
> (2) fix the buildfarm reporting mechanisms to not be fazed by nulls in
> the log files.
>
> I have no clear idea how hard either of these things is.
>
>
>

Well, the good news is that we actually have the data on the server, in
a temp file that will be cleaned up, but hasn't been yet. I have placed
a copy at <http://buildfarm.postgresql.org/polecat-check.log.gz>. And
thus we know that the client does exactly what you want, and the problem
is entirely on the server. That's more good news.

Now, the log_text field in our build_status_log table is text, so it's
on insertion to the database that it gets truncated. I'm thinking that I
should just escape nulls with a regex ( 's/\x00/\\0/g' or similar)
before inserting the data.

Anyone got any better ideas?

(BTW, your idea of using logging_collector won't work without
significant changes in the buildfarm client. Nice idea, though.)

cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Andrew Dunstan <andrew(a)dunslane.net> writes:
> Tom Lane wrote:
>> I was looking at this recent nonrepeatable buildfarm failure:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-03-11%2021:45:10

> Well, the good news is that we actually have the data on the server, in
> a temp file that will be cleaned up, but hasn't been yet. I have placed
> a copy at <http://buildfarm.postgresql.org/polecat-check.log.gz>.

Thanks for that. I found the relevant part of the postmaster log:

[4b99671f.ddb4:36] LOG: statement: DELETE FROM vactst WHERE i != 0;
[4b99671f.ddb4:37] LOG: statement: VACUUM (FULL) vactst;
[4b99671f.ddb4:38] WARNING: pgstat wait timeout
[4b99671f.ddb4:39] LOG: statement: DELETE FROM vactst;
[4b99671f.ddb4:40] LOG: statement: SELECT * FROM vactst;
[4b99671f.ddb4:41] LOG: statement: VACUUM (FULL, FREEZE) vactst;
[4b99671f.ddb4:42] WARNING: pgstat wait timeout
[4b99671f.ddb4:43] LOG: statement: VACUUM (ANALYZE, FULL) vactst;
[4b99671f.ddb4:44] WARNING: pgstat wait timeout
[4b99671f.ddb4:45] LOG: statement: CREATE TABLE vaccluster (i INT PRIMARY KEY);
[4b99671f.ddb4:46] LOG: statement: ALTER TABLE vaccluster CLUSTER ON vaccluster_pkey;
[4b99671f.ddb4:47] LOG: statement: INSERT INTO vaccluster SELECT * FROM vactst;
[4b99671f.ddb4:48] LOG: statement: CLUSTER vaccluster;
[4b99671f.ddb4:49] LOG: statement: VACUUM FULL pg_am;
[4b99671f.ddb4:50] WARNING: pgstat wait timeout
[4b99671f.ddb4:51] LOG: statement: VACUUM FULL pg_class;
[4b99671f.ddb4:52] WARNING: pgstat wait timeout
[4b99671f.ddb4:53] LOG: statement: VACUUM FULL pg_database;
[4b99671f.ddb4:54] WARNING: pgstat wait timeout
[4b99671f.ddb4:55] LOG: statement: VACUUM FULL vaccluster;
[4b996707.dcc0:2] WARNING: pgstat wait timeout
[4b99671f.ddb4:56] WARNING: pgstat wait timeout
[4b99671f.ddb4:57] LOG: statement: VACUUM FULL vactst;
[4b996707.dcc0:3] WARNING: pgstat wait timeout
[4b996747.ddcc:1] WARNING: pgstat wait timeout
[4b99671f.ddba:4] LOG: statement: INSERT INTO DEFAULT_TBL VALUES (1, 'thomas', 57.0613);

Most of the "pgstat wait timeout" gripes are coming from the backend
running the "vacuum" regression test, but there are two that came from
process dcc0, which is shown by other log entries to be the autovacuum
launcher. So now I'm wondering if there could be some issue that occurs
when the autovac launcher and a backend both want stats concurrently.
I have not got further than that in my uncaffeinated state --- anybody
have any ideas?

> Now, the log_text field in our build_status_log table is text, so it's
> on insertion to the database that it gets truncated. I'm thinking that I
> should just escape nulls with a regex ( 's/\x00/\\0/g' or similar)
> before inserting the data.

Works for me.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on
Tom Lane wrote:

> Most of the "pgstat wait timeout" gripes are coming from the backend
> running the "vacuum" regression test, but there are two that came from
> process dcc0, which is shown by other log entries to be the autovacuum
> launcher. So now I'm wondering if there could be some issue that occurs
> when the autovac launcher and a backend both want stats concurrently.

Seems like this issue would exist whenever two backends want stats
concurrently, no? Is this well-tested?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Dunstan on


Tom Lane wrote:
>> Now, the log_text field in our build_status_log table is text, so it's
>> on insertion to the database that it gets truncated. I'm thinking that I
>> should just escape nulls with a regex ( 's/\x00/\\0/g' or similar)
>> before inserting the data.
>>
>
> Works for me.
>
>
>

Done.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Alvaro Herrera <alvherre(a)commandprompt.com> writes:
> Tom Lane wrote:
>> Most of the "pgstat wait timeout" gripes are coming from the backend
>> running the "vacuum" regression test, but there are two that came from
>> process dcc0, which is shown by other log entries to be the autovacuum
>> launcher. So now I'm wondering if there could be some issue that occurs
>> when the autovac launcher and a backend both want stats concurrently.

> Seems like this issue would exist whenever two backends want stats
> concurrently, no? Is this well-tested?

Well, you tell me how much it's been tested ;-). But the autovac
launcher could possibly interact differently than another backend,
since it has a different value for maximum stats file age.

Anyway it's only a guess. It could well be that that machine was simply
so heavily loaded that the stats collector couldn't respond fast enough.
I'm just wondering whether there's an unrecognized bug lurking here.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers