From: Heikki Linnakangas on
In testing, it looks like there's still something wrong with the
subtransaction handling. I created a test function to create a large
number of subtransactions:

CREATE LANGUAGE plpgsql;
CREATE TABLE bar (id int4);

CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
plpgsql AS $$
BEGIN
IF n <= 0 THEN RETURN; END IF;
INSERT INTO bar VALUES (n);
PERFORM subxids(n - 1);
RETURN;
EXCEPTION WHEN raise_exception THEN NULL; END;
$$;

And used that to created 100 nested subtransactions in the primary server:

SELECT subxids(100);


I got this in the standby:

....
LOG: REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
2659 2660 2661 2662 0
LOG: extend subtrans xid 2600 page 1 last_page 0
CONTEXT: rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
2658 2659 2660 2661 2662 0
TRAP: FailedAssertion("!(((newestXact) >= ((TransactionId) 3)))", File:
"subtrans.c", Line: 299)
LOG: startup process (PID 28594) was terminated by signal 6: Aborted
LOG: terminating any other active server processes

Apparently an InvalidXid is sneaking into the unreported xid array
collected in the primary.

I actually bumped into this while I was testing a simpler implementation
of the logic to collect the unreported xids in the primary. As the patch
stands, we keep track of how many of the childXids at each
subtransaction level have already been reported, but it occurs to me
that it would be a lot simpler to populate a separate array of
unreported xids on-the-fly, as xids are assigned. With the simpler
implementation I bumped into another issue (see below), which is why I
wanted to test if it worked without the simplification.

So with the simpler logic, I had this problem. When I do this in the
primary:

postgres=# SELECT subxids(10000);
ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth", after
ensuring the platform's stack depth limit is adequate.
CONTEXT: SQL statement "SELECT subxids( $1 - 1)"
PL/pgSQL function "subxids" line 4 at PERFORM
SQL statement "SELECT subxids( $1 - 1)"
....

I get this in the standby:

....
LOG: REDO @ 0/1000B6C4; LSN 0/1000B6F0: prev 0/1000B698; xid 4325; len
16: Transaction - abort: 2009-09-22 12:45:00.938243+03
LOG: record known xact 4325 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: remove KnownAssignedXid 4325
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: REDO @ 0/1000B6F0; LSN 0/1000B71C: prev 0/1000B6C4; xid 4324; len
16: Transaction - abort: 2009-09-22 12:45:00.938438+03
LOG: record known xact 4324 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: remove KnownAssignedXid 4324
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: REDO @ 0/1000B71C; LSN 0/1000B748: prev 0/1000B6F0; xid 4323; len
16: Transaction - abort: 2009-09-22 12:45:00.938632+03
LOG: record known xact 4323 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: 1 KnownAssignedXids 3619
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
FATAL: cannot remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: startup process (PID 31284) exited with exit code 1
LOG: terminating any other active server processes

It looks like the standby tries to remove XID 4323 from the
known-assigned hash table, but it's not there because it was removed and
set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I guess we
should just not throw an error in that case, but is there a way we could
catch that narrow case and still keep the check in
KnownAssignedXidsRemove()? It seems like the check might help catch
other bugs, so I'd rather keep it if possible.

I've pushed the simplified code to my git repository if you want to take
a look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

> In testing, it looks like there's still something wrong with the
> subtransaction handling. I created a test function to create a large
> number of subtransactions:

OK, looking at this now. Thanks for the report.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

> It looks like the standby tries to remove XID 4323 from the
> known-assigned hash table, but it's not there because it was removed
> and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
> guess we should just not throw an error in that case, but is there a
> way we could catch that narrow case and still keep the check in
> KnownAssignedXidsRemove()? It seems like the check might help catch
> other bugs, so I'd rather keep it if possible.

Yes, a certain test is important and I'm glad we've (almost) achieved
it.

ISTM we can say if not in KnownAssignedXids then check pg_subtrans. If
not in either then report error. May need to add info to the abort
record to check xtop also.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
> In testing, it looks like there's still something wrong with the
> subtransaction handling. I created a test function to create a large
> number of subtransactions:
>
> CREATE LANGUAGE plpgsql;
> CREATE TABLE bar (id int4);
>
> CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
> plpgsql AS $$
> BEGIN
> IF n <= 0 THEN RETURN; END IF;
> INSERT INTO bar VALUES (n);
> PERFORM subxids(n - 1);
> RETURN;
> EXCEPTION WHEN raise_exception THEN NULL; END;
> $$;
>
> And used that to created 100 nested subtransactions in the primary server:
>
> SELECT subxids(100);
>
>
> I got this in the standby:
>
> ...
> LOG: REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
> 264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
> 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
> 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
> 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
> 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
> 2659 2660 2661 2662 0
> LOG: extend subtrans xid 2600 page 1 last_page 0
> CONTEXT: rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
> 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
> 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
> 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
> 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
> 2658 2659 2660 2661 2662 0
> TRAP: FailedAssertion("!(((newestXact) >= ((TransactionId) 3)))", File:
> "subtrans.c", Line: 299)
> LOG: startup process (PID 28594) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
>
> Apparently an InvalidXid is sneaking into the unreported xid array
> collected in the primary.
>
> I actually bumped into this while I was testing a simpler implementation
> of the logic to collect the unreported xids in the primary. As the patch
> stands, we keep track of how many of the childXids at each
> subtransaction level have already been reported, but it occurs to me
> that it would be a lot simpler to populate a separate array of
> unreported xids on-the-fly, as xids are assigned. With the simpler
> implementation I bumped into another issue (see below), which is why I
> wanted to test if it worked without the simplification.

The bug seems an off-by-one error and would seem easily corrected in any
case; there is no fundamental problem revealed.

I prefer your suggested approach, since it avoids that rather complex
looking code that did the grovelling thru the nested xact states.

I'll get back to you with more on this tomorrow.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on
Heikki Linnakangas escribi�:
> Simon Riggs wrote:
> > On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> >> jjanes=# begin;
> >> BEGIN
> >> jjanes=# lock table pgbench_history in access exclusive mode;
> >> LOCK TABLE
> >> jjanes=# select count(*) from pgbench_history;
> >> count
> >> --------
> >> 519104
> >> (1 row)
> >>
> >> jjanes=# select count(*) from pgbench_history;
> >> count
> >> --------
> >> 527814
> >> (1 row)
> >>
> >> Is this the expected behavior?
> >
> > By me, yes. WAL replay does not require a table lock to progress. Any
> > changes are protected with block-level locks. It does acquire a table
> > lock and cancel conflicting queries when it is about to replay something
> > that would cause a query to explode, such as dropping a table, as
> > explained in docs.
>
> That is rather surprising. You can't get that result in a normal server,
> which I think is enough of a reason to disallow it. If you do LOCK TABLE
> ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
> nose.

I think the fallout from that argument is that WAL replay should hold
table-level locks (otherwise the workaround to this problem is too
special-casey). I'm not sure about that. If I really wanted to get
consistent results, I'd use a serializable transaction.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers