testing HS/SR - 1 vs 2 performance [PgSql]

Prev: pg_ctl stop -m immediate on the primary server inflates sequences
Next: [HACKERS] non-reproducible failure of random test on HEAD

From: marcin mank on 21 Apr 2010 10:22

On Wed, Apr 21, 2010 at 4:12 PM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
>> >
>> > Adding an assertion isn't going to do much because it's unlikely anybody
>> > is going to be running for 2^31 transactions with asserts enabled.
>> >
>
>> I think the assert is a good idea. If there's no real problem here,
>> the assert won't trip. It's just a safety precaution.
>
> If you believe that, then I think you should add this to all the other
> places in the current server where that assumption is made without
> assertion being added. As a safety precaution.
>

Is that not a good idea that (at least for dev-builds, like with
enable-cassert) the xid counter start at like 2^31 - 1000 ? It could
help catch some bugs.

Greetings
Marcin Mańk

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on 21 Apr 2010 10:49

On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:

> Is that not a good idea that (at least for dev-builds, like with
> enable-cassert) the xid counter start at like 2^31 - 1000 ? It could
> help catch some bugs.

It is a good idea, I'm sure that would help catch bugs.

It wouldn't help here because the case in doubt is whether it's possible
to have an xid still showing in memory arrays from the last time the
cycle wrapped. It isn't. These things aren't random. These numbers are
extracted directly from activity that was occurring on the primary and
regularly checked and cleaned as the standby runs.

So you'll need to do 2^31 transactions to prove this isn't true, which
isn't ever going to happen in testing with an assert build and nobody
with that many transactions would run an assert build anyway.

--
Simon Riggs www.2ndQuadrant.com

From: Robert Haas on 21 Apr 2010 10:53

On Wed, Apr 21, 2010 at 10:12 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Wed, 2010-04-21 at 09:51 -0400, Robert Haas wrote:
>> >
>> > Adding an assertion isn't going to do much because it's unlikely anybody
>> > is going to be running for 2^31 transactions with asserts enabled.
>> >
>
>> I think the assert is a good idea. If there's no real problem here,
>> the assert won't trip. It's just a safety precaution.
>
> If you believe that, then I think you should add this to all the other
> places in the current server where that assumption is made without
> assertion being added. As a safety precaution.

I feel like this conversation is getting a little heated. We are just
trying to solve a technical problem here. Perhaps I am misreading -
tone doesn't come through very well in email.

I think the assumptions that are being made in this particular case
are different from the ones made elsewhere in the server. Generally,
we don't assume transaction IDs are arriving in ascending order - in
fact, we usually explicitly have to deal with the fact that they might
not be. So if we have a situation where we ARE relying on them
arriving in order, because we have extrinsic reasons why we know it
has to happen that way, adding an assertion to make sure that things
are happening the way we expect doesn't seem out of line. This code
is fairly complex.

There is arguably less value in asserting that the newly added xid
follows the tail as well as the head, but I still like the idea. Not
sure whether that's rational or not.
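To make the head-and-tail check concrete, here is a minimal sketch of the kind of assertion being discussed. It is not the code from the patch: the names XidRing, xid_precedes, xid_ring_can_append and xid_ring_append are hypothetical, the ring size is arbitrary, and the comparison is only modeled on the usual modulo-2^32 signed-difference test. It also ignores the reserved xids 0 to 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;          /* hypothetical stand-in for a transaction id */

#define XID_RING_SIZE 64       /* illustrative capacity, not from the thread */

typedef struct
{
    Xid      xids[XID_RING_SIZE];
    uint32_t tail;             /* index of the oldest entry */
    uint32_t head;             /* index one past the newest entry */
    uint32_t count;            /* number of entries currently stored */
} XidRing;

/*
 * Wraparound-aware "a is older than b": the difference is taken modulo
 * 2^32 and interpreted as signed, so the answer is only meaningful when
 * the two xids are less than 2^31 apart. The unsigned-to-signed cast
 * relies on two's complement behaviour.
 */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* True if xid is newer than both the newest and the oldest stored xid. */
static bool
xid_ring_can_append(const XidRing *r, Xid xid)
{
    Xid newest;
    Xid oldest;

    if (r->count == 0)
        return true;

    newest = r->xids[(r->head + XID_RING_SIZE - 1) % XID_RING_SIZE];
    oldest = r->xids[r->tail];

    return xid_precedes(newest, xid) && xid_precedes(oldest, xid);
}

/* Append xid, asserting the ordering assumption instead of trusting it. */
static void
xid_ring_append(XidRing *r, Xid xid)
{
    assert(r->count < XID_RING_SIZE);
    assert(xid_ring_can_append(r, xid));

    r->xids[r->head] = xid;
    r->head = (r->head + 1) % XID_RING_SIZE;
    r->count++;
}
```

In this sketch the head check catches an xid that arrives out of order, while the tail check only fires when the stored range would span 2^31 xids or more, which is the wraparound case under discussion.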

....Robert

From: Florian Pflug on 21 Apr 2010 11:13

On Apr 21, 2010, at 16:49 , Simon Riggs wrote:
> On Wed, 2010-04-21 at 16:22 +0200, marcin mank wrote:
>
>> Is that not a good idea that (at least for dev-builds, like with
>> enable-cassert) the xid counter start at like 2^31 - 1000 ? It could
>> help catch some bugs.
>
> It is a good idea, I'm sure that would help catch bugs.
>
> It wouldn't help here because the case in doubt is whether it's possible
> to have an xid still showing in memory arrays from the last time the
> cycle wrapped. It isn't. These things aren't random. These numbers are
> extracted directly from activity that was occurring on the primary and
> regularly checked and cleaned as the standby runs.
>
> So you'll need to do 2^31 transactions to prove this isn't true, which
> isn't ever going to happen in testing with an assert build and nobody
> with that many transactions would run an assert build anyway.

ISTM that there's no need to actually execute 2^31 transactions to
trigger this bug (if it actually exists). It'd be sufficient to increment
the xid counter by more than one each time an xid is assigned, no?

Or would that trip snapshot creation on the standby?
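As a rough illustration of the arithmetic behind that idea, the sketch below advances a 32-bit counter by a configurable stride and computes how many assignments it takes to consume 2^31 xids. The function names and the stride parameter are hypothetical, and nothing like this exists as a server option as far as this thread shows. The only fact assumed about PostgreSQL is that xids 0 to 2 are reserved and skipped at wraparound.

```c
#include <stdint.h>

typedef uint32_t Xid;            /* hypothetical stand-in for a transaction id */

#define FIRST_NORMAL_XID 3u      /* xids 0..2 are reserved and never assigned */

/*
 * Advance the counter by `stride` instead of 1. Unsigned arithmetic wraps
 * modulo 2^32; if the result lands on a reserved value, skip forward to
 * the first normal xid.
 */
static Xid
next_xid_with_stride(Xid current, uint32_t stride)
{
    Xid next = current + stride;

    if (next < FIRST_NORMAL_XID)
        next = FIRST_NORMAL_XID;
    return next;
}

/*
 * Number of xid assignments needed to advance the counter by at least
 * 2^31 at the given stride (ceiling division). Returns 0 for stride 0,
 * since the counter would never advance.
 */
static uint64_t
assignments_until_half_cycle(uint32_t stride)
{
    const uint64_t half_cycle = UINT64_C(1) << 31;

    if (stride == 0)
        return 0;
    return (half_cycle + stride - 1) / stride;
}
```

With a stride of 2^20, for example, assignments_until_half_cycle returns 2048, against 2^31 assignments at a stride of 1. The sketch says nothing about whether gaps of that size would upset snapshot creation on the standby, which is the open question above.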

best regards,
Florian Pflug

From: Simon Riggs on 23 Apr 2010 18:39

On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
> >
> > 99% of transactions happen in similar times between primary and standby,
> > everything dragged down by rare but severe spikes.
> >
> > We're looking for something that would delay something that normally
> > takes <0.1ms into something that takes >100ms, yet does eventually
> > return. That looks like a severe resource contention issue.
>
> Wow. Good detective work.

While we haven't fully established the source of those problems, I am
now happy that these test results don't present any reason to avoid
committing the main patch tested by Erik (not the smaller additional one
I sent). I expect to commit that on Sunday.

--
Simon Riggs www.2ndQuadrant.com
