testing HS/SR - 1 vs 2 performance [PgSql]

From: Simon Riggs on 25 Apr 2010 15:07

On Sun, 2010-04-25 at 20:25 +0200, Erik Rijkers wrote:

> Sorry if it's too much data, but to me at least it was illuminating;
> I now understand the effects of the different parameters better.

That's great, many thanks.

A few observations:

* Standby performance is actually slightly better than normal running on
the primary. This is credible because of the way snapshots are now
taken. We don't need to scan the procarray looking for write
transactions, since we know everything is read-only. So we scan just
the knownassignedxids, which will be zero-length if there is no
activity on the primary, so snapshots are actually taken much faster
on the standby in this case. The snapshot cost on the standby is O(n),
where n is the number of write transactions "currently" running on the
primary (transfer delays blur the word "currently").

* The results for scale factor < 100 are fine, while the results for
scale factor > 100 with few connections are thrown off by long
transaction times. With larger numbers of connections the wait problems
seem to go away. It looks like Erik's setup (and possibly Hot Standby in
general) has an I/O problem, though its cause is not yet determined. It
could be just the hardware, or the hardware plus other factors.
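To make the O(n) claim concrete, here is a toy Python model of a standby snapshot. It is not PostgreSQL's C code: the function name `take_standby_snapshot` and the list-based layout are assumptions for illustration only. The reader scans just the known-assigned-XID entries between a tail and a head index, skipping entries already marked invalid; in this model, an idle primary means an empty range and a zero-cost scan.

```python
from typing import Sequence


def take_standby_snapshot(
    xids: Sequence[int], valid: Sequence[bool], tail: int, head: int
) -> tuple[list[int], int]:
    """Return (running_xids, entries_scanned) for a toy standby snapshot.

    Hypothetical model: xids[tail:head] holds XIDs the standby knows were
    assigned on the primary, in ascending order; valid[i] becomes False
    once the startup process has replayed that transaction's end.
    """
    running: list[int] = []
    scanned = 0
    for i in range(tail, head):
        scanned += 1
        if valid[i]:
            running.append(xids[i])
    # Cost grows with head - tail (write activity replayed from the
    # primary), not with the number of read-only backends on the standby.
    return running, scanned
```

Note that the scan count includes invalidated entries still lying between tail and head, which is why the question of who advances the tail pointer matters for snapshot cost.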

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Erik Rijkers" on 25 Apr 2010 17:52

On Sun, April 25, 2010 20:55, Tom Lane wrote:
>
> That seems weird. Why do most of the runs show primary and standby
> as having comparable speed, but a few show the standby as much slower?
> The parameters for those runs don't seem obviously different from cases
> where it's fast. I think there might have been something else going on
> on the standby during those runs. Or do you think those represent
> cases where the mystery slowdown event happened?
>

The strange case is the scale-100 standby's slow start, followed by
a steady increase during -c 1, then -c 10, finally getting up to speed
with -c 20 (and up). These slow-but-growing standby series are interspersed
with normal (high-speed) primary series.

I'll try to repeat this pattern on other hardware, although
if my tests were run on faulty hardware I wouldn't know how or why
that would produce the above effect (such a 'regular aberration').

Testing is more difficult than I thought...

Erik Rijkers

From: Tom Lane on 25 Apr 2010 19:18

Simon Riggs <simon(a)2ndQuadrant.com> writes:
> [ v2 patch ]

I've been studying this some more while making notes for improved
comments, and I've about come to the conclusion that having readers
move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
is overly tricky and probably not a performance improvement anyway.
The code is in fact wrong as it stands: it's off-by-one about setting
the new tail value. And there's potential for contention with multiple
readers all wanting to move the tail pointer at once. And most
importantly, KnownAssignedXidsSearch can't move the tail pointer so
we might expend many inefficient searches while never moving the tail
pointer.

I think we should get rid of that and just have the two functions that
can mark entries invalid (which they must do with exclusive lock)
advance the tail pointer when they invalidate the current tail element.
Then we have the very simple rule that only the startup process ever
changes this data structure.

regards, tom lane
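The rule Tom proposes (only the exclusive-lock writer paths advance the tail, readers never do) can be sketched as a small Python model. This is a hedged illustration under stated assumptions, not the patch: the class and method names are hypothetical, locking is represented only by comments, and the "two functions that can mark entries invalid" are collapsed into a single `remove` method.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class KnownAssignedXidsModel:
    """Hypothetical toy model of the writer-only tail-advance rule.

    xids is kept in ascending order; removed entries are flagged invalid
    in place. Only add() and remove(), which would run in the startup
    process under an exclusive lock, modify the structure.
    """
    xids: list[int] = field(default_factory=list)
    valid: list[bool] = field(default_factory=list)
    tail: int = 0

    def add(self, xid: int) -> None:
        # Writer path: XIDs are appended in increasing order during replay.
        assert not self.xids or xid > self.xids[-1], "XIDs must arrive in order"
        self.xids.append(xid)
        self.valid.append(True)

    def search(self, xid: int) -> bool:
        # Reader path (shared lock): binary search over [tail, head).
        # It never moves tail, so it never cleans up stale entries.
        i = bisect_left(self.xids, xid, self.tail)
        return i < len(self.xids) and self.xids[i] == xid and self.valid[i]

    def remove(self, xid: int) -> None:
        # Writer path (exclusive lock): mark the entry invalid and, only if
        # it was the current tail element, advance tail to the first entry
        # that is still valid (or to head if none remain).
        i = bisect_left(self.xids, xid, self.tail)
        if i < len(self.xids) and self.xids[i] == xid:
            self.valid[i] = False
            if i == self.tail:
                while self.tail < len(self.xids) and not self.valid[self.tail]:
                    self.tail += 1
```

For example, after adding XIDs 100, 101, 102 and removing 101, the tail stays at 0; removing 100 then moves the tail past both invalid entries to index 2. The `while` loop makes the new tail land exactly on the first still-valid entry, the kind of boundary Tom's off-by-one remark is about.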

From: Fujii Masao on 26 Apr 2010 02:52

On Mon, Apr 26, 2010 at 3:25 AM, Erik Rijkers <er(a)xs4all.nl> wrote:
> FWIW, here are some more results from pgbench comparing
> primary and standby (both with Simon's patch).

Was there a difference in CPU utilization between the primary
and standby?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Simon Riggs on 26 Apr 2010 03:21

On Sun, 2010-04-25 at 19:18 -0400, Tom Lane wrote:
> Simon Riggs <simon(a)2ndQuadrant.com> writes:
> > [ v2 patch ]
>
> I've been studying this some more while making notes for improved
> comments, and I've about come to the conclusion that having readers
> move the tail pointer (at the end of KnownAssignedXidsGetAndSetXmin)
> is overly tricky and probably not a performance improvement anyway.
> The code is in fact wrong as it stands: it's off-by-one about setting
> the new tail value. And there's potential for contention with multiple
> readers all wanting to move the tail pointer at once.

OK, since contention was my concern, I want to avoid that.

> And most
> importantly, KnownAssignedXidsSearch can't move the tail pointer so
> we might expend many inefficient searches while never moving the tail
> pointer.

> I think we should get rid of that and just have the two functions that
> can mark entries invalid (which they must do with exclusive lock)
> advance the tail pointer when they invalidate the current tail element.

OK

> Then we have the very simple rule that only the startup process ever
> changes this data structure.

--
Simon Riggs www.2ndQuadrant.com
