From: "Erik Rijkers" on
On Tue, May 4, 2010 18:19, Simon Riggs wrote:
> On Tue, 2010-05-04 at 18:10 +0200, Erik Rijkers wrote:
>> It would be interesting if anyone repeated these simple tests and
>> produced evidence that these results are not HS-related.
>>
>> (Unfortunately, I don't have much time for more testing at the moment.)
>
> Would you be able to make those systems available for further testing?

No, sorry.

> First, I'd perform the same test with the systems swapped, so we know
> more about the symmetry of the issue. After that, I'd like to look
> more into the internals.

You mean swapping primary and standby? Primary and standby were on the same machine in
these tests (even on the same RAID).

I could possibly move the standby (the 'slow' side, as it stands) to another, quite
similar machine. Not in the coming days, though...


> Is it possible to set up SystemTap and DTrace on these systems?

I did install SystemTap last week. DTrace is not installed, I think. (I've never used either.)
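
From the documentation, a minimal sketch of a first profiling step (untested
here, and the syscall.* probes are from the standard tapset, so treat this as
an assumption rather than a recipe):

    # smoke test: does stap work at all?
    stap -e 'probe begin { printf("systemtap ok\n"); exit() }'

    # count syscalls made by postgres backends for 30 seconds,
    # print the top 10 at shutdown
    stap -e 'global s
             probe syscall.* { if (execname() == "postgres") s[name]++ }
             probe end { foreach (n in s- limit 10) printf("%-20s %d\n", n, s[n]) }' \
         -c "sleep 30"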



Erik Rijkers



From: Greg Smith on
Erik Rijkers wrote:
> OS: CentOS 5.4
> 2 quad-core CPUs: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
> Areca 1280ML
> primary and standby db both on a 12-disk array (SATA 7200rpm, Seagate Barracuda ES.2)
>

To fill in from data you already mentioned upthread:
32 GB RAM
CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5

Thanks for all the reporting you've done here; really helpful.
Questions to make sure I'm trying to duplicate the right thing:

Is your disk array all configured as one big RAID10 volume, so
essentially a 6-disk stripe with redundancy, or something else? In
particular I want to know whether the WAL/database/archives are split onto
separate volumes or all on one big one when you were testing.

Is this on ext3 with standard mount parameters?

Also, can you confirm that every test you ran used only a single pgbench
worker thread (-j 1 or not specified)? That looked to be the case in
the ones where you posted the whole command used. It would not
surprise me to find that the CPU usage profile of a standby is just
different enough from the primary's that the pgbench program ends up
not being scheduled enough CPU time, due to the known Linux issues in
that area. I'm not going to assume that, of course; it's just one thing
I want to check when trying to replicate what you've run into.
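
If you do get a chance to retest, one variation worth trying - just a sketch
built from the command you posted, so treat the exact flags as an assumption -
is to spread the clients over several pgbench worker threads, so that a single
starved pgbench process can't skew the result:

    # same read-only test, but 4 worker threads driving the 20 clients
    pgbench -n -S -c 20 -j 4 -T 900 -h /tmp -p 6565 -U rijkers replicas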

I didn't see any glaring HS performance issues like the ones you've been
reporting the last time I tried performance testing in this area, just a
small percentage drop. But I didn't specifically go looking for them
either. With your testing rig out of service, we're going to try to
replicate the problem on a system here. My home server is like a scaled-down
version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5
disks instead of 12), and it's running the same CentOS version. If the
problem really is universal, I should see it here too.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us



From: Stefan Kaltenbrunner on
Erik Rijkers wrote:
> Hi Simon,
>
> In another thread you mentioned you were lacking information from me:
>
> On Tue, May 4, 2010 17:10, Simon Riggs wrote:
>> There is no evidence that Erik's strange performance has anything to do
>> with HS; it hasn't been seen elsewhere and he didn't respond to
>> questions about the test setup to provide background. The profile didn't
>> fit any software problem I can see.
>>
>
> I'm sorry if I missed requests for things that were not already mentioned.
>
> Let me repeat:
> OS: CentOS 5.4
> 2 quad-core CPUs: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
> Areca 1280ML
> primary and standby db both on a 12-disk array (SATA 7200rpm, Seagate Barracuda ES.2)
>
> It goes without saying (I hope) that apart from the pgbench tests
> and a few ssh sessions (mine), the machine was idle.
>
> It would be interesting if anyone repeated these simple tests and produced
> evidence that these results are not HS-related.
>
> (Unfortunately, I don't have much time for more testing at the moment.)

FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks
kinda related.

Using -j 16 -c 16 -T 120, I get either 100000 tps and around 660000
context switches per second, or on some runs I end up with 150000 tps and
around 1M context switches/s sustained. I mostly get the 100k result, but
once in a while I get the 150k one. One can even anticipate the
final transaction rate by watching "vmstat 1"...
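
(For anyone watching along: the numbers above are from the "cs" column of

    # one-second samples; the second argument is the sample count
    vmstat 1 120

run on the box for the duration of the pgbench run.)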

I'm not sure yet what is causing that behaviour, but that is with
9.0B1 on a dual quad-core Nehalem box with 16 CPU threads (8 cores + HT) on a
purely in-memory workload (scale = 20 with 48GB RAM).


Stefan


From: "Erik Rijkers" on
On Tue, May 4, 2010 20:26, Greg Smith wrote:
> Erik Rijkers wrote:
>> OS: CentOS 5.4
>> 2 quad-core CPUs: Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
>> Areca 1280ML
>> primary and standby db both on a 12-disk array (SATA 7200rpm, Seagate Barracuda ES.2)
>>
>
> To fill in from data you already mentioned upthread:
> 32 GB RAM
> CentOS release 5.4 (Final), x86_64 Linux, 2.6.18-164.el5
>
> Thanks for all the reporting you've done here; really helpful.
> Questions to make sure I'm trying to duplicate the right thing:
>
> Is your disk array all configured as one big RAID10 volume, so
> essentially a 6-disk stripe with redundancy, or something else? In
> particular I want to know whether the WAL/database/archives are split onto
> separate volumes or all on one big one when you were testing.

Everything together: the RAID is what Areca calls 'RAID10(1E)'.
(To be honest, I don't remember exactly what the 1E means -
extra flexibility in the number of disks, I think.)

Btw, some of my emails contained the postgresql.conf of both instances.

>
> Is this on ext3 with standard mount parameters?

ext3 noatime
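
(That is, roughly a line like this in /etc/fstab - the device name here is
made up, everything else is the defaults plus noatime:

    /dev/sdb1  /var/data1  ext3  defaults,noatime  1 2
)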

> Also, can you confirm that every test you ran used only a single pgbench
> worker thread (-j 1 or not specified)? That looked to be the case in
> the ones where you posted the whole command used. It would not

Yes; the literal command:
time /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin/pgbench -h /tmp -p 6565 -U rijkers
-n -S -c 20 -T 900 -j 1 replicas

To avoid wrapping in the emails I just removed '-h /tmp', '-U rijkers', and 'replicas'.

(I may also have run the primary's pgbench binary against the slave - I don't
think that should make any difference.)

> surprise me to find that the CPU usage profile of a standby is just
> different enough from the primary's that the pgbench program ends up
> not being scheduled enough CPU time, due to the known Linux issues in
> that area. I'm not going to assume that, of course; it's just one thing
> I want to check when trying to replicate what you've run into.
>
> I didn't see any glaring HS performance issues like the ones you've been
> reporting the last time I tried performance testing in this area, just a
> small percentage drop. But I didn't specifically go looking for them

Here, it seems repeatable, but does not occur with all scales.
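
(For anyone reproducing this: the scale factor is fixed when the test database
is built on the primary, so trying other scales means rebuilding first - a
sketch, with the scale value picked arbitrarily:

    # rebuild the pgbench tables at scale 100, then rerun the same -S test
    pgbench -i -s 100 -h /tmp -p 6565 -U rijkers replicas
    pgbench -n -S -c 20 -j 1 -T 900 -h /tmp -p 6565 -U rijkers replicas
)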

Hm, maybe I should just dump *all* of my results on the wiki for reference. (I'll look at that
later).

> either. With your testing rig out of service, we're going to try to
> replicate the problem on a system here. My home server is like a scaled-down
> version of yours (single quad-core, 8GB RAM, smaller Areca controller, 5
> disks instead of 12), and it's running the same CentOS version. If the
> problem really is universal, I should see it here too.
>

Thanks,


Erik Rijkers



From: Simon Riggs on
On Tue, 2010-05-04 at 21:34 +0200, Stefan Kaltenbrunner wrote:

> FWIW - I'm seeing a behaviour here under pgbench -S workloads that looks
> kinda related.
>
> Using -j 16 -c 16 -T 120, I get either 100000 tps and around 660000
> context switches per second, or on some runs I end up with 150000 tps and
> around 1M context switches/s sustained. I mostly get the 100k result, but
> once in a while I get the 150k one. One can even anticipate the
> final transaction rate by watching "vmstat 1"...
>
> I'm not sure yet what is causing that behaviour, but that is with
> 9.0B1 on a dual quad-core Nehalem box with 16 CPU threads (8 cores + HT) on a
> purely in-memory workload (scale = 20 with 48GB RAM).

Educated guess at a fix: please test this patch. It's good for
performance testing, but doesn't work correctly at failover, which would
obviously be addressed prior to any commit.

--
Simon Riggs www.2ndQuadrant.com