From: Robert Haas on
On Fri, Jun 25, 2010 at 1:33 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Recent discussions involving the possible benefits of a connection
> pool for certain users has reminded me of a brief discussion at The
> Royal Oak last month, where I said I would post a reference a
> concept which might alleviate the need for external connection
> pools. For those interested, check out section 2.4 of this
> document:
>
> Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007.
> Architecture of a Database System. Foundations and Trends(R) in
> Databases Vol. 1, No. 2 (2007) 141-259.
>
> http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

I think a good admission control system for memory would be huge for
us. There are innumerable threads on pgsql-performance where we tell
people to set work_mem to a tiny value (like 4MB or 16MB) because any
higher value risks driving the machine into swap in the event that
they get an unusually large number of connections or those connections
issue queries that require an unusual number of hashes or sorts.
There are also innumerable postings from people complaining that
external sorts are way slower than in-memory sorts, and of course a
hash join frequently mops the floor with a nested loop with inner
index-scan.

A really trivial admission control system might let you set a
system-wide limit on work_mem. As we build a plan, we could estimate
the total amount of work_mem it will require by examining all the
sort, hash, and hash aggregate nodes it contains. In shared memory,
we keep a total of this value across all back-ends. Just before
beginning to execute a plan that uses >0 work_mem, we bump this value
up by the value for the current plan, unless that would make us exceed
the system-wide limit. In that case, we sleep, and the next person
to finish executing (and decrease the value in shared memory) will
wake us up to try again. (Ideally, we'd want to make
maintenance_work_mem part of this accounting process also; otherwise,
a system that was humming along nicely might suddenly start thrashing
when a maintenance operation kicks off.) I suppose this would take a
good deal of performance testing to see how well it actually works. A
refinement might be to try to consider an inferior plan that uses less
memory when the system is tight on memory, rather than waiting. But
you'd have to be careful about that, because waiting might be better
(it's worth waiting 15 s if it means the execution time will decrease
by > 15 s).
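
To make that concrete, here is a rough standalone sketch of the
protocol (this is not PostgreSQL code: a pthread mutex and condition
variable stand in for whatever shared-memory primitive we'd actually
use, and all of the names are made up):

/*
 * Toy model of the proposed work_mem admission gate.  The planner
 * would supply plan_work_mem by summing the estimates for the sort,
 * hash, and hash aggregate nodes in the plan.
 */
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t admission_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  admission_cv   = PTHREAD_COND_INITIALIZER;

static size_t total_work_mem_limit = 1024UL * 1024 * 1024;  /* system-wide cap */
static size_t total_work_mem_used  = 0;     /* running total across backends */

/*
 * Reserve the plan's estimated work_mem, sleeping until it fits under
 * the global limit.  Plans that need no work_mem, or more than the
 * whole limit, are admitted immediately so they can never wait forever.
 */
static void
admit_plan(size_t plan_work_mem)
{
    if (plan_work_mem == 0 || plan_work_mem > total_work_mem_limit)
        return;
    pthread_mutex_lock(&admission_lock);
    while (total_work_mem_used + plan_work_mem > total_work_mem_limit)
        pthread_cond_wait(&admission_cv, &admission_lock);
    total_work_mem_used += plan_work_mem;
    pthread_mutex_unlock(&admission_lock);
}

/* Give the reservation back at end of execution and wake any waiters. */
static void
release_plan(size_t plan_work_mem)
{
    if (plan_work_mem == 0 || plan_work_mem > total_work_mem_limit)
        return;
    pthread_mutex_lock(&admission_lock);
    total_work_mem_used -= plan_work_mem;
    pthread_cond_broadcast(&admission_cv);
    pthread_mutex_unlock(&admission_lock);
}

The point is just that the critical section is a comparison and an
addition; all of the interesting policy lives in how plan_work_mem is
estimated and in what we do while we wait.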

The idea of doling out queries to engine processes is an interesting
one, but seems very different from our current query execution model.
I can't even begin to speculate as to whether there's anything
interesting we could do in that area without reading some literature
on the topic - got any pointers? But even if we can't or don't want
to do that, we already know that limiting the number of backends and
round-robining queries among them performs MUCH better than setting
max_connections to a large value, and not just because of memory
exhaustion. Our answer to that problem is "use a connection pooler",
but that's not a very good solution even aside from the possible
administrative nuisance, because it only solves half the problem. In
the general case, the question is not whether we can currently afford
to allocate 0 or 1 backends to a given query, but whether we can
afford to allocate 0, 1, or >1; furthermore, if we can afford to
allocate >1 backend, we'd ideally like to reuse an existing backend
rather than starting a new one. I don't think anyone's going to be
too happy with a parallel query implementation with a dependency on an
external connection pooler.

One of the difficulties in reusing an existing backend for a new
query, or in maintaining a pool of backends that could be used as
workers for parallel queries, is that our current system does not
require us to have, nor do we have, a way of resetting a backend to a
completely pristine state. DISCARD ALL is probably pretty close,
because AIUI that's what connection poolers are using today, and if it
didn't work reasonably well, we'd have gotten complaints. But DISCARD
ALL will not let you rebind to a different database, for example,
which would be a pretty useful thing to do in a pooling environment,
so that you needn't maintain separate pools for each database, and it
doesn't let you restart the authentication cycle (with no backsies)
either. Of course, addressing these problems wouldn't by itself give
us a built-in connection pooler or parallel query execution, and
there's some danger of destabilizing the code, but possibly it would
be a good way to get off the ground. Not sure, though. Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: "Kevin Grittner" on
Robert Haas <robertmhaas(a)gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner(a)wicourts.gov> wrote:

>> check out section 2.4 of this

> A really trivial admission control system might let you set a
> system-wide limit on work_mem.

Heck, I think an even *more* trivial admission control policy which
limits the number of active database transactions released to
execution might solve a lot of problems. Of course, what you
propose is more useful, although I'd be inclined to think that we'd
want an admission control layer which could be configured to support
both of these and much more. Done correctly, it could almost
completely eliminate the downward slope after you hit the "knee" in
many performance graphs.
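
As a rough illustration (not actual PostgreSQL code, and the names are
invented), that simpler policy is little more than a counting
semaphore around transaction start:

/*
 * Toy model of a "limit active transactions" admission policy.  A
 * POSIX semaphore stands in for whatever shared-memory mechanism
 * would really be used; in a multi-process server it would need to
 * live in shared memory with pshared = 1.
 */
#include <semaphore.h>

static sem_t active_xact_slots;

/* Called once at startup with the configured limit. */
static void
admission_init(unsigned int max_active_transactions)
{
    sem_init(&active_xact_slots, 0, max_active_transactions);
}

/* Block until a slot is free before releasing the transaction to run. */
static void
admit_transaction(void)
{
    sem_wait(&active_xact_slots);
}

/* Return the slot at commit or abort. */
static void
release_transaction(void)
{
    sem_post(&active_xact_slots);
}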

> A refinement might be to try to consider an inferior plan that
> uses less memory when the system is tight on memory, rather than
> waiting.

I wouldn't try messing with that until we have the basics down. ;-)
It is within the scope of what an execution admission controller is
intended to be able to do, though.

> The idea of doling out queries to engine processes is an
> interesting one, but seems very different from our current query
> execution model.

That wasn't in section 2.4 itself -- you must have read the whole
chapter. I think any discussion of that should spin off a separate
thread -- the techniques are really orthogonal. And frankly, that's
more ambitious a topic than *I'm* inclined to want to get into at
the moment. An "execution admission controller" that starts simple
but leaves room for growth seems within the realm of possibility.

-Kevin

From: Josh Berkus on
On 6/25/10 12:15 PM, Robert Haas wrote:
> I think a good admission control system for memory would be huge for
> us. There are innumerable threads on pgsql-performance where we tell
> people to set work_mem to a tiny value (like 4MB or 16MB) because any
> higher value risks driving the machine into swap in the event that
> they get an unusually large number of connections or those connections
> issue queries that require an unusual number of hashes or sorts.

Greenplum did this several years ago with the Bizgres project; it had a
resource control manager which was made available for PostgreSQL core.
However, it would have required a large and unpredictable amount of work
to make it compatible with OLTP workloads.

The problem with centralized resource control is the need for
centralized locking on requests for resources. That forces transactions
to be serialized in order to make sure resources are not
double-allocated. This isn't much of a problem in a DW application, but
in a web app with thousands of queries per second it's deadly.
Performance engineering for PostgreSQL over the last 7 years has been
partly about eliminating centralized locking; we don't want to add new
locking.

That means that a realistic admissions control mechanism would need to
operate based on projections and estimates and "best effort" policies.
Not only is this mathematically more complex, it's an open question
whether it puts us ahead of where we are now vis-a-vis underallocation
of memory. Realistically, a lot of tuning and testing would be required
before such a tool was actually an improvement.

Or, to put it another way: the "poor man's admission control" is a waste
of time because it doesn't actually help performance. We're basically
facing doing the hard version, or not bothering.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From: "Kevin Grittner" on
Josh Berkus <josh(a)agliodbs.com> wrote:

> Greenplum did this several years ago with the Bizgres project

> However, it [was not] compatible with OLTP workloads.

> the "poor man's admission control" is a waste of time because it
> doesn't actually help performance. We're basically facing doing
> the hard version, or not bothering.

I think it's premature to assume that without any evidence. I'm
sure it's possible to create a policy which does more harm than good
for any particular workload; there's no denying that could happen,
but things such as limiting open transactions (as just one example)
might be accomplished at very low cost. Since I have seen dramatic
performance improvements from restricting this through a connection
pool, I'm inclined to believe there could be benefit from such a
simple policy as this. The total work memory policy Robert proposed
sounds likely to more than pay for itself by allowing larger
work_mem settings without risking cache flushing or swapping.

One thing that seems clear to me is that the admission policy should
be configurable, so that it can be tuned based on workload. That
would also be consistent with a "start simple and expand the
capabilities" approach.

C'mon, don't be such a buzz-kill. Why should Greenplum have all the
fun? ;-)

-Kevin

From: Robert Haas on
On Fri, Jun 25, 2010 at 4:10 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> On 6/25/10 12:15 PM, Robert Haas wrote:
>> I think a good admission control system for memory would be huge for
>> us. There are innumerable threads on pgsql-performance where we tell
>> people to set work_mem to a tiny value (like 4MB or 16MB) because any
>> higher value risks driving the machine into swap in the event that
>> they get an unusually large number of connections or those connections
>> issue queries that require an unusual number of hashes or sorts.
>
> Greenplum did this several years ago with the Bizgres project; it had a
> resource control manager which was made available for PostgreSQL core.
> However, it would have required a large and unpredictable amount of work
> to make it compatible with OLTP workloads.
>
> The problem with centralized resource control is the need for
> centralized locking on requests for resources. That forces transactions
> to be serialized in order to make sure resources are not
> double-allocated. This isn't much of a problem in a DW application, but
> in a web app with thousands of queries per second it's deadly.
> Performance engineering for PostgreSQL over the last 7 years has been
> partly about eliminating centralized locking; we don't want to add new
> locking.

I haven't seen the Greenplum code - how did it actually work? The
mechanism I just proposed would (except in the case of an overloaded
system) only require holding a lock for long enough to test and update
a single integer in shared memory, which doesn't seem like it would
cause a serious serialization problem. I might be missing something,
or it might suck for lots of other reasons, but if we already know
that then let's try to be more specific about what the problems are.

> That means that a realistic admissions control mechanism would need to
> operate based on projections and estimates and "best effort" policies.
> Not only is this mathematically more complex, it's an open question
> whether it puts us ahead of where we are now vis-a-vis underallocation
> of memory. Realistically, a lot of tuning and testing would be required
> before such a tool was actually an improvement.

Before today, that's the only approach I'd ever considered, but this
article made me rethink that. If you have a stream of queries that
can be run quickly with 1GB of memory and much more slowly with any
lesser amount, the only sensible thing to do is wait until there's a
GB of memory available for you to grab. What projection, estimate, or
"best effort" policy would arrive at even approximately the same result?

> Or, to put it another way: the "poor man's admission control" is a waste
> of time because it doesn't actually help performance. We're basically
> facing doing the hard version, or not bothering.

I think it's an oversimplification to group all approaches as "easy"
and "hard", and even more of an oversimplification to say that all of
the easy ones suck.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
