From: "Pierre C" on

> The problem can generally be written as "tuples seeing multiple
> updates in the same transaction"?
>
> I think that every time PostgreSQL is used with an ORM, there is
> a certain amount of multiple updates taking place. I have actually
> been reworking the client side to get around multiple updates, since
> they popped up in one of my profiling runs. Although the time I
> optimized away ended up being both "roundtrip time" + "update time",
> having the database do half of it transparently might have been
> enough to push the bottleneck elsewhere.
>
> To sum up: yes, I think it is indeed a real-world case.
>
> Jesper

On the Python side, Elixir and SQLAlchemy have an excellent way of
handling this: when you start a transaction, all changes are
accumulated in a "session" object and only flushed to the database on
session commit (which is also generally the transaction commit). This has
multiple advantages: the ORM can batch statements, each row is updated
only once, and you save a lot of roundtrips. Of course, this is usually
not compatible with database triggers, so if there are triggers the ORM
needs to be told about them.
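A minimal, self-contained sketch of this write-behind pattern (the names and structure here are illustrative, not SQLAlchemy's actual API):

```python
# Minimal model of an ORM "session" that accumulates changes and only
# flushes them to the database at commit time. Illustrative only; real
# ORM sessions track far more state (inserts, deletes, identity map).

class Session:
    def __init__(self, execute):
        self._execute = execute   # callable that sends SQL to the database
        self._pending = {}        # (table, pk) -> latest column values

    def update(self, table, pk, **values):
        # Repeated updates to the same row collapse into one entry, so
        # the row sees a single UPDATE at flush time.
        self._pending.setdefault((table, pk), {}).update(values)

    def commit(self):
        # One statement per distinct row, flushed together at commit.
        for (table, pk), values in self._pending.items():
            sets = ", ".join(f"{col} = {val!r}" for col, val in values.items())
            self._execute(f"UPDATE {table} SET {sets} WHERE id = {pk}")
        self._pending.clear()

statements = []
session = Session(statements.append)
session.update("users", 1, name="alice")
session.update("users", 1, email="alice@example.com")  # same row, merged
session.commit()
```

The point is that the session deduplicates by row, so N application-level updates to the same tuple become one UPDATE statement at flush time, which is exactly the "tuples seeing multiple updates in the same transaction" pattern being discussed.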

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jesper Krogh on
On 2010-07-05 12:11, Pierre C wrote:
>
> > The problem can generally be written as "tuples seeing multiple
> > updates in the same transaction"?
> >
> > I think that every time PostgreSQL is used with an ORM, there is a
> > certain amount of multiple updates taking place. I have actually
> > been reworking the client side to get around multiple updates, since
> > they popped up in one of my profiling runs. Although the time I
> > optimized away ended up being both "roundtrip time" + "update time",
> > having the database do half of it transparently might have been
> > enough to push the bottleneck elsewhere.
> >
> > To sum up: yes, I think it is indeed a real-world case.
> >
> > Jesper
>
> On the Python side, Elixir and SQLAlchemy have an excellent way of
> handling this: when you start a transaction, all changes are
> accumulated in a "session" object and only flushed to the database on
> session commit (which is also generally the transaction commit). This
> has multiple advantages: the ORM can batch statements, each row is
> updated only once, and you save a lot of roundtrips. Of course, this
> is usually not compatible with database triggers, so if there are
> triggers the ORM needs to be told about them.

How about unique constraints, foreign key violations and checks? Would
you also postpone those errors to commit time? And what about
transactions with lots of data?

It doesn't really seem like a net benefit to me, but I can see
applications where it will easily fit.
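To make the concern concrete, here is a toy illustration (SQLite and a hand-rolled statement buffer, for brevity; real ORMs differ) of how buffering writes until commit moves a constraint failure away from the code that caused it:

```python
import sqlite3

# Toy illustration: when writes are buffered until commit, a
# constraint violation surfaces far from the code that caused it.
# (SQLite here for brevity; the same applies with PostgreSQL.)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

pending = []  # the ORM "session": statements buffered until commit time

pending.append(("INSERT INTO users VALUES (?, ?)", (2, 'a@example.com')))  # duplicate email!
# ... the application keeps going, unaware anything is wrong ...
pending.append(("INSERT INTO users VALUES (?, ?)", (3, 'b@example.com')))

flush_error = None
try:
    for sql, args in pending:  # the flush happens only at "commit"
        conn.execute(sql, args)
    conn.commit()
except sqlite3.IntegrityError as exc:
    flush_error = exc  # the failure fires only now, at commit time
```

The duplicate-email insert fails, but only when the whole batch is flushed, so the traceback points at the commit rather than at the line that queued the bad row.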

Jesper
From: Andres Freund on
On Monday 05 July 2010 12:11:38 Pierre C wrote:
> > The problem can generally be written as "tuples seeing multiple
> > updates in the same transaction"?
> >
> > I think that every time PostgreSQL is used with an ORM, there is
> > a certain amount of multiple updates taking place. I have actually
> > been reworking the client side to get around multiple updates, since
> > they popped up in one of my profiling runs. Although the time I
> > optimized away ended up being both "roundtrip time" + "update time",
> > having the database do half of it transparently might have been
> > enough to push the bottleneck elsewhere.
> >
> > To sum up: yes, I think it is indeed a real-world case.
> >
> > Jesper
>
> On the Python side, Elixir and SQLAlchemy have an excellent way of
> handling this: when you start a transaction, all changes are
> accumulated in a "session" object and only flushed to the database on
> session commit (which is also generally the transaction commit). This
> has multiple advantages: the ORM can batch statements, each row is
> updated only once, and you save a lot of roundtrips. Of course, this
> is usually not compatible with database triggers, so if there are
> triggers the ORM needs to be told about them.
It's also not concurrency-safe in many cases.

Andres


From: Robert Haas on
On Sun, Jul 4, 2010 at 9:48 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Sure.  What you'd need is for HeapTupleSatisfiesVacuum to observe that
> (a) the tuple's xmin and xmax are equal,
> (b) they're equal to my own transaction's XID,
> (c) none of the live snapshots in my backend can see cmin but not cmax,
> (d) cmax < currentCommandId, ensuring that every future snapshot will
>     see cmax too (not quite convinced this is certain to hold).
[...]
> Of course, you'd also need to get to HeapTupleSatisfiesVacuum in the
> first place.  The complained-of case lacks any VACUUM call.  Maybe a HOT
> cleanup would happen at the right time but I'm not sure.  If it doesn't,
> adding one would represent a significant expenditure that would usually
> not be repaid.

It looks like a HOT cleanup happens when pd_prune_xid falls behind
OldestXmin. Normally, we set pd_prune_xid to the xmax of the deleted
tuple, but we could perhaps fudge that here to get the desired
behavior; maybe just set it to FrozenXID. Where it gets sticky is
that the proposed rules for HeapTupleSatisfiesVacuum() give different
answers depending on who does the vacuuming, so if backend A sets a
hint saying, hey, there's vacuumable stuff on this page, and then backend
B tries to prune it, nothing will happen. What would be nicer is if
there were a way for the updater to mark the item pointer or tuple in
some way that would make it look vacuumable to everyone, but without
breaking the HOT chain.
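For readers following along, Tom's conditions (a)-(d) quoted above can be modeled as a simple boolean check. This is an illustrative Python sketch with invented, simplified structures, not PostgreSQL's actual C code (the real check lives in tuple visibility code and involves hint bits, combo command ids, and more):

```python
# Illustrative model of the quoted conditions (a)-(d) for treating a
# tuple as vacuumable within its own transaction. Field names mirror
# the discussion; everything else is a simplifying assumption.

from dataclasses import dataclass

@dataclass
class HeapTuple:
    xmin: int   # inserting transaction id
    xmax: int   # deleting/updating transaction id
    cmin: int   # inserting command id within that transaction
    cmax: int   # deleting command id within that transaction

@dataclass
class Snapshot:
    cur_cid: int  # commands with id < cur_cid are visible to this snapshot

def self_vacuumable(tup, my_xid, live_snapshots, current_command_id):
    # (a) the tuple was inserted and deleted by the same transaction...
    if tup.xmin != tup.xmax:
        return False
    # (b) ...and that transaction is mine
    if tup.xmax != my_xid:
        return False
    # (c) no live snapshot in this backend sees cmin but not cmax
    for snap in live_snapshots:
        if tup.cmin < snap.cur_cid <= tup.cmax:
            return False
    # (d) every future snapshot will see cmax too
    return tup.cmax < current_command_id

# A tuple inserted at command 2 and deleted at command 5 by transaction 7;
# the one live snapshot (cur_cid=6) already sees the delete:
t = HeapTuple(xmin=7, xmax=7, cmin=2, cmax=5)
print(self_vacuumable(t, my_xid=7, live_snapshots=[Snapshot(cur_cid=6)],
                      current_command_id=6))
```

In this model, a snapshot taken between the insert and the delete (condition c), or a current command id not yet past cmax (condition d), keeps the tuple alive, which matches Tom's caveat that (d) is the condition he is least sure always holds.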

> Another issue here is that since xmin is certainly within the GlobalXmin
> horizon, it would be essential to preserve the update chain ctid links,
> ie, make the tuple's update predecessor point to its successor. �That
> seems workable for the case of cleaning out an intermediate entry in a
> HOT chain, but not otherwise.

Yeah, that's a shame. HOT is huge, but it would be great if we had a
way to do partial vacuuming even when the indexed columns are updated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
