From: Hannu Krosing on 20 Feb 2007 11:02

One fine day, Tue, 2007-02-20 at 10:20, Jonah H. Harris wrote:
> On 2/20/07, Gregory Stark <stark(a)enterprisedb.com> wrote:
> > I used to say that too but I've since realized it's not really true.
>
> Heh, take a joke man... I was following up on Drake's email :)
>
> But, since you want to discuss your view of the systems openly... I'll
> gladly reply :)
>
> > It's more like Oracle is optimized for data that's committed
> > long in the past and we're optimized for data that's
> > been recently updated.
>
> Wrong. When Oracle says it's committed, it's committed. No
> difference between when, where, and how. In Oracle, the committed
> version is *always* the first presented to the user... it takes time
> to go back and look at older versions;

Older versions are also committed :)

He probably meant longer transactions and several versions visible to
different backends.

> but why shouldn't that be a bit slower, it isn't common practice anyway.

Not for pure OLAP, at least when you have fairly fast transactions.
But it can slow things down when you have some hotspot tables.

> Same with rollbacks... why
> should they optimize for them when 97% of transactions commit?

Or the other way around -- you should write code where most
transactions commit ;)

> > In Oracle the data that's been committed long in the past requires no
> > transactional overhead, but the data that's been recently updated
> > requires lots of work to fetch the right version.
>
> Wrong. The same transactional overhead applies to *all* data in
> Oracle no matter when it was committed. Similarly, the only
> overhead required occurs when someone is querying in serializable
> isolation or on read-committed data before or during a commit. On
> short OLTP-type transactions, Oracle has the most optimized solution.
>
> > In Postgres it's the other way around. Data that's been committed or
> > deleted long ago requires extra work to clean up, but data that's
> > been recently changed requires little additional work to see the
> > correct version.
>
> PostgreSQL has little additional work? Like checking the validity of
> every tuple? Oracle checks visibility at the block level, so there's
> *much* less overhead.

Hmm. How can it check visibility at the block level and at the same
time do in-place updates on single tuples?

> Take most of the benchmarks, which can hold ~200
> tuples per block. Tables in those benchmarks are 100+ million rows.
> On a sequential scan, Oracle would perform 500K checks; PostgreSQL
> would perform *all* 100M checks (not counting dead versions due to
> row updates and the like).

My proposal of keeping visibility info in a separate heap would help
to get similar results, that is, mostly one check per page. That would
also cover much of the index lookup cases below.

> On an index scan, Oracle not only has a
> smaller index and less to check, but also knows the tuple will be
> committed and will, in most cases, not have to perform additional
> physical I/O to find the latest version of a row.

It is also the reason why you can forget about doing simultaneous data
loading and queries on the same table. If you know to avoid doing
that, then it "will, in most cases, not have to perform additional
physical I/O to find the latest version of a row" ;)

> Of course, Oracle's design is much more complicated in its ability to
> build read-committed versions of the blocks at runtime; something the
> simplicity of PostgreSQL's MVCC design eliminates.
>
> > In a sense then it's the opposite of what we usually say. Oracle is
> > optimized for mostly static data. Postgres is optimized for changing
> > data.
>
> Care to share an example to prove it?
>
> Like always, there are pros and cons with both designs, but denying
> facts gets us nowhere. We're off-topic now... so we should either
> move this offline or to another thread. I personally don't see much
> of a reason to continue discussing MVCC designs anymore, as Oracle's
> is patented and PostgreSQL's is highly unlikely to change drastically.

I don't think we will ever move to rollback segments, but for some use
cases moving visibility to a separate heap could make sense.

And if we want to bring back time travel (see another thread about
"Flashback Queries"), then we may end up implementing the original
Postgres design spec and making VACUUM siphon dead tuples over to
archive relations -- which already starts looking a little like
rollback segments, only for other purposes :)

> As always, I'd suggest discussing improvements, not the status quo.
> Likewise, discussing Oracle's design, drawbacks, and limitations
> without having used it extensively is quite obvious to anyone familiar
> with Oracle.

Using a system extensively can also create blind spots about some of
the system's (mis)features. One learns to avoid doing some things
without consciously knowing about it.

> Don't get me wrong, it's fine to prefer one design to
> another, but pushing discussion items comparing Oracle to PostgreSQL
> because of things you've heard or read somewhere isn't the same as
> understanding them because you've used them.

To get a really meaningful discussion we should involve someone who
has *designed* them, not merely used them.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
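Hannu's separate visibility heap is easiest to see in a sketch. A rough
Python illustration follows; every name in it is invented for the
example (this is not PostgreSQL code). A per-page all-visible flag,
stored outside the heap pages themselves, lets a sequential scan do one
check per page and fall back to per-tuple checks only on pages holding
tuples still in flux:

    from dataclasses import dataclass, field

    @dataclass
    class HeapTuple:
        data: object
        xmin: int        # inserting transaction id
        xmax: int = 0    # deleting transaction id; 0 means still live

    @dataclass
    class Page:
        tuples: list = field(default_factory=list)

    def seq_scan(pages, all_visible, snapshot_xmin):
        """Yield tuples visible to the snapshot. all_visible[i] is the
        per-page flag kept in the separate visibility heap."""
        for i, page in enumerate(pages):
            if all_visible[i]:
                # One check covers the whole page: every tuple on it
                # is known committed and visible to everyone.
                yield from page.tuples
            else:
                # Page holds recently changed tuples: fall back to the
                # usual per-tuple visibility test (simplified here).
                for t in page.tuples:
                    if t.xmin < snapshot_xmin and t.xmax == 0:
                        yield t

At ~200 tuples per block, a mostly static 100M-row table drops from
100M visibility checks to roughly one per page, which is the "similar
results" Hannu is after.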
From: Gregory Stark on 20 Feb 2007 11:28

"Jonah H. Harris" <jonah.harris(a)gmail.com> writes:

> On 2/20/07, Gregory Stark <stark(a)enterprisedb.com> wrote:
>
>> It's more like Oracle is optimized for data that's committed
>> long in the past and we're optimized for data that's
>> been recently updated.
>
> Wrong. When Oracle says it's committed, it's committed. No
> difference between when, where, and how. In Oracle, the committed
> version is *always* the first presented to the user...

Sure, and if it was committed long in the past then you can use it. If
it's committed recently then you'll have to start looking up rollback
data instead.

The rest of your post seems to all be predicated on the idea that if
data is committed then that's all you'll need to look at. But that's
missing precisely the point of what I was saying:

>> In a sense then it's the opposite of what we usually say. Oracle is
>> optimized for mostly static data. Postgres is optimized for changing
>> data.

By "changing data" I meant "data in flux", not the action of making
changes to the data. Looking at data in flux in Oracle -- even other
data that's unchanged but lives on the same page as some record that's
in flux -- will require you to look up rollback data and possibly even
have to follow many pages of chained rollback data.

Looking at data in Postgres has no additional overhead when it's data
in flux versus old static data.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
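Greg's "data in flux" distinction can be caricatured in a few lines of
Python. This is an illustrative sketch, not real Oracle or PostgreSQL
internals, and all names are invented: under undo-based MVCC a reader
that hits a recently modified block walks chained rollback records
until it reaches a version old enough for its snapshot, so cost grows
with how hot the block is; under tuple-level versioning the reader just
inspects the versions already sitting in the page.

    def read_undo_style(block, snapshot_scn, undo_segment):
        """Undo-based MVCC: rebuild the block version the snapshot may
        see by walking chained undo records; each hop can mean reading
        another page of rollback data."""
        version = block
        while version["scn"] > snapshot_scn:
            version = undo_segment[version["undo_ptr"]]
        return version["rows"]

    def read_tuple_style(page_tuples, snapshot_xmin):
        """Tuple-versioned MVCC: no reconstruction, just pick the
        versions already in the page that the snapshot may see."""
        return [t for t in page_tuples
                if t["xmin"] < snapshot_xmin and t["xmax"] == 0]

For old static data the undo walk terminates immediately and the two
read paths cost about the same; the difference only shows up while the
block is in flux.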
From: August Zajonc on 20 Feb 2007 11:39

Tom Lane wrote:
> August Zajonc <augustz(a)augustz.com> writes:
>> The key is how lightweight the setup could be, which matters because
>> clients are not always willing to pay for a PITR setup. The low
>> overhead would mean you'd feel fine about setting the GUC to 1hr or
>> so.
>
> This would have exactly the same performance consequences as always
> having an hour-old open transaction. I'm afraid that describing it
> as "low overhead" is mere wishful thinking: it would cripple vacuuming
> of high-update tables and greatly increase the typical load on pg_clog
> and pg_subtrans. We already know that pg_subtrans contention can be a
> source of context-swap storms, with the size of the window back to
> GlobalXmin being the controlling factor for how bad it gets.
>
> It's possible that this last could be addressed by separating the
> concept of "old enough to be vacuumed" from GlobalXmin, but it's
> certainly not a trivial thing.

Isn't GlobalXmin for open transactions? I thought the idea was that
everything goes as usual, but you can flip a knob and say that vacuum
doesn't vacuum anything more recent than GlobalXmin less x
transactions. Then you can look at your transactions per second and
get a rough window to work within. Or, if there are timestamps on
commits, that would switch to a more user-friendly time interval.

You end up simply delaying when an hour's worth of transactions gets
vacuumed. For folks doing nightly cron job vacuums, not too bad.
Autovacuum isn't on by default :)

Of course, this will be clumsy if not per database. But the thought
might be to take advantage of the flashback data already present under
the MVCC model as long as vacuum hasn't hit things (and being willing
to stop activity on a database etc).

Given that you are delaying a vacuum rather than being more
aggressive, and know you can already vacuum up to a more recent
transaction xmin, I dunno... Does anything depend (other than
performance) on vacuum actually vacuuming as far as it can?

- August
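August's knob amounts to computing the vacuum cutoff from GlobalXmin
minus a retention window. A minimal sketch with hypothetical names,
assuming roughly constant transaction throughput so a time interval can
be mapped onto a transaction count:

    def vacuum_cutoff(global_xmin, retention_xacts):
        # Vacuum already may not touch anything newer than GlobalXmin;
        # the knob just pulls the cutoff back by a further
        # retention_xacts transactions.
        return global_xmin - retention_xacts

    # e.g. ~1000 transactions/second and a one-hour window:
    cutoff = vacuum_cutoff(global_xmin=5_000_000,
                           retention_xacts=1000 * 3600)
    # vacuum may only reclaim tuples deleted by transactions < cutoff

Tom's objection is that this is exactly equivalent, performance-wise,
to holding a transaction open for the whole retention window: dead
tuples in hot tables pile up, and the visibility window that pg_clog
and pg_subtrans must service grows by the same amount.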
From: RPK on 20 Feb 2007 13:27

Andrew,

>> Demanding unlimited undo at some time that is arbitrarily distant in
>> the future strikes me as wholly unreasonable.

I did not mean asking for undo from a lifetime log. Since Flashback
technology is already there, I just mean that the world's most advanced
database (PostgreSQL, as they say) must have an optimized way of
undoing at least a week's changes. A week's log is enough, and
PostgreSQL can keep removing old logs automatically. Secondly, it must
be left to the user to decide how many days of archive he wants to
store -- again, up to a week max.

> RPK wrote:
> > I agree that TimeStamp creates an overhead, but I just want to know
> > if an accidental update happened to a table and this incident got
> > traced three days later, what facility PGSQL provides to bring the
> > table back to its original condition. You can't wait regretting why
> > you did not run ROLLBACK before COMMIT. (Correct me. I am only a
> > user.)
>
> Why the heck can't you create a reversing transaction? That's what
> ordinary mortals do. Demanding unlimited undo at some time that is
> arbitrarily distant in the future strikes me as wholly unreasonable.
>
> What do you mean by "accidental update"? What you really appear to
> mean is that a program or a human operator has made an error, and
> incorrectly told the database to commit a transaction. The answer
> surely is to correct the behaviour of the program or human, rather
> than wanting the database to provide an undo facility. Alternatively,
> this should be handled at the application layer, using something like
> table_log.
>
> Some things just don't work well with this sort of facility. Just ask
> your bookie if you can undo a bet that you "accidentally" placed with
> him and which, three days later, you discover (after the race) was a
> mistake.
>
> cheers
>
> andrew
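Andrew's "reversing transaction" needs no storage-level support, which
is his point. A toy Python sketch in the spirit of table_log, with
invented names rather than the actual table_log API: triggers save old
row images to an audit trail, and undoing a committed mistake is an
ordinary new transaction that restores those images.

    audit_log = []   # (txid, table, pk, old_row), appended by
                     # triggers on every UPDATE and DELETE

    def record_old_image(txid, table, pk, old_row):
        audit_log.append((txid, table, pk, old_row))

    def reversing_transaction(tables, bad_txid):
        """Restore every row image a mistaken, already-committed
        transaction overwrote -- an ordinary forward transaction,
        applied newest change first."""
        for txid, table, pk, old_row in reversed(audit_log):
            if txid == bad_txid:
                tables[table][pk] = old_row

The retention window is then an application policy (keep a week of
audit rows, prune the rest) instead of a constraint on vacuum.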
From: Tom Lane on 20 Feb 2007 13:40
RPK <rohitprakash123(a)indiatimes.com> writes:
> I did not mean asking for undo from a lifetime log. Since Flashback
> technology is already there, I just mean that the world's most
> advanced database (PostgreSQL, as they say) must have an optimized
> way of undoing at least a week's changes.

You're living in a dream world. Do you know any Oracle DBAs who keep
enough rollback segments to go back a week?

			regards, tom lane