From: Bruce Momjian on
Jesper Krogh wrote:
> On 2010-05-18 21:56, Bruce Momjian wrote:
> > Jesper Krogh wrote:
> >
> >> On 2010-05-18 20:52, Bruce Momjian wrote:
> >>
> >>> This line above looks very odd because I didn't think the template0
> >>> datfrozenxid could be advanced. Can I see the output of this query:
> >>>
> >>> SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >>>
> >>>
> >>>
> >> Only from the "old" database:
> >> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >> datname | datfrozenxid | datallowconn
> >> -----------+--------------+--------------
> >> template0 | 2073823552 | f
> >> postgres | 2023820521 | t
> >> data | 2023782337 | t
> >> jk | 2023822188 | t
> >> template1 | 2073823552 | t
> >> workqueue | 2023822188 | t
> >> (6 rows)
> >>
> > OK, datallowconn = false is right for template0, but I am still confused
> > how it got set to that high value.
> >
>
> This is the "production system". I have absolutely no indications that
> anything should be wrong in there. It has run rock-solid since it got
> migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> scared about you telling that it seems wrong. (but that cannot be
> attributed to pg_upgrade)

I am on chat with Alvaro now and it seems we do somehow connect to
template0 for transaction id wraparound. I think Alvaro will post
shortly on this.

> > OK, thanks. This does seem odd. Frankly, having template0's
> > datfrozenxid be wrong would not cause any kind of instability because
> > template0 is used only by pg_dump, so I am wondering if something else
> > is seriously wrong.
> >
> I also think that something was seriously wrong with the pg_upgrade'd
> version. I'll try to reproduce and be a bit more carefull in tracking
> the steps
> this time.

Thanks, but I think the entire problem might be this template0 xid issue
that Alvaro and I are researching. I can now see how invalid template0
xids could cause the instability you saw in the new database. Odd no
one has seen this bug before.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Bruce Momjian on
Bruce Momjian wrote:
> > This is the "production system". I have absolutely no indications that
> > anything should be wrong in there. It has run rock-solid since it got
> > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> > scared about you telling that it seems wrong. (but that cannot be
> > attributed to pg_upgrade)
>
> I am on chat with Alvaro now and it seems we do somehow connect to
> template0 for transaction id wraparound. I think Alvaro will post
> shortly on this.
>
> > > OK, thanks. This does seem odd. Frankly, having template0's
> > > datfrozenxid be wrong would not cause any kind of instability because
> > > template0 is used only by pg_dump, so I am wondering if something else
> > > is seriously wrong.
> > >
> > I also think that something was seriously wrong with the pg_upgrade'd
> > version. I'll try to reproduce and be a bit more carefull in tracking
> > the steps
> > this time.
>
> Thanks, but I think the entire problem might be this template0 xid issue
> that Alvaro and I are researching. I can now see how invalid template0
> xids could cause the instability you saw in the new database. Odd no
> one has seen this bug before.

OK, after talking to Alvaro and Heikki, the problem is that while you
cannot connect to template0, it is accessed by autovacuum for vacuum
freeze, even if autovacuum is turned off. I think the reason you are
seeing this bug is that your xid counter is near 2 billion (50% to
wraparound) and the original template0 xids are the maximum distance
from your counter.

I am attaching the newest patch which fixes this issue. I did modify
this code yesterday with another patch, and I am unclear exactly if you
need that patch as well. CVS now has all these changes.

If you could test with this and the earlier patch, I think it will now
work fine. Thanks for the valuable testing, and quick feedback.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: Bruce Momjian on
Jesper Krogh wrote:
> On 2010-05-18 18:57, Bruce Momjian wrote:
> > jesper(a)krogh.cc wrote:
> >
> >> Hi
> >>
> >> I tried running pg_upgrade from the current snapshot of postgresql and
> >> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
> >> in the process and all that came out was only "ok's" but when I tried a
> >> simple query on the databse it keeps throwing these message out of the back
> >> side.
> >>
> >> DETAIL: You might have already suffered transaction-wraparound data loss.
> >> WARNING: some databases have not been vacuumed in over 2 billion
> >> transactions
> >>
> >>
> >> The database was around 600GB and it took a couple of minutes to run
> >> pg_upgrade after I had all the binaries in the correct place.
> >>
> >> It is not really an easy task to throw around 600GB of data, so I cannot
> >> gaurantee that the above is reproducible, but I'll see if I can get time
> >> and try to reproduce it.
> >>
> > This certainly should never have happened, so I am guessing it is a bug.
> > pg_upgrade tries hard to make sure all your datfrozenxid and
> > relfrozenxid are properly migrated from the old server, and the
> > transaction id is set properly. Unfortunately this is the first time I
> > have heard of such a problem, so I am unclear on its cause.
> >
>
> Other people are typically way faster than I am looking into it.
> Depesz has produced a full trace to reproduce the problem here:
> http://www.depesz.com/index.php/2010/05/19/waiting-for-9-0-pg_upgrade/

Thanks. I have commented on the blog to mention that we have fixed the
bug reported there.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers