From: Greg Stark on
On Mon, Feb 1, 2010 at 8:54 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> Disallow catalog relocation inside subtransactions, to avoid having
>> to handle subxact abort effects on the local-map-changes state.
>> This could be implemented if desired, but doesn't seem worth it
>> at least in first pass.
>
> Agreed, not needed for emergency maintenance actions like this.

Note that this would mean it will never work if you have psql's
ROLLBACK_ON_ERROR set.


--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> Tom Lane wrote:
>> Once the updated map file is moved into place, the relocation is effectively
>> committed even if we subsequently abort the transaction. We can make that
>> window pretty narrow but not remove it completely.

> We could include the instructions to update the map file in the commit
> record, instead of introducing a new record type, and update the map
> file only *after* writing the commit record. The map file doesn't grow,
> so we can be pretty confident that updating it doesn't fail (failure
> would lead to PANIC).

> I'm assuming the map file is fixed size, with a fixed location for each
> relation, so that we can just overwrite the old file without the
> create+rename dance, and not worry about torn-pages.

That seems too fragile to me, as I don't find it a stretch at all to
think that writing the map file might fail --- just think Windows
antivirus code :-(. Now, once we have written the WAL record for
the mapfile change, we can't really afford a failure in my approach
either. But I think a rename() after successfully creating/writing/
fsync'ing a temp file is a whole lot safer than writing from a standing
start.

The other problem with what you sketch is that it'd require holding the
mapfile write lock across commit, because we still have to have strict
serialization of updates.

[ thinks for awhile ... ] OTOH, overwrite-in-place is what we've always
used for pg_control updates, and I don't recall ever seeing a report of
a problem that could be traced to that. Maybe we should forget the
rename() trick and overwrite the map file in place. I still think it
needs to be a separate WAL record though. I'm thinking

* obtain lock
* open file for read/write
* read current contents
* construct modified contents
* write and sync WAL record
* write back file through already-opened descriptor
* fsync
* release lock

Not totally clear if this is more or less safe than the rename method;
but given the assumption that the file is less than one disk block,
it should be just as atomic as pg_control updates are.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> On Mon, Feb 1, 2010 at 8:54 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>>> Disallow catalog relocation inside subtransactions, to avoid having
>>> to handle subxact abort effects on the local-map-changes state.
>>> This could be implemented if desired, but doesn't seem worth it
>>> at least in first pass.
>>
>> Agreed, not needed for emergency maintenance actions like this.

> Note that this would mean it will never work if you have psql's
> ROLLBACK_ON_ERROR set.

VACUUM has always failed in such a case, so I don't see this as a
showstopper.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Mon, 2010-02-01 at 10:06 -0500, Tom Lane wrote:

> the assumption that the file is less than one disk block,
> it should be just as atomic as pg_control updates are.

IIRC there were 173 relations affected by this. 4 bytes each we would
have more than 512 bytes.

ISTM you need to treat some of those system relations just as normal
relations rather than give everybody a map entry.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Simon Riggs <simon(a)2ndQuadrant.com> writes:
> On Mon, 2010-02-01 at 10:06 -0500, Tom Lane wrote:
>> the assumption that the file is less than one disk block,
>> it should be just as atomic as pg_control updates are.

> IIRC there were 173 relations affected by this. 4 bytes each we would
> have more than 512 bytes.

Where in the world did you get that number?

There are currently 29 shared relations (counting indexes), and 13
nailed local relations, which would go into a different map file.
I'm not sure if the set of local catalogs requiring the map treatment
is exactly the same as what's presently nailed, but that's probably
a good approximation.

At 8 bytes each (OID + relfilenode), we could fit 64 map entries in
a standard disk sector --- let's say 63 so there's room for a header.
So, barring more-than-doubling of the number of shared catalogs,
there's not going to be a problem.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers