From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> So I think we have a bigger problem than just copydir.c. It seems to
> me we should be fsyncing the table space data directories on every
> checkpoint.

Is there any evidence that anyone anywhere has ever lost data because
of a lack of directory fsyncs? I sure don't recall any bug reports
that seem to match that theory.

It seems to me that we're talking about a huge hit in both code
complexity and performance to deal with a problem that doesn't actually
occur in the field; and which furthermore is trivially solved on any
modern filesystem by choosing the right filesystem options. Why don't
we just document those options, instead?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andres Freund on
On Sunday 14 February 2010 18:11:39 Tom Lane wrote:
> Greg Stark <gsstark(a)mit.edu> writes:
> > So I think we have a bigger problem than just copydir.c. It seems to
> > me we should be fsyncing the table space data directories on every
> > checkpoint.
>
> Is there any evidence that anyone anywhere has ever lost data because
> of a lack of directory fsyncs? I sure don't recall any bug reports
> that seem to match that theory.
I have actually seen the issue, at least during CREATE DATABASE, though on
virtualized hardware...
~1GB template database, lots and lots of small tables; the crash occurred maybe
a minute after CREATE DB. Filesystem was xfs, kernel 2.6.30.y.

> It seems to me that we're talking about a huge hit in both code
> complexity and performance to deal with a problem that doesn't actually
> occur in the field; and which furthermore is trivially solved on any
> modern filesystem by choosing the right filesystem options. Why don't
> we just document those options, instead?
Which options would those be? I am not aware that there are any for any of the
recent Linux filesystems.
Well, except "sync" that is, but that sure would be more of a performance hit
than fsyncing the directory...
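
For reference, fsyncing a directory amounts to very little code. The following
is a minimal sketch using plain POSIX calls, not code from the patch under
discussion; the name fsync_dir is hypothetical, and tolerating EBADF/EINVAL is
an assumption made here because some filesystems do not support fsync on a
directory.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a directory's entries to stable storage so
 * that files created, renamed, or removed in it survive a crash.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
fsync_dir(const char *path)
{
    int fd;
    int saved_errno;

    /* A directory can be opened read-only and then passed to fsync(). */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /*
     * Assumption: treat EBADF/EINVAL as "directory fsync not supported
     * here" rather than as a hard failure.
     */
    if (fsync(fd) != 0 && errno != EBADF && errno != EINVAL)
    {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    return close(fd);
}
```

A caller would invoke fsync_dir() on the parent directory after the new files
themselves have been fsynced.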

Andres

From: Tom Lane on
Andres Freund <andres(a)anarazel.de> writes:
> On Sunday 14 February 2010 18:11:39 Tom Lane wrote:
>> It seems to me that we're talking about a huge hit in both code
>> complexity and performance to deal with a problem that doesn't actually
>> occur in the field; and which furthermore is trivially solved on any
>> modern filesystem by choosing the right filesystem options. Why don't
>> we just document those options, instead?

> Which options would those be? I am not aware that there are any for any of the
> recent Linux filesystems.

Shouldn't journaling of metadata be sufficient?

regards, tom lane

From: Robert Haas on
On Sun, Feb 14, 2010 at 10:31 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Sun, Feb 14, 2010 at 2:03 PM, Greg Stark <gsstark(a)mit.edu> wrote:
>> On Fri, Feb 12, 2010 at 3:49 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>>> Greg Stark, have you managed to get your access issues sorted out?  If
>>
>> Yep, will look at this today.
>
> So I think we have a bigger problem than just copydir.c. It seems to
> me we should be fsyncing the table space data directories on every
> checkpoint. Otherwise any newly created relations or removed relations
> could disappear even though the data in them was fsynced. I'm thinking
> I should add an _mdfd_opentblspc(reln) call which returns a file
> descriptor for the tablespace and have mdsync() use that to sync the
> directory whenever it fsyncs a relation. It would be nice to remember
> which tablespaces have been fsynced and only fsync them once though,
> that would need another hash table just for tablespaces.
>
> We probably also need to fsync the pg_xlog directory every time we
> create or rename an xlog segment.
>
> Are there any other places we do directory operations which we need to
> be permanent?

I agree with Tom that we need to see some actual reproducible test
cases where this is an issue before we go too crazy with it. In
theory what you're talking about could also happen when extending a
relation, if we extend into a new file; but I think we need to
convince ourselves that it really happens before we make any more
changes.

On a pragmatic note, if this does turn out to be a problem, it's a
bug: and we can and do fix bugs whenever we discover them. But the
other part of this patch - to speed up createdb - is a feature - and
we are very rapidly running out of time for 9.0 features. So I'd like
to vote for getting the feature part of this committed (assuming it's
in good shape, of course) and we can continue to investigate the other
issues but without quite as much urgency.

....Robert

From: Andres Freund on
On Sunday 14 February 2010 21:57:08 Robert Haas wrote:
> On Sun, Feb 14, 2010 at 10:31 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> > On Sun, Feb 14, 2010 at 2:03 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> >> On Fri, Feb 12, 2010 at 3:49 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> >>> Greg Stark, have you managed to get your access issues sorted out? If
> >>
> >> Yep, will look at this today.
> >
> > So I think we have a bigger problem than just copydir.c. It seems to
> > me we should be fsyncing the table space data directories on every
> > checkpoint. Otherwise any newly created relations or removed relations
> > could disappear even though the data in them was fsynced. I'm thinking
> > I should add an _mdfd_opentblspc(reln) call which returns a file
> > descriptor for the tablespace and have mdsync() use that to sync the
> > directory whenever it fsyncs a relation. It would be nice to remember
> > which tablespaces have been fsynced and only fsync them once though,
> > that would need another hash table just for tablespaces.
> >
> > We probably also need to fsync the pg_xlog directory every time we
> > create or rename an xlog segment.
> >
> > Are there any other places we do directory operations which we need to
> > be permanent?
>
> I agree with Tom that we need to see some actual reproducible test
> cases where this is an issue before we go too crazy with it. In
> theory what you're talking about could also happen when extending a
> relation, if we extend into a new file; but I think we need to
> convince ourselves that it really happens before we make any more
> changes.
Ok, will try to reproduce.

> On a pragmatic note, if this does turn out to be a problem, it's a
> bug: and we can and do fix bugs whenever we discover them. But the
> other part of this patch - to speed up createdb - is a feature - and
> we are very rapidly running out of time for 9.0 features. So I'd like
> to vote for getting the feature part of this committed (assuming it's
> in good shape, of course) and we can continue to investigate the other
> issues but without quite as much urgency.
Sounds sensible.

Andres
