VFS: fix recent breakage of FS_REVAL_DOT


From: Al Viro on 24 May 2010 08:00

On Mon, May 24, 2010 at 04:57:56PM +1000, Neil Brown wrote:
>
> Commit 1f36f774b22a0ceb7dd33eca626746c81a97b6a5 broke FS_REVAL_DOT semantics.
>
> In particular, before this patch, the command
> ls -l
> in an NFS mounted directory would always check if the directory on the server
> had changed and if so would flush and refill the pagecache for the dir.
> After this patch, the same "ls -l" will repeatedly return stale data until
> the cached attributes for the directory time out.
>
> The following patch fixes this by ensuring the d_revalidate is called by
> do_last when "." is being looked-up.
> link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN
> is not set so nfs_lookup_verify_inode chooses not to do any validation.
>
> The following patch restores the original behaviour.
>
> Cc: stable(a)kernel.org
> Signed-off-by: NeilBrown <neilb(a)suse.de>

Applied, but I really don't like the way you do it; note that e.g. foo/bar/.
gets that revalidation as well, for no good reason. If anything, shouldn't
we handle that thing in the _beginning_ of pathname resolution, not in
the end? For now it'd do, and it's a genuine regression, but...

BTW, here's a question for nfs client folks: is it true that for any two
pathnames on _client_ resolving to pairs (mnt1, dentry) and (mnt2, dentry)
resp., nfs_devname(mnt1, dentry, ...) and nfs_devname(mnt2, dentry, ...)
should yield the strings that do not differ past the ':' (i.e. that the
only possible difference is going to be in spelling the server name)?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Al Viro on 24 May 2010 12:00

On Mon, May 24, 2010 at 12:59:03PM +0100, Al Viro wrote:

> BTW, here's a question for nfs client folks: is it true that for any two
> pathnames on _client_ resolving to pairs (mnt1, dentry) and (mnt2, dentry)
> resp., nfs_devname(mnt1, dentry, ...) and nfs_devname(mnt2, dentry, ...)
> should yield the strings that do not differ past the ':' (i.e. that the
> only possible difference is going to be in spelling the server name)?

Actually, there's a related one: suppose we have two mounts from the same
server, with the same flags, etc., ending up sharing a dentry on client.
What will we get from GETATTR asking for fs_locations, in fs_root field?

Can an nfs4 server e.g. have /x/y being a symlink that resolves to /a/b and
allow mounting of both /x/y/c and /a/b/c? Which path would it return to
client that has mounted both, walked to some referral point and called
nfs_do_refmount(), triggering nfs4_proc_fs_locations()?

Trond, Neil?

From: Trond Myklebust on 24 May 2010 12:30

On Mon, 2010-05-24 at 16:50 +0100, Al Viro wrote:
> On Mon, May 24, 2010 at 12:59:03PM +0100, Al Viro wrote:
>
> > BTW, here's a question for nfs client folks: is it true that for any two
> > pathnames on _client_ resolving to pairs (mnt1, dentry) and (mnt2, dentry)
> > resp., nfs_devname(mnt1, dentry, ...) and nfs_devname(mnt2, dentry, ...)
> > should yield the strings that do not differ past the ':' (i.e. that the
> > only possible difference is going to be in spelling the server name)?
>
> Actually, there's a related one: suppose we have two mounts from the same
> server, with the same flags, etc., ending up sharing a dentry on client.
> What will we get from GETATTR asking for fs_locations, in fs_root field?
>
> Can an nfs4 server e.g. have /x/y being a symlink that resolves to /a/b and
> allow mounting of both /x/y/c and /a/b/c? Which path would it return to
> client that has mounted both, walked to some referral point and called
> nfs_do_refmount(), triggering nfs4_proc_fs_locations()?
>
> Trond, Neil?

When mounting /x/y/c in your example above, the NFSv4 protocol requires
the client itself to resolve the symlink, and then walk down /a/b/c
(looking up component by component), so it will in practice not see
anything other than /a/b/c.

If it walks down to a referral, and then calls nfs_do_refmount, it will
do the same thing: obtain a path /e/f/g on the new server, and then walk
down that component by component while resolving any symlinks and/or
referrals that it crosses in the process.

Cheers
Trond
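
The component-by-component walk described above can be illustrated with a toy model. This is a sketch under loud assumptions, not the NFS client's code: `toy_links`, `toy_readlink` and `toy_walk` are hypothetical names, the "server namespace" is a hard-coded table holding only the /x/y -> /a/b symlink from the example, symlink targets are assumed to be absolute and free of further symlinks, and referrals are not modelled at all.

```c
#include <assert.h>
#include <string.h>

#define TOY_PATH_MAX  256
#define TOY_MAX_LINKS 8     /* arbitrary cap so a symlink loop cannot spin forever */

/* Hypothetical namespace: the one symlink from the example in the thread. */
struct toy_symlink {
    const char *path;       /* absolute path of the symlink itself */
    const char *target;     /* absolute path it resolves to */
};

static const struct toy_symlink toy_links[] = {
    { "/x/y", "/a/b" },
};

/* Return the symlink target for path, or NULL if path is not a symlink. */
static const char *toy_readlink(const char *path)
{
    size_t i;

    for (i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++)
        if (strcmp(toy_links[i].path, path) == 0)
            return toy_links[i].target;
    return NULL;
}

/*
 * Walk path one component at a time. After each component is appended,
 * check whether the path walked so far is a symlink; if it is, continue
 * from the symlink target instead. Returns 0 on success, -1 on error.
 */
static int toy_walk(const char *path, char *out, size_t outlen)
{
    char cur[TOY_PATH_MAX] = "";
    const char *p = path;
    int links = 0;

    assert(path != NULL && out != NULL);

    while (*p) {
        const char *target;
        size_t len;

        while (*p == '/')               /* skip separators */
            p++;
        if (!*p)
            break;
        len = strcspn(p, "/");          /* length of the next component */
        if (strlen(cur) + 1 + len >= sizeof(cur))
            return -1;                  /* would not fit in cur */
        strcat(cur, "/");
        strncat(cur, p, len);
        p += len;

        target = toy_readlink(cur);
        if (target) {
            if (++links > TOY_MAX_LINKS || strlen(target) >= sizeof(cur))
                return -1;
            strcpy(cur, target);        /* restart from the resolved location */
        }
    }

    if (cur[0] == '\0')
        strcpy(cur, "/");               /* empty walk resolves to the root */
    if (strlen(cur) >= outlen)
        return -1;
    strcpy(out, cur);
    return 0;
}
```

In this model, walking /x/y/c and walking /a/b/c end at the same resolved path, which matches the point that the client never sees anything other than /a/b/c.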


From: Al Viro on 24 May 2010 12:50

On Mon, May 24, 2010 at 12:21:22PM -0400, Trond Myklebust wrote:
> > Can an nfs4 server e.g. have /x/y being a symlink that resolves to /a/b and
> > allow mounting of both /x/y/c and /a/b/c? Which path would it return to
> > client that has mounted both, walked to some referral point and called
> > nfs_do_refmount(), triggering nfs4_proc_fs_locations()?
> >
> > Trond, Neil?
>
> When mounting /x/y/c in your example above, the NFSv4 protocol requires
> the client itself to resolve the symlink, and then walk down /a/b/c
> (looking up component by component), so it will in practice not see
> anything other than /a/b/c.
>
> If it walks down to a referral, and then calls nfs_do_refmount, it will
> do the same thing: obtain a path /e/f/g on the new server, and then walk
> down that component by component while resolving any symlinks and/or
> referrals that it crosses in the process.

Ho-hum... What happens if the same fs is mounted twice on server? I.e.
have ext2 from /dev/sda1 mounted on /a and /b on server, then on the client
do mount -t nfs foo:/a /tmp/a; mount -t nfs foo:/b /tmp/b. Which path
would we get from GETATTR with fs_locations requested, if we do it for
/tmp/a/x and /tmp/b/x resp.? Dentry will be the same, since fsid would
match.

Or would the server refuse to export things that way?

From: Trond Myklebust on 24 May 2010 13:10

On Mon, 2010-05-24 at 17:47 +0100, Al Viro wrote:
> On Mon, May 24, 2010 at 12:21:22PM -0400, Trond Myklebust wrote:
> > > Can an nfs4 server e.g. have /x/y being a symlink that resolves to /a/b and
> > > allow mounting of both /x/y/c and /a/b/c? Which path would it return to
> > > client that has mounted both, walked to some referral point and called
> > > nfs_do_refmount(), triggering nfs4_proc_fs_locations()?
> > >
> > > Trond, Neil?
> >
> > When mounting /x/y/c in your example above, the NFSv4 protocol requires
> > the client itself to resolve the symlink, and then walk down /a/b/c
> > (looking up component by component), so it will in practice not see
> > anything other than /a/b/c.
> >
> > If it walks down to a referral, and then calls nfs_do_refmount, it will
> > do the same thing: obtain a path /e/f/g on the new server, and then walk
> > down that component by component while resolving any symlinks and/or
> > referrals that it crosses in the process.
>
> Ho-hum... What happens if the same fs is mounted twice on server? I.e.
> have ext2 from /dev/sda1 mounted on /a and /b on server, then on the client
> do mount -t nfs foo:/a /tmp/a; mount -t nfs foo:/b /tmp/b. Which path
> would we get from GETATTR with fs_locations requested, if we do it for
> /tmp/a/x and /tmp/b/x resp.? Dentry will be the same, since fsid would
> match.
>
> Or would the server refuse to export things that way?

I believe that the answer is that most filehandle types include an
encoding of the inode number of the export directory. In other words, as
long as '/a' and '/b' are different directories, then they will result
in the generation of different filehandles for /a/x and /b/x.

It seems that is not always the case, though. According to the
definition of mk_fsid(), it looks as if the 'FSID_UUID8' and
'FSID_UUID16' filehandle types only encode the uuid of the filesystem,
and have no inode information. They will therefore not be able to
distinguish between an export through '/a' or '/b'.

Neil, Bruce, am I right?

Cheers
Trond
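
The difference between the two kinds of identifier discussed above can be illustrated with a toy model. This is only a sketch and does not reproduce mk_fsid() or any real filehandle layout: `toy_export`, `toy_fsid`, `toy_fsid_dev_ino`, `toy_fsid_uuid` and `toy_fsid_equal` are hypothetical names, and the field sizes are arbitrary. It assumes, as in the first paragraph of the message, that '/a' and '/b' are different directories and so have different inode numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of an export; not a kernel structure. */
struct toy_export {
    uint8_t  fs_uuid[16];   /* uuid of the underlying filesystem */
    uint32_t dev;           /* device number of the filesystem */
    uint32_t root_ino;      /* inode number of the exported directory */
};

/* Hypothetical fsid: an opaque byte string with a length. */
struct toy_fsid {
    uint8_t bytes[16];
    size_t  len;
};

/* Variant that encodes the inode number of the export directory. */
static struct toy_fsid toy_fsid_dev_ino(const struct toy_export *e)
{
    struct toy_fsid f;

    assert(e != NULL);
    memset(&f, 0, sizeof(f));
    memcpy(f.bytes, &e->dev, sizeof(e->dev));
    memcpy(f.bytes + sizeof(e->dev), &e->root_ino, sizeof(e->root_ino));
    f.len = sizeof(e->dev) + sizeof(e->root_ino);
    return f;
}

/* Variant that encodes only the filesystem uuid, with no inode information. */
static struct toy_fsid toy_fsid_uuid(const struct toy_export *e)
{
    struct toy_fsid f;

    assert(e != NULL);
    memset(&f, 0, sizeof(f));
    memcpy(f.bytes, e->fs_uuid, sizeof(e->fs_uuid));
    f.len = sizeof(e->fs_uuid);
    return f;
}

/* Two fsids are the same if they have the same length and the same bytes. */
static int toy_fsid_equal(const struct toy_fsid *a, const struct toy_fsid *b)
{
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

With two exports of the same filesystem that differ only in `root_ino`, the first variant yields different identifiers while the uuid-only variant yields identical ones, so in this model the latter cannot tell an export through '/a' from one through '/b'.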

