Prev: Generic name to handle and open by handle syscalls
Next: input: evdev: Convert to dynamic event buffer (rev3)
From: Aneesh Kumar K. V on 7 Jul 2010 12:40

On Wed, 7 Jul 2010 10:45:11 -0400, "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
> On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > descriptor or pathname without races?
> > > >
> > > > You can do stat/fstat to find out the device number (which is unique,
> > > > but not persistent)
> > >
> > > Is it really unique over time?  (Can't a given st_dev value map to one
> > > filesystem now, and another later?)
> >
> > It's unique at a single point in time.  But if you have a reference
> > (e.g. open file descriptor) on the mount then that's not a problem.
> >
> >   fd = open(path, ...);
> >   fstat(fd, &st);
> >   search st.st_dev in mountinfo
> >   close(fd)
> >
> > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > filesystems will not be found in mountinfo, but the reference is still
> > there so st_dev will not be reused for other filesystems).
>
> OK, cool.
>
> That still leaves the problem that there isn't always an underlying
> block device, and/or when there is it doesn't always uniquely specify
> the filesystem.
>

And for this reason we would need this as a syscall, right?

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
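Miklos's open/fstat/search/close sequence can be sketched in C roughly as follows. This is a minimal illustration, not code from the thread: the helper names (parse_mountinfo_ids, find_mount_by_dev, mount_id_for_path) are hypothetical, and it assumes Linux with glibc, where each /proc/self/mountinfo line starts with "<mount ID> <parent ID> <major>:<minor>".

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Parse the first three fields of one /proc/self/mountinfo line:
 * "<mount ID> <parent ID> <major>:<minor> ...". Returns 0 on success. */
int parse_mountinfo_ids(const char *line, int *mnt_id,
                        unsigned *maj, unsigned *min)
{
    int parent_id;
    return sscanf(line, "%d %d %u:%u", mnt_id, &parent_id, maj, min) == 4
               ? 0 : -1;
}

/* Return the mount ID of the first mountinfo entry whose device number
 * equals dev, or -1 if there is none. Bind mounts of one filesystem share
 * a device number, so any match identifies the same filesystem. */
int find_mount_by_dev(FILE *mountinfo, dev_t dev)
{
    char *line = NULL;
    size_t cap = 0;
    int found = -1;

    while (getline(&line, &cap, mountinfo) != -1) {
        int id;
        unsigned maj, min;
        if (parse_mountinfo_ids(line, &id, &maj, &min) == 0 &&
            makedev(maj, min) == dev) {
            found = id;
            break;
        }
    }
    free(line);
    return found;
}

/* The open/fstat/search/close sequence: the open fd pins the mount, so
 * st_dev cannot be reused by another filesystem while we search. */
int mount_id_for_path(const char *path)
{
    struct stat st;
    int id = -1;
    int fd = open(path, O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0) {
        FILE *mi = fopen("/proc/self/mountinfo", "r");
        if (mi != NULL) {
            id = find_mount_by_dev(mi, st.st_dev);
            fclose(mi);
        }
    }
    close(fd);
    return id;
}
```

For example, mount_id_for_path("/etc/passwd") yields the mountinfo entry of the filesystem holding that file, or -1 if no entry matches (such as a lazily unmounted filesystem, as Miklos notes).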
From: Andreas Dilger on 7 Jul 2010 13:10

On 2010-07-07, at 09:05, J. Bruce Fields wrote:
> On Wed, Jul 07, 2010 at 01:40:53AM -0600, Andreas Dilger wrote:
>> On 2010-07-06, at 11:09, Aneesh Kumar K. V wrote:
>>> Since we know that system wide file handle should include a file system
>>> identifier and a file identifier my plan was to retrieve both in the
>>> same syscall.
>>
>> Won't having it be in a separate system call be racy w.r.t. doing the pathname lookup twice?
>
> It'll be rare that a server will want to *just* get a filehandle;
> normally it will at least want to get some attributes at the same time.
> So I think it will always need to open the file first and then do the
> rest of the operations on the returned filehandle.

I think you are assuming too much about the use of the file handle.  What I'm interested in is not a userspace file server, but rather a more efficient way to have 10000's to millions of clients to be able to open the same regular file, without having to do full path traversal for each one.

>>> That still leaves the problem that there isn't always an underlying
>>> block device, and/or when there is it doesn't always uniquely specify
>>> the filesystem.
>>
>> And for this reason we would need this as a syscall right ?
>
> That's the only solution I see.  (Or use an xattr?)

Or... return the UUID as part of the file handle in the first place.  That avoids races, avoids adding more syscalls that have to be called for each file handle, or IMNSHO the worst proposal that requires applications to parse a text file in some obscure path for each file handle (requiring a stat() to find the major/minor device of the file, walking through /proc or /sys, and other nastiness).

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
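One way to picture Andreas's suggestion is a handle that carries the filesystem UUID next to the opaque file identifier. The layout below is purely hypothetical (no kernel interface defines hyp_global_handle or hyp_handle_equal); it only illustrates that, with the UUID inside the handle, identifying or comparing handles needs no second lookup through stat() and /proc.

```c
#include <stdint.h>
#include <string.h>

#define HYP_UUID_LEN 16   /* binary UUID */
#define HYP_FH_MAX  128   /* assumed maximum opaque handle size */

/* Hypothetical self-describing handle: the filesystem UUID travels with
 * the file identifier, so no separate fs-identifier call is required. */
struct hyp_global_handle {
    uint8_t  fs_uuid[HYP_UUID_LEN];  /* persistent filesystem identifier */
    uint32_t handle_bytes;           /* valid bytes in f_handle */
    int32_t  handle_type;            /* filesystem-specific encoding type */
    uint8_t  f_handle[HYP_FH_MAX];   /* opaque per-file identifier */
};

/* Treat two handles as equal when filesystem and file identifiers match. */
int hyp_handle_equal(const struct hyp_global_handle *a,
                     const struct hyp_global_handle *b)
{
    return memcmp(a->fs_uuid, b->fs_uuid, HYP_UUID_LEN) == 0 &&
           a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           a->handle_bytes <= HYP_FH_MAX &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```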
From: Aneesh Kumar K. V on 7 Jul 2010 14:20

On Wed, 7 Jul 2010 11:02:47 -0600, Andreas Dilger <andreas.dilger(a)oracle.com> wrote:
> On 2010-07-07, at 09:05, J. Bruce Fields wrote:
> > On Wed, Jul 07, 2010 at 01:40:53AM -0600, Andreas Dilger wrote:
> >> On 2010-07-06, at 11:09, Aneesh Kumar K. V wrote:
> >>> Since we know that system wide file handle should include a file system
> >>> identifier and a file identifier my plan was to retrieve both in the
> >>> same syscall.
> >>
> >> Won't having it be in a separate system call be racy w.r.t. doing the pathname lookup twice?
> >
> > It'll be rare that a server will want to *just* get a filehandle;
> > normally it will at least want to get some attributes at the same time.
> > So I think it will always need to open the file first and then do the
> > rest of the operations on the returned filehandle.
>
> I think you are assuming too much about the use of the file handle.
> What I'm interested in is not a userspace file server, but rather a
> more efficient way to have 10000's to millions of clients to be able
> to open the same regular file, without having to do full path
> traversal for each one.

With the suggested syscall approach, the client that does the path
traversal can do:

  fd = open(name);
  file_identifier = fd_to_handle(fd);
  fs_identifier = fd_to_fshandle(fd);
  close(fd);

>
> >>> That still leaves the problem that there isn't always an underlying
> >>> block device, and/or when there is it doesn't always uniquely specify
> >>> the filesystem.
> >>
> >> And for this reason we would need this as a syscall right ?
> >
> > That's the only solution I see.  (Or use an xattr?)
>
> Or... return the UUID as part of the file handle in the first place.
> That avoids races, avoids adding more syscalls that have to be called
> for each file handle, or IMNSHO the worst proposal that requires
> applications to parse a text file in some obscure path for each file
> handle (requiring a stat() to find the major/minor device of the file,
> walking through /proc or /sys, and other nastiness).

I would also like to get both the file system identifier and the file
identifier in a single call. That would also mean that, instead of the
above sequence of four calls, we can do:

  file_handle = name_to_handle(name);

-aneesh
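For comparison, the interface eventually merged into Linux 2.6.39 as name_to_handle_at(2) (wrapped by glibc 2.14 and later) collapses the four-call sequence into one call that returns the file handle together with a mount ID matching the first field of /proc/self/mountinfo. A minimal sketch; path_to_handle is a hypothetical helper name, and filesystems without export support make the call fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Return a malloc'd handle for path and store the mount ID in *mount_id,
 * or return NULL with errno set (e.g. ENOENT, EOPNOTSUPP). */
struct file_handle *path_to_handle(const char *path, int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (fh == NULL)
        return NULL;

    /* On input handle_bytes is the buffer size; on success the kernel
     * overwrites it with the number of bytes actually used. */
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        int saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

Note that the mount ID still has to be mapped to a persistent filesystem identity (for example through /proc/self/mountinfo), so the merged API is closer to the mnt_id idea discussed further down the thread than to a handle that embeds the UUID.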
From: Andreas Dilger on 8 Jul 2010 00:40

On 2010-07-07, at 12:05, Nick Piggin wrote:
> On Wed, Jul 07, 2010 at 11:02:47AM -0600, Andreas Dilger wrote:
>> I think you are assuming too much about the use of the file handle.  What I'm interested in is not a userspace file server, but rather a more efficient way to have 10000's to millions of clients to be able to open the same regular file, without having to do full path traversal for each one.
>
> Really? What kind of clients? What sort of speedups do you hope to see?
> Path traversal can get vastly cheaper in both single threaded and parallel
> cases with my locking changes.

This is for Lustre clients, but really any kind of network filesystem is equally affected.  This isn't really an issue of the local dcache performance, but rather network latency for each component of the path traversal, and for a large number of clients the metadata server is the bottleneck for doing the traversal.

> It is not acceptable to work around fixable deficiencies in our critical
> infrastructure like path walking with hacks like this. If path walking
> is still much too expensive, that's another story...

Two different problems, I'm afraid.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
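A rough count of what Andreas is pointing at, under a simplifying assumption that is not stated in the thread: every client misses in its own cache, each of the d path components costs one lookup RPC at the metadata server, and the final open costs one more RPC whether it is done by name or by handle. For N clients:

```latex
R_{\text{path}} = N\,(d + 1), \qquad
R_{\text{handle}} = N, \qquad
\frac{R_{\text{path}}}{R_{\text{handle}}} = d + 1
% e.g. N = 10^6 clients, d = 5 components:
% R_path = 6 \times 10^6 RPCs versus R_handle = 10^6 RPCs
```

This ignores client-side caching and protocol optimizations such as combining the last lookup with the open, so it is an illustration of the scaling, not a measurement.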
From: Aneesh Kumar K. V on 8 Jul 2010 06:50

On Thu, 8 Jul 2010 08:21:43 +1000, Neil Brown <neilb(a)suse.de> wrote:
> On Wed, 7 Jul 2010 10:45:11 -0400
> "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
>
> > On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > > descriptor or pathname without races?
> > > > >
> > > > > You can do stat/fstat to find out the device number (which is unique,
> > > > > but not persistent)
> > > >
> > > > Is it really unique over time?  (Can't a given st_dev value map to one
> > > > filesystem now, and another later?)
> > >
> > > It's unique at a single point in time.  But if you have a reference
> > > (e.g. open file descriptor) on the mount then that's not a problem.
> > >
> > >   fd = open(path, ...);
> > >   fstat(fd, &st);
> > >   search st.st_dev in mountinfo
> > >   close(fd)
> > >
> > > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > > filesystems will not be found in mountinfo, but the reference is still
> > > there so st_dev will not be reused for other filesystems).
> >
> > OK, cool.
> >
> > That still leaves the problem that there isn't always an underlying
> > block device, and/or when there is it doesn't always uniquely specify
> > the filesystem.
>
> It doesn't matter if there is an underlying block device, or if it is shared
> among subvolumes.
> st_dev is *the* primary key for filesystems.  Every "struct super_block" has a
> unique s_dev and that is returned in st_dev.
>
> For "traditional" filesystem, this is the major/minor number of the block
> device.
> For NFS and btrfs and other filesystems which don't have exclusive use of a
> block device, 'set_anon_super' is used to get a unique s_dev based on a major
> number of '0'.
>
> So you can *always* use st_dev as an identifier for the filesystem which is
> stable and unique as long as you hold an active reference to the filesystem
> (open file descriptor, cwd in fs, etc).
>
> If you poll(2) /proc/mounts to get notifications of changes to the mount
> table, then it should be quite easy to cache st-dev -> uuid mappings in a
> race-free way.
>
> There might be value in getting name_to_handle to return the st_dev of the
> target file to ensure that you haven't unexpectedly crossed into a different
> filesystem.  I would prefer that to returning a uuid: st_dev is guaranteed
> to be unique, a uuid is only supposed to be unique (i.e. that is not
> enforced).

How about adding mnt_id to the handle? The Documentation file says it is
unique:

  (1) mount ID: unique identifier of the mount (may be reused after umount)

I also updated /proc/self/mountinfo to carry an optional uuid field. With
the patch below I get this in /proc/self/mountinfo:

  13 1 253:0 / / rw,relatime,uuid:9b5af62a-a34a-43f6-a5bb-1cc22d97e862 - ext3 /dev/root rw,errors=continue,barrier=0,data=writeback

and the handle returns the value 13 in the mnt_id field. We should be able
to look up mountinfo by mnt_id and find the corresponding uuid.
diff --git a/fs/namespace.c b/fs/namespace.c
index 88058de..498bd9a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -871,6 +871,9 @@ static int show_mountinfo(struct seq_file *m, void *v)
 	if (IS_MNT_UNBINDABLE(mnt))
 		seq_puts(m, " unbindable");
 
+	/* print the uuid */
+	seq_printf(m, ",uuid:%pU", mnt->mnt_sb->s_uuid);
+
 	/* Filesystem specific data */
 	seq_puts(m, " - ");
 	show_type(m, sb);
diff --git a/fs/open.c b/fs/open.c
index 23d05d3..13d426e 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1092,6 +1092,8 @@ static long do_sys_name_to_handle(struct path *path,
 	handle_size *= sizeof(u32);
 	handle->handle_type = retval;
 	handle->handle_size = handle_size;
+	/* copy the mount id */
+	handle->mnt_id = path->mnt->mnt_id;
 	if (handle_size > f_handle.handle_size) {
 		/*
 		 * set the handle_size to zero so we copy only
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ffcb9bf..5f43472 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -952,6 +952,7 @@ struct file {
 };
 
 struct file_handle {
+	int mnt_id;
 	int handle_size;
 	int handle_type;
 	/* file identifier */

-aneesh
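Combining Neil's and Aneesh's suggestions, a userspace consumer could look up the UUID for a handle's mnt_id in mountinfo and drop any cached mnt_id (or st_dev) to UUID mapping whenever the mount table changes. A minimal sketch under stated assumptions: the ",uuid:" tag is printed only by the patch above (mainline mountinfo has no such field), the helper names are hypothetical, and it relies on Linux raising POLLPRI on an open mounts/mountinfo descriptor after a mount or unmount, as proc(5) documents for /proc/[pid]/mounts.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MI_UUID_LEN 36   /* "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" */

/* Copy the UUID that the proposed patch appends as ",uuid:<uuid>" to the
 * options of mount mnt_id into out. Returns 0 on success, -1 if the
 * mount or the tag is absent (mainline kernels do not print this tag). */
int mountinfo_uuid_for_id(FILE *mountinfo, int mnt_id,
                          char out[MI_UUID_LEN + 1])
{
    char *line = NULL;
    size_t cap = 0;
    int rc = -1;

    while (getline(&line, &cap, mountinfo) != -1) {
        int id;
        if (sscanf(line, "%d", &id) != 1 || id != mnt_id)
            continue;
        /* Search only the per-mount part, before the " - " separator. */
        char *sep = strstr(line, " - ");
        if (sep != NULL)
            *sep = '\0';
        char *tag = strstr(line, ",uuid:");
        if (tag != NULL && strlen(tag + 6) >= MI_UUID_LEN) {
            memcpy(out, tag + 6, MI_UUID_LEN);
            out[MI_UUID_LEN] = '\0';
            rc = 0;
        }
        break;   /* mount IDs are unique, so stop at the first match */
    }
    free(line);
    return rc;
}

/* Returns 1 if the mount table changed since the last check (POLLPRI or
 * POLLERR reported), 0 if not, -1 on error. A cache of mount -> uuid
 * mappings should be flushed when this returns 1. */
int mount_table_changed(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}

/* Descriptor to keep open for mount_table_changed(). */
int open_mount_table(void)
{
    return open("/proc/self/mountinfo", O_RDONLY | O_CLOEXEC);
}
```

With the mountinfo line shown in the message above, looking up mount ID 13 would yield 9b5af62a-a34a-43f6-a5bb-1cc22d97e862.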