From: Neil Brown on 7 Jul 2010 18:30

On Wed, 7 Jul 2010 10:45:11 -0400
"J. Bruce Fields" <bfields(a)fieldses.org> wrote:

> On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > descriptor or pathname without races?
> > > >
> > > > You can do stat/fstat to find out the device number (which is unique,
> > > > but not persistent)
> > >
> > > Is it really unique over time? (Can't a given st_dev value map to one
> > > filesystem now, and another later?)
> >
> > It's unique at a single point in time. But if you have a reference
> > (e.g. open file descriptor) on the mount then that's not a problem.
> >
> >     fd = open(path, ...);
> >     fstat(fd, &st);
> >     search st.st_dev in mountinfo
> >     close(fd)
> >
> > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > filesystems will not be found in mountinfo, but the reference is still
> > there so st_dev will not be reused for other filesystems).
>
> OK, cool.
>
> That still leaves the problem that there isn't always an underlying
> block device, and/or when there is it doesn't always uniquely specify
> the filesystem.

It doesn't matter if there is an underlying block device, or if it is shared
among subvolumes.
st_dev is *the* primary key for filesystems. Every "struct super_block" has a
unique s_dev and that is returned in st_dev.

For a "traditional" filesystem, this is the major/minor number of the block
device.
For NFS and btrfs and other filesystems which don't have exclusive use of a
block device, 'set_anon_super' is used to get a unique s_dev based on a major
number of '0'.

So you can *always* use st_dev as an identifier for the filesystem which is
stable and unique as long as you hold an active reference to the filesystem
(open file descriptor, cwd in fs, etc).

If you poll(2) /proc/mounts to get notifications of changes to the mount
table, then it should be quite easy to cache st_dev -> uuid mappings in a
race-free way.

There might be value in getting name_to_handle to return the st_dev of the
target file to ensure that you haven't unexpectedly crossed into a different
filesystem. I would prefer that to returning a uuid: st_dev is guaranteed
to be unique, a uuid is only supposed to be unique (i.e. that is not
enforced).

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
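For readers following along, the open/fstat/search-mountinfo sequence quoted above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function names mountinfo_line_matches_dev() and mountinfo_line_for_path() are hypothetical helpers, not anything proposed in the thread, and the sketch assumes the third field of each /proc/self/mountinfo line is the MAJOR:MINOR pair that corresponds to st_dev.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if a mountinfo line
 * ("ID PARENT MAJOR:MINOR ROOT MOUNTPOINT ...") names the given device.
 */
int mountinfo_line_matches_dev(const char *line, unsigned int maj,
                               unsigned int min)
{
    int id, parent;
    unsigned int lmaj, lmin;

    assert(line != NULL);
    if (sscanf(line, "%d %d %u:%u", &id, &parent, &lmaj, &lmin) != 4)
        return 0;
    return lmaj == maj && lmin == min;
}

/*
 * Hypothetical helper: copy the first mountinfo line whose device number
 * equals the st_dev of 'path' into 'out'. Returns 0 on success, -1 if the
 * path cannot be opened or no line matches (e.g. a lazily unmounted fs).
 */
int mountinfo_line_for_path(const char *path, char *out, size_t outlen)
{
    struct stat st;
    char line[4096];   /* assumption: lines fit; longer ones get split */
    FILE *f;
    int fd, found = -1;

    assert(path != NULL && out != NULL && outlen > 0);

    /* The open fd is the reference that keeps st_dev from being reused. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) == 0 &&
        (f = fopen("/proc/self/mountinfo", "r")) != NULL) {
        while (fgets(line, sizeof line, f) != NULL) {
            if (mountinfo_line_matches_dev(line, major(st.st_dev),
                                           minor(st.st_dev))) {
                snprintf(out, outlen, "%s", line);
                found = 0;
                break;
            }
        }
        fclose(f);
    }

    close(fd);   /* drop the reference only after the search is done */
    return found;
}
```

A caller would pass a path and a buffer, e.g. mountinfo_line_for_path("/", buf, sizeof buf), and then pick whatever fields it needs out of the returned line. With bind mounts several lines can carry the same device number; this sketch simply returns the first.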
From: J. Bruce Fields on 7 Jul 2010 18:30

On Thu, Jul 08, 2010 at 08:21:43AM +1000, Neil Brown wrote:
> On Wed, 7 Jul 2010 10:45:11 -0400
> "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
>
> > On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > > It's unique at a single point in time. But if you have a reference
> > > (e.g. open file descriptor) on the mount then that's not a problem.
> > >
> > >     fd = open(path, ...);
> > >     fstat(fd, &st);
> > >     search st.st_dev in mountinfo
> > >     close(fd)
> > >
> > > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > > filesystems will not be found in mountinfo, but the reference is still
> > > there so st_dev will not be reused for other filesystems).
> >
> > OK, cool.
> >
> > That still leaves the problem that there isn't always an underlying
> > block device, and/or when there is it doesn't always uniquely specify
> > the filesystem.
>
> It doesn't matter if there is an underlying block device, or if it is shared
> among subvolumes.
> st_dev is *the* primary key for filesystems. Every "struct super_block" has a
> unique s_dev and that is returned in st_dev.
>
> For "traditional" filesystem, this is the major/minor number of the block
> device.
> For NFS and btrfs and other filesystems which don't have exclusive use of a
> block device, 'set_anon_super' is used to get a unique s_dev based on a major
> number of '0'.

Whoops, OK, thanks for the explanation.

--b.

> So you can *always* use st_dev as an identifier for the filesystem which is
> stable and unique as long as you hold an active reference to the filesystem
> (open file descriptor, cwd in fs, etc).
>
> If you poll(2) /proc/mounts to get notifications of changes to the mount
> table, then it should be quite easy to cache st_dev -> uuid mappings in a
> race-free way.
>
> There might be value in getting name_to_handle to return the st_dev of the
> target file to ensure that you haven't unexpectedly crossed into a different
> filesystem. I would prefer that to returning a uuid: st_dev is guaranteed
> to be unique, a uuid is only supposed to be unique (i.e. that is not
> enforced).
>
> NeilBrown
From: Neil Brown on 8 Jul 2010 01:10

On Wed, 7 Jul 2010 18:03:36 -0600
Andreas Dilger <andreas.dilger(a)oracle.com> wrote:

> On 2010-07-07, at 16:21, Neil Brown wrote:
> > It doesn't matter if there is an underlying block device, or if it is shared
> > among subvolumes. st_dev is *the* primary key for filesystems. Every
> > "struct super_block" has a unique s_dev and that is returned in st_dev.
> >
> > For "traditional" filesystem, this is the major/minor number of the block
> > device.
> > For NFS and btrfs and other filesystems which don't have exclusive use of a
> > block device, 'set_anon_super' is used to get a unique s_dev based on a major
> > number of '0'.
>
> But the major/minor number returned is essentially random between different
> clients, so there is no way to use it on another node that is accessing the
> same filesystem. Conversely, the UUID will be the same on all of the clients.
>
> > So you can *always* use st_dev as an identifier for the filesystem which is
> > stable and unique as long as you hold an active reference to the filesystem
> > (open file descriptor, cwd in fs, etc).
>
> Only on a single system.

Well, the system call only runs on a single system. If you want a
cluster-unique name, get the cluster software to generate it or enforce it.
Performing a mapping is not hard.

> > If you poll(2) /proc/mounts to get notifications of changes to the mount
> > table, then it should be quite easy to cache st_dev -> uuid mappings in a
> > race-free way.
>
> This sounds unpleasant for any application to implement. It might be OK for
> a user-space NFS/CIFS server, but it is complex and error-prone for any
> normal usage, and doesn't seem like a good API design to me.

Define "normal usage" for filehandle-based lookup ???

Identifying a filesystem by st_dev is completely reliable. That is a good
start for API design. Identifying by UUID is not unless uniqueness is
enforced, and ....

> > There might be value in getting name_to_handle to return the st_dev of the
> > target file to ensure that you haven't unexpectedly crossed into a different
> > filesystem. I would prefer that to returning a uuid: st_dev is guaranteed
> > to be unique, a uuid is only supposed to be unique (i.e. that is not
> > enforced).
>
> UUID duplication (w.r.t. multiple mounts of the same underlying device)
> doesn't matter at all for regular file opens, where the only interest is
> getting a handle for the inode. I wouldn't be against requiring the UUID be
> unique if that was needed, or failing regular opens in the rare case that
> there is a non-unique UUID pointing to different devices, or failing
> directory opens for the case of multiple mountpoints.

It has already been said that requiring uuids to be unique breaks current
practice (involving mounting dm snapshots of active filesystems).

Failing legitimate syscalls in rare circumstances sounds like bad API design
to me.

NeilBrown

>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
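Since the disagreement above is partly about how hard the poll(2) approach is to implement, here is a rough C sketch of the notification half. It relies on the behaviour documented in proc(5), where a mount or unmount marks a polled /proc/mounts descriptor with a priority event (POLLPRI, and POLLERR on later kernels). The names open_mount_table() and wait_for_mount_change() are hypothetical, and rebuilding the st_dev -> uuid cache after a change is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: open the mount table; the caller close()s the fd. */
int open_mount_table(void)
{
    return open("/proc/mounts", O_RDONLY);
}

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for the mount
 * table behind 'fd' to change. Returns 1 if it changed, 0 on timeout,
 * -1 on error (including EINTR, which a real caller may want to retry).
 */
int wait_for_mount_change(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int r;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLPRI;   /* POLLERR is reported without being requested */
    pfd.revents = 0;

    r = poll(&pfd, 1, timeout_ms);
    if (r < 0)
        return -1;
    if (r == 0)
        return 0;
    if (pfd.revents & POLLNVAL)
        return -1;
    return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}
```

A cache built on this would re-read the mount table each time wait_for_mount_change() returns 1 and replace its mappings wholesale; entries for filesystems on which the application still holds a reference stay valid in the meantime, per the argument earlier in the thread.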
From: Neil Brown on 8 Jul 2010 08:30

On Thu, 08 Jul 2010 16:10:09 +0530
"Aneesh Kumar K. V" <aneesh.kumar(a)linux.vnet.ibm.com> wrote:

> On Thu, 8 Jul 2010 08:21:43 +1000, Neil Brown <neilb(a)suse.de> wrote:
> > On Wed, 7 Jul 2010 10:45:11 -0400
> > "J. Bruce Fields" <bfields(a)fieldses.org> wrote:
> >
> > > On Wed, Jul 07, 2010 at 03:35:50PM +0200, Miklos Szeredi wrote:
> > > > On Wed, 7 Jul 2010, J. Bruce Fields wrote:
> > > > > > > If you use sys or proc, is it possible to get the uuid from a file
> > > > > > > descriptor or pathname without races?
> > > > > >
> > > > > > You can do stat/fstat to find out the device number (which is unique,
> > > > > > but not persistent)
> > > > >
> > > > > Is it really unique over time? (Can't a given st_dev value map to one
> > > > > filesystem now, and another later?)
> > > >
> > > > It's unique at a single point in time. But if you have a reference
> > > > (e.g. open file descriptor) on the mount then that's not a problem.
> > > >
> > > >     fd = open(path, ...);
> > > >     fstat(fd, &st);
> > > >     search st.st_dev in mountinfo
> > > >     close(fd)
> > > >
> > > > is effectively the same as a getuuid(path) syscall (lazy unmounted
> > > > filesystems will not be found in mountinfo, but the reference is still
> > > > there so st_dev will not be reused for other filesystems).
> > >
> > > OK, cool.
> > >
> > > That still leaves the problem that there isn't always an underlying
> > > block device, and/or when there is it doesn't always uniquely specify
> > > the filesystem.
> >
> > It doesn't matter if there is an underlying block device, or if it is shared
> > among subvolumes.
> > st_dev is *the* primary key for filesystems. Every "struct super_block" has a
> > unique s_dev and that is returned in st_dev.
> >
> > For "traditional" filesystem, this is the major/minor number of the block
> > device.
> > For NFS and btrfs and other filesystems which don't have exclusive use of a
> > block device, 'set_anon_super' is used to get a unique s_dev based on a major
> > number of '0'.
> >
> > So you can *always* use st_dev as an identifier for the filesystem which is
> > stable and unique as long as you hold an active reference to the filesystem
> > (open file descriptor, cwd in fs, etc).
> >
> > If you poll(2) /proc/mounts to get notifications of changes to the mount
> > table, then it should be quite easy to cache st_dev -> uuid mappings in a
> > race-free way.
> >
> > There might be value in getting name_to_handle to return the st_dev of the
> > target file to ensure that you haven't unexpectedly crossed into a different
> > filesystem. I would prefer that to returning a uuid: st_dev is guaranteed
> > to be unique, a uuid is only supposed to be unique (i.e. that is not
> > enforced).
>
> How about adding mnt_id to the handle? The documentation file says it is
> unique:
>
> (1) mount ID: unique identifier of the mount (may be reused after umount)
>
> I also updated /proc/self/mountinfo to carry the optional uuid field.
> With the below patch I get in /proc/self/mountinfo
>
> 13 1 253:0 / / rw,relatime,uuid:9b5af62a-a34a-43f6-a5bb-1cc22d97e862 - ext3 /dev/root rw,errors=continue,barrier=0,data=writeback
>
> And the handle returns the value 13 in the mnt_id field. We should be able
> to look up mountinfo with mnt_id and find the corresponding uuid.
>

That sounds good. mnt_id will even let you know if you have crossed a --bind
mount, which st_dev wouldn't. That may not always be useful, but it is good
to have it.

NeilBrown
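The mnt_id -> uuid lookup described above could look roughly like the following C sketch. It assumes the "uuid:" mount option exactly as it appears in the sample mountinfo line quoted from the patch; that format is part of the proposal under discussion, so a given kernel may not emit it. The function name uuid_from_mountinfo_line() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if 'line' is the mountinfo entry for mount ID
 * 'mnt_id' and its mount options (6th field) contain "uuid:<value>",
 * copy <value> into 'uuid' and return 1. Otherwise return 0.
 */
int uuid_from_mountinfo_line(const char *line, int mnt_id,
                             char *uuid, size_t len)
{
    char opts[512];
    const char *p;
    int id;

    assert(line != NULL && uuid != NULL && len > 0);

    /* Fields: ID PARENT MAJOR:MINOR ROOT MOUNTPOINT OPTIONS ... */
    if (sscanf(line, "%d %*d %*s %*s %*s %511s", &id, opts) != 2)
        return 0;
    if (id != mnt_id)
        return 0;

    /* Walk the comma-separated options looking for the assumed "uuid:". */
    p = opts;
    while (p != NULL && *p != '\0') {
        if (strncmp(p, "uuid:", 5) == 0) {
            size_t n = strcspn(p + 5, ",");

            if (n >= len)
                return 0;   /* caller's buffer is too small */
            memcpy(uuid, p + 5, n);
            uuid[n] = '\0';
            return 1;
        }
        p = strchr(p, ',');
        if (p != NULL)
            p++;
    }
    return 0;
}
```

An application would read /proc/self/mountinfo line by line and call this with the mnt_id it got back alongside the handle, stopping at the first line for which it returns 1.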