From: David Cross on 13 Aug 2010 15:10

On Fri, 2010-08-13 at 14:50 -0400, Christoph Hellwig wrote:
> On Fri, Aug 13, 2010 at 11:43:12AM -0700, David Cross wrote:
> > OK, I am trying to answer all questions on this topic that I am getting,
> > but honestly I have not gotten a lot so far.
>
> Seriously, I think there is absolutely no point in even arguing this.
> A driver has absolutely no business looking into any filesystem layout.

I think you misunderstand: the driver is not looking into the file
system layout. The driver is requesting that the file system allocate a
file and report where the file was allocated. This is the reason for the
fat_get_block call.

> > 1) receive all data, buffer it, then write the file. This is typically a
> > slow process USB->Processor->SDRAM->Processor/DMA engine->Media
> > 2) pre-allocate the file as soon as it knows that it is coming and how
> > big it is, and then send the block addresses to an external DMA engine
> > and let it transfer the data from the MTP host directly
> >
> > The West Bridge driver goes for option two for performance reasons. In
> > doing this, it needs to get information from the file system on where to
> > store the file.
>
> And what if someone else changes the layout underneath you?

I am not sure I understand the question. The point of making the call
into the filesystem rather than reading sectors and "looking into the
file system layout" is to make sure that the filesystem is the arbiter
of all storage requests. How would someone change the layout underneath
me?

> Or uses
> a different filesystem?

That is a good question; I think it is the same one that Greg KH had. To
restate the answer from the last email, I don't know how to handle this
case. I am sure that it is possible to develop a method to pre-allocate
using other filesystems, but I have not looked into it yet, as the
driver was developed for systems which have removable storage. Removable
storage needs to work with card readers, and as such it uses FAT.

> Basically you will have to introduce operations to the VFS to lock down
> a file against layout changes and do DMA transfers. It's fine if you
> only implement it for fat in the beginning, and just return an error
> for others, although in general the implementation would be easily
> extendable to other filesystems using generic code and the get_blocks
> callbacks.

So, your basic concern is whether or not someone else tries to write to
the same file while the transfer is ongoing, correct? I understand the
implementation idea you are proposing to get around this issue, but it
seems that the simplest solution is to punt to the application handling
the transfer (i.e., don't access the pre-allocated file until you know
the transfer is complete). This is the basic idea of how it has been
done in previous implementations, so I am not sure why this is not a
potential solution in this case.

Thanks,
David

---------------------------------------------------------------
This message and any attachments may contain Cypress (or its
subsidiaries) confidential information. If it has been received
in error, please advise the sender and immediately delete this message.
---------------------------------------------------------------

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: David Cross on 13 Aug 2010 15:20

On Fri, 2010-08-13 at 12:01 -0700, Greg KH wrote:
> On Fri, Aug 13, 2010 at 11:43:12AM -0700, David Cross wrote:
> > On Fri, 2010-08-13 at 10:54 -0700, Greg KH wrote:
> > > On Fri, Aug 13, 2010 at 10:45:39AM -0700, David Cross wrote:
> > > > Hello Hirofumi,
> > > > I would like to export this symbol from the vfat code and add this patch
> > > > to the Linux kernel. You are listed as the MAINTAINER for FAT and VFAT.
> > > > As such, I need your approval to do it.
> > > > The reason that I need to export this symbol is to allow for file based
> > > > DMA in which the file needs to be pre-allocated. The pre-allocated block
> > > > addresses are passed to a DMA engine which then directly transfers the
> > > > data to non-volatile storage.
> > > > Please let me know if you have any objections to exporting this symbol
> > > > or if you need additional information from me concerning this change
> > > > request. I am happy to answer any questions that you may have about this
> > > > in the meantime.
> > >
> > > Wait, _I_ have an objection to exporting this symbol.
> > >
> > > I really don't think it is needed at this point in time, as we really
> > > need to figure out exactly what your driver is doing to want to need to
> > > call this function.
> >
> > OK, I am trying to answer all questions on this topic that I am getting,
> > but honestly I have not gotten a lot so far.
>
> That's because not many people have noticed the code yet :)
>
> > > So please, let's not export it for now, when we get to the point that we
> > > all agree that this driver is doing the "correct" thing, then we can
> > > export the symbol, ok?
> >
> > Sure, I think that the driver is doing the correct thing, but that is
> > mostly because I have not found another way to solve this issue.
>
> What specifically is the issue here that you are trying to solve?

We are trying to solve the issue of all data needing to go through the
processor, even when its ultimate destination is non-volatile media.

> And why has no other driver ever needed this type of functionality
> before?

I am not sure, but my guess is because this represents a novel solution to
a performance problem.

> > > thanks,
> >
> > > Also, why do this at the FAT level, and not more correctly at the block
> > > layer level?
> > The short answer is that the block level does not have the file system
> > awareness needed to determine where a file should be stored on the NVM
> > device.
> > The longer answer is to consider the case of an MTP host, which operates
> > at the file level. It sends the mobile (Linux) device "SendObject" which
> > tells it to store the file on its file system. When the device receives
> > this, it has two options:
> > 1) receive all data, buffer it, then write the file. This is typically a
> > slow process USB->Processor->SDRAM->Processor/DMA engine->Media
> > 2) pre-allocate the file as soon as it knows that it is coming and how
> > big it is, and then send the block addresses to an external DMA engine
> > and let it transfer the data from the MTP host directly
>
> We have a userspace MTP driver for Linux, using gadgetfs, right? So
> none of this is applicable from what I can tell.

Yes, the g_mtp development has started, but it is not integrated yet
last I checked. Most of the applications for this driver have used
gadgetfs as well in order to handle the protocol itself. So, I think it
is applicable.

> > The West Bridge driver goes for option two for performance reasons. In
> > doing this, it needs to get information from the file system on where to
> > store the file.
>
> Look at how Linux already handles MTP for how to handle this properly, I
> don't think there is any need to do any of this from within the kernel.

I am somewhat familiar with how Linux handles MTP. The current model is
CPU-centric and all data goes through the main processor from what I
have seen. This is a working solution, but not always a very fast one. I
agree though that this would not need to be done within the kernel if we
had a userspace method for file allocation and commitment.

> > > What happens if this isn't a FAT partition on the device?
> > Good question. So far, it has been stored on a FAT partition in most use
> > cases because the user typically wants the option to enumerate the
> > device as mass storage as well or be able to see content on a PC if the
> > SD card is removed. However, there is no reason that this could not be
> > done with ext2 or other filesystems on non-removable media.
>
> Like NTFS? How are you going to handle that when you run out of size
> due to the limitations of FAT? Hint, that's not going to work with this
> type of solution...

Isn't this also a userspace problem? When I run out of space on my Linux
machine, the message "no space left on device" pops up. Why is this
solution any more prone to size limitations compared with any other?

> > This would require some work to figure out how to implement a
> > pre-allocation interface with it. Alternatively a more general
> > pre-allocation interface could be added (falloc()) to implemented file
> > systems to make this process easier.
>
> Userspace handles this quite easily already, see above.

I don't see how userspace handles this from the comments above. I do
understand that there is a userspace MTP driver, but I don't see a
method for pre-allocation of files from the information above. Am I
missing something?

Thanks,
David

From: Greg KH on 13 Aug 2010 15:30

On Fri, Aug 13, 2010 at 12:17:39PM -0700, David Cross wrote:
> On Fri, 2010-08-13 at 12:01 -0700, Greg KH wrote:
> > On Fri, Aug 13, 2010 at 11:43:12AM -0700, David Cross wrote:
> > > On Fri, 2010-08-13 at 10:54 -0700, Greg KH wrote:
> > > > On Fri, Aug 13, 2010 at 10:45:39AM -0700, David Cross wrote:
> > > > > Hello Hirofumi,
> > > > > I would like to export this symbol from the vfat code and add this patch
> > > > > to the Linux kernel. You are listed as the MAINTAINER for FAT and VFAT.
> > > > > As such, I need your approval to do it.
> > > > > The reason that I need to export this symbol is to allow for file based
> > > > > DMA in which the file needs to be pre-allocated. The pre-allocated block
> > > > > addresses are passed to a DMA engine which then directly transfers the
> > > > > data to non-volatile storage.
> > > > > Please let me know if you have any objections to exporting this symbol
> > > > > or if you need additional information from me concerning this change
> > > > > request. I am happy to answer any questions that you may have about this
> > > > > in the meantime.
> > > >
> > > > Wait, _I_ have an objection to exporting this symbol.
> > > >
> > > > I really don't think it is needed at this point in time, as we really
> > > > need to figure out exactly what your driver is doing to want to need to
> > > > call this function.
> > >
> > > OK, I am trying to answer all questions on this topic that I am getting,
> > > but honestly I have not gotten a lot so far.
> >
> > That's because not many people have noticed the code yet :)
> >
> > > > So please, let's not export it for now, when we get to the point that we
> > > > all agree that this driver is doing the "correct" thing, then we can
> > > > export the symbol, ok?
> > >
> > > Sure, I think that the driver is doing the correct thing, but that is
> > > mostly because I have not found another way to solve this issue.
> >
> > What specifically is the issue here that you are trying to solve?
> We are trying to solve the issue of all data needing to go through the
> processor, even when its ultimate destination is non-volatile media.
>
> > And why has no other driver ever needed this type of functionality
> > before?
> I am not sure, but my guess is because this represents a novel solution to
> a performance problem.

What exactly are the performance issues with doing this from userspace,
vs. the FAT hack?

> > We have a userspace MTP driver for Linux, using gadgetfs, right? So
> > none of this is applicable from what I can tell.
> Yes, the g_mtp development has started, but it is not integrated yet
> last I checked. Most of the applications for this driver have used
> gadgetfs as well in order to handle the protocol itself. So, I think it
> is applicable.

No, there's another MTP stack already released that works just fine on
Linux. You can find it at:
	http://wiki.meego.com/Buteo

> > > The West Bridge driver goes for option two for performance reasons. In
> > > doing this, it needs to get information from the file system on where to
> > > store the file.
> >
> > Look at how Linux already handles MTP for how to handle this properly, I
> > don't think there is any need to do any of this from within the kernel.
> I am somewhat familiar with how Linux handles MTP. The current model is
> CPU-centric and all data goes through the main processor from what I
> have seen. This is a working solution, but not always a very fast one. I
> agree though that this would not need to be done within the kernel if we
> had a userspace method for file allocation and commitment.

Again, what's wrong with using the processor here? What else does it
have to do at this point in time?

> > > > What happens if this isn't a FAT partition on the device?
> > > Good question. So far, it has been stored on a FAT partition in most use
> > > cases because the user typically wants the option to enumerate the
> > > device as mass storage as well or be able to see content on a PC if the
> > > SD card is removed. However, there is no reason that this could not be
> > > done with ext2 or other filesystems on non-removable media.
> >
> > Like NTFS? How are you going to handle that when you run out of size
> > due to the limitations of FAT? Hint, that's not going to work with this
> > type of solution...
> Isn't this also a userspace problem? When I run out of space on my Linux machine,
> the message "no space left on device" pops up. Why is this solution any
> more prone to size limitations compared with any other?

No, my point is that for larger disk sizes, you can't use FAT, you have
to use NTFS to be interoperable with other operating systems. Your
solution will not handle that jump to larger storage sizes as you are
relying on FAT.

> > > This would require some work to figure out how to implement a
> > > pre-allocation interface with it. Alternatively a more general
> > > pre-allocation interface could be added (falloc()) to implemented file
> > > systems to make this process easier.
> >
> > Userspace handles this quite easily already, see above.
> I don't see how userspace handles this from the comments above. I do
> understand that there is a userspace MTP driver, but I don't see a
> method for pre-allocation of files from the information above. Am I
> missing something?

Yes, the pre-allocation is done in userspace, and then the data is
copied to the filesystem. The kernel doesn't have to have any
filesystem specific hacks in it to try to handle this at all.

Take a look at the above link for what you might want to do instead.
Because of this, I'm guessing that a lot of this code can be removed
from the driver, right?

thanks,

greg k-h

From: David Cross on 13 Aug 2010 16:40

On Fri, 2010-08-13 at 12:28 -0700, Greg KH wrote:
> On Fri, Aug 13, 2010 at 12:17:39PM -0700, David Cross wrote:
> > On Fri, 2010-08-13 at 12:01 -0700, Greg KH wrote:
> > > On Fri, Aug 13, 2010 at 11:43:12AM -0700, David Cross wrote:
> > > > On Fri, 2010-08-13 at 10:54 -0700, Greg KH wrote:
> > > > > On Fri, Aug 13, 2010 at 10:45:39AM -0700, David Cross wrote:
> > > > > > Hello Hirofumi,
> > > > > > I would like to export this symbol from the vfat code and add this patch
> > > > > > to the Linux kernel. You are listed as the MAINTAINER for FAT and VFAT.
> > > > > > As such, I need your approval to do it.
> > > > > > The reason that I need to export this symbol is to allow for file based
> > > > > > DMA in which the file needs to be pre-allocated. The pre-allocated block
> > > > > > addresses are passed to a DMA engine which then directly transfers the
> > > > > > data to non-volatile storage.
> > > > > > Please let me know if you have any objections to exporting this symbol
> > > > > > or if you need additional information from me concerning this change
> > > > > > request. I am happy to answer any questions that you may have about this
> > > > > > in the meantime.
> > > > >
> > > > > Wait, _I_ have an objection to exporting this symbol.
> > > > >
> > > > > I really don't think it is needed at this point in time, as we really
> > > > > need to figure out exactly what your driver is doing to want to need to
> > > > > call this function.
> > > >
> > > > OK, I am trying to answer all questions on this topic that I am getting,
> > > > but honestly I have not gotten a lot so far.
> > >
> > > That's because not many people have noticed the code yet :)
> > >
> > > > > So please, let's not export it for now, when we get to the point that we
> > > > > all agree that this driver is doing the "correct" thing, then we can
> > > > > export the symbol, ok?
> > > >
> > > > Sure, I think that the driver is doing the correct thing, but that is
> > > > mostly because I have not found another way to solve this issue.
> > >
> > > What specifically is the issue here that you are trying to solve?
> > We are trying to solve the issue of all data needing to go through the
> > processor, even when its ultimate destination is non-volatile media.
> >
> > > And why has no other driver ever needed this type of functionality
> > > before?
> > I am not sure, but my guess is because this represents a novel solution to
> > a performance problem.
>
> What exactly are the performance issues with doing this from userspace,
> vs. the FAT hack?

Usually it takes a lot longer. West Bridge can do MTP transfers at the
performance of the storage device. Sending the file data through the
processor is typically much slower.

> > > We have a userspace MTP driver for Linux, using gadgetfs, right? So
> > > none of this is applicable from what I can tell.
> > Yes, the g_mtp development has started, but it is not integrated yet
> > last I checked. Most of the applications for this driver have used
> > gadgetfs as well in order to handle the protocol itself. So, I think it
> > is applicable.
>
> No, there's another MTP stack already released that works just fine on
> Linux. You can find it at:
> 	http://wiki.meego.com/Buteo

Thanks, I have seen this as well. This is not a driver though, it is an
MTP protocol stack. This is similar to the applications I have worked
with. The driver is not attempting to replace either the protocol stack
or the use of gadgetfs. All that it is providing is a gadget peripheral
controller driver (that can be used with gadgetfs) along with the
ability to perform pre-allocation and allow for direct transfer.

I re-checked this stack once again to make sure that it had not
fundamentally changed and it seems not to have. What it uses is a
storageserver abstraction to the file system. At the low level this is
still operating on files at the level of open(), read(), write(),
close(). There is no alloc() in the list that I can see. So, I agree
that there is a working stack. As you can tell, the driver is not
attempting to re-create or replace this working stack.

> > > > The West Bridge driver goes for option two for performance reasons. In
> > > > doing this, it needs to get information from the file system on where to
> > > > store the file.
> > >
> > > Look at how Linux already handles MTP for how to handle this properly, I
> > > don't think there is any need to do any of this from within the kernel.
> > I am somewhat familiar with how Linux handles MTP. The current model is
> > CPU-centric and all data goes through the main processor from what I
> > have seen. This is a working solution, but not always a very fast one. I
> > agree though that this would not need to be done within the kernel if we
> > had a userspace method for file allocation and commitment.
>
> Again, what's wrong with using the processor here? What else does it
> have to do at this point in time?

Judging by the current batch of Android phones: run a video
conference, update a user's Twitter page, take high resolution
photographs, get live stock updates via desktop widget, receive a
phone call, play back YouTube, stream Pandora, manage media content,
post a new profile picture on Facebook, get corporate email, etc.
I am sure we can both come up with many more examples.

> > > > > What happens if this isn't a FAT partition on the device?
> > > > Good question. So far, it has been stored on a FAT partition in most use
> > > > cases because the user typically wants the option to enumerate the
> > > > device as mass storage as well or be able to see content on a PC if the
> > > > SD card is removed. However, there is no reason that this could not be
> > > > done with ext2 or other filesystems on non-removable media.
> > >
> > > Like NTFS? How are you going to handle that when you run out of size
> > > due to the limitations of FAT? Hint, that's not going to work with this
> > > type of solution...
> > Isn't this also a userspace problem? When I run out of space on my Linux machine,
> > the message "no space left on device" pops up. Why is this solution any
> > more prone to size limitations compared with any other?
>
> No, my point is that for larger disk sizes, you can't use FAT, you have
> to use NTFS to be interoperable with other operating systems. Your
> solution will not handle that jump to larger storage sizes as you are
> relying on FAT.

This is so far not an issue for removable media. Do I really need to
handle NTFS interoperability now?
If so, do you agree with Christoph's feedback concerning the
implementation? Could I add hooks to other file systems and leave them
unpopulated?

> > > > This would require some work to figure out how to implement a
> > > > pre-allocation interface with it. Alternatively a more general
> > > > pre-allocation interface could be added (falloc()) to implemented file
> > > > systems to make this process easier.
> > >
> > > Userspace handles this quite easily already, see above.
> > I don't see how userspace handles this from the comments above. I do
> > understand that there is a userspace MTP driver, but I don't see a
> > method for pre-allocation of files from the information above. Am I
> > missing something?
>
> Yes, the pre-allocation is done in userspace, and then the data is
> copied to the filesystem then. The kernel doesn't have to have any
> filesystem specific hacks in it to try to handle this at all.

Where do you see pre-allocation done in the Buteo MTP stack? Looking at
the implementation, it appears to be allocated during write wherein a
data buffer and pointer is passed in.

> Take a look at the above link for what you might want to do instead.
> Because of this, I'm guessing that a lot of this code can be removed
> from the driver, right?

If there were a user space method to pre-allocate the file, it would
definitely trim down the ioctls in the gadget driver.
Instead of pre-allocating the file, we would just need to send down
the physical block numbers for the transfer destination. I am still
not seeing where this user space method exists though.

Thanks,
David

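Note on the "user space method to pre-allocate" question above: POSIX does
define posix_fallocate(), which Python exposes as os.posix_fallocate on
Unix. The sketch below is only an illustration of that call, not something
proposed in the thread. The function name preallocate and the file name are
hypothetical, and the fallback to ftruncate is an assumption added here for
platforms or filesystems where the call is unavailable. Whether a given
filesystem (FAT included) actually reserves blocks is filesystem-dependent
and is not established by this discussion. Note also that this reserves
space only; it does not report block addresses, which is the part the
West Bridge driver wants.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> int:
    """Reserve `size` bytes for `path` and return the resulting file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Ask the filesystem to allocate storage for bytes [0, size).
            os.posix_fallocate(fd, 0, size)
        except (AttributeError, OSError):
            # Fallback (assumption): only set the length. This may leave a
            # sparse file with no blocks actually reserved.
            os.ftruncate(fd, size)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical name for an incoming MTP object.
        target = os.path.join(tmp, "incoming_object.bin")
        print(preallocate(target, 4 * 1024 * 1024))
```

The caller would normally know the size up front, as in the MTP SendObject
case described earlier in the thread, and would write the payload into the
file afterwards through the usual read/write path.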
From: Greg KH on 13 Aug 2010 18:20

On Fri, Aug 13, 2010 at 01:32:15PM -0700, David Cross wrote:
> > What exactly are the performance issues with doing this from userspace,
> > vs. the FAT hack?
> Usually it takes a lot longer. West Bridge can do MTP transfers at the
> performance of the storage device. Sending the file data through the
> processor is typically much slower.

What is "slower" here? Please, real numbers.

> > > > We have a userspace MTP driver for Linux, using gadgetfs, right? So
> > > > none of this is applicable from what I can tell.
> > > Yes, the g_mtp development has started, but it is not integrated yet
> > > last I checked. Most of the applications for this driver have used
> > > gadgetfs as well in order to handle the protocol itself. So, I think it
> > > is applicable.
> >
> > No, there's another MTP stack already released that works just fine on
> > Linux. You can find it at:
> > 	http://wiki.meego.com/Buteo
> Thanks, I have seen this as well. This is not a driver though, it is an
> MTP protocol stack.

With a gadgetfs driver underneath, right? Or am I missing a piece here?

> This is similar to the applications I have worked
> with. The driver is not attempting to replace either the protocol stack
> or the use of gadgetfs. All that it is providing is a gadget peripheral
> controller driver (that can be used with gadgetfs) along with the
> ability to perform pre-allocation and allow for direct transfer.

It's that "pre-allocation" that is the issue.

> I re-checked this stack once again to make sure that it had not
> fundamentally changed and it seems not to have. What it uses is a
> storageserver abstraction to the file system. At the low level this is
> still operating on files at the level of open(), read(), write(),
> close(). There is no alloc() in the list that I can see. So, I agree
> that there is a working stack. As you can tell, the driver is not
> attempting to re-create or replace this working stack.

To "preallocate" a file, just open it and then mmap it and write away,
right? Why can't userspace do that?

> > > > > The West Bridge driver goes for option two for performance reasons. In
> > > > > doing this, it needs to get information from the file system on where to
> > > > > store the file.
> > > >
> > > > Look at how Linux already handles MTP for how to handle this properly, I
> > > > don't think there is any need to do any of this from within the kernel.
> > > I am somewhat familiar with how Linux handles MTP. The current model is
> > > CPU-centric and all data goes through the main processor from what I
> > > have seen. This is a working solution, but not always a very fast one. I
> > > agree though that this would not need to be done within the kernel if we
> > > had a userspace method for file allocation and commitment.
> >
> > Again, what's wrong with using the processor here? What else does it
> > have to do at this point in time?
> Judging by the current batch of Android phones: run a video
> conference, update a user's Twitter page, take high resolution
> photographs, get live stock updates via desktop widget, receive a
> phone call, play back YouTube, stream Pandora, manage media content,
> post a new profile picture on Facebook, get corporate email, etc.

All while trying to transfer a file to the device over the USB
connection? There's no reason you can't slow down the transfer if the
user is doing something else, right?

> I am sure we can both come up with many more examples.

I still fail to see the use-case, and as you haven't backed it up with
any real numbers, that's an issue.

> > > > > > What happens if this isn't a FAT partition on the device?
> > > > > Good question. So far, it has been stored on a FAT partition in most use
> > > > > cases because the user typically wants the option to enumerate the
> > > > > device as mass storage as well or be able to see content on a PC if the
> > > > > SD card is removed. However, there is no reason that this could not be
> > > > > done with ext2 or other filesystems on non-removable media.
> > > >
> > > > Like NTFS? How are you going to handle that when you run out of size
> > > > due to the limitations of FAT? Hint, that's not going to work with this
> > > > type of solution...
> > > Isn't this also a userspace problem? When I run out of space on my Linux machine,
> > > the message "no space left on device" pops up. Why is this solution any
> > > more prone to size limitations compared with any other?
> >
> > No, my point is that for larger disk sizes, you can't use FAT, you have
> > to use NTFS to be interoperable with other operating systems. Your
> > solution will not handle that jump to larger storage sizes as you are
> > relying on FAT.
> This is so far not an issue for removable media. Do I really need to
> handle NTFS interoperability now?

You can't create something that will not work for all filesystems.

> If so, do you agree with Christoph's feedback concerning the
> implementation? Could I add hooks to other file systems and leave them
> unpopulated?

ntfs is done by using a FUSE filesystem in userspace on a "raw" block
device. You can't put that type of support in the kernel here :)

> > Yes, the pre-allocation is done in userspace, and then the data is
> > copied to the filesystem then. The kernel doesn't have to have any
> > filesystem specific hacks in it to try to handle this at all.
> Where do you see pre-allocation done in the Buteo MTP stack? Looking at
> the implementation, it appears to be allocated during write wherein a
> data buffer and pointer is passed in.

And that's all that is needed, right?

> > Take a look at the above link for what you might want to do instead.
> > Because of this, I'm guessing that a lot of this code can be removed
> > from the driver, right?
> If there were a user space method to pre-allocate the file, it would
> definitely trim down the ioctls in the gadget driver.

Open a file, seek to the end, then mmap the whole thing. That's how
userspace has been doing this for a long time, right? I'm sure there
are other ways of doing it as well.

> Instead of pre-allocating the file, we would just need to send down
> the physical block numbers for the transfer destination. I am still
> not seeing where this user space method exists though.

Ick, no, you would never send down physical block numbers. Look at how
filesystems work from userspace, they achieve _very_ fast speeds due to
mmap and friends. Heck, some people try to get the OS out of the way
entirely by just using direct I/O, or talking to the raw block device.
Not by trying to allocate raw filesystem blocks from userspace, that way
lies madness.

thanks,

greg k-h
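The "open, size the file, mmap it, write away" approach described in this
message can be sketched from userspace roughly as follows. This is a
minimal illustration, not code from the thread: the function name
write_via_mmap and the file name are hypothetical, and os.ftruncate is
used here to set the final length up front in place of seeking to the end.
The data still passes through the CPU, which is exactly the trade-off
being argued over above.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, payload: bytes) -> None:
    """Size the file up front, map it, and copy `payload` into the mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if not payload:
            return  # an empty file cannot be mapped; nothing to write
        # Set the final length before mapping (stands in for "seek to the end").
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as view:
            view[:] = payload  # plain memory copy into the shared mapping
            view.flush()       # ask the kernel to write the dirty pages back
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical name for a received object.
        target = os.path.join(tmp, "received_object.bin")
        write_via_mmap(target, b"example payload " * 1024)
        print(os.path.getsize(target))
```

The filesystem decides where the blocks go; userspace never sees or
handles physical block numbers in this model.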