From: Boaz Harrosh on 30 Jun 2010 06:30
On 06/30/2010 11:42 AM, Christoph Hellwig wrote:
> On Wed, Jun 30, 2010 at 11:32:43AM +0300, Boaz Harrosh wrote:
>> May I ask a silly question? Why the dynamic allocation?
>>
>> Why not have a const-static single global page at the block-layer somewhere
>> that will be used for all discard-type operations and be done with it once and
>> for all. A single page can be used for any size bio , any number of concurrent
>> discards, any ZERO needed operation. It can also be used by other operations
>> like padding and others. In fact isn't there one for the libsata padding?
>
> for UNMAP we need to write into the payload. And for ATA TRIM we need
> to write into the WRITE SAME payload.

OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
the CDB information spills into the payload? like the scatter-gather and extent
lists and such.

Do we actually use a WRITE_SAME which is not zero? for what use?

> That's another layering violation
> for those looking for them, btw..
>

Agreed

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Boaz Harrosh on 30 Jun 2010 07:00
On 06/30/2010 01:41 PM, Christoph Hellwig wrote:
> On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
>> OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
>> the CDB information spills into the payload? like the scatter-gather and extent
>> lists and such.
>
> For UNMAP the payload is a list of block number / length pairs, while
> the CDB itself doesn't contain any information like that. It's a rather
> awkward command.
>

How big can that be? could we, maybe, use the sense_buffer, properly allocated
already?

>> Do we actually use a WRITE_SAME which is not zero? for what use?
>
> The kernel doesn't issue any WRITE SAME without the unmap bit set.

So if the unmap bit is set then the page can just be zero, right?

I still think a static zero-page is a worth while optimization. And
block-drivers can take care with special needs with a private mem_pool
or something. For the discard-type user and generic block layer the
page is just an implementation specific residue, No?

But don't mind me, I'm just babbling. Not that I'll do anything about it.

Boaz

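To make the "list of block number / length pairs" concrete, and to give a rough answer to "how big can that be?", here is a minimal sketch of how such an UNMAP payload could be packed. It is an illustration only, not code from the thread or the kernel. The layout (an 8-byte header followed by 16-byte descriptors, all big-endian) is my reading of the SCSI block command set and should be checked against the spec; the helper name build_unmap_payload is hypothetical.

```python
import struct

# Assumed layout (verify against SBC-3): an 8-byte header followed by
# one 16-byte block descriptor per extent, all fields big-endian.
HEADER = struct.Struct(">HH4x")      # data length, descriptor length, reserved
DESCRIPTOR = struct.Struct(">QI4x")  # LBA, number of blocks, reserved


def build_unmap_payload(extents):
    """Pack (lba, nr_blocks) pairs into an UNMAP parameter list.

    Hypothetical helper for illustration; not an existing API.
    """
    desc_len = DESCRIPTOR.size * len(extents)
    # The first length field counts the bytes that follow that field itself,
    # i.e. the rest of the header (6 bytes) plus all descriptors.
    header = HEADER.pack(HEADER.size - 2 + desc_len, desc_len)
    body = b"".join(DESCRIPTOR.pack(lba, n) for lba, n in extents)
    return header + body


payload = build_unmap_payload([(2048, 8)])
print(len(payload))  # 8-byte header + one 16-byte descriptor
```

Under these assumptions the payload grows as 8 + 16 * n bytes for n extents, so a single-extent discard needs 24 bytes. Because the buffer carries the LBA and length, it has to be written per command, which is the point made above about UNMAP not being able to share one constant zero page.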
From: FUJITA Tomonori on 30 Jun 2010 08:00
On Mon, 28 Jun 2010 17:25:36 +0200
Christoph Hellwig <hch(a)lst.de> wrote:

> On Mon, Jun 28, 2010 at 05:14:28PM +0900, FUJITA Tomonori wrote:
> > > While I see the problems with leaking ressources in that case I still
> > > can't quite explain the hang I see.
> >
> > Any way to reproduce the hang without ssd drives?
>
> Actually the SSDs don't fully hang, they just causes lots of I/O errors
> and hit the error handler hard. The hard hang is when running under
> qemu. Apply the patch below, then create an if=scsi drive that resides
> on an XFS filesystem, and you'll have scsi TP support in the guest:

Ok, I figured out what's wrong.

As I suspected, it's due to the partial completion.

qemu scsi driver tells that the WRITE_SAME command was successful but
somehow the command has resid. So we retry it again and again (and
leak some memory).

I don't know yet why qemu scsi driver is broken. Maybe there is a bug
in it or converting discard to FS sends broken commands to the driver.

I'll try to figure out it tomorrow.

I've put a patch to complete discard command in the all-or-nothing
manner:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git discard

At least, the guest kernel doesn't hang for me.

From: Mike Snitzer on 30 Jun 2010 08:20
On Wed, Jun 30 2010 at 6:57am -0400,
Boaz Harrosh <bharrosh(a)panasas.com> wrote:

> On 06/30/2010 01:41 PM, Christoph Hellwig wrote:
> > On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
> >> OK, Thanks, I see. Is it one of these operations, (like we have in OSD) where
> >> the CDB information spills into the payload? like the scatter-gather and extent
> >> lists and such.
> >
> > For UNMAP the payload is a list of block number / length pairs, while
> > the CDB itself doesn't contain any information like that. It's a rather
> > awkward command.
> >
>
> How big can that be? could we, maybe, use the sense_buffer, properly allocated
> already?
>
> >> Do we actually use a WRITE_SAME which is not zero? for what use?
> >
> > The kernel doesn't issue any WRITE SAME without the unmap bit set.
>
> So if the unmap bit is set then the page can just be zero, right?
>
> I still think a static zero-page is a worth while optimization. And
> block-drivers can take care with special needs with a private mem_pool
> or something. For the discard-type user and generic block layer the
> page is just an implementation specific residue, No?

Why should the block layer have any role in managing this page? Block
layer doesn't care about it, SCSI does.

Mike

From: FUJITA Tomonori on 1 Jul 2010 00:30
On Wed, 30 Jun 2010 20:55:09 +0900
FUJITA Tomonori <fujita.tomonori(a)lab.ntt.co.jp> wrote:

> On Mon, 28 Jun 2010 17:25:36 +0200
> Christoph Hellwig <hch(a)lst.de> wrote:
>
> > On Mon, Jun 28, 2010 at 05:14:28PM +0900, FUJITA Tomonori wrote:
> > > > While I see the problems with leaking ressources in that case I still
> > > > can't quite explain the hang I see.
> > >
> > > Any way to reproduce the hang without ssd drives?
> >
> > Actually the SSDs don't fully hang, they just causes lots of I/O errors
> > and hit the error handler hard. The hard hang is when running under
> > qemu. Apply the patch below, then create an if=scsi drive that resides
> > on an XFS filesystem, and you'll have scsi TP support in the guest:
>
> Ok, I figured out what's wrong.
>
> As I suspected, it's due to the partial completion.
>
> qemu scsi driver tells that the WRITE_SAME command was successful but
> somehow the command has resid. So we retry it again and again (and
> leak some memory).
>
> I don't know yet why qemu scsi driver is broken. Maybe there is a bug
> in it or converting discard to FS sends broken commands to the driver.

looks like your qemu WRITE_SAME patch isn't completed :)

You implement WRITE_SAME as if it doesn't do any data transfer. So
qemu scsi driver gets resid.

The reason why WRITE_SAME works now is that scsi-ml doesn't care about
resid with PC commands but it cares with FS commands.

I confirmed that qemu scsi driver gets the identical command with both
PC and FS commands and qemu calls xfsctl.

> I've put a patch to complete discard command in the all-or-nothing
> manner:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git discard

Seems that I finished discard FS conversion. I'll update it on the top
of James' uprep patchset soon.