block: fix leaks associated with discard request payload [Kernel]

Prev: [PATCH] vhost: break out of polling loop on error
Next: block: defer the use of inline biovecs for discard requests

From: Martin K. Petersen on 28 Jun 2010 13:20

>>>>> "James" == James Bottomley <James.Bottomley(a)suse.de> writes:

James> I really hate these growing contortions for discard. They're a
James> clear signal that we haven't implemented it right.

James> So let's first work out how it should be done. I really like
James> Tomo's idea of doing discard through the normal REQ_TYPE_FS
James> route, which means we can control the setup in prep and the tear
James> down in done, all confined to the ULD.

Yeah, this is what I was trying to do a couple of months ago: make
discard and WRITE SAME filesystem-class requests so we can split,
merge, etc. like READs and WRITEs. I still think this is how we should
do it, but it's a lot of work.

There are several challenges involved. I was doing the "payload"
allocation at request allocation time by permitting a buffer trailing
struct request (size defined by ULD depending on req type). However, we
have a few places in the stack where we memcpy requests and assume them
to be the same size. That needs to be fixed. That's also the roadblock
I ran into wrt. 32-byte CDB allocation so for that I ended up allocating
the command in sd.
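
To illustrate the trailing-buffer idea and why fixed-size copies get in
the way, here is a minimal userspace sketch. It is not kernel code:
struct demo_request, demo_request_alloc() and demo_request_clone() are
hypothetical names made up for this example, and the real struct request
looks nothing like this. The point is only that once the payload trails
the structure, any copy has to use the full allocated size rather than
sizeof() of the struct.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct request; NOT the kernel's definition. */
struct demo_request {
    int cmd_type;              /* illustrative request type */
    size_t payload_len;        /* bytes available in payload[] */
    unsigned char payload[];   /* trailing buffer, sized per request type */
};

/* Allocate a request with payload_len zeroed bytes trailing the struct. */
static struct demo_request *demo_request_alloc(int cmd_type, size_t payload_len)
{
    struct demo_request *rq = calloc(1, sizeof(*rq) + payload_len);

    if (!rq)
        return NULL;
    rq->cmd_type = cmd_type;
    rq->payload_len = payload_len;
    return rq;
}

/*
 * Copy a request including its trailing payload. A plain
 * memcpy(dst, src, sizeof(*src)) would silently drop the payload,
 * which is the "assume them to be the same size" problem.
 */
static struct demo_request *demo_request_clone(const struct demo_request *src)
{
    size_t total = sizeof(*src) + src->payload_len;
    struct demo_request *dst = malloc(total);

    if (dst)
        memcpy(dst, src, total);
    return dst;
}
```

A caller would allocate with the payload size its request type needs
(for example 24 bytes for a single-descriptor UNMAP parameter list) and
release both the original and any clone with free().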

Also, another major headache of mine is WRITE SAME/UNMAP to DSM TRIM
conversion. Because of the limitations of the TRIM command format a
single WRITE SAME can turn into effectively hundreds of TRIM commands to
be issued. I tried to limit this by using UNMAP translation instead.
But we can still get into cases where we need to either loop or allocate
a bunch of TRIMs in the translation layer. That leaves two options:
either pass really conservative limits up the stack and loop up there,
or deal with the allocation/translation stuff at the bottom of the pile.
None of my attempts in these departments turned out to be very nice.
I'm still dreaming of the day where libata moves out from under SCSI so
we don't have to translate square pegs into round holes...
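
The arithmetic behind the TRIM blow-up can be sketched roughly as
follows. The sketch assumes the usual DSM TRIM range entry layout (an
8-byte entry with the LBA in bits 47:0 and a 16-bit sector count in bits
63:48, so 64 entries fit in a 512-byte payload block). The helper names
are hypothetical and this is not the libata translation code.

```c
#include <stdint.h>

#define TRIM_MAX_SECTORS_PER_RANGE 65535u  /* limit of the 16-bit count field */
#define TRIM_RANGES_PER_BLOCK      64u     /* 512-byte block / 8-byte entry */

/* Number of range entries needed to cover nr_sectors contiguous sectors. */
static uint64_t trim_ranges_needed(uint64_t nr_sectors)
{
    return (nr_sectors + TRIM_MAX_SECTORS_PER_RANGE - 1) /
           TRIM_MAX_SECTORS_PER_RANGE;
}

/* Number of 512-byte payload blocks needed to hold those entries. */
static uint64_t trim_payload_blocks_needed(uint64_t nr_sectors)
{
    uint64_t ranges = trim_ranges_needed(nr_sectors);

    return (ranges + TRIM_RANGES_PER_BLOCK - 1) / TRIM_RANGES_PER_BLOCK;
}

/*
 * Pack one range entry as a host-order 64-bit value: LBA in bits 47:0,
 * sector count in bits 63:48. Converting to the on-the-wire byte order
 * is left out of this sketch.
 */
static uint64_t trim_pack_range(uint64_t lba, uint16_t count)
{
    return ((uint64_t)count << 48) | (lba & 0xFFFFFFFFFFFFULL);
}
```

With a 32-bit block count of 0xFFFFFFFF, trim_ranges_needed() gives
65537 entries and trim_payload_blocks_needed() gives 1025 blocks. If
each TRIM command carried only a single 512-byte block (an assumption
for illustration), that is on the order of a thousand commands for one
request.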

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Christoph Hellwig on 30 Jun 2010 04:50

On Wed, Jun 30, 2010 at 11:32:43AM +0300, Boaz Harrosh wrote:
> May I ask a silly question? Why the dynamic allocation?
>
> Why not have a const-static single global page at the block-layer somewhere
> that will be used for all discard-type operations and be done with it once and
> for all. A single page can be used for any size bio, any number of concurrent
> discards, any ZERO needed operation. It can also be used by other operations
> like padding and others. In fact isn't there one for the libata padding?

For UNMAP we need to write into the payload. And for ATA TRIM we need
to write into the WRITE SAME payload. That's another layering violation
for those looking for them, btw.


From: Christoph Hellwig on 30 Jun 2010 06:50

On Wed, Jun 30, 2010 at 01:25:01PM +0300, Boaz Harrosh wrote:
> OK, thanks, I see. Is it one of these operations (like we have in OSD) where
> the CDB information spills into the payload? Like the scatter-gather and extent
> lists and such.

For UNMAP the payload is a list of block number / length pairs, while
the CDB itself doesn't contain any information like that. It's a rather
awkward command.
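
As a rough illustration of that layout, the sketch below fills an UNMAP
parameter list as described in SBC-3: an 8-byte header followed by
16-byte block descriptors (8-byte LBA, 4-byte block count, 4 reserved
bytes), all big-endian. struct demo_extent and
unmap_build_param_list() are hypothetical names for this example; this
is not the sd driver code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extent type for this example. */
struct demo_extent {
    uint64_t lba;        /* first block to unmap */
    uint32_t nr_blocks;  /* number of blocks to unmap */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill buf with an UNMAP parameter list for nr_ext extents.
 * Returns the number of bytes written, or 0 if the list does not fit
 * in buf or in the 16-bit length fields.
 */
static size_t unmap_build_param_list(uint8_t *buf, size_t buf_len,
                                     const struct demo_extent *ext,
                                     size_t nr_ext)
{
    size_t desc_len, total, i;

    if (nr_ext > (0xFFFF - 6) / 16)
        return 0;
    desc_len = nr_ext * 16;
    total = 8 + desc_len;
    if (total > buf_len)
        return 0;

    put_be16(buf, (uint16_t)(total - 2));   /* UNMAP DATA LENGTH */
    put_be16(buf + 2, (uint16_t)desc_len);  /* BLOCK DESCRIPTOR DATA LENGTH */
    memset(buf + 4, 0, 4);                  /* reserved */

    for (i = 0; i < nr_ext; i++) {
        uint8_t *d = buf + 8 + i * 16;

        put_be64(d, ext[i].lba);
        put_be32(d + 8, ext[i].nr_blocks);
        memset(d + 12, 0, 4);               /* reserved */
    }
    return total;
}
```

Since every byte that describes the range lives in this buffer, each
command needs its own writable payload, which is why a single shared
read-only page does not work here.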

> Do we actually use a WRITE_SAME which is not zero? For what use?

The kernel doesn't issue any WRITE SAME without the unmap bit set.
