From: Russell King - ARM Linux on 7 Jul 2010 19:10

On Wed, Jul 07, 2010 at 03:44:27PM -0700, Zach Pfeffer wrote:
> The DMA API handles the allocation and use of DMA channels. It can
> configure physical transfer settings, manage scatter-gather lists,
> etc.

You're confused about what the DMA API is. You're talking about the
DMA engine subsystem (drivers/dma), not the DMA API (see
Documentation/DMA-API.txt, include/linux/dma-mapping.h, and
arch/arm/include/asm/dma-mapping.h).

> The VCM allows all device buffers to be passed between all devices in
> the system without passing those buffers through each domain's
> API. This means that instead of writing code to interoperate between
> DMA engines, IOMMU mapped spaces, CPUs and physically addressed
> devices the user can simply target a device with a buffer using the
> same API regardless of how that device maps or otherwise accesses the
> buffer.

With the DMA API, if we have an SG list which refers to the physical
pages (as a struct page, offset, length tuple), the DMA API takes care
of dealing with CPU caches and IOMMUs to make the data in the buffer
visible to the target device. It provides you with a set of cookies
referring to the SG lists, which may be coalesced if the IOMMU can do
so.

If you have a kernel virtual address, the DMA API has single-buffer
mapping/unmapping functions to do the same thing, and they provide you
with a cookie to pass to the device to refer to that buffer.

These cookies are whatever the device needs to be able to access the
buffer - for instance, if system SDRAM is located at 0xc0000000
virtual, 0x80000000 physical and 0x40000000 as far as the DMA device
is concerned, then the cookie for a buffer at 0xc0000000 virtual will
be 0x40000000 and not 0x80000000.
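To make the cookie example above concrete, here is a minimal sketch of
the single-buffer path; the device, buffer and transfer direction are
hypothetical placeholders, while dma_map_single()/dma_unmap_single()
are the DMA API calls being described:

	#include <linux/dma-mapping.h>

	/* Hypothetical driver fragment: map one kernel buffer for
	 * device-bound DMA and obtain the bus-address cookie. */
	static int example_tx(struct device *dev, void *buf, size_t len)
	{
		dma_addr_t cookie;

		/* The cookie is whatever the device needs - e.g.
		 * 0x40000000 for a buffer at 0xc0000000 virtual, not
		 * the 0x80000000 physical address. */
		cookie = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, cookie))
			return -ENOMEM;

		/* ... program 'cookie' into the device and run the
		 * transfer; the CPU must not touch 'buf' while it is
		 * mapped ... */

		dma_unmap_single(dev, cookie, len, DMA_TO_DEVICE);
		return 0;
	}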
From: Russell King - ARM Linux on 14 Jul 2010 18:10

On Wed, Jul 14, 2010 at 01:11:49PM -0700, Zach Pfeffer wrote:
> If the DMA-API contained functions to allocate virtual space separate
> from physical space and reworked how chained buffers functioned it
> would probably work - but then things start to look like the VCM API
> which does graph based map management.

Every additional virtual mapping of a physical buffer results in
additional cache aliases on aliasing caches, and more work for
developers to sort out the cache aliasing issues.

What does VCM do to mitigate that?
From: Russell King - ARM Linux on 15 Jul 2010 05:00

On Wed, Jul 14, 2010 at 06:29:58PM -0700, Zach Pfeffer wrote:
> The VCM ensures that all mappings that map a given physical buffer:
> IOMMU mappings, CPU mappings and one-to-one device mappings all map
> that buffer using the same (or compatible) attributes. At this point
> the only attribute that users can pass is CACHED. In the absence of
> CACHED all accesses go straight through to the physical memory.

So what you're saying is that if I have a buffer in kernel space whose
virtual address I already have, I can pass this to VCM and tell it
!CACHED, and it'll set up another mapping which is not cached for me?

You are aware that multiple V:P mappings for the same physical page
with different attributes are being outlawed with ARMv6 and ARMv7 due
to speculative prefetching. The cache can be searched even for a
mapping specified as 'normal, uncached', and you can get cache hits
because the data has been speculatively loaded through a separate
cached mapping of the same physical page.

FYI, during the next merge window, I will be pushing a patch which
makes ioremap() of system RAM fail, which should be the last core code
creator of mappings with different memory types. This behaviour has
been outlawed (as unpredictable) in the architecture specification and
does cause problems on some CPUs.

We've also the issue of multiple mappings with differing cache
attributes which needs addressing too...
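As a sketch of the distinction being drawn here (the device pointer
and the size are placeholder assumptions, not from the original mail):
ioremap() on kernel-managed RAM creates exactly the kind of second,
differently-attributed V:P mapping described above, whereas
dma_alloc_coherent() returns memory whose mapping attributes are
consistent from the start:

	#include <linux/dma-mapping.h>

	/* Illustrative only: get 1MB that is safe to access uncached. */
	static void *example_get_buffer(struct device *dev,
					dma_addr_t *handle)
	{
		/*
		 * What is being outlawed: if 'phys' is system RAM, the
		 * kernel already maps it cached, so
		 *
		 *	void __iomem *va = ioremap(phys, 1024 * 1024);
		 *
		 * would create a second V:P mapping with different
		 * attributes - unpredictable on ARMv6/v7, and it will
		 * simply fail once ioremap() rejects kernel-managed RAM.
		 *
		 * The DMA API route avoids the conflict altogether:
		 */
		return dma_alloc_coherent(dev, 1024 * 1024, handle,
					  GFP_KERNEL);
	}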
From: Russell King - ARM Linux on 16 Jul 2010 04:10

On Thu, Jul 15, 2010 at 08:48:36PM -0400, Tim HRM wrote:
> Interesting, since I seem to remember the MSM devices mostly conduct
> IO through regions of normal RAM, largely accomplished through
> ioremap() calls.
>
> Without more public domain documentation of the MSM chips and AMSS
> interfaces I wouldn't know how to avoid this, but I can imagine it
> creates a bit of urgency for Qualcomm developers as they attempt to
> upstream support for this most interesting SoC.

The patch has been out for RFC since early April on the
linux-arm-kernel mailing list (Subject: [RFC] Prohibit ioremap() on
kernel managed RAM), and no comments have come back from Qualcomm
folk.

The restriction on the creation of multiple V:P mappings with
differing attributes is also fairly hard to miss in the ARM
architecture specification when reading the sections about caches.
From: Russell King - ARM Linux on 19 Jul 2010 04:30
On Wed, Jul 14, 2010 at 06:41:48PM -0700, Zach Pfeffer wrote:
> On Thu, Jul 15, 2010 at 08:07:28AM +0900, FUJITA Tomonori wrote:
> > Why do we need a new abstraction layer to solve the problem that
> > the current API can handle?
>
> The current API can't really handle it because the DMA API doesn't
> separate buffer allocation from buffer mapping.

That's not entirely correct. The DMA API provides two things:

1. An API for allocating DMA coherent buffers
2. An API for mapping streaming buffers

Some implementations of (2) end up using (1) to work around broken
hardware - but that's a separate problem (and causes its own set of
problems.)

> For instance: I need 10, 1 MB physical buffers and a 64 KB physical
> buffer. With the DMA API I need to allocate 10*1MB/PAGE_SIZE + 64
> KB/PAGE_SIZE scatterlist elements, fix them all up to follow the
> chaining specification and then go through all of them again to fix
> up their virtual mappings for the mapper that's mapping the physical
> buffer.

You're making it sound like extremely hard work.

	struct scatterlist *sg;
	void *buf;
	size_t len;
	int i, nents = 11;

	sg = kmalloc(sizeof(*sg) * nents, GFP_KERNEL);
	if (!sg)
		return -ENOMEM;

	sg_init_table(sg, nents);
	for (i = 0; i < nents; i++) {
		if (i != nents - 1)
			len = 1048576;
		else
			len = 64 * 1024;
		buf = alloc_buffer(len);	/* placeholder allocator */
		sg_set_buf(&sg[i], buf, len);
	}

There's no need to split the scatterlist elements up into individual
pages - the block layer doesn't do that when it passes scatterlists
down to block device drivers.

I'm not saying that it's reasonable to pass (or even allocate) a 1MB
buffer via the DMA API.

> If I want to share the buffer with another device I have to
> make a copy of the entire thing then fix up the virtual mappings for
> the other device I'm sharing with.

This is something the DMA API doesn't do - probably because there
hasn't been a requirement for it.

One of the issues for drivers is that by separating the mapped
scatterlist from the input buffer scatterlist, it creates something
else for them to allocate, which causes an additional failure point -
and as all users sit well with the current API, there's little reason
to change, especially given the number of drivers which would need to
be updated.

What you can do is:

	struct map {
		dma_addr_t addr;
		size_t len;
	};

	int map_sg(struct device *dev, struct scatterlist *list,
		   unsigned int nents, struct map *map,
		   enum dma_data_direction dir)
	{
		struct scatterlist *sg;
		unsigned int i, j = 0;

		for_each_sg(list, sg, nents, i) {
			map[j].addr = dma_map_page(dev, sg_page(sg),
						   sg->offset,
						   sg->length, dir);
			map[j].len = sg->length;
			if (dma_mapping_error(dev, map[j].addr))
				break;
			j++;
		}
		return j;
	}

	void unmap(struct device *dev, struct map *map,
		   unsigned int nents, enum dma_data_direction dir)
	{
		while (nents) {
			dma_unmap_page(dev, map->addr, map->len, dir);
			map++;
			nents--;
		}
	}

Note: this may not be portable to all architectures. It may also
break if there's something like the dmabounce or swiotlb code
remapping buffers which don't fit the DMA mask for the device -
that's a different problem.

You can then map the same scatterlist into multiple different 'map'
arrays for several devices simultaneously. What you can't do is
access the buffers from the CPU while they're mapped to any device.

I'm not saying that you should do the above - I'm just proving that
it's not as hard as you seem to be making out.
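As a hypothetical usage of the map_sg()/unmap() sketch above (dev_a,
dev_b, sg and nents are assumed to come from the surrounding driver
context), mapping one scatterlist for two devices at once:

	struct map map_a[11], map_b[11];
	int na, nb;

	/* Each device gets its own array of cookies for the same
	 * underlying pages. */
	na = map_sg(dev_a, sg, nents, map_a, DMA_TO_DEVICE);
	nb = map_sg(dev_b, sg, nents, map_b, DMA_TO_DEVICE);
	if (na != nents || nb != nents) {
		/* Partial failure: tear down whatever did get mapped. */
		unmap(dev_b, map_b, nb, DMA_TO_DEVICE);
		unmap(dev_a, map_a, na, DMA_TO_DEVICE);
		return -ENOMEM;
	}

	/* ... hand the map_a[] cookies to dev_a and map_b[] to dev_b;
	 * the CPU must leave the buffers alone until both mappings
	 * are torn down ... */

	unmap(dev_b, map_b, nb, DMA_TO_DEVICE);
	unmap(dev_a, map_a, na, DMA_TO_DEVICE);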