From: BubbaGump
On Sun, 22 Oct 2006 14:09:24 -0400, Mark Roddy <markr(a)hollistech.com>
wrote:

>On Fri, 20 Oct 2006 20:21:24 -0400, BubbaGump <> wrote:
>
>>I noticed in my DDK header files (3790.1830) that KeFlushIoBuffers()
>>is defined to do nothing, absolutely nothing. I know x86 caches are
>>already coherent with respect to DMA, but isn't there still the
>>possibility of out-of-order loads and stores? Is this a bug? Are
>>drivers supposed to do a KeMemoryBarrier() explicitly?
>>
>
>No, it isn't a bug. This is DMA, not processor load/store operations.
>You need a memory barrier only for shared memory regions that are not
>otherwise protected by a lock or an interlocked operation, not for DMA
>buffers.

Let's not answer the question of whether this is a bug yet. There's
more information to exchange.

DMA implies processor load/store operations, because the data involved
in a DMA either comes from or goes to the CPU. The buffer involved in
DMA is effectively a memory region shared between the CPU and a
device; the device is like another CPU. The only operation analogous
to releasing a lock that the device can see is the write to some "Go"
bit, and the barrier performed by the memory-mapped I/O write macros
comes after that write, whereas the barrier needed for the shared DMA
region must come before it.
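
To make that concrete, here is a minimal sketch of the sequence in C
(WDM style). The register name and buffer contents are hypothetical;
the point is only where the barrier has to sit:

    #include <wdm.h>

    VOID
    StartCommonBufferDmaOut(
        PUCHAR CommonBuffer,   /* kernel VA of the common buffer        */
        PULONG DmaGoRegister,  /* mapped VA of the device "Go" register */
        ULONG  Length
        )
    {
        /* Step 1: fill the shared buffer the device will read. */
        RtlFillMemory(CommonBuffer, Length, 0xAB);

        /*
         * Barrier BEFORE the doorbell write, so neither the compiler
         * nor the CPU can move the buffer stores past it. Any barrier
         * the WRITE_REGISTER_* macros perform comes after their own
         * store, which is too late to order the step-1 stores.
         */
        KeMemoryBarrier();

        /* Step 3: set the "Go" bit; the device then reads the buffer. */
        WRITE_REGISTER_ULONG(DmaGoRegister, 1);
    }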


>> the compiler's memory accesses might not
>>be ordered, so at least a compiler memory barrier is needed for DMA
>>out to a device.
>>
>
>Why? Is the DMA initiated without any locking operations? Is your
>concern here compiler optimizations requiring memory barriers (all of
>which are implicitly solved through a lock or interlocked operation) or
>CPU read/write reordering?

Yes, the DMA is initiated without any locking operations that would be
visible to the device, so no lock is present to solve problems with
compiler optimizations.

I am no longer concerned with CPU reordering, since I'm only looking
at the x86 version of KeFlushIoBuffers and someone pointed out that
x86 writes are always ordered, at least with respect to each other.
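
Since only compiler ordering is at stake on x86, a full fence isn't
required; a compiler-only barrier should do. As a sketch (assuming the
KeMemoryBarrierWithoutFence macro from wdm.h, which constrains the
compiler without emitting a fence instruction; the command-block
layout here is made up):

    #include <wdm.h>

    /* Hypothetical command block the device reads from the common
       buffer. */
    typedef struct _DEVICE_COMMAND {
        ULONG Command;
        ULONG Length;
    } DEVICE_COMMAND, *PDEVICE_COMMAND;

    #define CMD_TRANSMIT 1

    VOID
    QueueCommand(
        PDEVICE_COMMAND CommonBuffer, /* common-buffer virtual address */
        PULONG DmaGoRegister,         /* memory-mapped doorbell        */
        ULONG Length
        )
    {
        CommonBuffer->Command = CMD_TRANSMIT;
        CommonBuffer->Length  = Length;

        /*
         * x86 hardware already keeps these stores in order, so
         * stopping the compiler from reordering them is all that is
         * required here.
         */
        KeMemoryBarrierWithoutFence();

        WRITE_REGISTER_ULONG(DmaGoRegister, 1);
    }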


>>What about after a DMA? In order to account for speculative loads
>>during DMA in from a device, does the call to FlushAdapterBuffers()
>>have a memory barrier or should the driver also do a KeMemoryBarrier()
>>explicitly here?
>
>I'm confused. Why is the compiler doing speculative loads from the
>buffer that was the DMA target before the DMA completed? I don't
>believe this is a real-world example of compiler optimization
>requiring a memory barrier, nor is there a hardware coherency problem,
>as cache coherency rules will guarantee the contents of the buffer
>once the DMA is complete. At a minimum, the thread that is reading the
>contents of the DMA target buffer must wait on some lock for the DMA
>to complete, and that lock is a barrier.

I think I agree with this.
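
For what it's worth, the read side Mark describes might look like the
sketch below: the consuming thread blocks until the DMA-complete DPC
signals an event, and that wait/signal pair is itself the barrier, so
no load from the buffer can be hoisted above it. All names here are
hypothetical:

    #include <wdm.h>

    /* Hypothetical per-device context. */
    typedef struct _DEVICE_CONTEXT {
        KEVENT DmaDoneEvent;  /* KeInitializeEvent'd at device start */
        PUCHAR CommonBuffer;  /* DMA target                          */
        ULONG  BytesReceived;
    } DEVICE_CONTEXT, *PDEVICE_CONTEXT;

    /* Placeholder consumer; stands in for real processing. */
    static VOID
    ProcessData(PUCHAR Data, ULONG Length)
    {
        UNREFERENCED_PARAMETER(Data);
        UNREFERENCED_PARAMETER(Length);
    }

    /* Runs in the DPC fired when the device reports DMA completion
       (for packet-based DMA, FlushAdapterBuffers would already have
       been called by this point). */
    VOID
    OnDmaCompleteDpc(PDEVICE_CONTEXT Ctx)
    {
        KeSetEvent(&Ctx->DmaDoneEvent, IO_NO_INCREMENT, FALSE);
    }

    /* Consuming thread, at PASSIVE_LEVEL. */
    VOID
    ConsumeDmaBuffer(PDEVICE_CONTEXT Ctx)
    {
        /* The wait is the synchronization point; compiler and CPU
           alike must order the buffer loads after it. */
        KeWaitForSingleObject(&Ctx->DmaDoneEvent, Executive,
                              KernelMode, FALSE, NULL);

        ProcessData(Ctx->CommonBuffer, Ctx->BytesReceived);
    }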

From: BubbaGump
On Sun, 22 Oct 2006 14:16:43 -0400, Mark Roddy <markr(a)hollistech.com>
wrote:

>On Sat, 21 Oct 2006 18:15:17 -0400, BubbaGump <> wrote:
>
>>I'm thinking of a transfer of a buffer out to a device using common
>>buffer DMA:
>>
>> 1) driver writes to the common buffer
>> 2) driver calls KeFlushIoBuffers
>> (device has logical address of buffer from previous operation)
>> 3) driver writes "Go" bit of a device register
>> 4) device reads from the common buffer
>>
>
>
>"On x86-based, x64-based and Itanium-based hardware, reordering might
>take place when a write operation for one location precedes a read
>operation for a different location. Processor reordering might move
>the read operation ahead of the write operation on the same CPU, thus
>effectively reversing their order in code. These architectures do not
>reorder read operations followed by read operations or write
>operations followed by write operations"
>http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc#_Toc119927283
>
>Go re-read the posted-write buffer descriptions in the IA-32 specs.
>Write-write reordering would break all kinds of stuff. Your example is
>not valid. If the processor in step (3) initiated the DMA via a read
>operation (unlikely but possible) and did so without using the HAL
>READ_REGISTER functions or otherwise introducing a barrier, you might
>have a valid case.

There is another part of this thread where I agreed with someone that
x86 CPU reordering would not be an issue here, but compiler reordering
still might be. That possible compiler reordering of writes still
makes the example worthwhile.
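
To show the hazard in miniature (a deliberately broken sketch, not
anything the DDK suggests): if the doorbell were hit through an
ordinary non-volatile pointer, the compiler sees two stores through
pointers it may assume don't alias, and is free to emit the doorbell
store first, reversing steps (1) and (3) of the example.

    #include <wdm.h>

    VOID
    BrokenGo(PULONG CommonBuffer, PULONG GoRegister)
    {
        *CommonBuffer = 0xDEADBEEF; /* step 1: data for the device   */
        *GoRegister   = 1;          /* step 3: may be emitted first! */
    }

WRITE_REGISTER_ULONG avoids the aliasing half of the problem, but as
noted above its barrier sits after its own store, so an explicit
barrier before the doorbell write is still the safe pattern.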

(What you mention about reads is interesting. I hadn't thought about
that. I don't yet comprehend any new issues, but I'll keep it in mind
for later.)