From: Jonathan Morrison on 21 Oct 2006 23:13

I'll double-check this - but IIRC the 8.x compilers implement volatile with acquire/release semantics, which would make the code solid: the volatile write in WRITE_XXX would force a flush of all previous operations before it could publish its results. However (again IIRC), the previous compiler didn't do this, so it does seem that there could be some problem in that case. Let me ask around in compiler land and see if I can get a definitive answer. Thanks.

--
This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

<BubbaGump> wrote in message news:0fnlj25kc87dcsot6sjphvjak1rr3lvekr(a)4ax.com...
> Based on the non-cacheable ordering JM pointed out, I think the last
> statement I made below is false. The write of the logical address
> would probably be to a device register, and the register access macros
> like WRITE_REGISTER_ULONG might only use the volatile keyword or a
> compiler barrier (under the assumption that the memory-mapped I/O they
> will touch will be uncacheable and not require a CPU barrier). The
> volatile keyword doesn't serve as a barrier, since nonvolatile accesses
> can still be reordered around it, and the compiler barrier might need
> to be both a compiler and a CPU barrier, since the buffer to be DMA'd
> might be cached.
>
> My point is I don't think the register macros like
> WRITE_REGISTER_ULONG will compensate in all cases for the barrier that
> appears to be missing from KeFlushIoBuffers.
> On Sat, 21 Oct 2006 18:15:17 -0400, BubbaGump <> wrote:
>
>> I'm thinking of a transfer of a buffer out to a device using common
>> buffer DMA:
>>
>>  1) driver writes to the common buffer
>>  2) driver calls KeFlushIoBuffers
>>     (device has logical address of buffer from previous operation)
>>  3) driver writes "Go" bit of a device register
>>  4) device reads from the common buffer
>>
>> I realize that if at least the logical address must be passed again
>> before each operation, then its passing will already require a memory
>> barrier between (2) and (3), which would substitute for the one
>> apparently missing from KeFlushIoBuffers.
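[The four-step common-buffer sequence quoted above, with the barrier BubbaGump argues is missing from KeFlushIoBuffers made explicit, can be sketched as a userspace C11 model. All names are hypothetical stand-ins: a real driver would write through a mapped common buffer and use the WDK register macros, not plain variables.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of the common-buffer DMA sequence:
 * step (1) fills the buffer, step (2) is where KeFlushIoBuffers
 * would sit, step (3) rings the doorbell. The fence models the
 * compiler/CPU barrier the thread is debating. */
static uint32_t common_buffer[8];    /* stand-in for the DMA common buffer */
static volatile uint32_t go_bit;     /* stand-in for the "Go" register     */

static void start_transfer(void) {
    for (int i = 0; i < 8; i++)
        common_buffer[i] = (uint32_t)(i * i);   /* (1) fill the buffer  */
    atomic_thread_fence(memory_order_release);  /* (2) the barrier that
                                                   KeFlushIoBuffers does
                                                   NOT provide on x86   */
    go_bit = 1;                                 /* (3) write the Go bit */
    /* (4) the device would now read common_buffer via DMA */
}
```

The release fence guarantees the buffer stores are ordered before the doorbell store at both the compiler and CPU level, regardless of how the doorbell itself is implemented.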
From: already5chosen on 22 Oct 2006 05:31

BubbaGump wrote:
> I noticed in my DDK header files (3790.1830) that KeFlushIoBuffers()
> is defined to do nothing, absolutely nothing. I know x86 caches are
> already coherent with respect to DMA, but isn't there still the
> possibility of out-of-order loads and stores? Is this a bug? Are
> drivers supposed to do a KeMemoryBarrier() explicitly?
>
> Even if the x86 still does program-ordered stores, and Microsoft will
> update KeFlushIoBuffers() at some point in the future when that
> ordering is no longer true, the compiler's memory accesses might not
> be ordered, so at least a compiler memory barrier is needed for DMA
> out to a device.
>
> What about after a DMA? In order to account for speculative loads
> during DMA in from a device, does the call to FlushAdapterBuffers()
> have a memory barrier, or should the driver also do a
> KeMemoryBarrier() explicitly here?

The IA-32 (and AMD64, for that matter) instruction set architecture guarantees so-called processor consistency (PC) for WB and UC memory regions. PC means that stores by a particular processor are observed in program order by all bus agents present in the system. Intel and AMD assure that processor consistency will be maintained on all future x86-compatible CPUs.

So as long as an (IA-32/AMD64) driver doesn't use WC memory regions, it needs no memory barrier in KeFlushIoBuffers(). If your driver uses WC memory regions, you have to call KeMemoryBarrier() explicitly in your code.

On the receiving end of a DMA operation the situation is different - x86-compatible CPUs can issue load instructions out of order. So I'd guess that the FlushAdapterBuffers() routine issues a memory fence (or a load fence on processors that support it).
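[The WB/UC-versus-WC rule above can be sketched as follows. This is a hedged userspace model, not real driver code: the buffer and register names are invented, and the C11 seq_cst fence stands in for KeMemoryBarrier (a real WC drain on x86 would typically be an SFENCE-class instruction).]

```c
#include <stdatomic.h>
#include <stdint.h>

/* For WB/UC memory, x86 processor consistency orders the stores for
 * us. A write-combining (WC) mapping buffers stores, so the driver
 * must drain them itself before ringing the device. */
static uint32_t wc_frame[16];          /* imagine: a WC-mapped aperture */
static volatile uint32_t go_register;  /* imagine: a UC device register */

static void kick_device(void) {
    for (int i = 0; i < 16; i++)
        wc_frame[i] = (uint32_t)i;     /* stores may sit in WC buffers  */
    atomic_thread_fence(memory_order_seq_cst); /* KeMemoryBarrier
                                                  analogue: drain the
                                                  stores before the kick */
    go_register = 1;
}
```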
From: BubbaGump on 22 Oct 2006 09:33

On 22 Oct 2006 02:31:02 -0700, already5chosen(a)yahoo.com wrote:

> IA32 (and AMD64 for that matter) instruction set architecture
> guarantees so called processor consistency (PC) for WB and UC memory
> regions. PC means that stores by particular processor are observed in
> program order by all bus agents present in the system.
> Intel and AMD assure that processor consistency would be maintained
> over all future x86 compatible CPUs.

I see the first part in the IA-32 spec about the present, but where is the second part stated about the future? I don't necessarily think it's a problem, but I see the opposite about the future:

"It is recommended that software written to run on Pentium 4, Intel Xeon, and P6 family processors assume the processor-ordering model or a weaker memory-ordering model. The Pentium 4, Intel Xeon, and P6 family processors do not implement a strong memory-ordering model, except when using the UC memory type. Despite the fact that Pentium 4, Intel Xeon, and P6 family processors support processor ordering, Intel does not guarantee that future processors will support this model."

> So as far as the (IA32/AMD64) driver doesn't use WC memory regions it
> needs no memory barrier in KeFlushIoBuffers().

I agree no CPU barrier would be needed, but what about a compiler barrier?
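[The compiler-barrier question above can be illustrated with a sketch. Even when the CPU orders stores (x86 WB memory), the compiler may still reorder or sink plain stores, and a compiler-only barrier pins the generated code without emitting a fence instruction. The GCC/Clang spelling is shown; the MSVC-era equivalent would be _ReadWriteBarrier(). All variable names are hypothetical.]

```c
#include <stdint.h>

/* Compiler-only barrier: forbids the compiler from moving memory
 * accesses across this point, but emits no CPU fence instruction. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static uint32_t shared_buf[2];
static volatile uint32_t kick;

static void flush_then_go(void) {
    shared_buf[0] = 0x1234;
    shared_buf[1] = 0x5678;
    COMPILER_BARRIER();  /* stores above cannot be sunk below this   */
    kick = 1;            /* x86 processor consistency handles the
                            CPU-side store ordering from here        */
}
```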
From: Mark Roddy on 22 Oct 2006 14:09

On Fri, 20 Oct 2006 20:21:24 -0400, BubbaGump <> wrote:

> I noticed in my DDK header files (3790.1830) that KeFlushIoBuffers()
> is defined to do nothing, absolutely nothing. I know x86 caches are
> already coherent with respect to DMA, but isn't there still the
> possibility of out-of-order loads and stores? Is this a bug? Are
> drivers supposed to do a KeMemoryBarrier() explicitly?

No, it isn't a bug. This is DMA, not processor load/store operations. You need a memory barrier only for shared memory regions that are not otherwise protected by a lock or an interlocked operation, not for DMA buffers.

> Even if the x86 still does program-ordered stores, and Microsoft will
> update KeFlushIoBuffers() at some point in the future when that
> ordering is no longer true,

KeFlushIoBuffers is intended to accommodate platforms where some operation is required to guarantee memory coherency after a DMA operation.

> the compiler's memory accesses might not
> be ordered so at least a compiler memory barrier is needed for DMA out
> to a device.

Why? Is the DMA initiated without any locking operations? Is your concern here compiler optimizations requiring memory barriers (all of which are implicitly solved through a lock or interlock operation), or CPU read/write reordering?

> What about after a DMA? In order to account for speculative loads
> during DMA in from a device, does the call to FlushAdapterBuffers()
> have a memory barrier or should the driver also do a KeMemoryBarrier()
> explicitly here?

I'm confused. Why is the compiler doing speculative loads from the buffer that was the DMA target before the DMA completed? I don't believe this is a real-world example of a compiler optimization requiring a memory barrier, nor is there a hardware coherency problem, as cache coherency rules will guarantee the contents of the buffer once the DMA is complete.
At a minimum, the thread that is reading the contents of the DMA target buffer must wait on some lock for the DMA to complete, and that lock is a barrier. All of the examples I have seen regarding memory barriers are concerned with unlocked shared memory access that local compiler optimizations or CPU read/write reordering can render incoherent. All of these problems disappear when the shared region is protected by any of the standard locks.

=====================
Mark Roddy DDK MVP
Windows Vista/2003/XP/2000 Consulting
Device and Filesystem Drivers
Hollis Technology Solutions
603-321-1032
www.hollistech.com
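[The lock-is-a-barrier point above can be modeled in userspace C11 with an atomic completion flag standing in for the lock or interlocked operation a driver would actually wait on. The device side and flag name are hypothetical; in a real driver the release half would be the ISR/DPC completing the request.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* The consumer of a DMA-in buffer normally waits on something (a
 * lock, an interlocked flag set at DMA completion), and that wait
 * already carries the barrier: the acquire load orders the buffer
 * reads after the observation that the DMA finished. */
static uint32_t rx_buffer[4];
static atomic_int dma_done;

static void device_side(void) {       /* stand-in for the DMA engine  */
    rx_buffer[0] = 0xBEEF;
    atomic_store_explicit(&dma_done, 1, memory_order_release);
}

static uint32_t consumer_side(void) {
    while (!atomic_load_explicit(&dma_done, memory_order_acquire))
        ;                             /* the "lock": acquire = barrier */
    return rx_buffer[0];              /* safe to read after acquire    */
}
```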
From: Mark Roddy on 22 Oct 2006 14:16

On Sat, 21 Oct 2006 18:15:17 -0400, BubbaGump <> wrote:

> I'm thinking of a transfer of a buffer out to a device using common
> buffer DMA:
>
>  1) driver writes to the common buffer
>  2) driver calls KeFlushIoBuffers
>     (device has logical address of buffer from previous operation)
>  3) driver writes "Go" bit of a device register
>  4) device reads from the common buffer

"On x86-based, x64-based and Itanium-based hardware, reordering might take place when a write operation for one location precedes a read operation for a different location. Processor reordering might move the read operation ahead of the write operation on the same CPU, thus effectively reversing their order in code. These architectures do not reorder read operations followed by read operations or write operations followed by write operations."

http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc#_Toc119927283

Go re-read the write-posting buffer descriptions in the IA-32 specs. Write-write reordering would break all kinds of stuff. Your example is not valid. If the processor in step (3) initiated the DMA via a read operation (unlikely but possible), and did so without using the HAL READ_REGISTER functions or otherwise introducing a barrier, you might have a valid case.

> I realize that if at least the logical address must be passed again
> before each operation, then its passing will already require a memory
> barrier between (2) and (3), which would substitute for the one
> apparently missing from KeFlushIoBuffers.
>
> I know it's an odd case, but I don't think it breaks any rules except
> for what KeFlushIoBuffers might not do.
>
> On Sat, 21 Oct 2006 12:38:46 -0700, "Jonathan Morrison"
> <jonathanm(a)mindspring.com> wrote:
>
>> Can you show an example of a case that you think needs the barrier
>> please. I am trying to come up with the case in my head and having a
>> hard time coming up with one. Thanks.
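[Mark's read-initiated-DMA caveat is the one reordering x86 does allow: a later read may pass an earlier write (store-load reordering). The write-write doorbell sequence is protected by processor consistency, but a "go" implemented as a device-register READ would need a full fence. A hedged userspace sketch, with all names hypothetical and a C11 seq_cst fence standing in for an MFENCE-class barrier:]

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t tx_buffer[4];
static volatile uint32_t status_register;  /* imagine: reading this
                                              register starts the DMA */

static uint32_t start_dma_by_read(void) {
    tx_buffer[0] = 0xF00D;                     /* buffer stores        */
    atomic_thread_fence(memory_order_seq_cst); /* full fence: the read
                                                  below cannot pass the
                                                  stores above         */
    return status_register;                    /* the read that "goes" */
}
```

Without the fence, the CPU could issue the register read before the buffer stores become globally visible, and the device could DMA stale data.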