From: Jonathan Morrison on
Can you show an example of a case that you think needs the barrier, please?
I am trying to come up with one in my head and having a hard time. Thanks.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm
<BubbaGump> wrote in message
news:ed3jj29rhs26ag6u0vt1cbmok0s4r7oh6o(a)4ax.com...
> The barrier done on I/O writes (like WRITE_REGISTER_ULONG) is
> performed _after_ the write, which means all operations before and
> including the write to the DMA "Go" bit will be serialized together
> with no predictable ordering relative to each other. This doesn't
> guarantee the write will be the very last operation, which means a
> write to part of the DMA'd buffer may happen after the "Go" bit is
> set.
>
> The barrier for the DMA'd buffer needs to be after the DMA'd buffer is
> written but before the write to the "Go" bit. This is where
> KeFlushIoBuffers() normally goes (and it would be nice if it did the
> barrier instead of nothing).
>
> (I'm aware that practically speaking there will probably be a few I/O
> writes between the last write to the DMA'd buffer and the write to the
> "Go" bit so there will be at least one barrier, but strictly speaking
> KeFlushIoBuffers() should be providing the barrier)
>
>
>
>
> On Fri, 20 Oct 2006 19:53:53 -0700, "Jonathan Morrison"
> <jonathanm(a)mindspring.com> wrote:
>
>>IIRC - it is only implemented on IA64. x86 and x64 (I think) use
>>serializing instructions for IO reads and writes so there is an
>>implicit barrier.
>>
>><BubbaGump> wrote in message
>>news:ir0jj2t1j3k3qhgtoavsj1qdof7100obn1(a)4ax.com...
>>> On 20 Oct 2006 18:30:35 -0700, soviet_bloke(a)hotmail.com wrote:
>>>
>>>>> I noticed in my DDK header files (3790.1830) that KeFlushIoBuffers()
>>>>> is defined to do nothing, absolutely nothing.
>>>>
>>>>What do you mean by "defined"?????? After all, KeFlushIoBuffers() is a
>>>>function, rather than macro. What do you expect to see, apart from
>>>>function's prototype, in a header file?????
>>>
>>> I see a macro, defined to nothing:
>>>
>>>
>>> #if defined(_X86_)
>>>
>>> ...
>>>
>>> #define KeFlushIoBuffers(Mdl, ReadOperation, DmaOperation)
>>>
>>> ...
>>>
>>
>


From: BubbaGump on
I'm thinking of a transfer of a buffer out to a device using common
buffer DMA:

1) driver writes to the common buffer
2) driver calls KeFlushIoBuffers
(device has logical address of buffer from previous operation)
3) driver writes "Go" bit of a device register
4) device reads from the common buffer

I realize that if the logical address, at least, must be passed to the
device again before each operation, then passing it will already
require a memory barrier between (2) and (3), which would substitute
for the one apparently missing from KeFlushIoBuffers.

I know it's an odd case, but I don't think it breaks any rules except
for what KeFlushIoBuffers might not do.
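
The sequence above can be sketched in portable C as a user-mode stand-in,
not real driver code. The names fake_device, GO_BIT and start_transfer are
hypothetical, and the C11 fence only marks the spot where a barrier is
being argued for (a real driver would use whatever its platform provides,
for example KeMemoryBarrier). The C standard defines fences relative to
atomic operations, so this shows placement rather than guaranteeing
device-visible ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GO_BIT 0x1u  /* hypothetical bit position, for illustration only */

/* Hypothetical stand-in for a device: one control register plus a common buffer. */
struct fake_device {
    volatile uint32_t control;   /* would be a memory-mapped register in a driver */
    uint8_t common_buffer[64];   /* would be a DMA common buffer in a driver */
};

/* Steps 1 to 3 of the scenario; step 4 (the device reading) is not modeled. */
void start_transfer(struct fake_device *dev, const void *data, size_t len)
{
    assert(len <= sizeof dev->common_buffer);

    /* 1) Ordinary, possibly cached, stores into the common buffer. */
    memcpy(dev->common_buffer, data, len);

    /* 2) The position of the KeFlushIoBuffers call: a full compiler and CPU
     *    fence so the buffer stores are ordered before the register store. */
    atomic_thread_fence(memory_order_seq_cst);

    /* 3) Set the "Go" bit only after the fence. */
    dev->control = dev->control | GO_BIT;
}
```

Calling start_transfer(&dev, data, len) on a zero-initialized fake_device
fills the buffer and then sets the bit. The point of the sketch is only
that the fence sits between steps 1 and 3, not after step 3.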




On Sat, 21 Oct 2006 12:38:46 -0700, "Jonathan Morrison"
<jonathanm(a)mindspring.com> wrote:

>Can you show an example of a case that you think needs the barrier please. I
>am trying to come up with the case in my head and having a hard time coming
>up with one. Thanks.

From: Jonathan Morrison on
Is the common buffer allocated as cacheable or non-cacheable? If it is
non-cacheable then no reordering could happen - but if it is cached - hmmmm.
Now I have to think about this a little bit. :)



From: BubbaGump on
If the common buffer is non-cacheable then I agree no CPU reordering
could happen on an x86, but compiler reordering could still happen.

-- Cacheability is an interesting thing to mention. I hadn't thought
about it before. Now I realize why some versions of macros like
WRITE_REGISTER_ULONG only use the volatile keyword without doing a CPU
barrier. I still wonder, if a CPU barrier isn't needed, why other
versions instead (or in addition) call undocumented functions like
KeFlushWriteBuffer or __faststorefence, and why still others do an
InterlockedOr. --




On Sat, 21 Oct 2006 17:16:45 -0700, "Jonathan Morrison"
<jonathanm(a)mindspring.com> wrote:

>Is the common buffer allocated as cacheable or non-cacheable? If it is
>non-cacheable then no reordering could happen - but if it is cached - hmmmm.
>Now I have to think about this a little bit. :)

From: BubbaGump on
Based on the non-cacheable ordering JM pointed out, I think the last
statement I made below is false. The write of the logical address
would probably be to a device register, and the register access macros
like WRITE_REGISTER_ULONG might only use the volatile keyword or a
compiler barrier (under the assumption that the memory-mapped I/O they
touch will be uncacheable and not require a CPU barrier). The volatile
keyword alone doesn't serve as a barrier, since nonvolatile accesses
(such as the writes to the buffer) can still be reordered around a
volatile access. A compiler barrier alone might not be enough either:
because the buffer to be DMA'd might be cached, the barrier might need
to be both a compiler and a CPU barrier.

My point is I don't think the register macros like
WRITE_REGISTER_ULONG will compensate in all cases for the barrier that
appears to be missing from KeFlushIoBuffers.
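
To make the three flavors concrete, here is a rough sketch in C11. The
helper names are hypothetical and are not the DDK's definitions of
WRITE_REGISTER_ULONG. Each helper puts its barrier before the store,
which is the position the DMA case needs. atomic_signal_fence is used
here as a compiler-only barrier and atomic_thread_fence as a compiler
plus CPU barrier; both are defined by the C standard relative to atomic
operations, so take this as an illustration of intent.

```c
#include <stdatomic.h>
#include <stdint.h>

/* volatile only: the store is emitted and stays ordered against other
 * volatile accesses, but ordinary (nonvolatile) accesses may still be
 * moved across it by the compiler or the CPU. */
static inline void reg_write_volatile_only(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Compiler barrier, then store: the compiler may not move earlier memory
 * accesses past this point, but no CPU fence instruction is requested. */
static inline void reg_write_compiler_barrier(volatile uint32_t *reg, uint32_t value)
{
    atomic_signal_fence(memory_order_seq_cst);
    *reg = value;
}

/* Compiler and CPU barrier, then store: earlier stores, including stores
 * to a cached buffer, are ordered before the register store. */
static inline void reg_write_full_barrier(volatile uint32_t *reg, uint32_t value)
{
    atomic_thread_fence(memory_order_seq_cst);
    *reg = value;
}
```

All three store the same value. They differ only in what they promise
about the ordinary stores that came before, which is exactly the part a
cached common buffer depends on.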




On Sat, 21 Oct 2006 18:15:17 -0400, BubbaGump <> wrote:

>I'm thinking of a transfer of a buffer out to a device using common
>buffer DMA:
>
> 1) driver writes to the common buffer
> 2) driver calls KeFlushIoBuffers
> (device has logical address of buffer from previous operation)
> 3) driver writes "Go" bit of a device register
> 4) device reads from the common buffer
>
>I realize that if at least the logical address must be passed again
>before each operation, then its passing will already require a memory
>barrier between (2) and (3), which would substitute for the one
>apparently missing from KeFlushIoBuffers.