From: Igor Tandetnik on
Leigh Johnston wrote:
> Targeting x64 makes no difference, still no memory barrier instructions
> output.

x64 provides the same strong consistency model as x86. That's why I said you need to compile for IA64 (aka Itanium): as far as I know, it's the only CPU supported by the MSVC compiler that has a weak consistency model and actually needs memory barriers.
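
To make the difference concrete, here is a minimal sketch of publishing a value from one thread to another. It assumes a C++11 compiler with std::atomic (not what the code earlier in this thread used), and publish_and_read is a made-up name for illustration only. On x86/x64 the release store and acquire load below typically compile to plain MOVs with no fence instruction, while on a weakly ordered target the compiler has to emit ordered loads/stores or barriers to honor the same source code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): hands `value` from a producer
// thread to a consumer thread through a plain int guarded by an atomic flag.
int publish_and_read(int value) {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    std::thread consumer([&] {
        // Spin until the flag is set; the acquire load pairs with the
        // release store, so the payload write is visible afterwards.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling publish_and_read(42) returns 42 on any conforming implementation; only the generated instructions differ between strongly and weakly ordered CPUs.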
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
From: Leigh Johnston on

>
> http://www.linuxjournal.com/article/8211
>
> x86 CPU provides processor consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.
>
> LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because
> certain other SSE instructions are asynchronous. I must admit I'm not very
> familiar with SSE, but your example doesn't issue SSE instructions anyway,
> so this is moot.
>

The FENCE instructions are not "SSE" instructions; it seems they are required
for the following cases
(http://www.intel.com/Assets/PDF/manual/253668.pdf):
Writes to memory are not reordered with other writes, with the following
exceptions:
- writes executed with the CLFLUSH instruction;
- streaming stores (writes) executed with the non-temporal move instructions
(MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
- string operations (see Section 8.2.4.1).

But yeah, for my simple ADD example it looks like you are correct: no fence
required.

> LOCK is not an instruction by itself, but a prefix to other instructions
> that renders them atomic (e.g. instructions like ADD which need to read,
> modify and write a memory location). Note that "volatile" doesn't promise
> or guarantee atomicity: ++n1 is still not atomic even though n1 is
> declared volatile.

I am aware that LOCK is a prefix, and I have read elsewhere that it *also*
acts as a memory barrier when used in conjunction with a compatible
instruction. I am also well aware that volatile does not promise atomicity;
I never said that it does.
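
For illustration, a minimal sketch of an increment that really is atomic. It assumes C++11 std::atomic rather than a volatile int, and count_with_atomic is a hypothetical name. On x86 compilers typically implement fetch_add as a LOCK-prefixed instruction (LOCK XADD or LOCK ADD), which also acts as a full barrier.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread): `threads` threads each
// increment one shared counter `iterations` times; returns the final value.
int count_with_atomic(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                counter.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

With a plain or volatile int and ++n1 in place of fetch_add, the total could come out lower than threads * iterations, because the read, add and write of two threads can interleave.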

/Leigh

From: Leigh Johnston on
>
> x86 CPU provides processor consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.
>

What about store forwarding? MFENCE may help I think (from
http://www.intel.com/Assets/PDF/manual/253668.pdf):

The memory-ordering model allows concurrent stores by two processors to be seen
in different orders by those two processors; specifically, each processor may
perceive its own store occurring before that of the other. This is illustrated
by the following example:

Example 8-5. Intra-Processor Forwarding is Allowed

Processor 0          Processor 1
mov [ _x], 1         mov [ _y], 1
mov r1, [ _x]        mov r3, [ _y]
mov r2, [ _y]        mov r4, [ _x]

Initially x == y == 0
r2 == 0 and r4 == 0 is allowed

The memory-ordering model imposes no constraints on the order in which the two
stores appear to execute by the two processors. This fact allows processor 0 to
see its store before seeing processor 1's, while processor 1 sees its store
before seeing processor 0's. (Each processor is self consistent.) This allows
r2 == 0 and r4 == 0.
In practice, the reordering in this example can arise as a result of
store-buffer forwarding. While a store is temporarily held in a processor's
store buffer, it can satisfy the processor's own loads but is not visible to
(and cannot satisfy) loads by other processors.
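
A rough C++ sketch of the same litmus test, simplified to drop the loads of each processor's own variable (r1 and r3). It assumes C++11 std::atomic, and count_both_zero is a hypothetical name. With sequentially consistent stores and loads the r2 == 0 and r4 == 0 outcome is forbidden by the language; on x86 compilers typically achieve that by emitting XCHG or MOV followed by MFENCE for the store, which is where a fence would come in.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread or the Intel manual): runs the
// store-buffering test `iterations` times and counts how often both
// threads read 0 from the other thread's variable.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;

    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r2 = -1, r4 = -1;

        std::thread p0([&] {
            x.store(1, std::memory_order_seq_cst);    // mov [_x], 1
            r2 = y.load(std::memory_order_seq_cst);   // mov r2, [_y]
        });
        std::thread p1([&] {
            y.store(1, std::memory_order_seq_cst);    // mov [_y], 1
            r4 = x.load(std::memory_order_seq_cst);   // mov r4, [_x]
        });
        p0.join();
        p1.join();

        if (r2 == 0 && r4 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Weakening the four operations to std::memory_order_relaxed (or release/acquire) permits the both-zero outcome again, which corresponds to the plain MOV sequence in Example 8-5.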

From: Igor Tandetnik on
Leigh Johnston wrote:
> The FENCE instructions are not "SSE" instructions; it seems they are required
> for the following cases
> (http://www.intel.com/Assets/PDF/manual/253668.pdf):
> Writes to memory are not reordered with other writes, with the following
> exceptions:
> - writes executed with the CLFLUSH instruction;
> - streaming stores (writes) executed with the non-temporal move instructions
> (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
> - string operations (see Section 8.2.4.1).

All of these are in fact from the SSE[2] instruction set:

http://en.wikipedia.org/wiki/X86_instruction_listings

--
With best wishes,
Igor Tandetnik

From: Igor Tandetnik on
Leigh Johnston wrote:
>> x86 CPU provides processor consistency, where writes by one CPU are observed
>> in order by all other CPUs. For this reason, it doesn't need explicit
>> memory barrier instructions.
>>
>
> What about store forwarding?

I must admit you are digging deeper than my understanding extends. Hopefully, someone more knowledgeable will chime in.
--
With best wishes,
Igor Tandetnik
