From: Igor Tandetnik on 8 Jan 2010 08:38

Leigh Johnston wrote:
> Targeting x64 makes no difference, still no memory barrier instructions
> output.

x64 provides the same strong consistency model as x86. That's why I said you need to compile for IA64 (aka Itanium): as far as I know, it's the only CPU supported by the MSVC compiler that has a weak consistency model and actually needs memory barriers.

--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925
From: Leigh Johnston on 8 Jan 2010 09:03

> http://www.linuxjournal.com/article/8211
>
> x86 CPU provides process consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.
>
> LFENCE, SFENCE and MFENCE are SSE instructions, apparently needed because
> certain other SSE instructions are asynchronous. I must admit I'm not very
> familiar with SSE, but your example doesn't issue SSE instructions anyway,
> so this is moot.

The FENCE instructions are not "SSE" instructions; it seems they are required for the following cases (http://www.intel.com/Assets/PDF/manual/253668.pdf):

Writes to memory are not reordered with other writes, with the following exceptions:
- writes executed with the CLFLUSH instruction;
- streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
- string operations (see Section 8.2.4.1).

But yeah, for my simple ADD example it looks like you are correct: no fence required.

> LOCK is not an instruction by itself, but a prefix to other instructions
> that renders them atomic (e.g. instructions like ADD which need to read,
> modify and write a memory location). Note that "volatile" doesn't promise
> or guarantee atomicity: ++n1 is still not atomic even though n1 is
> declared volatile.

I am aware that LOCK is a prefix, and I have read elsewhere that it *also* acts as a memory barrier when used in conjunction with a compatible instruction. I am also well aware that volatile does not promise atomicity; I never said that it does.

/Leigh
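To make the LOCK/volatile point concrete, here is a minimal sketch of an atomic increment. It assumes a compiler with C++11 `std::atomic` (which postdates this thread), and the helper name `count_with_atomic` is made up for illustration. On x86, compilers typically implement `fetch_add` with a LOCK-prefixed instruction such as `lock xadd` or `lock add`, which is what makes the read-modify-write indivisible. A plain `++n1` on a `volatile int` gets no such prefix.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the thread): two threads each increment a
// shared counter `per_thread` times and the final value is returned.
int count_with_atomic(int per_thread) {
    std::atomic<int> n{0};

    auto work = [&n, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            // Atomic read-modify-write: no increment can be lost, unlike
            // ++n on a volatile int shared between threads.
            n.fetch_add(1, std::memory_order_seq_cst);
        }
    };

    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();

    return n.load();
}
```

Calling `count_with_atomic(100000)` should always return 200000. The same loop over a `volatile int` is a data race and can lose increments.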
From: Leigh Johnston on 8 Jan 2010 09:20

> x86 CPU provides process consistency, where writes by one CPU are observed
> in order by all other CPUs. For this reason, it doesn't need explicit
> memory barrier instructions.

What about store forwarding? MFENCE may help, I think (from http://www.intel.com/Assets/PDF/manual/253668.pdf):

The memory-ordering model allows concurrent stores by two processors to be seen in different orders by those two processors; specifically, each processor may perceive its own store occurring before that of the other. This is illustrated by the following example:

Example 8-5. Intra-Processor Forwarding is Allowed

    Processor 0          Processor 1
    mov [ _x], 1         mov [ _y], 1
    mov r1, [ _x]        mov r3, [ _y]
    mov r2, [ _y]        mov r4, [ _x]

    Initially x == y == 0
    r2 == 0 and r4 == 0 is allowed

The memory-ordering model imposes no constraints on the order in which the two stores appear to execute by the two processors. This fact allows processor 0 to see its store before seeing processor 1's, while processor 1 sees its store before seeing processor 0's. (Each processor is self consistent.) This allows r2 == 0 and r4 == 0.

In practice, the reordering in this example can arise as a result of store-buffer forwarding. While a store is temporarily held in a processor's store buffer, it can satisfy the processor's own loads but is not visible to (and cannot satisfy) loads by other processors.
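The Intel example can be approximated in portable code. The following is only a sketch, assuming C++11 atomics (not available when this thread was written); the names `Observed`, `run_once` and `count_both_zero` are invented for illustration. It mirrors Example 8-5 with relaxed stores and loads, and optionally places a sequentially consistent fence between each processor's store and its loads. On x86 compilers typically emit MFENCE or a LOCK-prefixed instruction for that fence. With the fence in both threads, the C++ memory model forbids the r2 == 0 and r4 == 0 outcome.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read from the *other* thread's variable (r2 and r4 in Example 8-5).
struct Observed {
    int r2;
    int r4;
};

// One run of the store-buffering test. `with_fence` inserts a seq_cst fence
// between each thread's store and its subsequent loads.
Observed run_once(bool with_fence) {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    Observed out{-1, -1};

    std::thread p0([&] {
        x.store(1, std::memory_order_relaxed);             // mov [_x], 1
        if (with_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        int r1 = x.load(std::memory_order_relaxed);        // mov r1, [_x]
        (void)r1;                                          // own store is always visible
        out.r2 = y.load(std::memory_order_relaxed);        // mov r2, [_y]
    });
    std::thread p1([&] {
        y.store(1, std::memory_order_relaxed);             // mov [_y], 1
        if (with_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        int r3 = y.load(std::memory_order_relaxed);        // mov r3, [_y]
        (void)r3;
        out.r4 = x.load(std::memory_order_relaxed);        // mov r4, [_x]
    });

    p0.join();
    p1.join();
    return out;  // each member was written by exactly one thread before join
}

// Counts how many of `runs` executions observed r2 == 0 and r4 == 0.
int count_both_zero(int runs, bool with_fence) {
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        Observed o = run_once(with_fence);
        if (o.r2 == 0 && o.r4 == 0) ++hits;
    }
    return hits;
}
```

`count_both_zero(runs, true)` should return 0 for any number of runs. Without the fence the both-zero outcome is permitted but not guaranteed to show up; because each run starts fresh threads, start-up skew makes it rare in this simple harness.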
From: Igor Tandetnik on 8 Jan 2010 09:19

Leigh Johnston wrote:
> The FENCE instructions are not "SSE" instructions they are required for the
> following cases it seems
> (http://www.intel.com/Assets/PDF/manual/253668.pdf):
> Writes to memory are not reordered with other writes, with the following
> exceptions:
> - writes executed with the CLFLUSH instruction;
> - streaming stores (writes) executed with the non-temporal move instructions
> (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
> - string operations (see Section 8.2.4.1).

All these are in fact from SSE[2] instruction set: http://en.wikipedia.org/wiki/X86_instruction_listings

--
With best wishes,
Igor Tandetnik
From: Igor Tandetnik on 8 Jan 2010 09:34

Leigh Johnston wrote:
>> x86 CPU provides process consistency, where writes by one CPU are observed
>> in order by all other CPUs. For this reason, it doesn't need explicit
>> memory barrier instructions.
>
> What about store forwarding?

I must admit you are digging deeper than my understanding extends. Hopefully, someone more knowledgeable will chime in.

--
With best wishes,
Igor Tandetnik