From: Zachary Amsden on 14 Jul 2010 20:30

On 07/14/2010 12:12 PM, Jeremy Fitzhardinge wrote:
> Change d3ca901f94b3299 introduces a new mechanism for sequencing
> accesses to the control registers using a variable to act as a
> dependency.  (However, the patch description only says it is unifying
> parts of system.h and makes no mention of this new code.)
>
> This sequencing implementation is flawed in two ways:
>  - Firstly, it gets the input/outputs for __force_order wrong on
>    the asm statements.  __force_order is a proxy for the control
>    registers themselves, so a write_crX should write __force_order,
>    and conversely for read_crX.  The original code got this backwards,
>    making write_crX read from __force_order, which in principle would
>    allow gcc to eliminate a "redundant" write_crX.
>
>  - Secondly, writes to control registers can have drastic
>    side-effects on all memory operations (write to cr3 changes the
>    current pagetable and redefines the meaning of all addresses,
>    for example), so they must clobber "memory" in order to be
>    ordered with respect to memory operations.
>
> We seem to have been saved by the implicit ordering that "asm volatile"
> gives us.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge(a)citrix.com>
> Cc: Glauber de Oliveira Costa <gcosta(a)redhat.com>
> Cc: Ingo Molnar <mingo(a)elte.hu>
> Cc: "H. Peter Anvin" <hpa(a)zytor.com>
> Cc: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: Avi Kivity <avi(a)redhat.com>
> Cc: Zachary Amsden <zamsden(a)redhat.com>
>
> diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
> index e7f4d33..b782af2 100644
> --- a/arch/x86/include/asm/system.h
> +++ b/arch/x86/include/asm/system.h
> @@ -212,53 +212,68 @@ static inline void native_clts(void)
>
>  /*
>   * Volatile isn't enough to prevent the compiler from reordering the
> - * read/write functions for the control registers and messing everything up.
> - * A memory clobber would solve the problem, but would prevent reordering of
> - * all loads stores around it, which can hurt performance. Solution is to
> - * use a variable and mimic reads and writes to it to enforce serialization
> + * read/write functions for the control registers and messing
> + * everything up.  A memory clobber would solve the problem, but would
> + * prevent reordering of all loads stores around it, which can hurt
> + * performance (however, control register writes can have drastic
> + * effects on memory accesses - like switching pagetables and thereby
> + * redefining what an address means - so they still need to clobber
> + * memory).
> + *
> + * Solution is to use a variable and mimic reads and writes to it to
> + * enforce serialization.  __force_order acts as a proxy for the
> + * control registers, so a read_crX reads __force_order, and write_crX
> + * writes it (actually both reads and writes it to indicate that
> + * write-over-write can't be "optimised" away).
> + *
> + * This assumes there's no global optimisation between object files,
> + * so using a static per-file "__force_order" is OK.  (In theory we
> + * don't need __force_order to be instantiated at all, since it is
> + * never actually read or written to.  But gcc might decide to
> + * generate a reference to it anyway, so we need to keep it around.)
>   */
>  static unsigned long __force_order;
>
>  static inline unsigned long native_read_cr0(void)
>  {
>  	unsigned long val;
> -	asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
> +	asm volatile("mov %%cr0,%0\n\t" : "=r" (val) : "m" (__force_order));
>  	return val;
>  }
>
>  static inline void native_write_cr0(unsigned long val)
>  {
> -	asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
> +	asm volatile("mov %1,%%cr0": "+m" (__force_order) : "r" (val) : "memory");
>  }
>
>  static inline unsigned long native_read_cr2(void)
>  {
>  	unsigned long val;
> -	asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
> +	asm volatile("mov %%cr2,%0\n\t" : "=r" (val) : "m" (__force_order));
>  	return val;
>  }
>
>  static inline void native_write_cr2(unsigned long val)
>  {
> -	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
> +	asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
>  }
>

You don't need the memory clobber there.  Technically, this should
never be used, however.

>
>  static inline unsigned long native_read_cr3(void)
>  {
>  	unsigned long val;
> -	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
> +	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
>  	return val;
>  }
>
>  static inline void native_write_cr3(unsigned long val)
>  {
> -	asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
> +	asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
>  }
>
>  static inline unsigned long native_read_cr4(void)
>  {
>  	unsigned long val;
> -	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
> +	asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
>  	return val;
>  }
>
> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
>  	asm volatile("1: mov %%cr4, %0\n"
>  		     "2:\n"
>  		     _ASM_EXTABLE(1b, 2b)
> -		     : "=r" (val), "=m" (__force_order) : "0" (0));
> +		     : "=r" (val) : "m" (__force_order), "0" (0));
>  #else
>  	val = native_read_cr4();
>  #endif
> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>
>  static inline void native_write_cr4(unsigned long val)
>  {
> -	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> +	asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
>  }
>
>  #ifdef CONFIG_X86_64
>

Looks good.  I really hope __force_order gets pruned however.  Does it
actually?
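For reference, a minimal standalone sketch of the constraint pattern the patch
introduces, assuming gcc extended asm on an x86 target (the function names here
are illustrative, not the kernel header itself):

static unsigned long __force_order;

static inline unsigned long sketch_read_cr3(void)
{
	unsigned long val;

	/* The "m" input makes the read appear to depend on the proxy variable. */
	asm volatile("mov %%cr3,%0" : "=r" (val) : "m" (__force_order));
	return val;
}

static inline void sketch_write_cr3(unsigned long val)
{
	/*
	 * "+m" makes the write both read and write the proxy, so gcc cannot
	 * drop a "redundant" write-over-write; the "memory" clobber orders
	 * the write against surrounding loads and stores, since a cr3 write
	 * redefines what every address means.
	 */
	asm volatile("mov %1,%%cr3" : "+m" (__force_order) : "r" (val) : "memory");
}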
From: Jeremy Fitzhardinge on 14 Jul 2010 21:00

On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>
>>  static inline void native_write_cr2(unsigned long val)
>>  {
>> -	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>> +	asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
>>  }
>>
>
> You don't need the memory clobber there.  Technically, this should
> never be used, however.

Yes.  I just did it for consistency.  Likewise, I didn't pore over the
manuals to work out whether writes to any crX could really have memory
side-effects.

>>
>>  static inline unsigned long native_read_cr3(void)
>>  {
>>  	unsigned long val;
>> -	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
>> +	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
>>  	return val;
>>  }
>>
>>  static inline void native_write_cr3(unsigned long val)
>>  {
>> -	asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>> +	asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
>>  }
>>
>>  static inline unsigned long native_read_cr4(void)
>>  {
>>  	unsigned long val;
>> -	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
>> +	asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
>>  	return val;
>>  }
>>
>> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>  	asm volatile("1: mov %%cr4, %0\n"
>>  		     "2:\n"
>>  		     _ASM_EXTABLE(1b, 2b)
>> -		     : "=r" (val), "=m" (__force_order) : "0" (0));
>> +		     : "=r" (val) : "m" (__force_order), "0" (0));
>>  #else
>>  	val = native_read_cr4();
>>  #endif
>> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>
>>  static inline void native_write_cr4(unsigned long val)
>>  {
>> -	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>> +	asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
>>  }
>>
>>  #ifdef CONFIG_X86_64
>>
>
> Looks good.  I really hope __force_order gets pruned however.  Does it
> actually?

There's a couple of instances in my vmlinux.  I didn't try to track them
back to specific .o files.  gcc tends to generate references by putting
its address into a register and passing that into the asms.

    J
From: Zachary Amsden on 14 Jul 2010 21:10

On 07/14/2010 02:55 PM, Jeremy Fitzhardinge wrote:
> On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>>
>>>  static inline void native_write_cr2(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>
>> You don't need the memory clobber there.  Technically, this should
>> never be used, however.
>>
> Yes.  I just did it for consistency.  Likewise, I didn't pore over the
> manuals to work out whether writes to any crX could really have memory
> side-effects.

0,3,4 all can.

>>>
>>>  static inline unsigned long native_read_cr3(void)
>>>  {
>>>  	unsigned long val;
>>> -	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
>>> +	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
>>>  	return val;
>>>  }
>>>
>>>  static inline void native_write_cr3(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>>  static inline unsigned long native_read_cr4(void)
>>>  {
>>>  	unsigned long val;
>>> -	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
>>> +	asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
>>>  	return val;
>>>  }
>>>
>>> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>>  	asm volatile("1: mov %%cr4, %0\n"
>>>  		     "2:\n"
>>>  		     _ASM_EXTABLE(1b, 2b)
>>> -		     : "=r" (val), "=m" (__force_order) : "0" (0));
>>> +		     : "=r" (val) : "m" (__force_order), "0" (0));
>>>  #else
>>>  	val = native_read_cr4();
>>>  #endif
>>> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>>
>>>  static inline void native_write_cr4(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>>  #ifdef CONFIG_X86_64
>>>
>>
>> Looks good.  I really hope __force_order gets pruned however.  Does it
>> actually?
>>
> There's a couple of instances in my vmlinux.  I didn't try to track them
> back to specific .o files.  gcc tends to generate references by putting
> its address into a register and passing that into the asms.

Can you make it extern so at least there's only one in the final bss?
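One shape the extern suggestion could take (hypothetical placement, since the
thread does not settle on one), so every file including the header shares a
single instance instead of each getting its own static copy:

/* In the shared header: only a declaration, no per-file instance. */
extern unsigned long __force_order;

/* In exactly one .c file: the single definition that lands in .bss. */
unsigned long __force_order;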
From: H. Peter Anvin on 14 Jul 2010 21:40

Yes, it will definitely NOT be pruned.

I'm going to file a gcc documentation request to see if any of this is
actually needed, though.  There may also be a need for gcc to handle
*inbound* general memory constraints.

"Jeremy Fitzhardinge" <jeremy(a)goop.org> wrote:

>On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>>
>>>  static inline void native_write_cr2(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>
>> You don't need the memory clobber there.  Technically, this should
>> never be used, however.
>
>Yes.  I just did it for consistency.  Likewise, I didn't pore over the
>manuals to work out whether writes to any crX could really have memory
>side-effects.
>
>>>
>>>  static inline unsigned long native_read_cr3(void)
>>>  {
>>>  	unsigned long val;
>>> -	asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
>>> +	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
>>>  	return val;
>>>  }
>>>
>>>  static inline void native_write_cr3(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>>  static inline unsigned long native_read_cr4(void)
>>>  {
>>>  	unsigned long val;
>>> -	asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
>>> +	asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
>>>  	return val;
>>>  }
>>>
>>> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>>  	asm volatile("1: mov %%cr4, %0\n"
>>>  		     "2:\n"
>>>  		     _ASM_EXTABLE(1b, 2b)
>>> -		     : "=r" (val), "=m" (__force_order) : "0" (0));
>>> +		     : "=r" (val) : "m" (__force_order), "0" (0));
>>>  #else
>>>  	val = native_read_cr4();
>>>  #endif
>>> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>>>
>>>  static inline void native_write_cr4(unsigned long val)
>>>  {
>>> -	asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>>> +	asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
>>>  }
>>>
>>>  #ifdef CONFIG_X86_64
>>>
>>
>> Looks good.  I really hope __force_order gets pruned however.  Does it
>> actually?
>
>There's a couple of instances in my vmlinux.  I didn't try to track them
>back to specific .o files.  gcc tends to generate references by putting
>its address into a register and passing that into the asms.
>
> J

--
Sent from my mobile phone.  Please pardon any lack of formatting.
From: Avi Kivity on 15 Jul 2010 03:10

On 07/15/2010 03:28 AM, Zachary Amsden wrote:
>>  static inline void native_write_cr2(unsigned long val)
>>  {
>> -	asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>> +	asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
>>  }
>
> You don't need the memory clobber there.  Technically, this should
> never be used, however.

kvm writes cr2 in order to present the correct value to the guest.  It
doesn't use native_write_cr2(), however.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.