From: Jeremy Fitzhardinge on 16 Jul 2010 21:10

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge(a)citrix.com>
---
 arch/x86/include/asm/spinlock.h |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 6711d36..082990a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,23 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	if (sizeof(lock->tickets.head) == sizeof(u8))
+		asm (LOCK_PREFIX "incb %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+	else
+		asm (LOCK_PREFIX "incw %0"
+		     : "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+	barrier();
+	lock->tickets.head++;
+	barrier();
+}
 #endif
 
 /*
@@ -93,14 +107,6 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
-}
 
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +150,12 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	__ticket_unlock_release(lock);
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.1.1
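For context, the patch assumes the ticket-lock scheme in which tickets.head is the ticket currently being served and tickets.tail is the next ticket to hand out. A minimal userspace sketch of that scheme follows; the field names mirror the kernel's arch_spinlock for NR_CPUS < 256 (u8 tickets), but the harness and the __sync builtin are illustrative, not kernel code.

/* Minimal userspace sketch of the ticket lock the patch assumes. */
#include <stdio.h>

typedef unsigned char __ticket_t;

typedef struct arch_spinlock {
	struct {
		__ticket_t head;	/* ticket now being served */
		__ticket_t tail;	/* next ticket to hand out */
	} tickets;
} arch_spinlock_t;

#define barrier()	asm volatile("" : : : "memory")

static void ticket_lock(arch_spinlock_t *lock)
{
	/* Atomically take a ticket, then spin until it is our turn. */
	__ticket_t me = __sync_fetch_and_add(&lock->tickets.tail, 1);

	while (lock->tickets.head != me)
		barrier();	/* force head to be re-read each pass */
}

static void ticket_unlock(arch_spinlock_t *lock)
{
	/* The non-PPro variant from the patch: a plain increment of
	 * head, bracketed by compiler barriers. */
	barrier();
	lock->tickets.head++;
	barrier();
}

int main(void)
{
	arch_spinlock_t lock = { { 0, 0 } };

	ticket_lock(&lock);
	printf("held: head=%u tail=%u\n", lock.tickets.head, lock.tickets.tail);
	ticket_unlock(&lock);
	printf("free: head=%u tail=%u\n", lock.tickets.head, lock.tickets.tail);
	return 0;
}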
From: Jeremy Fitzhardinge on 20 Jul 2010 12:20

On 07/20/2010 08:38 AM, Konrad Rzeszutek Wilk wrote:
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -33,9 +33,23 @@
>>  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
>>  * (PPro errata 66, 92)
>>  */
>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> +	if (sizeof(lock->tickets.head) == sizeof(u8))
>> +		asm (LOCK_PREFIX "incb %0"
>> +		     : "+m" (lock->tickets.head) : : "memory");
>> +	else
>> +		asm (LOCK_PREFIX "incw %0"
>> +		     : "+m" (lock->tickets.head) : : "memory");
>>
> Should those be 'asm volatile' to make them barriers as well? Or do we
> not have to worry about that on a Pentium Pro SMP?

"volatile" would be a compiler barrier, but it has no direct effect on,
or relevance to, the CPU.  It just cares about the LOCK_PREFIX.  The
"memory" clobber is probably unnecessary as well, since the constraints
already tell the compiler the most important information.  We can add
barriers separately as needed.

>> +
>> +}
>>  #else
>> -# define UNLOCK_LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
>> +{
>> +	barrier();
>> +	lock->tickets.head++;
>> +	barrier();
>> +}
>>
> Got a question: this extra barrier() (which I see gets removed in the
> git tree) was done b/c the function is inlined, and hence the second
> barrier() inhibits gcc from re-ordering __ticket_spin_unlock's
> instructions?  Which is a big pre-requisite in patch 7, where this
> function expands to:

Yes, I removed the barriers here so that the compiler can combine the
unlocking "inc" with getting "next" for unlock_kick.  There's no
constraint on what instructions the compiler can use to do the unlock
so long as it ends up writing a byte value to the right location.

> static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
> {
>	__ticket_t next = lock->tickets.head + 1;  // Is this code
> executed before the lock->tickets.head++ b/c of the 1st barrier?
> Or would it be done regardless, b/c gcc sees the data dependency here?
>
>	__ticket_unlock_release(lock);  <- expands to
> "barrier(); lock->tickets.head++; barrier()"
>
> +	__ticket_unlock_kick(lock, next);  <- so now the second barrier()
> affects this code, so it won't re-order the lock->tickets.head++ to be
> called after this function?
>
> This barrier (asm volatile("" : : : "memory")), from what I've been
> reading, says: "Don't re-order the instructions within this scope,
> starting right below me"?  Or does it apply to the full scope of the
> function/code logic, regardless of the 'inline' on one of them?

The barrier affects the entire instruction stream, regardless of the
source from which it was generated.  So inlining will bring the
function's instructions into the caller's stream, and the compiler will
freely reorder them as it sees fit - and the barrier adds a restraint
to that.

    J
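To make the ordering question concrete, here is a sketch of the unlock path Konrad quotes. The names mirror the thread, but __ticket_unlock_kick comes from a later patch in the series and is stubbed out, so the bodies are illustrative only.

/* Sketch of the paravirt unlock discussed above; not kernel code. */
#define barrier()	asm volatile("" : : : "memory")

typedef unsigned char __ticket_t;

struct arch_spinlock {
	struct { __ticket_t head, tail; } tickets;
};

static void __ticket_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
	/* Stub: the real version wakes any CPU blocked on ticket 'next'. */
	(void)lock;
	(void)next;
}

static void __ticket_spin_unlock(struct arch_spinlock *lock)
{
	/* The data dependency alone guarantees 'next' is computed from
	 * the pre-increment value of head; the first barrier() is not
	 * what enforces that. */
	__ticket_t next = lock->tickets.head + 1;

	/* With the barriers, gcc must emit the store to head between
	 * computing 'next' and the kick.  Without them (as in the later
	 * git tree), gcc may combine the read and the increment into a
	 * single "incb" and schedule it more freely - exactly the
	 * optimization Jeremy wanted to allow. */
	barrier();
	lock->tickets.head++;
	barrier();

	__ticket_unlock_kick(lock, next);
}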
From: H. Peter Anvin on 6 Aug 2010 14:00

On 07/20/2010 09:17 AM, Jeremy Fitzhardinge wrote:
>
> "volatile" would be a compiler barrier, but it has no direct effect on,
> or relevance to, the CPU.  It just cares about the LOCK_PREFIX.  The
> "memory" clobber is probably unnecessary as well, since the constraints
> already tell the compiler the most important information.  We can add
> barriers separately as needed.
>

You absolutely need volatile, since otherwise you're permitting the
compiler to split, re-execute or even drop the code.  Anything else
might work, by accident, but it's not clean.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
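hpa's concern tracks GCC's documented asm semantics: an asm without 'volatile' is treated as a pure function of its operands, so GCC may delete it when its outputs are dead, or merge identical instances. A small, hypothetical x86 illustration (LOCK_PREFIX here is a plain stand-in for the kernel's alternatives-patched macro, and the function names are made up):

#define LOCK_PREFIX "lock; "	/* stand-in for the kernel macro */

static inline void count_inc_plain(unsigned char *count)
{
	/* Not volatile: if the final value of *count is provably dead,
	 * gcc may drop this locked increment entirely. */
	asm (LOCK_PREFIX "incb %0" : "+m" (*count));
}

static inline void count_inc_volatile(unsigned char *count)
{
	/* volatile: gcc must keep this asm even if *count looks dead,
	 * and may not CSE it with an identical asm. */
	asm volatile (LOCK_PREFIX "incb %0" : "+m" (*count));
}

int demo(void)
{
	unsigned char local = 0;

	/* 'local' is never read again, so at -O2 gcc is entitled to
	 * optimize the non-volatile call down to nothing... */
	count_inc_plain(&local);
	/* ...while this one must still produce a "lock incb". */
	count_inc_volatile(&local);
	return 0;
}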
From: Jeremy Fitzhardinge on 6 Aug 2010 16:10
On 08/06/2010 10:47 AM, H. Peter Anvin wrote:
> On 07/20/2010 09:17 AM, Jeremy Fitzhardinge wrote:
>> "volatile" would be a compiler barrier, but it has no direct effect on,
>> or relevance to, the CPU.  It just cares about the LOCK_PREFIX.  The
>> "memory" clobber is probably unnecessary as well, since the constraints
>> already tell the compiler the most important information.  We can add
>> barriers separately as needed.
>>
> You absolutely need volatile, since otherwise you're permitting the
> compiler to split, re-execute or even drop the code.  Anything else
> might work, by accident, but it's not clean.

I don't think so in this case.  The instructions in question are
basically lock->waiters++/--; the only reason they need to be asm is
that they're locked.  But I'm not relying on them for any kind of
compiler or CPU ordering or barrier.  Where ordering is important, I
have explicit barrier()s to enforce it.

    J
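The pattern Jeremy describes looks roughly like this: the asm itself is nothing more than a locked increment or decrement of a waiter count, and any ordering against surrounding code comes from explicit barrier()s at the call sites that need it. The names below are illustrative, loosely following the "blocked cpus" counter from earlier in the series.

/* Sketch of "locked for atomicity, barriers for ordering". */
#define barrier()	asm volatile("" : : : "memory")
#define LOCK_PREFIX "lock; "	/* stand-in for the kernel macro */

struct spinlock_waiters {
	unsigned short count;
};

static inline void add_waiter(struct spinlock_waiters *w)
{
	/* Locked for atomicity only - no ordering is asked of it. */
	asm (LOCK_PREFIX "incw %0" : "+m" (w->count));
}

static inline void remove_waiter(struct spinlock_waiters *w)
{
	asm (LOCK_PREFIX "decw %0" : "+m" (w->count));
}

static void wait_for_lock(struct spinlock_waiters *w,
			  volatile unsigned char *head, unsigned char ticket)
{
	add_waiter(w);
	barrier();	/* count is bumped before we start waiting */

	while (*head != ticket)
		;	/* real code would block or halt here */

	barrier();	/* and not decremented until we are done */
	remove_waiter(w);
}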