From: john cooper on 9 Apr 2010 02:10

Darren Hart wrote:
> john cooper wrote:
>> But here we're otherwise completely open to indiscriminate
>> scheduling preemption even though we may be holding a
>> userland lock.
>
> That's true with any userland lock.

Agreed. However, from your earlier mail it seemed addressing this
scenario was within the scope of consideration.

There are two ways to deal with this condition: either reactively, in
the sense that we act after the lock holder has been preempted and
subsequently find ourselves spinning in sibling thread context
attempting to acquire the lock; or proactively, where we provide a
time-bounded deferral of lock holder preemption on the assumption
that the lock hold path's overhead has a negligible effect on
deferring a potentially coincident scheduling operation.

It is fairly straightforward to demonstrate the impact on performance
with a focused micro benchmark, less so for a more "typical"
application where the effect is diluted among other app activity.

The two approaches are complementary, with differing system-wide
tradeoffs. Both require some insight into the scheduling disposition
of the lock holder, the preemption deferral mechanism more so. If a
scheme to expose scheduler state transitions to (or cooperate with)
userland locking primitives is being considered, it seems opportune
to consider support for this scenario as well.

-john

--
john.cooper(a)redhat.com
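For illustration, a minimal sketch (in C) of the reactive approach
described above, under the assumption that the kernel could tell
userland whether the lock holder is currently on a CPU: the waiter
spins only while that is the case and blocks with FUTEX_WAIT
otherwise. The owner_is_running() helper is hypothetical -- exposing
that scheduler state is precisely what is being discussed -- and the
unlock/wake path is omitted.

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* 0 = unlocked, otherwise the TID of the owner. */
    static atomic_int lock_word;

    /* Hypothetical: would need kernel help to know whether the
     * owner's task is currently running on a CPU. */
    extern int owner_is_running(int owner_tid);

    static void reactive_lock(int my_tid)
    {
            int owner;

            for (;;) {
                    owner = 0;
                    /* Try to take the uncontended lock in userspace. */
                    if (atomic_compare_exchange_strong(&lock_word,
                                                       &owner, my_tid))
                            return;

                    /* Reactive policy: keep spinning only while the
                     * holder is still on a CPU; once it has been
                     * preempted, spinning is wasted work, so block in
                     * the kernel instead. */
                    if (owner_is_running(owner)) {
                            __builtin_ia32_pause();  /* x86 cpu_relax() */
                            continue;
                    }

                    /* Owner is off-CPU: sleep until the lock word
                     * changes.  The unlock path (not shown) must clear
                     * the word and FUTEX_WAKE waiters. */
                    syscall(SYS_futex, &lock_word, FUTEX_WAIT, owner,
                            NULL, NULL, 0);
            }
    }

The proactive variant would instead have the holder request a short,
time-bounded deferral of preemption around the critical section,
which would need a different (and likewise not yet existing) kernel
interface.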
From: Darren Hart on 15 Apr 2010 02:20

dvhltc(a)us.ibm.com wrote:
> Now that an advantage can be shown using FUTEX_LOCK_ADAPTIVE over
> FUTEX_LOCK, the next steps as I see them are:
>
> o Try and show improvement of FUTEX_LOCK_ADAPTIVE over FUTEX_WAIT
>   based implementations (pthread_mutex specifically).

I've spent a bit of time on this and made huge improvements through
some simple optimizations of the testcase lock/unlock routines. I'll
be away for a few days and wanted to let people know where things
stand with FUTEX_LOCK_ADAPTIVE.

I ran all the tests with the following options:

    -i 1000000 -p 1000 -d 20

where:
    -i iterations
    -p period (in instructions)
    -d duty cycle (in percent)

    MECHANISM                 KITERS/SEC
    ------------------------------------
    pthread_mutex_adaptive          1562
    FUTEX_LOCK_ADAPTIVE             1190
    pthread_mutex                   1010
    FUTEX_LOCK                       532

I took some perf data while running each of the above tests as well.
Any thoughts on getting more from perf are appreciated; this is my
first pass at it. I recorded with "perf record -fg" and snippets of
"perf report" follow.

FUTEX_LOCK (not adaptive) spends a lot of time spinning on the futex
hashbucket lock:

    # Overhead  Command     Shared Object      Symbol
    # ........  ..........  .................  ......
    #
        40.76%  futex_lock  [kernel.kallsyms]  [k] _raw_spin_lock
                |
                --- _raw_spin_lock
                   |
                   |--62.16%-- do_futex
                   |           sys_futex
                   |           system_call_fastpath
                   |           syscall
                   |
                   |--31.05%-- futex_wake
                   |           do_futex
                   |           sys_futex
                   |           system_call_fastpath
                   |           syscall
    ...
        14.98%  futex_lock  futex_lock         [.] locktest

FUTEX_LOCK_ADAPTIVE spends much of its time in the test loop itself,
followed by the actual adaptive loop in the kernel. It appears much
of our savings over FUTEX_LOCK comes from not contending on the
hashbucket lock:

    # Overhead  Command     Shared Object      Symbol
    # ........  ..........  .................  ......
    #
        36.07%  futex_lock  futex_lock         [.] locktest
                |
                --- locktest
                   |
                   --100.00%-- 0x400e7000000000
         9.12%  futex_lock  perf               [.] 0x00000000000eee
    ...
         8.26%  futex_lock  [kernel.kallsyms]  [k] futex_spin_on_owner

Pthread Mutex Adaptive spends most of its time in the glibc heuristic
spinning, as expected, followed by the test loop itself. An
impressively minimal 3.35% is spent on the hashbucket lock:

    # Overhead  Command          Shared Object      Symbol
    # ........  ...............  .................  ......
    #
        47.88%  pthread_mutex_2  libpthread-2.5.so  [.] __pthread_mutex_lock_internal
                |
                --- __pthread_mutex_lock_internal
        22.78%  pthread_mutex_2  pthread_mutex_2    [.] locktest
    ...
        15.16%  pthread_mutex_2  perf               [.] ...
    ...
         3.35%  pthread_mutex_2  [kernel.kallsyms]  [k] _raw_spin_lock

Pthread Mutex (not adaptive) spends much of its time on the
hashbucket lock as expected, followed by the test loop:

        33.89%  pthread_mutex_2  [kernel.kallsyms]  [k] _raw_spin_lock
                |
                --- _raw_spin_lock
                   |
                   |--56.90%-- futex_wake
                   |           do_futex
                   |           sys_futex
                   |           system_call_fastpath
                   |           __lll_unlock_wake
                   |
                   |--28.95%-- futex_wait_setup
                   |           futex_wait
                   |           do_futex
                   |           sys_futex
                   |           system_call_fastpath
                   |           __lll_lock_wait
    ...
        16.60%  pthread_mutex_2  pthread_mutex_2    [.] locktest

These results mostly confirm the expected: the adaptive versions
spend more time in their spin loops and less time contending for
hashbucket locks, while the non-adaptive versions take the hashbucket
lock more often and therefore show more contention there.

I believe I should be able to get the plain FUTEX_LOCK implementation
much closer in performance to the plain pthread mutex version. I
expect much of the work done to benefit FUTEX_LOCK will also benefit
FUTEX_LOCK_ADAPTIVE.
If that's true, and I can make a significant improvement to
FUTEX_LOCK, it wouldn't take much to get FUTEX_LOCK_ADAPTIVE to beat
the heuristic spinlock in glibc. It could also be that this synthetic
benchmark is an ideal situation for glibc's heuristics, and that a
more realistic load with varying lock hold times wouldn't favor the
adaptive pthread mutex over FUTEX_LOCK_ADAPTIVE by such a large
margin.

More next week.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
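For reference, a rough sketch of what a lock/unlock test loop driven
by the -i/-p/-d parameters above might look like. This is not the
actual futextest locktest code; test_lock()/test_unlock() and burn()
are assumed stand-ins for the mechanism under test and for a
calibrated instruction-burning loop.

    /* Stand-ins for the mechanism under test
     * (FUTEX_LOCK, FUTEX_LOCK_ADAPTIVE, pthread_mutex, ...). */
    extern void test_lock(void);
    extern void test_unlock(void);

    /* Burn roughly 'n' units of work; stand-in for a calibrated
     * instruction loop. */
    static void burn(unsigned long n)
    {
            while (n--)
                    __asm__ __volatile__("" ::: "memory");
    }

    /* -i iterations, -p period, -d duty cycle (percent): each
     * iteration holds the lock for duty% of the period and runs
     * unlocked for the remainder. */
    static void locktest(unsigned long iterations, unsigned long period,
                         unsigned long duty)
    {
            unsigned long held = period * duty / 100;

            while (iterations--) {
                    test_lock();
                    burn(held);            /* critical section */
                    test_unlock();
                    burn(period - held);   /* non-critical work */
            }
    }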
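The glibc heuristic spinning referred to above
(PTHREAD_MUTEX_ADAPTIVE_NP) is roughly of the following shape: try
the lock, spin up to an adaptive bound of trylock attempts, fall back
to the futex slow path if that fails, and afterwards nudge the bound
toward what this acquisition needed. The field names, constant, and
lll_* helpers below are approximations, not the exact nptl source.

    /* Approximate shape of glibc's adaptive-mutex lock path. */
    #define MAX_ADAPTIVE_COUNT 100   /* per-arch constant in glibc */

    struct adaptive_mutex {
            int lock;    /* 0 = free */
            int spins;   /* adapts over time */
    };

    extern int  lll_trylock(int *lock);    /* stand-in: low-level trylock */
    extern void lll_lock_wait(int *lock);  /* stand-in: futex slow path */

    static void adaptive_lock(struct adaptive_mutex *m)
    {
            if (lll_trylock(&m->lock) != 0) {
                    int cnt = 0;
                    int max_cnt = MAX_ADAPTIVE_COUNT < m->spins * 2 + 10
                                  ? MAX_ADAPTIVE_COUNT : m->spins * 2 + 10;

                    do {
                            if (cnt++ >= max_cnt) {
                                    /* Spinning didn't pay off: block. */
                                    lll_lock_wait(&m->lock);
                                    break;
                            }
                            /* cpu_relax()-style pause would go here. */
                    } while (lll_trylock(&m->lock) != 0);

                    /* Nudge the spin budget toward what this
                     * acquisition needed. */
                    m->spins += (cnt - m->spins) / 8;
            }
    }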