From: Alan Cox on 6 Apr 2010 13:00

> That works for the uncontended case. For the contended case, the waiter
> and the owner have to go into the kernel and back out to transfer
> ownership.

If you insist on doing it that way, yes. But knowing that the lock owner
is likely to be away for a while also lets you punt work: either pick
another work package in the meantime, or queue the work you can't do on
a list that the pre-empted lock owner will review before dropping the
lock.
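The queue-the-work idea sketches out roughly as below. This is purely
illustrative and not from any posted patch: the struct names, the
lock-free stack helpers and the use of pthread_mutex_trylock() as a
stand-in for the futex fast path are all assumptions. Note also that
work punted after the holder's final drain simply waits for the next
thread that manages to take the lock.

	/*
	 * Illustrative only: a thread that cannot take the lock pushes its
	 * work item onto a lock-free stack; the lock holder drains that
	 * stack before it unlocks.
	 */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdlib.h>

	struct work {
		struct work *next;
		void (*fn)(void *);
		void *arg;
	};

	struct punt_lock {
		pthread_mutex_t lock;
		_Atomic(struct work *) deferred;	/* stack of punted work */
	};

	static void punt(struct punt_lock *pl, struct work *w)
	{
		w->next = atomic_load(&pl->deferred);
		/* CAS retry loop; on failure w->next is reloaded with the new head */
		while (!atomic_compare_exchange_weak(&pl->deferred, &w->next, w))
			;
	}

	static void drain(struct punt_lock *pl)
	{
		struct work *w = atomic_exchange(&pl->deferred, NULL);

		while (w) {
			struct work *next = w->next;

			w->fn(w->arg);
			free(w);
			w = next;
		}
	}

	/* Either do the work under the lock, or hand it to the current owner. */
	void submit(struct punt_lock *pl, void (*fn)(void *), void *arg)
	{
		struct work *w = malloc(sizeof(*w));

		if (!w)
			return;
		w->fn = fn;
		w->arg = arg;

		if (pthread_mutex_trylock(&pl->lock) == 0) {
			w->fn(w->arg);
			free(w);
			drain(pl);	/* review punted work before dropping the lock */
			pthread_mutex_unlock(&pl->lock);
		} else {
			punt(pl, w);	/* the owner picks this up before unlock */
		}
	}

Whether this wins over simply blocking obviously depends on the work
being safe to run in another thread's context.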
From: Thomas Gleixner on 6 Apr 2010 14:30

On Tue, 6 Apr 2010, Alan Cox wrote:
> > > IMO the best solution is to spin in userspace while the lock holder is
> > > running, fall into the kernel when it is scheduled out.
> >
> > That's just not realistic as user space has no idea whether the lock
> > holder is running or not and when it's scheduled out without a syscall :)
>
> Which is the real problem that wants addressing and can be addressed very
> cheaply. That would bring us up to par with 1970s RTOS environments ;)

Well, 1970s RTOSes had other features as well, like preemption disable
mechanisms and other interesting stuff. I hope you're not going to
propose that next to bring us up to par with those :)

Thanks,

	tglx
From: Thomas Gleixner on 6 Apr 2010 15:40

On Tue, 6 Apr 2010, Alan Cox wrote:
> > Do you feel some of these situations would also benefit from some kernel
> > assistance to stop spinning when the owner schedules out? Or are you
> > saying that there are situations where pure userspace spinlocks will
> > always be the best option?
>
> There are cases its the best option - you are assuming for example that
> the owner can get scheduled out. Eg nailing one thread per CPU in some
> specialist high performance situations means they can't.

Fair enough, but that's not the problem Darren is targeting.

> > If the latter, I'd think that they would also be situations where
> > sched_yield() is not used as part of the spin loop. If so, then these
> > are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to
> > provide a better informed mechanism for making spin or sleep decisions.
> > If sleeping isn't part of the locking construct implementation, then
> > FUTEX_LOCK_ADAPTIVE doesn't have much to offer.
>
> I am unsure about the approach. As Avi says knowing that the lock owner is
> scheduled out allows for far better behaviour. It doesn't need complex
> per lock stuff or per lock notifier entries on pre-empt either.
>
> A given task is either pre-empted or not and in the normal case of things
> you need this within a process so you've got shared pages anyway. So you
> only need one instance of the 'is thread X pre-empted' bit somewhere in a
> non swappable page.

I fear we might end up with a pinned page per thread to get this working
properly and it restricts the mechanism to process private locks.

> That gives you something along the lines of
>
>	runaddr = find_run_flag(lock);
>	do {
>		while (*runaddr == RUNNING) {
>			if (trylock(lock))
>				return WHOOPEE;
>			cpu relax
>		}
>		yield (_on(thread));

That would require a new yield_to_target() syscall, which either blocks
the caller when the target thread is not runnable or returns an error
code which signals to go into the slow path.

>	} while (*runaddr != DEAD);
>
> which unlike blindly spinning can avoid the worst of any hit on the CPU
> power and would be a bit more guided ?

I doubt that the syscall overhead per se is large enough to justify all
of the above horror. We need to figure out a more efficient way to do
the spinning in the kernel where we have all the necessary information
already. Darren's implementation is suboptimal AFAICT.

Thanks,

	tglx
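For illustration, the cleaned-up shape of such a loop might look like the
sketch below. Everything in it is an assumption: owner_running() stands in
for the proposed shared "is thread X pre-empted" bit, no yield_to_target()
syscall exists, so the fallback here is a plain FUTEX_WAIT, and the lock
word follows the familiar free/locked/contended convention.

	/*
	 * Hypothetical sketch only. owner_running() is a placeholder for the
	 * shared "is thread X pre-empted" flag discussed above; it does not
	 * exist today. Unlock (not shown) would set state back to 0 with an
	 * exchange and FUTEX_WAKE one waiter if the old value was 2.
	 */
	#include <linux/futex.h>
	#include <stdatomic.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	struct alock {
		atomic_int state;	/* 0 = free, 1 = locked, 2 = locked with waiters */
		atomic_int owner;	/* TID of the current holder, advisory only */
	};

	extern int owner_running(int tid);	/* the hypothetical per-thread bit */

	static void futex_wait(atomic_int *addr, int val)
	{
		syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
	}

	void alock_lock(struct alock *l)
	{
		int zero = 0;

		if (atomic_compare_exchange_strong(&l->state, &zero, 1))
			goto owned;

		for (;;) {
			/* Spin only while the holder is believed to be on a CPU. */
			while (owner_running(atomic_load(&l->owner))) {
				zero = 0;
				if (atomic_compare_exchange_strong(&l->state, &zero, 1))
					goto owned;
				__builtin_ia32_pause();	/* x86 cpu_relax() equivalent */
			}
			/* Holder is off the CPU (or unknown): sleep in the kernel. */
			if (atomic_exchange(&l->state, 2) == 0)
				goto owned;
			futex_wait(&l->state, 2);
		}
	owned:
		atomic_store(&l->owner, (int)syscall(SYS_gettid));
	}

The point of the exercise is only to show where the "is the owner running"
information would have to come from; the correctness subtleties of the
0/1/2 protocol are the same as for an ordinary futex mutex.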
From: Darren Hart on 6 Apr 2010 17:30

Darren Hart wrote:
> Avi Kivity wrote:
>
>>>> At 10% duty cycle you have 25 waiters behind the lock on average. I don't
>>>> think this is realistic, and it means that spinning is invoked only
>>>> rarely.
>>>
>>> Perhaps some instrumentation is in order, it seems to get invoked
>>> enough to achieve some 20% increase in lock/unlock iterations.
>>> Perhaps another metric would be of more value - such as average wait
>>> time?
>>
>> Why measure an unrealistic workload?
>
> No argument there, thus my proposal for an alternate configuration below.
>
>>>> I'd be interested in seeing runs where the average number of waiters
>>>> is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
>>>> 25 average waiters on compute bound code means the application needs
>>>> to be rewritten, no amount of mutex tweaking will help it.
>>>
>>> Perhaps something NR_CPUS threads would be of more interest?
>>
>> That seems artificial.
>
> How so? Several real world applications use one thread per CPU to
> dispatch work to, wait for events, etc.
>
>>> At 10% that's about .8 and at 25% the 2 of your upper limit. I could
>>> add a few more duty-cycle points and make 25% the max. I'll kick that
>>> off and post the results... probably tomorrow, 10M iterations takes a
>>> while, but makes the results relatively stable.
>>
>> Thanks. But why not vary the number of threads as well?
>
> Absolutely, I don't disagree that all the variables should vary in order
> to get a complete picture. I'm starting with 8 - it takes several hours
> to collect the data.

While this might be of less interest after today's discussion, I promised
to share the results of a run with 8 threads and a wider selection of
lower duty-cycles. The results are very poor for adaptive, and worse still
for aas (multiple spinners), compared to normal FUTEX_LOCK. As Thomas and
Peter have pointed out, the implementation is sub-optimal. Before
abandoning this approach I will see if I can find the bottlenecks and
simplify the kernel side of things. My impression is that I am doing a lot
more work in the kernel, especially in the adaptive loop, than is really
necessary.

Both the 8 and 256 thread plots can be viewed here:

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
From: Thomas Gleixner on 6 Apr 2010 19:20
On Tue, 6 Apr 2010, Ulrich Drepper wrote:
> On Tue, Apr 6, 2010 at 12:31, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> > We need to figure out a more efficient way to
> > do the spinning in the kernel where we have all the necessary
> > information already.
>
> Really? The owner information isn't in general available in the
> kernel. Futex operation doesn't require the value used to be the PID
> (or negative of the PID). That is a dramatic limitation of the
> usefulness of futexes.

I know that you can do any weird stuff with the futex value, but I don't
see the "dramatic" limitation. Care to elaborate?

> At userlevel there is access to other fields of the data structure
> which can contain the owner information.
>
> I would like to see the method using a per-thread pinned page and an
> update of a memory location on scheduling. For benchmarking at least.

The per-thread pinned page would be unconditional, right?

I agree that benchmarking would be interesting, but OTOH I fear that we
open up a huge can of worms by exposing scheduler details and the related
necessary syscalls like sys_yield_to: user space thread
management/scheduling comes to my mind, and I hope we agree that we do
not want to revisit that.

> I agree that a sys_yield_to() syscall would be at the very least
> useful as well. But it's useful for other things already.

Useful for what? What are the exact semantics of such a syscall? How does
that fit into the various scheduling constraints?

Thanks,

	tglx
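For reference on the TID-in-the-futex-word point: the existing PI futexes
do require that convention, and it is what lets the kernel identify the
owner on that path. A minimal sketch of that fast path, with all error
handling (timeouts, FUTEX_OWNER_DIED, the robust list) left out:

	/*
	 * PI-futex convention: the lock word holds 0 when free and the
	 * owner's TID when held, so the kernel can find (and boost) the
	 * owner on the contended path. Error handling omitted.
	 */
	#include <linux/futex.h>
	#include <stdatomic.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static void pi_lock(atomic_uint *word)
	{
		unsigned int expected = 0;
		unsigned int tid = syscall(SYS_gettid);

		/* Uncontended fast path: 0 -> TID, no syscall at all. */
		if (atomic_compare_exchange_strong(word, &expected, tid))
			return;

		/* Contended: the kernel reads the owner TID from the word. */
		syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
	}

	static void pi_unlock(atomic_uint *word)
	{
		unsigned int tid = syscall(SYS_gettid);

		/* Fast path: still just our TID, so nobody is waiting. */
		if (atomic_compare_exchange_strong(word, &tid, 0))
			return;

		/* FUTEX_WAITERS is set: let the kernel hand the lock over. */
		syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
	}

A futex used only with FUTEX_WAIT/FUTEX_WAKE carries no such requirement
on its value, which is the limitation Ulrich is pointing at.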