From: Ingo Molnar on 4 Mar 2010 15:30

* Linus Torvalds <torvalds(a)linux-foundation.org> wrote:

>
> On Thu, 4 Mar 2010, Ingo Molnar wrote:
> >
> > - SA_NOFPU: on x86 to skip the FPU/SSE save/restore, for such fast in/out special
> > purpose signal handlers? (can whip up a quick patch for you if you want)
>
> I'd love to do this, but it's wrong.
>
> It's too damn easy to use the FPU by mistake in user land, without ever
> being aware of it. memset()/memcpy are obvious potential users of SSE, but they
> might be called in non-obvious ways implicitly by the compiler (ie structure
> copy and setup).
>
> And modern glibc ends up using SSE4 even for things like strstr and strlen,
> so it really is creeping into all kinds of trivial helper functions that
> might not be obvious. So SA_NOFPU is a lovely idea, but it's also an idea
> that sucks rotten eggs in practice, with quite possibly the same _binary_
> working or not working depending on what kind of CPU and what shared library
> it happens to be using.
>
> Too damn fragile, in other words.
>
> (Now, if it's accompanied by the kernel actually _testing_ that there is no
> FPU activity, by setting the TS flag and checking at fault time and causing
> a SIGFPE, then that would be better. At least you'd get a nice clear signal
> rather than random FPU state corruption. But you're still in the situation
> that now the binary might work on some machines and setups, and not on
> others.

Perhaps NOFPU could do lazy context saving: clear the TS flag and only save
the FPU state if it's actually used by the signal handler? This turns it into
a 'hint', not into an FPU state corruption issue. Clearing/enabling FPU
instructions is still faster than a full-blown FPU context save/restore.

Careful and lightweight signal handlers (like a GC scheme would likely be)
would thus be faster. In the worst case it incurs an extra trap and a
(measurable/profilable) slowdown.

In any case this would be a secondary optimization - the biggest difference
I'd expect from the 'don't wake up the world' logic:

> > - SA_RUNNING: a way to signal only running threads - as a way for user-space
> > based concurrency control mechanisms to deschedule running threads (or, like
> > in your case, to implement barrier / garbage collection schemes).
>
> Hmm. This sounds less fundamentally broken, but at the same time also _way_
> more invasive in the signal handling layer. It's already one of our more
> "exciting" layers out there.

Yeah, definitely. But I still tend to think it should be actively tried, at
which point we can still say 'yuck this cannot work, let's go for the
sys_membarrier() solution'.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
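
The point quoted above, that a handler can end up using SSE without its author noticing, can be sketched in C. This is only an illustration and not code from the thread: the names `struct sample`, `light_handler`, `copying_handler` and `install_handler` are made up here, and whether the struct assignment really becomes SSE/AVX moves or a memcpy() call depends on the compiler, its flags and the target CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical payload; any moderately large struct would do. */
struct sample {
    double values[8];
    long seq;
};

struct sample latest;             /* written by normal code */
struct sample snapshot;           /* written by copying_handler() */
volatile sig_atomic_t pending;    /* set by both handlers */

/* Touches one integer flag: no reason for FPU/SSE instructions here. */
void light_handler(int sig)
{
    (void)sig;
    pending = 1;
}

/* Looks just as innocent, but the struct assignment may be compiled into
 * a memcpy() call or inline vector moves. With a (hypothetical) SA_NOFPU
 * flag this handler could corrupt the interrupted code's FPU/SSE state. */
void copying_handler(int sig)
{
    (void)sig;
    snapshot = latest;
    pending = 1;
}

/* Install fn as the handler for sig; returns 0 on success, -1 on error. */
int install_handler(int sig, void (*fn)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

Both handlers behave identically from the programmer's point of view, which is why a flag that silently skips the FPU save would be fragile: nothing in the source tells the two cases apart.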
From: Linus Torvalds on 6 Mar 2010 14:50

On Thu, 4 Mar 2010, Ingo Molnar wrote:
>
> Perhaps NOFPU could do lazy context saving: clear the TS flag and only save
> the FPU state if it's actually used by the signal handler?

If we can get that working reliably, we probably shouldn't use NOFPU at
all, and we should just do it unconditionally. That big (and almost always
pointless) FPU state save is a _big_ performance issue on signal handling,
and if we can do it lazily, we should.

However, I'm not at all convinced we can do this reliably. How do we
detect the "signal frame is dead" case with things like siglongjmp() etc?

And if we can't detect that "frame no longer exists", we can't really do
the lazy context saving.

Now, there's _also_ the issue of the signal handler function possibly
actually looking at the FPU state on the stack, and for that, a SA_NOFPU
would be a good way to say "you can't do that". So it's possible that even
if we could reliably detect the frame liveness we'd really have to use
that new flag anyway.

But if we do need a SA_NOFPU flag, then that means that basically no app
will use it, and it will be some special case for some really unusual
library. So I really don't think this whole thing is worth it unless you
could do it automatically.

(The "user accesses the frame" case _could_ possibly be handled by
pointing the FP frame to a special faulting location, and never nesting
the FP optimization. Nested signal handlers are unusual enough that they
aren't worth optimizing for anyway. So I'm sure that there are possible
solutions for "automatically just do the right thing" in theory, but I
suspect they get rather complex)

		Linus
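
The "signal frame is dead" problem raised above can be shown from user space. The sketch below is an illustration under assumptions, not code from the thread: the names `escape_point`, `escaping_handler` and `run_escaping_handler` are invented here. The handler leaves through siglongjmp(), so it never returns and the normal return path from the handler never runs. A kernel that had deferred the FPU save until handler return would get no notification that the frame is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf escape_point;

/* Leaves the handler without returning: the signal frame on the stack is
 * simply abandoned, and nothing tells the kernel about it. */
void escaping_handler(int sig)
{
    (void)sig;
    siglongjmp(escape_point, 1);
}

/* Returns 1 if the handler escaped via siglongjmp(), 0 if the signal was
 * not delivered, -1 if the handler could not be installed. */
int run_escaping_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = escaping_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* savemask = 1: the jump also restores the signal mask, so SIGUSR1
     * is unblocked again after the handler is abandoned. */
    if (sigsetjmp(escape_point, 1) != 0)
        return 1;

    raise(SIGUSR1);
    return 0;
}
```

Calling run_escaping_handler() repeatedly works because the saved signal mask is restored on each jump; each call abandons one more signal frame that was never formally completed.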
From: Nick Piggin on 9 Mar 2010 02:10

On Sat, Mar 06, 2010 at 11:43:26AM -0800, Linus Torvalds wrote:
>
>
> On Thu, 4 Mar 2010, Ingo Molnar wrote:
> >
> > Perhaps NOFPU could do lazy context saving: clear the TS flag and only save
> > the FPU state if it's actually used by the signal handler?
>
> If we can get that working reliably, we probably shouldn't use NOFPU at
> all, and we should just do it unconditionally. That big (and almost always
> pointless) FPU state save is a _big_ performance issue on signal handling,
> and if we can do it lazily, we should.
>
> However, I'm not at all convinced we can do this reliably. How do we
> detect the "signal frame is dead" case with things like siglongjmp() etc?
>
> And if we can't detect that "frame no longer exists", we can't really do
> the lazy context saving.
>
> Now, there's _also_ the issue of the signal handler function possibly
> actually looking at the FPU state on the stack, and for that, a SA_NOFPU
> would be a good way to say "you can't do that". So it's possible that even
> if we could reliably detect the frame liveness we'd really have to use
> that new flag anyway.
>
> But if we do need a SA_NOFPU flag, then that means that basically no app
> will use it, and it will be some special case for some really unusual
> library. So I really don't think this whole thing is worth it unless you
> could do it automatically.

The library is librcu, which I suspect will become quite important for
parallel programming in future (maybe I hope for too much). But maybe
it's better to not merge _any_ librcu special case until we see
results from programs using it.

More general speedups or features (that also help librcu) are a different
story.
From: Ingo Molnar on 16 Mar 2010 03:40

* Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> wrote:

> * Mathieu Desnoyers (mathieu.desnoyers(a)efficios.com) wrote:
> > * Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
> > > > - SA_RUNNING: a way to signal only running threads - as a way for user-space
> > > > based concurrency control mechanisms to deschedule running threads (or, like
> > > > in your case, to implement barrier / garbage collection schemes).
> > >
> > > Hmm. This sounds less fundamentally broken, but at the same time also
> > > _way_ more invasive in the signal handling layer. It's already one of our
> > > more "exciting" layers out there.
> > >
> >
> > Hrm, thinking about it a bit further, the only way I see we could provide a
> > usable SA_RUNNING flag would be to add hooks to the scheduler. These hooks would
> > somehow have to call user-space code (!) when scheduling in/out a thread. Yes,
> > this sounds utterly broken (since these hooks would have to be preemptable).
> >
> > The idea is this: if we look, for instance, at the kernel preemptable RCU
> > implementations, they consist of two parts: one is iteration on all CPUs to
> > consider all active CPUs, and the other is a modification of the scheduler to
> > note all preempted tasks that were in a preemptable RCU C.S..
> >
> > Just for the memory barrier we consider for sys_membarrier(), I had to ensure
> > that the scheduler issues memory barriers to order accesses to user-space memory
> > and mm_cpumask modifications. In reality, what we are doing is to ensure that
> > the operation required on the running thread is done by the scheduler too when
> > scheduling in/out the task.
> >
> > As soon as we have signal handlers which perform more than a simple memory
> > barrier (e.g. something that has side-effects outside of the processor), I
> > doubt it would ever make sense to only run the handler on running threads
> > unless we have hooks in the scheduler too.
>
> Unless this question is answered, Ingo's SA_RUNNING signal proposal, as
> appealing as it may look at a first glance, falls into the "fundamentally
> broken" category. [...]

How is it different from your syscall? I.e. which lines of code make the
difference? We could certainly apply the (trivial) barrier change to
context_switch().

	Ingo
From: Nick Piggin on 16 Mar 2010 04:00

On Tue, Mar 16, 2010 at 08:36:35AM +0100, Ingo Molnar wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> wrote:
>
> > * Mathieu Desnoyers (mathieu.desnoyers(a)efficios.com) wrote:
> > > * Linus Torvalds (torvalds(a)linux-foundation.org) wrote:
> > > > > - SA_RUNNING: a way to signal only running threads - as a way for user-space
> > > > > based concurrency control mechanisms to deschedule running threads (or, like
> > > > > in your case, to implement barrier / garbage collection schemes).
> > > >
> > > > Hmm. This sounds less fundamentally broken, but at the same time also
> > > > _way_ more invasive in the signal handling layer. It's already one of our
> > > > more "exciting" layers out there.
> > > >
> > >
> > > Hrm, thinking about it a bit further, the only way I see we could provide a
> > > usable SA_RUNNING flag would be to add hooks to the scheduler. These hooks would
> > > somehow have to call user-space code (!) when scheduling in/out a thread. Yes,
> > > this sounds utterly broken (since these hooks would have to be preemptable).
> > >
> > > The idea is this: if we look, for instance, at the kernel preemptable RCU
> > > implementations, they consist of two parts: one is iteration on all CPUs to
> > > consider all active CPUs, and the other is a modification of the scheduler to
> > > note all preempted tasks that were in a preemptable RCU C.S..
> > >
> > > Just for the memory barrier we consider for sys_membarrier(), I had to ensure
> > > that the scheduler issues memory barriers to order accesses to user-space memory
> > > and mm_cpumask modifications. In reality, what we are doing is to ensure that
> > > the operation required on the running thread is done by the scheduler too when
> > > scheduling in/out the task.
> > >
> > > As soon as we have signal handlers which perform more than a simple memory
> > > barrier (e.g. something that has side-effects outside of the processor), I
> > > doubt it would ever make sense to only run the handler on running threads
> > > unless we have hooks in the scheduler too.
> >
> > Unless this question is answered, Ingo's SA_RUNNING signal proposal, as
> > appealing as it may look at a first glance, falls into the "fundamentally
> > broken" category. [...]
>
> How is it different from your syscall? I.e. which lines of code make the
> difference? We could certainly apply the (trivial) barrier change to
> context_switch().

I think it is just easy for userspace to misuse or think it does
something that it doesn't (because of races).

If a context switch includes a barrier, then it is easy to know that
either the task of interest will execute the barrier, or it will
have context switched.

What more complex operation could be done in the signal handler that
isn't broken by races? Programs that use realtime scheduling policies,
and maybe some statistical or heuristic operations... Any cool use that
would make anybody other than librcu bother using it?
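
For reference, the signal-based scheme that both SA_RUNNING and sys_membarrier() aim to make cheaper can be sketched in user space as follows. This is a minimal illustration, not librcu's actual code: the names `barrier_init`, `barrier_register` and `force_barrier_all`, the use of SIGUSR2 and the fixed-size thread table are all assumptions made for the example. It also assumes that threads are registered before any barrier is forced and that registered threads are still alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

#define MAX_THREADS 64                 /* arbitrary limit for the sketch */

static pthread_t registered[MAX_THREADS];
static int nr_registered;
static atomic_int acks;

/* Runs in each signalled thread: execute a full barrier, then acknowledge. */
void barrier_handler(int sig)
{
    (void)sig;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add(&acks, 1);
}

/* Install the handler; returns 0 on success, -1 on error. */
int barrier_init(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = barrier_handler;
    sa.sa_flags = SA_RESTART;          /* do not break interrupted syscalls */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Not thread-safe: call before any thread uses force_barrier_all(). */
int barrier_register(pthread_t t)
{
    if (nr_registered >= MAX_THREADS)
        return -1;
    registered[nr_registered++] = t;
    return 0;
}

/* Signal every registered thread and wait until each one has executed the
 * barrier. Returns the number of threads signalled. */
int force_barrier_all(void)
{
    int sent = 0;

    atomic_store(&acks, 0);
    for (int i = 0; i < nr_registered; i++)
        if (pthread_kill(registered[i], SIGUSR2) == 0)
            sent++;
    while (atomic_load(&acks) < sent)
        sched_yield();
    atomic_thread_fence(memory_order_seq_cst);
    return sent;
}
```

Note that this version signals every registered thread, including ones that are asleep, which is the "wake up the world" cost mentioned earlier in the thread. The handler here does nothing but a barrier, which is the one case where, per the argument above, skipping non-running threads is safe as long as the context switch itself implies a barrier.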