From: Linus Torvalds on 8 Apr 2010 11:00

On Thu, 8 Apr 2010, Will Deacon wrote:
>
> I simply used smp_mb() as a way to solve this ARM-specific problem. I think
> Russell objects to this largely because this problem affects a particular
> scenario of busy-wait loops and changing the definition of cpu_relax() adds
> barriers to code that doesn't necessarily require them.

How expensive is an smp_mb() on ARM? And by "expensive" I don't mean so much
performance of the instruction itself (after all, we _are_ just
busy-looping), but more about things like power and perhaps secondary
effects (does it cause memory traffic, for example?).

Also, I have to say that _usually_ the problem with non-timely cache updates
is not on the reading side, but on the writing side - i.e. the other CPU may
be buffering writes indefinitely and the writes will go out only as a
response to bus cycles or the write buffers filling up. In which case the
reader can't really do much about it.

But your comment for the "smp_mb()" patch seems to imply that it's literally
a matter of cache access priorities:

  "On the ARM11MPCore processor [where loads are prioritised over stores],
   spinning in such a loop will prevent the write buffer from draining."

and in that case I would say that the correct thing _definitely_ is to make
sure that the loop is simply never that tight. Maybe you can do that without
an smp_mb(), by just making whatever "cpu_relax()" does slow enough
(something that stalls the pipeline or whatever?)

But if smp_mb() is cheap, then that sounds like the right solution.

Linus
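
For illustration, the kind of busy-wait loop under discussion can be sketched in user-space C11 roughly as follows. This is not the kernel's code and not the patch being discussed: relax_with_barrier() and spin_until_set() are hypothetical names, and a sequentially consistent fence is assumed here as a stand-in for what smp_mb() would do inside cpu_relax(). The loop is bounded only so that the sketch can be exercised without a second thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for cpu_relax(); NOT the real ARM definition.
 * The full fence plays the role that smp_mb() would play in the kernel,
 * so the poll loop below is never a bare back-to-back sequence of loads.
 */
static inline void relax_with_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Poll *flag until it becomes non-zero or max_spins polls have been made.
 * Returns true if the flag was observed set, false if the budget ran out.
 * In a real busy-wait another CPU would set the flag and there would be
 * no spin budget; the bound exists only to keep this sketch testable.
 */
static bool spin_until_set(atomic_int *flag, unsigned long max_spins)
{
    assert(flag != NULL);

    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        relax_with_barrier();  /* barrier between successive polls */
    }
    return false;
}
```

Whether a fence like this is cheap enough in terms of power and memory traffic on a given core is exactly the open question above; the sketch only shows where the barrier would sit in the loop.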