From: Linus Torvalds on


On Thu, 8 Apr 2010, Will Deacon wrote:
>
> I simply used smp_mb() as a way to solve this ARM-specific problem. I think
> Russell objects to this largely because this problem affects a particular
> scenario of busy-wait loops and changing the definition of cpu_relax() adds
> barriers to code that doesn't necessarily require them.

How expensive is an smp_mb() on arm?

And by "expensive" I don't mean so much performance of the instruction
itself (after all, we _are_ just busy-looping), but more about things like
power and perhaps secondary effects (does it cause memory traffic, for
example?).

Also, I have to say that _usually_ the problem with non-timely cache
updates is not on the reading side, but on the writing side - ie the other
CPU may be buffering writes indefinitely and the writes will go out only
as a response to bus cycles or the write buffers filling up. In which case
the reader can't really do much about it.

But your comment for the "smp_mb()" patch seems to imply that it's
literally a matter of cache access priorities:

"On the ARM11MPCore processor [where loads are prioritised over stores],
spinning in such a loop will prevent the write buffer from draining."

and in that case I would say that the correct thing _definitely_ is to
make sure that the loop simply is never that tight. Maybe you can do
that without an smp_mb(), by just making whatever "cpu_relax()" does slow
enough (something that stalls the pipeline or whatever)?

But if smp_mb() is cheap, then that sounds like the right solution.
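
For concreteness, here is a rough userspace sketch of the kind of loop in
question: a reader spinning on a flag, with a full barrier on every pass.
Everything in it is an assumption made for illustration: relax_with_barrier(),
wait_for_flag() and run_demo() are hypothetical names, C11 atomics and pthreads
stand in for the kernel primitives, and the seq_cst fence only plays the role
that smp_mb() would play inside cpu_relax(). It is not the ARM definition, and
it does not reproduce the ARM11MPCore write-buffer behaviour.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for cpu_relax(); not the kernel's code. */
static inline void relax_with_barrier(void)
{
    /* Full fence, standing in for the smp_mb() under discussion. */
    atomic_thread_fence(memory_order_seq_cst);
}

static atomic_int flag;   /* set by the writer once the data is ready */
static int payload;       /* plain data published before the flag */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release store: payload is visible to whoever observes flag == 1. */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

/* Busy-wait until the writer's store becomes visible, relaxing each pass. */
static int wait_for_flag(void)
{
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        relax_with_barrier();
    return payload;
}

/* Start a writer thread, spin for its flag, return what it published. */
static int run_demo(void)
{
    pthread_t t;
    int seen;

    atomic_store(&flag, 0);
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    seen = wait_for_flag();
    pthread_join(t, NULL);
    return seen;
}
```

Calling run_demo() from a main() built with -pthread returns the published
value. The only point of the sketch is where the barrier sits: once per
iteration of the spin, which is exactly why its cost matters.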

Linus