From: H. Peter Anvin on 17 Feb 2010 21:10

On 02/17/2010 05:53 PM, Linus Torvalds wrote:
>>
>> FWIW, I don't know of any microarchitecture where adc is slower than
>> add, *as long as* the setup time for the CF flag is already used up.
>
> Oh, I think there are lots.
>
> Look at just about any x86 latency/throughput table, and you'll see:
>
>  - adc latencies are typically much higher than a single cycle
>
>    But you are right that this is likely not an issue on any out-of-order
>    chip, since the 'stc' will schedule perfectly.
>

STC actually tends to schedule poorly, since it has a partial register
stall. In-order or out-of-order doesn't really matter, though; what
matters is that the scoreboarding used for the flags has to settle, or
you will take a huge hit.

>  - but adc _throughput_ is also typically much higher, which indicates
>    that even if you do flag renaming, the 'adc' quite likely only
>    schedules in a single ALU unit.
>
> For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
> the same is true of 'stc'. Now, nobody likely cares about Pentiums any
> more, but the point is, 'adc' does often have constraints that a regular
> 'add' does not, and there's an example of a 'stc+adc' pair would at the
> very least have to be scheduled with an instruction in between.

No doubt. I doubt it much matters in this context, but either way I
think the patch is probably a bad idea... much for the same as my incl
hack was - since the code isn't actually inline, saving a handful bytes
is not the right tradeoff.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Zachary Amsden on 17 Feb 2010 23:30

> On 02/17/2010 05:53 PM, Linus Torvalds wrote:
>
>>  - but adc _throughput_ is also typically much higher, which indicates
>>    that even if you do flag renaming, the 'adc' quite likely only
>>    schedules in a single ALU unit.
>>
>> For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
>> the same is true of 'stc'. Now, nobody likely cares about Pentiums any
>> more, but the point is, 'adc' does often have constraints that a regular
>> 'add' does not, and there's an example of a 'stc+adc' pair would at the
>> very least have to be scheduled with an instruction in between.
>>
> No doubt. I doubt it much matters in this context, but either way I
> think the patch is probably a bad idea... much for the same as my incl
> hack was - since the code isn't actually inline, saving a handful bytes
> is not the right tradeoff.
>
> -hpa
>

Incidentally, the cost of putting all the rwsem code inline, using the
straightforward approach, for git-tip, using defconfig on x86_64 is
3565 bytes / 20971778 bytes total, or 0.0168%, using gcc 4.4.3.

That's small enough to actually consider it. Even smaller if you leave
trylock as a function... actually no, that didn't work, size increased.
I'm guessing many call sites also end up calling the explicit form as a
fallback.

If you inline only read_lock functions and write release, nope, that
didn't work either. If you inline only read_lock functions, that still
isn't it. Many other permutations are possible, but I've wasted enough
time.

Although, with a more clever inline implementation, if some of the
constraints to %rdx go away...

Zach
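The size figure in the post above can be sanity-checked with a few lines of C. This is only an illustrative sketch: the helper name `size_overhead_percent` is made up for this example and is not taken from the thread or from any kernel tool. The byte counts are the ones quoted in the post.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the thread): what percentage of a
 * `total`-byte image do `added` bytes represent?
 */
static double size_overhead_percent(unsigned long added, unsigned long total)
{
    assert(total != 0); /* avoid dividing by zero */
    return 100.0 * (double)added / (double)total;
}
```

Called as `size_overhead_percent(3565UL, 20971778UL)`, this gives a value of roughly 0.017%, in line with the figure quoted in the post.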
From: Andi Kleen on 18 Feb 2010 03:20

Zachary Amsden <zamsden(a)redhat.com> writes:
>
> Incidentally, the cost of putting all the rwsem code inline, using the
> straightforward approach, for git-tip, using defconfig on x86_64 is
> 3565 bytes / 20971778 bytes total, or 0.0168%, using gcc 4.4.3.

The nice advantage of putting lock code inline is that it gets
accounted to the caller in all profilers.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
From: Zachary Amsden on 18 Feb 2010 03:30

> Zachary Amsden<zamsden(a)redhat.com> writes
>> Incidentally, the cost of putting all the rwsem code inline, using the
>> straightforward approach, for git-tip, using defconfig on x86_64 is
>> 3565 bytes / 20971778 bytes total, or 0.0168%, using gcc 4.4.3.
>>
> The nice advantage of putting lock code inline is that it gets
> accounted to the caller in all profilers.
>
> -Andi
>

Unfortunately, only for the uncontended case. The hot case still ends up
in a call to the lock text section.

Zach
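The split described in the post above (inline fast path, out-of-line contended path) can be sketched roughly as follows. This is a toy in C11 atomics, not the kernel's rwsem implementation: `struct toy_rwsem`, `toy_down_read`, `toy_down_read_slowpath`, `toy_up_read`, and the `slow_calls` counter are all hypothetical names invented for illustration, and `__attribute__((noinline))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; NOT the kernel's struct rw_semaphore. */
struct toy_rwsem {
    atomic_long count; /* >= 0: number of readers, < 0: a writer holds it */
    long slow_calls;   /* demo-only counter, not thread-safe */
};

#define TOY_WRITER (-1L)

/*
 * Out-of-line slow path. A real lock would queue and sleep here; the toy
 * only records the call and reports failure. Time spent in this function
 * is attributed to it, not to the caller, in a profiler.
 */
__attribute__((noinline))
static bool toy_down_read_slowpath(struct toy_rwsem *sem)
{
    sem->slow_calls++;
    return false;
}

/* Inline fast path: the uncontended case stays in the caller's code. */
static inline bool toy_down_read(struct toy_rwsem *sem)
{
    long old = atomic_load(&sem->count);

    while (old >= 0) {
        /* On failure, `old` is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old + 1))
            return true;
    }
    return toy_down_read_slowpath(sem);
}

static inline void toy_up_read(struct toy_rwsem *sem)
{
    assert(atomic_load(&sem->count) > 0); /* must hold a read lock */
    atomic_fetch_sub(&sem->count, 1);
}
```

With a layout like this, only the compare-and-swap loop lands in the caller. Once the lock is contended, control transfers to the separate slow-path function, which is where the time shows up in a profile.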
From: Andi Kleen on 18 Feb 2010 04:30

On Wed, Feb 17, 2010 at 10:24:58PM -1000, Zachary Amsden wrote:
>> Zachary Amsden<zamsden(a)redhat.com> writes
>>> Incidentally, the cost of putting all the rwsem code inline, using the
>>> straightforward approach, for git-tip, using defconfig on x86_64 is
>>> 3565 bytes / 20971778 bytes total, or 0.0168%, using gcc 4.4.3.
>>>
>> The nice advantage of putting lock code inline is that it gets
>> accounted to the caller in all profilers.
>>
>> -Andi
>>
>
> Unfortunately, only for the uncontended case. The hot case still ends up
> in a call to the lock text section.

I removed those some time ago because it breaks unwinding.
Did that get undone?

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.