Prev: [PATCH 22/43] memblock: Remove memblock_type.size and add memblock.memory_size instead
Next: [PATCH 35/43] memblock: Make memblock_alloc_try_nid() fallback to MEMBLOCK_ALLOC_ANYWHERE
From: Denys Vlasenko on 6 Aug 2010 01:20 On Friday 06 August 2010 00:38, Michal Nazarewicz wrote: > The disadvantage is that the proposed function is 2.5-3.5 bigger. > Those are not big functions though -- we are talking here about > proposed function being below 512. It's a slippery slope. Here's where it ends: glibc has memcpy() function which is "only" 8k of code or so. I'm not joking. > +#if BITS_PER_LONG == 64 > + .... > +#else .... > +/* > + * Based on code by Douglas W. Jones found at > + * <http://www.cs.uiowa.edu/~jones/bcd/decimal.html#sixtyfour>. This > + * performs no 64-bit division and hence should be faster on 32-bit > + * machines then the version of the function above. > + */ > +static noinline_for_stack > +char *put_dec(char *buf, unsigned long long n) > +{ > + uint32_t d3, d2, d1, q; > + > + if (!n) { > + *buf++ = '0'; > + return buf; > + } > + > + d1 = (n >> 16) & 0xFFFF; > + d2 = (n >> 32) & 0xFFFF; > + d3 = (n >> 48) & 0xFFFF; Are you assuming that sizeof(long long) == 8, always? -- vda -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michal Nazarewicz on 6 Aug 2010 03:10 Denys Vlasenko <vda.linux(a)googlemail.com> writes: > On Friday 06 August 2010 00:38, Michal Nazarewicz wrote: >> The disadvantage is that the proposed function is 2.5-3.5 bigger. >> Those are not big functions though -- we are talking here about >> proposed function being below 512. > It's a slippery slope. Here's where it ends: glibc > has memcpy() function which is "only" 8k of code or so. > I'm not joking. I'm aware of that. I assume that someone more clever then me will decide whether to accept this patch or not. (Also we win a few bytes on put_dec_full() and put_dec_8bit()). :P >> +#if BITS_PER_LONG == 64 >> + > ... >> +#else > ... >> +/* >> + * Based on code by Douglas W. Jones found at >> + * <http://www.cs.uiowa.edu/~jones/bcd/decimal.html#sixtyfour>. This >> + * performs no 64-bit division and hence should be faster on 32-bit >> + * machines then the version of the function above. >> + */ >> +static noinline_for_stack >> +char *put_dec(char *buf, unsigned long long n) >> +{ >> + uint32_t d3, d2, d1, q; >> + >> + if (!n) { >> + *buf++ = '0'; >> + return buf; >> + } >> + >> + d1 = (n >> 16) & 0xFFFF; >> + d2 = (n >> 32) & 0xFFFF; >> + d3 = (n >> 48) & 0xFFFF; > > Are you assuming that sizeof(long long) == 8, always? Well... yes. C requires long long to be at least 64-bit and I don't see it being larger in any foreseeable feature. Wouldn't it be enough to put a static assert here? -- Best regards, _ _ .o. | Liege of Serenly Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michal "mina86" Nazarewicz (o o) ooo +--<mina86-tlen.pl>--<jid:mina86-jabber.org>--ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Denys Vlasenko on 6 Aug 2010 03:40 On Friday 06 August 2010 09:08, Michal Nazarewicz wrote: > >> +static noinline_for_stack > >> +char *put_dec(char *buf, unsigned long long n) > >> +{ > >> + uint32_t d3, d2, d1, q; > >> + > >> + if (!n) { > >> + *buf++ = '0'; > >> + return buf; > >> + } > >> + > >> + d1 = (n >> 16) & 0xFFFF; > >> + d2 = (n >> 32) & 0xFFFF; > >> + d3 = (n >> 48) & 0xFFFF; > > > > Are you assuming that sizeof(long long) == 8, always? > > Well... yes. C requires long long to be at least 64-bit and I don't > see it being larger in any foreseeable feature. "640k will be enough for everybody"? > Wouldn't it be enough to put a static assert here? I'd prefer the code which works with arbitrarily wide long long. If needed, use if (sizeof(long long) == 8) 64-bit code else generic code -- vda -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Michał Nazarewicz on 6 Aug 2010 05:00 On Fri, 06 Aug 2010 09:35:26 +0200, Denys Vlasenko <vda.linux(a)googlemail.com> wrote: > On Friday 06 August 2010 09:08, Michal Nazarewicz wrote: >> >> +static noinline_for_stack >> >> +char *put_dec(char *buf, unsigned long long n) >> >> +{ >> >> + uint32_t d3, d2, d1, q; >> >> + >> >> + if (!n) { >> >> + *buf++ = '0'; >> >> + return buf; >> >> + } >> >> + >> >> + d1 = (n >> 16) & 0xFFFF; >> >> + d2 = (n >> 32) & 0xFFFF; >> >> + d3 = (n >> 48) & 0xFFFF; >> > >> > Are you assuming that sizeof(long long) == 8, always? >> >> Well... yes. C requires long long to be at least 64-bit and I don't >> see it being larger in any foreseeable feature. > > "640k will be enough for everybody"? > >> Wouldn't it be enough to put a static assert here? > > I'd prefer the code which works with arbitrarily wide long long. > If needed, use > > if (sizeof(long long) == 8) > 64-bit code > else > generic code Thanks. That seems like a perfect solution. I rearrange the code and try to post updated version after the weekend. -- Best regards, _ _ | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o | Computer Science, Michał "mina86" Nazarewicz (o o) +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo-- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Denys Vlasenko on 10 Aug 2010 00:20
On Sunday 08 August 2010 21:29, Michal Nazarewicz wrote: > Compared to previous version: the code is used only if: > 1. if long long is 64-bit (ie. ULLONG_MAX == 2**64-1), and > 2. user did not select optimisation for size with Kconfig. I measured the size and it does not seem to make sense to exclude it on -Os. On x86: put_dec_full change: 0x93 -> 0x47 bytes put_dec change: 0x12c -> 0x137 bytes IOW, there is net code size reduction (compared to current kernel, it may be a slight growth compared to patch 1). So, please use the optimized code even for CONFIG_CC_OPTIMIZE_FOR_SIZE. > Here are the results (normalised to the fastest/smallest): > : ARM Atom > -- Speed ---------------------------------- > orig_put_dec : 9.333822 2.083110 Original > mod1_put_dec : 9.282045 1.904564 > mod2_put_dec : 9.260409 1.910302 > mod3_put_dec : 9.320053 1.905689 Proposed by previous patch > mod4_put_dec : 9.297146 1.933971 > mod5_put_dec : 13.034318 2.434942 > mod6_put_dec : 1.000000 1.000000 Proposed by this patch > mod7_put_dec : 1.009574 1.014147 > mod8_put_dec : 7.226004 1.953460 > -- Size ----------------------------------- > orig_put_dec : 1.000000 1.000000 Original > mod1_put_dec : 1.000000 1.000000 > mod2_put_dec : 1.361111 1.403226 > mod3_put_dec : 1.000000 1.000000 Proposed by previous patch > mod4_put_dec : 1.361111 1.403226 > mod5_put_dec : 1.000000 1.000000 > mod6_put_dec : 2.555556 3.508065 Proposed by this patch > mod7_put_dec : 2.833333 3.911290 > mod8_put_dec : 2.027778 2.258065 I believe these are old results? Size growth is just too big. > As it can be obsevred, proposed version of the put_dec function is > twice as fast as the original version on Atom and almost 10 times > faster on ARM. I imagine that it may be similar on other "embedded" > processors. > > This may be skewed by the fact that the benchmark is using GCC's > 64-bit division operator instead of kernel's do_div but it would > appear that by avoiding 64-bit division something can be gained. Re speed: on Phenom II in 32-bit mode, I see ~x3.3 speedup on conversions involving large integers (might be skewed by gcc's full-blown 64-bit division in "old" code - kernel's div is smarter). > PS. From Mr. Jones site: "Nonetheless, before relying on the material > here, it would be prudent to check the arithmetic!" hence I checked > all the calculations myself and everything seemed fine. I've also run > test applitacion several times so it tested a few 64-bit numbers..." I tested [0, 100 million] and [2^64-100 million, 2^64-1] ranges. No errors. > +#if BITS_PER_LONG != 32 || defined CONFIG_CC_OPTIMIZE_FOR_SIZE || \ > + ULLONG_MAX != 18446744073709551615ULL I think it's better to say "if BITS_PER_LONG > 32 and ULLONG_MAX > 2^64-1", since it expresses your intent better. Also, add comments explaining what case you optimize for: #if BITS_PER_LONG > 32 || ULLONG_MAX > 18446744073709551615ULL /* Generic code */ .... #else /* BITS_PER_LONG <= 32 && ULLONG_MAX <= 2^64-1 */ /* Optimized code for arches with 64-bit long longs */ .... > +static noinline_for_stack > +char *put_dec(char *buf, unsigned long long n) > +{ > + uint32_t d3, d2, d1, q; > + > + if (!n) { > + *buf++ = '0'; > + return buf; > + } You may as well use the above shortcut for n <= 9, not only for 0. > + buf = put_dec_full4(buf, q % 10000); > + q = q / 10000; > + > + d1 = q + 7671 * d3 + 9496 * d2 + 6 * d1; > + buf = put_dec_full4(buf, d1 % 10000); > + q = d1 / 10000; I experimented with moving division up, before put_dec_full4: q = d1 / 10000; buf = put_dec_full4(buf, d1 % 10000); but gcc appears to be smart emough to do this transformation itself. But you may still do it for older (dumber) gcc's. -- vda -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |