From: Peter Olcott on 20 May 2010 14:07

On 5/20/2010 12:52 PM, James Kanze wrote:
> On May 19, 6:45 pm, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>> On 5/19/2010 12:23 PM, James Kanze wrote:
>
> [...]
>>> And how do you act on an ActionCode? Switch statements and
>>> indirect jumps are very, very slow on some machines (but not on
>>> others).
>
>> I could not imagine why they would ever be very slow. I know
>> that they are much slower than an ordinary jump because of the
>> infrastructure overhead. This is only about one order of
>> magnitude or less.
>
> On some machines (HP PA architecture, for example), any indirect
> jump (at the assembler level) will purge the pipeline, resulting
> in a considerable slowdown. And the classical implementation of
> a dense switch uses a jump table, i.e. an indirect jump. (The
> alternative involves a number of jumps, so may not be that fast
> either.)
>
> This is not universal, of course---I've not noticed it on
> a Sparc, for example.
>
> --
> James Kanze

Ah, now it makes much more sense. Back to cache locality of reference again.
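To make the switch-dispatch question concrete, here is a minimal C++ sketch of the two shapes a compiler typically chooses between. The ActionCode enum and its values are hypothetical illustrations, not taken from the recognizer under discussion: a dense set of case values usually compiles to a bounds check plus an indirect jump through a jump table (the construct that can flush the pipeline on machines such as the HP PA), while a sparse set tends to become a chain of compares and branches.

#include <cstdio>

// Hypothetical action codes -- a dense 0..3 range invites a jump table.
enum ActionCode { Append = 0, Emit = 1, Restart = 2, Error = 3 };

// Contiguous case values: most compilers emit a bounds check plus an
// indirect jump through a table of code addresses.
int dispatch_dense(ActionCode ac)
{
    switch (ac) {
    case Append:  return 1;
    case Emit:    return 2;
    case Restart: return 3;
    case Error:   return -1;
    }
    return 0;
}

// Sparse case values: the compiler usually falls back to a compare/branch
// chain (or a small binary search) -- no indirect jump, but more
// conditional branches to predict.
int dispatch_sparse(int code)
{
    switch (code) {
    case 7:    return 1;
    case 100:  return 2;
    case 4096: return 3;
    default:   return -1;
    }
}

int main()
{
    std::printf("%d %d\n", dispatch_dense(Emit), dispatch_sparse(100));
    return 0;
}

Which form wins depends entirely on the target machine, which is why inspecting the generated code, or better, measuring, is the only reliable way to decide.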
From: Joseph M. Newcomer on 20 May 2010 20:31

Actually, cache locality is only ONE of the parameters. Instruction pipe depth, and as
pointed out, pipe flushing; speculative execution (such as the x86s and many other
architectures do very well), dynamic register renaming, L2 cache vs. L1 cache, operand
prefetching, the depth of the operand lookahead pipe, etc. all come into play. In the
case of the x86, these vary widely across families of chips; lower-power (e.g., laptop)
chips generally have fewer of these features than server-oriented chipsets (e.g., high-end
Xeon and i9). All these features involve more transistors and higher clock speeds, and
both of these translate into higher power requirements. Little factors like TLB
collisions and TLB flush rates can change performance by integer multipliers, not just
single-digit percentages. The effects of network traffic and other kernel activities,
which impact the pipelines, TLB, caches, etc., can be quite disruptive to pretty models of
behavior, even if you manage to model precisely what is going on in the abstract chip set.
I've seen my desktop report processing 1K interrupts/second, so 1K times per second my
idealized model of cache management gets scrambled by code I have no control over. This
is reality. This is why NOTHING matters except MEASURED performance. Not theoretical
performance, not performance under some "ideal conditions" model, not performance
predicted by counting instructions or guessing at memory delays, but ACTUAL, MEASURED
performance.

This is why the only measure of performance is actual execution, and your numbers are
valid ONLY on the machine and under the conditions you measure them with, and do not
necessarily predict good performance on a different CPU model or different motherboard
chipset.
				joe

On Thu, 20 May 2010 13:07:20 -0500, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:

>On 5/20/2010 12:52 PM, James Kanze wrote:
>> On May 19, 6:45 pm, Peter Olcott<NoS...(a)OCR4Screen.com> wrote:
>>> On 5/19/2010 12:23 PM, James Kanze wrote:
>>
>> [...]
>>>> And how do you act on an ActionCode? Switch statements and
>>>> indirect jumps are very, very slow on some machines (but not on
>>>> others).
>>
>>> I could not imagine why they would ever be very slow. I know
>>> that they are much slower than an ordinary jump because of the
>>> infrastructure overhead. This is only about one order of
>>> magnitude or less.
>>
>> On some machines (HP PA architecture, for example), any indirect
>> jump (at the assembler level) will purge the pipeline, resulting
>> in a considerable slowdown. And the classical implementation of
>> a dense switch uses a jump table, i.e. an indirect jump. (The
>> alternative involves a number of jumps, so may not be that fast
>> either.)
>>
>> This is not universal, of course---I've not noticed it on
>> a Sparc, for example.
>>
>> --
>> James Kanze
>
>Ah, now it makes much more sense. Back to cache locality of reference again.

Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
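In the spirit of "measured performance only," here is a minimal timing-harness sketch. It assumes a C++11 or later compiler for std::chrono, and the work() function is a hypothetical stand-in for whatever dispatch loop or recognizer pass is actually being compared.

#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for whatever is really being measured
// (a dispatch loop, a UTF-8 recognizer pass, ...).
static int work(int x)
{
    return static_cast<int>((static_cast<unsigned>(x) * 2654435761u) >> 16);
}

int main()
{
    const std::size_t n = 10000000;
    std::vector<int> input(n);
    for (std::size_t i = 0; i < n; ++i)
        input[i] = static_cast<int>(i);

    volatile int sink = 0;   // keeps the optimizer from deleting the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        sink = sink + work(input[i]);
    const auto stop = std::chrono::steady_clock::now();

    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    std::printf("%.2f ns/element (sink=%d)\n", ns / n, static_cast<int>(sink));
    return 0;
}

As the post says, the number this prints is valid only for that machine, that build, and that moment; interrupts, other processes, and cache state will all move it, so repeat the run and look at the spread rather than trusting a single figure.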
From: Peter Olcott on 20 May 2010 21:20

On 5/20/2010 7:31 PM, Joseph M. Newcomer wrote:
> Actually, cache locality is only ONE of the parameters. Instruction pipe depth, and as
> pointed out, pipe flushing; speculative execution (such as the x86s and many other
> architectures do very well), dynamic register renaming, L2 cache vs. L1 cache, operand
> prefetching, the depth of the operand lookahead pipe, etc. all come into play. In the
> case of the x86, these vary widely across families of chips; lower-power (e.g., laptop)
> chips generally have fewer of these features than server-oriented chipsets (e.g., high-end
> Xeon and i9). All these features involve more transistors and higher clock speeds, and
> both of these translate into higher power requirements. Little factors like TLB
> collisions and TLB flush rates can change performance by integer multipliers, not just
> single-digit percentages. The effects of network traffic and other kernel activities,
> which impact the pipelines, TLB, caches, etc., can be quite disruptive to pretty models of
> behavior, even if you manage to model precisely what is going on in the abstract chip set.
> I've seen my desktop report processing 1K interrupts/second, so 1K times per second my
> idealized model of cache management gets scrambled by code I have no control over. This
> is reality. This is why NOTHING matters except MEASURED performance.

This is a gross exaggeration. I once had a non-techie boss who wrote a program that read
his data from disk fifty times because there were fifty different kinds of data. It could
easily be known in advance that there was a much better way to do this. A more accurate
statement might be something like: unmeasured performance estimates are most often very
inaccurate. It is also probably true that faster methods can often be distinguished from
much slower (at least an order of magnitude slower) methods without measurement.

Because I am so fanatical about optimization, and I have done some further investigation,
I am still confident that my UTF-8 recognizer has the fastest possible design. I would
agree with you that this statement really doesn't count until proven with working code.

> Not theoretical
> performance, not performance under some "ideal conditions" model, not performance
> predicted by counting instructions or guessing at memory delays, but ACTUAL, MEASURED
> performance.
>
> This is why the only measure of performance is actual execution, and your numbers are
> valid ONLY on the machine and under the conditions you measure them with, and do not
> necessarily predict good performance on a different CPU model or different motherboard
> chipset.
> joe
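For readers following the UTF-8 side of the thread, here is a small, self-contained validator sketch. It illustrates one straightforward approach (classify the lead byte, then range-check each continuation byte against the sequence constraints of RFC 3629); it is not the recognizer design being discussed, and it makes no claim of being the fastest.

#include <cstddef>
#include <cstdio>

// Classify the lead byte, then range-check each continuation byte, following
// the sequence constraints of RFC 3629 (no overlong forms, no surrogates,
// nothing above U+10FFFF).  Illustration only.
bool valid_utf8(const unsigned char* s, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b0 = s[i];
        if (b0 < 0x80) { ++i; continue; }           // ASCII

        std::size_t need;                           // continuation bytes required
        unsigned char lo = 0x80, hi = 0xBF;         // range for the first continuation
        if (b0 >= 0xC2 && b0 <= 0xDF)      need = 1;
        else if (b0 == 0xE0)             { need = 2; lo = 0xA0; }  // reject overlongs
        else if (b0 >= 0xE1 && b0 <= 0xEC) need = 2;
        else if (b0 == 0xED)             { need = 2; hi = 0x9F; }  // reject surrogates
        else if (b0 >= 0xEE && b0 <= 0xEF) need = 2;
        else if (b0 == 0xF0)             { need = 3; lo = 0x90; }  // reject overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) need = 3;
        else if (b0 == 0xF4)             { need = 3; hi = 0x8F; }  // cap at U+10FFFF
        else return false;                          // 0x80..0xC1, 0xF5..0xFF

        if (i + need >= len) return false;          // truncated sequence
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (std::size_t j = 2; j <= need; ++j)
            if (s[i + j] < 0x80 || s[i + j] > 0xBF) return false;
        i += need + 1;
    }
    return true;
}

int main()
{
    const unsigned char good[] = { 0x48, 0xC3, 0xA9, 0xE2, 0x82, 0xAC };  // "H", e-acute, euro sign
    const unsigned char bad[]  = { 0xC0, 0xAF };                          // overlong encoding of '/'
    std::printf("%d %d\n", valid_utf8(good, sizeof good), valid_utf8(bad, sizeof bad));
    return 0;
}

Whether a design like this (or a table-driven variant of it) is actually faster than the alternatives is, as the posts above argue, something only measurement on the target machine can settle.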
From: Mihai N. on 21 May 2010 02:08

> Note that they do refer to ISO 10646 (see the footnote on page 19 of
> the draft standard (30) which has already been cited in this thread).

This is what I was alluding to when I wrote "changing lately."
(Although not technically Unicode, ISO 10646 is a good subset, kept in sync
with Unicode pretty well. By "subset" I don't mean fewer characters encoded,
but "some parts missing," like all the character properties and all the UTSes.)

> The key document is referenced as
> http://www.rfc-editor.org/rfc/bcp/bcp47.txt which is
> actually RFC5646. This is a lengthy document but worth reading.

And that is a bad thing, because RFC5646 is a way to tag languages, not locales.
In most cases there is no difference, but with UTS-35 you can say:

  de-DE@collation=phonebook (German-Germany with phonebook sorting)
or
  ar@calendar=islamic (Arabic with the Islamic calendar)
or even
  ja-JP@calendar=japanese;numbers=jpanfin
  (Japanese-Japan, using the Japanese imperial calendar and Japanese financial numerals)

That's something you can't do with RFC5646 (in fact, the RFC itself says "For systems
and APIs, language tags form the basis for most implementations of locale identifiers."
and sends you to UTS-35 as an example).

--
Mihai Nita [Microsoft MVP, Visual C++]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
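To show what such a keyword-bearing locale identifier looks like in practice, here is a rough ICU4C sketch (choosing ICU here is an assumption, since the post does not name a library; it requires ICU to be installed and linking against icuuc and icui18n). It creates a plain German collator and a "de_DE@collation=phonebook" collator and compares the same pair of strings with each:

#include <cstdio>
#include <memory>
#include <unicode/coll.h>
#include <unicode/locid.h>
#include <unicode/unistr.h>

static const char* order(const icu::Collator& c,
                         const icu::UnicodeString& a,
                         const icu::UnicodeString& b)
{
    UErrorCode status = U_ZERO_ERROR;
    const UCollationResult r = c.compare(a, b, status);
    if (U_FAILURE(status)) return "?";
    return r == UCOL_LESS ? "<" : (r == UCOL_GREATER ? ">" : "==");
}

int main()
{
    UErrorCode status = U_ZERO_ERROR;

    // Plain German vs. German carrying the UTS-35 keyword in the locale id.
    std::unique_ptr<icu::Collator> standard(
        icu::Collator::createInstance(icu::Locale("de", "DE"), status));
    std::unique_ptr<icu::Collator> phonebook(
        icu::Collator::createInstance(
            icu::Locale::createFromName("de_DE@collation=phonebook"), status));
    if (U_FAILURE(status)) {
        std::fprintf(stderr, "collator creation failed: %s\n", u_errorName(status));
        return 1;
    }

    const icu::UnicodeString mueller =
        icu::UnicodeString::fromUTF8("M\xC3\xBCller");   // "Müller"
    const icu::UnicodeString muffe = icu::UnicodeString::fromUTF8("Muffe");

    // Phonebook collation treats the umlaut as "ue", which typically changes
    // the relative order of these two strings.
    std::printf("standard : M\xC3\xBCller %s Muffe\n", order(*standard, mueller, muffe));
    std::printf("phonebook: M\xC3\xBCller %s Muffe\n", order(*phonebook, mueller, muffe));
    return 0;
}

The keyword travels inside the locale identifier itself, which is exactly the capability that an RFC5646 language tag alone does not give you.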
From: Mihai N. on 21 May 2010 02:16
> "What is the difference between a computer scientist, a newbie, and a > software engineer?" > > Sounds like a setup for a joke, but it isn't. A little bit like: "In theory, there is no difference between theory and practice. But, in practice, there is." Also sounds like a joke, but it isn't :-) -- Mihai Nita [Microsoft MVP, Visual C++] http://www.mihai-nita.net ------------------------------------------ Replace _year_ with _ to get the real email |