From: nmm1 on 26 Apr 2010 05:17

In article <4b0ga7-iqg.ln1(a)laptop.reistad.name>,
Morten Reistad <first(a)last.name> wrote:
>In article <4BCB4C2A.8080601(a)patten-glew.net>,
>Andy \"Krazy\" Glew <ag-news(a)patten-glew.net> wrote:
>
>This general approach of throwing resources at the CPU and at
>the compiler so we can work around all kinds of stalls has rapidly
>diminishing returns at this point, with our deep pipelines, pretty
>large 2-4 levels of cache, and code that is written without regard
>to deep parallelism.
>
>We can win the battle, but we will lose the war if we continue down
>that path. We must let the facts sink in, and they are that the two main
>challenges for modern processing are the "memory wall" and the "watt
>per MIPS" challenge.

Agreed.  And we must face up to the fact that a critical part of the
problem is that most of the programming languages and paradigms are
unsuitable for modern systems (as well as being dire for RAS).

>The memory wall is a profound problem, but bigger and better caches
>can alleviate it.  At the current point, I mean lots and lots of
>caches, and well interconnected ones too.

I like preloading, but that needs a language and programming paradigm
where reasonably reliable preloading is feasible.  We know that it can
be done, for some programs, and there are known techniques to extend it
(though not to all programs, of course).

>Return to the RISC mindset, and back down a little regarding CPU
>power, and rather give us lots of them, and lots and lots of cache.
>
>It is amazing how well that works.
>
>Then we will have to adapt software, which happens pretty fast
>in the Open Source world nowadays, when there are real performance
>gains to be had.

Don't bet on it :-(  Changing the generated code, yes; changing the
language, usually; changing the language concepts and programming
paradigms, no.

Regards,
Nick Maclaren.
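A minimal sketch of the sort of preloading under discussion, assuming a
GCC-style compiler that provides __builtin_prefetch; the function name,
the array walk and the prefetch distance below are invented for
illustration, not taken from the thread:

/* Walk an array, asking the hardware to start fetching the element
 * that will be needed a fixed distance ahead of the current index. */
#include <stddef.h>

double sum_with_preload(const double *a, size_t n)
{
    double sum = 0.0;
    const size_t dist = 64;   /* elements ahead; would need per-machine tuning */

    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0, 1);  /* read, low temporal locality */
        sum += a[i];
    }
    return sum;
}

A regular array sweep like this is the easy case; the difficulty being
pointed at above is that pointer-chasing code gives neither the compiler
nor the programmer such an obvious address to preload reliably.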
From: Morten Reistad on 26 Apr 2010 11:49

In article <8u3s97-9bt2.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> Well, yes, but that's no different from any other choice.  As I have
>> posted before, I favour a heterogeneous design on-chip:
>>
>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>> for applications etc.
>> Interruptible, system-mode capable, in-order CPUs for the kernel
>> and its daemons.
>
>This forces the OS to effectively become a message-passing system, since
>every single OS call would otherwise require a pair of migrations
>between the two types of CPUs.

With modern transaction systems, somewhat loosely defined, like most
kernel, database and server code, we already have to act as a message
multiplexer between subsystems.  It then becomes critical to arbitrate
and schedule the code onto the right CPUs, and to get access to the
right bits of cache.

Which is very close to actually doing it as message passing through
a blazingly fast FIFO in the first place.

>I'm not saying this would be bad though, since actual data could still
>be passed as pointers...

It would possibly save a copy operation or two, but you still have
to do the cache and scheduling operations upon reference.

The time may have come for message passing systems.

-- 
mrr
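As a rough sketch of the "blazingly fast FIFO" idea: a single-producer,
single-consumer ring carrying message pointers between two cores, written
with C11 atomics.  The names and the ring size are invented; a production
queue would also add cache-line padding of head and tail, batching, and a
backpressure policy.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024                      /* must be a power of two */

struct msg_ring {
    _Atomic size_t head;                    /* advanced only by the producer core */
    _Atomic size_t tail;                    /* advanced only by the consumer core */
    void *slots[RING_SIZE];                 /* messages are passed by pointer */
};

static bool ring_push(struct msg_ring *r, void *msg)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)           /* full */
        return false;
    r->slots[head & (RING_SIZE - 1)] = msg;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static void *ring_pop(struct msg_ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)                       /* empty */
        return NULL;
    void *msg = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return msg;
}

Note that, as Terje suggests, only pointers cross the ring; the consumer
still pays the cache misses when it dereferences them, which is the point
above about the cache and scheduling work not going away.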
From: nmm1 on 26 Apr 2010 12:34
In article <o9pga7-sif.ln1(a)laptop.reistad.name>,
Morten Reistad <first(a)last.name> wrote:
>In article <8u3s97-9bt2.ln1(a)ntp.tmsw.no>,
>Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>>
>>> Well, yes, but that's no different from any other choice.  As I have
>>> posted before, I favour a heterogeneous design on-chip:
>>>
>>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>>> for applications etc.
>>> Interruptible, system-mode capable, in-order CPUs for the kernel
>>> and its daemons.
>>
>>This forces the OS to effectively become a message-passing system, since
>>every single OS call would otherwise require a pair of migrations
>>between the two types of CPUs.
>
>With modern transaction systems, somewhat loosely defined, like most
>kernel, database and server code, we already have to act as a message
>multiplexer between subsystems.  It then becomes critical to arbitrate
>and schedule the code onto the right CPUs, and to get access to the
>right bits of cache.
>
>Which is very close to actually doing it as message passing through
>a blazingly fast FIFO in the first place.

Actually, I think that the latter would be faster!

>>I'm not saying this would be bad though, since actual data could still
>>be passed as pointers...
>
>It would possibly save a copy operation or two, but you still have
>to do the cache and scheduling operations upon reference.

Yes, but think what you gain in cache use.  In my experience, being
interrupted is very costly to the interrupted process, ESPECIALLY when
the interrupt is completely unrelated!  It usually poisons the whole of
the L1 cache, and often the TLB and L2 as well.

I saw one process doing heavy I/O (over a network) cause a separate one
to slow down by a factor of two (in CPU time alone).  Separating I/O
interrupts onto a CPU separate from the number-crunching improved the
system performance no end.

>The time may have come for message passing systems.

Yes.  The more distribution you have, the better they do.

Regards,
Nick Maclaren.
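A hedged sketch of that separation on a present-day Linux system: pin the
cache-sensitive thread onto a core of its own and leave another core to
absorb the I/O interrupts (on Linux the interrupt side is steered
separately, for example via /proc/irq/<n>/smp_affinity).  The core numbers
and function names are placeholders, not anything from the thread.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *crunch(void *arg)
{
    /* ... the cache-sensitive number-crunching work ... */
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_t t;
    cpu_set_t cpus;

    CPU_ZERO(&cpus);
    CPU_SET(1, &cpus);          /* core 1: compute only (placeholder choice) */

    pthread_create(&t, NULL, crunch, NULL);
    if (pthread_setaffinity_np(t, sizeof(cpus), &cpus) != 0)
        fprintf(stderr, "could not set affinity\n");

    /* Core 0 is left to field network and disk interrupts, so their
     * arrival no longer evicts the compute core's L1 and TLB state. */
    pthread_join(t, NULL);
    return 0;
}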