From: Quadibloc on 23 Apr 2010 03:29
On Apr 21, 7:46 pm, Bengt Larsson <bengtl8....(a)telia.NOSPAMcom> wrote:
> Indeed, HP decides how long Itanium is alive. What else could they
> use? x86?

Well, now that the x86 architecture has available for it the same
mainframe-like RAS features that Itanium had all along, that, at
least, is a possibility.

John Savard
From: Bengt Larsson on 21 Apr 2010 21:46
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>On Apr 20, 9:19 pm, timcaff...(a)aol.com (Tim McCaffrey) wrote:
>>
>> But, it was pretty obvious that Itanium was dead 4 or 5 years ago.
>> Why is Intel still wasting money?
>
>Because Itanium isn't dead. HP appears to be doing just fine with it.

Indeed, HP decides how long Itanium is alive. What else could they
use? x86? HP are doing the same thing as when they did HP-PA, except
they have outsourced microarchitecture design to Intel.
From: Morten Reistad on 26 Apr 2010 04:43
In article <4BCB4C2A.8080601(a)patten-glew.net>,
Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>On 4/18/2010 1:36 AM, nmm1(a)cam.ac.uk wrote:
>> As I have posted before, I favour a heterogeneous design on-chip:
>>
>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>> for applications etc.
>> Interruptible, system-mode capable, in-order CPUs for the kernel
>> and its daemons.
>
>This is almost the opposite of what I would expect.
>
>Out-of-order tends to benefit OS code more than many user codes.
>In-order coherent threading mainly benefits fairly stupid codes that
>run in user space, like multimedia.
>
>I would guess that you are motivated by something like the following:
>
>System code tends to have unpredictable branches, which hurt many
>OOO machines.
>
>System code you may want to be able to respond to interrupts easily.
>I am guessing that you believe that OOO has worse interrupt latency.
>That is a misconception: OOO tends to have better interrupt latency,
>since they usually redirect to the interrupt handler at retirement.
>However, they lose more work.

...... interesting perspectives deleted ....

This general approach of throwing resources at the CPU and at the
compiler so we can work around all kinds of stalls has rapidly
diminishing returns at this point, with our deep pipelines, pretty
large 2-4 levels of cache, and code that is written without regard
to deep parallelism.

We can win the battle, but we will lose the war if we continue down
that path. We must let the facts sink in: the two main challenges
for modern processing are the "memory wall" and the "watts per MIPS"
challenge.

The memory wall is a profound problem, but bigger and better caches
can alleviate it. At the current point, I mean lots and lots of
caches, and well-interconnected ones too.

Return to the RISC mindset, and back down a little regarding CPU
power; rather, give us lots of them, and lots and lots of cache.
It is amazing how well that works.

Then we will have to adapt software, which happens pretty fast in the
Open Source world nowadays, when there are real performance gains to
be had.

For the licensing problems, specifically Windows, perhaps a
hypervisor can address that: keep the core systems like databases,
transaction servers etc. running either under some second OS or
directly under the hypervisor, and let Windows be a window onto the
user code. And I am sure licensing will be adapted if such designs
threaten the revenue stream.

For the recalcitrant, single-thread code I would suggest taking the
autotranslation path: recode on the fly. The Alpha team and Transmeta
have proven that this is viable.

Or, we may keep a 2-core standard chip for the monolithic code, and
add a dozen smaller cores and a big cache for the stuff that is
already parallelized. This seems like the path the GPU coders are
taking. Just integrate the GPUs with the rest of the system, and add
a hypervisor.

-- mrr
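How caches "alleviate the memory wall" can be made concrete with the
classic loop-tiling trick. A minimal sketch in C; the BLOCK size here
is an illustrative assumption, not a tuned value:

```c
#include <stddef.h>

#define BLOCK 32  /* tile edge; assumed, would be tuned to the cache */

/* Transpose an n-by-n row-major matrix of doubles in BLOCK x BLOCK
 * tiles. Within a tile, both the rows being read and the columns
 * being written stay cache-resident, instead of striding through a
 * full column of memory for every single element as the naive
 * transpose does. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

The arithmetic is identical to the naive version; only the traversal
order changes, which is exactly the kind of cache-aware rewrite the
"adapt software" argument above is about.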
From: nmm1 on 26 Apr 2010 05:17
In article <4b0ga7-iqg.ln1(a)laptop.reistad.name>,
Morten Reistad <first(a)last.name> wrote:
>In article <4BCB4C2A.8080601(a)patten-glew.net>,
>Andy "Krazy" Glew <ag-news(a)patten-glew.net> wrote:
>
>This general approach of throwing resources at the CPU and at the
>compiler so we can work around all kinds of stalls has rapidly
>diminishing returns at this point, with our deep pipelines, pretty
>large 2-4 levels of cache, and code that is written without regard
>to deep parallelism.
>
>We can win the battle, but we will lose the war if we continue down
>that path. We must let the facts sink in: the two main challenges
>for modern processing are the "memory wall" and the "watts per MIPS"
>challenge.

Agreed. And we must face up to the fact that a critical part of the
problem is that most of the programming languages and paradigms are
unsuitable for modern systems (as well as being dire for RAS).

>The memory wall is a profound problem, but bigger and better caches
>can alleviate it. At the current point, I mean lots and lots of
>caches, and well-interconnected ones too.

I like preloading, but that needs a language and programming paradigm
where reasonably reliable preloading is feasible. We know that it
can be done, for some programs, and there are known techniques to
extend it (though not to all programs, of course).

>Return to the RISC mindset, and back down a little regarding CPU
>power; rather, give us lots of them, and lots and lots of cache.
>
>It is amazing how well that works.
>
>Then we will have to adapt software, which happens pretty fast
>in the Open Source world nowadays, when there are real performance
>gains to be had.

Don't bet on it :-( Changing the generated code, yes; changing the
language, usually; changing the language concepts and programming
paradigms, no.

Regards,
Nick Maclaren.
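Preloading of the sort Nick describes is exposed today at the
compiler-extension level. A minimal sketch, assuming GCC or Clang's
`__builtin_prefetch`; the linked list and its values are purely
illustrative:

```c
#include <stddef.h>

struct node { long value; struct node *next; };

/* Sum a linked list, issuing a prefetch for the next node while the
 * current one is processed. This is "reliable" preloading in Nick's
 * sense only because the pointer to the next node is known one step
 * ahead; for pointer-chasing codes without that property, the
 * technique does not apply.
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang extension
 * and is a hint only: it never faults, and a compiler without it
 * could define it away as a no-op. */
long sum_list(const struct node *n)
{
    long total = 0;
    while (n) {
        if (n->next)
            __builtin_prefetch(n->next, 0 /* read */, 1 /* low reuse */);
        total += n->value;
        n = n->next;
    }
    return total;
}
```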
From: Morten Reistad on 26 Apr 2010 11:49
In article <8u3s97-9bt2.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> Well, yes, but that's no different from any other choice. As I have
>> posted before, I favour a heterogeneous design on-chip:
>>
>> Essentially uninterruptible, user-mode only, out-of-order CPUs
>> for applications etc.
>> Interruptible, system-mode capable, in-order CPUs for the kernel
>> and its daemons.
>
>This forces the OS to effectively become a message-passing system,
>since every single OS call would otherwise require a pair of
>migrations between the two types of CPUs.

With modern transaction systems, somewhat loosely defined, like most
of the kernels, database and server code, we already have to act as a
message multiplexer between subsystems. It then becomes critical to
arbitrate and schedule the code on the right CPUs, and get access to
the right bits of cache. Which is very close to actually doing it as
message passing through a blazingly fast FIFO in the first place.

>I'm not saying this would be bad though, since actual data could
>still be passed as pointers...

It would possibly save a copy operation or two, but you still have to
do the cache and scheduling operations upon reference.

The time may have come for message-passing systems.

-- mrr
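A "blazingly fast FIFO" of the kind described above, passing data as
pointers per Terje's remark, is typically a lock-free single-producer
single-consumer ring buffer. A minimal sketch in C11; RING_SIZE is an
assumed power-of-two capacity:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024  /* must be a power of two */

struct ring {
    _Atomic size_t head;   /* next slot to write; producer-owned  */
    _Atomic size_t tail;   /* next slot to read; consumer-owned   */
    void *slot[RING_SIZE]; /* messages travel as pointers, no copy */
};

/* Producer side only. Returns false if the ring is full. */
bool ring_push(struct ring *r, void *msg)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)          /* indices grow monotonically */
        return false;
    r->slot[head & (RING_SIZE - 1)] = msg;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side only. Returns NULL if the ring is empty. */
void *ring_pop(struct ring *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return NULL;
    void *msg = r->slot[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return msg;
}
```

With one producer and one consumer pinned to different cores, as in
the heterogeneous design under discussion, each message costs one
acquire/release pair and, essentially, the cache-line transfers for
the slot and the two indices; this is why "the cache and scheduling
operations upon reference" remain the real cost.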