From: Brett Davis on 20 Sep 2009 16:57

In article <qejbb594tjah6s64vff144lickg1m5erat(a)4ax.com>, Emil Naepflein <netnewsegn(a)kabelmail.de> wrote:
> On Sun, 20 Sep 2009 06:25:09 GMT, Brett Davis <ggtgp(a)yahoo.com> wrote:
>
> > Of course when adding PREFETCH slows down your code, that benefit is
> > academic.
>
> I don't agree here. About 10 years ago I did a lot of performance
> optimizations for TCP checksum and bcopy on R10K CPUs. I got performance
> improvements for these functions of up to 90%, just by adding PREF
> instructions. In total this reduced CPU consumption per transferred TCP
> byte by about 30%.

Now I have to point out that the MIPS and PowerPC CPUs I work on are modern embedded designs, and that PREFETCH on these chips is useless.

The MIPS R10K was a nosebleed high-end RISC chip, and likely implemented PREFETCH in the memory/cache controller, rather than using up one of the two read ports and crippling your memory accesses.

As to why PREFETCH is useless on Intel chips, that is outside my experience base. One would think Intel could go to the expense of implementing PREFETCH correctly; it could be used as a benchmark win against AMD.

> Of course, this also depends on your hardware, and whether you operate
> on data in cache or in memory, and how the memory is organized (UMA,
> NUMA, ...).
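(For readers who have not used PREF/PREFETCH: the kind of change Emil describes amounts to dropping a software prefetch hint into the copy loop a few cache lines ahead of the load stream. A minimal sketch in C, using GCC's __builtin_prefetch; the function name, the 64-byte line size, and the two-lines-ahead distance are illustrative assumptions, not Emil's actual tuning.)

    #include <stddef.h>
    #include <string.h>

    /* bcopy-style copy with one software prefetch hint per 64-byte
     * cache line, issued two lines ahead of the copy point.
     * Both the line size and the lookahead distance are assumptions;
     * the right values depend on memory latency and loop throughput. */
    void copy_with_prefetch(void *dst, const void *src, size_t n)
    {
        const char *s = src;
        char *d = dst;
        size_t i = 0;

        for (; i + 64 <= n; i += 64) {
            /* Hint: start fetching the data we will need ~128 bytes
             * from now; compiles to PREF/dcbt/PREFETCHx or to nothing. */
            __builtin_prefetch(s + i + 128, 0 /* read */, 0 /* streaming */);
            memcpy(d + i, s + i, 64);   /* compiler expands to wide loads/stores */
        }
        memcpy(d + i, s + i, n - i);    /* copy the tail bytes */
    }

Whether this helps or hurts is exactly the point under debate: on a chip that handles the hint in the memory/cache controller it can hide most of the miss latency, while on a chip where the hint occupies a load port it just steals bandwidth from the real accesses.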
From: "Andy "Krazy" Glew" on 30 Sep 2009 01:43

Mayan Moudgill wrote:
>
> I've been reading comp.arch off and on for more than 20 years now. In
> the past few years the SNR has deteriorated considerably, and I was
> wondering why. Maybe people who used to post at comp.arch are on other
> forums? Maybe it's that I've gotten a little harder to impress? Then I
> thought about the quality of most papers at ISCA and Micro, the fact
> that both EDF and MPF have gone away, and I think the rot is not
> confined to just comp.arch.

Mayan, you would post this just as I am changing jobs, leaving Intel for the second and last time. Not only have I been busy, but it probably would not have been a smart thing for me to post while in transition.

But the fact that I have left Intel says something: it says that I, at least, don't see much opportunity to do interesting computer architecture at Intel. Similarly, the fact that Mitch Alsup also posts to this list, and is not at any CPU company that I am aware of, also says something.

> So, what's going on? I'm sure part of it is that the latest generation of
> architects is talking at other sites.

If so, they haven't told me. (Sob!) Dave Kanter may pitch realworldtech.com, and there's a lot of good stuff there. But as for me, I got my first real computer architecture job mainly because Bob Colwell liked my posts on comp.arch. And I'll end it here. In fact, making sure that I was allowed to post to comp.arch was a major condition for me accepting my new job.

> However, equally important is that there are far fewer of them. The
> number of companies designing processors has gone down and there are
> fewer startups doing processors. So, fewer architects.

Certainly, fewer companies. Probably fewer teams, even though Intel now has more teams than ever before doing CPUs: Oregon big-core, Israel big-core, Atom, and Larrabee. Not to forget the Intel integrated graphics teams.

At Intel, there are probably more people called "computer architects" now than ever before. But the scope of the job has narrowed. There are a dozen people, probably more, doing the job that I did as a single person on P6.

> Within those processors there is less architecture (or micro-
> architecture) being done; instead, the imperative that the clock cycle has
> to be driven down leaves fewer levels of logic per cycle, which in turn
> means that the "architecture" has to be simpler. So, less to talk about.

Less architecture, I agree. Not necessarily fewer levels of logic per cycle. The "right hand turn" turned away from such high-speed designs as Willamette and Prescott. Mitch can talk to this.

> There is less low-hanging fruit around; most of the simpler and
> obviously beneficial ideas are known, and most other ideas are more
> complex and harder to explain/utilize.

I believe that there are good new ideas, in both single-processor microarchitecture and in multiprocessors. But we are in a period of retrenchment - one of the downward zigs of the "sawtooth wave" that I described in my Stanford EE380 talk so many years ago.

There are several reasons for this, including:

(1) What Fred Pollack called "The Valley of Death" for applications. Many of the applications that I can imagine wanting improved - I want a computer that can think, talk, anticipate my needs - are still a few years out, maybe decades, in terms of CPU power, but also data access, organization, and just plain programming.

(2) Low Power and Small Form Factor: combine with this that I don't really want those applications on a desktop or laptop PC.

I want those applications on a cell phone - and the cell phone is the largest, highest-power device I want. I want those applications on an ear bud whispering into my ear. I want those applications on glasses drawing into my eyes, or on smart contact lenses. In part we are waiting for these new form factors to be created.

In part, the new generation of low-power devices - Atom, ARM - are recapitulating CPU evolution, in much the same way microprocessors recapitulated the evolution of mainframes and minicomputers. It's not clear if we ever really surpassed them - while I think that the most recent members of the Intel P6 family surpassed the most advanced IBM mainframe processors rumored to be in the works at Poughkeepsie, I'm not sure. Anyway, ARM and Atom reset to simple in-order processors, and are climbing the complexity ladder again. ARM Cortex A9 has, at least, reached the OOO level. When will they surpass the desktop and laptop microprocessors? Mike Haertel says that the big value of Atom was allowing Intel to take a step back, and then get onto the Moore's Law curve again, for a few years.

(3) Simple Applications and Parallelism: Also, since about 1999 many of the most important applications have been simple: video, multimedia. Relatively brute-force algorithms. MPEG, rectangular blocks. Not model based. Easy to run SIMD vectors on. Easy to parallelize. Not very smart. Graphics algorithms have been much the same, in the earlier generation of GPUs. We are just now getting to the point where flexibly programmable shader engines are in GPUs.

Couple this with the fact that these are throughput applications, and we have been in a space where there was more value, and certainly less project and career risk, in increasing the number of cores (making relatively minor modifications to the existing cores) than in improving the cores themselves. And this will go on until the low-hanging fruit in multicore is taken up - by 4 or 8 processors per chip. Beyond that... well, most server guys don't want many more processors per chip; they want more powerful processors.

Somewhere, I suspect soon, multicore will run out of steam. Although, as I have said before, there are applications that can use many, many CPUs. Graphics, if nothing else; and there are others. So it may be that we switch our collective mindshare from multicore to manycore.

> A larger number of decisions are being driven by the details of the
> process, libraries and circuit families. This stuff is less accessible
> to a non-practitioner, and probably proprietary to boot.
>
> A lot of the architecture that is being done is application-specific.
> Consequently, it's probably more apt to be discussed in
> comp.<application> than comp.arch. A lot of the trade-offs will make
> sense only in that context.
>
> Basically, I think the field has gotten more complicated and less
> accessible to the casual reader (or even the gifted, well-read amateur).
> The knowledge required of a computer architect has increased to the
> point that it's probably impossible to acquire even a *basic* grounding
> in computer architecture outside of actually working in the field
> developing a processor or _possibly_ studying with one of a few PhD
> programs. The field has gotten to the point where it _may_ require
> architects to specialize in different application areas; a lot of the
> skills transfer, but it still requires retraining to move from, say,
> general-purpose processors to GPU design.
>
> I look around and see a handful of guys posting who've actually been
> doing computer architecture. But it's a shrinking pool....
>
> Ah, well - I guess I can always go hang out at alt.folklore.computers.

I may have to do that as well. Work on my book-and-wiki-site.
From: "Andy "Krazy" Glew" on 30 Sep 2009 02:06

Tim McCaffrey wrote:
> In article
> <da524b6d-bc4d-4ad7-9786-3672f7e9e52c(a)j19g2000yqk.googlegroups.com>,
> MitchAlsup(a)aol.com says...
>> On Sep 10, 10:04 pm, Mayan Moudgill <ma...(a)bestweb.net> wrote:
>>> Well, synchronization can be pretty easy to implement - depends on what
>>> you are trying to accomplish with it (barriers, exclusion, queues,
>>> etc.).
>> If it is so easy to implement then why are (almost) all
>> synchronization models at least O(n**2) in time, per unit of
>> observation? That is, it takes a minimum of n**2 memory accesses for 1
>> processor to recognize that it is the processor that can attempt to
>> make forward progress amongst n contending processors/threads.

Although my MS thesis was one of the first to make this observation of O(n^2) work, it also points out that there are O(1) algorithms, chiefly among them the queue-based locks. I liked Graunke-Thakkar, but MCS gets the acclaim.
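(For anyone who has not seen a queue-based lock: the property being pointed to is that each waiter spins only on its own queue node, so acquiring the lock costs O(1) remote memory references per contender instead of the O(n^2) total traffic of everyone hammering one flag. A minimal MCS-style sketch in C11 atomics follows; the names and the exact memory orderings are my own illustration, not code from Andy's thesis or from the MCS paper.)

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;              /* true while this waiter must spin */
    } mcs_node;

    typedef struct {
        _Atomic(mcs_node *) tail;        /* last node in the waiting queue */
    } mcs_lock;

    void mcs_acquire(mcs_lock *lk, mcs_node *me)
    {
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);

        /* Join the queue; the previous tail (if any) is our predecessor. */
        mcs_node *prev = atomic_exchange_explicit(&lk->tail, me,
                                                  memory_order_acq_rel);
        if (prev == NULL)
            return;                      /* queue was empty: lock acquired */

        atomic_store_explicit(&prev->next, me, memory_order_release);

        /* Spin only on our own node - the O(1) property. */
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;
    }

    void mcs_release(mcs_lock *lk, mcs_node *me)
    {
        mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
        if (succ == NULL) {
            /* No visible successor: try to swing the tail back to empty. */
            mcs_node *expected = me;
            if (atomic_compare_exchange_strong_explicit(
                    &lk->tail, &expected, NULL,
                    memory_order_acq_rel, memory_order_acquire))
                return;
            /* A successor is enqueueing; wait for it to link itself in. */
            while ((succ = atomic_load_explicit(&me->next,
                                                memory_order_acquire)) == NULL)
                ;
        }
        /* Hand the lock directly to the successor. */
        atomic_store_explicit(&succ->locked, false, memory_order_release);
    }

Each contender's spin is local to its own cache line, so under contention the only global traffic is one exchange on entry and one hand-off store on exit.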
From: "Andy "Krazy" Glew" on 30 Sep 2009 02:18
Robert Myers wrote:
> Chrome creates a separate process for each tab, and I have *usually*
> been able to regain control by killing a single process.

Hallelujah! Processes are the UNIX way. I may have to start using Chrome.