From: Terje Mathisen on 10 Sep 2009 06:39

MitchAlsup wrote:
> Nobody has solved the synchronization problem that will embolden cores
> and threads to really deliver on their promise
> The closest solution to synchronization has been lobotomized by its
> implementation

Hmmm..., do I hear the bitter sound of an architect thwarted? :-)

> Thus the way forward is threads and cores with increasingly small gains
> as the count increases

Yes indeed.

> To a very large extent:
> There is no need for new instruction sets--extensions will occur
> naturally
> There is no need for new system organizations--software has enough
> trouble with the ones it already has
> There is no need for new storage organizations--DRAM and disks have a
> natural growth path
> So, evolution is bound to take place, and has been for a while.
>
> You see, the problem is not in the instruction sets, processor
> organization, cores and threads, nor system organization: The problem
> is in the power wall, memory wall, and synchronization wall. Until
> solutions to these are found, little can be done except squeeze
> another drop of blood from the stone.

Personally, I am very partial to XADD, i.e. you can use it as a building
block to return a unique result to each of a bunch of competing cores.

One idea would be to use the 0->1 transition as the Go! signal, and all
the others would go into a scaled (exponential?) backoff, depending upon
the return value they got. I.e. if you have code that first tries to
read the variable, then uses LOCK XADD only after seeing a zero, the
backoff path would require seeing zero X times, with X a function of the
previous XADD result.

This would at least guarantee forward progress, but you still have the
problem of N cores all trying to gain ownership of the same cache
line. :-(

> But what application is so compelling that it will actually change
> which stone is getting squeezed?
Graphics is the only one that comes to mind, and here you can mostly
program your way around the problem, at least up to ~1K fp lanes.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: hanukas on 10 Sep 2009 06:50

On Sep 7, 11:54 pm, Robert Myers <rbmyers...(a)gmail.com> wrote:
> On Sep 7, 3:38 pm, n...(a)cam.ac.uk wrote:
>
> > Only a complete loon would
> > expect current software to do anything useful on large numbers of
> > processors, let alone with a new architecture!
>
> You've such a way with words, Nick. Browsers, which are probably the
> OS of the future, are already multi-threaded or soon to be. No longer
> does the browser freeze because of some java script in an open tab.
> Browsers that don't seize that advantage will fall by the wayside.
> The same will happen all over software, and at increasing levels of
> fineness of division of labor.
>
> Robert.

So true. Fool that I am, I had a dual-core PentiumII system back in
1997, and yes, wrote multi-threaded software back then. Sure, it's more
difficult to debug and trace problems, but the actual design and
implementation can be a very simple and painless process.

This has to be emphasized: in my opinion, there hasn't really been any
great pressure to go multi-threaded for most things when the largest
benefit has been just more problems for the development process. The
reason is simple: most target hardware (for desktop x86) has been
single-core until the last few years. There just hasn't been much
incentive (or requirement) for that sort of development, except for some
very specific software with embarrassingly parallelizable problems
(video, rendering, and that sort of thing).

With a single-core system, the biggest reason to go multi-threaded has
been the possibility of atomizing operations that take a long time to
finish. If you have a user interface, you don't want it to freeze for
100-300 ms periodically when the program just wants to pre-cache some
resources in the background. Sure, you can split the task (=atomize)
into small chunks manually, but why, when the OS is well capable of
doing that for you (=launch a worker thread, or have a work queue and
one worker thread, whatever).
That's the kind of thing you want multi-threading for, even on a
single-core machine.

Now that the path of least resistance to increased computation power is
adding units to do computation, Intel, AMD, Sun, IBM and the others have
roadmaps full of junk using this paradigm. At this point someone should
ask themselves: what's wrong when some tasks take an uncomfortably long
time to complete while only a fraction of the computational power is
utilized?

If the software developer is smart, he finds ways to make his software
more responsive. Step 1: find smarter ways to do things: don't do stuff
you don't need. Do stuff in an order that gets the user his feedback as
soon as possible (reduce latency).

Reduced latency alone won't help if there isn't enough bandwidth to back
it up, so let's see what multi-core can do for us. Increasing bandwidth
is easier: it's easier to, for example, read 4 JPEGs in their own
threads than to parallelize reading one JPEG. While the throughput is
the same in both cases, the parallel-reading approach is much simpler to
program but has, in the worst case, 4x the latency. To fix this, a cache
with a prefetcher will do nicely. A cache is needed anyway for this kind
of application, and a prefetcher isn't a bad idea either: trade some
memory for perceived latency compensation. Problem solved.

IMO, utilizing more cores isn't difficult at all--there just hasn't been
much incentive to do so in mainstream software development. It doesn't
help that there really isn't much need for that kind of "solution" in
most software. But whenever the developer sees the hourglass icon when
testing his own software, the question must be asked: how do I get rid
of this waiting? Going multi-threaded shouldn't be the first answer.
First do things smart. Reaching for many cores as the first reaction to
poor performance is for the weak.
=) Maybe there is a brute-force for-loop iterating through all
combinations when doing a search over a data set of hundreds of
thousands of objects? Oops! A simple algorithm change will cut
processing time from minutes to milliseconds. Throwing 16 cores at the
problem would be a waste. Just things everyone thinks, probably.
From: nmm1 on 10 Sep 2009 07:13

In article <0c2a8ab3-630b-48b7-9e61-26e819e133f7(a)o10g2000yqa.googlegroups.com>,
MitchAlsup <MitchAlsup(a)aol.com> wrote:
>Reality check:: (V 1.0)
>
>A handful (maybe two) of people are doing architecture as IBM defined
>the term with System/360
> . . .

Yes.

>All of the ILP that can be extracted with reasonable power has been
>extracted from the instruction sets and programming languages in vogue
>The memory wall and the power wall have made it impossible to scale as
>we have before
>Threads scale to a handful per core

Yes.

>Cores scale to the limits of pin bandwidth (and power envelope) Likely
>to be close to a handful of handfuls

Actually, not really, because of your next point.

>Nobody has solved the synchronization problem that will embolden cores
>and threads to really deliver on their promise
>The closest solution to synchronization has been lobotomized by its
>implementation
>Thus the way forward is threads and cores with increasingly small gains
>as the count increases

Yes.

>To a very large extent:
>There is no need for new instruction sets--extensions will occur
>naturally
>There is no need for new system organizations--software has enough
>trouble with the ones it already has
>There is no need for new storage organizations--DRAM and disks have a
>natural growth path
>So, evolution is bound to take place, and has been for a while.

Here I disagree. The current designs are blocking progress in the
most promising directions, which leads to an increasing dependence on
the communicating sequential process model and (God help us) globally
coherent shared memory. Evolution in the natural world is notorious
for heading into dead ends, and major improvement tends to come from
extinction and eventual replacement.

>You see, the problem is not in the instruction sets, processor
>organization, cores and threads, nor system organization: The problem
>is in the power wall, memory wall, and synchronization wall.
>Until
>solutions to these are found, little can be done except squeeze
>another drop of blood from the stone.

The point here is that beating your head against a brick wall is not
productive; if you can't climb over it (and we can't), the solution is
to go round it. And the current architectures (software AND hardware)
are blocking that.

>But what application is so compelling that it will actually change
>which stone is getting squeezed?

THAT'S the right question, all right. No, I don't have an answer.

Regards,
Nick Maclaren.
From: MitchAlsup on 10 Sep 2009 15:32

On Sep 10, 5:39 am, Terje Mathisen <Terje.Mathi...(a)tmsw.no> wrote:
> MitchAlsup wrote:
> > Nobody has solved the synchronization problem that will embolden cores
> > and threads to really deliver on their promise
> > The closest solution to synchronization has been lobotomized by its
> > implementation
>
> Hmmm..., do I hear the bitter sound of an architect thwarted? :-)

More like the sound of the architect ignored--with no actual
bitterness. But then again, I could have been talking about
transactional memory.....

Mitch
From: Gavin Scott on 10 Sep 2009 16:19
Mayan Moudgill <mayan(a)bestweb.net> wrote:
> That's completely different than working in a field like theoretical
> physics. When I look at the standard model and the mathematics involved
> (non-abelian gauge theory with *3* symmetry groups)....argghhh!!!

So I've been meaning for a while to make a post about this architecture
book I've been reading on and off for a couple of months. It describes a
system architecture and its implementations which aren't quite your
typical comp.arch fare, but it's an area that has many parallels to
hardware and software systems in computing.

The architecture described is rather old, but new applications of it
are now becoming perhaps the leading area of technology growth in this
century. And unlike theoretical physics, it's *surprisingly* accessible
to any halfway intelligent reader.

I highly recommend this work to everyone here in comp.arch, as it gives
fascinating insights into very different solutions to the same type of
information-handling problems that comp.arch normally discusses. I can
pretty much guarantee you at least one mind-blowing revelation per
chapter, based on what I've read so far.

http://www.amazon.com/dp/0815341059

G.