From: George Neuner on 12 Jan 2007 17:17

On 12 Jan 2007 01:35:29 -0800, "Tim Bradshaw" <tfb+google(a)tfeb.org>
wrote:

>George Neuner wrote:
>
>> Well, on Cells the private memories are not cache but staging memories
>> ... the main processor has to move data into and out of them on behalf
>> of the coprocessors.
>
>It doesn't matter very much who moves the data, it's still cache :-).
>The issue that counts, really, is what the programming model is at the
>user level. No one should need to care whether things are done
>automagically by the hardware as most L1/L2 caches are today, or by
>hardware with substantial SW support as, say, MMUs, or almost entirely
>by SW with some small amount of HW support, as, say disk paging.
>(Actually, the second thing that counts is whether the HW can
>efficiently support the programming model you choose.)

I have considerable experience with manual staging (on DSPs) and I can
tell you that it is a royal PITA to schedule several functional units
and keep them going full blast using software alone. Cell is less
onerous only because of the granularity of the code the coprocessors
can execute - whole functions or miniprograms rather than the baby
steps DSP units can take.

>> AFAIK, no one has tried to offer a hardware solution to staging
>> computations in a distributed memory system since the KSR1 (circa
>> 1990, which failed due to the company's creative bookkeeping rather
>> than the machine's technology). Everyone now relies on software
>> approaches like MPI and PVM.
>
>Well, I think they have actually, in all but name: that's essentially
>what NUMA machines are. Such machines are quite common, of course
>(well, for bigger systems anyway): all Sun's recent larger machines (4
>& 5-digit sunfire boxes) are basically NUMA, and it may be that smaller
>ones are too.

Non Uniform Memory Access simply means different memories have
different access times - that describes just about every machine made
today. The NUMA model distinguishes between "near" and "far" memories
in terms of access time, but does not distinguish by how the memories
are connected - a system with fast cache and slower main memory fits
the model just as well as one with a butterfly network between CPU and
memory.

>Of course, as I said above, this comes down to programming model and
>how much HW support you need for it. I think the experience of the
>last 10-20 years is that a shared memory model (perhaps "shared address
>space"?), preferably with cache-coherency, is a substantially easier
>thing to program for than a distributed memory model. Whether that will
>persist, who knows (I suspect it will, for a surprisingly long time).
>Of course the physical memory that underlies this model will become
>increasingly distributed, as it already has to a great extent.

It's all about the programming model and I think you are on the right
track. Shared address space is the right approach, IMO, but further I
believe it should be implemented in hardware. That is why I mentioned
KSR1 - the only massive multiprocessor I know of that tried to help
the programmer.

KSR1 was a distributed memory multiprocessor (256..1088 CPUs) with a
multilevel caching tree network which provided the programmer with the
illusion of a shared memory. The KSR1 ran a version of OSF/1, so
software written for any shared memory Unix multiprocessor was
relatively easy to port - an important consideration because most
people looking to buy a supercomputer were outgrowing a shared memory
machine.
There was, of course, a penalty paid for the illusion of shared memory.
Estimates were that the cache consistency model slowed the machine by
15-25% vs comparable MPI designs, but IMO that was more than made up
for by the ease of programming. The second generation KSR2 improved
shared memory speeds considerably, but few people ever saw one - the
company went belly up before it was formally introduced.

George
--
for email reply remove "/" from address
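For readers who have not fought with manual staging by hand, the usual way to keep a compute unit "going full blast" while data is moved for it is double buffering: while the unit works on one block in its private staging memory, the next block is already being transferred in. Below is a minimal sketch in plain C. It is not real Cell or DSP code: start_transfer() and wait_transfer() are hypothetical stand-ins (here a synchronous memcpy and a no-op) for whatever asynchronous DMA primitives a given part's toolchain actually provides, and BLOCK, compute(), and process() are made up for illustration.

/*
 * Sketch of manual staging via double buffering.
 * start_transfer()/wait_transfer() stand in for a real DMA engine's
 * primitives; here they just copy synchronously for illustration.
 */
#include <string.h>
#include <stddef.h>

#define BLOCK 1024

static void start_transfer(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);   /* real code: kick off async DMA */
}

static void wait_transfer(void)
{
    /* real code: poll or block until the outstanding DMA completes */
}

static void compute(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * 2.0f + 1.0f;   /* stand-in for the real kernel */
}

/* main_mem is assumed to hold nblocks * BLOCK floats */
void process(float *main_mem, size_t nblocks)
{
    static float in[2][BLOCK];           /* the coprocessor's staging memory */
    int cur = 0;

    if (nblocks == 0)
        return;
    start_transfer(in[cur], main_mem, BLOCK);          /* prefetch block 0 */

    for (size_t b = 0; b < nblocks; b++) {
        wait_transfer();                               /* block b is resident */
        if (b + 1 < nblocks)                           /* overlap the next fetch */
            start_transfer(in[cur ^ 1], main_mem + (b + 1) * BLOCK, BLOCK);
        compute(in[cur], BLOCK);                       /* work on block b */
        /* write results back; a real version would double-buffer this too */
        memcpy(main_mem + b * BLOCK, in[cur], BLOCK * sizeof(float));
        cur ^= 1;
    }
}

Even this toy has to track which buffer is live, when the outstanding transfer completes, and the edge cases at either end of the loop; with several functional units and separate inbound and outbound DMA channels the bookkeeping multiplies quickly, which gives some idea of why doing it all in software is painful.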
From: Tim Bradshaw on 12 Jan 2007 17:20

mark.hoemmen(a)gmail.com wrote:
> Rob Warnock wrote:
> > "dusty deck" codes
> > 1960's codes
> > parallel codes
> > Recent codes

OK, I think this makes the point that "codes" is a common usage in the
HPC community. I will expect delivery of Chris Barts' head on a platter
tomorrow morning. You can do what you want with the rest of him.
From: Tim Bradshaw on 12 Jan 2007 17:42

George Neuner wrote:
> I have considerable experience with manual staging (on DSPs) and I can
> tell you that it is a royal PITA to schedule several functional units
> and keep them going full blast using software alone.

I bet it is!

> Non Uniform Memory Access simply means different memories have
> different access times - that describes just about every machine made
> today. The NUMA model distinguishes between "near" and "far" memories
> in terms of access time, but does not distinguish by how the memories
> are connected - a system with fast cache and slower main memory fits
> the model just as well as one with a butterfly network between CPU and
> memory.

I agree with this in theory (and of course, all machines are NUMA in
that sense - well, nearly all: there have been recent cacheless designs
which aimed to hide latency by heavily multithreaded processors). But I
think the conventional use for the term is for multiprocessors where
all memory is "more local" (in time terms) to some processors than it
is to others, and that was the sense in which I was using it. You can
think of these kinds of machines as systems where there is only cache
memory.

It seems to me inevitable that all large machines will become NUMA, if
they are not all already. And the nonuniformity will increase over
time. My argument is that physically, these machines actually are
distributed memory systems, but their programming model is that of a
shared memory system. And this illusion is maintained by a combination
of hardware (route requests to non-local memory over the interconnect,
deal with cache-coherency etc) and system-level software (arrange life
so that memory is local to the threads which are using it where that is
possible etc).

Of course these machines typically are not MPP systems, and are also
typically not HPC-oriented. Though I think SGI made NUMA systems with
really quite large numbers of processors, and a Sun E25K can have 144
cores (72 2-core processors), though I think it would be quite unusual
to run a configuration like that as a single domain.

--tim
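Tim's point about system software arranging for memory to be local to the threads that use it can be made concrete with the usual "first touch" idiom: on most NUMA operating systems a page is physically placed on the node of the CPU that first writes it, so initializing data with the same parallel decomposition that later traverses it keeps most accesses node-local. A minimal sketch, assuming a C compiler with OpenMP and an OS with a first-touch placement policy (the example itself is not from the thread):

/*
 * First-touch NUMA placement sketch: fault the pages in with the same
 * static parallel schedule that the later computation uses, so each
 * thread's pages land on its own node (given a first-touch policy).
 */
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b)
        return 1;

    /* First touch in parallel: each thread writes "its" pages first. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* Later work with the same schedule then hits mostly local memory. */
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("%f\n", sum);
    free(a);
    free(b);
    return 0;
}

To the programmer this is ordinary shared-memory C; the interconnect hardware and the OS placement policy between them decide which physically distributed memory each page actually lives in, which is exactly the illusion being described.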
From: Chris Barts on 13 Jan 2007 00:55

On Fri, 12 Jan 2007 14:20:11 -0800, Tim Bradshaw wrote:
>
> OK, I think this makes the point that "codes" is a common usage in the
> HPC community. I will expect delivery of Chris Barts' head on a
> platter tomorrow morning. You can do what you want with the rest of
> him.

Doesn't matter. It merely means a lot of people are wrong, and a lot of
people need frying.

--
My address happens to be com (dot) gmail (at) usenet (plus) chbarts,
wardsback and translated. It's in my header if you need a spoiler.
From: Chris Barts on 13 Jan 2007 00:56

On Thu, 11 Jan 2007 03:37:59 -0800, Tim Bradshaw wrote:
> Chris Barts wrote:
>
>> How many people have forgotten that 'code' is a mass noun and, as such,
>> does not take plurals? Do you also say 'these muds' and 'these dusts'?
>
> How many people have forgotten that *language changes over time* and is
> not something handed down from the elder days, never to be changed?

"Like, wow, dude! Language is whatever I say it is! Crumb buttercake up
the windowpane with the black shoehorn butterhorse!"

Grow up.

--
My address happens to be com (dot) gmail (at) usenet (plus) chbarts,
wardsback and translated. It's in my header if you need a spoiler.