From: Tim Bradshaw on 11 Jan 2007 16:03

mark.hoemmen(a)gmail.com wrote:
> Intel's proposed 80-core architecture will have DRAM attached to each
> core -- sort of how Cell has "local stores" attached to each SPE.
> That's how they plan to solve the BW problem -- amortize it over all
> the cores.

Don't we call that `cache' normally? (yes, I know, they'll be *big*
caches, but only big by today's standards, in the same sense that
today's machines have as much cache as yesterday's had main memory.)
From: Pascal Bourguignon on 11 Jan 2007 18:14

"Tim Bradshaw" <tfb+google(a)tfeb.org> writes:
> mark.hoemmen(a)gmail.com wrote:
>
>> Intel's proposed 80-core architecture will have DRAM attached to each
>> core -- sort of how Cell has "local stores" attached to each SPE.
>> That's how they plan to solve the BW problem -- amortize it over all
>> the cores.
>
> Don't we call that `cache' normally? (yes, I know, they'll be *big*
> caches, but only big by today's standards, in the same sense that
> today's machines have as much cache as yesterday's had main memory.)

Well, the fact that the L1 and L2 caches are totally transparent to the
programmer while the HD cache is somewhat less so is no reason to
distinguish them. You've probably already seen the pyramid with the
registers in the top corner, above layers of memories: L1, L2 and now
L3, the RAM, the HD, the tapes, etc. We could also add layers for the
Internet and the physical world. RAM is used as cache for the HD. The
HD is used as cache for the big storage repositories on tapes or CD, or
for the Internet. The Internet is used as a cache for the real world.
Our computers don't need robotic extensions to access information in
the real world, because the real world is cached in the Internet.
(Well, it may be useful to have these robotic extensions to let the
computer access the real world itself, instead of having armies of
humans filling wikipedia and other pages indexed by google.)

Hiding all these details is only a matter of the OS. Use mmap instead
of open/read/write/close. Add an imap(2) and call
imap(address,"http://en.wikipedia.org/wiki/Raven"); instead of sending
your robotic extensions to go watch birds.

Of course, it helps to have a big addressing space. Earth is
510,065,600 km²(*), that's 510,065,600e12 mm², or 69 bits to identify
each mm² of Earth's surface. So we'll have to wait for 128-bit
processors to be able to mmap every bit of Earth's surface into the
virtual memory space of our computers.

In the meantime, we can just implement our own 128-bit virtual address
space, and a mere emap(2) syscall is all that is needed to address the
(physical) desktop of your coworkers on another continent, through
remote presence robots.

(*) I'm too lazy to compute it tonight, so I just copied the number
cached in Wikipedia; beware! ;-)

--
__Pascal Bourguignon__                 http://www.informatimago.com/
HEALTH WARNING: Care should be taken when lifting this product,
since its mass, and thus its weight, is dependent on its velocity
relative to the user.
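Pascal's 69-bit figure can be checked directly. A quick Common Lisp
sanity check (the surface-area number is the Wikipedia figure quoted
above; INTEGER-LENGTH is the standard CL bit-count function):

```lisp
;; Earth's surface: 510,065,600 km^2 = 510,065,600e12 mm^2.
;; INTEGER-LENGTH returns the number of bits needed to represent
;; a non-negative integer in binary.
(let ((mm2 (* 510065600 (expt 10 12))))
  (integer-length mm2))
;; => 69
```

Since 2^68 ≈ 2.95e20 and 2^69 ≈ 5.90e20, the ~5.1e20 mm² of surface
indeed just fits in 69 bits, well beyond a 64-bit address space.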
From: Madhu on 11 Jan 2007 22:48

* Maciek Pasternacki <87r6u1plm6.fsf(a)lizard.king> :
| On Sweetmorn, Chaos 11, 3173 YOLD, Juan R. wrote:
|
|>> | If you want to analyse chess positions you can never have too
|>> | much speed and it has nothing to do with rendering. I'm sure
|>> | it's the same situation with go and many other games.
|>>
|>> But having more than one core will not be a benefit if your
|>> algorithms are graph based and have to search a tree. IIRC most
|>> graph algorithms (dfs bfs) are inherently unparallelizable.
|>
|> And could not a parallel tree search distribute subtree search
|> between cores at each branching point?
[...]
| A single thread would work like:
|
| (loop
|   (if *node-queue*
|       (let ((node (dequeue *node-queue*)))
|         (do-something-with node)
|         (dolist (subnode (children node))
|           (enqueue subnode *node-queue*)))
|       (return)))
|
| Search would start with enqueuing the root node, and would end by any
| thread setting *node-queue* to NIL. This would be parallelizable over
| any number of cores (supposing one doesn't care about exact DFS
| search order -- but if one cared about order, one wouldn't have
| considered parallelizing).

Your stopping criterion will have to be different. Also, if your input
is not a tree, this algorithm will expand the same node multiple times.
This [inefficiency] can be done in parallel, of course :) Which is why
order tends to be important in DFS, and why it is unsuitable for
decomposition.

Of course, as others have noted, once the leaves are reached there are
usually gains to be made. The point I wanted to make was akin to that
in chemistry, where the overall rate of a reaction is limited by the
rate of the slowest step. (The slowest step here being walking the
graph.)

-- Madhu
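The duplicate-expansion problem Madhu points out has a standard fix: a
shared table of already-seen nodes. A single-thread sketch along the
lines of the quoted loop, with DEQUEUE/ENQUEUE replaced by plain list
operations; CHILDREN and VISIT are caller-supplied functions (my names,
not from the thread):

```lisp
(defun search-graph (root children visit)
  "Queue-based search with a SEEN table, so graph (non-tree)
inputs expand each node at most once."
  (let ((queue (list root))
        (seen  (make-hash-table :test #'equal)))
    (setf (gethash root seen) t)
    (loop
      (if queue
          (let ((node (pop queue)))
            (funcall visit node)
            (dolist (c (funcall children node))
              ;; only enqueue nodes we haven't scheduled before
              (unless (gethash c seen)
                (setf (gethash c seen) t)
                (push c queue))))
          (return)))))
```

In a multicore version each worker would run this same loop with QUEUE
and SEEN guarded by a lock, and termination would need a count of idle
workers rather than a bare NIL check on the queue -- which is the
different stopping criterion Madhu alludes to.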
From: Rob Warnock on 11 Jan 2007 23:06

Tim Bradshaw <tfb+google(a)tfeb.org> wrote:
+---------------
| Chris Barts wrote:
| > How many people have forgotten that 'code' is a mass noun and, as such,
| > does not take plurals? Do you also say 'these muds' and 'these dusts'?
|
| How many people have forgotten that *language changes over time* and is
| not something handed down from the elder days, never to be changed?
| The sense of `codes' I gave is very common in the HPC community where
| "a code" typically refers to something approximating to a particular
| implementation of an algorithm. The plural use, which is more common,
| means something like "implementations of algorithms".
+---------------

Yup. Far too much of the HPC market consists of simply rerunning 1960's
"dusty deck" codes with different inputs and larger array dimensions.

-Rob

-----
Rob Warnock            <rpw3(a)rpw3.org>
627 26th Avenue        <URL:http://rpw3.org/>
San Mateo, CA 94403    (650)572-2607
From: George Neuner on 11 Jan 2007 23:08
On 11 Jan 2007 13:03:57 -0800, "Tim Bradshaw" <tfb+google(a)tfeb.org>
wrote:

>mark.hoemmen(a)gmail.com wrote:
>
>> Intel's proposed 80-core architecture will have DRAM attached to each
>> core -- sort of how Cell has "local stores" attached to each SPE.
>> That's how they plan to solve the BW problem -- amortize it over all
>> the cores.
>
>Don't we call that `cache' normally? (yes, I know, they'll be *big*
>caches, but only big by today's standards, in the same sense that
>today's machines have as much cache as yesterday's had main memory.)

Well, on Cell the private memories are not caches but staging memories:
the main processor has to move data into and out of them on behalf of
the coprocessors. It's very similar to the multi-level memory system
used on the old Crays, where the CPU had to fetch and organize data to
feed the array processors and store the results back to the shared main
memory.

AFAIK, no one has tried to offer a hardware solution to staging
computations in a distributed memory system since the KSR1 (circa 1990,
which failed due to the company's creative bookkeeping rather than the
machine's technology). Everyone now relies on software approaches like
MPI and PVM.

George
--
for email reply remove "/" from address