From: Nick Maclaren on 28 Sep 2006 05:40

In article <%CESg.422$fP5.194(a)news.cpqcorp.net>,
Rick Jones <rick.jones2(a)hp.com> writes:
|> Casper H.S. Dik <Casper.Dik(a)sun.com> wrote:
|> > It all depends on the bandwidth. (Which means it ain't a pretty
|> > picture for Intel as long as they keep the FSB)
|>
|> Is it really just a question of bandwidth? I would have thought that
|> application (I'm assuming the system vendors deal with the OSes)
|> behaviour would be equally important.

Yes and no. The bandwidth is definitely the leading bottleneck.

|> How different is having an FSB for a single socket with N cores on the
|> chip than having a "link" for a single-socket with N cores on the
|> chip?

Not at all.

|> I would think that as the cores per chip increase, the issues that the
|> folks selling large SMP's deal with will become known to the
|> single-socket crowd.

Yup. They are already hitting them.

But, to answer the question:

For most workstations, the answer is probably 4 (at least in the near future - see later), because few workloads have more than a few genuinely active threads. Very fancy graphics is another matter.

For servers and embarrassingly parallel HPC, the answer is until you have saturated the bandwidth (modified by the demands of the applications). Multiple cores is merely a cheaper form of multiple sockets.

For genuinely parallel, high-communication applications, the answer is: how parallelisable is your application? And the answer to THAT (outside HPC) is generally "2 way, when I am lucky".

The last is not a law of nature, but isn't going to change any time soon, as it is caused by the programming paradigms that people use.

Regards,
Nick Maclaren.
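Nick's two regimes can be put in rough numbers. The sketch below is an illustration, not anything from the thread: a pure bandwidth cap for the server/embarrassingly-parallel case, and Amdahl's law (a standard model the post does not name) for the "how parallelisable is it" case, where a job that is half serial never gets past about 2x. The function names and all figures are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: speedup on n_cores when only parallel_fraction of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def bandwidth_capped_speedup(n_cores: int, shared_bw: float, per_core_bw: float) -> float:
    """Throughput, in single-core units, when all cores share one memory link.

    Assumed model: each core needs per_core_bw to run flat out and the
    FSB/link supplies shared_bw in total (same units). Past the point where
    n_cores * per_core_bw == shared_bw, extra cores add nothing.
    """
    if n_cores < 1 or shared_bw <= 0 or per_core_bw <= 0:
        raise ValueError("arguments must be positive")
    return min(float(n_cores), shared_bw / per_core_bw)


if __name__ == "__main__":
    # Hypothetical figures, not measurements: 10 GB/s shared, 2 GB/s per core.
    for n in (1, 2, 4, 8, 16):
        print(n, bandwidth_capped_speedup(n, 10.0, 2.0))
    # Even with a million cores, a half-serial job stays just under 2x.
    print(amdahl_speedup(0.5, 1_000_000))
```

With these made-up numbers the bandwidth cap flattens out at 5 cores' worth of throughput, however many cores are on the die.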
From: Joe Seigh on 28 Sep 2006 07:47

Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.
....
> Where do you think the point of diminishing returns might
> be?
>

Rethinking this, the question should be what would you do with an unlimited number of processors?

For one thing, the operating system would change. Interrupt handlers for asynchronous interrupts would go away. You'd have dedicated, possibly special purpose, processors to handle devices. They're already talking about this with "coprocessors".

The scheduler would go away. No need for it when every thread has its own dedicated hardware thread. This would affect realtime programming. No need to play games with thread priorities and any of the timeouts that could be caused by not being scheduled quickly enough, i.e. no dispatch latency.

Polling and IPC mechanisms would have to be worked on a bit. E.g. make things like MONITOR/MWAIT efficient. Possibly some new instructions. The hw architects would have to be a little more proactive here. The latest proposals from Intel seem to be a little lacking here. What's with architectural extensions? It seems to be a "ready to fight the last war" kind of thing. Who cares if you can run a 20-year-old application real fast?

Distributed algorithms would become more important. How do you coordinate threads, and how do you do it efficiently from a hw point of view?

Etc... (more stuff when I think of it)

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.
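Joe's "dedicated processor per device, no interrupts, better polling" idea can be sketched with a thread standing in for a dedicated core that spins on a mailbox. This is a hypothetical illustration only (the class and method names are made up, and Python threads share one interpreter lock, so it shows the structure, not the performance). On real hardware the idle spin is where something like MONITOR/MWAIT would go; Python has no equivalent, so the loop just yields.

```python
import collections
import threading
import time


class PollingDeviceCore:
    """Hypothetical 'dedicated core' that polls a mailbox instead of taking interrupts."""

    def __init__(self, handler):
        # deque.append() and deque.popleft() are thread-safe in CPython.
        self._mailbox = collections.deque()
        self._stop = threading.Event()
        self._handler = handler
        self.handled = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def post(self, request):
        # No interrupt is raised: the request simply appears in the mailbox.
        self._mailbox.append(request)

    def _run(self):
        # Keep draining until asked to stop AND the mailbox is empty.
        while not (self._stop.is_set() and not self._mailbox):
            try:
                request = self._mailbox.popleft()
            except IndexError:
                time.sleep(0)  # idle poll; stand-in for MONITOR/MWAIT
                continue
            self.handled.append(self._handler(request))

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    core = PollingDeviceCore(lambda r: r * 2)
    core.start()
    for i in range(5):
        core.post(i)
    core.stop()
    print(core.handled)
```

Because only the dedicated thread ever pops, requests are handled in FIFO order; the cost is a core that burns cycles while idle, which is exactly why Joe wants cheap hardware wait primitives.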
From: Bill Todd on 28 Sep 2006 08:58

Terje Mathisen wrote:
> Casper H.S. Dik wrote:
>> Jon Forrest <forrest(a)ce.berkeley.edu> writes:
>>
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> Sun has been shipping 8-core CPUs since, I think, late last year.
>>
>> It all depends on the bandwidth. (Which means it ain't a pretty
>> picture for Intel as long as they keep the FSB)
>
> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
> providing 20 MB (afair) directly to each core.
>
> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
> everything in a nicely distributed manner, you're going to see _very_
> impressive performance indeed, particularly since they also have a
> (presumably very fast) mesh network connecting the individual cores.

Well, since IIRC the processing cores are running at a princely 1.91 MHz (allegedly not a typo), I'm not sure how truly impressive that demo's performance would be: perhaps better to wait for the real thing in around 5 years' time.

As for the SRAM, I rather suspect that the 20 MB is the *total* figure shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on anything like a single chip, we'd be seeing a lot more cache in Itanics.

- bill
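The disagreement over the SRAM comes down to which quantity the 20 MB figure describes. Worked out both ways (the second line is Bill's suspicion, not a confirmed spec), the two readings differ by a factor of 80:

```latex
\begin{align*}
\text{20 MB per core:}  \quad & 80 \times 20\,\text{MB} = 1600\,\text{MB} = 1.6\,\text{GB in total} \\
\text{20 MB in total:}  \quad & 20\,\text{MB} / 80 = 0.25\,\text{MB} \approx 256\,\text{KB per core}
\end{align*}
```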
From: Terje Mathisen on 28 Sep 2006 11:25

Joe Seigh wrote:
> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
> ...
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> Rethinking this, the question should be what would you do
> with an unlimited number of processors?
>
> For one thing, the operating system would change. Interrupt
> handlers for asynchronous interrupts would go away. You'd have
> dedicated, possibly special purpose, processors to handle devices.
> They're already talking about this with "coprocessors".

You still need some way to handle async inter-core communication!

I.e. I believe that you really don't have any choice here, except to make most of your cores interruptible.

This leads back to the old thread about having multiple cores which are compatible but not symmetrical:

I.e. some of them are optimized for long timeslots doing stream/HPC/serious number crunching, using a microarchitecture like the P4 which really doesn't like to be interrupted.

Other cores could be much more Pentium-like: Possibly superscalar, but in-order, with very low branch miss penalty, and optimized for twisty/branchy/hard-to-predict code.

As long as these cpus are compatible, an OS which knows that some processes prefer to run on a given kind of cpu could do quite well, and the programming task becomes _much_ easier than for a disjoint set as used in the PPC/Cell combination.

> The scheduler would go away. No need for it when every thread has
> its own dedicated hardware thread. This would affect realtime
> programming. No need to play games with thread priorities and
> any of the timeouts that could be caused by not being scheduled
> quickly enough, i.e. no dispatch latency.

I believe you'd still need it, but not for anything that's time-critical.

I.e. after sufficient time with tens of cores/hundreds of threads available, programming patterns to use/abuse them all will turn up, and you'll run out of resources anyway. :-(

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
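Terje's point about compatible-but-asymmetric cores is that placement becomes a preference rather than a hard constraint. The sketch below is a hypothetical greedy placement in Python (the core kinds "stream" and "branchy" and all names are made up for illustration, not any real OS interface): each process first tries an idle core of the kind it prefers, and because every core runs the same instruction set it can fall back to any idle core, which a disjoint PPC/SPE-style split would not allow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Core:
    core_id: int
    kind: str  # hypothetical labels: "stream" (P4-like) or "branchy" (Pentium-like)
    busy: bool = False


@dataclass
class Process:
    name: str
    prefers: str  # the core kind this process runs best on


def place(processes: List[Process], cores: List[Core]) -> Dict[str, Optional[int]]:
    """Greedy placement on compatible-but-asymmetric cores.

    A process gets an idle core of its preferred kind if one exists,
    otherwise any idle core (possible only because the ISA is shared),
    and None only when every core is busy.
    """
    placement: Dict[str, Optional[int]] = {}
    for proc in processes:
        chosen = next((c for c in cores if not c.busy and c.kind == proc.prefers), None)
        if chosen is None:
            chosen = next((c for c in cores if not c.busy), None)
        if chosen is not None:
            chosen.busy = True
        placement[proc.name] = chosen.core_id if chosen is not None else None
    return placement


if __name__ == "__main__":
    cores = [Core(0, "stream"), Core(1, "branchy"), Core(2, "branchy")]
    procs = [
        Process("fft", "stream"),
        Process("compiler", "branchy"),
        Process("solver", "stream"),  # no stream core left: falls back to core 2
        Process("shell", "branchy"),  # everything busy: left unplaced
    ]
    print(place(procs, cores))
```

The unplaced "shell" is also a small instance of Terje's closing remark: give people enough cores and they will still find a way to run out.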
From: Terje Mathisen on 28 Sep 2006 11:29

Bill Todd wrote:
> Terje Mathisen wrote:
>> That 80-core Intel demo chip has a vertically mounted SRAM chip as
>> well, providing 20 MB (afair) directly to each core.
>>
>> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
>> everything in a nicely distributed manner, you're going to see _very_
>> impressive performance indeed, particularly since they also have a
>> (presumably very fast) mesh network connecting the individual cores.
>
> Well, since IIRC the processing cores are running at a princely 1.91 MHz
> (allegedly not a typo) I'm not sure how truly impressive that demo's
> performance would be: perhaps better to wait for the real thing in
> around 5 years' time.

I don't think we have any other option than to wait, no matter what the current speed is.

However, if they really run at 2 MHz, then the claimed TB/s total bandwidth seems totally bogus, even if they also include 4 sets of cpu-cpu mesh links.

>
> As for the SRAM, I rather suspect that the 20 MB is the *total* figure
> shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
> anything like a single chip, we'd be seeing a lot more cache in Itanics.

Oops, you're almost certainly right. :-(

Oh, well. It was fun as long as the fantasy lasted. :-)

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
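Terje's objection to the bandwidth claim is plain arithmetic: divide the aggregate bandwidth by the clock rate and the core count. A minimal Python sketch, assuming the claim was 1 TB/s in decimal units (the post only says "TB/s") and using the 2 MHz and 80-core figures from the thread; the comparison clock is a hypothetical value, not one quoted here.

```python
def bytes_per_cycle(total_bw_bytes_per_s: float, clock_hz: float, n_cores: int = 1) -> float:
    """Bytes each core would have to move per clock tick to reach total_bw_bytes_per_s."""
    if total_bw_bytes_per_s <= 0 or clock_hz <= 0 or n_cores < 1:
        raise ValueError("arguments must be positive")
    return total_bw_bytes_per_s / clock_hz / n_cores


if __name__ == "__main__":
    TB_PER_S = 1e12  # assumption: 1 TB/s, decimal terabyte
    CORES = 80
    # At the quoted 2 MHz: 1e12 / 2e6 / 80 = 6250 bytes per core per cycle.
    print(bytes_per_cycle(TB_PER_S, 2e6, CORES))
    # Hypothetical 3 GHz clock for comparison: about 4 bytes per core per cycle.
    print(bytes_per_cycle(TB_PER_S, 3e9, CORES))
```

At thousands of bytes per core per tick, the 2 MHz clock and the TB/s figure are hard to reconcile, which is Terje's point; at multi-GHz clocks the same total needs only a few bytes per core per cycle.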