From: Torben Ægidius Mogensen on 17 May 2010 04:59

"nedbrek" <nedbrek(a)yahoo.com> writes:

> The ironic thing, (which we demonstrated, and which made us hugely
> unpopular) is that massive many-core burns just as much (or more) power
> than a smart OOO on anything but grossly parallel applications.

That is assuming you are actually powering those that you don't actively
use. A many-core design should be able to power down individual cores
and power them up very quickly when needed.

    Torben
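As an illustrative aside on the power-gating point: with per-core gating, the cost of the unused cores all but disappears, so a many-core chip running only a few threads need not pay many-core power. A minimal Python sketch, where every number is an assumption chosen purely for illustration:

CORES = 32            # cores on the chip (assumed)
ACTIVE_W = 1.5        # power of a busy core, watts (assumed)
IDLE_LEAK_W = 0.4     # leakage of an idle but still-powered core (assumed)
IDLE_GATED_W = 0.02   # residual power of a power-gated core (assumed)

def chip_power(busy_cores, idle_watts):
    """Total core power with 'busy_cores' active and the rest idle."""
    return busy_cores * ACTIVE_W + (CORES - busy_cores) * idle_watts

for busy in (1, 8, 32):
    ungated = chip_power(busy, IDLE_LEAK_W)
    gated = chip_power(busy, IDLE_GATED_W)
    print(f"{busy:2d} busy cores: {ungated:5.1f} W without gating, "
          f"{gated:5.1f} W with gating")

The gap between the two columns is what "power down individual cores ... very quickly" buys; whether the wake-up latency is actually small enough is a hardware question the sketch does not touch.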
From: nmm1 on 17 May 2010 05:29

In article <7ztyq6nbt8.fsf(a)ask.diku.dk>,
Torben Ægidius Mogensen <torbenm(a)diku.dk> wrote:
>"nedbrek" <nedbrek(a)yahoo.com> writes:
>
>> The ironic thing, (which we demonstrated, and which made us hugely
>> unpopular) is that massive many-core burns just as much (or more) power
>> than a smart OOO on anything but grossly parallel applications.
>
>That is assuming you are actually powering those that you don't actively
>use. A many-core design should be able to power down individual cores
>and power them up very quickly when needed.

Indeed. And, unlike the other forms of power-saving gimmickry, that is
simple to implement and debug, and does not interfere with tuning.

Regards,
Nick Maclaren.
From: nedbrek on 17 May 2010 07:46

Hello all,

"Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
news:7ztyq6nbt8.fsf(a)ask.diku.dk...
> "nedbrek" <nedbrek(a)yahoo.com> writes:
>> The ironic thing, (which we demonstrated, and which made us hugely
>> unpopular) is that massive many-core burns just as much (or more) power
>> than a smart OOO on anything but grossly parallel applications.
>
> That is assuming you are actually powering those that you don't actively
> use. A many-core design should be able to power down individual cores
> and power them up very quickly when needed.

No, I'm talking about multi-threaded workloads.

MT advocates always talk about core power. Going multi-core can be a win
for core power.

However, supporting a huge number of cores requires more overhead than
supporting fewer (more power hungry, even less power efficient) cores.
The problem then becomes one of weighing the overhead of a more
complicated core against the overhead of the support logic for your many
cores.

For example:
1) Even MT-friendly jobs rarely scale linearly with the number of cores.
You might get as high as 80 or 90% efficiency (2 cores give 1.9x). This
might be enough to justify more complicated cores, given the next points.
2) More cores use more bandwidth. Bandwidth can be expensive (additional
memory controllers and RAM chips all burn lots of power). You can think
of OOO as a technique to get more performance per memory access.
3) Off-chip bandwidth costs pins. Pins are expensive in themselves (and
limited). They also burn lots of power.
4) More cores need a more complicated system. You have more directory
structure, more switches, etc.
5) Often the static power can dominate (spinning hard drives, LCD power -
in addition to chip static power). If latency goes up, you must pay the
static power cost longer (the optimal strategy is almost always "go fast,
then sleep" rather than "slow and steady").

You can't just say, "MT workloads demand lots of tiny cores". For a given
power budget, every workload needs to be analyzed to find the right
trade-off.

Ned
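A back-of-the-envelope version of point 5 ("go fast, then sleep"): once a fixed static power for the rest of the platform is charged for as long as the job runs, a core that burns proportionally more power but finishes proportionally sooner can still come out ahead on energy. All numbers below are assumptions for illustration (Python):

JOB_S = 100.0     # runtime of the job on the baseline core, seconds (assumed)
STATIC_W = 10.0   # platform static power: drives, LCD, leakage (assumed)

def energy_to_finish(core_watts, speedup):
    """Joules to finish the job, then drop the whole platform to sleep."""
    runtime = JOB_S / speedup
    return (core_watts + STATIC_W) * runtime

slow_and_steady = energy_to_finish(core_watts=5.0, speedup=1.0)
fast_then_sleep = energy_to_finish(core_watts=15.0, speedup=3.0)

print(f"slow and steady:     {slow_and_steady:6.0f} J")
print(f"go fast, then sleep: {fast_then_sleep:6.0f} J")

Here the fast core is assumed to be exactly energy-neutral on its own consumption (three times the power for three times the speed); the entire saving comes from paying the 10 W of static power for a third of the time.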
From: nmm1 on 17 May 2010 08:01

In article <hsr6qe$i8f$1(a)news.eternal-september.org>,
nedbrek <nedbrek(a)yahoo.com> wrote:
>"Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
>news:7ztyq6nbt8.fsf(a)ask.diku.dk...
>>
>>> The ironic thing, (which we demonstrated, and which made us hugely
>>> unpopular) is that massive many-core burns just as much (or more) power
>>> than a smart OOO on anything but grossly parallel applications.
>>
>> That is assuming you are actually powering those that you don't actively
>> use. A many-core design should be able to power down individual cores
>> and power them up very quickly when needed.
>
>No, I'm talking about multi-threaded workloads.
>
>MT advocates always talk about core power. Going multi-core can be a win
>for core power.
>
>However, supporting a huge number of cores requires more overhead than
>supporting fewer (more power hungry, even less power efficient) cores.

With reservations, agreed.

>The problem then becomes one of weighing the overhead of a more
>complicated core against the overhead of the support logic for your many
>cores.

That is true.

>For example:
>1) Even MT-friendly jobs rarely scale linearly with the number of cores.
>You might get as high as 80 or 90% efficiency (2 cores give 1.9x). This
>might be enough to justify more complicated cores, given the next points.

A lot of such jobs scale fairly well - say, sqrt(N) or better. But, even
if a lot don't do even that well, that's misleading. The performance VERY
rarely scales well with the complexity of cores, and exceeding log(N) is
rare. Oh, yes, there are occasional jobs where you can remove a
bottleneck, but that's nothing to do with scalability (i.e. it's a
one-off).

>2) More cores use more bandwidth. Bandwidth can be expensive (additional
>memory controllers and RAM chips all burn lots of power). You can think
>of OOO as a technique to get more performance per memory access.

Sorry, but that is NOT true. X performance on each of Y cores needs
precisely the same bandwidth as X*Y performance on a single core, all
other factors being the same. You are correct that some attempts at
multithreading serial code increase the bandwidth requirement, but that's
an artifact of the current approaches, and is dubious as a general rule.

>3) Off-chip bandwidth costs pins. Pins are expensive in themselves (and
>limited). They also burn lots of power.

True. But that's irrelevant to whether the chip has lots of slow cores
or one fast one.

>4) More cores need a more complicated system. You have more directory
>structure, more switches, etc.

That's not true. There are many designs which don't, and some have been
very successful.

>5) Often the static power can dominate (spinning hard drives, LCD power -
>in addition to chip static power). If latency goes up, you must pay the
>static power cost longer (the optimal strategy is almost always "go fast,
>then sleep" rather than "slow and steady").

True. But that's irrelevant to whether the chip has lots of slow cores
or one fast one.

>You can't just say, "MT workloads demand lots of tiny cores". For a given
>power budget, every workload needs to be analyzed to find the right
>trade-off.

True. But the days of a special computer for every job had gone before I
got into the game - and that's a LONG time back!

Regards,
Nick Maclaren.
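One way to make the two competing scaling claims concrete is a toy model: assume (roughly in the spirit of Pollack's rule, and not something either poster states) that single-core performance grows about as the square root of the power poured into it, while the many-core side is capped by Amdahl's law. A Python sketch with invented numbers:

import math

BUDGET_W = 16.0          # core power budget for the whole chip (assumed)
SMALL_CORE_W = 1.0       # power of one simple core with performance 1.0 (assumed)
SERIAL_FRACTION = 0.05   # fraction of the work that cannot be parallelised (assumed)

def one_big_core():
    # Spend the whole budget on one core; perf ~ sqrt(power) (assumed model).
    return math.sqrt(BUDGET_W / SMALL_CORE_W)

def many_small_cores():
    # N simple cores, throughput limited by Amdahl's law.
    n = int(BUDGET_W / SMALL_CORE_W)
    return 1.0 / (SERIAL_FRACTION + (1.0 - SERIAL_FRACTION) / n)

n_small = int(BUDGET_W / SMALL_CORE_W)
print(f"one big core:    {one_big_core():.2f}x a small core")
print(f"{n_small} small cores:  {many_small_cores():.2f}x a small core")

With these assumptions the small cores win easily at 5% serial work, and the big core only pulls ahead once the serial fraction climbs past about 20%; which side of that line real workloads sit on is exactly what the thread is arguing about.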
From: Andrew Reilly on 17 May 2010 09:19
On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:

> optimal strategy is almost always "go fast, then sleep" rather than
> "slow and steady"

Is that what did in Transmeta? Quite a bit of their literature seemed to
be about being able to tune the runtime and clock rate towards "slow and
steady": spread the smallest number of clocks across the real-time budget
by reducing the rate and voltage accordingly. Perhaps that works for
media playback, but is not representative of the loads that power/battery
users really care about?

[I had a Fujitsu/Crusoe laptop for a while: it was nice, and it would
play DVDs with little power, but it wasn't "fast" for Windows UI feel.]

Cheers,

--
Andrew
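For what it's worth, the usual first-order CMOS argument behind Transmeta-style "slow and steady" is that dynamic power scales roughly with V²·f and the attainable clock roughly with V, so halving the clock (and voltage) cuts dynamic energy per job substantially, while static power is then paid for twice as long. Which effect wins depends entirely on the size of the static term. A hedged Python sketch with assumed numbers:

def job_energy(freq_scale, static_w, dynamic_w_full=10.0, job_s_full=10.0):
    """Joules to run a fixed job at a scaled clock and (assumed) scaled voltage.

    Dynamic power is modelled as ~ V^2 * f with f ~ V, i.e. freq_scale**3;
    runtime stretches as 1 / freq_scale.  First-order model only.
    """
    dynamic_w = dynamic_w_full * freq_scale ** 3
    runtime = job_s_full / freq_scale
    return (dynamic_w + static_w) * runtime

for static_w, label in ((1.0, "chip leakage only"), (20.0, "whole laptop on")):
    fast = job_energy(1.0, static_w)   # race to idle
    slow = job_energy(0.5, static_w)   # half clock, half voltage
    print(f"{label:18s}: full speed {fast:5.0f} J, half speed {slow:5.0f} J")

On those assumptions, "slow and steady" wins comfortably when the chip's own dynamic power dominates - roughly the DVD-playback case described above, where the screen is on for the length of the film regardless - and loses once the rest of the platform has to stay awake longer because the chip is dawdling.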