From: MitchAlsup on 17 May 2010 12:07

> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
> > optimal strategy is almost always "go fast, then sleep" rather than
> > "slow and steady"

Depends: If you have a task that repeats at a known rate (video or
audio frames, for example), it is often optimal to reduce the voltage
and frequency so that the schedule is just barely made.

Mitch
From: Robert Myers on 17 May 2010 13:31

On May 17, 12:07 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>> > optimal strategy is almost always "go fast, then sleep" rather than
>> > "slow and steady"
>
> Depends: If you have a task that repeats at a known rate (Video or
> Audio frames, for example) it is often optimal to reduce the voltage
> and frequency so that the schedule is just barely made.

I have a hard time visualizing the usage pattern that justifies "get
it done fast, then sleep" as a winner.

How many people, for instance, are recalculating spreadsheets while
unplugged?

Does it help when taking notes at a meeting?

Where does this strategy help?

Robert.
From: Kai Harrekilde-Petersen on 17 May 2010 14:12

MitchAlsup <MitchAlsup(a)aol.com> writes:
>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>> > optimal strategy is almost always "go fast, then sleep" rather than
>> > "slow and steady"
>
> Depends: If you have a task that repeats at a known rate (Video or
> Audio frames, for example) it is often optimal to reduce the voltage
> and frequency so that the schedule is just barely made.

Agreed on the "slow and steady" strategy, from the hearing-aid corner.
Running at an excessive frequency requires you to increase the
operating voltage, and because of the P = f*C*V^2 relationship, this
is a (very) bad idea.

Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>
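[Editor's note: Kai's P = f*C*V^2 point can be made concrete with a small
sketch. This is illustrative only, under crude assumptions: dynamic power
P = C*V^2*f, voltage scaling roughly linearly with frequency, and sleep
power ignored; the function name and all constants are invented, not from
the thread.]

```python
# Energy comparison for a periodic task: race-to-idle vs. DVFS.
# Assumptions (illustrative, not measured): P = C * V^2 * f, and
# V scales roughly linearly with f. Sleep/leakage power is ignored.

def dynamic_energy(cycles, freq_hz, c_eff=1e-9, volts_per_hz=1e-9):
    """Energy (joules) to execute `cycles` at `freq_hz`."""
    v = volts_per_hz * freq_hz          # crude V ~ f scaling
    power = c_eff * v**2 * freq_hz      # P = C * V^2 * f
    time = cycles / freq_hz             # seconds of active execution
    return power * time                 # E = C * V^2 * cycles

work = 10e6          # cycles of work per frame
period = 1.0 / 30    # 30 fps deadline

# Race-to-idle: run at 1 GHz, finish early, sleep the rest of the frame.
e_fast = dynamic_energy(work, 1e9)

# DVFS: run just fast enough that the schedule is barely made.
f_min = work / period                   # = 300 MHz for these numbers
e_slow = dynamic_energy(work, f_min)

print(f"race-to-idle: {e_fast:.4f} J, DVFS: {e_slow:.4f} J")
```

With V ~ f, energy per fixed amount of work scales as f^2, so running at
0.3x the frequency uses about 9% of the dynamic energy; this is the effect
Kai and Mitch are pointing at, before leakage enters the picture.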
From: Anton Ertl on 17 May 2010 15:39

"nedbrek" <nedbrek(a)yahoo.com> writes:
> 2) More cores use more bandwidth.

And a faster core also uses more bandwidth. However, the question is:
if the cores work on a common job, do they need more bandwidth than a
faster core that gives the same performance? Maybe, but not
necessarily.

> Bandwidth can be expensive (additional memory controllers and RAM
> chips all burn lots of power).

Not in my experience. I have made power measurements using loads that
run completely in-core, and loads that also eat up all memory
bandwidth
<http://www.complang.tuwien.ac.at/anton/computer-power-consumption.html>.
Even on a system with power-hungry FB-DIMMs, the increase in power
usage from exercising the RAM is about as much as the increase from
using one more core (of 8). This is also obvious from the fact that
RAM chips usually don't have a heat sink (FB-DIMMs typically have a
heat spreader, though), and on boards with separate north bridges, the
north bridge (which includes the memory controller) is typically
passively cooled.

> You can think of OOO as a technique to get more performance per
> memory access.

More than what? In this context it sounds like you are comparing with
multi-threading, so let me react to that: I don't think so. Ok, one
can let a multi-threaded program be less cache-friendly than a
single-threaded program, but one can make it similarly cache-friendly.
And I think once the number of cores is so high that many applications
become bandwidth-limited (which assumes we have solved the problem of
making use of many threads), programmers will develop techniques for
utilizing the given bandwidth better.

> 5) Often the static power can dominate (spinning hard drives, LCD
> power - in addition to chip static power). If latency goes up, you
> must pay the static power cost longer (optimal strategy is almost
> always "go fast, then sleep" rather than "slow and steady").
The hard drive spins and the LCD consumes power even if the CPU
sleeps. Due to voltage scaling, running at a lower clock is usually a
win for power if the CPU then sleeps. True, if the user is waiting for
the result and then shuts the machine down, going faster is often a
win, but that's an unusual situation.

As an example, our dual-Xeon 5450 system consumes 353W with 8 cores at
2GHz, and 377W with 5 cores active at 3GHz (even though the latter
variant gives fewer core-cycles/s). It's even more extreme for our
dual-Opteron 270 system: 134W at load 4 at 1000MHz, 165W-190W at load
2 at 2000MHz.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
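[Editor's note: Anton's Xeon figures can be checked with quick arithmetic.
The helper name below is mine, not from the thread; only the four measured
numbers (watts, cores, clocks) come from his post.]

```python
# Back-of-envelope check of Anton's dual-Xeon 5450 measurements:
# system watts per unit of aggregate core throughput (Gcycles/s).

def watts_per_gigacycle(watts, cores, ghz):
    """System power divided by aggregate core-cycles per second."""
    return watts / (cores * ghz)

wide_slow = watts_per_gigacycle(353, cores=8, ghz=2.0)    # 353 W / 16 Gc/s
narrow_fast = watts_per_gigacycle(377, cores=5, ghz=3.0)  # 377 W / 15 Gc/s

print(f"8 cores @ 2 GHz: {wide_slow:.1f} W per Gcycle/s")
print(f"5 cores @ 3 GHz: {narrow_fast:.1f} W per Gcycle/s")
```

The wide-and-slow configuration delivers more aggregate cycles (16 vs. 15
Gcycles/s) for less power, i.e. roughly 22 vs. 25 W per Gcycle/s, which is
Anton's point about voltage scaling.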
From: Andy 'Krazy' Glew on 17 May 2010 20:45
On 5/17/2010 10:31 AM, Robert Myers wrote:
> On May 17, 12:07 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
>>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>>>> optimal strategy is almost always "go fast, then sleep" rather than
>>>> "slow and steady"
>>
>> Depends: If you have a task that repeats at a known rate (Video or
>> Audio frames, for example) it is often optimal to reduce the voltage
>> and frequency so that the schedule is just barely made.
>
> I have a hard time visualizing the usage pattern that justifies "get
> it done fast, then sleep" as a winner.
>
> How many people, for instance, are recalculating spreadsheets while
> unplugged?
>
> Does it help when taking notes at a meeting?
>
> Where does this strategy help?

I am less adamant about this than Ned is, but:

* Ned is historically accurate: the "get it done fast, then sleep"
policy was the dominant policy in power management from circa 1994
until at least 2005. It was the policy that was held up as the
alternative to all low-power core designs, e.g. when I was at AMD.

* Although DVFS (Dynamic Voltage and Frequency Scaling) is important
nowadays, it is debatable which is more important.

* Perhaps things are clarified when one notes that "sleeping" is not
necessarily "hibernate - save stuff to disk, and then stop" or
"suspend - save stuff to memory, and then stop". There are also the
various "save stuff to the Nth-level cache, and then stop" modes, and
even "go into a low-power mode where register state is retained, but
where leakage is reduced, e.g. by mucking with voltage biases to put
the CPU into a state that cannot run effectively, but which can come
out of this sleep state fairly quickly". I.e., sleep states that
reduce leakage, where the processor isn't actually running, but where
it can come out of sleep "quickly" for some definition of quickly.

And, oh yes: where this sleep state is entered possibly without the OS
knowing - at the very least, without the OS having to do a great deal
of work.
By the way: some of Transmeta's most important inventions in power
management seem to be related to this sort of low-leakage sleep state.
(No, I won't list numbers.)

* Robert asks what workloads this matters for; and the answer is, yes,
it helps when taking notes at a meeting. You can go into such a
microsleep between every keystroke and mouse movement.

All such "hurry up and sleep" power management is ruled by equations
involving:

  %Tactive = fraction of time active
  %Tsleep  = fraction of time asleep
  Ntrans   = number of transitions per unit time between the active
             and sleep states
  Pactive  = power while active
  Psleep   = power while in sleep - mainly leakage, for whatever
             devices still have power and are retaining state. Each
             new C-state and the like has a different set of devices
             active, and different leakage.
  Ptrans   = energy cost of one transition between sleep and active
             (so Ntrans * Ptrans is a power)

Let's say you want to compare two scenarios:

a) Running at Pactive_hi for a small fraction of time, %Tactive_hi,
and sleeping at a low power Psleep for the rest of the time.

b) Always running at Pactive_lo.

I.e., you are comparing Pactive_lo to

  Pactive_hi * %Tactive_hi + (1 - %Tactive_hi) * Psleep + Ntrans * Ptrans

You just solve the inequality, and find the situations in which it
makes sense to power down.

Hint: often %Tactive_hi can be < 1% of the time. And Pactive_lo is
often > 1/6th of Pactive_hi.
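[Editor's note: Andy's comparison can be sketched numerically. The average
power of the burst-then-sleep policy weights the active and sleep powers by
the fraction of time in each state, plus transition overhead; the
per-transition cost is treated as an energy here so that Ntrans times it is
a power. All numbers and names below are illustrative, not from the thread.]

```python
# Average power of "hurry up and sleep" vs. running steadily at low power.
#   P_avg = P_active_hi * t_active + P_sleep * (1 - t_active)
#           + N_trans * E_trans
# Burst-then-sleep wins whenever P_avg < P_active_lo.

def hurry_up_and_sleep_avg(p_active_hi, t_active, p_sleep,
                           n_trans_per_s, e_trans_joules):
    """Average power of the burst-then-sleep policy (watts)."""
    return (p_active_hi * t_active
            + p_sleep * (1.0 - t_active)
            + n_trans_per_s * e_trans_joules)

p_active_hi = 30.0   # W, core at full speed (illustrative)
p_active_lo = 6.0    # W, throttled core (> 1/6 of p_active_hi, per the hint)
p_sleep = 0.3        # W, leakage in a state-retaining sleep mode
t_active = 0.01      # active < 1% of the time (typing at a keyboard)
n_trans = 100        # wake/sleep transitions per second
e_trans = 0.001      # J per transition

p_burst = hurry_up_and_sleep_avg(p_active_hi, t_active, p_sleep,
                                 n_trans, e_trans)
print(f"burst-then-sleep: {p_burst:.3f} W vs. steady: {p_active_lo:.3f} W")
```

With these made-up numbers the burst policy averages well under 1 W against
6 W for running steadily, illustrating why %Tactive_hi < 1% makes "hurry up
and sleep" hard to beat once sleep entry/exit is cheap enough.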