From: MitchAlsup on 17 May 2010 12:07

> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
> > optimal strategy is almost always "go fast, then sleep" rather than
> > "slow and steady"

Depends: If you have a task that repeats at a known rate (video or
audio frames, for example), it is often optimal to reduce the voltage
and frequency so that the schedule is just barely made.

Mitch
From: Robert Myers on 17 May 2010 13:31

On May 17, 12:07 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>> > optimal strategy is almost always "go fast, then sleep" rather than
>> > "slow and steady"
>
> Depends: If you have a task that repeats at a known rate (Video or
> Audio frames, for example) it is often optimal to reduce the voltage
> and frequency so that the schedule is just barely made.

I have a hard time visualizing the usage pattern that justifies "get
it done fast, then sleep" as a winner.

How many people, for instance, are recalculating spreadsheets while
unplugged?

Does it help when taking notes at a meeting?

Where does this strategy help?

Robert.
From: Kai Harrekilde-Petersen on 17 May 2010 14:12

MitchAlsup <MitchAlsup(a)aol.com> writes:
>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>> > optimal strategy is almost always "go fast, then sleep" rather than
>> > "slow and steady"
>
> Depends: If you have a task that repeats at a known rate (Video or
> Audio frames, for example) it is often optimal to reduce the voltage
> and frequency so that the schedule is just barely made.

Agreed on the "slow and steady" strategy, from the hearing-aid corner.
Running at an excessive frequency requires you to increase the
operating voltage, and because of the P = f*C*V^2 relationship, this
is a (very) bad idea.

Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>
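[Editor's note: Kai's P = f*C*V^2 point can be made concrete with a small
sketch. This is illustrative only, under crude assumptions: dynamic power
P = C*V^2*f, voltage scaling roughly linearly with frequency, and sleep
power ignored; the function name and all constants are invented, not from
the thread.]

```python
# Energy comparison for a periodic task: race-to-idle vs. DVFS.
# Assumptions (illustrative, not measured): P = C * V^2 * f, and
# V scales roughly linearly with f. Sleep/leakage power is ignored.

def dynamic_energy(cycles, freq_hz, c_eff=1e-9, volts_per_hz=1e-9):
    """Energy (joules) to execute `cycles` at `freq_hz`."""
    v = volts_per_hz * freq_hz          # crude V ~ f scaling
    power = c_eff * v**2 * freq_hz      # P = C * V^2 * f
    time = cycles / freq_hz             # seconds of active execution
    return power * time                 # E = C * V^2 * cycles

work = 10e6          # cycles of work per frame
period = 1.0 / 30    # 30 fps deadline

# Race-to-idle: run at 1 GHz, finish early, sleep the rest of the frame.
e_fast = dynamic_energy(work, 1e9)

# DVFS: run just fast enough that the schedule is barely made.
f_min = work / period                   # = 300 MHz for these numbers
e_slow = dynamic_energy(work, f_min)

print(f"race-to-idle: {e_fast:.4f} J, DVFS: {e_slow:.4f} J")
```

With V ~ f, energy per fixed amount of work scales as f^2, so running at
0.3x the frequency uses about 9% of the dynamic energy; this is the effect
Kai and Mitch are pointing at, before leakage enters the picture.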
From: Anton Ertl on 17 May 2010 15:39

"nedbrek" <nedbrek(a)yahoo.com> writes:
> 2) More cores use more bandwidth.

And a faster core also uses more bandwidth. However, the question is:
if the cores work on a common job, do they need more bandwidth than a
faster core that gives the same performance? Maybe, but not
necessarily.

> Bandwidth can be expensive (additional memory controllers and RAM
> chips all burn lots of power).

Not in my experience. I have made power measurements using loads that
run completely in-core, and loads that also eat up all memory
bandwidth
<http://www.complang.tuwien.ac.at/anton/computer-power-consumption.html>.
Even on a system with power-hungry FB-DIMMs, the increase in power
usage from exercising the RAM is about as much as the increase from
using one more core (of 8). This is also obvious from the fact that
RAM chips usually don't have a heat sink (FB-DIMMs typically have a
heat spreader, though), and on boards with separate north bridges, the
north bridge (which includes the memory controller) is typically
passively cooled.

> You can think of OOO as a technique to get more performance per
> memory access.

More than what? In this context it sounds like you are comparing with
multi-threading, so let me react to that: I don't think so. Ok, one
can let a multi-threaded program be less cache-friendly than a
single-threaded program, but one can make it similarly cache-friendly.
And I think once the number of cores is so high that many applications
become bandwidth-limited (which assumes we have solved the problem of
making use of many threads), programmers will develop techniques for
utilizing the given bandwidth better.

> 5) Often the static power can dominate (spinning hard drives, LCD
> power - in addition to chip static power). If latency goes up, you
> must pay the static power cost longer (optimal strategy is almost
> always "go fast, then sleep" rather than "slow and steady").
The hard drive spins and the LCD consumes power even if the CPU
sleeps. Due to voltage scaling, running at a lower clock is usually a
win for power if the CPU then sleeps. True, if the user is waiting for
the result and then shuts the machine down, going faster is often a
win, but that's an unusual situation.

As an example, our dual-Xeon 5450 system consumes 353W with 8 cores at
2GHz, and 377W with 5 cores active at 3GHz (even though the latter
variant gives fewer core-cycles/s). It's even more extreme for our
dual-Opteron 270 system: 134W at load 4 at 1000MHz, 165W-190W at load
2 at 2000MHz.

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
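[Editor's note: Anton's Xeon figures can be checked with quick arithmetic.
The helper name below is mine, not from the thread; only the four measured
numbers (watts, cores, clocks) come from his post.]

```python
# Back-of-envelope check of Anton's dual-Xeon 5450 measurements:
# system watts per unit of aggregate core throughput (Gcycles/s).

def watts_per_gigacycle(watts, cores, ghz):
    """System power divided by aggregate core-cycles per second."""
    return watts / (cores * ghz)

wide_slow = watts_per_gigacycle(353, cores=8, ghz=2.0)    # 353 W / 16 Gc/s
narrow_fast = watts_per_gigacycle(377, cores=5, ghz=3.0)  # 377 W / 15 Gc/s

print(f"8 cores @ 2 GHz: {wide_slow:.1f} W per Gcycle/s")
print(f"5 cores @ 3 GHz: {narrow_fast:.1f} W per Gcycle/s")
```

The wide-and-slow configuration delivers more aggregate cycles (16 vs. 15
Gcycles/s) for less power, i.e. roughly 22 vs. 25 W per Gcycle/s, which is
Anton's point about voltage scaling.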
From: Andy 'Krazy' Glew on 17 May 2010 20:45
On 5/17/2010 10:31 AM, Robert Myers wrote:
> On May 17, 12:07 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
>>> On Mon, 17 May 2010 06:46:18 -0500, nedbrek wrote:
>>>> optimal strategy is almost always "go fast, then sleep" rather than
>>>> "slow and steady"
>>
>> Depends: If you have a task that repeats at a known rate (Video or
>> Audio frames, for example) it is often optimal to reduce the voltage
>> and frequency so that the schedule is just barely made.
>
> I have a hard time visualizing the usage pattern that justifies "get
> it done fast, then sleep" as a winner.
>
> How many people, for instance, are recalculating spreadsheets while
> unplugged?
>
> Does it help when taking notes at a meeting?
>
> Where does this strategy help?

I am less adamant about this than Ned is, but:

* Ned is historically accurate: the "get it done fast, then sleep"
policy was the dominant policy in power management from circa 1994
until at least 2005. It was the policy that was held up as the
alternative to all low-power core designs, e.g. when I was at AMD.

* Although DVFS (Dynamic Voltage and Frequency Scaling) is important
nowadays, it is debatable which is more important.

* Perhaps things are clarified when one notes that "sleeping" is not
necessarily "hibernate - save stuff to disk, and then stop" or
"suspend - save stuff to memory, and then stop". There are also the
various "save stuff to the Nth-level cache, and then stop" modes, and
even "go into a low-power mode where register state is retained, but
where leakage is reduced, e.g. by mucking with voltage biases to put
the CPU into a state that cannot run effectively, but which can come
out of this sleep state fairly quickly". I.e., sleep states that
reduce leakage, where the processor isn't actually running, but where
it can come out of sleep "quickly" for some definition of quickly.

And, oh yes: where this sleep state is entered possibly without the OS
knowing - at the very least, without the OS having to do a great deal
of work.
By the way: some of Transmeta's most important inventions in power
management seem to be related to this sort of low-leakage sleep state.
(No, I won't list numbers.)

* Robert asks what workloads this matters for; and the answer is, yes,
it helps when taking notes at a meeting. You can go into such a
microsleep between every keystroke and mouse movement.

All such "hurry up and sleep" power management is ruled by equations
involving:

  %Tactive = fraction of time active
  %Tsleep  = fraction of time asleep
  Ntrans   = number of transitions per unit time between the active
             and sleep states
  Pactive  = power while active
  Psleep   = power while in sleep - mainly leakage, for whatever
             devices still have power and are retaining state. Each
             new C-state and the like has a different set of devices
             active, and different leakage.
  Ptrans   = energy cost of one transition between sleep and active
             (so Ntrans * Ptrans is a power)

Let's say you want to compare two scenarios:

a) Running at Pactive_hi for a small fraction of time, %Tactive_hi,
and sleeping at a low power Psleep for the rest of the time.

b) Always running at Pactive_lo.

I.e., you are comparing Pactive_lo to

  Pactive_hi * %Tactive_hi + (1 - %Tactive_hi) * Psleep + Ntrans * Ptrans

You just solve the inequality, and find the situations in which it
makes sense to power down.

Hint: often %Tactive_hi can be < 1% of the time. And Pactive_lo is
often > 1/6th of Pactive_hi.
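[Editor's note: Andy's comparison can be sketched numerically. The average
power of the burst-then-sleep policy weights the active and sleep powers by
the fraction of time in each state, plus transition overhead; the
per-transition cost is treated as an energy here so that Ntrans times it is
a power. All numbers and names below are illustrative, not from the thread.]

```python
# Average power of "hurry up and sleep" vs. running steadily at low power.
#   P_avg = P_active_hi * t_active + P_sleep * (1 - t_active)
#           + N_trans * E_trans
# Burst-then-sleep wins whenever P_avg < P_active_lo.

def hurry_up_and_sleep_avg(p_active_hi, t_active, p_sleep,
                           n_trans_per_s, e_trans_joules):
    """Average power of the burst-then-sleep policy (watts)."""
    return (p_active_hi * t_active
            + p_sleep * (1.0 - t_active)
            + n_trans_per_s * e_trans_joules)

p_active_hi = 30.0   # W, core at full speed (illustrative)
p_active_lo = 6.0    # W, throttled core (> 1/6 of p_active_hi, per the hint)
p_sleep = 0.3        # W, leakage in a state-retaining sleep mode
t_active = 0.01      # active < 1% of the time (typing at a keyboard)
n_trans = 100        # wake/sleep transitions per second
e_trans = 0.001      # J per transition

p_burst = hurry_up_and_sleep_avg(p_active_hi, t_active, p_sleep,
                                 n_trans, e_trans)
print(f"burst-then-sleep: {p_burst:.3f} W vs. steady: {p_active_lo:.3f} W")
```

With these made-up numbers the burst policy averages well under 1 W against
6 W for running steadily, illustrating why %Tactive_hi < 1% makes "hurry up
and sleep" hard to beat once sleep entry/exit is cheap enough.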