From: Terje Mathisen <"terje.mathisen at tmsw.no"> on 27 Jan 2010 06:44

nmm1(a)cam.ac.uk wrote:
> In article <0b40dbdb-53c0-4c5c-a19b-e68316f3d9c4(a)p17g2000vbl.googlegroups.com>,
> Larry <lstewart2(a)gmail.com> wrote:
>> By the way, once your applications get to large scale (over 1000
>> cores), problems of synchronization and load balancing start to
>> dominate, and in that regime, I suspect variable speed clocks make the
>> situation worse. Better to turn off cores to save power than to let
>> them run at variable speed.
>
> Oh, gosh, YES! The more I think about tuning parallel codes in a
> variable clock context, the more I think that I don't want to go
> there. And that's independent of whether I have an application or
> an implementor hat on.

But this is already happening!

Current leading-edge power-optimization schemes have to consider exactly
these scenarios, i.e. run one core at slightly higher speed vs. two cores
at somewhat lower speed, or merge and gang-schedule all interrupt handling
onto a single CPU, so that it can spend as much time as possible in very
low-power modes while all the others get to sleep all the time.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
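[Terje's first tradeoff (one core slightly faster vs. two cores somewhat slower) can be made concrete with a back-of-the-envelope model. This is a rough sketch only: it assumes dynamic power scales roughly as f^3 (voltage tracking frequency), and the constants are illustrative, not measurements from any real chip.]

```python
# Back-of-the-envelope comparison of the scheduling tradeoff above:
# dynamic CPU power is roughly P = C * V^2 * f, and since the minimum
# stable voltage scales approximately with frequency, P ~ f^3.
# (The constant C and the cubic model are illustrative assumptions.)

def dynamic_power(freq, c=1.0):
    """Relative dynamic power under the P ~ f^3 approximation."""
    return c * freq ** 3

# One core at full speed vs. two cores at 60% speed, both finishing
# the same (perfectly parallel) job in about the same wall time:
one_fast = dynamic_power(1.0)
two_slow = 2 * dynamic_power(0.6)

print(f"one core  @ 1.0f: {one_fast:.3f}")
print(f"two cores @ 0.6f: {two_slow:.3f}")  # 2 * 0.216 = 0.432
```

Under this (crude) model the two slower cores finish the parallel job for well under half the power of the single fast core, which is exactly why the power-management scheme has to weigh these configurations against each other.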
From: nmm1 on 27 Jan 2010 07:22

In article <1il537-kci1.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> In article <0b40dbdb-53c0-4c5c-a19b-e68316f3d9c4(a)p17g2000vbl.googlegroups.com>,
>> Larry <lstewart2(a)gmail.com> wrote:
>>> By the way, once your applications get to large scale (over 1000
>>> cores), problems of synchronization and load balancing start to
>>> dominate, and in that regime, I suspect variable speed clocks make the
>>> situation worse. Better to turn off cores to save power than to let
>>> them run at variable speed.
>>
>> Oh, gosh, YES! The more I think about tuning parallel codes in a
>> variable clock context, the more I think that I don't want to go
>> there. And that's independent of whether I have an application or
>> an implementor hat on.
>
>But this is already happening!
>
>Current leading-edge power-optimization schemes have to consider exactly
>these scenarios, i.e. run one core at slightly higher speed vs. two cores
>at somewhat lower speed, or merge and gang-schedule all interrupt handling
>onto a single CPU, so that it can spend as much time as possible in very
>low-power modes while all the others get to sleep all the time.

I am aware of that, but the perpetrators of such designs haven't
thought things through. That sort of fiendish complexity is
incompatible with most existing collective designs and implementations,
and is very probably more or less incompatible with debugging (and,
worse, tuning) any even remotely efficient ones. Not because it's
theoretically impossible, but because it is too complicated for mere
mortals to achieve.

What they are targeting is the existing case of separate, independent,
rarely communicating processes. Fine. But there is no way that design
can be made to work when using parallelism to speed up most existing
(serial) applications. I.e. when you have exhausted the natural
parallelism, you have nowhere to go.
While such usage is the sole province of the HPC people, that doesn't
matter, but it's a catastrophic idea to move 'general purpose' systems
to being even more incompatible with HPC, without any plan for
introducing parallelism INTO existing (serial) applications.

Regards,
Nick Maclaren.
From: Andrew Reilly on 27 Jan 2010 07:34

On Wed, 27 Jan 2010 12:22:00 +0000, nmm1 wrote:

> That sort of fiendish complexity is incompatible with most existing
> collective designs and implementations, and is very probably more or
> less incompatible with debugging (and, worse, tuning) any even remotely
> efficient ones. Not because it's theoretically impossible, but because
> it is too complicated for mere mortals to achieve.

Aside from the obvious CPU scaling issue just discussed, it seems to me
that another major driver for this kind of thinking is the desire for
"clockless" OSes (or OS modes) to improve the efficiency of idle VM
instances. The whole notion of virtualizing processor instances like
that blows notions of clock synchronization out of the window, or at
least makes the notion a lot less tractable.

> What they are targeting is the existing case of separate, independent,
> rarely communicating processes. Fine. But there is no way that design
> can be made to work when using parallelism to speed up most existing
> (serial) applications. I.e. when you have exhausted the natural
> parallelism, you have nowhere to go.

Is there anywhere much to go when the natural parallelism has been
exhausted?

> While such usage is the sole province of the HPC people, that doesn't
> matter, but it's a catastrophic idea to move 'general purpose' systems
> to being even more incompatible with HPC, without any plan for
> introducing parallelism INTO existing (serial) applications.

I received a semi-spam from Sun this morning, and before binning it my
retinas registered something about doing HPC "in the cloud". It seems
that some HPC folk aren't terribly concerned about tight
synchronization. Or perhaps I'm missing the point?

Cheers,
--
Andrew
From: nmm1 on 27 Jan 2010 08:02

In article <7saq5vFefhU1(a)mid.individual.net>,
Andrew Reilly <areilly---(a)bigpond.net.au> wrote:
>
>> That sort of fiendish complexity is incompatible with most existing
>> collective designs and implementations, and is very probably more or
>> less incompatible with debugging (and, worse, tuning) any even remotely
>> efficient ones. Not because it's theoretically impossible, but because
>> it is too complicated for mere mortals to achieve.
>
>Aside from the obvious CPU scaling issue just discussed, it seems to me
>that another major driver for this kind of thinking is the desire for
>"clockless" OSes (or OS modes) to improve the efficiency of idle VM
>instances. The whole notion of virtualizing processor instances like
>that blows notions of clock synchronization out of the window, or at
>least makes the notion a lot less tractable.

Actually, no, it doesn't. It only does when you are trying to force
parallel applications into the separate independent process model.
Consider a system where a process is made up of LIGHTWEIGHT threads
(i.e. what they were always supposed to be): the system schedules the
process, and the application schedules the threads within it.

>> What they are targeting is the existing case of separate, independent,
>> rarely communicating processes. Fine. But there is no way that design
>> can be made to work when using parallelism to speed up most existing
>> (serial) applications. I.e. when you have exhausted the natural
>> parallelism, you have nowhere to go.
>
>Is there anywhere much to go when the natural parallelism has been
>exhausted?

In a great many cases, yes. It's harder, but often feasible.

>> While such usage is the sole province of the HPC people, that doesn't
>> matter, but it's a catastrophic idea to move 'general purpose' systems
>> to being even more incompatible with HPC, without any plan for
>> introducing parallelism INTO existing (serial) applications.
>
>I received a semi-spam from Sun this morning, and before binning it my
>retinas registered something about doing HPC "in the cloud". It seems
>that some HPC folk aren't terribly concerned about tight
>synchronization. Or perhaps I'm missing the point?

Yes, and so are they. "Embarrassingly parallel" or "farmable"
applications are not really HPC, used not to be classified as such, and
it has been a stupid idea to use specialist parallel computers for them
for three decades. The users may use a lot of resources, but that's
not the point.

That sort of use (think Monte-Carlo, parameter space search, etc.)
works perfectly well "in the cloud", on a roomful of el cheapo
workstations, or whatever. It's been a solved problem since time
immemorial, which is why it used not to be classified as HPC.

Real HPC is about problems that are inherently infeasible without a lot
of computing power, and almost always involves quite a lot of
communication. It includes the case of taking a basically serial
design and improving its algorithms to run in parallel.

Regards,
Nick Maclaren.
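[The "farmable" style Nick describes, independent trials with no communication until a trivial final merge, is the reason such jobs run as well on a pile of cheap machines as on a specialist parallel computer. Monte-Carlo estimation of pi is the classic toy instance; this is an illustrative sketch, with arbitrary worker counts and trial sizes, not anyone's production code.]

```python
# Embarrassingly parallel Monte-Carlo: each worker runs independent
# trials and the only data ever communicated is one integer per worker.
import random
from multiprocessing import Pool

def trials(args):
    seed, n = args
    rng = random.Random(seed)  # independent random stream per worker
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(n))
    return hits                # the entire inter-worker communication

if __name__ == "__main__":
    n_workers, n_per_worker = 4, 100_000
    with Pool(n_workers) as pool:
        results = pool.map(trials, [(s, n_per_worker)
                                    for s in range(n_workers)])
    pi_est = 4.0 * sum(results) / (n_workers * n_per_worker)
    print(f"pi ~= {pi_est:.3f}")
```

Because the workers never talk to each other, neither clock synchronization nor interconnect latency matters here, which is exactly why this class of job was historically not counted as HPC.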
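[Nick's lightweight-threads model earlier in the thread, where the system schedules the process and the application schedules the threads within it, can be sketched with cooperative user-level threads. In this toy, Python generators stand in for lightweight threads and a round-robin loop stands in for the application's scheduler; no particular system's implementation is being described.]

```python
# Application-level scheduling of lightweight threads: a context
# switch here is just resuming a generator -- no kernel involvement.
from collections import deque

def worker(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # voluntarily hand control back to the scheduler

def run(threads):
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run the thread until it yields
            ready.append(t)  # still alive: put it back in the queue
        except StopIteration:
            pass             # thread finished, drop it

run([worker("A", 2), worker("B", 3)])
```

The OS only ever sees one schedulable entity (the process running `run`), so variable clocks and virtualization quirks are the application scheduler's problem to absorb, which is Nick's point.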
From: Michel Hack on 29 Jan 2010 18:23
On Jan 27, 7:34 am, Andrew Reilly <areilly...(a)bigpond.net.au> wrote:
> Aside from the obvious CPU scaling issue just discussed, it seems to me
> that another major driver for this kind of thinking is the desire for
> "clockless" OSes (or OS modes) to improve the efficiency of idle VM
> instances. The whole notion of virtualizing processor instances like
> that blows notions of clock synchronization out of the window, or at
> least makes the notion a lot less tractable.

Having had virtualized processors for over forty years, S/370 has
distinguished CPU time from elapsed time since day 1, and has provided
a TOD clock and clock comparator to permit deadline scheduling, making
the whole notion of tick-based timekeeping look silly. I would have
expected other processors to pick up on this at least fifteen years
ago...

As a result, reading the time, or scheduling work based on CPU time, is
not affected by virtualization, and if a virtual machine has nothing to
do, it sits in (virtual) wait state until an external event happens or
the clock comparator trips for the next scheduled event. This consumes
zero cycles; 100% of the cycles are available to run other guests. It
allows z/VM to support thousands of sleeping virtual machines at very
little cost (a few structures in the hypervisor's memory).

Michel.
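[Michel's deadline-scheduling point can be sketched in miniature: keep absolute deadlines in a priority queue and arm a single comparator for the earliest one, instead of waking on a periodic tick. The class and event names below are made up for illustration; this is not the actual S/370 or z/VM mechanism, only the shape of the idea.]

```python
# Deadline-driven timekeeping: the "clock comparator" is armed for the
# earliest pending deadline, so an idle guest burns zero cycles
# between events instead of waking on every tick.
import heapq

class DeadlineScheduler:
    def __init__(self):
        self.queue = []  # min-heap of (deadline, event name)

    def schedule(self, deadline, name):
        heapq.heappush(self.queue, (deadline, name))

    def next_comparator_value(self):
        """What the hypervisor would load into the clock comparator."""
        return self.queue[0][0] if self.queue else None

    def run_until(self, now):
        """Fire every event whose deadline has passed."""
        fired = []
        while self.queue and self.queue[0][0] <= now:
            _, name = heapq.heappop(self.queue)
            fired.append(name)
        return fired

sched = DeadlineScheduler()
sched.schedule(150, "refresh stats")
sched.schedule(100, "poll network")
print(sched.next_comparator_value())  # 100: sleep until then, no ticks
print(sched.run_until(120))           # ['poll network']
```

A guest with an empty queue simply has no comparator value to arm, which is the zero-cost sleeping-virtual-machine case Michel describes.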