From: nmm1 on 20 Jan 2010 08:44

In article <n7dj27-n7n.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>> On a machine with a LOT of cores, you could update it directly.
>> On one without, you would want a special loop which would take the
>> hardware clock and the constants maintained by the NTP-like code,
>> and update the clock field in memory once every microsecond. That
>> would behave exactly like a separate core. And, because updating
>> the memory field is a kernel operation, the implementation could be
>> changed transparently.
>
>It could not:
>
>Anything that updates a real memory location every us is a performance bug!

Not in this context! We are clearly at cross purposes. The actual
memory location need not be a DIMM, but could be a logical DIMM
actually stored in a CPU's SRAM (as you describe below). My point is
to use the standard memory distribution system, and not necessarily
real, physical memory.

Yes, I agree that doing that is STILL a problem on current systems,
but my other point is that they are going to HAVE to tackle the same
issue for ordinary memory to make the currently favoured shared
memory programming designs work on a large number of cores.

>If you instead use a memory-mapped timer chip register, then you've
>still got the cost of a real bus transaction instead of a couple of
>core-local instructions.

Eh? But how are you going to keep a thousand cores synchronised?
You can't do THAT with a couple of core-local instructions!

Regards,
Nick Maclaren.
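The "special loop" quoted above could look roughly like the following. This is only a sketch under stated assumptions: the names `ClockParams`, `clock_field` and `update_loop` are hypothetical, the calibration constants are assumed to be a base counter/time pair plus a rate fraction, and the counter read and the store to the clock field are passed in as plain callables rather than real kernel operations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClockParams:
    """Hypothetical constants that the NTP-like code would maintain."""
    base_counter: int     # hardware counter value at the last calibration
    base_time_ns: int     # wall-clock time in ns at that same instant
    ns_per_tick_num: int  # counter-to-ns rate as a fraction (numerator)
    ns_per_tick_den: int  # denominator, so the arithmetic stays in integers


def clock_field(counter: int, p: ClockParams) -> int:
    """Value the loop would store in the shared clock field, in ns."""
    elapsed_ticks = counter - p.base_counter
    return p.base_time_ns + elapsed_ticks * p.ns_per_tick_num // p.ns_per_tick_den


def update_loop(read_counter: Callable[[], int],
                params: ClockParams,
                store: Callable[[int], None],
                iterations: int) -> None:
    """Read the hardware counter and refresh the clock field each pass."""
    for _ in range(iterations):
        store(clock_field(read_counter(), params))
```

Because readers only ever see the stored field, the way it gets updated (a dedicated core, this loop, or something in hardware) can change without the readers noticing, which is the transparency argument made in the quoted text.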
From: Anne & Lynn Wheeler on 20 Jan 2010 09:09

Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
> Anything that updates a real memory location every us is a performance bug!
>
> If you instead use a memory-mapped timer chip register, then you've
> still got the cost of a real bus transaction instead of a couple of
> core-local instructions.

one of the justifications for the 370 timer facilities.

360s had location "80" timer in low-store. lower-end 360 modules
updated in millisecond range ... higher end 360s updated low order
bit every 13+ microseconds. for compatibility, 370s did provide
support for location 80 timer but at the millisecond range.

univ. where i was undergraduate had 360/67 (that had "high-speed"
location 80 timer). I had been doing a bunch of enhancements to
(virtual machine) cp67 ... one of which was adding tty/ascii terminal
support to cp67. part of this was I attempted to do something with the
2702 terminal controller that it couldn't quite do (but should).

somewhat as a result, the univ. started a clone controller project ...
using an interdata/3, reverse engineer the 360 channel interface, build
channel interface board for the interdata/3, program the interdata/3 to
emulate 2702 controller with some additional function (later four of us
got written up for being responsible for mainframe clone controller
business).

some early controller tests resulted in bringing down the 360/67
(hardware "red-light"). the issue was the memory bus was shared between
processor, the location 80 timer, and i/o channels (and these were
non-cache machines). the location 80 timer had some leeway if the bus
was in use when timer tic'ed ... but if the timer tic'ed again ... and
there was previous timer memory update still pending ... the machine
would stop/red-light.

had to go back and redo the controller channel board to make sure that
it periodically told the channel to release the memory bus (in middle of
transfers) so that any pending timer tic update could occur.

misc. past posts mentioning clone controller effort
http://www.garlic.com/~lynn/subtopic.html#360pcm

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
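The red-light condition described above can be illustrated with a toy model. This is a sketch only, not a description of the real 360/67 hardware: it assumes tics fall on exact multiples of the tic period, that a tic arriving during a bus hold stays pending until the hold ends, and that the machine stops if the next tic arrives while the previous one is still pending. The names `red_lights` and `split_transfer` and the round 13 microsecond period used below are illustrative.

```python
import math


def red_lights(hold_start_us, hold_len_us, tick_period_us):
    """True if a second tic arrives while the first is still pending."""
    # First tic at or after the moment the channel grabs the bus.
    first_tick = math.ceil(hold_start_us / tick_period_us) * tick_period_us
    second_tick = first_tick + tick_period_us
    # The first update is pending until the hold ends; a second tic
    # before that point is the failure case.
    return second_tick < hold_start_us + hold_len_us


def split_transfer(total_us, max_hold_us):
    """Break one long transfer into holds of at most max_hold_us each."""
    full, rest = divmod(total_us, max_hold_us)
    return [max_hold_us] * full + ([rest] if rest else [])
```

In this model a single 30 microsecond hold against a 13 microsecond tic fails, while the same transfer split into holds no longer than one tic period does not, which is the effect of having the board release the bus in the middle of transfers.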
From: Terje Mathisen "terje.mathisen at on 20 Jan 2010 09:22

nmm1(a)cam.ac.uk wrote:
> In article<n7dj27-n7n.ln1(a)ntp.tmsw.no>,
> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>> If you instead use a memory-mapped timer chip register, then you've
>> still got the cost of a real bus transaction instead of a couple of
>> core-local instructions.
>
> Eh? But how are you going to keep a thousand cores synchronised?
> You can't do THAT with a couple of core-local instructions!

You and I have both written NTP-type code, so as I wrote in another
message: Separate motherboards should use NTP to stay in sync, with or
without hw assists like ethernet timing hw and/or a global PPS source.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
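For reference, the core of an NTP-style exchange is the standard four-timestamp offset and delay calculation. The sketch below shows only that arithmetic; the function name is hypothetical, and filtering, peer selection and the hardware assists mentioned above are left out.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: request sent     (client clock)
    t2: request received (server clock)
    t3: reply sent       (server clock)
    t4: reply received   (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # time spent on the wire, both ways
    return offset, delay
```

The offset estimate is exact only when the two one-way delays are equal, which is one reason hardware timestamping and a shared PPS source help.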
From: nmm1 on 20 Jan 2010 09:33

In article <b6gj27-5bn.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> In article<n7dj27-n7n.ln1(a)ntp.tmsw.no>,
>> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>>> If you instead use a memory-mapped timer chip register, then you've
>>> still got the cost of a real bus transaction instead of a couple of
>>> core-local instructions.
>>
>> Eh? But how are you going to keep a thousand cores synchronised?
>> You can't do THAT with a couple of core-local instructions!
>
>You and I have both written NTP-type code, so as I wrote in another
>message: Separate motherboards should use NTP to stay in sync, with or
>without hw assists like ethernet timing hw and/or a global PPS source.

Yes, but I was thinking of a motherboard with a thousand cores on it.
While it could use NTP-like protocols between cores, with each core
maintaining its own clock, that's a fairly crazy approach. All right,
realistically, it would be 64 groups of 16 cores, or whatever, but
the point stands. Having to use TWO separate protocols on a single
board isn't nice.

Regards,
Nick Maclaren.
From: Bernd Paysan on 20 Jan 2010 09:42

nmm1(a)cam.ac.uk wrote:
> The other is maintaining global uniqueness and monotonicity while
> increasing the precision to nanoseconds and the number of cores
> to thousands. All are needed, but it is probably infeasible to
> deliver all of them, simultaneously :-(

It's not so bad as you think. As long as your uncertainty of time is
smaller than the communication delay between the nodes, you are fine,
i.e. your values are unique - you only have to make sure that the
adjustments propagate through the shortest path.

For monotonicity, just make sure your corrections for NTP don't step
back. The NTP implementations I know adjust clocks by slowing them down
or speeding them up.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
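The point about corrections not stepping back can be sketched as a clock that slews instead of stepping. This is an illustrative model only, not the code of any particular NTP implementation: the class name `SlewedClock` and the 500 ppm default rate limit are assumptions made for the example.

```python
class SlewedClock:
    """Clock that absorbs corrections gradually so it never runs backwards."""

    def __init__(self, max_slew_ppm=500):
        self.max_slew_ppm = max_slew_ppm  # assumed cap on rate adjustment
        self.now_ns = 0                   # current reported time
        self.pending_ns = 0               # correction not yet applied

    def adjust(self, offset_ns):
        """Queue a correction (negative means the clock is running ahead)."""
        self.pending_ns += offset_ns

    def advance(self, raw_elapsed_ns):
        """Advance by raw hardware time plus a bounded slice of correction."""
        limit = raw_elapsed_ns * self.max_slew_ppm // 1_000_000
        step = max(-limit, min(limit, self.pending_ns))
        self.pending_ns -= step
        # |step| is far smaller than raw_elapsed_ns, so time still moves forward.
        self.now_ns += raw_elapsed_ns + step
        return self.now_ns
```

With a correction of -1000 ns queued and two raw advances of 1 ms each, the clock reports 999500 and then 1999000: it runs slightly slow until the correction is absorbed, and no reading is smaller than the one before it.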