From: nmm1 on 20 Jan 2010 10:10

In article <1531844.zBA62FjkXi(a)elfi.zetex.de>,
Bernd Paysan <bernd.paysan(a)gmx.de> wrote:
>
>> The other is maintaining global uniqueness and monotonicity while
>> increasing the precision to nanoseconds and the number of cores
>> to thousands.  All are needed, but it is probably infeasible to
>> deliver all of them, simultaneously :-(
>
>It's not so bad as you think. As long as your uncertainty of time is
>smaller than the communication delay between the nodes, you are fine, i.e.
>your values are unique - you only have to make sure that the adjustments
>propagate through the shortest path.

Er, no.  How do you stop two threads delivering the same timestamp
if they execute a 'call' at the same time without having a single
time server?  Ensuring global uniqueness is the problem.

>For monotonicity, just make sure your
>corrections for NTP don't step back. The NTP implementations I know adjust
>clocks by slowing them down or speeding them up.

Don't bet on it :-(  They do when all goes well, but many of them
will behave very weirdly indeed (including jumping both forward and
back) when they get confused.  xntpd certainly does, and that's
perhaps the most common implementation.  Or at least it did - for
the purposes of strict accuracy.

Regards,
Nick Maclaren.
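For concreteness, here is a minimal C sketch of the slewing behaviour
under discussion - an offset correction amortized at a bounded rate,
so the reported clock speeds up or slows down but never steps
backward.  This is an illustration only, not any real NTP daemon; the
500 ppm slew limit and the use of CLOCK_MONOTONIC as the raw source
are assumptions made for the example:

    #include <stdint.h>
    #include <time.h>

    /* Stand-in raw, monotonic hardware source for the demo. */
    static uint64_t read_raw_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }

    static uint64_t last_raw;    /* raw reading at the previous call    */
    static uint64_t out_ns;      /* adjusted clock handed to callers    */
    static int64_t  offset_ns;   /* correction still owed, set by "NTP" */

    void clock_init(void) { last_raw = read_raw_ns(); }

    uint64_t slewed_clock_ns(void)
    {
        uint64_t raw   = read_raw_ns();
        uint64_t delta = raw - last_raw;   /* elapsed raw time, >= 0 */
        last_raw = raw;

        /* Apply at most 500 ppm of the elapsed interval per call. */
        int64_t max_adj = (int64_t)(delta / 2000);
        int64_t adj = offset_ns;
        if (adj >  max_adj) adj =  max_adj;
        if (adj < -max_adj) adj = -max_adj;
        offset_ns -= adj;

        /* |adj| <= delta/2000 <= delta, so out_ns never decreases. */
        out_ns += delta + adj;
        return out_ns;
    }

Nick's point stands, of course: this only holds when the daemon is
behaving; a confused implementation can still step the clock.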
From: Anne & Lynn Wheeler on 20 Jan 2010 10:21

nmm1(a)cam.ac.uk writes:
> Er, no.  How do you stop two threads delivering the same timestamp
> if they execute a 'call' at the same time without having a single
> time server?  Ensuring global uniqueness is the problem.

one of the requirements was to correctly order dbms transaction log
records after a failure (for recovery). a standard dbms speed-up is to
allow a transaction to be considered committed after the corresponding
log record has been written to disk ... but the altered record in
buffer memory may not yet have been pushed out to its dbms location
(lazy writes to the DBMS disk location). recovery (after failure)
requires using the log to sequentially "rerun" the transactions ...
eventually getting the dbms image on disk to a consistent state.

a cluster dbms implementation used to force the record to disk before
allowing it to migrate into a DBMS buffer on a different processor. to
speed things up, it would be possible to allow a modified record to be
transmitted (over a high-speed link) between dbms buffers (in different
processors in the cluster). the problem then is that there could be
multiple committed transaction changes ... recorded in different dbms
logs ... but not reflected in the DBMS record on disk.

as part of supporting direct buffer-to-buffer copies (w/o having to
force out to disk) ... a mechanism was needed (for recovery) to merge
transaction logs from different systems so that they preserve the
original global temporal ordering. The requirement isn't actually to
have an exact time value for each transaction ... but to have the
multiple logs merged so that entries occurred in the original sequence.
unique accurate time works ... but so would nearly any unique
monotonically increasing number (say a transaction version number ...
which could be supported as part of the operation of the dbms cluster
distributed lock manager ... which also piggy-backs buffer-to-buffer
record copies as part of lock traffic).

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
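A sketch of the recovery-time merge Lynn describes - the record layout
and function names here are invented for illustration, but the point
is that any unique, monotonically increasing sequence number is
sufficient to reconstruct the original global ordering of the
per-node logs:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical log record: seq is unique and monotonic across
     * the cluster (e.g. a transaction version number handed out by
     * the distributed lock manager). */
    struct log_rec {
        uint64_t seq;
        /* ... redo/undo data would follow ... */
    };

    /* Classic two-way merge of two per-node logs, each already
     * sorted by seq, into one globally ordered recovery stream. */
    size_t merge_logs(const struct log_rec *a, size_t na,
                      const struct log_rec *b, size_t nb,
                      struct log_rec *out)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            out[k++] = (a[i].seq < b[j].seq) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
        return k;
    }

Nothing in the merge cares whether seq is wall-clock time; it only
needs uniqueness and monotonicity, which is exactly Lynn's point.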
From: EricP on 20 Jan 2010 13:38

Andy "Krazy" Glew wrote:
> EricP wrote:
>> Andy "Krazy" Glew wrote:
>>> Note that x86 eventually got around to adding READ_EIP instruction.
>>
>> Where is that? I find no reference to such an instruction.
>
> Perhaps I should have said x86-64.  And perhaps I have slipped a bit,
> wishful thinking and all that, but does not LEA with a RIP-relative
> addressing mode do what you want?

Ok, yes. I thought you were referring to legacy mode.

In x64 mode there is also SysCall and SysRet. They seem to be more in
line with my requirements, since SysRet re-enables interrupts as it
returns to user mode. SysCall does not load the kernel stack pointer
though. It just disables interrupts and loads the kernel RIP, and that
code must load the kernel stack pointer.

> I must admit that I have slightly mixed feelings about PIC.  Sure, it's
> a good idea to be able to relocate code.  But that is PIC code
> addressing.  I am not so sure that it is a good idea to encourage data
> to be at a fixed offset from the code.  Perhaps for constants.

Yeah, I think I was just time tripping. It's not worth adding support
for it to an operating system and image file format as hardware
support is so unlikely. You are better off to focus on standard image
relocation techniques and automatically reusing code pages where
possible.

> Also, as somebody who has had to deal with security issues: PIC is a
> gift to malware.  After all, one of the basic characteristics of binary
> code injections via buffer overflows is that they are at an unknown
> address.  PIC makes it easier to write viruses.  Although at the same
> time it makes it easier to randomize the address space, and thereby
> make it harder to write viruses.  Fortunately, x86-64 has other good
> features that, when correctly employed, can hinder malware.  And,
> fortunately, x86-64 breaks the need for legacy compatibility,
> affording the opportunity to

I think you are blaming PIC for bad software. There is an exploit
technique, called return-oriented programming, whereby making calls to
just the right places in a library you can accomplish anything. Should
we also eliminate RET instructions?

Most of the C run time library is like running with scissors anyway.
Is there any language other than C/C++ that suffers from buffer
overflows? As long as the language allows and the rtl encourages
buffer overflows, then bad stuff is going to happen. I don't use the
silliest of the C rtl routines, the ones without buffer length args,
and I have never had this problem (afaik :-).

> I suspect that it is better overall to use one or more base registers
> for data addresses.  Rather than relying on RIP, the instruction
> pointer, as a free base register.  But then that requires at least one
> dedicated register, and even with REX x86-64 doesn't really have
> enough registers.
>
> I sometimes think that we should have RIP-relative branching and
> control flow, and RIP-relative loading of constants.  But that we
> should discourage writing to RIP relative data locations.  E.g. by
> disallowing it in the store addressing modes.  So long as you can do a
> RIP relative LEA, you can always get RIP relative stores if you want.

RIP-relative addressing makes DLL code a lot easier as it eliminates
all that GOT table stuff. The OS just maps all the static data areas
right after the code pages.

Eric
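For reference, the RIP-relative LEA Andy mentions can be used to read
the instruction pointer from C on x86-64; a sketch in GNU toolchain
(GCC/Clang extended asm) syntax, since the exact idiom varies by
compiler:

    #include <stdio.h>

    /* Sketch: read the current instruction pointer on x86-64 with a
     * RIP-relative LEA.  The value returned is the address of the
     * instruction following the LEA itself. */
    static void *read_rip(void)
    {
        void *rip;
        __asm__ volatile ("lea 0(%%rip), %0" : "=r"(rip));
        return rip;
    }

    int main(void)
    {
        printf("executing near %p\n", read_rip());
        return 0;
    }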
From: robertwessel2 on 20 Jan 2010 13:56

On Jan 20, 4:52 am, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
> robertwess...(a)yahoo.com wrote:
> > Other attributes of the TOD clock are that the values are global,
> > unique, and monotonically increasing as viewed by *all* CPUs in the
> > system.  That allows timing to happen across CPUs, things to be given
> > globally unique timestamps, etc.  The TOD clock also provides the
> > basis for timer interrupts on the CPU.
>
> > It's very handy.
>
> It is also the "Right Stuff", i.e. as I wrote earlier the correct way to
> handle this particular problem.
>
> The only real remaining problem is related to NTP, i.e. when you want to
> sync this system-global TOD clock to UTC/TAI.
>
> Afair IBM does have a (very expensive!) hw solution for this, instead of
> the trivial sw needed for a RDTSC-based clock which I outlined earlier.

IBM's problem is that they need to keep TOD clocks synchronized
cluster-wide. With a single machine, there have been ways to sync to
an external clock, in some cases involving third party software (and
hardware). Not necessarily ideal, but possible. In many cases the
problem was more political than technical.

Anyway, for a cluster this used to be accomplished by an external
device known as a Sysplex Timer, and each machine in the cluster would
connect to the Timer (and there would usually be at least two for
redundancy). Sysplex Timers had an option for accessing an external
time source, and would drift the hardware clocks as necessary to stay
in sync with that.

These days it's somewhat cheaper and more straightforward, since the
clock synchronization hardware is now built into current machines, and
a dedicated external Sysplex Timer is not required (I don't remember
if the current z10s still support a Sysplex Timer or not - at least
some generations of hardware did, so that you could have a cluster
with both older and newer boxes). With the new synchronization
support, there are architected functions for performing fine
adjustments to the clock stepping rate, and it's no longer dedicated
hardware doing the adjustment, but rather OS code.
From: robertwessel2 on 20 Jan 2010 14:15
On Jan 20, 5:32 am, n...(a)cam.ac.uk wrote:
> In article <ds3j27-tsm....(a)ntp.tmsw.no>,
> Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
> >robertwess...(a)yahoo.com wrote:
> >> Other attributes of the TOD clock are that the values are global,
> >> unique, and monotonically increasing as viewed by *all* CPUs in the
> >> system.  That allows timing to happen across CPUs, things to be given
> >> globally unique timestamps, etc.  The TOD clock also provides the
> >> basis for timer interrupts on the CPU.
>
> >> It's very handy.
>
> >It is also the "Right Stuff", i.e. as I wrote earlier the correct way to
> >handle this particular problem.
>
> Yes.  But see below.
>
> >The only real remaining problem is related to NTP, i.e. when you want to
> >sync this system-global TOD clock to UTC/TAI.
>
> No, not at all.  There are two problems.  That's one.
>
> The other is maintaining global uniqueness and monotonicity while
> increasing the precision to nanoseconds and the number of cores
> to thousands.  All are needed, but it is probably infeasible to
> deliver all of them, simultaneously :-(

You only need to keep the clocks well enough synchronized that threads
running on separate cores can't tell that the order of time values
stored is actually slightly out of sync across the machine or cluster.
Basically this is approximately the physical propagation delay between
nodes, and synchronizing to less than that is relatively
straightforward.

Then making sure the values are unique just requires an extension at
the low end of the time value, and a fixed value per-core to be stored
there. So effectively core number 13 always stores time values of the
form "nnnnnnnn.nnnnnnnnn013", and two actually simultaneous stores
have an artificial difference inserted at the low end. And so long as
the prior condition (about time/event visibility) is met, you're
covered here too.

You can artificially reduce the timer frequency requirement (and ease
your synchronization problems) by imposing a minimum time between
clock reads. And frankly a real timer rate an order of magnitude
slower than the instruction rate probably doesn't eliminate any real
utility. A few old S/370s have done that (a 1MHz timer on a 3 MIPS
machine would result in two consecutive STCKs taking at least a
microsecond, for example). And modern boxes do it for the old 64-bit
timer format, which is now effectively out of resolution (so if you
use the fully synchronized version of store clock 64, there's an
artificial maximum rate imposed on those instructions).
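A sketch of that uniqueness trick in C - the 12-bit split and the
helper name are assumptions made for illustration, not the actual
z/Architecture TOD format:

    #include <stdint.h>

    #define CORE_BITS 12   /* room for 4096 cores; an assumed split */

    /* Sketch: reserve the low CORE_BITS of each timestamp for the
     * core number, so simultaneous reads of a well-synchronized
     * clock on different cores still yield distinct, correctly
     * ordered values.  Costs CORE_BITS of range at the top of the
     * counter, and the clock itself only needs to tick faster than
     * a cross-core communication round trip. */
    static inline uint64_t unique_timestamp(uint64_t synced_clock,
                                            unsigned core_id)
    {
        return (synced_clock << CORE_BITS)
             | (core_id & ((1u << CORE_BITS) - 1));
    }

    /* e.g. unique_timestamp(read_tod(), 13) puts ...013 in the low
     * bits, per the description above; read_tod() is hypothetical. */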