From: Anne & Lynn Wheeler on 21 Oct 2009 16:42

this morning there was a presentation about OpenSolaris with a top bullet item that it recently has gone "tickless" ... related to the high amount of overhead when running in a virtual machine even when idle (potentially with a large number of concurrent virtual machines all "ticking").

in the mid-80s ... I noticed the code in unix and commented that I had replaced almost identical code that was in cp67 in 1968 (some conjecture that cp67 might possibly be traced back to ctss ... and unix might also trace its design back to ctss ... potentially via multics).

i've periodically mentioned that this was a significant contribution to being able to leave the system up 7x24 ... allowing things like offshift access, access from home, etc. the issue was that the mainframes were "rented" and had usage meters ... and the monthly charges were based on the number of hours accumulated on the usage meters. in the early days ... simple sporadic offshift usage wasn't enough to justify the additional rental logged by the usage meters. the usage meters ran when cpu &/or i/o was active and tended to log/increment in couple-hundred-millisecond chunks ... even if only a few hundred instructions were "ticking" a few times per second (effectively resulting in the meter running all the time). moving to event-based operation and eliminating the "ticking" helped enable the usage meter to actually stop during idle periods.

the other factors (helping enable the transition to leaving systems up 7x24) were 1) the "prepare" command for terminal i/o ... which allowed the (terminal) channel i/o program to appear idle (otherwise it would also have kept the usage meter running) but still able to immediately do something when there were incoming characters, and 2) automatic reboot/restart after failure (which contributed to lights-out operation, leaving the system up 2nd & 3rd shift w/o a human operator ... eliminating those costs also).

on 370s, the usage meter would take 400 milliseconds of idle before coasting to a stop. we had some snide remarks about the favorite-son operating system that had a "tick" process that ran exactly every 400 milliseconds (if the system was active at all, even otherwise completely idle, it was guaranteed that the usage meter would never stop).

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
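A minimal sketch of the tick-based vs. event-based contrast (illustrative only, in modern C/pthreads terms rather than anything resembling the cp67 code; names like do_work and work_available are placeholders): the tick-based loop wakes on a fixed period whether or not there is work, so the cpu never goes truly quiet, while the event-based loop blocks until something actually arrives.

/* Illustrative contrast between a tick-based idle loop and an event-based
 * one (names are placeholders, nothing to do with any real system).
 * Build with: cc -pthread tickless.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_ready = PTHREAD_COND_INITIALIZER;
static bool work_available = false;

static void do_work(void)
{
    printf("handling one unit of work\n");
}

/* Tick-based: wakes every 10 ms whether or not there is work, so the cpu
 * (and a usage meter keyed to cpu/channel activity) never really stops. */
static void *tick_loop(void *arg)
{
    (void)arg;
    for (;;) {
        usleep(10 * 1000);                  /* the periodic "tick" */
        pthread_mutex_lock(&lock);
        bool pending = work_available;
        work_available = false;
        pthread_mutex_unlock(&lock);
        if (pending)
            do_work();
    }
    return NULL;
}

/* Event-based: blocks until someone signals; a truly idle system executes
 * no instructions at all between events. */
static void *event_loop(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!work_available)
            pthread_cond_wait(&work_ready, &lock);
        work_available = false;
        pthread_mutex_unlock(&lock);
        do_work();
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, event_loop, NULL);   /* or tick_loop */

    sleep(1);                                     /* event thread sits idle */
    pthread_mutex_lock(&lock);
    work_available = true;
    pthread_cond_signal(&work_ready);
    pthread_mutex_unlock(&lock);

    sleep(1);
    return 0;
}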
From: ChrisQ on 21 Oct 2009 16:44

Bernd Paysan wrote:
>
> Sorry, that's not true. Especially in the "small embedded intelligent
> device" area we are talking about. The scale of integration changed: You
> will produce a SoC for these applications. I.e. the parts are build from
> incredible small devices: GDS polygons, to be precise (people use more
> convenient building blocks, though). Most people who make SoCs embed some
> standard core like an ARM (e.g. Cortex M0) or an 8051 (shudder - takes the
> same area as a Cortex M0, but is horrible!), but that's because they chose
> so, not because it's not feasible to develop your own architecture.

I think you must work in a slightly more rarefied atmosphere :-). The only place i've seen custom silicon with an embedded core is in mobile telephony, though it's probably far more prevalent now as it moves more into the mainstream and where the high volume can justify it. Most embedded devices still use off-the-shelf micros afaics, including a lot of consumer electronics.

I still use later Si Labs 8051 variants (shock horror) for simple tasks and logic replacement. The compilers produce reliable, if tortuous, code. Historically I used a lot of 68k, but am moving over to ARM for the more complex stuff because everyone makes it. It is a bit idiosyncratic, and stuff like the interrupt handling takes some getting used to. Portable libraries are important here, so limiting architectures saves time and money.

Most of the client work is low to medium volume, where there is a bit more scope for creativity and novel solutions. The current project is based around the Renesas 80C87 series, which is a fairly neat 16-bit machine. I didn't choose it, but it's a very conventional architecture and easy to integrate and program.

>
> Architectures that usually don't surface to the user - e.g. when I embed a
> b16 in a device of ours, it's not user-programmable. It's not visible what
> kind of microprocessor there is or if there is any at all.
>

Just as it should be, but simple user interfaces often represent a large proportion of the overall software effort, just to make them that way...

Regards,

Chris
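For what it's worth, a minimal sketch of why interrupt handling on the newer Cortex-M parts mentioned above is less idiosyncratic than on classic ARM: a handler is just an ordinary C function whose address sits in the vector table, because the hardware stacks the caller-saved registers on exception entry. The symbol names here (_estack, Reset_Handler, the vector position shown for SysTick_Handler) are illustrative assumptions, not taken from any specific vendor header or linker script.

/* Cortex-M style vector table and handler sketch (illustrative only).
 * SysTick_Handler is a plain C function: the hardware saves r0-r3, r12,
 * lr, pc and xPSR on exception entry, so no special prologue, mode
 * switch or compiler keyword is needed.
 */
#include <stdint.h>

extern uint32_t _estack;        /* top of stack, provided by the linker script */

volatile uint32_t tick_count;

void SysTick_Handler(void)
{
    tick_count++;               /* ordinary C, ordinary return */
}

void Reset_Handler(void)
{
    for (;;) {
        /* application main loop would go here */
    }
}

/* Vector table: initial stack pointer, reset vector, then the exception
 * and interrupt vectors (only SysTick shown; real tables list them all). */
__attribute__((section(".isr_vector")))
const void *vector_table[] = {
    &_estack,
    Reset_Handler,
    /* ... other exception vectors ... */
    SysTick_Handler,
};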
From: Robert Myers on 21 Oct 2009 16:56

On Oct 21, 6:08 am, Bill Todd <billt...(a)metrocast.net> wrote:
>
> Once again that's irrelevant to the question under discussion here:
> whether Terje's statement that Merced "_would_ have been, by far, the
> fastest cpu on the planet" (i.e., in some general sense rather than for
> a small cherry-picked volume of manually-optimized code) stands up under
> any real scrutiny.

I think that Intel seriously expected that the entire universe of software would be rewritten to suit its ISA. As crazy as that sounds, it's the only way I can make sense of Intel's idea that Itanium would replace x86 as a desktop chip.

To add spice to the mix of speculation, I suspect that Microsoft would have been salivating at the prospect, as it would have been a one-time opportunity for Microsoft, albeit with a huge expenditure of resources, to seal the doom of open source.

None of it happened, of course, but I think your objection about hand-picked subsets of software would not have impressed Intel management.

Robert.
From: Mayan Moudgill on 21 Oct 2009 17:50

Andy "Krazy" Glew wrote:
> Brett Davis wrote:
>
>> Cool info though, TRIPS is the first modern data flow architecture I
>> have looked at. Probably the last as well. ;(
>
> No, no!
>
> All of the modern OOO machines are dynamic dataflow machines in their
> hearts. Albeit micro-dataflow: they take a sequential stream of
> instructions, convert it into dataflow by register renaming and what
> amounts to memory dependency prediction and verification (even if, in
> the oldest machine, the prediction was "always depends on earlier stores
> whose address is unknown"; now, of course, better predictors are
> available).
>
> I look forward to slowly, incrementally, increasing the scope of the
> dataflow in OOO machines.
> * Probably the next step is to make the window bigger, by multilevel
> techniques.
> * After that, get multiple sequencers from the same single threaded
> program feeding in.
> * After that, or at the same time, reduce the stupid recomputation
> of the dataflow graph that we are constantly redoing.
>
> My vision is of static dataflow nodes being instantiated several times
> as dynamic dataflow.
>
> I suppose that you could call trips static dataflow, compiler managed.
> But why?

I am sure you are familiar with Monsoon & Id, and all the work that went into serializing the dataflow graph :).

As for recomputing the dataflow graph: several papers/theses called for explicitly annotating instructions with the dependence distance(s). I always wondered:
- do you annotate for flow (register write-read) dependences only?
- or do you annotate for any dependence (write-write and read-write as well)?
- how do you deal with multiple paths (particularly dependence joins)?
- how do you deal with memory (must vs. may dependences)?
- and how do you indicate this information so that the extra bits in the I$ don't overwhelm any savings in the dynamic logic?

The first two points are important because the information you need differs between implementations with and without renaming.

About making windows bigger: my last work on this subject is a bit dated, but, at that time, for most workloads you pretty soon hit a point of exponentially diminishing returns. Path mispredicts & cache misses were a couple of the gating factors, but so were niggling little details such as store-queue sizes, retire resources & rename-buffer sizes. There is also the nasty issue of cache pollution on mispredicted paths.
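A minimal sketch of the "micro-dataflow" point above (illustrative only, not modeled on any particular machine): register renaming converts a sequential instruction stream into dataflow edges by recording, per architectural register, the tag of its most recent producer, which makes WAR/WAW hazards on architectural names disappear and leaves only the true (RAW) dependences.

/* Toy register renamer: converts a sequential instruction stream into
 * dataflow edges by tracking, per architectural register, the tag of the
 * most recent producer.  Purely illustrative; ignores memory, flags,
 * branches and physical-register reclamation.
 */
#include <stdio.h>

#define NUM_ARCH_REGS 8
#define NO_PRODUCER   (-1)

struct insn {
    const char *op;
    int dst, src1, src2;    /* architectural register numbers */
};

int main(void)
{
    /* r3 = r1 + r2 ; r4 = r3 * r1 ; r3 = r4 - r2
     * (the WAW hazard on r3 creates no ordering edge after renaming) */
    struct insn prog[] = {
        { "add", 3, 1, 2 },
        { "mul", 4, 3, 1 },
        { "sub", 3, 4, 2 },
    };
    int last_writer[NUM_ARCH_REGS];
    for (int r = 0; r < NUM_ARCH_REGS; r++)
        last_writer[r] = NO_PRODUCER;

    for (int tag = 0; tag < (int)(sizeof prog / sizeof prog[0]); tag++) {
        struct insn *i = &prog[tag];
        /* dataflow edges: this instruction waits only on true (RAW) deps */
        printf("tag %d (%s): depends on tags %d and %d\n",
               tag, i->op, last_writer[i->src1], last_writer[i->src2]);
        /* renaming: later readers of i->dst now depend on this tag */
        last_writer[i->dst] = tag;
    }
    return 0;
}

A real renamer would also allocate a physical register per destination and reclaim it at retire; the table above is just the map from architectural names to producer tags.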
From: Andrew Reilly on 21 Oct 2009 18:32
On Wed, 21 Oct 2009 13:56:13 -0700, Robert Myers wrote:

> As crazy as that sounds, it's the only way I can make sense of Intel's
> idea that Itanium would replace x86 as a desktop chip.

I don't think that it's as crazy as it sounds (today). At the time Microsoft had Windows NT running on MIPS and Alpha as well as x86: how much effort would it be to run all of the other stuff through the compiler too?

We're a little further along now, and I think that the MS/NT experience and Apple's processor-shifting [six different instruction sets in recent history: 68k, PPC, PPC64, ia32, x86_64 and at least one from ARM, perhaps three] (and a side order of Linux/BSD/Unix cross-platform support) shows that the biggest headaches for portability (across processors but within a single OS) are word size and endianness, rather than specific processor instruction sets.

We're still having issues with pointer-size changes as we move from ia32 to x86_64, but at least the latter has *good* support for running the former at the same time (as long as the OS designers provide some support and ship the necessary 32-bit libraries). Of course, most of the pointer-size issues stem directly from the use of C et al, where they show up all sorts of bad assumptions about integer equivalence and structure size and packing. Most other languages don't even have ways to express those sorts of concerns, and so aren't as affected.

> To add spice to the mix of speculation, I suspect that Microsoft would
> have been salivating at the prospect, as it would have been a one-time
> opportunity for Microsoft, albeit with a huge expenditure of
> resources, to seal the doom of open source.

How so? Open source runs fine on the Itanium, in general. (I think that most of the large SGI Itanium boxes only run Linux, right?)

Cheers,

--
Andrew
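A minimal example of the kind of C assumption that bites in the ia32-to-x86_64 move (hypothetical code, not from any particular program): an int and a pointer are the same size on ILP32 targets, so stuffing a pointer into an int "works" there and silently truncates under LP64, and structure sizes shift too because of alignment padding.

/* Hypothetical example of 32-bit-era assumptions that break on 64-bit
 * targets: int and pointers are the same size on ILP32 (ia32) but not on
 * LP64 (x86_64 Unix) or LLP64 (64-bit Windows).
 */
#include <stdio.h>
#include <stdint.h>

/* "an int is big enough to hold a pointer" -- true on ILP32, false on LP64 */
struct node {
    int key;
    int next_as_int;        /* bad: pointer squeezed into an int */
};

/* structure size/packing also shifts: 8 bytes on ILP32, typically 16 on
 * LP64 because the pointer member forces 8-byte alignment and padding */
struct pair {
    int   tag;
    void *ptr;
};

int main(void)
{
    printf("sizeof(int)=%zu sizeof(long)=%zu sizeof(void*)=%zu sizeof(struct pair)=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *), sizeof(struct pair));

    int x = 42;
    struct node n = { 1, (int)(intptr_t)&x };   /* truncates on LP64 */

    /* portable alternative: keep a real pointer, or use intptr_t/uintptr_t */
    intptr_t ok = (intptr_t)&x;

    printf("truncated=%p  intptr_t round-trip=%p  original=%p\n",
           (void *)(intptr_t)n.next_as_int, (void *)ok, (void *)&x);
    return 0;
}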