From: nmm1 on 27 Apr 2010 07:10

In article <6830060f-e8a6-4ebb-a0ed-9bc42f14e319(a)5g2000yqj.googlegroups.com>,
Michael S <already5chosen(a)yahoo.com> wrote:
>>
>> And that's where you get the simplification.  No fiendishly complicated
>> FLIH, horrible and inadequate run-time system support, and so on.
>
>I think, you are wrong.
>This behavior for [async] interrupts (i.e. all instructions before
>return address are fully completed; all instructions at and above
>return address are not started) is architected on all current ISAs
>that could be considered general-purpose.

That was not true when I investigated this area, and my experiments
confirmed that it wasn't the case in practice, either. I have just
looked at the Intel x86 architecture manual, to see if things have
changed, and they haven't. See, for example, the NOTE at the end
of 6.5 and section 6.6 of Intel® 64 and IA-32 Architectures Software
Developer's Manual Volume 3A: System Programming Guide, Part 1.

You may have missed the subtlety that the guarantee that an interrupt
handler is called between two instructions needs a LOT of logic that
my scheme does not. You may also have missed the subtle gotcha that
synchronising the view of memory on a parallel system is not a direct
consequence of taking an interrupt between two instructions.

>I, at least, am not aware about better ways of for OS-level
>fragmentation-free memory allocation. Esp, when upfront the app is not
>sure about the real size of continuous buffer that it allocates and
>there is a big difference between max size and typical size.
>It (demand paging) also works very well for application stack(s).

Well, I am, and so are lots of other people. You are confusing
(relocatable) virtual memory with demand paging. There is absolutely
no difficulty in compiled code trapping stack overflow and extending
it, without relying on any form of interrupt, for example. I have
implemented that for several languages, including C. And buffer
extensibility is similar.

Regards,
Nick Maclaren.
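A minimal sketch, in C, of the kind of compiler-inserted stack check Nick
describes: the function prologue compares the frame it needs against an
explicit limit and calls a runtime routine instead of relying on a page
fault. The names here (stack_base, grow_stack, STACK_CHECK) are invented
for illustration, the relocation of live frames is omitted, and this is
not Nick's actual implementation, only the general technique.

#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread bookkeeping; a real runtime would keep this
   in a reserved register or a thread-local block. */
static char  *stack_base;   /* lowest usable address of the current stack */
static size_t stack_size;   /* current size of the (relocatable) stack    */

/* Called when the prologue check fails: allocate a bigger region and
   switch to it.  No trap, no interrupt, no demand paging involved. */
static void grow_stack(size_t frame_bytes)
{
    size_t new_size = 2 * stack_size + frame_bytes;
    char  *new_base = malloc(new_size);
    if (new_base == NULL)
        abort();            /* out of memory: fail explicitly */
    /* A real runtime would copy or chain the old segment and fix up
       frame pointers here; omitted in this sketch. */
    stack_base = new_base;
    stack_size = new_size;
}

/* What the compiler conceptually inserts at the top of each function:
   "will this frame still fit above the limit?" (stack grows downward) */
#define STACK_CHECK(sp, frame_bytes)                            \
    do {                                                        \
        if ((char *)(sp) - (size_t)(frame_bytes) < stack_base)  \
            grow_stack(frame_bytes);                            \
    } while (0)

The point of the check is that extending the stack becomes an ordinary,
synchronous call made by the compiled code itself, so no interrupt,
precise or otherwise, is needed.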
From: Morten Reistad on 27 Apr 2010 09:26

In article <hr6gmd$oir$1(a)smaug.linux.pwf.cam.ac.uk>, <nmm1(a)cam.ac.uk> wrote:
>In article <6830060f-e8a6-4ebb-a0ed-9bc42f14e319(a)5g2000yqj.googlegroups.com>,
>Michael S <already5chosen(a)yahoo.com> wrote:
>>>
>You may have missed the subtlety that the guarantee that an interrupt
>handler is called between two instructions needs a LOT of logic that
>my scheme does not. You may also have missed the subtle gotcha that
>synchronising the view of memory on a parallel system is not a direct
>consequence of taking an interrupt between two instructions.

You all seem to be trapped in the single-processor world, despite
valiant efforts to break out.

We have already concluded that in a wide (tens or more) processor
layout, a message-passing architecture with a fifo, using something
like hyperchannel and a fast mux, may be the signalling method of
choice. Further, we have identified three "walls": the "memory",
"power" and "synchronicity" walls.

Nick is perfectly correct in going for a "less is more" cpu design,
getting more cpus online, and more cache.

A fast message-passing fifo, conceptually similar to hyperchannel(s)
and a routing mux, can do what we previously did with interrupts. QNX
did the design for this in 1982, and it proved very viable.

Hardware then has to send and receive messages from this bus. This is
not very different from a SATA, etherchannel, or inter-cache protocol
on modern cpus. We then need to have one or more cpus reading from this
channel, performing kernel functions. And we need some "channel-to-sata"
and "channel-to-pci" etc. bridges.

But dispensing with interrupts does not necessarily mean ditching
demand paging. It just means the hardware must be sufficiently
intelligent to freeze the process, send a message and wait for the
reply. Depending on the reply, it continues or fails. Nothing
particularly fancy there; we already do this for a handful of layers
of cache, except that the channel is internal to the cpu. As long as a
cpu that is waiting on a message can hibernate, and the message system
is fast and low-latency, I am willing to bet it can beat an
interrupt-based system.

>>I, at least, am not aware about better ways of for OS-level
>>fragmentation-free memory allocation. Esp, when upfront the app is not
>>sure about the real size of continuous buffer that it allocates and
>>there is a big difference between max size and typical size.
>>It (demand paging) also works very well for application stack(s).
>
>Well, I am, and so are lots of other people. You are confusing
>(relocatable) virtual memory with demand paging. There is absolutely
>no difficulty in compiled code trapping stack overflow and extending
>it, without relying on any form of interrupt, for example. I have
>implemented that for several languages, including C. And buffer
>extensibility is similar.

Indeed. The important part there is to keep the instruction stream
rolling.

We all need to unthink the CPU as the core. We have reached a point of
very diminishing returns regarding cpu performance, and further cpu
fanciness will cost more than it is worth in terms of power, and will
be substantially hindered by the memory and synchronicity walls. We are
probably well past the optimum design point for cpu design by now, and
need to back down quite a bit.

Rather, think about how we can handle the cache-memory-i/o
interconnects well, save energy, and address synchronicity. The latter
does not need full, global synchronous operation except in a few, very
rare cases.
A lock/semaphore manager will do nicely in most cases, where defining a
sequence and passing tokens is more important than absolute time. QNX
got that right, and that is an important part of the neatness of that
OS.

So, if we need to build a hypervisor for Windows, fine. And if Windows
cannot, license-wise, run on more than 2 cpus, utilise the rest of the
cpus for i/o, cache, paging, running video, generating graphics, etc.
We probably just need to make a proof of concept before Microsoft
obliges with licenses. This is pretty close to what we do with GPUs
anyway.

Speaking of GPUs: what if we gave them an MMU, and access to a
cache/memory interconnect? Even if all non-cache references have to go
to a command channel, if that is sufficiently fast we can do "paging"
between gpu memory and dram. Yes, it is wandering off the subject a
bit, but as a "gedankenexperiment": if the GPUs just have a small, fast
memory, but we can handle "page faults" through an mmu and bring pages
in and out of cache at hyperchannel speeds, we can use those gpus
pretty much like an ordinary cpu.

-- 
mrr
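A rough sketch, in C, of the page-fault-as-message idea Morten
describes: the faulting cpu's MMU freezes the task and posts a request
on the fifo, and a dedicated kernel cpu drains the fifo and replies,
taking the place of a first-level interrupt handler. Every type and
function name here (page_msg, channel_send, channel_recv, map_page) is
invented for illustration; none of it is taken from QNX or any real
interconnect.

#include <stdint.h>
#include <stdbool.h>

/* One slot on the hypothetical inter-cpu fifo ("hyperchannel-like"). */
struct page_msg {
    uint32_t type;     /* PAGE_FAULT_REQ or PAGE_FAULT_REPLY         */
    uint32_t task_id;  /* which frozen task this concerns            */
    uint64_t vaddr;    /* faulting virtual address                   */
    uint64_t paddr;    /* filled in by the service cpu on success    */
    bool     ok;       /* false => fail the task instead of resuming */
};

enum { PAGE_FAULT_REQ = 1, PAGE_FAULT_REPLY = 2 };

/* Hypothetical channel primitives; channel_recv blocks, so the cpu
   that calls it can hibernate until a message arrives. */
void channel_send(const struct page_msg *m);
void channel_recv(struct page_msg *m);

/* Hypothetical pager: find or fetch the page, install the mapping. */
bool map_page(uint32_t task_id, uint64_t vaddr, uint64_t *paddr);

/* Service-cpu side: a kernel cpu that does nothing but drain the fifo.
   This loop stands where a first-level interrupt handler would be. */
void paging_server(void)
{
    for (;;) {
        struct page_msg m;
        channel_recv(&m);              /* blocks; cpu may hibernate      */
        if (m.type != PAGE_FAULT_REQ)
            continue;                  /* not ours; ignored in sketch    */
        m.ok   = map_page(m.task_id, m.vaddr, &m.paddr);
        m.type = PAGE_FAULT_REPLY;
        channel_send(&m);              /* requester resumes or fails task */
    }
}

The requesting side simply parks the frozen task until the
PAGE_FAULT_REPLY comes back, which is the "freeze, send, wait" step;
whether that cpu hibernates in the meantime is a hardware decision.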
From: Tim McCaffrey on 27 Apr 2010 10:54

In article <hr6gmd$oir$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk says...
>
>In article <6830060f-e8a6-4ebb-a0ed-9bc42f14e319(a)5g2000yqj.googlegroups.com>,
>Michael S <already5chosen(a)yahoo.com> wrote:
>>>
>>> And that's where you get the simplification.  No fiendishly complicated
>>> FLIH, horrible and inadequate run-time system support, and so on.
>>
>>I think, you are wrong.
>>This behavior for [async] interrupts (i.e. all instructions before
>>return address are fully completed; all instructions at and above
>>return address are not started) is architected on all current ISAs
>>that could be considered general-purpose.
>
>That was not true when I investigated this area, and my experiments
>confirmed that it wasn't the case in practice, either. I have just
>looked at the Intel x86 architecture manual, to see if things have
>changed, and they haven't. See, for example, the NOTE at the end
>of 6.5 and section 6.6 of Intel® 64 and IA-32 Architectures Software
>Developer's Manual Volume 3A: System Programming Guide, Part 1.
>
>You may have missed the subtlety that the guarantee that an interrupt
>handler is called between two instructions needs a LOT of logic that
>my scheme does not. You may also have missed the subtle gotcha that
>synchronising the view of memory on a parallel system is not a direct
>consequence of taking an interrupt between two instructions.
>
>>I, at least, am not aware about better ways of for OS-level
>>fragmentation-free memory allocation. Esp, when upfront the app is not
>>sure about the real size of continuous buffer that it allocates and
>>there is a big difference between max size and typical size.
>>It (demand paging) also works very well for application stack(s).
>
>Well, I am, and so are lots of other people. You are confusing
>(relocatable) virtual memory with demand paging. There is absolutely
>no difficulty in compiled code trapping stack overflow and extending
>it, without relying on any form of interrupt, for example. I have
>implemented that for several languages, including C. And buffer
>extensibility is similar.
>

So, how is this different from the CDC 6600? The OS was in the PPs
(although MSU moved it (mostly) back into the CPU). The I/O was handled
by the PPs (the CPU couldn't do I/O); there were interrupts and
instruction faults, but they weren't precise (I think they were thought
of more as guidelines... argh :) ). And there were no page faults (no
paging).

If you think about it, you should be able to implement an entire
(including memory) CDC 7600 on a single chip these days. You can use
DDR3 as a paging device. It might even run at 4 GHz...

- Tim
From: Robert Myers on 27 Apr 2010 11:49

Rick Jones wrote:
> Robert Myers <rbmyersusa(a)gmail.com> wrote:
>> Genetic programming is only one possible model.
>
>> The current programming model is to tell the computer in detail what
>> to do.
>
>> The proposed paradigm is to shift from explicitly telling the
>> computer what to do to telling the computer what you want and
>> letting it figure out the details of how to go about it, with
>> appropriate environmental feedback, which could include human
>> intervention.
>
> Sounds like child rearing. I could handle a computer behaving like my
> nine year-old, at least most of the time. I'm not sure I want my
> computer behaving like my five year-old :)
>
It's a fair analogy, although computers have yet to reach the learning
capacity of infants.

"I am a HAL 9000 computer. I became operational at the H.A.L. plant in
Urbana, Illinois on the 12th of January 1992. My instructor was Mr.
Langley, and he taught me to sing a song."

It was a naively ambitious view of computers, but I think it was more
right than the Deist watchmaker-programmer God view we now have.

Robert.
From: Quadibloc on 27 Apr 2010 13:29
On Apr 27, 1:27 am, n...(a)cam.ac.uk wrote:
> Well, actually, I blame the second-raters who turned some seminal
> results into dogma.
>
> None of Turing, Von Neumann or the best mathematicians and computer
> people would ever say that the model is the last word, still less
> that hardware architecture and programming languages must be forced
> into it.

The reason that, so far, parallel architectures are used to execute
programs which basically were written for a von Neumann machine, but
chopped into bits that can run in parallel, is not so much the fault of
a blind dogmatism as it is of the absence of a clear alternative.

While there are other models out there, such as genetic programming and
neural nets (hey, let's not forget associative memory - Al Kossow just
put the STARAN manual up on bitsavers!), at the moment they're only
seen as applicable to a small number of isolated problem domains.

A computer tended to be conceived of as a device which automatically
does what people would have done by hand, perhaps with a desk
calculator or a slide rule or log tables, whether for a scientific
purpose or for accounting. How we addressed these problem domains
gradually evolved through the use of unit record equipment to the use
of digital computers. (During that evolution, though, another non-von
Neumann model was encountered - the mechanical differential analyzer of
Vannevar Bush, and its successors the analog computer and the digital
differential analyzer such as MADDIDA.)

So I go further: not only don't I "blame" Turing and von Neumann... I
also don't "blame" everyone else who came later for failing to come up
with some amazing new revolutionary insight that would transform how we
think about computing. Because unlike the von Neumann model, this new
insight would have to involve a way to fundamentally transform
algorithms from their old paper-and-pencil form.

Now, it _is_ true that there was a von Neumann bottleneck back when a
mainframe computer was considered impressive when it had 32K words of
memory (i.e. a 7090 with a filled address space), and that it had
become worse by the time of the IBM 360/195 (up to 4 megabytes of
regular main memory, although you could fill the 16 megabyte address
space if you used bulk core). When it comes to today's PCs, with
possibly 2 gigabytes of DRAM and perhaps 2 megabytes of L2 cache on
chip... the von Neumann bottleneck has grown to grotesque proportions.

An "intelligent RAM" architecture that included a very low-power
processor for every 128 kilobytes of DRAM would provide considerable
additional parallel processing power without having to change how we
write programs as drastically as changing to a neural-net model, for
example, would require. But the required change would likely still be
so drastic as to lead to this power usually lying unused.

It would be different, of course, if one PC were time-shared between
hundreds of thin clients - thin clients that were accessing it in order
to obtain the processing power of a 7090 mainframe. The trouble is, of
course, that this doesn't make economic sense - that level of
processing power is cheaper than the wires needed to connect to it at a
distance. So, instead, IRAM ends up being a slow, but flexible,
associative memory...

John Savard
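A toy sketch, in C, of what "IRAM as a flexible associative memory"
might look like from the host's point of view: each 128 KB bank has its
own tiny processor, the host broadcasts a key, and every bank scans its
local words at once and reports matches. Everything here (the bank
count, the iram_bank struct, the serial loop standing in for the
per-bank processors) is invented purely for illustration.

#include <stdint.h>
#include <stddef.h>

#define BANK_BYTES  (128 * 1024)                  /* DRAM per tiny processor  */
#define BANK_WORDS  (BANK_BYTES / sizeof(uint32_t))
#define NUM_BANKS   16                            /* arbitrary for the sketch */

struct iram_bank {
    uint32_t words[BANK_WORDS];                   /* this bank's slice of DRAM */
};

static struct iram_bank banks[NUM_BANKS];

/* Conceptually, the host broadcasts 'key' and every bank searches its
   own slice at the same time; this serial loop merely stands in for
   that.  Returns the global word index of the first match, or -1. */
long iram_assoc_search(uint32_t key)
{
    for (size_t b = 0; b < NUM_BANKS; b++) {       /* "in parallel"       */
        for (size_t i = 0; i < BANK_WORDS; i++) {  /* local scan per bank */
            if (banks[b].words[i] == key)
                return (long)(b * BANK_WORDS + i);
        }
    }
    return -1;                                     /* no bank matched     */
}

The search is "slow" per bank but scales with the number of banks, which
is roughly the trade-off the post describes.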