From: John Mashey on 14 Sep 2005 06:09

Seongbae Park wrote:
> John Mashey <old_systems_guy(a)yahoo.com> wrote:
> ...
> > You hardware guys are all alike [in hating sign-extension on loads]
> > :-).
> Well, if the sign-extend version takes more cycles than zero-extend
> - I suppose your second choice meant such a case -
> it creates the same funny optimization hassle
> and such an optimization accompanies occasional bug reports that cry wolf
> over the zero-extend load that correctly replaced sign-extend load
> ("It's a signed char in my code.
> Why is the compiler using a zero-extend load ?
> The compiler must be buggy!").

Yes, but the complaints are much worse when people disassemble code and see a bunch of EXTs that are clearly unnecessary, i.e., visible instructions almost always get more attention/flak/whinging than slow instructions, unfortunately. I spent some time tuning a 68K compiler years ago at Convergent, and this kind of thing came up, and it wasn't trivial to fix at the time, and get it right, at least in pcc.

From: John Mashey on 14 Sep 2005 06:22

Seongbae Park wrote:
> I think LISP/Smalltalk/ADA market is just too small to justify
> adding any significant change in the general purpose ISA,
> unless this yet-to-be-invented mechanism is easy and cheap
> to implement or it is for some other purpose which happened to
> help them (like a fast user-level trap).

Well, that's why we never did it. We certainly couldn't justify expensive features for that market, but we hoped to find modest useful ones that might be general enough to have other uses as well. Maybe if we could have afforded another 6 months to do the original MIPS-I ISA, we might have thought of something reasonable, but after that, it was probably too late. Nothing very complex would have fit in the R2000 in any case, although I would have given up a few TLB entries had we gotten a good solution here.

From: Scott A Crosby on 14 Sep 2005 15:21

On 13 Sep 2005 08:33:17 -0700, "John Mashey" <old_systems_guy(a)yahoo.com> writes:
> I wished for something general enough to:
> a) Fix alignment errors, i.e., one would like to be able to run a
> binary with/without alignment checking. [Recall that MIPS could handle
> alignment errors, but needed a recompile to use LWL/LWR, etc].
> b) Be able to trap unimplemented instructions, i.e., like
> floating-point operations on original MIPS R2000, before the FPU was
> available, or for machines that didn't have one, rather than doing
> coprocessor-unusable traps. Also, one might do not-yet-implemented
> instructions, like sqrt (which was not there in MIPS-I, but added
> later). One might consider doing integer mul/div this way, where some
> designs had them, and some didn't.
> c) Likewise, support for parts of IEEE FP that one didn't want to do in
> hardware.
> A) Managing binary compatibility across a family whose implemented
> features vary. Note that a good mechanism would let you run binaries
> with new instructions on old systems, given the right emulation code.

About a month ago, during a discussion on mul/div on SPARC, someone here suggested what I thought was a cute technique for doing this. When the CPU tries to run an illegal instruction and traps, the kernel backpatches the executable to jump to an appropriate emulation routine. The compiler is required to always follow such a not-universally-implemented instruction with enough no-ops so there's always room for the back-patch. However, if the binary is targeted only at hardware with the instruction, the compiler isn't required to generate the no-ops. The ABI is such that all binaries are linked with an appropriate emulation library for the kernel to backpatch jumps to point to. The no-op space overhead might be reduced if the ISA included a special save&jump instruction designed for this purpose.

On old hardware there's no loss in performance, and the kernel only gets involved once, with a single trap, for each such instruction in the binary, not once for each execution of an unsupported instruction. And on new hardware the cost is a few extra no-ops. Software targeting only new hardware doesn't even pay the no-op overhead.

Scott

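To make the trap-once-then-patch idea concrete, here is a minimal Python sketch of a toy machine. Everything in it is an assumption made for illustration: the instruction names (`li`, `mul`, `addi`, `bnez`, `nop`, `call_emul`), the `ToyCPU` class, and the one-slot patch are hypothetical, not any real ISA or kernel interface. The toy only checks that the no-op padding is present; a real back-patch would overwrite that padding with the jump sequence.

```python
def emulated_mul(a, b):
    """Shift-and-add multiply, standing in for the emulation library routine."""
    negative = (a < 0) != (b < 0)
    a, b = abs(a), abs(b)
    result = 0
    while b:
        if b & 1:
            result += a
        a <<= 1
        b >>= 1
    return -result if negative else result


class ToyCPU:
    """Hypothetical machine; has_mul=False models old hardware without mul."""

    def __init__(self, program, has_mul):
        self.program = list(program)  # private copy, so patching is per-process
        self.has_mul = has_mul
        self.regs = {}
        self.traps = 0

    def kernel_trap(self, pc):
        """Illegal-instruction handler: back-patch the site, then let it re-run."""
        self.traps += 1
        _, rd, rs, rt = self.program[pc]
        # The compiler contract: padding must follow the instruction.
        if pc + 1 >= len(self.program) or self.program[pc + 1][0] != "nop":
            raise RuntimeError("no padding after mul: cannot back-patch")
        self.program[pc] = ("call_emul", rd, rs, rt)

    def run(self):
        pc = 0
        while pc < len(self.program):
            op, *args = self.program[pc]
            if op == "mul" and not self.has_mul:
                self.kernel_trap(pc)
                continue  # re-execute the now-patched site
            if op == "li":
                self.regs[args[0]] = args[1]
            elif op == "mul":
                rd, rs, rt = args
                self.regs[rd] = self.regs[rs] * self.regs[rt]
            elif op == "call_emul":
                rd, rs, rt = args
                self.regs[rd] = emulated_mul(self.regs[rs], self.regs[rt])
            elif op == "addi":
                rd, rs, imm = args
                self.regs[rd] = self.regs[rs] + imm
            elif op == "bnez":
                rs, target = args
                if self.regs[rs] != 0:
                    pc = target
                    continue
            elif op != "nop":
                raise ValueError("unknown instruction: " + op)
            pc += 1
        return self.regs


# r1 = 3**5 computed by a loop that executes the same mul site five times.
PROGRAM = [
    ("li", "r1", 1),
    ("li", "r2", 3),
    ("li", "r3", 5),
    ("mul", "r1", "r1", "r2"),  # index 3: loop head
    ("nop",),                   # padding reserved for the back-patch
    ("addi", "r3", "r3", -1),
    ("bnez", "r3", 3),
]

old = ToyCPU(PROGRAM, has_mul=False)
new = ToyCPU(PROGRAM, has_mul=True)
old.run()
new.run()
print(old.regs["r1"], old.traps, new.regs["r1"], new.traps)
```

The old machine traps on the first execution of the `mul` site only; the remaining four iterations go straight to the emulation routine. The new machine never traps and pays only for the `nop`.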
From: Eliot Miranda on 14 Sep 2005 17:59

John Mashey wrote:
> David Hopwood wrote:
>
>>andrewspencers(a)yahoo.com wrote:
>>
>>>Terje Mathisen wrote:
>
>
>>A slightly different situation is where you have code that in practice
>>always handles integers that fit in a single word, but that can't be
>>statically guaranteed to do so, and the language specification says that
>>bignum arithmetic must be supported -- the obvious example being Smalltalk.
>>There were some attempts to support this in hardware (e.g. "Smalltalk on
>>a RISC"; also something on SPARC that I can't remember the details of),
>>but it turned out to be easier and faster for implementations of Smalltalk
>>and similar languages to use other tricks that don't require hardware support.
>
>
> Yes.
> 1) There was Berkeley SOAR as noted, and SPARC included ADD/SUB Tagged,
> which used the high-30 bits as integers, and the low 2 bits as tags; if
> either low 2-bit field were non-zero, it trapped.
>
> 2) ~1988, while working on MIPS-II, I/we spent a lot of time talking
> with Smalltalk & LISP friends, potential customers, etc, asking:
> "Are there any modest extensions that would help you a lot, and would
> be reasonable to implement?"
>
> Short answer: NO.
>
> Longer answer:
> a) They said either give them a complete, tailored solution [which they
> didn't expect], or just make the CPU run fast, but don't bother with
> minor enhancements. Some said they knew about the SPARC feature, but
> didn't use it.

This would include Peter Deutsch and the design of HPS, his 2nd dynamic translation (JIT) VM. The tag pattern for immediate integers was already chosen to be 11, and changing it just for SPARC when the performance boost would be below 10% in all but micro-benchmarks just wasn't worth it.

However, were the SPARC designers to have allowed the trap mask to be a variable part of per-thread state, or even better, to be specified in the instruction itself (eliminating problems combining different language implementations in one program) then we would have made use of it (certainly code exists to use it). The most convenient design would be not a trap but a branch or skip. Something like "add and skip on overflow or if either operand's tag pattern doesn't match X". Now with 64-bit implementations one would also want to specify the width of the tag field (one bit would suit HPS; its 32-bit and 64-bit implementations use a single bit to tag immediate integers).

> b) Some said: they were all doing fairly portable versions, had learned
> a lot of good tricks, and minor improvements that required major
> structural changes just weren't worth it.

Hence the need for any instructions to provide flexibility and not dictate particular bit patterns...

[snip]

> Anyway, it's pretty clear that relevant mechanisms were being discussed
> ~20 years ago, but nobody seems to have figured out features that
> actually make implementation sense. I'd be delighted to see a
> well-informed proposal that had sensible hardware/software
> implementations and really helped LISP/Smalltalk/ADA and hopefully
> other languages...

We could use a tagged add/sub and skip on overflow or tag mismatch, and a tagged compare and skip on tag mismatch, where the tag field can be flexibly specified to suit both 32-bit implementations (typical tags least significant two bits) and 64-bit implementations (typical tags least significant three or four bits). If 6 bits were dedicated to the tag specification, two would be the size of the tag field

    00 -> least significant bit
    01 -> least significant two bits
    10 -> least significant three bits
    11 -> least significant four bits

The remaining four bits would specify the required tag pattern, bits excess to the tag size being ignored.

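As an illustration of the 6-bit tag specification just described, here is a minimal Python sketch of decoding it and checking an operand against it. The post does not say where the two size bits sit within the six, so placing them in the top two bits is an assumption, and the function names `decode_tag_spec` and `tag_matches` are hypothetical.

```python
def decode_tag_spec(spec):
    """Split a 6-bit tag specification into (tag_width, required_pattern).

    Assumed layout (not stated in the post): bits 5..4 hold the size code,
    bits 3..0 hold the required tag pattern.
    """
    if not 0 <= spec < 64:
        raise ValueError("tag specification must fit in 6 bits")
    size_code = (spec >> 4) & 0b11
    width = size_code + 1                 # 00 -> 1 bit ... 11 -> 4 bits
    # Pattern bits in excess of the tag size are ignored.
    pattern = spec & 0b1111 & ((1 << width) - 1)
    return width, pattern


def tag_matches(word, spec):
    """True if the low tag bits of word equal the required pattern."""
    width, pattern = decode_tag_spec(spec)
    return (word & ((1 << width) - 1)) == pattern


# One-bit tag with pattern 1, the style the post attributes to HPS.
ONE_BIT_TAG = 0b00_0001
print(decode_tag_spec(ONE_BIT_TAG), tag_matches(0b0111, ONE_BIT_TAG))
```

Putting the width in the specification is what lets a 32-bit implementation with a two-bit tag and a 64-bit implementation with a three- or four-bit tag share the same instruction.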
The two operands would be interpreted as 2's complement signed integers in the remaining non-tag bits. The add/sub instructions would skip or annul the following instruction if either operand's tag pattern didn't match the tag specification or if the result overflowed. The compare instructions would skip or annul the following instruction if either operand's tag pattern didn't match the tag specification.

The value of the result register of the tagged add/sub would have the same tag pattern as the operands. Result value is undefined if overflow or tag mismatch (i.e. I don't think one would typically be interested in the result).

Code sequences for polymorphic add/sub or compare would then look like

        fetch operand one
        fetch operand two
        tagged add/sub
        branch Ldone
        code for non-tagged case (method lookup)
        ...
    Ldone:

Code for compare sequences would depend on whether one needed to take a conditional branch or produce a result. So one could use an instruction that would skip the next two instructions on tag mismatch. If tagged compare skips the next instruction then tagged compare for a conditional branch might look like

        fetch operand one
        fetch operand two
        tagged compare
        branch Lcond
        code for non-tagged case (method lookup)
        ...
        compare result of non-tagged compare against TRUE value
        branch if equal Ltrue
        compare result of non-tagged compare against FALSE value
        branch if equal Lfalse
        call notBooleanError
    Lcond:
        branch on equal Ltrue
    Lfalse:

If it skips the following two instructions then

        fetch operand one
        fetch operand two
        tagged compare
        branch if equal Ltrue
        branch Lfalse
        code for non-tagged case (method lookup)
        ...
        compare result of non-tagged compare against TRUE value
        branch if equal Ltrue
        compare result of non-tagged compare against FALSE value
        branch if equal Lfalse
        call notBooleanError

which isn't much of a saving...

One could also make use of a tagged add/sub immediate as there's a high dynamic frequency of var + 1 in most (Smalltalk) programs.
The immediate value would omit the tag pattern and be shifted by the tag size to increase useful range.

-- 
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd

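The proposed tagged add, its skip-on-mismatch-or-overflow behaviour, and the shifted immediate form can be modelled in a few lines of Python. This is only a sketch under stated assumptions: a 32-bit word, the tag width and pattern passed directly rather than as an encoded field, and `None` standing in for the "undefined" result. All function names (`box`, `tagged_add`, `tagged_add_immediate`, `polymorphic_add`) are hypothetical.

```python
WORD_BITS = 32                      # assumed word size for the sketch
WORD_MASK = (1 << WORD_BITS) - 1


def box(value, width, pattern):
    """Encode a small integer: payload in the high bits, tag in the low bits."""
    return ((value << width) | pattern) & WORD_MASK


def payload(word, width):
    """Read the non-tag bits of word as a 2's complement signed integer."""
    value_bits = WORD_BITS - width
    v = (word & WORD_MASK) >> width
    if v >= 1 << (value_bits - 1):
        v -= 1 << value_bits
    return v


def tagged_add(a, b, width, pattern):
    """Return (result, skip); skip=True means the next instruction is annulled."""
    tag_mask = (1 << width) - 1
    if (a & tag_mask) != pattern or (b & tag_mask) != pattern:
        return None, True           # tag mismatch: result undefined
    value_bits = WORD_BITS - width
    total = payload(a, width) + payload(b, width)
    if not -(1 << (value_bits - 1)) <= total < (1 << (value_bits - 1)):
        return None, True           # overflow: result undefined
    return box(total, width, pattern), False


def tagged_add_immediate(a, imm, width, pattern):
    """The immediate carries no tag; it is shifted by the tag size and tagged."""
    return tagged_add(a, box(imm, width, pattern), width, pattern)


def polymorphic_add(a, b, slow_path, width=1, pattern=1):
    """Mirror of: tagged add; branch Ldone; code for the non-tagged case."""
    result, skip = tagged_add(a, b, width, pattern)
    if not skip:
        return result               # the "branch Ldone" was executed
    return slow_path(a, b)          # branch annulled: fall into method lookup


# var + 1 with a one-bit tag whose pattern is 1
incremented, skipped = tagged_add_immediate(box(41, 1, 1), 1, 1, 1)
print(payload(incremented, 1), skipped)
```

Because the sum is re-tagged with the operands' pattern, the fast path needs no untag or retag instructions around the add, which is where the saving over a plain add plus explicit checks would come from.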
From: John Mashey on 15 Sep 2005 01:00
Eliot Miranda wrote:
> John Mashey wrote:
> > Longer answer:
> > a) They said either give them a complete, tailored solution [which they
> > didn't expect], or just make the CPU run fast, but don't bother with
> > minor enhancements. Some said they knew about the SPARC feature, but
> > didn't use it.
>
> This would include Peter Deutsch and the design of HPS his 2nd dynamic
> translation (JIT) VM.

Lots of good details deleted...

1) The suggestions would probably fit HP PA better than MIPS, as it has extensive "annul-next-instruction" features.

2) My comment above was indeed a paraphrase of Peter's comments, although somewhat similar thoughts came from others as well.