From: Robert Myers on 24 Oct 2009 12:45

On Oct 24, 10:02 am, an...(a)mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> Robert Myers <rbmyers...(a)gmail.com> writes:
>
> [Speed of PA-RISC emulation on Itanium]
>
> >If I remember the numbers Anton provided, 50% per clock for untuned
> >code and a less than optimal compiler seems about right
>
> I don't know what you think you remember, but I have not presented
> PA-RISC results, simply because we have no PA-RISC box (for Gforth)
> and nobody has submitted PA-RISC results (for the latex benchmark).
>
> For those who wonder what this is all about, the message that he means
> is <2009Oct22.164...(a)mips.complang.tuwien.ac.at>, and the results
> referred to are
>
> http://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl-sli...http://www.complang.tuwien.ac.at/franz/latex-bench
> >3.

I couldn't get the link to work when I wrote the post.

On your scale, where ia32 is 1.0 performance per cycle, Itanium was between 0.35 and 0.40, barely better than ARM XScale. I took the ia32 to indicate a compiler working with a processor that it was well-tuned to schedule for and the Itanium results as indicative of how code that wasn't analyzed or scheduled with much insight into ia64 would do.

The PA-RISC code would have been compiled in an environment that was completely naive of Itanium, and I'm not surprised that it can't be translated into code that does well on Itanium (any more than ia32 code can).

If the architecture depends heavily on the compiler and the code was compiled and scheduled by a compiler that's naive of the architecture, it's hardly surprising that it can't be translated into code that performs well. That they got ia32 translation to work even acceptably seems something of a miracle to me.

Robert.
From: Anton Ertl on 24 Oct 2009 14:05

jgd(a)cix.compulink.co.uk writes:
>In article <2009Oct24.154356(a)mips.complang.tuwien.ac.at>,
>anton(a)mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>
>> Judging from experience with Linux-Alpha, this probably means that the
>> kernel supports executing IA-32 executables, but needs a helper file
>> for that (on Linux-Alpha it was the emulator), and that file is
>> missing.
>
>What do you get when you run ldd on the IA-32 executable?

[ia64:~/gforth:25338] ldd ./gforth
        not a dynamic executable
[ia64:~/gforth:25339] file ./gforth
./gforth: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

>I'm
>wondering if it needs a different loader, since having one of those
>missing is one way of producing the error message you quote.

It's possible that it needs a different loader. AFAIK ldd then needs that loader, too. Looking at the strace for ldd, I see:

stat("/lib/ld-linux.so.2", 0x60000fffffe4b480) = -1 ENOENT (No such file or directory)

With that, I found the package I needed to install on this Debian system (ia32-libs), and now I can run IA32 programs on this IA64 machine. I just ran some simple Gforth benchmarks on it:

 sieve bubble matrix   fib
 0.764  1.000  0.560 1.188  IA64 code (gcc-4.1) on 900MHz Itanium II
 1.840  2.284  1.080 2.796  IA32 code (gcc-2.95) on 900MHz Itanium II
 0.261  0.299  0.156 0.375  IA32 code (gcc-2.95) on 2.26GHz Pentium 4

(These gcc versions give good performance for Gforth.)

Note that this Pentium 4 (released in May 2002 according to Wikipedia) is contemporary with this Itanium II (released on 2002-07-08 according to Wikipedia).

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
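The three rows of the table can be compared directly. The following is a minimal sketch, not part of the original post: it assumes the figures are run times in seconds (the post does not name the unit), and the dictionary labels and function names are made up for illustration. It computes how much longer IA32 code takes than IA64 code on the same Itanium II, and how many clock cycles the Itanium II needs relative to the Pentium 4 once the different clock rates are factored in.

```python
# Figures copied from the table in the post. Treating them as run times in
# seconds is an assumption; the post does not name the unit.
BENCHMARKS = ("sieve", "bubble", "matrix", "fib")

# label -> (clock frequency in Hz, one value per benchmark)
RESULTS = {
    "IA64 on Itanium II": (900e6, (0.764, 1.000, 0.560, 1.188)),
    "IA32 on Itanium II": (900e6, (1.840, 2.284, 1.080, 2.796)),
    "IA32 on Pentium 4": (2.26e9, (0.261, 0.299, 0.156, 0.375)),
}


def time_ratio(config, baseline):
    """Run time of `config` divided by run time of `baseline`, per benchmark."""
    _, times = RESULTS[config]
    _, base = RESULTS[baseline]
    return [t / b for t, b in zip(times, base)]


def cycle_ratio(config, baseline):
    """Clock cycles used by `config` divided by those used by `baseline`."""
    clock, times = RESULTS[config]
    base_clock, base = RESULTS[baseline]
    return [(clock * t) / (base_clock * b) for t, b in zip(times, base)]


if __name__ == "__main__":
    emulated = time_ratio("IA32 on Itanium II", "IA64 on Itanium II")
    per_cycle = cycle_ratio("IA64 on Itanium II", "IA32 on Pentium 4")
    for name, e, c in zip(BENCHMARKS, emulated, per_cycle):
        print(f"{name:7s} IA32/IA64 time on Itanium: {e:.2f}   "
              f"Itanium/P4 cycles: {c:.2f}")
```

Under the run-time assumption, IA32 code on the Itanium II takes roughly 1.9 to 2.4 times as long as native IA64 code, and native IA64 code needs roughly 1.2 to 1.4 times as many cycles as the Pentium 4 does for the same benchmark.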
From: Anton Ertl on 24 Oct 2009 14:39

Robert Myers <rbmyersusa(a)gmail.com> writes:
>On Oct 24, 10:02 am, an...(a)mips.complang.tuwien.ac.at (Anton Ertl)
>wrote:
>> http://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl-sli...http://www.complang.tuwien.ac.at/franz/latex-bench
>>3.
>I couldn't get the link to work when I wrote the post.

That's no wonder, because apparently your newsreader mutilates it. Here it is again:

http://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl-slides.pdf

>On your scale,
>where ia32 is 1.0 performance per cycle,

Different IA32 implementations have different performance per cycle in the range of 0.55-1.0.

>Itanium was between 0.35 and
>0.40, barely better than ARM XScale.

~0.39, in the same ballpark as the other non-IA32/AMD64 CPUs (~0.34-0.53).

>I took the ia32 to indicate a
>compiler working with a processor that it was well-tuned to schedule
>for and the Itanium results as indicative of how code that wasn't
>analyzed or scheduled with much insight into ia64 would do.

So the PPC, Alpha and ARM results are also due to lack of insight into the scheduling requirements of the CPU in your opinion?

My theory (which you can find in the text of that slide) is that the better performance of the IA32 and AMD64 implementations on this benchmark is because they perform indirect-branch prediction and most of the others do not (hmm, the 21264B also has a kind of indirect-branch predictor, but the performance is still not so great at ~0.43; I have no theory for that).

Unless the PA-RISC implementation you are thinking of has an indirect-branch predictor, I have no reason to expect it to perform better than ~0.5.

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton(a)mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
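Why indirect-branch prediction matters for an interpreter can be shown with a toy model. The sketch below is an illustration only and describes none of the CPUs named in the thread: it assumes an interpreter in which every handler ends with its own indirect dispatch jump, and a predictor that simply guesses the target that jump took last time. The handler names and the function are hypothetical.

```python
def btb_mispredictions(trace):
    """Count mispredicted dispatch jumps for a sequence of executed handlers.

    Toy model: the indirect jump at the end of each handler is predicted to
    go wherever that same jump went the previous time it executed.
    """
    last_target = {}  # handler holding the jump -> target it took last time
    misses = 0
    for src, dst in zip(trace, trace[1:]):
        if last_target.get(src) != dst:
            misses += 1
        last_target[src] = dst
    return misses


if __name__ == "__main__":
    # A loop in which every handler is always followed by the same successor.
    steady = ["lit", "add", "dup", "branch"] * 100
    # A loop in which the handler "A" alternates between two successors.
    alternating = ["A", "B", "A", "C"] * 100
    for name, trace in (("steady", steady), ("alternating", alternating)):
        jumps = len(trace) - 1
        print(f"{name}: {btb_mispredictions(trace)} of {jumps} "
              f"dispatch jumps mispredicted")
```

In this model a CPU with no indirect-branch predictor behaves as if every one of the dispatch jumps were mispredicted, while even the simple last-target predictor gets most of them right on the steady loop. That gap is the kind of effect the theory above attributes the IA32 and AMD64 advantage to.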
From: Robert Myers on 24 Oct 2009 15:25

On Oct 24, 2:39 pm, an...(a)mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> Robert Myers wrote
> >On your scale,
> >where ia32 is 1.0 performance per cycle,
>
> Different IA32 implementations have different performance per cycle in
> the range of 0.55-1.0.
>
> >Itanium was between 0.35 and
> >0.40, barely better than ARM XScale.
>
> ~0.39, In the same ballpark as the other non-IA32/AMD64 CPUs (~0.34-0.53).
>
> >I took the ia32 to indicate a
> >compiler working with a processor that it was well-tuned to schedule
> >for and the Itanium results as indicative of how code that wasn't
> >analyzed or scheduled with much insight into ia64 would do.
>
> So the PPC, Alpha and ARM results are also due to lack of insight into
> the scheduling requirements of the CPU in your opinion?
>
> My theory (which you can find in the text of that slide) is that the
> better performance of the IA32 and AMD64 implementations on this
> benchmark is because they perform indirect-branch prediction and most
> of the others do not (hmm, the 21264B also has a kind of
> indirect-branch predictor, but the performance is still not so great
> at ~0.43; I have no theory for that).
>
> Unless the PA-RISC implementation you are thinking of has an
> indirect-branch predictor, I have no reason to expect it to perform
> better than ~0.5.
>

I don't have enough insight into the other architectures to comment. I first looked at the chart and said, yup, just like I said, it's a compiler built and tuned around x86.

I don't have any insight into what being architecture-naive on the other architectures might be, but, for Itanium, you have to start with deep insight into the code in order to get a payback on all the fancy bells and whistles. Itanium should be getting more instructions per clock, not significantly fewer (that *was* the idea, wasn't it?). Even with respect to the other architectures, it's only in the pack.
Once you're past the source code and information you can preserve from it in intermediate representations, you have an expensive space heater. I just happened to have your charts fresh in mind when I made the comment, and neither your results nor the fact that binary translation doesn't work well is a surprise. My apologies if you feel that I overinterpreted your numbers and didn't give sufficient credit to your own analysis. Robert.
From: Bernd Paysan on 24 Oct 2009 18:31
Del Cecchi wrote:
> You could use SOI, no bulk. :-)

There still is a bulk, there is just no substrate, so the bulk is left floating. The diodes I mentioned are still there, supplying the bulk when forward biased (this is the well-known effect of SOI to have variable gate thresholds through charging and discharging the bulk below the diode threshold, unless you add in a real bulk contact like on stock silicon wafers).

> I don't get the point of the AC. Light bulbs and space heaters are AC
> powered and still dissipate power. What did I miss?

I can't tell you. Andy apparently doesn't care much about the physics behind integrated circuits; his knowledge stops at the gate level. This is completely ok for digital design, but I wonder why he makes that sort of suggestion ;-).

One interesting property of quantum mechanics is that for irreversible logic, there's a minimum amount of energy that is necessary to make it happen. Reversible logic does not have this drawback. Therefore, people investigate reversible logic, even though the actual components to get that benefit are not in sight (not even carbon nanotube switches have these properties, even though they are much closer to the physical limits for irreversible logic).

Many people also forget that quantum mechanics does not properly take changes in the system into account, and that means that your reversible logic only works with the predicted low power when the inputs are not changing any more - and this is just the uninteresting case (the coherent one - changes in the system lead to decoherence, and thereby to classical physics).

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
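The minimum energy for irreversible logic mentioned above is usually quoted as Landauer's bound for erasing one bit at temperature T. Identifying the post's remark with that bound, and the choice of T = 300 K, are assumptions made here for illustration; neither figure comes from the post.

```latex
E_{\min} = k_B T \ln 2
         \approx \left(1.380649 \times 10^{-23}\,\mathrm{J/K}\right)
                 \times \left(300\,\mathrm{K}\right) \times 0.6931
         \approx 2.87 \times 10^{-21}\,\mathrm{J}
         \approx 0.018\,\mathrm{eV}
```

Reversible logic erases no information, so this per-bit bound does not apply to it, which is the benefit the post refers to.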