From: Richard Maine on 13 Aug 2010 11:34

Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:

> http://sourceforge.net/projects/math-atlas/
....
> This code ...The hard part of hand-tuning assembly
> is eliminated through brute force tuning of the various parameters

Note that if you don't do the hard part, hand-coded assembly might quite
likely be slower than the code from high-level languages. There is a
long and well established history of people allegedly hand optimizing
codes only to find that they made the codes actually run slower instead
of faster. I've certainly done such things myself. This history goes
back almost 50 years now, but it is probably more likely to happen now
than 50 years ago.

This is the case for many attempts at hand optimization - not just
assembly. Such things as loop unrolling, for example, can slow things
down in some scenarios because it can inhibit the compiler's ability to
do its own optimization or parallelization.

I suppose I should repeat the cliche about testing such things. It
shouldn't need saying, but it does turn out to need saying... a lot. If
you try to optimize code and don't test the results, you haven't done
much of a job of optimization. There have been many cases where people
didn't bother to do such testing, but claimed they had achieved wondrous
optimizations, only to have someone else find that the code could be
substantially sped up by removing the claimed optimizations.

--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
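The advice above about testing claimed optimizations can be made concrete with a small timing harness. What follows is only a sketch in C, not code from the thread: the names `sum_plain`, `sum_unrolled`, and `time_sum` are hypothetical, and `clock()` is a coarse timer chosen purely because it is standard. It sets a plain loop next to a hand-unrolled one so that the two can be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Straightforward loop: unrolling and vectorization are left to the compiler. */
double sum_plain(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-unrolled by four, with a cleanup loop for the leftover elements. */
double sum_unrolled(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

/* CPU seconds consumed by `reps` calls of f on the same data. */
double time_sum(double (*f)(const double *, size_t),
                const double *x, size_t n, int reps)
{
    assert(reps > 0);
    volatile double sink = 0.0;  /* keeps the calls from being optimized away */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink += f(x, n);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_sum(sum_plain, x, n, reps)` and `time_sum(sum_unrolled, x, n, reps)` on a large array, at each optimization level of interest, gives the comparison. Which version wins depends on the compiler, its flags, and the machine, which is exactly why the measurement has to be made. Note also that the unrolled version adds the elements in a different order, so its result can differ in the last bits for general floating-point data.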
From: glen herrmannsfeldt on 13 Aug 2010 12:03

Richard Maine <nospam(a)see.signature> wrote:
(snip, someone wrote)

>> This code ...The hard part of hand-tuning assembly
>> is eliminated through brute force tuning of the various parameters

> Note that if you don't do the hard part, hand-coded assembly might quite
> likely be slower than the code from high-level languages. There is a
> long and well established history of people allegedly hand optimizing
> codes only to find that they made the codes actually run slower instead
> of faster. I've certainly done such things myself. This history goes
> back almost 50 years now, but it is probably more likely to happen now
> than 50 years ago.

There are stories back to the first Fortran compiler. After writing
the compiler, the developers looked at the generated code, and were
surprised at some of the things it did.

Then there was OS/360 Fortran H, reported to generate code
"as good as an experienced assembler programmer".

For RISCier processors, it is even harder to hand optimize the
code, IA64 being one of the harder ones. The interaction between
instructions is so strong that only computers can do it fast enough.

> This is the case for many attempts at hand optimization - not just
> assembly. Such things as loop unrolling, for example, can slow things
> down in some scenarios because it can inhibit the compiler's ability to
> do its own optimization or parallelization.

-- glen
From: Nick Maclaren on 13 Aug 2010 12:52

In article <i43qcf$mj0$1(a)speranza.aioe.org>,
glen herrmannsfeldt <gah(a)ugcs.caltech.edu> wrote:
>Richard Maine <nospam(a)see.signature> wrote:
>
>>> This code ...The hard part of hand-tuning assembly
>>> is eliminated through brute force tuning of the various parameters
>
>> Note that if you don't do the hard part, hand-coded assembly might quite
>> likely be slower than the code from high-level languages. There is a
>> long and well established history of people allegedly hand optimizing
>> codes only to find that they made the codes actually run slower instead
>> of faster. I've certainly done such things myself. This history goes
>> back almost 50 years now, but it is probably more likely to happen now
>> than 50 years ago.
>
>There are stories back to the first Fortran compiler. After writing
>the compiler, the developers looked at the generated code, and were
>surprised at some of the things it did.
>
>Then there was OS/360 Fortran H, reported to generate code
>"as good as an experienced assembler programmer"

Experienced, yes - skilled, no. It was a ghastly compiler. Now,
there WERE others that did just that - though I can't remember
which ones they were (not on a System/370, anyway).

>For RISCier processors, it is even harder to hand optimize the
>code, IA64 being one of the harder ones. The interaction between
>instructions is so strong that only computers can do it fast enough.

And Terje :-)

Regards,
Nick Maclaren.
From: Vincenzo Mercuri on 13 Aug 2010 13:13

Ron Shepard wrote:

> There is something in between using high-level language constructs
> and hand-coding assembly. An example of this is the ATLAS BLAS
> library
>
> http://sourceforge.net/projects/math-atlas/
>
> This code uses a high-level language, C, but it is used in a very
> low-level primitive way. Basically, it is writing assembly language
> in C. There is relatively little compiler optimization that can, or
> should be done on that code. The hard part of hand-tuning assembly
> is eliminated through brute force tuning of the various parameters
> (in ATLAS, that includes tuning for the number of registers, the
> size of cache, loop unrolling, matrix subblocking, and things like
> that). After a piece of code is written, it is run for hours at a
> time on the target architecture in order to search for the optimal
> set of tuning parameters, and then that final result is distributed
> for use.
>
> Why was ATLAS done in C? I don't know definitely, but I think it is
> simply because it relies heavily on use of the C preprocessor. If
> you look at some of the routines, there are more lines of
> preprocessor code than there are executable code. The low-level C
> code that is there is simple and could have been done just as easily
> (or maybe even easier) in fortran. In fact, considering the
> aliasing problems with C (look at the code, it is explicitly
> checked), fortran is probably the more natural language for things
> like ATLAS. But the C preprocessor has always been an integral part
> of the C language, and as all of us fortran programmers here know,
> the fortran standards process failed to produce anything similarly
> useful for fortran over the past 30+ years. So ATLAS (and many
> other similar low-level utility programs) is written in C rather
> than fortran.

Thank you for the valuable article and link.

I think that the use of the C language instead of assembly is due as
much to the extensive use of its preprocessor as to the demand for
portability. Admittedly, I haven't looked at the ATLAS code yet, and I
may be wrong, since there are many ways to write non-portable code even
in C, but portability is something that should not be underestimated.
We cannot talk about assembly without reference to the target machine.
Also, hand-written assembly would not be optimal for every target, and
the compiler could not optimize it any further. A library written in C
(or Fortran) can make the most of the host compiler's optimization
capabilities.

--
Vincenzo Mercuri
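The brute-force tuning quoted above (run the code on the target machine, time it for each parameter setting, keep the best) can be illustrated on a very small scale. The sketch below is in C and is not taken from ATLAS: `matmul_blocked` and `tune_block_size` are hypothetical names, the only parameter searched is a single block size, and `clock()` stands in for whatever timer a real tuner would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* C = A * B for n x n row-major matrices, processed in bs x bs blocks.
   bs must be non-zero; edge blocks are clipped to the matrix size. */
void matmul_blocked(const double *a, const double *b, double *c,
                    size_t n, size_t bs)
{
    assert(bs > 0);
    memset(c, 0, n * n * sizeof *c);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t imax = ii + bs < n ? ii + bs : n;
                size_t kmax = kk + bs < n ? kk + bs : n;
                size_t jmax = jj + bs < n ? jj + bs : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

/* Time one multiplication per candidate block size and return the
   candidate that ran fastest on this machine. */
size_t tune_block_size(const double *a, const double *b, double *c, size_t n,
                       const size_t *candidates, size_t ncand)
{
    assert(ncand > 0);
    size_t best = candidates[0];
    double best_time = -1.0;
    for (size_t t = 0; t < ncand; t++) {
        clock_t t0 = clock();
        matmul_blocked(a, b, c, n, candidates[t]);
        double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = candidates[t];
        }
    }
    return best;
}
```

A realistic tuner would repeat each measurement several times, use matrices large enough to spill out of cache, and search several parameters at once (unrolling depth, register blocking, and so on). The selection loop stays the same in spirit: the winner is whatever the target machine measures as fastest, not what the programmer predicts.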
From: glen herrmannsfeldt on 13 Aug 2010 18:20

Nick Maclaren <nmm(a)gosset.csi.cam.ac.uk> wrote:
(snip on compiled code vs. hand generated assembly code)

>>For RISCier processors, it is even harder to hand optimize the
>>code, IA64 being one of the harder ones. The interaction between
>>instructions is so strong that only computers can do it fast enough.

> And Terje :-)

Hmmm. Many compilers that I know of now use dynamic programming
to select an optimal set of instructions. Given the appropriate
weights (instruction times), dynamic programming chooses the
appropriate instructions.

I have an actual IA64 machine, and the books describing the
instruction set. I haven't even thought about trying to do any
assembly programming for it.

For those who don't follow such things, IA64 instructions are
grouped into 128-bit bundles, with three 41-bit instructions and
a five-bit template field in each bundle. There are five different
types of instructions, and 24 different combinations of those types
that can go into a bundle. Much of the possible interaction between
instructions that most pipelined processors have to figure out for
you is done by the compiler for IA64.

With most processors, you can assume that the instructions are
executed in order, with the exception of branch delay slots on many
RISC processors. As far as I know, you can't make such assumptions
for IA64.

-- glen
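The bundle arithmetic above (5 + 3 x 41 = 128 bits) can be shown with a few shifts and masks. The following C sketch is illustrative only: the type `ia64_bundle` and the functions `bundle_pack`, `bundle_template`, and `bundle_slot` are hypothetical names. The field order assumed here (template in the lowest five bits, followed by slots 0, 1, and 2) is the layout given in the Itanium architecture manuals; the post itself states only the field widths.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit bundle held as two 64-bit halves (lo = bits 0..63). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} ia64_bundle;

#define SLOT_MASK ((UINT64_C(1) << 41) - 1)   /* 41 one-bits */

/* Assumed layout: bits 0..4 template, 5..45 slot 0,
   46..86 slot 1, 87..127 slot 2. */
ia64_bundle bundle_pack(unsigned tmpl, uint64_t s0, uint64_t s1, uint64_t s2)
{
    ia64_bundle b;
    s0 &= SLOT_MASK;
    s1 &= SLOT_MASK;
    s2 &= SLOT_MASK;
    /* Slot 1 straddles the halves: its low 18 bits land in lo. */
    b.lo = (uint64_t)(tmpl & 0x1F) | (s0 << 5) | (s1 << 46);
    /* Its high 23 bits land at the bottom of hi, below slot 2. */
    b.hi = (s1 >> 18) | (s2 << 23);
    return b;
}

/* The five-bit template field. */
unsigned bundle_template(ia64_bundle b)
{
    return (unsigned)(b.lo & 0x1F);
}

/* The raw 41-bit contents of slot 0, 1, or 2. */
uint64_t bundle_slot(ia64_bundle b, int slot)
{
    assert(slot >= 0 && slot <= 2);
    switch (slot) {
    case 0:
        return (b.lo >> 5) & SLOT_MASK;
    case 1:
        return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;
    default:
        return b.hi >> 23;
    }
}
```

The sketch only moves bits around; it says nothing about what a given template value means or which instruction types it permits in each slot. That mapping, and the scheduling decisions built on it, are the part the compiler has to get right.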