From: Robert on 25 Sep 2007 20:49

On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:

>In article <regif3d0b34nreavsckap09omqjhptnik8(a)4ax.com>,
>Robert <no(a)e.mail> wrote:
>>On Tue, 25 Sep 2007 09:25:04 +0000 (UTC), docdwarf(a)panix.com () wrote:
>>
>>>Now, Mr Wagner... is one to expect another dreary series of repetitions
>>>about how mainframers who said that indices were faster than subscripts
>>>were, in fact, right about something?
>>
>>I expected I-told-you-so from the mainframe camp.
>
>It may be interesting to see if you get one; my point - and pardon the
>obscure manner of its making - was that you made a series of repetitions
>which a demonstration has disproved and it may be interesting to see if an
>equally lengthy series of repetitions follows... or if it just Goes Away
>until you next get an idea about something... and begin another, similar
>series of repetitions.

We saw that subscript and index run at the same speed on three CPU families -- HP PA (SuperDome), DEC Alpha (Cray) and Richard's undisclosed machine, possibly Intel. I am confident we'd see the same on Intel, PowerPC (pseries, iseries, Mac) and SPARC, based on tests I ran a few years ago. Thus the generalization. I was surprised to see zSeries did not follow the pattern of the others.

My previous idea, that memory alignment no longer matters, turned out to be wrong. It does matter on modern RISC machines.

There's a good chance I'll get another idea.

From: William M. Klein on 25 Sep 2007 22:29

"Robert" <no(a)e.mail> wrote in message
news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
<snip>
> We saw that subscript and index run at the same speed on three CPU
> families -- HP PA (SuperDome), DEC Alpha (Cray) and Richard's undisclosed
> machine, possibly Intel. I am confident we'd see the same on Intel,
> PowerPC (pseries, iseries, Mac) and SPARC, based on tests I ran a few
> years ago. Thus the generalization. I was surprised to see zSeries did
> not follow the pattern of the others.
>

Robert,

What you CONTINUE to seem to miss is that MOST "performance" recommendations from compiler vendors rest on their INTERNAL knowledge of the code generated by their compiler and NOT on the machine speed of specific instructions.

You have repeatedly referred to "load" vs "multiply" machine instructions in the discussion of subscripts vs indexes - and yet have NOT posted the "generated" code from a VARIETY of compilers to show that this is actually the "only" (or even major) difference between the generated code.

I cannot tell you why there are as many differences as there are in generated code, but I do know from work that I have done over the years that there are (were - and always will be) surprising differences for the "most simple" COBOL source code. Often this has to do with options/features that are NOT used in the specific code sequences - but are possible. Different compilers (and especially different optimizers) simply do not always create the "expected - if *I* were the compiler" object code.

So far, it does appear (to me) that Micro Focus (various platforms) *and* the HP OpenVMS compiler create object code with "similar" performance (regardless of platform) while IBM (at least on one mainframe) creates very different code sequences. I don't know if Daniel has run a test for Unisys (given the Standard-conforming problems with the Speed2 program) or if anyone else has run with different compilers.

Again, I would strongly GUESS that differences have to do with generated object code and NOT speed of specific machine instructions. But this is a guess without evidence to "prove" it.

--
Bill Klein
wmklein <at> ix.netcom.com

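For readers following the "load" vs "multiply" point, here is a simplified model of the two addressing styles. It is a sketch only, not any compiler's actual generated code. It assumes a subscript is a 1-based occurrence number that has to be converted to a byte offset on each reference, while an index-name holds a byte displacement computed when the index is set; that is a common description of index-names, but the real representation is an implementation detail that varies by compiler. ELEMENT_LENGTH and all function names are hypothetical.

```python
# Simplified model of table addressing; NOT real generated code.
ELEMENT_LENGTH = 12  # assumed size in bytes of one table entry


def subscript_to_offset(subscript: int) -> int:
    """Subscript: 1-based occurrence number, converted on every reference."""
    return (subscript - 1) * ELEMENT_LENGTH  # the "multiply"


def index_set(occurrence: int) -> int:
    """Model of SET index TO n: the conversion happens once, at SET time."""
    return (occurrence - 1) * ELEMENT_LENGTH


def index_up_by(index_value: int, n: int) -> int:
    """Model of SET index UP BY n: the stored displacement moves n entries."""
    return index_value + n * ELEMENT_LENGTH


def index_to_offset(index_value: int) -> int:
    """Index reference: the stored displacement is used as-is (the "load")."""
    return index_value


if __name__ == "__main__":
    idx = index_set(1)
    for occurrence in range(1, 6):
        # Both styles must arrive at the same byte offset.
        print(occurrence, subscript_to_offset(occurrence), index_to_offset(idx))
        idx = index_up_by(idx, 1)
```

Whether this difference shows up in measured run time depends on what a given compiler and optimizer do with it, which is the point made in the post above.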
From: Pete Dashwood on 26 Sep 2007 00:08

"Robert" <no(a)e.mail> wrote in message
news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
>
>>In article <regif3d0b34nreavsckap09omqjhptnik8(a)4ax.com>,
>>Robert <no(a)e.mail> wrote:
>>>On Tue, 25 Sep 2007 09:25:04 +0000 (UTC), docdwarf(a)panix.com () wrote:
>>>
>>>>Now, Mr Wagner... is one to expect another dreary series of repetitions
>>>>about how mainframers who said that indices were faster than subscripts
>>>>were, in fact, right about something?
>>>
>>>I expected I-told-you-so from the mainframe camp.
>>
>>It may be interesting to see if you get one; my point - and pardon the
>>obscure manner of its making - was that you made a series of repetitions
>>which a demonstration has disproved and it may be interesting to see if an
>>equally lengthy series of repetitions follows... or if it just Goes Away
>>until you next get an idea about something... and begin another, similar
>>series of repetitions.
>
> We saw that subscript and index run at the same speed on three CPU
> families -- HP PA (SuperDome), DEC Alpha (Cray) and Richard's undisclosed
> machine, possibly Intel. I am confident we'd see the same on Intel,
> PowerPC (pseries, iseries, Mac) and SPARC, based on tests I ran a few
> years ago. Thus the generalization. I was surprised to see zSeries did
> not follow the pattern of the others.

Well, Robert, I don't want to shake your confidence, and I deliberately refrained from posting these results (I felt you were getting enough flak...), but reconsidered when I saw your statement above :-)

Here are the results of "Speed2" from a genuine Intel Celeron Core 2 Duo Vaio AR250G notebook with 2 GB of main memory, running under Windows XP with SP2 applied, using your code (with the following amendments: all asterisks and comments removed, exit perform cycle removed), compiled with no options other than the defaults (which include "Optimize"), with the Fujitsu NetCOBOL version 6 compiler, compiled to .EXE:

Null test             1
Index                 3
Subscript            25
Subscript comp-5      3
Index 1               3
Subscript 1          22
Subscript 1 comp-5    3

As you can see, indexing is between 7 and 8 times more efficient than subscripting, unless you use optimized subscripts, in this environment.

(I was surprised that the figures are 3 times faster than the z/OS mainframe figures posted by Charlie...:-)

I've had this machine for around a year now, bought it at Fry's in L.A. a few days after Core 2 became available in the marketplace, and have become blase about the speed of it. A few days ago I was running a test on a P4 notebook that had to create a couple of million rows on an ACCESS database. It ran for 20 minutes, then closed down due to a thermal cutoff. (If the CPU runs at or near 100% for an extended period, the machine closes down :-) It was made in Germany and I bought it in England. It is an annoying, although valuable, feature of this machine that it protects itself.

Anyway, I transferred the job to the Vaio and tried again: It never even broke into a sweat; no fans came on and the job was done in 7 minutes... (It does NOT have a high speed disk; it runs at 5400 RPM but is well buffered, and they claim it was the first disk drive for a notebook that had 200GB.)

It is things like this that make me wonder why we even bother about performance and have heated discussions about things like indexes and subscripts, when the technology is advancing rapidly enough to simply take care of it.

More importantly, I hope, Robert, you will accept that generalizations about performance simply don't stack up. Sometimes the most unexpected results are obtained. The only reliable way to check performance is empirically (I give you credit for doing that, and publishing results even when they didn't tell you what you wanted to hear) and, outside of test results, everything else should be accorded the same degree of credibility that we accord glossy marketing brochures ("MIGHT be true...but the person presenting it has a definite axe to grind" :-))

>
> My previous idea, that memory alignment no longer matters, turned out to
> be wrong. It does matter on modern RISC machines.
>
> There's a good chance I'll get another idea.

Let's hope it won't be lonely... :-)

Pete
--
"I used to write COBOL...now I can do anything."

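As a quick check on the "between 7 and 8 times" reading of the Fujitsu NetCOBOL figures in Pete's post, the sketch below computes the raw ratios from the posted timings. The numbers are copied from the post; the units are not stated there, and no null-test correction is applied, since the post does not say whether Speed2 already subtracts it. The `results` and `ratio` names are hypothetical.

```python
# Timings as posted for Fujitsu NetCOBOL v6 on the Vaio notebook.
# Units are whatever Speed2 reports; the post does not state them.
results = {
    "Null test": 1,
    "Index": 3,
    "Subscript": 25,
    "Subscript comp-5": 3,
    "Index 1": 3,
    "Subscript 1": 22,
    "Subscript 1 comp-5": 3,
}


def ratio(slow: str, fast: str) -> float:
    """Raw ratio of two posted timings (no null-test correction)."""
    return results[slow] / results[fast]


if __name__ == "__main__":
    print(f"Subscript / Index:        {ratio('Subscript', 'Index'):.2f}")
    print(f"Subscript 1 / Index 1:    {ratio('Subscript 1', 'Index 1'):.2f}")
    print(f"Subscript comp-5 / Index: {ratio('Subscript comp-5', 'Index'):.2f}")
```

The raw ratios come out at roughly 8.3 and 7.3 for the default subscripts, and 1.0 for the comp-5 subscripts, in line with the post's summary.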
From: Robert on 26 Sep 2007 01:13

On Wed, 26 Sep 2007 02:29:59 GMT, "William M. Klein" <wmklein(a)nospam.netcom.com> wrote:

>"Robert" <no(a)e.mail> wrote in message
>news:uq8jf3pd3rq48eqio0hdtqo172nv2c16is(a)4ax.com...
>> On Tue, 25 Sep 2007 22:45:12 +0000 (UTC), docdwarf(a)panix.com () wrote:
><snip>
>> We saw that subscript and index run at the same speed on three CPU
>> families -- HP PA (SuperDome), DEC Alpha (Cray) and Richard's undisclosed
>> machine, possibly Intel. I am confident we'd see the same on Intel,
>> PowerPC (pseries, iseries, Mac) and SPARC, based on tests I ran a few
>> years ago. Thus the generalization. I was surprised to see zSeries did
>> not follow the pattern of the others.
>>
>
>Robert,
> What you CONTINUE to seem to miss is that for MOST "performance"
>recommendations from compiler vendors, it is their INTERNAL knowledge of the
>code generated by their compiler and NOT the machine speed for specific
>instructions.

If you prefer, treat the compiler and CPU as a black box. The results show the relative speed of changing one variable, such as index versus subscript, and keeping everything else unchanged.

>You have repeatedly referred to "load" vs "multiply" machine instructions in the
>discussion of subscripts vs indexes - and yet have NOT posted the "generated"
>code from a VARIETY of compilers to show that this is actually the "only" (or
>even major) differences between the generated code.

Server Express will not show generated code for HP PA, using options ASM or ASMLIST. It will for other platforms such as SPARC.

>So far, it does appear (to me) that Micro Focus (various platforms) *and* the HP
>OpenVMS compiler create object code with "similar" performance (regardless of
>platform) while IBM (at least on one mainframe) creates very different code
>sequences. I don't know if Daniel has run a test for Unisys (given the
>Standard-conforming problems with the Speed2 program) or if anyone else has run
>with different compilers. Again, I would strongly GUESS that differences have
>to do with generated object code and NOT speed of specific machine instructions.
>But this is a guess without evidence to "prove" it.

As you requested, I compiled speed2 with ENTCOBOL and NOMF at the end of the options already there. Getting it to compile required three changes:

Change               To
comp-5               binary
goback               stop run
exit perform cycle   continue

It did not object to single quotes or free format, which don't affect performance.

Removing exit perform cycle made each of the tests run almost twice as fast, but their relative speeds did not change.

>I cannot tell you why there are as many differences as there are in generated
>code, but I do know from work that I have done over the years that there is
>(was - and always will be) surprising differences for the "most simple" COBOL
>source code. Often this has to do with options/features that are NOT used in
>the specific code sequences - but are possible. Different compilers (and
>especially different optimizer) simply do not always create the "expected - if
>*I* were the compiler" object code.

I ran MANY speed tests in the '80s and early '90s, for Computer Language magazine, which is long gone. I was comparing the relative speed of LANGUAGES as well as compilers and machines. My technique was to take a single algorithm, The Sieve, and write it once in well-written assembly language that used all the 'tricks' the compiler could have used, then write it a second time in the subject language using all of its features (not a line-for-line translation). Then I expressed the efficiency of the compiler and language as the ratio of its speed divided by the hypothetical speed of the machine.

On one test, I was stunned when GCC and another hot C compiler BEAT my hand-crafted assembly language. Their generated code was wildly non-intuitive.

I used to have results from many more languages than shown below -- ALGOL, LISP, FORTH, etc. Here are some notes I did keep.
As you will see, GOOD compiler-generated code used to run 3-4 times slower than assembly language while p-code ran 50-250 times slower. By the early '90s, the ratio had dropped below 2.

Note how the original 4.77 MHz PC running Cobol beat an entry-level mainframe running Cobol. It also beat the mainframe running an IO bound program which pitted VSAM using 3310 against Realia's indexed file system using a slow 30 MB hard drive. When I saw that, I took non-mainframe computers seriously. I started writing high-volume production applications for the 'personal computer'.

Sieve Benchmark Times, mostly circa 1984

Machine
  Language                     Time   Ratio to asm

IBM 4331-2  S/370
  Cobol                          14     14
  Cobol w/Capex optimizer        11     11
  PL/I                            4      4
  Assembly                        1

TRS80-2  Z80-4
  Cobol RM                     3390    242
  Basic Int Integer            1210     86
  Assembly                       14

TI99/4A  TMS9900
  Basic Int Floating           3960    566
  Assembly                        7

TRS-2000  Intel 8086-8
  Basic Int                     890    445
  Assembly                        2

IBM PC  Intel i8088-4.7
  Cobol CIS                    1700    425
  Basic Int MS                 1330    333
  Cobol mbp                     220     55
  Pascal Turbo 2                 28      7
  Pascal Turbo 4                 24      6    '90
  C C-Systems                    23      6
  Fortran MS 3                   20      5    '90
  Basic QBasic 4                 18      5    '92
  Pascal Turbo 1                 16      4
  Basic BASCOM 1                 16      4
  C DeSmet                       16      4    '90
  C BC++ 3                       12      3    '92
  Cobol Realia                   12      3
  Fortran MS 4                   11      3    '92
  Assembly                        4

PC  Intel i486-33 (run in 1993)
  xBase Force                  .881    9.3
  Cobol Realia                 .167    1.8
  Fortran MS 4                 .112    1.2
  Assembly                     .095    1.0
  C GCC                        .089    .93
  C EMX, OS/2 32bit            .082    .86
  Assembly w/loop unrolled     .058    .61

Here's the Sieve in Realia Cobol, which had inline perform in 1984.

      * Sieve program using 1974 COBOL
      * Benchmark. Finds primes between 3 and 16384
      *
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PRIMES.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SOURCE-COMPUTER. IBM-PC.
       OBJECT-COMPUTER. IBM-PC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77  TOTAL-PRIME-COUNT           PIC S9(4) COMP.
       77  PRIME                       PIC S9(4) COMP.
       77  PRIME-MULTIPLE              PIC S9(4) COMP.
       01  PRIME-FLAGS-GROUP.
           05  PRIME-FLAG              PIC X OCCURS 8191 TIMES
                                       INDEXED BY PRIME-INDEX.
       01  FILLER.
           05  TIME-AREA.
               10  HH                  PIC 99.
               10  MM                  PIC 99.
               10  SS                  PIC 99.
               10  HUN                 PIC 99.
           05  MILLI-SECONDS           PIC S9(8) COMP.
           05  BGN-MILLI-SECONDS       PIC S9(8) COMP.
           05  DISPLAY-MILLI-SECONDS   PIC Z(8).
       PROCEDURE DIVISION.
       PRIME-COUNT-ROUTINE.
      *
      * Compute the primes 10 times to increase timing accuracy.
      * Indicate all odd numbers are potential primes
      * Indicate no primes have been found.
      *
           PERFORM CALC-MILLI-SECONDS
           MOVE MILLI-SECONDS TO BGN-MILLI-SECONDS.
           PERFORM 10 TIMES
               MOVE ALL '1' TO PRIME-FLAGS-GROUP
               MOVE ZERO TO TOTAL-PRIME-COUNT
               SET PRIME-INDEX TO 1
               PERFORM COUNT-PRIMES
           END-PERFORM
           DISPLAY 'COUNT:' TOTAL-PRIME-COUNT
           PERFORM CALC-MILLI-SECONDS.
           SUBTRACT BGN-MILLI-SECONDS FROM MILLI-SECONDS
               GIVING DISPLAY-MILLI-SECONDS.
           DISPLAY 'Elapsed time was' DISPLAY-MILLI-SECONDS
               ' milliseconds'
           STOP RUN.
      *
      * For each number which has not been flagged as a multiple
      * of an earlier prime, indicate all of its multiples in
      * the range being evaluated are not primes. Note that
      * PRIME-FLAG(1) represents the integer 3, PRIME-FLAG(n)
      * represents the integer 2n+1.
      * Note that PRIME-MULTIPLE technically can only contain
      * values through 9999, but will have values up to 24574.
      * This works since ADD does not truncate,
      * S9(4) COMP fields can be -32768 through 32767.
      *
       COUNT-PRIMES.
           SEARCH PRIME-FLAG VARYING PRIME-INDEX
               WHEN PRIME-FLAG (PRIME-INDEX) IS NOT EQUAL TO ZERO
                   ADD 1 TO TOTAL-PRIME-COUNT
                   SET PRIME-MULTIPLE TO PRIME-INDEX
                   ADD PRIME-MULTIPLE PRIME-MULTIPLE 1 GIVING PRIME
                   ADD PRIME TO PRIME-MULTIPLE
                   SET PRIME-INDEX UP BY 1
                   PERFORM UNTIL PRIME-MULTIPLE IS GREATER THAN 8191
                       MOVE ZERO TO PRIME-FLAG (PRIME-MULTIPLE)
                       ADD PRIME TO PRIME-MULTIPLE
                   END-PERFORM
                   GO TO COUNT-PRIMES.
       CALC-MILLI-SECONDS.
           ACCEPT TIME-AREA FROM TIME.
           MULTIPLY HH BY 60 GIVING MILLI-SECONDS.
           ADD MM TO MILLI-SECONDS.
           MULTIPLY 60 BY MILLI-SECONDS.
           ADD SS TO MILLI-SECONDS.
           MULTIPLY 100 BY MILLI-SECONDS.
           ADD HUN TO MILLI-SECONDS.
           MULTIPLY 10 BY MILLI-SECONDS.

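For readers who do not read COBOL, the counting logic of the Sieve listing above can be sketched in Python. This is an illustrative rendering of the same flag layout (position n stands for the integer 2n + 1), not a translation used in any of the benchmarks; the timing code is omitted, and the function name `count_primes` and its `size` parameter are hypothetical.

```python
def count_primes(size: int = 8191) -> int:
    """Count odd primes with the flag layout of the COBOL listing:
    flags[n] stands for the integer 2n + 1, for n = 1..size."""
    flags = [True] * (size + 1)  # slot 0 is unused so positions stay 1-based
    count = 0
    for n in range(1, size + 1):
        if flags[n]:
            count += 1
            prime = 2 * n + 1
            # Moving the position by `prime` moves the represented value
            # by 2 * prime, so this visits 3*prime, 5*prime, 7*prime, ...
            for multiple in range(n + prime, size + 1, prime):
                flags[multiple] = False
    return count


if __name__ == "__main__":
    print("COUNT:", count_primes())
```

With the default size of 8191 flags this counts the 1899 odd primes from 3 through 16383.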
From: Arnold Trembley on 26 Sep 2007 01:42

Pete Dashwood wrote:
> (snip)
>
> Here are the results of "Speed2" from a genuine Intel Celeron Core 2 Duo
> Vaio AR250G notebook with 2 GB of main memory, running under Windows XP with
> SP2 applied, using your code (with the following amendments: all asterisks
> and comments removed, exit perform cycle removed), compiled with no options
> other than the defaults (which includes "Optimize"), with the Fujitsu
> NetCOBOL version 6 compiler, compiled to .EXE:
>
> Null test             1
> Index                 3
> Subscript            25
> Subscript comp-5      3
> Index 1               3
> Subscript 1          22
> Subscript 1 comp-5    3
>
> As you can see, indexing is between 7 and 8 times more efficient than
> subscripting, unless you use optimized subscripts, in this environment.

Here are the results of "Speed2" using a 2.60 GHz Pentium 4 with 512 MB of main memory, running under Windows XP with SP2 applied, using Robert's code with EXIT PERFORM CYCLE commented out, compiled with a 1990 education version of Realia COBOL (equivalent to Realia 3):

Null test             5
Index                 2
Subscript             8
Subscript comp-5      8
Index 1               2
Subscript 1           7
Subscript 1 comp-5    7

 Directory of C:\dosboxc\rccob

09/25/2007  11:09 PM            14,438 SPEED2.ASM
09/25/2007  11:05 PM             5,949 speed2.cob
09/25/2007  11:09 PM            25,134 SPEED2.EXE
09/26/2007  12:21 AM               259 speed2.tst
               4 File(s)         45,780 bytes
               0 Dir(s)  59,246,034,944 bytes free

The generated assembler is available, if anyone is interested.

Kind regards,

--
http://arnold.trembley.home.att.net/