Prev: Pocket Lisp Machine
Next: Next Generation of Language
From: Henry Bigelow on 29 Sep 2006 13:36 hi paul, i took a look at SPEC--it seems definitive, indeed, but for comparing computer systems, not languages. is that right? would you be able to point me in a direction to compare languages? i have a hunch that there is something very wrong with the lisp benchmark which is about 100 times more memory/ or slower than a c program, but it'd be very difficult for me to rewrite it more efficiently. it's possible that for various reasons, people don't care enough about the shootout http://shootout.alioth.debian.org/ to fix the benchmarks. or, it's possible that the lisp ones are indeed written to be as memory/speed efficient as possible. what do you think is going on here? thanks, henry > This is the first time I've heard of this particular competition (and > set of benchmarks) -- it looks rather cool! I doubt it has the same > clout as SPEC though... :-) > > Hey, while you are at learning lisp, why not debug that 120x memory > program to see why this happens? profile and time are your friends... > ;-) you're overestimating my skill here! > > CL-USER>CL-USER> (time (main 5000000)) > 10000000 > Evaluation took: > 34.466 seconds of real time > 12.708794 seconds of user run time > 21.369335 seconds of system run time > [Run times include 0.044 seconds GC run time.] > 0 page faults and > 79,975,616 bytes consed. > NIL > CL-USER>
From: Javier on 29 Sep 2006 17:14 Henry Bigelow ha escrito: > hi paul, > i took a look at SPEC--it seems definitive, indeed, but for comparing > computer systems, not languages. is that right? would you be able to > point me in a direction to compare languages? i have a hunch that > there is something very wrong with the lisp benchmark which is about > 100 times more memory/ or slower than a c program, but it'd be very > difficult for me to rewrite it more efficiently. > > it's possible that for various reasons, people don't care enough about > the shootout > > http://shootout.alioth.debian.org/ > > to fix the benchmarks. or, it's possible that the lisp ones are indeed > written to be as memory/speed efficient as possible. > > what do you think is going on here? We have already said that: 1) Lisp loads the entire compiler/debugger/documentation among with the program. As you can imagine, it is impossible to compite to C in memory ussage in this situation (imagine that you would have to load GCC with your C program, it would be much bigger, don't think so?). But please, take also in count that, while in a program written in C it does not count the memory shared with the C library (it is linked dinamically), in Lisp you have the Lisp library loaded with you, so the C library is already consuming memory from your system, but the program which you use to see that memory doesn't take that in count. 2) It is not 100x slower. Please take a look at: http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=sbcl&lang2=gcc That demonstrates that, except for some bechmarks in which algorithms differ a lot, SBCL is almost as fast as C. 3) SBCL is not the only implementation. There are other implementations which even run faster, and which consumes less memory.
From: Henry Bigelow on 29 Sep 2006 20:51 Javier wrote: > > We have already said that: > > 1) Lisp loads the entire compiler/debugger/documentation among with the > program. i didn't know that. but, if so, why aren't all the other benchmarks 100x more memory? k-nucleotide is only 3.1x more memory. As you can imagine, it is impossible to compite to C in memory > ussage in this situation (imagine that you would have to load GCC with > your C program, it would be much bigger, don't think so?). But please, > take also in count that, while in a program written in C it does not > count the memory shared with the C library (it is linked dinamically), > in Lisp you have the Lisp library loaded with you, so the C library is > already consuming memory from your system, but the program which you > use to see that memory doesn't take that in count. > 2) It is not 100x slower. Please take a look at: > http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=sbcl&lang2=gcc > That demonstrates that, except for some bechmarks in which algorithms > differ a lot, SBCL is almost as fast as C. agreed. fannkuch is 27x slower, and chameneos requires 120x more memory. i'm not contesting anything besides 'fannkuch' for speed. i'm not really concerned about 'startup' because it's amortized. > 3) SBCL is not the only implementation. There are other implementations > which even run faster, and which consumes less memory. yes, but the use of SBCL here is obviously not why the fannkuch benchmark is 26 times slower, instead of 3 or 5 times slower. thanks in advance, henry
From: Javier on 29 Sep 2006 21:26 Henry Bigelow ha escrito: > Javier wrote: > > > > > We have already said that: > > > > 1) Lisp loads the entire compiler/debugger/documentation among with the > > program. > > i didn't know that. but, if so, why aren't all the other benchmarks > 100x more memory? k-nucleotide is only 3.1x more memory. Because that algorithm uses a lot of memory, even in C. If you don't believe me, download and install SBCL. Start it. Only starting it, requires about 6Mb. Now define a function: (defun jj () (format t "hello")) Now it requires about 11Mb, which means that the compiler has been loaded and compiled your function in real time. If you start to do things, like using the debugger, the documentation, extra libraries, etc, your application can grow in memory very fast. A similar thing happens with Java. In C, you are not compiling code in real time, and your program uses the libraries on your system, so this makes the ilussion that your C program uses very little memory. But it probably no. What is happening is that your C program uses libraries that are already loaded in memory (for example stdlib). You can see that this is always happening even in C. Take for example Mozilla, Open Office, the X server, GCC... all of them are using a lot of memory because they need libraries which are not already dinamically loaded. > > 2) It is not 100x slower. Please take a look at: > > http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=sbcl&lang2=gcc > > That demonstrates that, except for some bechmarks in which algorithms > > differ a lot, SBCL is almost as fast as C. > > agreed. fannkuch is 27x slower, and chameneos requires 120x more > memory. i'm not contesting anything besides 'fannkuch' for speed. i'm > not really concerned about 'startup' because it's amortized. A good measure of how well is SBCL doing is disassembling: (disassemble 'jj) ; 1160AD7A: BA27000008 MOV EDX, 134217767 ; no-arg-parsing entry point ; 7F: 8B3D14AD6011 MOV EDI, [#x1160AD14] ; "hello" ; 85: 8B0518AD6011 MOV EAX, [#x1160AD18] ; #<FDEFINITION object for FORMAT> ; 8B: B908000000 MOV ECX, 8 ; 90: FF75F8 PUSH DWORD PTR [EBP-8] ; 93: FF6005 JMP DWORD PTR [EAX+5] ; 96: 90 NOP ; 97: 90 NOP ; 98: 0F0B0A BREAK 10 ; error trap ; 9B: 02 BYTE #X02 ; 9C: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR ; 9D: 4D BYTE #X4D ; ECX ; 9E: 0F0B0A BREAK 10 ; error trap ; A1: 02 BYTE #X02 ; A2: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR ; A3: 4D BYTE #X4D ; ECX If you understand some asemssembler you can see that SBCL is actually very clever compiling things. :-) (For example, it is using a jump in line 93 instead of function call to avoid some overhead. Not all C compilers are able to do so.)
From: Wade Humeniuk on 30 Sep 2006 03:12
Speaking of which. By modifying the original somewhat (I do not have SBCL) I do not have the patience to code it like C, so.... (defvar *fannkuch-prints* nil) (defvar *fannkuch-max-flips* 0) (declaim (inline flip make-fannkuch init-fannkuch print-fannkuch copy-fannkuch permute-fannkuch count-flips)) (defun make-fannkuch (n) (make-array n :element-type '(unsigned-byte 8))) (defun init-fannkuch (fannkuch end) (loop for n from 1 for i from 0 below end do (setf (aref fannkuch i) n)) fannkuch) (defun copy-fannkuch (from to start end) (declare (type (simple-array (unsigned-byte 8) (*)) from to) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (loop for i from start below end do (setf (aref to i) (aref from i)))) (defun permute-fannkuch (fannkuch n) (declare (type (simple-array (unsigned-byte 8) (*)) fannkuch) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (let ((first (aref fannkuch 0))) (loop for i from 0 below (1- n) do (setf (aref fannkuch i) (aref fannkuch (1+ i)))) (setf (aref fannkuch (1- n)) first) fannkuch)) (defun print-fannkuch (permutation) (declare (type (simple-array (unsigned-byte 8) (*)) permutation) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (when (< *fannkuch-prints* 30) (loop for i across permutation do (princ i)) (terpri) (incf *fannkuch-prints*))) (defun flip (permutation) (declare (type (simple-array (unsigned-byte 8) (*)) permutation) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (let ((n (aref permutation 0))) (when (> n 1) (loop for start from 0 for end downfrom (1- n) while (< start end) do (rotatef (aref permutation start) (aref permutation end)))))) (defun count-flips (permutation) (declare (type (simple-array (unsigned-byte 8) (*)) permutation) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (let ((flips 0)) (loop until (= (aref permutation 0) 1) do (flip permutation) (incf flips)) (setf *fannkuch-max-flips* (max flips *fannkuch-max-flips*)) flips)) (defun fanndance (fannkuch len &aux (first (first fannkuch)) (next (second fannkuch))) (declare (type (simple-array (unsigned-byte 8) (*)) first next) (type fixnum len) (optimize (speed 3) (safety 0) (debug 0) #+lispworks(fixnum-safety 0))) (copy-fannkuch first next 0 (length first)) (if (= len 1) (progn (count-flips next) (print-fannkuch first)) (progn (dotimes (i (1- len)) (fanndance (cdr fannkuch) (1- len)) (permute-fannkuch next len)) (fanndance (cdr fannkuch) (1- len))))) (defun fannkuch (n) (let ((*fannkuch-prints* 0) (*fannkuch-max-flips* 0) (fannkuch (loop repeat (1+ n) collect (make-fannkuch n)))) (init-fannkuch (car fannkuch) n) (format t "~A" (with-output-to-string (*standard-output*) (fanndance fannkuch n))) (format t "Pfannkuchen(~D) = ~D" n *fannkuch-max-flips*))) #+ignore(defun main () (fannkuch (parse-integer (second *posix-argv*)))) CL-USER 49 > (time (fannkuch 10)) Timing the evaluation of (FANNKUCH 10) 12345678910 21345678910 23145678910 32145678910 31245678910 13245678910 23415678910 32415678910 34215678910 43215678910 42315678910 24315678910 34125678910 43125678910 41325678910 14325678910 13425678910 31425678910 41235678910 14235678910 12435678910 21435678910 24135678910 42135678910 23451678910 32451678910 34251678910 43251678910 42351678910 24351678910 Pfannkuchen(10) = 38 user time = 2.874 system time = 0.000 Elapsed time = 0:00:03 Allocation = 3792 bytes standard / 4554 bytes conses 0 Page faults Calls to %EVAL 34 NIL CL-USER 50 > |