From: Ron Garret on 17 Jan 2010 11:19 In article <98666562-db92-4a47-849d-50bf2c3d1a36(a)b2g2000yqi.googlegroups.com>, piscesboy <oraclmaster(a)gmail.com> wrote: > On Jan 17, 9:32 am, Tamas K Papp <tkp...(a)gmail.com> wrote: > > On Sun, 17 Jan 2010 13:38:57 +0200, Captain Obvious wrote: > > > RU> as the key. It conses on every hash lookup, but if that's > > > RU> acceptable... > > > > > It is pretty easy to avoid consing, in single threaded case, at least. > > > > > (let ((the-cons (cons nil nil))) > > > (defun lookup-compound-key (ht k1 k2) > > > (setf (car the-cons) k1 > > > (cdr the-cons) k2) > > > (prog1 > > > (gethash the-cons ht) > > > (setf (car the-cons) nil > > > (cdr the-cons) nil)))) > > > > > Thread-safe solution would probably require thread-local variables. > > > > Dude, it is still mid-January, but you are very likely to win the > > "Optimization of the Year Award" in 2010: it is ugly, and results in a > > much longer runtime. > > > > (let ((the-cons (cons nil nil))) > > (defun ugly-lookup (ht k1 k2) > > (setf (car the-cons) k1 > > (cdr the-cons) k2) > > (prog1 > > (gethash the-cons ht) > > (setf (car the-cons) nil > > (cdr the-cons) nil)))) > > > > (defun lookup (ht k1 k2) > > (gethash (cons k1 k2) ht)) > > > > (defparameter *ht* (make-hash-table :test #'equal)) > > (setf (gethash (cons 5 10) *ht*) 5) > > > > (time > > (dotimes (i 10000000) > > (lookup *ht* 5 10))) > > > > Evaluation took: > > 0.923 seconds of real time > > 0.920058 seconds of total run time (0.916058 user, 0.004000 system) > > [ Run times consist of 0.020 seconds GC time, and 0.901 seconds non-GC > > time. ] > > 99.67% CPU > > 2,330,460,770 processor cycles > > 80,068,320 bytes consed > > > > (time > > (dotimes (i 10000000) > > (ugly-lookup *ht* 5 10))) > > > > Evaluation took: > > 3.464 seconds of real time > > 3.452216 seconds of total run time (3.452216 user, 0.000000 system) > > 99.65% CPU > > 8,753,522,082 processor cycles > > 354,272 bytes consed > > > > I really wish that people who post "optimized" CL code would at least > > benchmark it. > > > > Tamas > > Strange. The ugly cons lookup had 354,272 bytes consed, yet took 3.464 > seconds to execute. But the pretty lookup had 80,068,320 bytes consed > and took fewer processor cycles in 0.923 seconds. > > The ugly cons had fewer bytes consed than the pretty cons...? One of the many reasons it is good to write code in a functional style is that it allows some important memory-management optimizations. A good generational GC can be very efficient even for code that conses a lot as long as the code is purely functional. But as soon as you do a SETF, the assumptions that underlie those optimizations break and the GC has to fall back on less efficient techniques. That is why code that conses more but is written in a purely functional style can often be faster than code that conses less but mutates data. rg
From: Alberto Riva on 17 Jan 2010 15:55 Tamas K Papp wrote: > On Sun, 17 Jan 2010 13:38:57 +0200, Captain Obvious wrote: > >> RU> as the key. It conses on every hash lookup, but if that's >> RU> acceptable... >> >> It is pretty easy to avoid consing, in single threaded case, at least. >> >> (let ((the-cons (cons nil nil))) >> (defun lookup-compound-key (ht k1 k2) >> (setf (car the-cons) k1 >> (cdr the-cons) k2) >> (prog1 >> (gethash the-cons ht) >> (setf (car the-cons) nil >> (cdr the-cons) nil)))) >> >> Thread-safe solution would probably require thread-local variables. > > Dude, it is still mid-January, but you are very likely to win the > "Optimization of the Year Award" in 2010: it is ugly, and results in a > much longer runtime. But the OP was presenting this as a solution to avoid consing, he didn't say anything about runtime. > (let ((the-cons (cons nil nil))) > (defun ugly-lookup (ht k1 k2) > (setf (car the-cons) k1 > (cdr the-cons) k2) > (prog1 > (gethash the-cons ht) > (setf (car the-cons) nil > (cdr the-cons) nil)))) > > (defun lookup (ht k1 k2) > (gethash (cons k1 k2) ht)) > > (defparameter *ht* (make-hash-table :test #'equal)) > (setf (gethash (cons 5 10) *ht*) 5) > > (time > (dotimes (i 10000000) > (lookup *ht* 5 10))) > > Evaluation took: > 0.923 seconds of real time > 0.920058 seconds of total run time (0.916058 user, 0.004000 system) > [ Run times consist of 0.020 seconds GC time, and 0.901 seconds non-GC > time. ] > 99.67% CPU > 2,330,460,770 processor cycles > 80,068,320 bytes consed > > (time > (dotimes (i 10000000) > (ugly-lookup *ht* 5 10))) > > Evaluation took: > 3.464 seconds of real time > 3.452216 seconds of total run time (3.452216 user, 0.000000 system) > 99.65% CPU > 8,753,522,082 processor cycles > 354,272 bytes consed > > I really wish that people who post "optimized" CL code would at least > benchmark it. And it would equally be nice if people who post benchmarks knew how to perform them. If you don't compile your macros (eg DOTIMES), you're simply measuring the overhead of expanding them. On ACL 8.1: CL-USER> (defun test1 () (dotimes (i 10000000) (lookup *ht* 5 10))) TEST1 CL-USER> (defun test2 () (dotimes (i 10000000) (ugly-lookup *ht* 5 10))) TEST2 CL-USER> (compile 'test1) TEST1 NIL NIL CL-USER> (compile 'test2) TEST2 NIL NIL CL-USER> (time (test1)) ; cpu time (non-gc) 1,470 msec user, 0 msec system ; cpu time (gc) 100 msec user, 0 msec system ; cpu time (total) 1,570 msec user, 0 msec system ; real time 1,567 msec ; space allocation: ; 10,000,000 cons cells, 0 other bytes, 0 static bytes NIL CL-USER> (time (test2)) ; cpu time (non-gc) 1,660 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 1,660 msec user, 0 msec system ; real time 1,672 msec ; space allocation: ; 0 cons cells, 0 other bytes, 0 static bytes NIL No consing, as promised, and the runtime is not "much longer", although this is a relatively crude way of testing it. Alberto
From: Captain Obvious on 17 Jan 2010 16:37 TKP> I really wish that people who post "optimized" CL code would at least TKP> benchmark it. I did not say it is faster, I said that it does not cons. :) And your benchmark pretty much proved it. Think this is merely an excersise rather than an optimization. If one really wants to optimize something, he should try it with his concrete application, implementation and data. And he should probably try different appoaches, as it is too complex to say what is faster theoretically. Now, if we want compare performance, I would change ugly-lookup a bit: (let ((the-cons (cons nil nil))) (defun ugly-lookup (ht k1 k2) (setf (car the-cons) k1 (cdr the-cons) k2) (gethash the-cons ht))) That is, if you do shitloads of lookups, you don't need to bother cleaning up the cons. Now let's compare: CL-USER> (time (dotimes (i 10000000) (ugly-lookup *ht* 5 10))) Evaluation took: 1.83 seconds of real time 1.824114 seconds of user run time 0.004 seconds of system run time 0 calls to %EVAL 0 page faults and 0 bytes consed. NIL CL-USER> (time (dotimes (i 10000000) (lookup *ht* 5 10))) Evaluation took: 1.974 seconds of real time 1.92012 seconds of user run time 0.056003 seconds of system run time [Run times include 0.02 seconds GC run time.] 0 calls to %EVAL 0 page faults and 80,001,544 bytes consed. Now my version is faster by whopping 7%!
From: Günther Thomsen on 17 Jan 2010 22:32 On Jan 17, 12:55 pm, Alberto Riva <ar...(a)nospam.ufl.edu> wrote: > Tamas K Papp wrote: [..] > > I really wish that people who post "optimized" CL code would at least > > benchmark it. > > And it would equally be nice if people who post benchmarks knew how to > perform them. If you don't compile your macros (eg DOTIMES), you're > simply measuring the overhead of expanding them. From the output Tamas posted, it looks like SBCL, which is a compile- only implementation. Further, wouldn't one expect that a (decent implementation of) DOTIMES macro expands exactly once?
From: Günther Thomsen on 17 Jan 2010 22:36
On Jan 17, 8:19 am, Ron Garret <rNOSPA...(a)flownet.com> wrote: > In article > <98666562-db92-4a47-849d-50bf2c3d1...(a)b2g2000yqi.googlegroups.com>, > > > > piscesboy <oraclmas...(a)gmail.com> wrote: > > On Jan 17, 9:32 am, Tamas K Papp <tkp...(a)gmail.com> wrote: > > > On Sun, 17 Jan 2010 13:38:57 +0200, Captain Obvious wrote: > > > > RU> as the key. It conses on every hash lookup, but if that's > > > > RU> acceptable... > > > > > It is pretty easy to avoid consing, in single threaded case, at least. > > > > > (let ((the-cons (cons nil nil))) > > > > (defun lookup-compound-key (ht k1 k2) > > > > (setf (car the-cons) k1 > > > > (cdr the-cons) k2) > > > > (prog1 > > > > (gethash the-cons ht) > > > > (setf (car the-cons) nil > > > > (cdr the-cons) nil)))) > > > > > Thread-safe solution would probably require thread-local variables. > > > > Dude, it is still mid-January, but you are very likely to win the > > > "Optimization of the Year Award" in 2010: it is ugly, and results in a > > > much longer runtime. > > > > (let ((the-cons (cons nil nil))) > > > (defun ugly-lookup (ht k1 k2) > > > (setf (car the-cons) k1 > > > (cdr the-cons) k2) > > > (prog1 > > > (gethash the-cons ht) > > > (setf (car the-cons) nil > > > (cdr the-cons) nil)))) > > > > (defun lookup (ht k1 k2) > > > (gethash (cons k1 k2) ht)) > > > > (defparameter *ht* (make-hash-table :test #'equal)) > > > (setf (gethash (cons 5 10) *ht*) 5) > > > > (time > > > (dotimes (i 10000000) > > > (lookup *ht* 5 10))) > > > > Evaluation took: > > > 0.923 seconds of real time > > > 0.920058 seconds of total run time (0.916058 user, 0.004000 system) > > > [ Run times consist of 0.020 seconds GC time, and 0.901 seconds non-GC > > > time. ] > > > 99.67% CPU > > > 2,330,460,770 processor cycles > > > 80,068,320 bytes consed > > > > (time > > > (dotimes (i 10000000) > > > (ugly-lookup *ht* 5 10))) > > > > Evaluation took: > > > 3.464 seconds of real time > > > 3.452216 seconds of total run time (3.452216 user, 0.000000 system) > > > 99.65% CPU > > > 8,753,522,082 processor cycles > > > 354,272 bytes consed > > > > I really wish that people who post "optimized" CL code would at least > > > benchmark it. > > > > Tamas > > > Strange. The ugly cons lookup had 354,272 bytes consed, yet took 3.464 > > seconds to execute. But the pretty lookup had 80,068,320 bytes consed > > and took fewer processor cycles in 0.923 seconds. > > > The ugly cons had fewer bytes consed than the pretty cons...? > > One of the many reasons it is good to write code in a functional style > is that it allows some important memory-management optimizations. A > good generational GC can be very efficient even for code that conses a > lot as long as the code is purely functional. But as soon as you do a > SETF, the assumptions that underlie those optimizations break and the GC > has to fall back on less efficient techniques. That is why code that > conses more but is written in a purely functional style can often be > faster than code that conses less but mutates data. > > rg True, but also GC can be much faster if there is plenty of available RAM. Under memory pressure GC might not fare quite as well. I'd think it's a safe bet, that Tamas' host had much more than 80,068,320B available. Actually it isn't obvious, whether there was any GC activity at all (clearly the best possible outcome ;-) . |