From: Ron Garret on 18 Jan 2010 00:15

In article
<fded6479-a152-4f50-a7da-68706f9d6f42(a)m16g2000yqc.googlegroups.com>,
 Günther Thomsen <guenthert(a)gmail.com> wrote:

> On Jan 17, 8:19 am, Ron Garret <rNOSPA...(a)flownet.com> wrote:
> > In article
> > <98666562-db92-4a47-849d-50bf2c3d1...(a)b2g2000yqi.googlegroups.com>,
> >  piscesboy <oraclmas...(a)gmail.com> wrote:
> > > On Jan 17, 9:32 am, Tamas K Papp <tkp...(a)gmail.com> wrote:
> > > > On Sun, 17 Jan 2010 13:38:57 +0200, Captain Obvious wrote:
> > > > > RU> as the key. It conses on every hash lookup, but if that's
> > > > > RU> acceptable...
> > > > >
> > > > > It is pretty easy to avoid consing, in single threaded case, at least.
> > > > >
> > > > > (let ((the-cons (cons nil nil)))
> > > > >   (defun lookup-compound-key (ht k1 k2)
> > > > >     (setf (car the-cons) k1
> > > > >           (cdr the-cons) k2)
> > > > >     (prog1
> > > > >         (gethash the-cons ht)
> > > > >       (setf (car the-cons) nil
> > > > >             (cdr the-cons) nil))))
> > > > >
> > > > > Thread-safe solution would probably require thread-local variables.
> > > >
> > > > Dude, it is still mid-January, but you are very likely to win the
> > > > "Optimization of the Year Award" in 2010: it is ugly, and results in a
> > > > much longer runtime.
> > > >
> > > > (let ((the-cons (cons nil nil)))
> > > >   (defun ugly-lookup (ht k1 k2)
> > > >     (setf (car the-cons) k1
> > > >           (cdr the-cons) k2)
> > > >     (prog1
> > > >         (gethash the-cons ht)
> > > >       (setf (car the-cons) nil
> > > >             (cdr the-cons) nil))))
> > > >
> > > > (defun lookup (ht k1 k2)
> > > >   (gethash (cons k1 k2) ht))
> > > >
> > > > (defparameter *ht* (make-hash-table :test #'equal))
> > > > (setf (gethash (cons 5 10) *ht*) 5)
> > > >
> > > > (time
> > > >  (dotimes (i 10000000)
> > > >    (lookup *ht* 5 10)))
> > > >
> > > > Evaluation took:
> > > >   0.923 seconds of real time
> > > >   0.920058 seconds of total run time (0.916058 user, 0.004000 system)
> > > >   [ Run times consist of 0.020 seconds GC time, and 0.901 seconds non-GC time. ]
> > > >   99.67% CPU
> > > >   2,330,460,770 processor cycles
> > > >   80,068,320 bytes consed
> > > >
> > > > (time
> > > >  (dotimes (i 10000000)
> > > >    (ugly-lookup *ht* 5 10)))
> > > >
> > > > Evaluation took:
> > > >   3.464 seconds of real time
> > > >   3.452216 seconds of total run time (3.452216 user, 0.000000 system)
> > > >   99.65% CPU
> > > >   8,753,522,082 processor cycles
> > > >   354,272 bytes consed
> > > >
> > > > I really wish that people who post "optimized" CL code would at least
> > > > benchmark it.
> > > >
> > > > Tamas
> > >
> > > Strange. The ugly cons lookup had 354,272 bytes consed, yet took 3.464
> > > seconds to execute. But the pretty lookup had 80,068,320 bytes consed
> > > and took fewer processor cycles in 0.923 seconds.
> > >
> > > The ugly cons had fewer bytes consed than the pretty cons...?
> >
> > One of the many reasons it is good to write code in a functional style
> > is that it allows some important memory-management optimizations. A
> > good generational GC can be very efficient even for code that conses a
> > lot as long as the code is purely functional. But as soon as you do a
> > SETF, the assumptions that underlie those optimizations break and the GC
> > has to fall back on less efficient techniques. That is why code that
> > conses more but is written in a purely functional style can often be
> > faster than code that conses less but mutates data.
> >
> > rg
>
> True, but also GC can be much faster if there is plenty of available
> RAM. Under memory pressure GC might not fare quite as well. I'd think
> it's a safe bet, that Tamas' host had much more than 80,068,320B
> available. Actually it isn't obvious, whether there was any GC
> activity at all (clearly the best possible outcome ;-) .

Certainly. But it's an even safer bet that his machine had more than
354,272 bytes available.

rg
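On Captain Obvious's thread-safety aside: in most implementations, a special variable that each worker thread rebinds gives exactly the thread-local scratch storage he alludes to. A minimal sketch, assuming per-thread dynamic bindings (true in SBCL and most modern Lisps, but not guaranteed by the standard); `*scratch-cons*` and `lookup-compound-key*` are names invented here:

```lisp
;; Sketch only: relies on the implementation giving each thread its own
;; dynamic binding of *SCRATCH-CONS* when the thread rebinds it with LET.
(defvar *scratch-cons* nil
  "Per-thread scratch cons; rebind to NIL in each worker thread.")

(defun lookup-compound-key* (ht k1 k2)
  ;; Allocate the scratch cons lazily, once per thread.
  (let ((c (or *scratch-cons*
               (setf *scratch-cons* (cons nil nil)))))
    (setf (car c) k1
          (cdr c) k2)
    (prog1 (gethash c ht)
      ;; Clear the slots so the scratch cons does not retain the keys.
      (setf (car c) nil
            (cdr c) nil))))
```

Each thread would wrap its work in `(let ((*scratch-cons* nil)) ...)` so the lazily allocated cons is never shared between threads.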
From: Tamas K Papp on 18 Jan 2010 03:14

On Sun, 17 Jan 2010 23:37:26 +0200, Captain Obvious wrote:

> TKP> I really wish that people who post "optimized" CL code would at least
> TKP> benchmark it.
>
> I did not say it is faster, I said that it does not cons. :) And your
> benchmark pretty much proved it. I think this is merely an exercise
> rather than an optimization.
>
> If one really wants to optimize something, he should try it with his
> concrete application, implementation and data. And he should probably
> try different approaches, as it is too complex to say what is faster
> theoretically.
>
> Now, if we want to compare performance, I would change ugly-lookup a bit:
>
> (let ((the-cons (cons nil nil)))
>   (defun ugly-lookup (ht k1 k2)
>     (setf (car the-cons) k1
>           (cdr the-cons) k2)
>     (gethash the-cons ht)))
>
> That is, if you do shitloads of lookups, you don't need to bother
> cleaning up the cons.
>
> Now let's compare:
>
> CL-USER> (time
>           (dotimes (i 10000000)
>             (ugly-lookup *ht* 5 10)))
> Evaluation took:
>   1.83 seconds of real time
>   1.824114 seconds of user run time
>   0.004 seconds of system run time
>   0 calls to %EVAL
>   0 page faults and
>   0 bytes consed.
> NIL
>
> CL-USER> (time
>           (dotimes (i 10000000)
>             (lookup *ht* 5 10)))
> Evaluation took:
>   1.974 seconds of real time
>   1.92012 seconds of user run time
>   0.056003 seconds of system run time
>   [Run times include 0.02 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   80,001,544 bytes consed.
>
> Now my version is faster by a whopping 7%!

Nope. If you do a shitload of lookups, you just inline the bloody
thing. For the sake of fairness (haha, see below), let's inline both
versions. I evaluated this in SBCL:

(declaim (optimize speed)
         (inline lookup ugly-lookup))

(let ((the-cons (cons nil nil)))
  (defun ugly-lookup (ht k1 k2)
    (setf (car the-cons) k1
          (cdr the-cons) k2)
    (gethash the-cons ht)))

(defun lookup (ht k1 k2)
  (gethash (cons k1 k2) ht))

(defparameter *ht* (make-hash-table :test #'equal))
(setf (gethash (cons 5 10) *ht*) 5)

(defun test1 ()
  (time
   (dotimes (i 10000000)
     (lookup *ht* 5 10))))

(defun test2 ()
  (time
   (dotimes (i 10000000)
     (ugly-lookup *ht* 5 10))))

(time (test1))
(time (test2))

I compiled everything of course. And the result is:

Evaluation took:
  0.024 seconds of real time
  0.028001 seconds of total run time (0.024001 user, 0.004000 system)
  116.67% CPU
  61,855,802 processor cycles
  29,064 bytes consed

Evaluation took:
  2.067 seconds of real time
  2.068129 seconds of total run time (2.068129 user, 0.000000 system)
  100.05% CPU
  5,222,531,767 processor cycles
  204,888 bytes consed

So lookup is 100x faster (!) than ugly-lookup (to be fair, inlining
made the ugly version about 25% slower on my machine, but that is a
small difference). And it conses very little. Why is this? SBCL tells
you that

; in: LET ((THE-CONS (CONS NIL NIL)))
;     (LET ((THE-CONS (CONS NIL NIL)))
;       (DEFUN UGLY-LOOKUP (HT K1 K2)
;         (SETF (CAR THE-CONS) K1
;               (CDR THE-CONS) K2)
;         (GETHASH THE-CONS HT)))
;
; note: lexical environment too hairy, can't inline DEFUN UGLY-LOOKUP

So there are a few lessons in here:

1) If you want speed, ask the compiler first. (declaim (optimize
   speed)) gave me roughly a 50% speedup for both versions.

2) If you want more speed, inline simple stuff.

3) Note that 1) and 2) didn't ask you for any hairy, unreadable code,
   just a few simple declarations. So keep your code simple. The
   people who developed your compiler optimized for the common cases.
   Don't try to second-guess your compiler without benchmarking.

I have been working on a numerical library during the weekend, giving
it some final polish. I spent Saturday afternoon ripping out the
"optimized" code I put in earlier; it turned out that my "clever"
hacks were actually quite stupid: little or no speed gain, compromised
maintainability. So I am very skeptical of hacks like the one you
posted: they usually don't improve anything, rather the opposite.

That said, I am also very sympathetic, I do stuff that is 10x more
stupid sometimes.

Cheers,

Tamas
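Incidentally, SBCL's "lexical environment too hairy" note suggests a middle ground worth recording: hoisting the scratch cons out of the closure into a special variable removes the hairy environment, so the non-consing version becomes inlinable too. A sketch only, not benchmarked here; `*the-cons*` and `ugly-lookup2` are names invented for the example:

```lisp
(declaim (optimize speed)
         (inline ugly-lookup2))

;; The scratch cons lives in a special variable instead of a closed-over
;; lexical, so the DEFUN has no closure and the INLINE declaration can
;; be honored.  (Still not thread-safe: all callers share one cons.)
(defvar *the-cons* (cons nil nil))

(defun ugly-lookup2 (ht k1 k2)
  (setf (car *the-cons*) k1
        (cdr *the-cons*) k2)
  (gethash *the-cons* ht))
```

Whether this actually closes the gap with the consing `lookup` would need its own benchmark; the point is only that the structure, not the idea, was what blocked inlining.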
From: Tamas K Papp on 18 Jan 2010 03:17

On Sun, 17 Jan 2010 21:15:56 -0800, Ron Garret wrote:

> In article
> <fded6479-a152-4f50-a7da-68706f9d6f42(a)m16g2000yqc.googlegroups.com>,
> Günther Thomsen <guenthert(a)gmail.com> wrote:
>
>> True, but also GC can be much faster if there is plenty of available
>> RAM. Under memory pressure GC might not fare quite as well. I'd think
>> it's a safe bet, that Tamas' host had much more than 80,068,320B
>> available. Actually it isn't obvious, whether there was any GC activity
>> at all (clearly the best possible outcome ;-) .
>
> Certainly. But it's an even safer bet that his machine had more than
> 354,272 bytes available.

As we all know, 640k should be enough :-)

Tamas
From: Frode V. Fjeld on 18 Jan 2010 03:53

One approach I've tried is something like this:

(defun double-hash (table key1 key2)
  (car (find-if (lambda (x)
                  (and (eq key1 (second x))
                       (eq key2 (third x))))
                (gethash (logior (sxhash key1) (sxhash key2)) table))))

I.e. to hash over some combination of each key's sxhash (taking care
not to combine them so as to generate a bignum).

Hm.. I guess it can even scale neatly to variable arguments:

(defun multi-hash (table &rest keys)
  (declare (dynamic-extent keys))
  (car (find keys (gethash (sxhash keys) table)
             :test #'equal
             :key #'cdr)))

The corresponding setters are left as an exercise to the reader. (And
the code above is untested, sketches really.)

I have no idea really how well this actually performs; I'd expect this
to vary quite a bit across lisps and the details of how this technique
is used.

-- 
Frode V. Fjeld
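A hedged sketch of the setter exercise, matching the `(value . keys)` entry layout that MULTI-HASH assumes (the reader is repeated so the pair is self-contained; note the setter must not declare its keys dynamic-extent, since it stores them):

```lisp
(defun multi-hash (table &rest keys)
  "Frode's reader, repeated for context: each bucket under (SXHASH KEYS)
holds a list of (VALUE . KEYS) entries."
  (car (find keys (gethash (sxhash keys) table)
             :test #'equal :key #'cdr)))

(defun (setf multi-hash) (value table &rest keys)
  "Sketch of the corresponding setter: update an existing entry in place,
otherwise push a fresh (VALUE . KEYS) entry onto the bucket.
COPY-LIST defends against callers that reuse the keys list."
  (let* ((bucket (sxhash keys))
         (entry (find keys (gethash bucket table)
                      :test #'equal :key #'cdr)))
    (if entry
        (setf (car entry) value)
        (push (cons value (copy-list keys))
              (gethash bucket table)))
    value))
```

Usage would be `(setf (multi-hash *tbl* 'a 'b) 42)` followed by `(multi-hash *tbl* 'a 'b)`. Like the original, this is a sketch, with the same caveat that performance will vary across lisps.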
From: Robert Uhl on 18 Jan 2010 10:11
"Frode V. Fjeld" <frode(a)netfonds.no> writes:

> I.e. to hash over some combination of each key's sxhash (taking care not
> to combine them so as to generate a bignum).

Note that SXHASH doesn't care about anything more than the top level of
a list:

cl-user> (sxhash '(1 2 3 (4)))
224526978
cl-user> (sxhash '(1 2 3 (4 5)))
224526978

-- 
Robert A. Uhl

Thesaurus: ancient reptile with an excellent vocabulary.
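So keys that differ only below the top level all land in the same bucket; the EQUAL comparison inside the bucket still keeps lookups correct, it just degrades to a linear scan. If that matters, the leaves can be mixed in by hand. `sxhash-tree` is an invented name and a sketch only (the intermediate product can cons a transient bignum, which a production version would avoid, per Frode's caveat):

```lisp
;; Walk the tree and mix the SXHASH of every leaf into the result, so
;; nested keys that differ below the top level hash differently.
;; The MOD keeps the returned value a non-negative fixnum.
(defun sxhash-tree (object)
  (if (consp object)
      (mod (+ (* 31 (sxhash-tree (car object)))
              (sxhash-tree (cdr object)))
           most-positive-fixnum)
      (sxhash object)))
```

With this, `(sxhash-tree '(1 2 3 (4)))` and `(sxhash-tree '(1 2 3 (4 5)))` would almost certainly differ, unlike the SXHASH results above.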