Counting occurences [Lisp]

Prev: Black-Jewish History FAQ - Is The Secret Relationship between blacks and jews a “hate literature”?
Next: which library to generate xml with namespaces?

From: Tim Bradshaw on 18 Jun 2010 11:09

On 2010-06-17 21:46:23 +0100, Eric Wolf said:

> And I thought: "Hmm that doesn't look elegant" but I'm not used to
> recursion and functional programming and that stuff so I didn't
> came up with a better version.

This is kind of a counterclaim to the "use some complicated abstract
data structure" stuff (See Gabriel on that). Depends on tail-call
elimintation if you want to use it on long lists (and of course in real
life you would use alist functions not roll your own as here).

(defun occ (list &key (test #'eql))
(labels ((oc (lt al)
(if (null lt)
al
(oc (rest lt) (oca (first lt) al al))))
(oca (s alt al)
(cond ((null alt)
(cons (cons s 1) al))
((funcall test s (car (first alt)))
(incf (cdr (first alt)))
al)
(t (oca s (rest alt) al)))))
(oc list nil)))

From: RG on 18 Jun 2010 11:22

In article <4c1b7527$0$274$14726298(a)news.sunsite.dk>,
"Captain Obvious" <udodenko(a)users.sourceforge.net> wrote:

> R> IMHO a better way to approach this, particular for beginners, is to use
> R> abstract associative maps rather than association lists. A CL
> R> implementation of abstract associative maps can be found here:
>
> What's wrong with hash-tables which are in the standard?
>
> R> (defun count-items (list)
> R> (let ((c (make-instance 'counter)))
> R> (dolist (item list) (add c item))
> R> c))
>
> (defun count-items (list)
> (let ((ht (make-hash-table)))
> (dolist (item list)
> (incf (gethash item ht 0)))
> ht))
>
> The only difference I see is that GETHASH accessor has default value and you
> can use INCF directly on it, no need for a goddamn additional function for
> that.

Wow, you're the second person to react violently to the lack of a setf
expander for REF with a default argument. Why is that so important? It
seems like such a trivial issue to me. You can already do this:

(defmethod add ((c counter) item)
(incf (refd c item 0)))

if the explicit call really rubs you the wrong way, and making it work
for REF would be trivial.

But to answer your question directly:

> What's wrong with hash-tables

It's a premature optimization. What if hash tables are not the right
data structure? What if it turns out that a red-black tree would be
more efficient for the particular data you are processing? Or what if
you have too much data to fit in main memory and you want to use a
database as a backing store? If you've hard-coded in calls to GETHASH
you now have to change ALL your code. If you use abstract associative
maps you only have to change the underlying implementation. You can
even do this at run time:

? (setf c (make-instance 'counter))
#<COUNTER HASH-TABLE { }>
? (add c 'foo)
1
? (add c 'baz)
1
? (add c 'baz)
2
? c
#<COUNTER HASH-TABLE { BAZ 2 FOO 1 }>
? (describe (slot-value c 'implementation))
#<HASH-TABLE :TEST EQUAL size 2/60 #x3020019B5CCD>
Type: HASH-TABLE
Class: #<BUILT-IN-CLASS HASH-TABLE>
TYPE: (HASH-TABLE . #<CCL::CLASS-WRAPPER HASH-TABLE #x3000400296BD>)
NHASH.KEYTRANSF: #<Compiled-function CCL::%%EQUALHASH #x3000000AA3AF>
NHASH.COMPAREF: #<Compiled-function EQUAL #x30000007B23F>
NHASH.REHASH-BITS: NIL
NHASH.VECTOR: #<HASH-TABLE-VECTOR #x3020019B5DCD>
NHASH.LOCK: 524288
NHASH.OWNER: NIL
NHASH.GROW-THRESHOLD: 58
NHASH.REHASH-RATIO: 1.1764705
NHASH.REHASH-SIZE: 1.5
NHASH.PUTHASH-COUNT: 0
NHASH.EXCLUSION-LOCK: #<RECURSIVE-LOCK [ptr @ #xB8EB0C0] #x3020019B5D4D>
NHASH.FIND: #<Compiled-function CCL::GENERAL-HASH-FIND #x3000000AD27F>
NHASH.FIND-NEW: #<Compiled-function CCL::GENERAL-HASH-FIND-FOR-PUT
#x3000000ADF4F>
NHASH.READ-ONLY: NIL
? (change-implementation c 'plist)
#<COUNTER PLIST { FOO 1 BAZ 2 }>
? (describe (slot-value c 'implementation))
#<PLIST #x302001A3ED2D>
Class: #<STANDARD-CLASS PLIST>
Wrapper: #<CCL::CLASS-WRAPPER PLIST #x3020018AD7CD>
Instance slots
PLIST: (FOO 1 BAZ 2)
?

You can even make an abstract associative map that automatically chooses
an underlying implementation based on run-time profiling without
changing your client code.

rg

From: Tim Bradshaw on 18 Jun 2010 11:53

On 2010-06-18 16:22:29 +0100, RG said:

> You can even make an abstract associative map that automatically chooses
> an underlying implementation based on run-time profiling without
> changing your client code.

I think there's really two approaches here:

You can go for the "huge complicated reusable library which does
everything" approach. This is what Java does, and is what you're
proposing. This is great, except the library will probably not
actually do everything, and will turn out to have design deficiencies
which make it eithr annoying or impossible to use. If you are using a
minority language, then the library will not be very well tested
because too few people will be using it. It will fail at crucial
moments with results varying from you getting woken up at 3AM every day
to death. Your program will also become enormous, you will not be able
to understand why it performs so badly, and you will spend the rest of
your life trying to understand the library, smelling increasingly bad
and eventually dying on a street corner somewhere.

Or you can use the facilities provided by the language, and build
things which work for your application without stressing too much about
making them do everything. You will be laughed at by your peers, but
your program will work, you will understand it, and it will perform
well. You will become rich, and will be able to spend your time
foisting huge complicated reusable libraries on your former peers, who
are now your slaves, and laughing at them as they starve in the gutter
while trying to understand a million pages of documentation, which you
change every two weeks.

Very strong-willed people can achieve the latter even when using a
language such as Java which is predicated around the former.

From: Captain Obvious on 18 Jun 2010 12:11

??>> The only difference I see is that GETHASH accessor has default value
??>> and you can use INCF directly on it, no need for a goddamn additional
??>> function for that.

R> Wow, you're the second person to react violently to the lack of a setf
R> expander for REF with a default argument. Why is that so important? It
R> seems like such a trivial issue to me. You can already do this:

R> (defmethod add ((c counter) item)
R> (incf (refd c item 0)))

R> if the explicit call really rubs you the wrong way, and making it work
R> for REF would be trivial.

Yes, it is trivial issue, but when problem itself is so simple, the only
things which are left are trivial issues.

R> But to answer your question directly:

??>> What's wrong with hash-tables

R> It's a premature optimization.

Using something else other than hash-tables is a premature optimization!

Because they are in standard, they work reasonably well for a wide variety
of situations and have quite sane API.

Each line of code which is unnecesarry and does not do anything useful is
over-engineering and premature optimization.

In your example those are:

(require :dictionary)

(defclass counter (dictionary) ())

(defmethod add ((c counter) item)
(setf (ref c item) (1+ (ref c item 0))))

Four unnecessary lines vs four useful lines. So half of your code was
over-engineering/premature optimization.

R> What if hash tables are not the right data structure?

You can think about it when you have performance problems.

I'd recommend you to read something on premature optimization, you've got it
not just wrong, but in reverse!

R> What if it turns out that a red-black tree would be more efficient for
R> the particular data you are processing?

Then I'll refactor my code, but I'll not waste my time thinking about all
possible situations before I need that.

R> Or what if you have too much data to fit in main memory and you want
R> to use a database as a backing store? If you've hard-coded in calls to
R> GETHASH you now have to change ALL your code.

If it that bad, packages can do that.

R> If you use abstract associative maps you only have to change the
R> underlying implementation. You can even do this at run time:

It is cool, but it is over-engineering.
Over-engineering usually looks cool.

R> You can even make an abstract associative map that automatically chooses
R> an underlying implementation based on run-time profiling without
R> changing your client code.

This is over-engineering classic.

From: Captain Obvious on 18 Jun 2010 12:39

TB> You can go for the "huge complicated reusable library which does
TB> everything" approach. This is what Java does, and is what you're
TB> proposing. This is great, except the library will probably not
TB> actually do everything, and will turn out to have design deficiencies
TB> which make it either annoying or impossible to use.

Definitely.

When I was working as a C++ programmer I used STL and Boost libraries which
are meant to be very general, abstract and flexible, but unfortunately they
are often PITA to use and do not have useful functionality people need.

I think there might be a fundamental rule which imposes limit on usefulness
of "huge complicated reusable" libraries.
It might be related to Kolmogorov complexity and its invariance theorem.

There is a good discussion of reusability in Richard P. Gabriel's "Patterns
of Software" book.

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Prev: Black-Jewish History FAQ - Is The Secret Relationship between blacks and jews a “hate literature”?
Next: which library to generate xml with namespaces?