From: Strav on
He. I'm somewhat informally reading lectures on algorithm and
stumbling upon disjoint sets, I decided to try a quick and most
probably dirty implementation in common lisp. (in a matter of getting
acquainted to CL) - with union by depth and path compression.

Any comment concerned with bettering this implementation is most
welcomed but more importantly, a few questions arose when I was
thinking about / writing the code, if any of you can shed some light
on them, it would be most appreciated.

(defstruct node (value nil) (parent nil) (root nil) (rank 0))

(defun makeSet (setName)
(make-node :value setName))

(defun link (item)
(cons item nil))

(defun addItem (node item)
(let ((root-node (findRootNode node)))
(incf (node-rank root-node))
(make-node :value item :parent (link root-node))))

(defun root-node-p (node)
(null (node-parent node)))

(defun findRootNode (node)
(if (root-node-p node)
node
(if (null (node-root node))
(let ((root-node (findRootNode (car (node-parent node)))))
(setf (node-root node) (link root-node))
root-node)
(findRootNode (car (node-root node))))))

(defun unionSet (set-A set-B)
(let ((root-A (findRootNode set-A))
(root-B (findRootNode set-B)))
(flet ((mergeSets (root-from root-to)
(setf (node-parent root-from) (link root-to))))
(if (> (node-rank root-A) (node-rank root-B))
(mergeSets root-B root-A)
(mergeSets root-A root-B)))))


Questions:

1.

a) Is it a convention that the set's representative has a pointer to
itself, if yes, can't my solution be acceptable too.

b) In common lisp, How do you define a data structure having a pointer
pointing to that same data structure without falling into a circular
reference?

2.
In the usual path compression scheme, why updating the pointer to the
representative only upon a call to find?
couldn't it be more efficient to do so when inserting a node as in my
code above? (or is it that we insert nodes way more often than we
search)

3.
Given that disjoint sets are implemented using trees where the parent
has no pointer to it's children, if one wishes to list all or some
of the elements member of a set, wouldn't it be better to store the
nodes in a vector. Now, if we store the set's representative at (elt v
0), the worst case for getting to a node will take O(n - 1) getting
to the representative will take constant time, what's then the use of
keeping a tree structure?

Thanks

(you guy be cool, this is my first post here ;)
From: vanekl on
Strav wrote:
> He. I'm somewhat informally reading lectures on algorithm and
> stumbling upon disjoint sets, I decided to try a quick and most
> probably dirty implementation in common lisp. (in a matter of getting
> acquainted to CL) - with union by depth and path compression.
>
> Any comment concerned with bettering this implementation is most
> welcomed but more importantly, a few questions arose when I was
> thinking about / writing the code, if any of you can shed some light
> on them, it would be most appreciated.
>
> (defstruct node (value nil) (parent nil) (root nil) (rank 0))

If you are implementing the classical disjoint-set algorithm implemented as
a tree you don't need both parent and root slots. A parent slot is
sufficient.


> (defun makeSet (setName)
> (make-node :value setName))

Please don't use camel-case when writing CL. CL traditionally separates
words with hyphens.


> (defun link (item)
> (cons item nil))

This 'link' function isn't needed.


> (defun addItem (node item)
> (let ((root-node (findRootNode node)))
> (incf (node-rank root-node))
> (make-node :value item :parent (link root-node))))

This last line could have been written,
(make-node :value item :parent root-node))))


> (defun root-node-p (node)
> (null (node-parent node)))
>
> (defun findRootNode (node)
> (if (root-node-p node)
> node
> (if (null (node-root node))
> (let ((root-node (findRootNode (car (node-parent node)))))

The 'car' isn't needed when 'link' isn't used.

> (setf (node-root node) (link root-node))

The 'link' function isn't needed.

> root-node)
> (findRootNode (car (node-root node))))))
>
> (defun unionSet (set-A set-B)
> (let ((root-A (findRootNode set-A))
> (root-B (findRootNode set-B)))
> (flet ((mergeSets (root-from root-to)
> (setf (node-parent root-from) (link root-to))))

'link' not needed here, either.


> (if (> (node-rank root-A) (node-rank root-B))
> (mergeSets root-B root-A)
> (mergeSets root-A root-B)))))
>
>
> Questions:
>
> 1.
>
> a) Is it a convention that the set's representative has a pointer to
> itself, if yes, can't my solution be acceptable too.

The way you implemented it (using nil) is more common than a
self-referential pointer, but either could be used.


> b) In common lisp, How do you define a data structure having a pointer
> pointing to that same data structure without falling into a circular
> reference?

I show an example of a self-referential pointer in code below. You test with
the "eql" function to make sure you haven't circled back.

> 2.
> In the usual path compression scheme, why updating the pointer to the
> representative only upon a call to find?

Because it's usually more efficient to delay processing until it becomes
absolutely necessary to perform the computation.

> couldn't it be more efficient to do so when inserting a node as in my
> code above?

Your algorithm is slightly less efficient because you perform find-root-node
sooner than when it is absolutely needed and thus you may be performing more
work than some algorithms require.


> (or is it that we insert nodes way more often than we
> search)

That depends on the algorithm that this data structure is used with.


> 3.
> Given that disjoint sets are implemented using trees where the parent
> has no pointer to it's children, if one wishes to list all or some
> of the elements member of a set, wouldn't it be better to store the
> nodes in a vector.

The classic vector doesn't grow. Unless you know the maximum size of the
vector you require, a data structure that can adjust is preferable. Some
vectors are adjustable but they are harder to algorithmically analyze, which
I suspect may be one reason why vectors are typically not used when
pedagogically discussing this data structure. Also, this disjoint data
structure has been proven to be extremely efficient when used with some
graph algorithms (where your additional requirements are not needed).

> Now, if we store the set's representative at (elt v
> 0), the worst case for getting to a node will take O(n - 1) getting

Typically this is written O(n).

> to the representative will take constant time, what's then the use of
> keeping a tree structure?

Easy to analyze; easy to grow; much easier to dynamically adjust lists at
run-time than vectors. Full-set traversal is not a requirement in many graph
algorithms.

>
> Thanks
>
> (you guy be cool, this is my first post here ;)

We be cool.
Here's an alternate implementation that allows for traversing the elements
in the set.
This uses a linked-list implementation instead of trees but is nearly as
fast.
One of the methods requires the "iterate" package.

(defstruct node
(value)
(next)
(root)
(rank 0))

(defun make-set (set-item)
(make-node :value set-item))

(defun add-item (node item)
(let* ((root (find-root-node node))
(new-node (make-node :value item
:root root
:next (node-next root))))
(incf (node-rank root))
(setf (node-next root) new-node)))

(defun last-node (node)
(if (null (node-next node))
node
(last-node (node-next node))))

(defun list-set (node)
(format t "~a " (node-value node))
(if (node-next node)
(list-set (node-next node))))

(defun root-node-p (node)
(null (node-root node)))

(defmethod find-root-node ((node node))
(if (root-node-p node)
node
(setf (node-root node) (find-root-node (node-root node)))))

(defmethod find-node ((sym symbol) (node node))
(if (equalp sym (node-value node))
node
(iter (for next-node initially (node-next node) then (node-next
next-node))
(when (or (null next-node)
(eql next-node node))
(leave))
(finding next-node such-that (equalp sym (node-value next-node))))))

(defun union-set (set-A set-B)
(let ((root-A (find-root-node set-A))
(root-B (find-root-node set-B)))

(flet ((merge-sets (root-from root-to)
(incf (node-rank root-to) (node-rank root-from))
(setf (node-root root-from) root-to
(node-next (last-node root-to)) root-from)))

(if (> (node-rank root-A)
(node-rank root-B))
(merge-sets root-B root-A)
(merge-sets root-A root-B)))))


;; This example comes out of Cormen, Leiserson, & Rivest, Intro. to
Algorithms, Chap. 22.
(let ((c (make-set 'c))
(f (make-set 'f)))
;; in cases where there is a loop, stop from printing in circles.
(setf *print-circle* t *print-length* 256 *print-level* 2)

;; you can make a node self-referential, but it adds no value to the
algo.
(setf (node-root c) c)
;; and test for it with 'eql'
(format t "self-referential root: ~s~%" (eql (node-root c) c))
(setf (node-root c) nil) ;; reset back to normal

(add-item c 'h)
(add-item c 'e)
(add-item c 'b)

(add-item f 'g)
(add-item f 'd)

(union-set c f)

(format t "~%LIST SET F~%")
(list-set f)
(format t "~%LIST SET F/ROOT~%")
(list-set (find-root-node f))

(format t "~%LIST SET C~%")
(list-set (find-root-node c))
(terpri)
(format t "~%find-node 'f: ~a~%" (find-node 'f f))
(format t "~%find-node 'e: ~a~%" (find-node 'e c)))

OUTPUT:
self-referential root: T

LIST SET F
F D G
LIST SET F/ROOT
C B E H F D G
LIST SET C
C B E H F D G

find-node 'f: #1=#S(NODE :VALUE F :NEXT #S(NODE :VALUE D :NEXT # :ROOT #1#
:RANK 0) :ROOT #S(NODE :VALUE C :NEXT # :ROOT NIL :RANK 5) :RANK 2)

find-node 'e: #S(NODE :VALUE E :NEXT #S(NODE :VALUE H :NEXT # :ROOT #1=#
:RANK 0) :ROOT #1# :RANK 0)

If you want to see another cool way to handle sets, check out,
http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html
This method uses dual vectors instead of trees or lists.


From: Strav on
He. Thanks for the quality of your comments on either the code or my
questions.
I now know that structs store values by reference and that consing
isn't needed. I should have tested that.

As for my question, it's pretty much what I suspected.

Btw, this sparse array scheme you linked at the end is indeed a nice
trick.

From: Thomas A. Russ on
Strav <essorcreations(a)gmail.com> writes:
> b) In common lisp, How do you define a data structure having a pointer
> pointing to that same data structure without falling into a circular
> reference?

Normally you would just have it point to itself. Circular data
structures are not, in general, a problem in Common Lisp. If you end up
printing out circular items, though, it is a good idea to make sure you
have the printer variable *PRINT-CIRCLE* set to T, so that the reader
will print things properly.

Consider:

(setq *print-circle* t)
(defstruct circular self (id (gentemp)))

(make-circular) => #S(CIRCULAR :SELF NIL :ID T0)

(let ((c (make-circular)))
(setf (circular-self c) c)
c)
=> #1=#S(CIRCULAR :SELF #1# :ID T1)

#1=#S(circular :self #1#)
=> #1=#S(CIRCULAR :SELF #1# :ID T2)

#1=#S(circular :self #1# :id MY-ID)
=> #1=#S(circular :self #1# :id MY-ID)

--
Thomas A. Russ, USC/Information Sciences Institute