Persistent arrays [Lisp]

Prev: Stop the Heat from Sun Rays!
Next: GSLL, adjusting marrays

From: Slobodan Blazeski on 29 Mar 2010 14:08

On Mar 29, 7:54 am, "vanekl" <va...(a)acd.net> wrote:
> Slobodan Blazeski wrote:
> > Currently my system behaves like this:
> > 1. I create array FOO and do some work with it, then I save it
> > 2. The system stores it in some file FOO-2010-25-13h30min23sec.out
> > 3. Tomorrow I restore my FOO array (the system reads the last known file),
> > then I modify it (add, delete and change elements), then save it
> > 4. The system stores it in another file, say FOO-2010-26-17h12min23sec.out
> > 5. In some other session I need that array again and the system reads the
> > last file FOO-2010-26-17h12min23sec.out and restores the array
>
> > Every time I save I create a new file in order to preserve the data if
> > something odd happens (system crash, power surge etc).
> > Every time I restore I read the last saved file. Now I'm looking for
> > a better storage mechanism, since it's very stupid to save the whole file
> > again and again even if I modified a single element.
> > I'm currently wondering whether I should:
>
> > A save the whole array once then use deltas:
> > 1. Save foo #(1 2 3 4)
> > foo.out
> > ;;; name : foo
> > ;; elements 1 2 3 4
> > 2. Add element to foo #(1 2 3 4 5) then save
> > foo-1.out
> > ;;; add 5 to foo
> > 3. Change first element to 999 then save foo #(999 2 3 4 5)
> > foo-2.out
> > ;;; change index 1 to 999
> > ...
> > So when I restore I will have to redo all the operations
>
> > B Break the array into small pieces, then only update those that were
> > changed, similar to what Git does, keeping just one copy of the file and
> > using the hash code to see whether it changed.
>
> >> If you change the data occasionally, I would just
> >> use git or similar.
> > I want portable lisp solution with no external dependencies.
>
> As an exercise I wrote a chunked array that can save itself to file.
> Located at http://paste.lisp.org/+22UP
> It's commented.

Many thanks for the code. I now have a few strategies in mind, so it's time
to find the one with the most potential.

Slobodan
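
Strategy A from the quoted message (save the array once, then record deltas) might be sketched as follows. This is only a minimal sketch, not code from anyone in the thread: the names log-op, apply-op and replay and the operation keywords :push, :set and :delete are assumptions made for illustration, and the log goes to any character stream rather than to timestamped files.

```lisp
;; Hypothetical delta log for a one-dimensional adjustable vector.
;; Every mutating operation is appended to a stream as a readable form;
;; restoring the vector means reading the forms back and re-applying them.

(defun log-op (stream op &rest args)
  "Append one operation such as (:PUSH 5) to STREAM and flush it."
  (with-standard-io-syntax
    (print (cons op args) stream))
  (finish-output stream))

(defun apply-op (vector form)
  "Apply one logged FORM to VECTOR, which must have a fill pointer."
  (destructuring-bind (op &rest args) form
    (ecase op
      (:push (vector-push-extend (first args) vector))
      (:set (setf (aref vector (first args)) (second args)))
      (:delete
       ;; Shift the tail left over the deleted index, then shrink by one.
       (let ((i (first args)))
         (replace vector vector :start1 i :start2 (1+ i))
         (decf (fill-pointer vector))))))
  vector)

(defun replay (stream)
  "Rebuild a vector by applying every form found in STREAM, in order."
  (let ((vector (make-array 0 :adjustable t :fill-pointer 0))
        (eof (list :eof)))
    (with-standard-io-syntax
      (let ((*read-eval* nil))          ; never evaluate #. while restoring
        (loop for form = (read stream nil eof)
              until (eq form eof)
              do (apply-op vector form))))
    vector))
```

With a file, the stream would be opened with :direction :output and :if-exists :append, so each save costs one short line instead of a full copy. Restore time grows with the length of the log, so an occasional full snapshot is still needed.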

From: Slobodan Blazeski on 29 Mar 2010 14:17

On Mar 29, 2:22 pm, Jochen Schmidt <j...(a)crispylogics.com> wrote:
> On 2010-03-27 18:03:34 +0100, Slobodan Blazeski said:
>
> > I'm working with arrays that are one-dimensional, adjustable, hold
> > only one type of element and could get quite long(*). I need a way to
> > save them in the file system after doing some work with them (like
> > inserting, deleting and/or changing elements). In a quick test, cl-store
> > storing an array of 10 000 000 fixnums takes 31.8 MB and more than 3
> > sec to run. So what would you suggest as a pure Lisp storing
> > strategy?
>
> How about using a functional (persistent) vector? I've written a simple
> CL implementation of Clojure's persistent vectors for a published
> (German) article. In this case "persistent" actually means "full
> history is available"; but one certainly could employ a scheme where the
> new nodes are written to disc, plus a system to reconstruct the
> persistent vector from disc into memory.

Currently I'm in pursuit of exactly that scheme: how the new nodes should be
written to disc and restored from it. Could you post a link to your
article? I didn't have much luck finding it while reading your blog posts.

Slobodan
> ciao,
> Jochen
>
> --
> Jochen Schmidt
> CRISPYLOGICS
> Uhlandstr. 9, 90408 Nuremberg
>
> Fon +49 (0)911 517 999 82
> Fax +49 (0)911 517 999 83
>
> mailto:(format nil "~(~36r@~36r.~36r~)" 870180 1680085828711918828 16438)
> http://www.crispylogics.com
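
The path-copying idea behind a functional (persistent) vector can be illustrated with a small fixed-depth trie. This is a sketch under stated assumptions, not Jochen's implementation and not Clojure's: the names make-pvec, pvec-ref and pvec-set are hypothetical, the capacity is fixed by the depth, and nothing is written to disc here.

```lisp
;; Hypothetical fixed-capacity persistent vector: a trie whose nodes are
;; never mutated.  An update copies only the nodes on the path from the
;; root to the affected leaf; everything else is shared with the old root.

(defparameter *bits* 5 "Index bits consumed per trie level.")
(defparameter *width* (ash 1 *bits*) "Slots per node (32).")

(defun make-pvec (depth &optional (initial 0))
  "Return a root holding *WIDTH*^(DEPTH+1) elements, all equal to INITIAL.
Because nodes are immutable, all slots can share a single child."
  (make-array *width*
              :initial-element (if (zerop depth)
                                   initial
                                   (make-pvec (1- depth) initial))))

(defun pvec-ref (root depth index)
  "Walk from ROOT down to the leaf and return the element at INDEX."
  (loop for level from depth downto 1
        do (setf root (aref root (ldb (byte *bits* (* level *bits*)) index))))
  (aref root (ldb (byte *bits* 0) index)))

(defun pvec-set (root depth index value)
  "Return a new root with INDEX set to VALUE; ROOT itself is left unchanged."
  (let ((copy (copy-seq root))
        (slot (ldb (byte *bits* (* depth *bits*)) index)))
    (setf (aref copy slot)
          (if (zerop depth)
              value
              (pvec-set (aref root slot) (1- depth) index value)))
    copy))
```

Each call to pvec-set allocates depth + 1 new nodes, and those are the only nodes a disc-backed scheme would need to write for that update. Old roots stay valid, which is the "full history" property mentioned above.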

From: Slobodan Blazeski on 29 Mar 2010 14:21

On Mar 28, 5:01 pm, Johan Ur Riise <jo...(a)riise-data.no> wrote:
>
> This is exactly what cl-prevalence does for you automatically, but
> your requirement of 10 million objects is too difficult for
> cl-prevalence; 100 000 is a rough practical limit.

I expected that; that's why I was looking for a special tool for my
needs. In the worst case I could always split the arrays into
something that cl-prevalence could handle.

Slobodan
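
Splitting an array into pieces so that only the modified pieces are written again could be sketched like this. It is an illustration only: the chunked structure, cref and save-dirty-chunks are hypothetical names, the array has a fixed size for simplicity, and the actual writing is left to a caller-supplied function, so no claim is made here about what cl-prevalence or any other library can handle.

```lisp
;; Hypothetical chunked array with one dirty bit per chunk.  Writes go
;; through (SETF CREF), which marks the containing chunk as modified.

(defstruct (chunked (:constructor %make-chunked))
  data chunk-size dirty)

(defun make-chunked (size chunk-size)
  "Create a chunked array of SIZE zeros; every chunk starts out dirty."
  (%make-chunked :data (make-array size :initial-element 0)
                 :chunk-size chunk-size
                 :dirty (make-array (ceiling size chunk-size)
                                    :element-type 'bit :initial-element 1)))

(defun cref (c i)
  (aref (chunked-data c) i))

(defun (setf cref) (value c i)
  (setf (sbit (chunked-dirty c) (floor i (chunked-chunk-size c))) 1)
  (setf (aref (chunked-data c) i) value))

(defun save-dirty-chunks (c write-chunk)
  "Call WRITE-CHUNK with a chunk number and a copy of its elements for each
dirty chunk, mark it clean, and return the list of chunk numbers written."
  (let ((size (chunked-chunk-size c))
        (data (chunked-data c))
        (written '()))
    (dotimes (n (length (chunked-dirty c)) (nreverse written))
      (when (= 1 (sbit (chunked-dirty c) n))
        (let ((start (* n size)))
          (funcall write-chunk n
                   (subseq data start (min (length data) (+ start size)))))
        (setf (sbit (chunked-dirty c) n) 0)
        (push n written)))))
```

A write-chunk function could store each chunk in its own file named after the chunk number, so changing one element rewrites one small file instead of the whole array.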

From: Alex Mizrahi on 29 Mar 2010 14:32

??>> I mean you can't just write (setf (aref my-array 1 2) 3) and
??>> cl-prevalence automatically records it into transaction, can you?

JUR> In what persistency system can you do that?

I don't know any which allows you to do this exactly.
But that doesn't mean cl-prevalence automatically wins.
I think we should compare it to ad-hoc solution.

JUR> For cl-prevalence, you write exactly the setf as above, but you have
JUR> to instrument the function containing the write.

I'd rather have a persistence wrapper which provides API which is very close
to CL array API and does all persistence under the hood.

As Tim has noted, it is pretty easy to make such a wrapper.
I've made one for the purpose of comparison.
Here's what it looks like:

(open-store #p"my.store")

(defvar *myarray* (make-persistent-array '*myarray* '(2 2)))

(adjust-array-dimensions *myarray* '(3 3))

(dotimes (i 3)
  (dotimes (j 3)
    (setf (paref *myarray* i j) (+ (* 10 i) j))))

(dotimes (i 3)
  (dotimes (j 3)
    (princ (paref *myarray* i j))
    (princ " "))
  (terpri))

(close-store)

Then you can load the store and automatically bind global variables if you
like:

(open-store #p"my.store" :auto-set-globals t)

And work with it:

(dotimes (i 3)
  (dotimes (j 3)
    (princ (paref *myarray* i j))
    (princ " "))
  (terpri))

Maybe not ideal (that's what I hacked together in half an hour), but
pretty close.
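
For readers who cannot follow the links, a wrapper of this kind might look roughly like the sketch below. It is not the code from the linked repository: the function names mirror the calls shown above, but the parray structure, the *log* variable and log-entry are assumptions, and open-store, close-store and :auto-set-globals are left out entirely.

```lisp
;; Hypothetical logging wrapper around CL arrays.  Every mutating call
;; writes one readable form to *LOG* in addition to changing the array.

(defvar *log* *standard-output*
  "Stream that receives one form per mutating operation.")

(defstruct parray name data)

(defun log-entry (&rest form)
  (prin1 form *log*)
  (terpri *log*)
  (finish-output *log*))

(defun make-persistent-array (name dimensions)
  (log-entry :make-persistent-array name dimensions)
  (make-parray :name name
               :data (make-array dimensions :adjustable t
                                            :initial-element nil)))

(defun adjust-array-dimensions (pa dimensions)
  (log-entry :adjust-array-dimensions (parray-name pa) dimensions)
  (setf (parray-data pa)
        (adjust-array (parray-data pa) dimensions :initial-element nil))
  pa)

(defun paref (pa &rest subscripts)
  (apply #'aref (parray-data pa) subscripts))

(defun (setf paref) (value pa &rest subscripts)
  ;; Mutate first, so an out-of-bounds error never reaches the log.
  (prog1 (setf (apply #'aref (parray-data pa) subscripts) value)
    (log-entry :setf-paref (parray-name pa) value subscripts)))
```

The point of the design is that calling code only ever sees paref and (setf paref); where and how the log is written stays inside the wrapper.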

Now here is a persistence implementation without cl-prevalence:

http://github.com/killerstorm/cl-db-comparison/blob/master/parray/simple.lisp

And with cl-prevalence:
http://github.com/killerstorm/cl-db-comparison/blob/master/parray/prevalence.lisp

And here is test code which is reproduced above:
http://github.com/killerstorm/cl-db-comparison/blob/master/parray/test.lisp

Without cl-prevalence it is 71 lines (59 sloc) 2.327 kb
With cl-prevalence it is 50 lines (37 sloc) 1.831 kb

We can see that cl-prevalence made the code somewhat shorter, but it is still
comparable in size.
Of course, cl-prevalence implements additional features like snapshots and
rollbacks.
But custom, ad-hoc implementation gives more control and opens possibilities
of custom features.

Here's what the prevalence transaction log looks like:

(:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS ( (CL-PREVALENCE::ARGS .
(:SEQUENCE 2 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( COMMON-LISP-USER::*MYARRAY*
(:SEQUENCE 3 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( 2 2 ) ) ) )) (CL:FUNCTION .
COMMON-LISP-USER::TX-MAKE-PERSISTENT-ARRAY) ) )
(:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS ( (CL-PREVALENCE::ARGS .
(:SEQUENCE 2 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( COMMON-LISP-USER::*MYARRAY*
(:SEQUENCE 3 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( 3 3 ) ) ) )) (CL:FUNCTION .
COMMON-LISP-USER::TX-ADJUST-ARRAY-DIMENSIONS) ) )
(:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS ( (CL-PREVALENCE::ARGS .
(:SEQUENCE 2 :CLASS CL:LIST :SIZE 3 :ELEMENTS ( COMMON-LISP-USER::*MYARRAY*
0 (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( 0 0 ) ) ) )) (CL:FUNCTION
. COMMON-LISP-USER::TX-SETF-PAREF) ) )
(:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS ( (CL-PREVALENCE::ARGS .
(:SEQUENCE 2 :CLASS CL:LIST :SIZE 3 :ELEMENTS ( COMMON-LISP-USER::*MYARRAY*
1 (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2 :ELEMENTS ( 0 1 ) ) ) )) (CL:FUNCTION
. COMMON-LISP-USER::TX-SETF-PAREF) ) )

And here's what the "simple" log looks like:

(:MAKE-PERSISTENT-ARRAY *MYARRAY* (2 2))
(:ADJUST-ARRAY-DIMENSIONS *MYARRAY* (3 3))
(:SETF-PAREF *MYARRAY* 0 (0 0))
(:SETF-PAREF *MYARRAY* 1 (0 1))
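
A log in this shape can be replayed with a few lines of Lisp. The following is a guess at how that might be done, not the code from simple.lisp: *arrays*, replay-entry and replay-log are hypothetical names, and arrays are kept in a hash table keyed by the logged symbol instead of being bound to global variables.

```lisp
;; Hypothetical replay of a log whose entries look like
;;   (:SETF-PAREF *MYARRAY* 0 (0 0))   ; op, array name, value, subscripts

(defvar *arrays* (make-hash-table :test #'eq)
  "Maps the logged array name (a symbol) to the rebuilt array.")

(defun replay-entry (entry)
  (destructuring-bind (op name &rest args) entry
    (ecase op
      (:make-persistent-array
       (setf (gethash name *arrays*)
             (make-array (first args) :adjustable t :initial-element nil)))
      (:adjust-array-dimensions
       (setf (gethash name *arrays*)
             (adjust-array (gethash name *arrays*) (first args)
                           :initial-element nil)))
      (:setf-paref
       (destructuring-bind (value subscripts) args
         (setf (apply #'aref (gethash name *arrays*) subscripts) value))))))

(defun replay-log (stream)
  "Read entries from STREAM until end of file and apply them in order."
  (let ((*read-eval* nil)               ; a log should never run code
        (eof (list :eof)))
    (loop for entry = (read stream nil eof)
          until (eq entry eof)
          do (replay-entry entry))))
```

Because each entry is a plain list, the standard reader does all the parsing, which is a large part of why the ad-hoc log stays so much shorter than the serialized transaction objects.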

JUR> How does the system know which objects to persist? What changes
JUR> should be recorded? What is the scope of the transaction?

As a side note, Elephant actually supports something close to that
out of the box -- you can use ele:btree instead of arrays.
Then it looks like this:

(setf (get-value 5 *my-array*) 5.0)

Or

(setf (get-value '(3 4) *my-array*) 5.0)

It is a key-value store, so it doesn't have dimensions at all, but for many
applications it will be as good as an array.
But I'm afraid it's not practical because of the huge overhead.

I can answer your questions:

JUR> How does the system know which objects to persist?

Instances of ele:btree and

JUR> What changes should be recorded?

Ones which were introduced via (setf get-value). All of them.

JUR> What is the scope of the transaction?

Either one operation of (setf get-value), or you can do it manually:

(with-transaction ()
  (setf (get-value 5 *my-array*) 5.0)
  (setf (get-value '(3 4) *my-array*) 5.0))

By the way, it also supports multi-threading and isolation.

JUR> Yes, that is the instrumentation. It is exactly the same for all
JUR> transactions, by the way.

I'd hate to instrument ALL my code for persistence -- I'd prefer to have
persistence abstracted out and encapsulated.

JUR> That is right, you arrange the operations on your data in functions,
JUR> where the function represents the unit of work regarding the
JUR> atomicity, consistency, durability traits of the system.

But you said before (you or Raffael, I don't remember who said what exactly)
that cl-prevalence is totally unobtrusive, so you can keep all your code and
add persistence in some "automatic" way.

And now it turns out that you need to instrument all your code and,
moreover, rearrange operations on your data. Probably rewrite some portions
if they don't fit the cl-prevalence model...

I just want you to mention this when you mention cl-prevalence. That it's
not automatic at all.

But other persistence solutions can be pretty automatic. Compare this to
Elephant -- it does not require code rearrangement and instrumentation, in a
lot of cases code which uses CLOS can be exactly the same with and without
persistence.

(And, by the way, it's not true that Elephant can't work with lists or
arrays or stuff like that -- it can work with them, it's just that they need
to be attached somewhere.)

JUR> (defun tx-array-change-element (system index1 index2 new-value)
JUR> (let ((array (cl-prevalence:get-root-object system :array)))
JUR> (setf (aref array index1 index2) new-value)))

That's pretty verbose, don't you think? I'd absolutely hate to write this
for each function which modifies something.
That's why I think the wrapper above is much better.

JUR> You have to take responsibility for your own feelings yourself.

I'm just asking you guys to be more honest in your propaganda -- don't
forget to mention that cl-prevalence requires considerable changes to code
to add persistence.
Each persistence solution is a trade-off. Lack of support for "ten million
objects" is not the only weak part of cl-prevalence -- it totally lacks a
query language, requires code instrumentation etc.

JUR> I write this for others also. Since I like the system and think it is
JUR> useful for others, I think people should not have to rely on
JUR> disinformation from someone who does not understand the system, but
JUR> who pretends that he has tried it. Thank you for the opportunity.

I don't want people to rely on disinformation either. You and especially
Raffael paint other persistence solutions as heavy-weight and cumbersome,
and cl-prevalence as all good and elegant.
Just make it fair, mention trade-offs.

By the way, I'm going to make a comparison of persistence solutions with
concrete code examples, to show the trade-offs of each. You
guys have provoked me to do it, especially Raffael -- I've found that it's
just impossible to argue with him without concrete code examples...

From: Johan Ur Riise on 29 Mar 2010 15:28

"Alex Mizrahi" <udodenko(a)users.sourceforge.net> writes:

> ??>> I mean you can't just write (setf (aref my-array 1 2) 3) and
> ??>> cl-prevalence automatically records it into transaction, can you?
>
> JUR> In what persistency system can you do that?
>
> I don't know any which allows you to do this exactly.

Yeah, right.

> But that doesn't mean cl-prevalence automatically wins.
> I think we should compare it to ad-hoc solution.
>
> JUR> For cl-prevalence, you write exactly the setf as above, but you have
> JUR> to instrument the function containing the write.
>
> I'd rather have a persistence wrapper which provides API which is very
> close to CL array API and does all persistence under the hood.
>
> As Tim has noted, it is pretty easy to make such a wrapper.

> I've made one for the purpose of comparison.
> Here's what it looks like:
>
> (open-store #p"my.store")
>
> (defvar *myarray* (make-persistent-array '*myarray* '(2 2)))
This function leads me to think that your persistence layer
uses a strategy different from cl-prevalence. Could be interesting,
but that also means that cl-prevalence is relevant, even if your
implementation is high quality.
>
> (adjust-array-dimensions *myarray* '(3 3))
>
>
> (dotimes (i 3)
>   (dotimes (j 3)
>     (setf (paref *myarray* i j) (+ (* 10 i) j))))
>
> (dotimes (i 3)
>   (dotimes (j 3)
>     (princ (paref *myarray* i j))
>     (princ " "))
>   (terpri))
>
> (close-store)
>
> Then you can load the store and automatically bind global variables if you
> like:
>
> (open-store #p"my.store" :auto-set-globals t)
>
> And work with it:
>
> (dotimes (i 3)
>   (dotimes (j 3)
>     (princ (paref *myarray* i j))
>     (princ " "))
>   (terpri))
>
> Maybe not ideal (that's what I hacked together in half an hour),
> but pretty close.
>
> Now here is a persistence implementation without cl-prevalence:
>
> http://github.com/killerstorm/cl-db-comparison/blob/master/parray/simple.lisp
>
> And with cl-prevalence:
> http://github.com/killerstorm/cl-db-comparison/blob/master/parray/prevalence.lisp
>
> And here is test code which is reproduced above:
> http://github.com/killerstorm/cl-db-comparison/blob/master/parray/test.lisp
>
> Without cl-prevalence it is 71 lines (59 sloc) 2.327 kb
> With cl-prevalence it is 50 lines (37 sloc) 1.831 kb
>
> We can see that cl-prevalence made the code somewhat shorter, but it is
> still comparable in size.
> Of course, cl-prevalence implements additional features like snapshots
> and rollbacks.
> But custom, ad-hoc implementation gives more control and opens
> possibilities of custom features.
>
> Here's what the prevalence transaction log looks like:
>
> (:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS (
> (CL-PREVALENCE::ARGS . (:SEQUENCE 2 :CLASS CL:LIST :SIZE 2 :ELEMENTS (
> COMMON-LISP-USER::*MYARRAY* (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2
> :ELEMENTS ( 2 2 ) ) ) )) (CL:FUNCTION
> . COMMON-LISP-USER::TX-MAKE-PERSISTENT-ARRAY) ) )
> (:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS (
> (CL-PREVALENCE::ARGS . (:SEQUENCE 2 :CLASS CL:LIST :SIZE 2 :ELEMENTS (
> COMMON-LISP-USER::*MYARRAY* (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2
> :ELEMENTS ( 3 3 ) ) ) )) (CL:FUNCTION
> . COMMON-LISP-USER::TX-ADJUST-ARRAY-DIMENSIONS) ) )
> (:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS (
> (CL-PREVALENCE::ARGS . (:SEQUENCE 2 :CLASS CL:LIST :SIZE 3 :ELEMENTS (
> COMMON-LISP-USER::*MYARRAY* 0 (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2
> :ELEMENTS ( 0 0 ) ) ) )) (CL:FUNCTION
> . COMMON-LISP-USER::TX-SETF-PAREF) ) )
> (:OBJECT 1 :CLASS CL-PREVALENCE::TRANSACTION :SLOTS (
> (CL-PREVALENCE::ARGS . (:SEQUENCE 2 :CLASS CL:LIST :SIZE 3 :ELEMENTS (
> COMMON-LISP-USER::*MYARRAY* 1 (:SEQUENCE 3 :CLASS CL:LIST :SIZE 2
> :ELEMENTS ( 0 1 ) ) ) )) (CL:FUNCTION
> . COMMON-LISP-USER::TX-SETF-PAREF) ) )
>
>
> And here's what the "simple" log looks like:
>
> (:MAKE-PERSISTENT-ARRAY *MYARRAY* (2 2))
> (:ADJUST-ARRAY-DIMENSIONS *MYARRAY* (3 3))
> (:SETF-PAREF *MYARRAY* 0 (0 0))
> (:SETF-PAREF *MYARRAY* 1 (0 1))
>
>
> JUR> How does the system know which objects to persist? What changes
> JUR> should be recorded? What is the scope of the transaction?
>
> As a side note, Elephant actually supports something close to that
> out of the box -- you can use ele:btree instead of arrays.
> Then it looks like this:
>
> (setf (get-value 5 *my-array*) 5.0)
>
> Or
>
> (setf (get-value '(3 4) *my-array*) 5.0)
>
> It is a key-value store, so it doesn't have dimensions at all, but for
> many applications it will be as good as an array.
> But I'm afraid it's not practical because of the huge overhead.
>
> I can answer your questions:
>
> JUR> How does the system know which objects to persist?
>
> Instances of ele:btree and
As I said earlier, not deriving from a special "persistence" base class
is exactly the interesting bit about cl-prevalence: you can persist
any value, including integers and conses.
>
> JUR> What changes should be recorded?
>
> Ones which were introduced via (setf get-value). All of them.
>
> JUR> What is the scope of the transaction?
>
> Either one operation of (setf get-value), or you can do it manually:
>
> (with-transaction ()
>   (setf (get-value 5 *my-array*) 5.0)
>   (setf (get-value '(3 4) *my-array*) 5.0))
>
> By the way, it also supports multi-threading and isolation.
So you need to do a bit more than just setf. What a surprise.
>
> JUR> Yes, that is the instrumentation. It is exactly the same for all
> JUR> transactions, by the way.
>
> I'd hate to instrument ALL my code for persistence -- I'd prefer to
> have persistence abstracted out and encapsulated.
>
> JUR> That is right, you arrange the operations on your data in functions,
> JUR> where the function represents the unit of work regarding the
> JUR> atomicity, consistency, durability traits of the system.
>
> But you said before (you or Raffael, I don't remember who said what
> exactly) that cl-prevalence is totally unobtrusive, so you can keep
> all your code and add persistence in some "automatic" way.

No, didn't say that.

>
> And now it turns out that you need to instrument all your code and,
> moreover, rearrange operations on your data. Probably rewrite some
> portions if they don't fit the cl-prevalence model...
>
> I just want you to mention this when you mention cl-prevalence. That
> it's not automatic at all.
>
> But other persistence solutions can be pretty automatic. Compare this
> to Elephant -- it does not require code rearrangement and
> instrumentation, in a lot of cases code which uses CLOS can be exactly
> the same with and without persistence.
Elephant is interesting, but has a strategy different from cl-prevalence.
>
> (And, by the way, it's not true that Elephant can't work with lists or
> arrays or stuff like that -- it can work with them, it's just that
> they need to be attached somewhere.)
>
> JUR> (defun tx-array-change-element (system index1 index2 new-value)
> JUR> (let ((array (cl-prevalence:get-root-object system :array)))
> JUR> (setf (aref array index1 index2) new-value)))
>
> That's pretty verbose, don't you think?
Not really.
> I'd absolutely hate to
> write this for each function which modifies something.
> That's why I think the wrapper above is much better.
>
> JUR> You have to take responsibility for your own feelings yourself.
>
> I'm just asking you guys to be more honest in your propaganda
This is usenet.
> -- don't
> forget to mention that cl-prevalence requires considerable changes to
> code to add persistence.
It doesn't.
> Each persistence solution is a trade-off. Lack of support for "ten
> million objects" is not the only weak part of cl-prevalence -- it
> totally lacks a query language, requires code instrumentation etc.
It uses CL as a query language.
>
> JUR> I write this for others also. Since I like the system and think it is
> JUR> useful for others, I think people should not have to rely on
> JUR> disinformation from someone who does not understand the system, but
> JUR> who pretends that he has tried it. Thank you for the opportunity.
>
> I don't want people to rely on disinformation either. You and especially
> Raffael paint other persistence solutions as heavy-weight and
> cumbersome,
I talked only about SQL databases, in comparison with cl-prevalence.
> and cl-prevalence as all good and elegant.
> Just make it fair, mention trade-offs.
That is _your_ plan, and it could be good.
>
> By the way, I'm going to make a comparison of persistence solutions with
> concrete code examples, to show the trade-offs of each.
> You guys have provoked me to do it, especially Raffael --
> I've found that it's just impossible to argue with him without
> concrete code examples...
I hope you will not misrepresent cl-prevalence too badly.
