Persistent arrays [Lisp]

From: Alex Mizrahi on 28 Mar 2010 09:30

SB> Basically I don't want to touch data after it has
SB> been written.

Basically, if you don't want to touch data after it has been written, don't
touch it, ok?

You absolutely do not need a database to avoid touching data.
Except, maybe, a database with access control. So when you try to touch
your data, it will tell you:
"Oh, Slobodan, are you trying to touch your data again? Stop doing that, you
nasty boy!".

From: Slobodan Blazeski on 28 Mar 2010 10:49

On Mar 27, 8:48 pm, Tamas K Papp <tkp...(a)gmail.com> wrote:
> On Sat, 27 Mar 2010 11:03:05 -0700, Slobodan Blazeski wrote:
> > On Mar 27, 6:43 pm, Tamas K Papp <tkp...(a)gmail.com> wrote:
> >> On Sat, 27 Mar 2010 10:03:34 -0700, Slobodan Blazeski wrote:
> >> > I'm working with arrays that are one-dimensional, adjustable, hold
> >> > only one type of element and can get quite long(*). I need a way to
> >> > save them in the file system after doing some work with them (like
> >> > inserting, deleting and/or changing elements). A quick test of
> >> > cl-store storing an array of 10 000 000 fixnums takes 31.8 MB
> >> > and more than 3 sec to run. So what would you suggest as a pure Lisp
> >> > storing strategy? (*) Sample array:
> >> > (make-array 10000000 :element-type 'fixnum :initial-element 19
> >> >             :fill-pointer t :adjustable t)
>
> >> Have you tried writing it out in binary form? I.e. take the fixnums,
> >> slice them up as bytes, open a byte stream, etc. This is the fastest
> >> portable solution I can think of.
>
> >> On SBCL, you might want to try
>
> >>http://github.com/nikodemus/sb-vector-io
>
> >> Tamas
>
> > Thanks for the link, but my problem is not the size of the stored files
> > (though it's certainly nice to save space); rather, I want to achieve
> > the kind of safety that databases provide, something like what
> > cl-prevalence and AllegroCache do for classes(*). Maybe a transaction log
> > would help, or splitting the arrays into smaller manageable chunks, or
> > using deltas(*). Basically I don't want to touch data after it has been
> > written.
>
> > Slobodan
> > (*)http://www.ericsink.com/entries/time_space_tradeoffs.html
>
> I would guess that in this case, the hard disk is the actual
> bottleneck, so saving space would save you time. There is no
> time-space trade-off, quite the opposite.
>
> Also, I don't understand why you need a transaction log if you don't
> change the data.

Currently my system behaves like this:
1. I create array FOO, do some work with it, then save it.
2. The system stores it in some file, e.g. FOO-2010-25-13h30min23sec.out.
3. The next day I restore my FOO array (the system reads the last known file),
then modify it (add, delete and change elements) and save it.
4. The system stores it in another file, say FOO-2010-26-17h12min23sec.out.
5. In some other session I need that array again, so the system reads the
last file, FOO-2010-26-17h12min23sec.out, and restores the array.
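The save-to-a-new-file scheme in the steps above could be sketched in portable Common Lisp roughly as follows. This is a hypothetical sketch, not Slobodan's actual code: the function names are made up, a zero-padded sequence number replaces the timestamp so that sorting file names also sorts them by age, elements are written with the standard printer, and each save goes to a .tmp file that is then renamed, assuming RENAME-FILE maps to an atomic rename as it typically does on POSIX systems (the sketch does not force data to disk).

```lisp
;;; Hypothetical sketch: every save goes to a brand-new file NAME-NNNNNN.out,
;;; earlier files are never modified, and a restore reads the newest one.

(defun snapshot-files (name directory)
  "Existing snapshot files for NAME in DIRECTORY, oldest first."
  (sort (directory (merge-pathnames (format nil "~A-*.out" name) directory))
        #'string< :key #'namestring))

(defun snapshot-number (path)
  "Parse the sequence number after the last hyphen of PATH's name."
  (let ((file-name (pathname-name path)))
    (parse-integer file-name :start (1+ (position #\- file-name :from-end t)))))

(defun save-snapshot (vector name directory)
  "Write VECTOR's active elements to a new snapshot file; return its pathname."
  (let* ((existing (snapshot-files name directory))
         (next (if existing (1+ (snapshot-number (car (last existing)))) 1))
         (path (merge-pathnames (format nil "~A-~6,'0D.out" name next) directory))
         (temp (make-pathname :type "tmp" :defaults path)))
    (ensure-directories-exist path)
    (with-open-file (out temp :direction :output :if-exists :supersede)
      (with-standard-io-syntax
        (print (map 'list #'identity vector) out)))
    ;; Only a completely written file gets the .out name that restores look for.
    (rename-file temp path)
    path))

(defun load-latest-snapshot (name directory)
  "Rebuild an adjustable fixnum vector from the newest snapshot, or return NIL."
  (let ((latest (car (last (snapshot-files name directory)))))
    (when latest
      (with-open-file (in latest)
        (with-standard-io-syntax
          (let ((items (read in)))
            (make-array (length items) :element-type 'fixnum
                                       :fill-pointer t :adjustable t
                                       :initial-contents items)))))))
```

Restoring touches only the newest complete file, so older snapshots stay untouched; the cost, as noted below, is rewriting the whole array on every save.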

Every time I save, I create a new file in order to preserve the data if
something odd happens (system crash, power surge, etc.).
Every time I restore, I read the last saved file. Now I'm looking for
a better storage mechanism, since it's very wasteful to save the whole file
again and again even if I modified a single element.
I'm currently wondering whether I should:

A. Save the whole array once, then use deltas:
1. Save foo #(1 2 3 4)
foo.out
;;; name: foo
;;; elements: 1 2 3 4
2. Add an element to foo, giving #(1 2 3 4 5), then save
foo-1.out
;;; add 5 to foo
3. Change the first element to 999, giving #(999 2 3 4 5), then save
foo-2.out
;;; change index 1 to 999
...
So when I restore, I will have to redo all the operations.

B. Break the array into small pieces, then only update those that changed,
similar to what Git does: keep just one copy of each piece and use its
hash code to see whether it changed.
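Option A above is essentially an append-only operation log that is replayed on restore. A minimal sketch, assuming a hypothetical log format of one printed list per change, (:push value), (:set index value) or (:delete index); the names apply-op, log-op and replay-log are illustrative and not taken from any library.

```lisp
;;; Hypothetical delta log: each change is appended as one printed list,
;;; and a restore replays the log on top of the last full save.

(defun apply-op (vector op)
  "Apply one logged change to VECTOR (adjustable, with a fill pointer)."
  (destructuring-bind (kind &rest args) op
    (ecase kind
      (:push (vector-push-extend (first args) vector))
      (:set (setf (aref vector (first args)) (second args)))
      (:delete
       ;; Shift the tail left by one; REPLACE copes with overlapping regions.
       (let ((index (first args)))
         (replace vector vector :start1 index :start2 (1+ index))
         (decf (fill-pointer vector))))))
  vector)

(defun log-op (op path)
  "Append OP to the log at PATH without rewriting earlier entries."
  (with-open-file (out path :direction :output
                            :if-exists :append :if-does-not-exist :create)
    (with-standard-io-syntax
      (print op out)))
  op)

(defun replay-log (vector path)
  "Apply every change logged in PATH to VECTOR, oldest first."
  (when (probe-file path)
    (with-open-file (in path)
      (with-standard-io-syntax
        (loop for op = (read in nil :eof)
              until (eq op :eof)
              do (apply-op vector op)))))
  vector)
```

Starting from a saved #(1 2 3 4), logging (:push 5) and then (:set 0 999) and replaying gives #(999 2 3 4 5). Since a restore must replay every entry, one would periodically write a fresh full copy and start a new log. A crash in the middle of an append can also leave a truncated last entry, which READ will reject, so a real implementation would need to detect and drop it.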

> If you change the data occasionally, I would just
> use git or similar.
I want a portable Lisp solution with no external dependencies.
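For a pure Lisp solution without external dependencies, Tamas's earlier suggestion (slice the fixnums into bytes and write them to a byte stream) needs nothing beyond standard Common Lisp. A minimal sketch, assuming every element fits in 64-bit two's complement, which holds for fixnums on common 64-bit implementations; the file layout (an 8-byte length header, then 8 little-endian bytes per element) and the function names are made up for illustration.

```lisp
;;; Hypothetical binary format: 8-byte length, then 8 bytes per element.

(defun encode-64 (integer buffer offset)
  "Store the low 64 bits of INTEGER (two's complement) little-endian in BUFFER."
  (dotimes (i 8)
    (setf (aref buffer (+ offset i)) (ldb (byte 8 (* 8 i)) integer))))

(defun decode-64 (buffer offset)
  "Read back a signed 64-bit integer stored by ENCODE-64."
  (let ((unsigned 0))
    (dotimes (i 8)
      (setf unsigned (dpb (aref buffer (+ offset i)) (byte 8 (* 8 i)) unsigned)))
    (if (logbitp 63 unsigned)
        (- unsigned (expt 2 64))        ; sign-extend negative values
        unsigned)))

(defun write-fixnum-vector (vector path)
  "Write VECTOR's active elements to PATH with a single WRITE-SEQUENCE."
  (let* ((n (length vector))
         (buffer (make-array (* 8 (1+ n)) :element-type '(unsigned-byte 8))))
    (encode-64 n buffer 0)
    (loop for x across vector
          for offset from 8 by 8
          do (encode-64 x buffer offset))
    (with-open-file (out path :direction :output
                              :element-type '(unsigned-byte 8)
                              :if-exists :supersede)
      (write-sequence buffer out))
    path))

(defun read-fixnum-vector (path)
  "Read a file written by WRITE-FIXNUM-VECTOR into an adjustable fixnum vector."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((header (make-array 8 :element-type '(unsigned-byte 8))))
      (read-sequence header in)
      (let* ((n (decode-64 header 0))
             (body (make-array (* 8 n) :element-type '(unsigned-byte 8)))
             (result (make-array n :element-type 'fixnum
                                   :fill-pointer t :adjustable t)))
        (read-sequence body in)
        (dotimes (i n result)
          (setf (aref result i) (decode-64 body (* 8 i))))))))
```

This builds the whole byte image in memory (about 80 MB for the 10 000 000-element sample array); writing in fixed-size blocks would avoid that.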

From: Johan Ur Riise on 28 Mar 2010 11:01

Slobodan Blazeski <slobodan.blazeski(a)gmail.com> writes:

> A. Save the whole array once, then use deltas:
> 1. Save foo #(1 2 3 4)
> foo.out
> ;;; name: foo
> ;;; elements: 1 2 3 4
> 2. Add an element to foo, giving #(1 2 3 4 5), then save
> foo-1.out
> ;;; add 5 to foo
> 3. Change the first element to 999, giving #(999 2 3 4 5), then save
> foo-2.out
> ;;; change index 1 to 999
> ...
> So when I restore, I will have to redo all the operations.

This is exactly what cl-prevalence does for you automatically, but
your requirement of 10 million objects is too much for
cl-prevalence; 100 000 is a rough practical limit.

From: vanekl on 29 Mar 2010 01:54

Slobodan Blazeski wrote:
> B. Break the array into small pieces, then only update those that changed,
> similar to what Git does: keep just one copy of each piece and use its
> hash code to see whether it changed.
>
>
>
>> If you change the data occasionally, I would just
>> use git or similar.
> I want a portable Lisp solution with no external dependencies.

As an exercise, I wrote a chunked array that can save itself to a file.
It's at http://paste.lisp.org/+22UP and it's commented.
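The paste itself is not reproduced here. As an independent illustration of the chunking idea (Slobodan's option B), here is a minimal sketch that cuts the vector into fixed-size chunks, fingerprints each chunk, and rewrites only the chunks whose fingerprint changed. The function names, the chunk-N.out file naming and the in-memory hash table of previous fingerprints are all assumptions; 64-bit FNV-1a is used as a simple portable hash because SXHASH is not required to depend on the contents of non-string vectors.

```lisp
;;; Hypothetical sketch of option B: fixed-size chunks, each in its own
;;; file, rewritten only when the chunk's content hash changes.

(defun chunk-hash (vector start end)
  "64-bit FNV-1a over the low 8 bytes of each element of VECTOR in [START, END)."
  (let ((hash 14695981039346656037))               ; FNV-1a 64-bit offset basis
    (loop for i from start below end
          do (let ((x (aref vector i)))
               (dotimes (k 8)
                 (setf hash (ldb (byte 64 0)
                                 (* (logxor hash (ldb (byte 8 (* 8 k)) x))
                                    1099511628211)))))) ; FNV-1a 64-bit prime
    hash))

(defun save-changed-chunks (vector directory known-hashes &key (chunk-size 4096))
  "Write each chunk of VECTOR whose hash differs from the one recorded in
KNOWN-HASHES (chunk index -> hash), update KNOWN-HASHES, and return the
indices of the chunks that were written."
  (ensure-directories-exist directory)
  (let ((written '()))
    (loop for start from 0 below (length vector) by chunk-size
          for index from 0
          for end = (min (length vector) (+ start chunk-size))
          for hash = (chunk-hash vector start end)
          unless (eql hash (gethash index known-hashes))
            do (with-open-file (out (merge-pathnames
                                     (format nil "chunk-~D.out" index) directory)
                                    :direction :output :if-exists :supersede)
                 (with-standard-io-syntax
                   (print (loop for i from start below end
                                collect (aref vector i))
                          out)))
               (setf (gethash index known-hashes) hash)
               (push index written))
    (nreverse written)))
```

Changing one element rewrites one chunk, but with fixed offsets an insertion or deletion near the front shifts every later element and so changes every later chunk; a restore would also need the length and the chunk hashes saved somewhere. Choosing chunk boundaries from the data itself (content-defined chunking) avoids the shifting problem at the cost of more code.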

From: Tim Bradshaw on 29 Mar 2010 06:18

On 2010-03-28 18:20:13 +0100, Alex Mizrahi said:

> You need to write it like (execute-transaction (tx-array-change-element
> *system* 1 2 3)), and then write a tx-array-change-element function...
>
> It is probably more verbose than some ad-hoc thing one can write.

It doesn't strike me as overwhelmingly difficult to create a wrapper
class such that (setf (ref o x y z) new) does this for you. Easier to
whine about it though, of course.
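The wrapper Tim describes can be sketched with a small CLOS class and a SETF function, so that an ordinary-looking (setf (ref o i) new) appends the change to a log before updating the array. This is a hypothetical sketch for the one-dimensional case in this thread (one index instead of Tim's x y z); the class name, slot names and the (:set index value) log format are assumptions, and the log file is reopened on every change for simplicity.

```lisp
;;; Hypothetical wrapper: element changes go through a SETF function that
;;; appends a log entry before modifying the underlying vector.

(defclass logged-vector ()
  ((data :initarg :data :reader logged-vector-data)
   (log-path :initarg :log-path :reader logged-vector-log-path))
  (:documentation "A vector whose element changes are appended to a log file."))

(defun ref (object index)
  "Read element INDEX of OBJECT's vector."
  (aref (logged-vector-data object) index))

(defun (setf ref) (new-value object index)
  "Log the change as (:SET INDEX NEW-VALUE), then apply it in memory."
  (with-open-file (out (logged-vector-log-path object)
                       :direction :output
                       :if-exists :append :if-does-not-exist :create)
    (with-standard-io-syntax
      (print (list :set index new-value) out)))
  (setf (aref (logged-vector-data object) index) new-value))
```

Because the entry is written before the in-memory change, a crash between the two steps leaves the change in the log, and replaying the log on top of the last full save reproduces the intended state.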
