From: Captain Obvious on 1 Feb 2010 10:20

TKP> I don't know much about these things, but I think that the best
TKP> solution would be a database of some kind. I am wondering what would
TKP> be the simplest and most hassle-free way to do this in CL (if that
TKP> matters, I am using SBCL).

Juho Snellman has described a rather hackish way of working with large
data sets here:

http://jsnell.iki.fi/blog/archive/2006-10-15-netflix-prize.html

As I understand it, with this kind of solution you mmap your vectors to
files once, and then the OS does the rest for you -- it automatically
loads data from disk, writes it back, and evicts memory that is not
currently in use. On a 32-bit machine you will only be able to mmap a
portion of the data at a time, so you'll need a wrapper of some sort.
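For illustration, here is a rough, untested sketch of that idea on SBCL
using the sb-posix contrib. The file name, the sizes, and the assumption
that the file already holds raw single-floats are all made up:

(require :sb-posix)

(defun call-with-mapped-floats (path n-floats fn)
  "Map PATH, a file of N-FLOATS raw single-floats, into memory and call
FN with a closure that returns the float at a given index."
  (let* ((fd (sb-posix:open path sb-posix:o-rdonly))
         (len (* 4 n-floats))
         (sap (sb-posix:mmap nil len sb-posix:prot-read
                             sb-posix:map-shared fd 0)))
    (unwind-protect
         (funcall fn (lambda (index)
                       (sb-sys:sap-ref-single sap (* 4 index))))
      (sb-posix:munmap sap len)
      (sb-posix:close fd))))

;; Hypothetical use: read one value out of a 10^4 x 10^5 data file.
;; (call-with-mapped-floats "/tmp/draws.dat" (* (expt 10 4) (expt 10 5))
;;   (lambda (ref) (funcall ref 12345)))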
From: Mario S. Mommer on 1 Feb 2010 12:08

Hi Tamas,

Tamas K Papp <tkpapp(a)gmail.com> writes:
> I have a good 64 bit machine with tons of ram, but in a momentary
> lapse of reason, I installed 32 bit ubuntu on it in the past. Maybe a
> reinstall would be less hassle than a DB.

I'm quite sure that this is so. You'll probably upgrade eventually
anyway.

> I notice that you are using SBCL. I posted the message below to the
> SBCL list, but got no reply so far. I wonder if you could help me:
[...]
> - how big is ARRAY-TOTAL-SIZE-LIMIT on 64-bit SBCL? Will this allow
>   me to use larger arrays? Is there another limit (provided that I
>   take enough memory with a --dynamic-space-size)?

; SLIME 2009-05-19
CL-USER> ARRAY-TOTAL-SIZE-LIMIT
1152921504606846973
CL-USER> (log * 2)
60.0
CL-USER>

No idea if there are other limits. I've not bumped into any.

> - Does 64-bit result in significantly higher memory consumption? I
>   understand that fixnums will now take twice the space, but does
>   anything else take up more memory?

Conses are wider too.

> - Does 64 vs 32 bit have any impact on speed (positively or
>   negatively)? Can single floats be unboxed in 64-bit?"

No idea about single floats; I'd be very surprised if they were not
unboxable, as the SBCL developers pay a lot of attention to performance
(and my thanks go to them for that!). I have no accurate information on
the speed issue either, but would again be very surprised if the 64-bit
version were slower.

Now to the FASL thing. It is a hack, but it works. See below. The fasls
are not portable, so one has to migrate them from one version to the
next, or from one implementation to the next. But they load fast, so
there are cases where this is a good solution.

Mario

(defpackage #:faslstore
  (:export #:bindump #:binload)
  (:nicknames #:fs)
  (:use :cl))

(in-package #:faslstore)

(defparameter *hook* nil)

(defun gentempname nil
  (format nil "~Afaslize.lisp" (get-universal-time)))

(defun bindump (data fname)
  (let ((tmp (gentempname)))
    (setq *hook* data)
    (with-open-file (str tmp :direction :output :if-exists :supersede)
      (format str "(in-package #:faslstore)~%~
                   (let ((c #.*hook*))~%~
                   (defun returner nil~%~
                   (prog1 c (setf c nil))))"))
    (compile-file tmp :output-file fname :verbose nil :print nil)
    (delete-file tmp)))

(defun returner nil nil)

(defun binload (fname)
  (load fname)
  (returner))
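For what it's worth, a hypothetical use of the hack above (the file name
and the array are made up; any externalizable object should work the
same way):

(fs:bindump (make-array 1000 :element-type 'single-float
                             :initial-element 0.0)
            "draws.fasl")

;; ... later, possibly in a fresh image that has loaded the package:
(defvar *draws* (fs:binload "draws.fasl"))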
From: Alberto Riva on 1 Feb 2010 14:10

Tamas K Papp wrote:
> Hi,
>
> I am doing Markov-Chain Monte Carlo in CL. Specifically, I draw a
> vector (of about 10^5 elements) from a distribution. I need about
> 10^4 draws. This makes a huge table --- I am not sure I would like to
> fit that in memory, even if I could (single float would take 4e9
> bytes, but 1e9 is not a fixnum any more, so plain vanilla Lisp arrays
> would not work on my 32-bit platform).
>
> I don't know much about these things, but I think that the best
> solution would be a database of some kind. I am wondering what would
> be the simplest and most hassle-free way to do this in CL (if that
> matters, I am using SBCL).
>
> If I think of this table as a matrix, I will save data along one
> dimension (eg rows, each draw), but I will retrieve data along the
> other (eg columns, multiple draws for each variable).

You could simply write all your numbers out to a file (using an
appropriate encoding), and since the number of bytes per number and the
number of columns are constant, you can calculate the offset in the
file based on row and column number, and then use FILE-POSITION to jump
to that location directly.

Alberto
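A rough, untested sketch of that approach, assuming the draws are
written row by row as raw 32-bit words (the float <-> bits conversion,
e.g. via the ieee-floats library, is left out, and the file name and
sizes are made up):

(defconstant +n-variables+ (expt 10 5))  ; columns: one per variable
(defconstant +n-draws+ (expt 10 4))      ; rows: one per draw

(defun read-cell (stream row col)
  "Return the raw 32-bit word stored for draw ROW of variable COL.
STREAM must be open with :element-type '(unsigned-byte 32), so
FILE-POSITION counts in 4-byte elements."
  (file-position stream (+ (* row +n-variables+) col))
  (read-byte stream))

;; Hypothetical use:
;; (with-open-file (s "draws.dat" :element-type '(unsigned-byte 32))
;;   (read-cell s 17 42))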
From: Waldek Hebisch on 1 Feb 2010 15:09

Tamas K Papp <tkpapp(a)gmail.com> wrote:
>
> I have a good 64 bit machine with tons of ram, but in a momentary
> lapse of reason, I installed 32 bit ubuntu on it in the past. Maybe a
> reinstall would be less hassle than a DB.
>
> I notice that you are using SBCL. I posted the message below to the
> SBCL list, but got no reply so far. I wonder if you could help me:
>
> "Currently, I am using SBCL on 32-bit Ubuntu (x86). I ran into a
> specific limitation (fixnum limits my array size), so am wondering
> whether to switch to 64-bit SBCL. This would require a reinstall,
> which is not a major issue but a minor PITA which would surely take a
> few hours. Before I undertake this, I have a few questions:
>
> - how big is ARRAY-TOTAL-SIZE-LIMIT on 64-bit SBCL?

1.0.16 reports 1152921504606846975

> Will this allow me to use larger arrays?

(defvar a)
(progn
  (setf a (make-array (list (expt 10 4) (expt 10 5))
                      :element-type 'single-float))
  nil)

works OK.

> Is there another limit (provided that I take enough memory with a
> --dynamic-space-size)?

Yes, for me the most significant limit is the number of virtual
mappings (see the thread started by Martin Rubey). This is a combined
limit of SBCL and Linux. Basically, in the default configuration you
should always be able to use 256 Mb. Typically you may use much more,
but I have seen SBCL running out of virtual mappings already at 640 Mb.
Basically, if you need a lot of small mutable objects of varying
lifetimes, then expect trouble. OTOH a few huge arrays should be OK. (I
solved my problem by switching from millions of small vectors to
thousands of bigger ones.)

> - Does 64-bit result in significantly higher memory consumption? I
>   understand that fixnums will now take twice the space, but does
>   anything else take up more memory?

Pointers, and consequently "general" Lisp data like conses, closures,
general arrays and structures, take twice the space. Code should take a
similar amount (you may even see smaller code due to the higher number
of available registers and less need for reloads). Specialized arrays
have a bigger header, but otherwise take the same space. In particular,
long strings should take similar space (for short ones the header is
more significant). Single floats should also take less space in the
64-bit version: in the 32-bit version you have a pointer (32 bits) plus
data with a header, giving 96 bits, while the 64-bit version uses a
direct representation, taking 64 bits.

> - Does 64 vs 32 bit have any impact on speed (positively or
>   negatively)? Can single floats be unboxed in 64-bit?"

For me 64 bits has a large positive impact, the biggest factor being
that I have a lot of integers that are bignums on 32 bit but fixnums on
64 bit. Also, getting native code for 64-bit integers helps. AFAIK, in
64-bit single floats do not require memory allocation.

Old measurements in C indicated that 64 bit is about 10% faster than
32 bit. Lisp and Java are frequently memory bound, so bigger data may
mean less speed. Also, newer machines have better SSE, so performance
critical parts now use SSE, which works the same in both the 64-bit and
32-bit versions. Finally, Intel processors have some improvements that
are only active in 32-bit mode (actually maybe only one thing:
instruction fusing) -- I do not know how this affects performance.

--
                              Waldek Hebisch
hebisch(a)math.uni.wroc.pl
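To make the fixnum point concrete (values as reported by typical SBCL
builds; check MOST-POSITIVE-FIXNUM on your own image):

;; 32-bit SBCL:  most-positive-fixnum =>           536870911  (2^29 - 1)
;; 64-bit SBCL:  most-positive-fixnum => 4611686018427387903  (2^62 - 1)
;;
;; So an index like 10^9 is a bignum on 32 bit but a fixnum on 64 bit:
(typep (expt 10 9) 'fixnum)   ; => NIL on 32-bit SBCL, T on 64-bit SBCL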
From: Thomas A. Russ on 1 Feb 2010 17:31

Tamas K Papp <tkpapp(a)gmail.com> writes:

> Hi,
>
> I am doing Markov-Chain Monte Carlo in CL. Specifically, I draw a
> vector (of about 10^5 elements) from a distribution. I need about
> 10^4 draws. This makes a huge table --- I am not sure I would like to
> fit that in memory, even if I could (single float would take 4e9
> bytes, but 1e9 is not a fixnum any more, so plain vanilla Lisp arrays
> would not work on my 32-bit platform).
...
> If I think of this table as a matrix, I will save data along one
> dimension (eg rows, each draw), but I will retrieve data along the
> other (eg columns, multiple draws for each variable). The second step
> will be done more often, so I want that to be fast. Does it matter
> for speed which dimension I consider rows or columns?

Yes, that will matter. If you use arrays in Common Lisp, they are
stored in row-major order.

But for your application, I would seriously consider changing the order
in which you do things a bit. Assuming that the draw procedure is the
same for each item you draw, and that you have a good random number
source, it shouldn't matter whether you do all of the draws for one
trial in a single pass or instead do all of the draws for a single
column.

Since the columns (variables? features?) are accessed more frequently,
you would want them to be contiguous in memory. It would seem,
therefore, that you would want to make the columns the primary
dimension and the draws the secondary one. That suggests storing each
column's values in a separate vector of length 10^4, and keeping a
collection of these columns. You could do this in memory by using a
64-bit Lisp system and just having a collection of vectors. Assuming
that you really process the columns independently, you would only be
working on one of these 10^4-element vectors at a time. That should
give you good locality of reference, and allow for both cache and
paging efficiency.

If need be, you could store the information externally using a (binary)
file format or a database. But for the processing, you would want to
have a contiguous vector allocated for the entire data set.

-- Thomas A. Russ, USC/Information Sciences Institute
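A rough sketch of that column-per-vector layout (all names are made up
for illustration; the lazy allocation is just one way to avoid creating
the full ~4 GB of columns up front):

(defparameter *n-draws* (expt 10 4))
(defparameter *n-variables* (expt 10 5))

(defparameter *columns*
  (make-array *n-variables* :initial-element nil))  ; one slot per variable

(defun column (j)
  "Return the draw vector for variable J, allocating it on first use."
  (or (aref *columns* j)
      (setf (aref *columns* j)
            (make-array *n-draws* :element-type 'single-float
                                  :initial-element 0.0))))

;; Storing draw I of variable J:
;;   (setf (aref (column j) i) value)
;; Processing one variable then touches a single contiguous vector.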