From: webmasterATflymagnetic.com on 24 Jun 2010 07:09 On Jun 24, 11:46 am, Zach Beane <x...(a)xach.com> wrote: > "webmasterATflymagnetic.com" <webmas...(a)flymagnetic.com> writes: > > Hi, > > > I use the following function to return a text file as a list: > > > (defun read-file (file) > > (with-open-file (stream file) > > (do ((retval) > > (line (read-line stream nil) > > (read-line stream nil))) > > ((null line) retval) > > (setf retval (append retval (list line)))))) > > > I have three files I've tested this on: > > > Name Size Lines > > file1 9,223 214 > > file2 2,116,283 31,342 > > file3 5,648,093 105,311 > > > [10]> (time (progn (read-file "file1") nil)) > > Real time: 0.010914 sec. > > Run time: 0.012001 sec. > > Space: 201712 Bytes > > GC: 1, GC time: 0.008001 sec. > > [11]> (time (progn (read-file "file2") nil)) > > Real time: 41.51742 sec. > > Run time: 41.062565 sec. > > Space: 3931805312 Bytes > > GC: 2268, GC time: 29.117828 sec. > > NIL > > [12]> (time (progn (read-file "file3") nil)) > > Real time: 610.77686 sec. > > Run time: 595.9412 sec. > > Space: 44368777352 Bytes > > GC: 23409, GC time: 443.23578 sec. > > NIL > > > So the largest file, with 105,000 lines (5.6M) took nearly 10 minutes. > > I accept that there is some work to do to get it into a list format, > > but clearly the way I have approached this must be wrong. This sort of > > performance is abysmal. > > > I would be interested in people's views on how I should be doing this, > > and comments as to why what I am seeing is so poor. > > That output looks like GNU CLISP. Did you compile the function before > timing it? > > Zach Thanks for the replies. OK I need to research arrays. I tried compiling this and running it again (yes it is CLisp on Linux), and it took even longer! [22]> (compile 'read-file) READ-FILE ; NIL ; NIL [23]> (time (progn (read-file "file1") nil)) Real time: 0.017634 sec. Run time: 0.020001 sec. Space: 201704 Bytes GC: 1, GC time: 0.012001 sec. NIL [24]> (time (progn (read-file "file2") nil)) Real time: 41.995705 sec. Run time: 41.734608 sec. Space: 3931805304 Bytes GC: 3151, GC time: 29.84985 sec. NIL [25]> (time (progn (read-file "file3") nil)) Real time: 617.9405 sec. Run time: 611.1302 sec. Space: 44368777344 Bytes GC: 23051, GC time: 452.03613 sec. NIL Cheers! Phil
From: webmasterATflymagnetic.com on 24 Jun 2010 07:18 On Jun 24, 12:05 pm, Rainer Joswig <jos...(a)lisp.de> wrote: > "webmasterATflymagnetic.com" <webmas...(a)flymagnetic.com> wrote: > one of the most common mistakes when working with singly linked lists Thanks! Nice to know I've fallen into a common trap. I've not looked at LOOP as I've always been able to work with DO, but I do intend to devote some time to learning the extended LOOP syntax at some point (trouble is that point is always in the future). OK, taken the advice about cons-ing and reversing the list: (defun read-file (file) (with-open-file (stream file) (do ((retval) (line (read-line stream nil) (read-line stream nil))) ((null line) (reverse retval)) (push line retval)))) [30]> (time (progn (read-file "file1") nil)) Real time: 0.001456 sec. Run time: 0.0 sec. Space: 21064 Bytes NIL [31]> (time (progn (read-file "file2") nil)) Real time: 0.168605 sec. Run time: 0.15201 sec. Space: 2897552 Bytes GC: 1, GC time: 0.016 sec. NIL [32]> (time (progn (read-file "file3") nil)) Real time: 0.558222 sec. Run time: 0.552034 sec. Space: 8414192 Bytes GC: 3, GC time: 0.104006 sec. NIL Well impressed! Thanks all for speedy and erudite responses. Phil
From: Pascal J. Bourguignon on 24 Jun 2010 07:49 "webmasterATflymagnetic.com" <webmaster(a)flymagnetic.com> writes: > Thanks for the replies. OK I need to research arrays. I tried > compiling this and running it again (yes it is CLisp on Linux), and it > took even longer! > > [22]> (compile 'read-file) > Cheers! Remember: (defun optimize-for-speed (item) (etypecase item ((or symbol function) (compile item)) (cons (compile nil item)) ; (lambda ...) ((or pathname string) (compile-file item)))) Once you've removed the inefficiencies of your algorithm and let the compiler optimize your code, when comparing with most other programming language there remains one difference in Lisp: it uses a CHARACTER data type and converts between bytes sequences and characters, which may take some more time than just processing octets. For most fast processing, if you don't really need characters, you may instead read bytes: (with-open-file (... :element-type '(unsigned-byte 8)) ...) Of course, you cannot use read-line, read, read-char anymore, you have to use read-byte or read-sequence. Notice however, that you wouldn't even lose much processing bytes instead of characters in CL, since most functions you would use on strings are actually function appliable on vectors or sequences anyways. mismatch, subseq, replace, etc... The only thing you may feel a need for is some way to entre literal sequences of bytes from an ASCII representation, like in C. For this, you can write a simple reader macro, such as #"ABCD" --> #(65 66 67 68) so that you can write things like: (if (equal (subseq buffer 0 4) #"HELO") (start-up)) -- __Pascal Bourguignon__ http://www.informatimago.com/
From: Norbert_Paul on 25 Jun 2010 02:34 szabolcs.szucs wrote: > Append is the key. Because of you collecting lines into a list which > is a simply linked one > (afaik) when appending to the end, you have to iterate over the list > every time. This makes the solution O(n^2).. A somewhat better > solution is to (cons line retval) and return (reverse retval) which > makes stuff linear (O(n)), although consider using arrays. True. Even better (faster) is returning (nreverse retval).
From: Tamas K Papp on 25 Jun 2010 04:13 On Fri, 25 Jun 2010 08:34:56 +0200, Norbert_Paul wrote: > szabolcs.szucs wrote: >> Append is the key. Because of you collecting lines into a list which is >> a simply linked one >> (afaik) when appending to the end, you have to iterate over the list >> every time. This makes the solution O(n^2).. A somewhat better solution >> is to (cons line retval) and return (reverse retval) which makes stuff >> linear (O(n)), although consider using arrays. > True. > Even better (faster) is returning (nreverse retval). Collecting via loop/iter is usually at least as fast as using nreverse (in my experience, on SBCL). Given that the former provide a more natural idiom, I rarely every use nreverse for this. Tamas
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 Prev: hunchentoot doesn't show a picture Next: Emacs Form Feed (^L) Display Suggestion and Tips |