From: webmasterATflymagnetic.com on
On Jun 24, 11:46 am, Zach Beane <x...(a)xach.com> wrote:
> "webmasterATflymagnetic.com" <webmas...(a)flymagnetic.com> writes:
> > Hi,
>
> > I use the following function to return a text file as a list:
>
> > (defun read-file (file)
> >   (with-open-file (stream file)
> >       (do ((retval)
> >            (line (read-line stream nil)
> >                  (read-line stream nil)))
> >           ((null line) retval)
> >           (setf retval (append retval (list line))))))
>
> > I have three files I've tested this on:
>
> > Name Size Lines
> > file1 9,223 214
> > file2 2,116,283 31,342
> > file3 5,648,093 105,311
>
> > [10]> (time (progn (read-file "file1") nil))
> > Real time: 0.010914 sec.
> > Run time: 0.012001 sec.
> > Space: 201712 Bytes
> > GC: 1, GC time: 0.008001 sec.
> > [11]> (time (progn (read-file "file2") nil))
> > Real time: 41.51742 sec.
> > Run time: 41.062565 sec.
> > Space: 3931805312 Bytes
> > GC: 2268, GC time: 29.117828 sec.
> > NIL
> > [12]> (time (progn (read-file "file3") nil))
> > Real time: 610.77686 sec.
> > Run time: 595.9412 sec.
> > Space: 44368777352 Bytes
> > GC: 23409, GC time: 443.23578 sec.
> > NIL
>
> > So the largest file, with 105,000 lines (5.6M) took nearly 10 minutes.
> > I accept that there is some work to do to get it into a list format,
> > but clearly the way I have approached this must be wrong. This sort of
> > performance is abysmal.
>
> > I would be interested in people's views on how I should be doing this,
> > and comments as to why what I am seeing is so poor.
>
> That output looks like GNU CLISP. Did you compile the function before
> timing it?
>
> Zach

Thanks for the replies. OK I need to research arrays. I tried
compiling this and running it again (yes it is CLisp on Linux), and it
took even longer!

[22]> (compile 'read-file)
READ-FILE ;
NIL ;
NIL
[23]> (time (progn (read-file "file1") nil))
Real time: 0.017634 sec.
Run time: 0.020001 sec.
Space: 201704 Bytes
GC: 1, GC time: 0.012001 sec.
NIL
[24]> (time (progn (read-file "file2") nil))
Real time: 41.995705 sec.
Run time: 41.734608 sec.
Space: 3931805304 Bytes
GC: 3151, GC time: 29.84985 sec.
NIL
[25]> (time (progn (read-file "file3") nil))
Real time: 617.9405 sec.
Run time: 611.1302 sec.
Space: 44368777344 Bytes
GC: 23051, GC time: 452.03613 sec.
NIL


Cheers!

Phil
From: webmasterATflymagnetic.com on
On Jun 24, 12:05 pm, Rainer Joswig <jos...(a)lisp.de> wrote:
> "webmasterATflymagnetic.com" <webmas...(a)flymagnetic.com> wrote:
> one of the most common mistakes when working with singly linked lists

Thanks! Nice to know I've fallen into a common trap. I've not looked
at LOOP as I've always been able to work with DO, but I do intend to
devote some time to learning the extended LOOP syntax at some point
(trouble is that point is always in the future).

OK, taken the advice about cons-ing and reversing the list:

(defun read-file (file)
(with-open-file (stream file)
(do ((retval)
(line (read-line stream nil)
(read-line stream nil)))
((null line) (reverse retval))
(push line retval))))

[30]> (time (progn (read-file "file1") nil))
Real time: 0.001456 sec.
Run time: 0.0 sec.
Space: 21064 Bytes
NIL
[31]> (time (progn (read-file "file2") nil))
Real time: 0.168605 sec.
Run time: 0.15201 sec.
Space: 2897552 Bytes
GC: 1, GC time: 0.016 sec.
NIL
[32]> (time (progn (read-file "file3") nil))
Real time: 0.558222 sec.
Run time: 0.552034 sec.
Space: 8414192 Bytes
GC: 3, GC time: 0.104006 sec.
NIL


Well impressed!

Thanks all for speedy and erudite responses.

Phil
From: Pascal J. Bourguignon on
"webmasterATflymagnetic.com" <webmaster(a)flymagnetic.com> writes:
> Thanks for the replies. OK I need to research arrays. I tried
> compiling this and running it again (yes it is CLisp on Linux), and it
> took even longer!
>
> [22]> (compile 'read-file)
> Cheers!

Remember:

(defun optimize-for-speed (item)
(etypecase item
((or symbol function) (compile item))
(cons (compile nil item)) ; (lambda ...)
((or pathname string) (compile-file item))))





Once you've removed the inefficiencies of your algorithm and let the
compiler optimize your code, when comparing with most other
programming language there remains one difference in Lisp: it uses a
CHARACTER data type and converts between bytes sequences and
characters, which may take some more time than just processing octets.

For most fast processing, if you don't really need characters, you may instead read bytes:

(with-open-file (... :element-type '(unsigned-byte 8)) ...)

Of course, you cannot use read-line, read, read-char anymore, you have
to use read-byte or read-sequence.

Notice however, that you wouldn't even lose much processing bytes
instead of characters in CL, since most functions you would use on
strings are actually function appliable on vectors or sequences
anyways. mismatch, subseq, replace, etc...

The only thing you may feel a need for is some way to entre literal
sequences of bytes from an ASCII representation, like in C. For this,
you can write a simple reader macro, such as #"ABCD" --> #(65 66 67 68)

so that you can write things like:

(if (equal (subseq buffer 0 4) #"HELO")
(start-up))



--
__Pascal Bourguignon__ http://www.informatimago.com/
From: Norbert_Paul on
szabolcs.szucs wrote:
> Append is the key. Because of you collecting lines into a list which
> is a simply linked one
> (afaik) when appending to the end, you have to iterate over the list
> every time. This makes the solution O(n^2).. A somewhat better
> solution is to (cons line retval) and return (reverse retval) which
> makes stuff linear (O(n)), although consider using arrays.
True.
Even better (faster) is returning (nreverse retval).
From: Tamas K Papp on
On Fri, 25 Jun 2010 08:34:56 +0200, Norbert_Paul wrote:

> szabolcs.szucs wrote:
>> Append is the key. Because of you collecting lines into a list which is
>> a simply linked one
>> (afaik) when appending to the end, you have to iterate over the list
>> every time. This makes the solution O(n^2).. A somewhat better solution
>> is to (cons line retval) and return (reverse retval) which makes stuff
>> linear (O(n)), although consider using arrays.
> True.
> Even better (faster) is returning (nreverse retval).

Collecting via loop/iter is usually at least as fast as using nreverse
(in my experience, on SBCL). Given that the former provide a more
natural idiom, I rarely every use nreverse for this.

Tamas