From: webmasterATflymagnetic.com on
Hi,

I use the following function to return a text file as a list:

(defun read-file (file)
(with-open-file (stream file)
(do ((retval)
(line (read-line stream nil)
(read-line stream nil)))
((null line) retval)
(setf retval (append retval (list line))))))

I have three files I've tested this on:

Name Size Lines
file1 9,223 214
file2 2,116,283 31,342
file3 5,648,093 105,311

[10]> (time (progn (read-file "file1") nil))
Real time: 0.010914 sec.
Run time: 0.012001 sec.
Space: 201712 Bytes
GC: 1, GC time: 0.008001 sec.
[11]> (time (progn (read-file "file2") nil))
Real time: 41.51742 sec.
Run time: 41.062565 sec.
Space: 3931805312 Bytes
GC: 2268, GC time: 29.117828 sec.
NIL
[12]> (time (progn (read-file "file3") nil))
Real time: 610.77686 sec.
Run time: 595.9412 sec.
Space: 44368777352 Bytes
GC: 23409, GC time: 443.23578 sec.
NIL

So the largest file, with 105,000 lines (5.6M) took nearly 10 minutes.
I accept that there is some work to do to get it into a list format,
but clearly the way I have approached this must be wrong. This sort of
performance is abysmal.

I would be interested in people's views on how I should be doing this,
and comments as to why what I am seeing is so poor.

Yes it works, and as I'm still relatively new to Lisp this is the way
I would see Lisp interacting with the outside world -- ie get the data
into a list and then it can be worked on using the normal Lisp
functions. I strongly suspect I must be missing something, but can't
see what it is. My code will need to work with files sometimes much
larger than 5MB, but clearly this approach will not work.

So any advice is very much appreciated.

Phil
From: szabolcs.szucs on
Append is the key. Because of you collecting lines into a list which
is a simply linked one
(afaik) when appending to the end, you have to iterate over the list
every time. This makes the solution O(n^2).. A somewhat better
solution is to (cons line retval) and return (reverse retval) which
makes stuff linear (O(n)), although consider using arrays.

Cheers,
Sz

On Jun 24, 12:29 pm, "webmasterATflymagnetic.com"
<webmas...(a)flymagnetic.com> wrote:
> Hi,
>
> I use the following function to return a text file as a list:
>
> (defun read-file (file)
>   (with-open-file (stream file)
>       (do ((retval)
>            (line (read-line stream nil)
>                  (read-line stream nil)))
>           ((null line) retval)
>           (setf retval (append retval (list line))))))
>
> I have three files I've tested this on:
>
> Name Size Lines
> file1 9,223 214
> file2 2,116,283 31,342
> file3 5,648,093 105,311
>
> [10]> (time (progn (read-file "file1") nil))
> Real time: 0.010914 sec.
> Run time: 0.012001 sec.
> Space: 201712 Bytes
> GC: 1, GC time: 0.008001 sec.
> [11]> (time (progn (read-file "file2") nil))
> Real time: 41.51742 sec.
> Run time: 41.062565 sec.
> Space: 3931805312 Bytes
> GC: 2268, GC time: 29.117828 sec.
> NIL
> [12]> (time (progn (read-file "file3") nil))
> Real time: 610.77686 sec.
> Run time: 595.9412 sec.
> Space: 44368777352 Bytes
> GC: 23409, GC time: 443.23578 sec.
> NIL
>
> So the largest file, with 105,000 lines (5.6M) took nearly 10 minutes.
> I accept that there is some work to do to get it into a list format,
> but clearly the way I have approached this must be wrong. This sort of
> performance is abysmal.
>
> I would be interested in people's views on how I should be doing this,
> and comments as to why what I am seeing is so poor.
>
> Yes it works, and as I'm still relatively new to Lisp this is the way
> I would see Lisp interacting with the outside world -- ie get the data
> into a list and then it can be worked on using the normal Lisp
> functions. I strongly suspect I must be missing something, but can't
> see what it is. My code will need to work with files sometimes much
> larger than 5MB, but clearly this approach will not work.
>
> So any advice is very much appreciated.
>
> Phil

From: Zach Beane on
"webmasterATflymagnetic.com" <webmaster(a)flymagnetic.com> writes:

> Hi,
>
> I use the following function to return a text file as a list:
>
> (defun read-file (file)
> (with-open-file (stream file)
> (do ((retval)
> (line (read-line stream nil)
> (read-line stream nil)))
> ((null line) retval)
> (setf retval (append retval (list line))))))
>
> I have three files I've tested this on:
>
> Name Size Lines
> file1 9,223 214
> file2 2,116,283 31,342
> file3 5,648,093 105,311
>
> [10]> (time (progn (read-file "file1") nil))
> Real time: 0.010914 sec.
> Run time: 0.012001 sec.
> Space: 201712 Bytes
> GC: 1, GC time: 0.008001 sec.
> [11]> (time (progn (read-file "file2") nil))
> Real time: 41.51742 sec.
> Run time: 41.062565 sec.
> Space: 3931805312 Bytes
> GC: 2268, GC time: 29.117828 sec.
> NIL
> [12]> (time (progn (read-file "file3") nil))
> Real time: 610.77686 sec.
> Run time: 595.9412 sec.
> Space: 44368777352 Bytes
> GC: 23409, GC time: 443.23578 sec.
> NIL
>
> So the largest file, with 105,000 lines (5.6M) took nearly 10 minutes.
> I accept that there is some work to do to get it into a list format,
> but clearly the way I have approached this must be wrong. This sort of
> performance is abysmal.
>
> I would be interested in people's views on how I should be doing this,
> and comments as to why what I am seeing is so poor.

That output looks like GNU CLISP. Did you compile the function before
timing it?

Zach
From: Teemu Likonen on
* 2010-06-24 03:29 (-0700), webmasterATflymagnetic com wrote:

> (defun read-file (file)
> (with-open-file (stream file)
> (do ((retval)
> (line (read-line stream nil)
> (read-line stream nil)))
> ((null line) retval)
> (setf retval (append retval (list line))))))

It's _much_ faster to add items to the front of list. So, instead of

(setf retval (append retval (list line)))

You should do

(push line retval)

and then reverse the list when exiting the loop:

(nreverse retval)

The function could look like this:

(defun read-file (file)
(with-open-file (stream file)
(do (retval
(line (read-line stream nil)
(read-line stream nil)))
((null line)
(nreverse retval))
(push line retval))))

Although I usually prefer the LOOP macro -

(defun read-file (file)
(with-open-file (stream file)
(loop for line = (read-line stream nil)
while line collect line)))

- which automatically uses fast methods for building lists.
From: Rainer Joswig on
"webmasterATflymagnetic.com" <webmaster(a)flymagnetic.com> wrote:
> Hi,
....

> (setf retval (append retval (list line))))))
>

one of the most common mistakes when working with singly linked lists
made of cons cells is to append to the end of the list - in loops and
recursive calls.

The usual approaches to work around this:

- use a data structure that does not have this problem

- cons to the front of the list and at after the loop reverse the list
once

Or you can use also LOOP instead of DO:

(loop for line = (read-line stream nil nil) while line collect line)