From: Alex Dowad on 30 Apr 2010 13:20

Hi, this is my first post on ruby-forum. Hope this is useful to someone!

I have learned from experience to avoid reading files using File.read(filename)... it gives terrible performance on even moderately large files. Reading large files line by line is much faster, and uses much less memory.

However, there are cases when you do want the entire file in a single string. I just discovered that you can do this MUCH faster with File.read(filename, File.size(filename))... check this out:

    > File.size(bigfile) # not really that big... just 10 MB
    => 10531519
    > Benchmark.bm do |bm|
    *   bm.report("straight read") { File.read(bigfile) }
    *   bm.report("read w/ size") { File.read(bigfile,File.size(bigfile)) }
    * end
                       user     system      total        real
    straight read 28.875000  18.032000  46.907000 ( 47.812500)
    read w/ size   0.000000   0.031000   0.031000 (  0.031250)

...for just a *moderate* 1500x boost in performance.

I believe that these are the offending lines, in io.c:

    1622: static VALUE
    1623: read_all(rb_io_t *fptr, long siz, VALUE str)
          ... intervening lines omitted...
    1668:     siz += BUFSIZ;
    1669:     rb_str_resize(str, siz);

It appears that the buffer is being grown linearly, giving O(n^2) performance. If this is the case, switching to an exponential growth strategy should give O(n) performance instead; a BIG improvement.

Is there a good reason why the code is written this way? Could it really be an oversight? Seems hard to believe. Comments please!

Alex Dowad
--
Posted via http://www.ruby-forum.com/.
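To make the O(n^2) versus O(n) argument above concrete, here is a rough Ruby sketch that only counts bytes copied under each growth strategy. It is a model, not the io.c code: it assumes the worst case where every resize moves the whole buffer (a real realloc can sometimes extend in place), and both the helper name bytes_copied and the BUFSIZ value of 8192 are assumptions, since BUFSIZ is platform-defined.

```ruby
# Worst-case model: every resize copies the entire existing buffer.
# `bytes_copied` is a hypothetical helper, not part of Ruby or io.c.
def bytes_copied(total_bytes, initial_capacity)
  capacity = initial_capacity
  copied = 0
  while capacity < total_bytes
    copied += capacity          # existing contents move to the new buffer
    capacity = yield(capacity)  # the block decides the next capacity
  end
  copied
end

BUFSIZ    = 8192        # assumed value; the real constant depends on the platform
FILE_SIZE = 10_531_519  # the file size quoted in the post above

# Linear growth: add one BUFSIZ each time, as in `siz += BUFSIZ`.
def linear_copy_cost
  bytes_copied(FILE_SIZE, BUFSIZ) { |cap| cap + BUFSIZ }
end

# Geometric growth: double the capacity each time.
def geometric_copy_cost
  bytes_copied(FILE_SIZE, BUFSIZ) { |cap| cap * 2 }
end

puts "linear growth copies    #{linear_copy_cost} bytes"
puts "geometric growth copies #{geometric_copy_cost} bytes"
```

Under this model the doubling strategy never copies more than about twice the final file size in total, while the linear strategy's copy count grows with the square of the file size.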
From: Roger Pack on 30 Apr 2010 14:43

>                    user     system      total        real
> straight read 28.875000  18.032000  46.907000 ( 47.812500)
> read w/ size   0.000000   0.031000   0.031000 (  0.031250)
>
> ...for just a *moderate* 1500x boost in performance.
...
> It appears that the buffer is being grown linearly, giving O(n^2)
> performance.

Yeah this is true. I think it has been fixed in ruby trunk, though (try it out there). Also if you're on windows try binread.
-rp
From: Roger Pack on 30 Apr 2010 16:22

Roger Pack wrote:
>>                    user     system      total        real
>> straight read 28.875000  18.032000  46.907000 ( 47.812500)
>> read w/ size   0.000000   0.031000   0.031000 (  0.031250)
>>
>> ...for just a *moderate* 1500x boost in performance.
> ...
>> It appears that the buffer is being grown linearly, giving O(n^2)
>> performance.

I'm unable to reproduce this except on Windows, so that's where you are, I assume?

Yeah, unfortunately on Windows it does exactly what you described. The exception is trunk, where it has been fixed, though there is still a bit of slowdown when you read files in ascii+translation mode (see the last post of this thread: http://www.ruby-forum.com/topic/182875#new).

It does seem that the idea results in some speedup, though:

linux 1.9.2, 500MB file
                user     system      total        real
normal      0.130000   0.810000   0.940000 (  0.938607)
optimized   0.000000   0.740000   0.740000 (  0.749340)

windows 1.9.2, 500MB file
                user     system      total        real
normal      0.250000   0.671000   0.921000 (  0.921829)
optimized   0.000000   0.764000   0.764000 (  0.774697)

plus it results in a huge increase in speed for ascii mode on Windows.

ruby 1.9.2dev (2010-05-01) [i386-mingw32]
                user     system      total        real
normal     11.342000   0.718000  12.060000 ( 12.735092)
optimized   0.000000   0.437000   0.437000 (  0.446179)

(maybe there is still some N^2 action going on?)

I'll file a feature request for it.
-rp
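For anyone who wants to repeat this kind of comparison on their own machine, a minimal sketch follows. It times a plain File.read, a File.read with the size passed in, and File.binread against a temporary file. The helper name time_read_strategies and the 5 MB test size are assumptions for illustration; the numbers will differ by platform and Ruby version and need not resemble the figures quoted in this thread.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper: returns wall-clock seconds for three ways of
# reading the same temporary file of `size_in_bytes` bytes.
def time_read_strategies(size_in_bytes)
  Tempfile.create("read_bench") do |f|
    path = f.path
    File.binwrite(path, "x" * size_in_bytes)  # ASCII-only filler data
    {
      "File.read"           => Benchmark.realtime { File.read(path) },
      "File.read with size" => Benchmark.realtime { File.read(path, File.size(path)) },
      "File.binread"        => Benchmark.realtime { File.binread(path) }
    }
  end
end

time_read_strategies(5 * 1024 * 1024).each do |label, seconds|
  puts format("%-20s %.6f s", label, seconds)
end
```

A single run like this is noisy because of disk caching, so it is worth repeating it a few times before drawing conclusions.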
From: Alex Dowad on 1 May 2010 01:48

Thanks for your reply, Roger!

> I'm unable to reproduce this except on windows, so that's where you are,
> I assume?

Yes. Sorry, I'll make that clear next time I post.

> ...
> I'll file a feature request for it.

Thanks! There's no point in growing a buffer dynamically, when you know from the beginning how many bytes you need to store in it. Rather than chaining directly to IO.read, File.read could easily pass the file size along if no "length" argument is passed in.

Alex Dowad
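The suggestion above can be approximated in user code today. This is only a sketch of the idea, not a patch to File.read: read_with_size is a hypothetical helper name, and note that passing a length makes Ruby return a binary (ASCII-8BIT) string rather than one tagged with the default external encoding.

```ruby
require "tempfile"

# Hypothetical helper (not part of Ruby): read a whole file, passing its
# size as the length so the read buffer can be sized up front.
# The result is a binary (ASCII-8BIT) string because a length is given.
def read_with_size(path)
  File.read(path, File.size(path))
end

# Small demonstration on a temporary file with ASCII-only content.
def demo_read_with_size
  Tempfile.create("read_with_size") do |f|
    File.binwrite(f.path, "abc" * 1000)
    read_with_size(f.path)
  end
end

puts demo_read_with_size.bytesize
```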
From: Michel Demazure on 1 May 2010 04:20

Alex Dowad wrote:
> Thanks for your reply, Roger!
>
>> I'm unable to reproduce this except on windows, so that's where you are,
>> I assume?
>
> Yes. Sorry, I'll make that clear next time I post.
>

In my case, on Windows with 1.9.1 and a UTF-8 file, the two commands are not equivalent. Replacing File.read(file) with File.read(file, File.size(file)) raises encoding errors. They behave differently with respect to encodings.
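One difference that can be shown directly: when a length is passed, the returned string is tagged as binary (ASCII-8BIT) instead of UTF-8, so later operations that mix it with UTF-8 text can fail. Whether this is exactly what caused the errors reported above on 1.9.1 is an assumption. The sketch below uses a hypothetical helper, compare_read_encodings, and relabels the bytes with force_encoding as one possible workaround.

```ruby
require "tempfile"

# Hypothetical helper: returns [text, raw, fixed] for a small UTF-8 file.
#   text  - read with an explicit UTF-8 encoding
#   raw   - read with a length argument (comes back as ASCII-8BIT)
#   fixed - the same bytes as raw, relabelled as UTF-8
def compare_read_encodings
  Tempfile.create("utf8_demo") do |f|
    File.binwrite(f.path, "h\u00E9llo w\u00F6rld")  # UTF-8 bytes with non-ASCII characters

    text  = File.read(f.path, encoding: "UTF-8")
    raw   = File.read(f.path, File.size(f.path))
    fixed = raw.dup.force_encoding("UTF-8")  # assumes the file really is UTF-8
    [text, raw, fixed]
  end
end

text, raw, fixed = compare_read_encodings
puts "text:  #{text.encoding}"
puts "raw:   #{raw.encoding}"
puts "fixed: #{fixed.encoding}"
```

force_encoding only changes the label on the bytes, so it is safe only when the file's actual encoding is known.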