From: PerlFAQ Server on 4 Jul 2010 00:00

This is an excerpt from the latest version of perlfaq5.pod, which comes with the standard Perl distribution. These postings aim to reduce the number of repeated questions as well as allow the community to review and update the answers. The latest version of the complete perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.29: How can I read in an entire file all at once?

Are you sure you want to read the entire file and store it in memory? If you mmap the file, you can virtually load the entire file into a string without actually storing it in memory:

    use File::Map qw(map_file);

    map_file my $string, $filename;

Once mapped, you can treat $string as you would any other string. Since you don't actually load the data, mmap-ing is very fast and does not increase your memory footprint.

If you really want to load the entire file, you can use the "File::Slurp" module to do it in one step.

    use File::Slurp;

    my $all_of_it = read_file($filename); # entire file in scalar
    my @all_lines = read_file($filename); # one line per element

The customary Perl approach for processing all the lines in a file is to do so one line at a time:

    open my $input, '<', $file or die "can't open $file: $!";
    while (<$input>) {
        chomp;
        # do something with $_
    }
    close $input or die "can't close $file: $!";

This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

    my @lines = <INPUT>;

you should think long and hard about why you need everything loaded at once. It's just not a scalable solution.

You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.

You can read the entire filehandle contents into a scalar:

    {
        local $/;
        open my $fh, '<', $file or die "can't open $file: $!";
        $var = <$fh>;
    }

That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:

    $var = do { local $/; <$fh> };

For ordinary files you can also use the read function.

    read( $fh, $var, -s $fh );

The third argument is the byte size of the data on the $fh filehandle, so read fills the buffer $var with exactly that many bytes.

--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They are not necessarily experts in every domain where Perl might show up, so please include as much information as possible and relevant in any corrections. The perlfaq-workers also don't have access to every operating system or platform, so please include relevant details for corrections to examples that do not work on particular platforms. Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in perlfaq.pod.
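For completeness, here is a minimal sketch of the Tie::File approach mentioned above. It assumes $file holds the path to a plain text file you are allowed to modify:

    use Tie::File;

    # each element of @lines corresponds to one line of the file,
    # read and written on demand rather than loaded all at once
    tie my @lines, 'Tie::File', $file or die "can't tie $file: $!";

    print "first line: $lines[0]\n";    # reads only that line
    $lines[-1] = 'replacement text';    # rewrites the last line in place

    untie @lines;                       # flush changes and release the file

Because only the lines you touch are read, this can keep the memory footprint small even for large files.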
From: Uri Guttman on 4 Jul 2010 01:15

brian, here are some edits and comments for this faq:

>>>>> "PS" == PerlFAQ Server <brian(a)theperlreview.com> writes:

  PS> 5.29: How can I read in an entire file all at once?

  PS> Are you sure you want to read the entire file and store it in memory? If
  PS> you mmap the file, you can virtually load the entire file into a string
  PS> without actually storing it in memory:

Reading in an entire file at one time can be useful and more efficient, provided the file is small enough. With modern systems, even a 1MB file can be considered small, and almost all common text files and many others are less than 1MB. Also, some files need to be processed as whole entities (e.g. image formats) and are best loaded into a scalar.

  PS>     use File::Map qw(map_file);

  PS>     map_file my $string, $filename;

  PS> Once mapped, you can treat $string as you would any other
  PS> string. Since you don't actually load the data, mmap-ing is
  PS> very fast and does not increase your memory footprint.

i disagree with that last point. mmap always needs virtual ram allocated for the entire file to be mapped. it only saves ram if you map part of the file into a smaller virtual window. the win of mmap is that it won't do the i/o until you touch a section. so if you want random access to sections of a file, mmap is a big win. if you are going to just process the whole file, there isn't any real win over File::Slurp.

  PS> If you really want to load the entire file, you can use the
  PS> "File::Slurp" module to do it in one step.

If you decide to load the entire file, you can use the "File::Slurp" module to do it in one simple and efficient step.

  PS>     use File::Slurp;

  PS>     my $all_of_it = read_file($filename); # entire file in scalar
  PS>     my @all_lines = read_file($filename); # one line per element

  PS> The customary Perl approach for processing all the lines in a file is to
  PS> do so one line at a time:

  PS>     open my $input, '<', $file or die "can't open $file: $!";
  PS>     while (<$input>) {
  PS>         chomp;
  PS>         # do something with $_
  PS>     }
  PS>     close $input or die "can't close $file: $!";

  PS> This is tremendously more efficient than reading the entire file into
  PS> memory as an array of lines and then processing it one element at a
  PS> time, which is often--if not almost always--the wrong approach. Whenever
  PS> you see someone do this:

again, i disagree. you can easily benchmark slurping an array of lines and looping vs line by line reading. the win with slurping (with File::Slurp) is bypassing perl's i/o layer. the looping overhead is the same, and the ram overhead isn't much for most files, as i have said above.

also, some parsing or regex stuff is MUCH faster with whole files in ram. a single s///g done over a whole file in a scalar is way faster than doing it over each line in a loop. parsing and munging whole files can be much easier too, as you can do multiline matches and such. here is a super fast way to read and parse a simple config file (key: value lines):

    use File::Slurp ;

    my %config = read_file( $conf_file ) =~ /^(\w+):\s*(.+)$/mg ;

doing that line by line takes more code and is much slower, as you need to call the regex for each line.

  PS>     my @lines = <INPUT>;

  PS> You can read the entire filehandle contents into a scalar.

  PS>     {
  PS>         local $/;
  PS>         open my $fh, '<', $file or die "can't open $file: $!";
  PS>         $var = <$fh>;
  PS>     }

  PS> That temporarily undefs your record separator, and will automatically
  PS> close the file at block exit. If the file is already open, just use
  PS> this:

  PS>     $var = do { local $/; <$fh> };

you missed the coolest variant:

    my $text = do { local( @ARGV, $/ ) = $file ; <> };

no open needed!

other than File::Slurp not being in core (and it should be! :), there is no reason to show the $/ = undef trick. it is always slower and more obscure than calling read_file (which also does better error handling and has more options).

  PS> For ordinary files you can also use the read function.

  PS>     read( $fh, $var, -s $fh );

might as well use sysread as it is faster and has the same api. read is almost never needed unless you are doing block reads on a file and mixing in line reads (they share the perl stdio).

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
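For reference, a minimal sketch of the sysread variant Uri suggests above, assuming an ordinary, seekable file whose size -s can report:

    open my $fh, '<', $file or die "can't open $file: $!";
    binmode $fh;                       # sysread bypasses PerlIO, so read raw bytes

    my $size = -s $fh;                 # byte size of the file
    defined( sysread $fh, my $contents, $size )
        or die "can't read $file: $!";

As with read, the length argument comes from -s, so this only works for ordinary files whose size is known up front, not for pipes or sockets.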
From: Eric Pozharski on 5 Jul 2010 03:35

with <87mxu7j08z.fsf(a)quad.sysarch.com> Uri Guttman wrote:
>>>>>> "PS" == PerlFAQ Server <brian(a)theperlreview.com> writes:
*SKIP*
> PS>     {
> PS>         local $/;
> PS>         open my $fh, '<', $file or die "can't open $file: $!";
> PS>         $var = <$fh>;
> PS>     }
>
> PS> That temporarily undefs your record separator, and will automatically
> PS> close the file at block exit. If the file is already open, just use
> PS> this:
>
> PS>     $var = do { local $/; <$fh> };
>
> you missed the coolest variant:
>
>     my $text = do { local( @ARGV, $/ ) = $file ; <> };
>
> no open needed!
>
> other than File::Slurp not being in core (and it should be! :), there is
> no reason to show the $/ = undef trick. it is always slower and more
> obscure than calling read_file (which also does better error handling
> and has more options).
>
> PS> For ordinary files you can also use the read function.
>
> PS>     read( $fh, $var, -s $fh );
>
> might as well use sysread as it is faster and has the same api. read is
> almost never needed unless you are doing block reads on a file and
> mixing in line reads (they share the perl stdio).

Please reconsider your 'always slower':

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark qw{ cmpthese timethese };
    use File::Slurp;

    my $fname = '/etc/passwd';
    read_file $fname;    # read the file once before timing

    cmpthese timethese -5, {
        code00 => sub { my $aa = read_file $fname; },
        code01 => sub {
            local $/;
            open my $fh, '<', $fname or die $!;
            my $aa = <$fh>;
        },
        code02 => sub { local( @ARGV, $/ ) = $fname; my $aa = <>; },
        code03 => sub {
            open my $fh, '<', $fname or die $!;
            defined read $fh, my $aa, -s $fh or die $!;
        },
        code04 => sub {
            open my $fh, '<', $fname or die $!;
            defined sysread $fh, my $aa, -s $fh or die $!;
        },
    };

    __END__

Benchmark: running code00, code01, code02, code03, code04 for at least 5 CPU seconds...
    code00:  5 wallclock secs ( 3.34 usr +  2.01 sys =  5.35 CPU) @ 31214.95/s (n=167000)
    code01:  5 wallclock secs ( 2.82 usr +  2.45 sys =  5.27 CPU) @ 41757.50/s (n=220062)
    code02:  5 wallclock secs ( 2.58 usr +  2.68 sys =  5.26 CPU) @ 43446.01/s (n=228526)
    code03:  5 wallclock secs ( 2.60 usr +  2.69 sys =  5.29 CPU) @ 47371.08/s (n=250593)
    code04:  4 wallclock secs ( 2.36 usr +  3.02 sys =  5.38 CPU) @ 52458.92/s (n=282229)
              Rate code00 code01 code02 code03 code04
    code00 31215/s     --   -25%   -28%   -34%   -40%
    code01 41757/s    34%     --    -4%   -12%   -20%
    code02 43446/s    39%     4%     --    -8%   -17%
    code03 47371/s    52%    13%     9%     --   -10%
    code04 52459/s    68%    26%    21%    11%     --

And that's for s{/etc/passwd}{/boot/vmlinuz}:

Benchmark: running code00, code01, code02, code03, code04 for at least 5 CPU seconds...
    code00:  5 wallclock secs ( 1.45 usr +  3.96 sys =  5.41 CPU) @ 223.84/s (n=1211)
    code01:  5 wallclock secs ( 2.08 usr +  3.06 sys =  5.14 CPU) @ 365.18/s (n=1877)
    code02:  6 wallclock secs ( 2.16 usr +  3.00 sys =  5.16 CPU) @ 366.28/s (n=1890)
    code03:  6 wallclock secs ( 2.12 usr +  3.24 sys =  5.36 CPU) @ 372.20/s (n=1995)
    code04:  5 wallclock secs ( 0.12 usr +  5.16 sys =  5.28 CPU) @ 583.14/s (n=3079)
            Rate code00 code01 code02 code03 code04
    code00 224/s     --   -39%   -39%   -40%   -62%
    code01 365/s    63%     --    -0%    -2%   -37%
    code02 366/s    64%     0%     --    -2%   -37%
    code03 372/s    66%     2%     2%     --   -36%
    code04 583/s   161%    60%    59%    57%     --

Although, brian, please consider cleaning up that entry a bit. Those who can read would find their way; those who can't wouldn't read that anyway.

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
From: Uri Guttman on 5 Jul 2010 11:52

>>>>> "EP" == Eric Pozharski <whynot(a)pozharski.name> writes:

  EP> Please reconsider your 'always slower':

try the pass by scalar reference method of read_file. and check out the much more comprehensive benchmark script that comes with the module. that was also redone in an unreleased version you can find on git at perlhunter.com/git. for one thing it uses better names so you can see what the results mean.

uri

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
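A sketch of the pass-by-reference call mentioned here, assuming a File::Slurp recent enough to support the buf_ref and scalar_ref options:

    use File::Slurp;

    my $text;
    read_file( $file, buf_ref => \$text );   # fill $text directly, avoiding an extra copy

    # or ask for a scalar reference back instead of a flat copy
    my $text_ref = read_file( $file, scalar_ref => 1 );

By default read_file croaks on errors, so there is no separate error check in this sketch.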
From: brian d foy on 5 Jul 2010 14:31
In article <87mxu7j08z.fsf(a)quad.sysarch.com>, Uri Guttman <uri(a)StemSystems.com> wrote:

> other than File::Slurp not being in core (and it should be! :), there is
> no reason to show the $/ = undef trick.

That's a pretty big reason though.