From: Uri Guttman on 5 Jul 2010 15:28

>>>>> "bdf" == brian d foy <brian.d.foy(a)gmail.com> writes:

  bdf> In article <87mxu7j08z.fsf(a)quad.sysarch.com>, Uri Guttman
  bdf> <uri(a)StemSystems.com> wrote:

  >> other than file::slurp not being in core (and it should be! :), there is
  >> no reason to show the $/ = undef trick.

  bdf> That's a pretty big reason though.

the sysopen followed by sysread and -s is faster and less obscure, and you
already show that. the undef $/ trick is just poor coding imo. at the very
least, comment on the relative qualities of the methods.

my comments on the mmap are on point - it doesn't save ram and only wins
for random access.

tie::file is ok for some things, but for a simple read/modify/write it is
just as simple and faster to slurp/mung/write. you can work on an array in
both cases. one day i will release file::slurp with edit_file and
edit_file_lines, which will make that process even easier and faster.

uri

-- 
Uri Guttman  ------  uri(a)stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review, Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------
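A minimal sketch of the two whole-file reads being compared above: a sysread
sized with -s versus the undef $/ trick. This is illustrative code, not code
from the thread; the file name and sub names are made up.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = '/etc/passwd';    # any readable text file

    # sysread sized by -s: open, ask the size, pull it in with one call
    sub slurp_sysread {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        my $size = -s $fh;       # size in bytes of the open handle
        defined sysread( $fh, my $buf, $size )
            or die "sysread failed on $path: $!";
        return $buf;
    }

    # the $/ = undef trick: no input record separator, so <> returns it all
    sub slurp_undef_rs {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        local $/;                # undef $/ makes the whole file one "line"
        return <$fh>;
    }

    print length( slurp_sysread($file) ), " bytes via sysread\n";
    print length( slurp_undef_rs($file) ), " bytes via local \$/\n";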
From: Eric Pozharski on 6 Jul 2010 04:53

with <8739vy7wo4.fsf(a)quad.sysarch.com> Uri Guttman wrote:
>>>>>> "EP" == Eric Pozharski <whynot(a)pozharski.name> writes:
>
>   EP> Please reconsider your 'always slower':
>
> try the pass by scalar reference method of read_file.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw{ cmpthese timethese };
    use File::Slurp;

    my $fn = '/etc/passwd';
    cmpthese timethese -5, {
        code00 => sub { my $aa = read_file $fn; },
        code01 => sub { my $aa = read_file $fn, scalar_ref => 1; },
    };
    __END__
    Benchmark: running code00, code01 for at least 5 CPU seconds...
        code00:  6 wallclock secs ( 3.61 usr +  1.67 sys =  5.28 CPU) @ 33617.23/s (n=177499)
        code01:  6 wallclock secs ( 3.74 usr +  1.52 sys =  5.26 CPU) @ 33122.05/s (n=174222)
               Rate code01 code00
    code01 33122/s     --    -1%
    code00 33617/s     1%     --

What?  However... (s{/etc/passwd}{/boot/vmlinuz})

    Benchmark: running code00, code01 for at least 5 CPU seconds...
        code00:  6 wallclock secs ( 1.57 usr +  3.86 sys =  5.43 CPU) @ 222.65/s (n=1209)
        code01:  6 wallclock secs ( 0.23 usr +  5.04 sys =  5.27 CPU) @ 319.92/s (n=1686)
             Rate code00 code01
    code00  223/s     --   -30%
    code01  320/s    44%     --

That's pretty impressive.  Or not?  Look, if someone is going to play with
B<read_file>'s options, shouldn't he go with B<sysread> instead?  I can
hardly imagine someone trying to make B<read_file> as fast as possible
rather than making slurping itself fast.

> and check out the much more comprehensive benchmark script that comes
> with the module.

Yeah, cool stuff.  Although I wasn't told beforehand to make my terminal
250 columns wide, so it's still unreadable.

> and that was also redone in an unreleased version you can find on git
> at perlhunter.com/git.

Concentrate.  Talking about 'unreleased' is lame.

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
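What the scalar_ref option is buying in the benchmark above: read_file hands
back a reference to the slurped contents instead of the string itself, so the
large-file case skips a copy on return. A brief usage sketch, assuming
File::Slurp is installed; the file name is illustrative.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp qw( read_file );

    my $fn = '/etc/passwd';                     # any readable file

    my $copy = read_file($fn);                  # contents returned as a string
    my $ref  = read_file($fn, scalar_ref => 1); # reference, no copy on return

    print 'copied: ', length $copy, " bytes\n";
    print 'ref:    ', length $$ref, " bytes\n"; # dereference to reach the text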
From: brian d foy on 23 Jul 2010 14:56

In article <87mxu7j08z.fsf(a)quad.sysarch.com>, Uri Guttman
<uri(a)StemSystems.com> wrote:

> i disagree with that last point. mmap always needs virtual ram allocated
> for the entire file to be mapped. it only saves ram if you map part of
> the file into a smaller virtual window.

I haven't found that to be the case for program memory at least. If you
copy parts of the file you have to copy, but

> again, i disagree. you can easily benchmark slurping an array of lines
> and looping vs line by line reading.

Well, the tension there is the trade-off between space and memory. I could
make that more clear I guess.

I will look at some benchmarks, though, and see how that illuminates the
situation.
From: Uri Guttman on 23 Jul 2010 15:27

>>>>> "bdf" == brian d foy <brian.d.foy(a)gmail.com> writes:

  bdf> In article <87mxu7j08z.fsf(a)quad.sysarch.com>, Uri Guttman
  bdf> <uri(a)StemSystems.com> wrote:

  >> i disagree with that last point. mmap always needs virtual ram allocated
  >> for the entire file to be mapped. it only saves ram if you map part of
  >> the file into a smaller virtual window.

  bdf> I haven't found that to be the case for program memory at least. If
  bdf> you copy parts of the file you have to copy, but

mmap still needs address space in the program. it may be allocated with
malloc or even built in these days (haven't used it directly in decades! :).
real ram could be saved, but that is true for all virtual memory use. if you
seek into the mmap space and only read/write parts, the other sections won't
be touched. so the issue comes down to random access vs processing a whole
file. most uses of slurp are for processing a whole file, so i would lean in
that direction. someone sophisticated enough to use mmap directly for random
access should know the resource usage issues.

  >> again, i disagree. you can easily benchmark slurping an array of lines
  >> and looping vs line by line reading.

  bdf> Well, the tension there is the trade-off between space and memory. I
  bdf> could make that more clear I guess.

classic tradeoff. but these days almost all files you need to slurp are
small relative to ram (real and virtual) sizes. a 1 MB file is nothing on a
1 GB system, and few text files are even 1 MB. way back when, reading line
by line was almost required due to ram constraints, but ram sizes have far
outgrown the growth in file sizes. i just want to change the prevailing view
a bit. and as i have said, some things are only doable when you have the
full file in ram vs line by line.

  bdf> I will look at some benchmarks, though, and see how that illuminates
  bdf> the situation.

a simple one is slurping a simple config file and doing a basic parse on it
to make a hash. i have posted that code before. it would be easy to compare
that to a line by line version. the slurp version will blow it away, since
it reads the file in one call and parses it with one regex op, while the
line by line version has to read and parse each line individually - more
perl code and more perl guts code.

uri

-- 
Uri Guttman  ------  uri(a)stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review, Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------
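A hypothetical reconstruction of the comparison Uri describes, not his posted
code: parse a simple key=value config file into a hash, once by slurping and
once line by line. The config format, file name, and the Benchmark harness
are assumptions for illustration.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp qw( read_file );
    use Benchmark qw( cmpthese );

    my $file = 'app.conf';    # assumed to exist, with lines like "name = value"

    # slurped: one read, one global regex pass over the whole text
    sub parse_slurp {
        my ($path) = @_;
        my $text = read_file($path);
        my %conf = $text =~ /^ \s* (\w+) \s* = \s* (.*?) \s* $/xmg;
        return \%conf;
    }

    # line by line: read, match, and store each line individually
    sub parse_lines {
        my ($path) = @_;
        open my $fh, '<', $path or die "can't open $path: $!";
        my %conf;
        while ( my $line = <$fh> ) {
            next unless $line =~ /^ \s* (\w+) \s* = \s* (.*?) \s* $/x;
            $conf{$1} = $2;
        }
        return \%conf;
    }

    cmpthese -3, {
        slurp => sub { parse_slurp($file) },
        lines => sub { parse_lines($file) },
    };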
From: Uri Guttman on 23 Jul 2010 18:15
>>>>> "TW" == Tim Watts <tw(a)dionic.net> writes: TW> Uri Guttman <uri(a)StemSystems.com> TW> wibbled on Sunday 04 July 2010 06:15 >> i disagree with that last point. mmap always needs virtual ram allocated >> for the entire file to be mapped. it only saves ram if you map part of >> the file into a smaller virtual window. the win of mmap is that it won't >> do the i/o until you touch a section. so if you want random access to >> sections of a file, mmap is a big win. if you are going to just process >> the whole file, there isn't any real win over File::Slurp TW> I think it is worth some clarification - at least under linux: TW> mmap requires virtual address space, not RAM per se, for the TW> initial mmap. TW> Obviously as soon as you try to read any part of the file, those TW> blocks must be paged in to actual RAM pages. TW> However, if you then ignore those pages and have not modified TW> them, the LRU recovery sweeper can just drop those pages. but a slurped file in virtual ram behaves the same way. it may be swapped in when you read in the file and process it but as soon as that is done, and you free the scalar in perl, perl can reuse the space. the virtual ram can't be given back to the os but the real ram is reused. TW> Compare to if you slurp the file into some virtual RAM that's been malloc'd: TW> The RAM pages are all dirty (because you copied data into them) - TW> so if the system needs to reduce the working page set, it will TW> have to page those out to swap rather than just dropping them - it TW> no longer has the knowledge that they are in practise backed by TW> the original file. that is true. the readonly aspect of a mmap slurp is a win. but given the small sizes of most files slurped it isn't that large a win. today we have 4k or larger page sizes and many files are smaller than that. ram and vram are cheap as hell so fighting for each byte is a long lost art that needs to die. :) uri -- Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com -- ----- Perl Code Review , Architecture, Development, Training, Support ------ --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com --------- |