From: PerlFAQ Server on 16 May 2010 00:00
This is an excerpt from the latest version of perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.4: How do I delete the last N lines from a file?

    (contributed by brian d foy)

    The easiest conceptual solution is to count the lines in the file then
    start at the beginning and print the number of lines (minus the last N)
    to a new file.

    Most often, the real question is how you can delete the last N lines
    without making more than one pass over the file, or how to do it with a
    lot of copying. The easy concept is the hard reality when you might have
    millions of lines in your file.

    One trick is to use "File::ReadBackwards", which starts at the end of
    the file. That module provides an object that wraps the real filehandle
    to make it easy for you to move around the file. Once you get to the
    spot you need, you can get the actual filehandle and work with it as
    normal. In this case, you get the file position at the end of the last
    line you want to keep and truncate the file to that point:

        use File::ReadBackwards;

        my $filename = 'test.txt';
        my $Lines_to_truncate = 2;

        my $bw = File::ReadBackwards->new( $filename )
            or die "Could not read backwards in [$filename]: $!";

        my $lines_from_end = 0;
        until( $bw->eof or $lines_from_end == $Lines_to_truncate )
        {
            print "Got: ", $bw->readline;
            $lines_from_end++;
        }

        truncate( $filename, $bw->tell );

    The "File::ReadBackwards" module also has the advantage of setting the
    input record separator to a regular expression.

    You can also use the "Tie::File" module which lets you access the lines
    through a tied array. You can use normal array operations to modify your
    file, including setting the last index and using "splice".
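The Tie::File route mentioned at the end of the answer is not shown in code. A minimal sketch, assuming lines end in the platform's usual newline; the helper name delete_last_lines_tied is hypothetical. It shrinks the tied array by setting its last index (one of the array operations the answer mentions) and clamps at zero, so asking for more lines than the file has simply empties it.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: delete the last $n lines of $filename by tying
# the file to an array and shrinking the array. Returns the number of
# lines kept. Assumes $n is a non-negative integer.
sub delete_last_lines_tied {
    my ( $filename, $n ) = @_;

    tie my @lines, 'Tie::File', $filename
        or die "Could not tie [$filename]: $!";

    # Setting the last index shortens the array, and Tie::File
    # truncates the underlying file to match.
    my $keep = @lines > $n ? @lines - $n : 0;
    $#lines = $keep - 1;

    untie @lines;
    return $keep;
}
```

Tie::File still has to read through the file to learn how many lines it has, so this spares you the copy loop rather than the read of the whole file.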
--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
From: sln on 16 May 2010 20:26
On Sun, 16 May 2010 04:00:02 GMT, PerlFAQ Server <brian(a)theperlreview.com> wrote:

>5.4: How do I delete the last N lines from a file?
>
>    (contributed by brian d foy)
>
>    The easiest conceptual solution is to count the lines in the file then
>    start at the beginning and print the number of lines (minus the last N)
>    to a new file.
>
>    Most often, the real question is how you can delete the last N lines
>    without making more than one pass over the file, or how to do it with a
>    lot of copying. The easy concept is the hard reality when you might have
>    millions of lines in your file.

I believe "or how to do it with a lot of copying." was meant to be
"or how to do it without a lot of copying."

And I'm not so sure you're not conflating "making more than one pass
over the file" with reading/writing the file more than one time.

>
>    One trick is to use "File::ReadBackwards", which starts at the end of

Is this really a trick? I can't remember if there is a truncate-at-file-position
primitive.

If I take a guess one way, I would say this approach would work as fast
as any:

  create a line stack the size of N
  read each line, store the line in the stack, increment a counter
  when the counter equals N, drop the oldest line into a new file and
    put the newest line on the stack
  repeat until the end of the old file
  close the new file
  delete the old file
  rename the new file to the old name

Voilà, truncation.

-sln
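The steps above amount to one pass over the file with a buffer of at most N lines. Since the line written out is always the oldest one, the "stack" behaves as a FIFO queue. A minimal sketch; the helper name delete_last_lines_buffered and the ".tmp" scratch-file suffix are assumptions, and a single rename() stands in for the separate delete and rename steps, assuming a platform where rename() overwrites an existing target.

```perl
use strict;
use warnings;

# Hypothetical helper: copy all but the last $n lines of $filename to a
# scratch file in one pass, then move the scratch file over the original.
sub delete_last_lines_buffered {
    my ( $filename, $n ) = @_;
    my $tmp = "$filename.tmp";    # assumed scratch-file name

    open my $in,  '<', $filename or die "Could not read [$filename]: $!";
    open my $out, '>', $tmp      or die "Could not write [$tmp]: $!";

    # FIFO buffer: holds the lines that might still be among the last $n.
    my @queue;
    while ( my $line = <$in> ) {
        push @queue, $line;
        # With more than $n lines buffered, the oldest one cannot be among
        # the last $n, so it is safe to write it out.
        print {$out} shift @queue if @queue > $n;
    }

    close $in;
    close $out or die "Could not close [$tmp]: $!";

    # Whatever is left in @queue (the last $n lines) is simply dropped.
    rename $tmp, $filename or die "Could not rename [$tmp]: $!";
    return;
}
```

Memory use is bounded by N lines rather than by the file size, but every line is still read and every kept line is rewritten, so this copies most of the file.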
From: Ralph Malph on 17 May 2010 12:43
On 5/16/2010 12:00 AM, PerlFAQ Server wrote:
> 5.4: How do I delete the last N lines from a file?

Feeling bored, I compared the code in the FAQ with some bash code that
would achieve the same results. I also ran some generic Perl that does
basically the same thing as the shell script (code at bottom).

The test file was named 'puke'. Its contents are the integers 0 through
999999, one per line: 1 million rows total. The test is to exclude the
last 10000 lines. Perl 5.10.1 on Cygwin. The machine has 4 GB of RAM and
a dual-core Intel CPU.

Anyway, in this not really scientific test, the FAQ method using Uri's
File::ReadBackwards module is the winner. I suppose this is the expected
result, but I thought the shell code would be more competitive.

$ time perl faq.pl > top_n-10000

real    0m0.219s
user    0m0.093s
sys     0m0.061s

$ time cat puke | wc -l | xargs echo -10000 + | bc \
  | xargs echo head puke -n | sh > top_n-10000

real    0m0.312s
user    0m0.090s
sys     0m0.121s

$ time perl temp.pl > top_n-10000

real    0m0.858s
user    0m0.701s
sys     0m0.062s

----------------- temp.pl -----------------
use strict;
use warnings;

my $num_lines_exclude = 10000;

open( my $fh, '<', 'puke' ) or die "Could not open [puke]: $!";

# First pass: count the lines.
my $line_count = 0;
$line_count++ while <$fh>;

# Rewind, then print all but the last $num_lines_exclude lines.
seek( $fh, 0, 0 ) or die "Could not seek: $!";
my $lines_to_read = $line_count - $num_lines_exclude;
while ( $lines_to_read > 0 ) {
    my $line = <$fh>;
    print $line;
    $lines_to_read--;
}
close $fh;
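For anyone wanting to reproduce the setup, a minimal sketch that writes a file like 'puke' (the integers 0 through 999999, one per line) and does temp.pl's line-counting first pass with large read() calls and tr/\n// instead of a readline loop. The helper names and the 64 KB block size are assumptions, no timings are claimed for it, and it counts newline characters, so a final line without a trailing newline is not counted.

```perl
use strict;
use warnings;

# Hypothetical helper: write the integers 0 .. $count - 1, one per line,
# matching the layout of the 'puke' test file described above.
sub make_test_file {
    my ( $path, $count ) = @_;
    open my $out, '>', $path or die "Could not write [$path]: $!";
    print {$out} "$_\n" for 0 .. $count - 1;
    close $out or die "Could not close [$path]: $!";
    return;
}

# Hypothetical helper: count newline characters by reading large blocks
# and counting them with tr///, instead of reading line by line.
sub count_lines {
    my ($path) = @_;
    open my $in, '<', $path or die "Could not read [$path]: $!";
    binmode $in;
    my $count = 0;
    while (1) {
        my $got = read( $in, my $buf, 64 * 1024 );
        die "Could not read [$path]: $!" unless defined $got;
        last if $got == 0;    # end of file
        $count += ( $buf =~ tr/\n// );
    }
    close $in;
    return $count;
}
```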
From: Willem on 17 May 2010 13:10
Ralph Malph wrote:
) Anyway, in this not really scientific test the faq method using
) Uri's File::ReadBackwards module is the winner.  I suppose this is the
) expected result but I thought the shell code would be more
) competitive.

Why? AIUI, ReadBackwards never touches the beginning of the file, so
that should clearly lead to a lot less disk I/O.

I'm assuming the tests you ran had the file still in the disk cache,
though, which would make the difference a lot less significant. Still,
ReadBackwards takes time proportional to the size of the removed bit,
while the others take time proportional to the size of the whole file.

Have you also tried removing 10 lines from a million-line file?

And for giggles, you could try a hand-rolled one that uses the functions
seek(), sysread() and truncate() to accomplish the job.

SaSW, Willem
From: Uri Guttman on 17 May 2010 13:34
>>>>> "W" == Willem <willem(a)turtle.stack.nl> writes:

  W> Have you also tried removing 10 lines from a million-line file?

  W> And for giggles, you could try a hand-rolled one that uses the functions
  W> seek(), sysread() and truncate() to accomplish the job.

ahem. that is what File::ReadBackwards does! it may be possible to
hand-roll an optimized version by removing some overhead, etc. but it
was designed to be very fast.

your earlier point about how much to remove or skip is the important
one. truncating most of a large file will be slower, but you still need
to count lines from the end. since you don't need to read each line for
this, you could read large blocks, scan for newlines, count them and
then truncate to the desired point. File::ReadBackwards has the overhead
of splitting the blocks into lines and returning each one for counting.
but you always need to read the part you are truncating if you are
counting lines from the end.

uri
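The block-scanning idea can be sketched with the seek/sysread/truncate primitives Willem mentions (sysseek() here, since it pairs with sysread()). This is a minimal sketch under stated assumptions, not File::ReadBackwards' internals: the helper name truncate_last_lines and the 64 KB default block size are hypothetical, lines are assumed to end in "\n", and the file is opened read/write and truncated in place. As Uri says, it still reads every byte of the part being removed, but it never splits blocks into lines.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: remove the last $n lines of $filename in place by
# reading blocks backwards, counting newlines, and truncating at the
# start of the Nth line from the end. Returns the new file size.
sub truncate_last_lines {
    my ( $filename, $n, $blksize ) = @_;
    $blksize ||= 64 * 1024;             # assumed default block size
    return -s $filename if $n <= 0;     # nothing to remove

    open my $fh, '+<', $filename or die "Could not open [$filename]: $!";
    binmode $fh;
    my $end = -s $fh;

    # The newline ending the very last line does not separate two lines,
    # so leave it out of the scan.
    if ( $end > 0 ) {
        sysseek( $fh, $end - 1, SEEK_SET ) or die "Could not seek: $!";
        defined sysread( $fh, my $last, 1 ) or die "Could not read: $!";
        $end-- if $last eq "\n";
    }

    my $cut   = 0;    # if fewer than $n newlines exist, empty the file
    my $found = 0;
    my $pos   = $end;
    BLOCK: while ( $pos > 0 ) {
        my $len = $pos < $blksize ? $pos : $blksize;
        $pos -= $len;
        sysseek( $fh, $pos, SEEK_SET ) or die "Could not seek: $!";
        my $got = sysread( $fh, my $buf, $len );
        die "Short read at offset $pos" unless defined $got && $got == $len;

        # Walk this block's newlines from right to left.
        my $i = $len;
        while ( $i > 0 && ( $i = rindex( $buf, "\n", $i - 1 ) ) >= 0 ) {
            if ( ++$found == $n ) {
                $cut = $pos + $i + 1;    # keep everything through this newline
                last BLOCK;
            }
        }
    }

    truncate( $fh, $cut ) or die "Could not truncate [$filename]: $!";
    close $fh;
    return $cut;
}
```

For example, truncate_last_lines( 'puke', 10_000 ) would drop the last 10000 lines of the test file; a small block size is handy for checking lines that straddle block boundaries.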