From: pwaring on 15 Mar 2010 16:15

I'm using LWP::RobotUA to download a series of pages with the following code:

#!/usr/bin/perl -w
use strict;

use LWP::RobotUA;

my %options = (
    'agent'         => 'crawler',
    'show_progress' => 1,
    'delay'         => 10/60,
    'from'          => 'example@example.org',
);

my $ua = LWP::RobotUA->new(%options);

my @all_urls = (array of links populated from elsewhere);

foreach my $url (@all_urls)
{
    my $filename = "$url.html";
    $ua->mirror($url, $filename);
}

}

The problem is that LWP::RobotUA seems to make a GET request for the robots.txt file each time I call the mirror() method, even though all of the URLs are on the same domain. I'd expect the module to cache the file, either in memory or on disk, because it's highly unlikely to change between requests, but it doesn't seem to do so. Do I need to write my own cache module, or tack on an existing one from CPAN? I was hoping that calling mirror() would Just Work. Thanks in advance!
From: pwaring on 15 Mar 2010 16:18

On 15 Mar, 20:15, "pwar...(a)gmail.com" <pwar...(a)gmail.com> wrote:
> foreach my $url (@all_urls)
> {
>     my $filename = "$url.html";
>     $ua->mirror($url, $filename);
> }
>
> }

That second closing bracket shouldn't be there - I forgot to remove it when snipping the code down to just the relevant parts.
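LWP::RobotUA keeps its parsed robots.txt rules in a WWW::RobotRules object, which should already avoid re-fetching within a single run; to persist the rules across runs you can pass the disk-backed WWW::RobotRules::AnyDBM_File variant via the constructor's 'rules' option. A minimal sketch under that assumption, reusing the options from the original post (the cache filename 'robotrules.db' is made up for illustration):

#!/usr/bin/perl -w
use strict;

use LWP::RobotUA;
use WWW::RobotRules::AnyDBM_File;

# Disk-backed rules cache: parsed robots.txt entries are stored in a DBM
# file ('robotrules.db' is an arbitrary name) and reused on later runs.
# The first argument should match the user agent's name.
my $rules = WWW::RobotRules::AnyDBM_File->new('crawler', 'robotrules.db');

my $ua = LWP::RobotUA->new(
    'agent'         => 'crawler',
    'from'          => 'example@example.org',
    'delay'         => 10/60,
    'show_progress' => 1,
    'rules'         => $rules,
);

# mirror() consults the cached rules rather than fetching robots.txt anew
$ua->mirror('http://example.org/page', 'page.html');

Because the DBM file survives between invocations of the script, robots.txt should only be re-fetched once the cached entry for that host expires, not on every run.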