From: Thomas Andersson on 30 Jul 2010 20:33

Hmm, been playing around a bit and gotten further than I had thought.
I open a file and read in the next webpage to be processed (an ID number) and set up the page count to 1 (each ID to process can have any number of pages).
I create my URL from the page count and the current ID (pid).
The idea I have is that it will loop as long as there is a page to grab by increasing the page count (this plan was flawed, I realised, but that's another problem).
As it is now it keeps grabbing the same page over and over thousands of times (creating new files for each loop).

    #Create URL for sid list from pid and page count.
    my $pcnt = 1;
    my $page = get "http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";
    while ($page) {
        if ($page) {
            print "Site is alive\n";
        }
        else {
            print "Site is not accessible\n";
        };

        #Create filename and write file, then save grabbed webpage into it.
        open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;
        print FILE $page;
        $pcnt += 1;
    };

I guess the URL doesn't get updated by the increased page count. Any suggestions on how to fix that part?
From: Sherm Pendley on 30 Jul 2010 21:33

"Thomas Andersson" <thomas(a)tifozi.net> writes:

> As it is now it keeps grabbing the same page over and over thousands of
> times (creating new files for each loop).

Not quite - the get() is outside of the loop, so it's grabbing the page only once, and saving it over and over.

> #Create URL for sid list from pid and page count.
> my $pcnt = 1;

I'd put the "base" URL in a separate variable, to avoid repetition:

    my $base = 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

> my $page = get
> "http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";

So, using the "base" url, this would become:

    my $page = get "$base?page=$pcnt&pid=$pid";

> while ($page) {

The if() is redundant here; if $page is false, the while() will exit and the if() won't be reached.

> print "Site is alive\n";
> #Create filename and write file, then save grabbed webpage into it.
> open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;

You can use forward slashes on Windows too - it's only the command shell (aka "DOS Box") that requires backslashes. Also, it's a good idea to include the filename you're trying to open when reporting an error, because that can help you figure out why it failed.

    my $outfile = "c:/scr/$pid-pg$pcnt.txt";
    open FILE, ">", $outfile or die "Could not open $outfile: $!";

> print FILE $page;
> $pcnt += 1;

Now that you've updated $pcnt, you need to fetch the next page and store it in $page.

    $page = get "$base?page=$pcnt&pid=$pid";

> };
>
> I guess the URL doesn't get updated by the increased pagecount

Right. When you interpolate a variable into a string, it's a one-time deal. The current value of the interpolated variable is used, but no long-lasting relationship exists between them, so the string is not updated when the interpolated variable's value changes.

For example, this will print the same thing ten times:

    #!/usr/bin/perl

    use warnings;
    use strict;

    my $num = 0;
    my $string = "Num: $num\n";

    for $num (1 .. 10) {
        print $string;
    }

Compare that with this, where a new value is assigned to $string each time around the loop:

    #!/usr/bin/perl

    use warnings;
    use strict;

    for my $num (1 .. 10) {
        my $string = "Num: $num\n";
        print $string;
    }

sherm--

-- 
Sherm Pendley
<www.shermpendley.com>
<www.camelbones.org>
Cocoa Developer
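Taken together, the suggestions in this reply could be assembled roughly as follows. This is only a minimal sketch and has not been run against the real site. The sub name save_pages, its $fetch code-reference parameter and the $dir argument are made up for illustration; the fetcher is passed in so the loop can be tried without network access. It also uses a lexical filehandle instead of the bareword FILE, which is a stylistic choice rather than part of the fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

# Hypothetical helper: save every page for one pid into $dir.
# $fetch is a code reference that takes a URL and returns the page text,
# or a false value on failure. For real use, something like:
#   use LWP::Simple;
#   save_pages($pid, \&LWP::Simple::get, 'c:/scr');
sub save_pages {
    my ($pid, $fetch, $dir) = @_;

    my $pcnt = 1;
    my $page = $fetch->("$base?page=$pcnt&pid=$pid");

    while ($page) {
        # Build the filename from the current values of $pid and $pcnt.
        my $outfile = "$dir/$pid-pg$pcnt.txt";
        open my $out, '>', $outfile or die "Could not open $outfile: $!";
        print {$out} $page;
        close $out or die "Could not close $outfile: $!";

        # Bump the counter, then interpolate the URL again and re-fetch.
        $pcnt += 1;
        $page = $fetch->("$base?page=$pcnt&pid=$pid");
    }

    return $pcnt - 1;    # number of pages written
}
```

The loop ends only when the fetcher returns a false value, so whether it ever stops depends on how the server answers a page number that is past the end.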
From: Ben Morrow on 30 Jul 2010 21:35

Quoth "Thomas Andersson" <thomas(a)tifozi.net>:
> Hmm, been playing around a bit and gotten further than I had thought.
> I open a file and read in the next webpage to be processed (a id number) and
> set up the page count to 1 (each ID to process can have any number of
> pages).
> I create my URL from page count and current ID (pid)
> The idea I have is that it will loop as long as there is a page to grab by
> increasing the page count (this plan was flawed I realised though, but
> that's another problem).
> As it is now it keeps grabbing the same page over and over thousands of
> times (creating new files for each loop).
>
> #Create URL for sid list from pid and page count.
> my $pcnt = 1;
> my $page = get
> "http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";

This happens once, before the loop, when $pcnt = 1.

> while ($page) {
> if ($page) {
> print "Site is alive\n";
> }
> else {
> print "Site is not accessible\n";
> };
>
> #Create filename and write file, then save grabbed webpage into it.
> open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;

This happens every time around the loop, with different values of $pcnt.

> print FILE $page;
> $pcnt += 1;
> };
>
> I guess the URL doesn't get updated by the increased pagecount, any
> suggestions on how to fix that part?

You seem to be expecting Perl variables to act like macros; they don't. If you want to recreate the URL and re-fetch the new page every time you go round the loop, you need the 'my $page = get...' line *inside* the loop.

Also: get into the habit, now, of keeping your filehandles in proper variables. It will make life easier later.

    open my $FILE, ">", "..." or ...;

Ben
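If something closer to the "macro" behaviour is wanted, one option is to wrap the interpolation in a sub, so the string is rebuilt from the current values on every call. A small sketch; the name page_url and the ID 12345 are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

# Hypothetical helper: builds a fresh URL string each time it is called.
sub page_url {
    my ($pid, $pcnt) = @_;
    return "$base?page=$pcnt&pid=$pid";
}

my $pid  = 12345;                      # made-up ID, for illustration only
my $pcnt = 1;

my $fixed = page_url($pid, $pcnt);     # interpolated once, with $pcnt == 1
$pcnt += 1;

print "$fixed\n";                      # still says page=1
print page_url($pid, $pcnt), "\n";     # rebuilt now, so it says page=2
```

The first print shows that changing $pcnt afterwards does not touch a string that has already been built; the second shows the sub picking up the new value.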
From: Thomas Andersson on 30 Jul 2010 23:07

Sherm Pendley wrote:

> I'd put the "base" URL in a separate variable, to avoid repetition:
> my $base =
> 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

Excellent idea, just realised that the links I will collect from the page also use the same base. Thanks for the examples, helps me a lot!

>> while ($page) {
> The if() is redundant here; if $page is false, the while() will exit
> and the if() won't be reached.

Sorry, didn't quite get what you were saying here?
One problem I've realised that kinda breaks this is that if you just up the page count it will never fail and exit, as you just keep getting empty sortie pages back with an ever higher page number. (There's a string "No more sorties found" on them though that I guess could be detected and used to exit the loop.)

> You can use forward slashes on Windows too - it's only the command
> shell (aka "DOS Box") that requires backslashes. Also, it's a good
> idea to include the filename you're trying to open when reporting an
> error, because that can help you figure out why it failed.

Ah, didn't realize, good to know, will definitely follow your suggestion (might as well pick up good habits early on).

Thanks for your good advice, I really appreciate it (and will likely come back time and again for more ;) ).

Best Wishes
Thomas
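The end-marker idea above could be sketched like this. It assumes the text "No more sorties found" appears verbatim in the page source, which has not been checked here. The names is_last_page and collect_pages are invented, the fetcher is a code reference taking a page number so the logic can be tried offline, and the $max_pages cap is an added safety net, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if this page means "stop looping".
# A failed fetch (undef) also counts as the end.
sub is_last_page {
    my ($page) = @_;
    return 1 unless defined $page;
    return $page =~ /No more sorties found/ ? 1 : 0;
}

# Hypothetical helper: call $fetch->($pcnt) for page 1, 2, 3, ... and
# return the pages seen before the end marker. $max_pages is an assumed
# upper bound so a changed marker cannot cause an endless loop.
sub collect_pages {
    my ($fetch, $max_pages) = @_;
    my @pages;
    for my $pcnt (1 .. $max_pages) {
        my $page = $fetch->($pcnt);
        last if is_last_page($page);
        push @pages, $page;
    }
    return @pages;
}
```

In real use the fetcher would wrap get() with the URL built from the current page number and pid.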
From: Thomas Andersson on 30 Jul 2010 23:13

> Also: get into the habit, now, of keeping your filehandles in proper
> variables. It will make life easier later.
>
> open my $FILE, ">", "..." or ...;

Will definitely try to pick up good habits on coding and formatting, so thanks for the advice. But if I create a variable for the filehandle like this, won't it contain the file path, so that when I do the print $FILE it will print the file path instead of the content I want written to the file? Or am I misunderstanding? (Quite likely.)

Best Wishes
Thomas
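On the question above: after a successful open, the variable holds a reference to the open handle, not the path, so printing to it writes into the file. A small sketch using a temporary directory so it can be run anywhere; the file name demo.txt and the variable names are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);   # scratch directory for the demo
my $outfile = "$dir/demo.txt";         # the path lives here, as a string

# $fh becomes a handle to the opened file; it does not store the path.
open my $fh, '>', $outfile or die "Could not open $outfile: $!";
print {$fh} "page content\n";          # written to the file, not the screen
close $fh or die "Could not close $outfile: $!";

# Read the file back to see what ended up in it.
open my $in, '<', $outfile or die "Could not open $outfile: $!";
my $line = <$in>;
close $in;

print "File contains: $line";
```

The braces in print {$fh} are optional here; print $fh "page content\n" does the same thing. If the path is needed later, for an error message say, keep it in its own variable as $outfile does.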