From: Jürgen Exner on 30 Jul 2010 23:14

"Thomas Andersson" <thomas(a)tifozi.net> wrote:
>As it is now it keeps grabbing the same page over and over thousands of
>times (creating new files for each loop).
>
>my $pcnt = 1;
>my $page = get
>"http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";
>while ($page) {
> if ($page) {
> print "Site is alive\n";
> }
> else {
> print "Site is not accessible\n";
> };
>
>#Create filename and write file, then save grabbed webpage into it.
>open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;
>print FILE $page;
>$pcnt += 1;
>};
>
>I guess the URL doesn't get updated by the increased pagecount, any
>suggestions on how to fix that part?

It may or it may not. Had you used better indentation then you might have
spotted that your get() is outside of the loop, therefore it is executed
only once, therefore the value of $page never changes, and therefore of
course your loop never terminates because the loop condition will always
be the same value as in the first test.

jue
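A minimal sketch of the fix described above, assuming a hypothetical fetch_page() that stands in for LWP::Simple's get() so the example runs offline (it returns text for pages 1 to 3 and undef after that). The point is only that the fetch now sits inside the loop condition, so it is re-run with the current $pcnt on every pass. Note that, per later posts in the thread, the real server keeps serving empty pages rather than failing, so this stop condition alone would not end the loop there.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::Simple's get(): returns text for pages
# 1 to 3 and undef for anything later, so the sketch runs offline.
sub fetch_page {
    my ($n) = @_;
    return $n <= 3 ? "contents of page $n" : undef;
}

my $pcnt = 1;
my @saved;    # collects what the loop body would write to disk

# The fetch is part of the loop condition, so it is re-evaluated on every
# pass with the current $pcnt; the loop ends when a fetch returns undef.
while ( defined( my $page = fetch_page($pcnt) ) ) {
    push @saved, $page;    # in the real script: write $page to a file
    $pcnt++;               # ask for the next page on the next pass
}

print scalar(@saved), " pages saved; stopped at page $pcnt\n";
```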
From: Sherm Pendley on 31 Jul 2010 00:08

"Thomas Andersson" <thomas(a)tifozi.net> writes:

> Sherm Pendley wrote:
>
>>> while ($page) {
>> The if() is redundant here; if $page is false, the while() will exit
>> and the if() won't be reached.
>
> Sorry, didn't quite get what you were saying here?

You had originally written something like this:

    while ($page) {
        if ($page) {
            # do stuff
        } else {
        }
    }

Since the while() loop repeats only if $page evaluates to a true value,
you don't need to check $page again with an if(). If $page is false, the
body of the loop will not execute at all, so by the time you reach the
line that the if() is on, you already know that $page is true. So, the
if() block will always run, and the else block never will; that being the
case, it's simpler to just omit the if():

    while ($page) {
        # do stuff
    }

Note that while() checks its condition only *once* per pass, at the top of
the loop, before running its block of code. So you can't omit the if() if
the value of $page might be changed inside the while() before the if() is
reached:

    while ($page) {
        # code that might change $page

        # check $page again, because it might have been changed, and
        # the while() loop won't check again until the next time we get
        # to the top of the loop
        if ($page) {
            # do stuff
        }
    }

sherm--

-- 
Sherm Pendley
<www.shermpendley.com>
<www.camelbones.org>
Cocoa Developer
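To make the "once per pass" point concrete, here is a small sketch in which @queue is a hypothetical list of successive $page values (the empty string plays the part of a false value). The first half of the body needs no if(), because the while() condition was just checked; the second half does, because $page is reassigned mid-body and while() will not look at it again until the next pass.

```perl
use strict;
use warnings;

# Hypothetical successive values of $page; "" is a false value.
my @queue = ( "page A", "page B", "", "page C" );
my $page  = shift @queue;
my @handled;

while ($page) {
    # No if ($page) needed here: the while() condition was just checked.
    push @handled, "start:$page";

    # $page changes in the middle of the body ...
    $page = shift @queue;

    # ... so it must be checked again before it is used, because while()
    # only re-tests it at the top of the next pass.
    if ($page) {
        push @handled, "next:$page";
    }
}

print join( ", ", @handled ), "\n";
```

The loop stops as soon as the false value reaches the top of the loop, so "page C" is never handled.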
From: Thomas Andersson on 31 Jul 2010 07:48

Sherm Pendley wrote:
> You had originally written something like this:
>
>     while ($page) {
>         if ($page) {
>             # do stuff
>         } else {
>         }
>     }
>
> Since the while() loop repeats only if $page evaluates to a true
> value, you don't need to check $page again with an if(). If $page is
> false, the body of the loop will not execute at all, so by the time
> you reach the line that the if() is on, you already know that $page
> is true. So, the if() block will always run, and the else block never
> will; that being the case, it's simpler to just omit the if():

Ah, I realized that afterwards while looking over the code. That if/then
bit was a leftover from an example script I found, and it is now gone, as
it serves no purpose in my script.

The next thing I need to add is a check for the exit conditions. I'm
thinking that using $page as the condition might be a bad idea; how about
having the loop check for a signal variable to be set? Code inside the
loop would run until my exit conditions are met, and then it would set the
signal variable telling the loop to end. The two conditions would be
finding either of two strings within the captured page (either a sid we
already know or the string "No more sorties").
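A sketch of the flag-variable idea, under several stated assumptions: @pages stands in for fetched pages, $last_sid for the value that would be read alongside the pid, and the "sid=NNN" format is purely hypothetical (the real page markup isn't shown in the thread). The flag also shows why a plain last isn't enough here: inside the inner for loop, last leaves only that for loop, so something has to tell the outer while() to stop as well.

```perl
use strict;
use warnings;

# Hypothetical inputs: the real script would fetch these pages over HTTP.
# The "sid=NNN" format is an assumption made for illustration only.
my @pages = (
    "sid=105 sid=104",
    "sid=103 sid=102",
    "sid=101 sid=100",
    "No more sorties",
);
my $last_sid = '102';    # newest sid processed on a previous run

my $done = 0;            # the flag ("signal variable") from the post
my $pcnt = 0;
my @new_sids;

while ( !$done ) {
    my $page = $pages[ $pcnt++ ];
    if ( !defined $page or $page =~ /No more sorties/ ) {
        $done = 1;       # out of pages: next re-tests the condition and exits
        next;
    }
    for my $sid ( $page =~ /sid=(\d+)/g ) {
        if ( $sid eq $last_sid ) {
            $done = 1;   # reached data we already have
            last;        # leaves only the for loop; $done stops the while
        }
        push @new_sids, $sid;
    }
}

print "new sids: @new_sids (pages read: $pcnt)\n";
```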
From: Thomas Andersson on 31 Jul 2010 08:14

Uri Guttman wrote:
>>>>>> "TA" == Thomas Andersson <thomas(a)tifozi.net> writes:

> so you need to put some conditionals in the loop. first, how would you
> know when the pages are done? can you look for a link to the next page
> and exit the loop if it isn't there? then define what a 'processed
> link' is. keep track (likely in a hash) of processed links and if you
> find one exit the loop. exiting a loop is easy, use the last function.

They've been quite helpful there, as the empty pages contain the string
"No more sorties". The other condition is trickier: at the same time as I
load the pid, I need to load a variable that holds the last processed sid.
When that sid is found, no further pages need to be loaded (the whole
point of capturing these list pages is to extract all the sids we find in
them for further processing).

> use less comment. make your comments mean something outside the
> code. code is what, comments are why. and you are writing code to be
> read by a maintainer. always keep that person in your mind and your
> code will be better for it.

Well, I only started learning perl a day ago, and the comments are mostly
for my own sake, to remind me what I'm doing, as most of this stuff is
still pretty much voodoo to me.

> have you ever heard of white space? jamming lines of code together
> makes major migraines when reading it. loosen up a little. blank
> lines between sections is a good idea.

Roger that, will do.

>> open PIDLIST, "<", $pidfile or die "Could not open $pidfile: $!";
>> my $pid = <PIDLIST>;
>> print $pid; # print just so we know we have a pid to process.
>
> comments on the code line are a poor idea in most cases. when they are
> long comments it is a horrible idea.

OK, will stop doing that then.

>> chomp $pid; # Remove endline from pid.
>
> again, you are telling us what you just did. redundant to anyone who
> knows what chomp is.
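Uri's suggestion (a hash of processed items plus last) can be sketched like this. As before, @pages, the contents of %processed, and the "sid=NNN" format are hypothetical. The difference from a flag variable is the loop label: last PAGE exits the labelled outer loop directly, even from inside the inner for loop.

```perl
use strict;
use warnings;

# Hypothetical data: sids handled on earlier runs, kept in a hash for
# fast lookups. The "sid=NNN" page format is an assumption.
my %processed = map { $_ => 1 } qw(102 101 100);
my @pages = (
    "sid=105 sid=104",
    "sid=103 sid=102",
    "No more sorties",
);

my @new_sids;
my $pcnt = 0;

PAGE: while (1) {
    my $page = $pages[ $pcnt++ ];
    last PAGE if !defined $page or $page =~ /No more sorties/;

    for my $sid ( $page =~ /sid=(\d+)/g ) {
        last PAGE if $processed{$sid};    # labelled last exits both loops
        push @new_sids, $sid;
        $processed{$sid} = 1;             # remember it for next time
    }
}

print "new sids: @new_sids\n";
```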
Ok, but as I said before, I'm learning, and those comments are only for my
own information to help me learn. Once it's done I can go over it, remove
all those comments and put something more useful in.

>> my $page = get "$pbase?page=$pcnt&pid=$pid";
>> while ($page) {
>
> bah. it is not clear why you are testing page in the loop. and you
> have two duplicate lines with the get. make it an infinite loop and
> exit when the get fails.

Yeah, that's a big bug in my code and I know about it. The idea was to
keep loading pages until there were no more, but that idea failed because
the server keeps serving empty pages with ever higher page numbers.
Another way of ending the loop is needed, and I have two conditions that
should each end it.

>> # Create file for storing pages containing the sids.
>> my $tmpf = "c:/scr/$pid.txt";
>> open TEMPF, ">>", $tmpf or die "Could not open $tmpf: $!";
>> print TEMPF $page; # Store grabbed webpage into the file
>
> you can do that with getstore or use File::Slurp's write_file (from
> cpan).
>
> use File::Slurp ;
>
> write_file( "c:/scr/$pid.txt", $page ) ;
> much easier to read.

Definitely. So that one call replaces all three of my lines? But will I
get an error message like before if it fails?

> here is a better loop:
>
> while( 1 ) {
>
>     my $page = get "$pbase?page=$pcnt&pid=$pid";
>     last unless $page ;
>     write_file( "c:/scr/$pid.txt", $page ) ;
> }
>
> short, easy to read, easy to maintain. now you can add in the checks
> for exiting the loop and it will be easier.

Hmm, as I'm a noob I don't quite get it, but I think it's along the lines
of what I mentioned in another message. I assume a non-failure signals 1?
And do I need to set anything inside the loop to exit it? But what do I
set? It has no variable name.
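Two details are easy to miss when adapting the suggested loop: it never increments $pcnt, so it would request the same page forever, and a plain write to "$pid.txt" replaces the file on each pass, whereas the original code appended with ">>". The sketch below keeps Uri's while(1)/last shape but uses only core Perl: a three-argument open in append mode with die "...: $!", which gives the same kind of error message as before. The fetch_page() stub, the temporary directory (standing in for c:/scr), and the pid 'demo' are all hypothetical. If you stay with File::Slurp instead, its documentation describes an append => 1 option and an err_mode setting that croaks on errors by default; check the docs for your installed version.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for LWP::Simple's get().
sub fetch_page {
    my ($n) = @_;
    return $n <= 2 ? "page $n\n" : undef;
}

my $dir  = tempdir( CLEANUP => 1 );    # stands in for c:/scr
my $pid  = 'demo';                     # hypothetical pid
my $file = File::Spec->catfile( $dir, "$pid.txt" );
my $pcnt = 1;

while (1) {
    my $page = fetch_page($pcnt);
    last unless defined $page;

    # Append (">>") so each page is added after the previous one instead
    # of overwriting it; die with $! so a failure reports the reason.
    open my $fh, '>>', $file or die "Could not open $file: $!";
    print {$fh} $page;
    close $fh or die "Could not close $file: $!";

    $pcnt++;    # without this, every pass would fetch the same page
}
```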
From: Tad McClellan on 31 Jul 2010 09:08
Thomas Andersson <thomas(a)tifozi.net> wrote:
> Uri Guttman wrote:
>> here is a better loop:
>>
>> while( 1 ) {
>>
>>     my $page = get "$pbase?page=$pcnt&pid=$pid";
>>     last unless $page ;
>>     write_file( "c:/scr/$pid.txt", $page ) ;
>> }
>>
>> short, easy to read, easy to maintain. now you can add in the checks
>> for exiting the loop and it will be easier.
>
> Hmm, as I'm a noob I don't quite get it, but I think it's along the lines
> of what I mentioned in another message. I assume

No need to assume, just look it up in the docs for the function you are
using:

    perldoc LWP::Simple

        The get() function will fetch the document identified by the
        given URL and return it. It returns "undef" if it fails.

> a non-failure signals 1?

A non-failure stores the contents of the page in $page (a true value).

A failure stores an undef in $page (a false value).

You should probably avoid using the word "signal" unless you are talking
about signals. That is, the term has a particular meaning to programmers:

    http://en.wikipedia.org/wiki/Signal_%28computing%29

> And do I need
> to set anything inside the loop to exit it?

No.

$page will contain undef (false) when the get() fails.

"unless" executes its statement when the condition is false.

So, when get() fails, "last" is evaluated and the loop will be exited.

-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
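One nuance behind "a true value": any real HTML page is true, but Perl also treats the empty string and the string "0" as false, so "last unless $page" would stop on those too, while "last unless defined $page" stops only on the undef that get() returns on failure. The sample values below are illustrative only; the snippet just prints how each test sees them.

```perl
use strict;
use warnings;

# undef is what LWP::Simple's get() returns on failure; the other values
# are included only to show Perl's truth rules.
my @results = ( undef, '', '0', '<html>No more sorties</html>' );

my @truth;      # what "last unless $page" would see
my @defined;    # what "last unless defined $page" would see
for my $page (@results) {
    push @truth,   $page         ? 'true' : 'false';
    push @defined, defined $page ? 'true' : 'false';
}

print "truth:   @truth\n";
print "defined: @defined\n";
```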