From: Brian D on 30 Dec 2009 11:00

I'm actually using mechanize, but that's too complicated for testing purposes. Instead, I've simulated the problem in the urllib2 sample below: an attempt to test for a valid URL request.

I'm attempting to craft a loop that will trap failed attempts to request a URL (in cases where the connection intermittently fails), repeat the URL request a few times, and stop after the Nth attempt. Specifically, in the example below, a bad URL is requested on the first and second iterations. On the third iteration, a valid URL is requested, and the valid URL keeps being requested until the fifth iteration, when a break statement stops the loop. The fifth iteration also restores the values to their original state for ease of repeat execution.

What I don't understand is how to test for a valid URL request, and then jump out of the "while True" loop to proceed to another line of code below the loop. There's probably faulty logic in this approach. I imagine I should wrap the URL request in a function, and perhaps store the response as a global variable.

This is really more of a basic Python logic question than it is a urllib2 question. Any suggestions?

Thanks,
Brian

import urllib2

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) ' \
             'Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
headers = {'User-Agent': user_agent}
url = 'http://this is a bad url'
count = 0

while True:
    count += 1
    try:
        print 'attempt ' + str(count)
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        if response:
            print 'True response.'
        if count == 5:
            count = 0
            url = 'http://this is a bad url'
            print 'How do I get out of this thing?'
            break
    except:
        print 'fail ' + str(count)
        if count == 3:
            url = 'http://www.google.com'
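One way to act on the "wrap the URL request in a function" idea above is to have the function return the response on the first success and None after the final failure, so code below the call can simply test the result. A minimal sketch under that assumption, reusing the user_agent string defined above (the name fetch_url is invented for illustration, not from the post):

import urllib2

def fetch_url(url, headers, max_attempts=5):
    # Try the request up to max_attempts times; return the response
    # on the first success, or None if every attempt fails.
    for attempt in xrange(1, max_attempts + 1):
        try:
            print 'attempt', attempt
            request = urllib2.Request(url, None, headers)
            return urllib2.urlopen(request)
        except urllib2.URLError:
            print 'fail', attempt
    return None

response = fetch_url('http://www.google.com', {'User-Agent': user_agent})
if response is not None:
    print 'True response.'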
From: samwyse on 30 Dec 2009 12:06

On Dec 30, 10:00 am, Brian D <brianden...(a)gmail.com> wrote:
> What I don't understand is how to test for a valid URL request, and
> then jump out of the "while True" loop to proceed to another line of
> code below the loop. There's probably faulty logic in this approach. I
> imagine I should wrap the URL request in a function, and perhaps store
> the response as a global variable.
>
> This is really more of a basic Python logic question than it is a
> urllib2 question.

There, I've condensed your question to what you really meant to say. You have several approaches. First, let's define some useful objects:

>>> max_attempts = 5
>>> def do_something(i):
        assert 2 < i < 5

Getting back to the original question: if you want to limit the number of attempts, don't use a while loop, use this:

>>> for count in xrange(max_attempts):
        print 'attempt', count+1
        do_something(count+1)

attempt 1
Traceback (most recent call last):
  File "<pyshell#55>", line 3, in <module>
    do_something(count+1)
  File "<pyshell#47>", line 2, in do_something
    assert 2 < i < 5
AssertionError

If you want to keep exceptions from ending the loop prematurely, you add this:

>>> for count in xrange(max_attempts):
        print 'attempt', count+1
        try:
            do_something(count+1)
        except StandardError:
            pass

Note that bare except clauses are *evil* and should be avoided. Most exceptions derive from StandardError, so trap that if you want to catch errors. Finally, to stop iterating when the errors cease, do this:

>>> try:
        for count in xrange(max_attempts):
            print 'attempt', count+1
            try:
                do_something(count+1)
                raise StopIteration
            except StandardError:
                pass
    except StopIteration:
        pass

attempt 1
attempt 2
attempt 3

Note that StopIteration doesn't derive from StandardError, because it's not an error, it's a notification. So, raise it if and when you want to stop iterating.

BTW, note that you don't have to wrap your code in a function; do_something could be replaced with its body and everything would still work.
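To make the "bare except is evil" warning concrete: a bare except also traps KeyboardInterrupt and SystemExit, which in Python 2.5+ derive from BaseException rather than StandardError, so a retry loop written that way can't even be interrupted with Ctrl-C. A small sketch reusing do_something from above:

# Evil: the bare except eats *everything*, including the
# KeyboardInterrupt raised by Ctrl-C, so this loop is unstoppable.
while True:
    try:
        do_something(1)        # always fails: 2 < 1 < 5 is false
    except:
        pass

# Better: StandardError still catches ordinary failures such as
# AssertionError or urllib2.URLError, but lets Ctrl-C through.
while True:
    try:
        do_something(1)
    except StandardError:
        pass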
From: Brian D on 30 Dec 2009 13:18

On Dec 30, 11:06 am, samwyse <samw...(a)gmail.com> wrote:
> [snip]

I'm totally impressed. I love elegant code. Could you tell I was trained as a VB programmer? I think I can still be reformed.

I appreciate the admonition not to use bare except clauses. I will avoid that in the future.

I've never seen StopIteration used -- and certainly not used in combination with a try/except pair. That was an exceptionally valuable lesson.

I think I can take it from here, so I'll just say thank you, Sam, for steering me straight -- very nice.
From: MRAB on 30 Dec 2009 13:46

Brian D wrote:
> [snip]
>
> I've never seen StopIteration used -- and certainly not used in
> combination with a try/except pair. That was an exceptionally valuable
> lesson.

Instead of raising StopIteration you could use 'break':

for count in xrange(max_attempts):
    print 'attempt', count + 1
    try:
        do_something(count + 1)
        break
    except StandardError:
        pass

The advantage, apart from the length, is that you can then add the 'else' clause to the 'for' loop, which will be run if it _didn't_ break out of the loop. If you break out only after do_something() is successful, then not breaking out means that do_something() never succeeded:

for count in xrange(max_attempts):
    print 'attempt', count + 1
    try:
        do_something(count + 1)
        break
    except StandardError:
        pass
else:
    print 'all attempts failed'
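Plugging MRAB's for/else pattern back into the urllib2 code from the first post shows how to hang on to the response and use it below the loop, which was the original question. A sketch, assuming the url and headers variables defined there:

import urllib2

response = None
for count in xrange(5):
    print 'attempt', count + 1
    try:
        request = urllib2.Request(url, None, headers)
        response = urllib2.urlopen(request)
        break                      # valid response: leave the loop
    except urllib2.URLError:
        pass
else:
    print 'all attempts failed'

if response is not None:
    print 'True response.'         # code below the loop can use it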
From: Brian D on 30 Dec 2009 19:25
Thanks, MRAB, as well. I've printed all of the replies to keep with my pile of essential documentation.

To follow up with a complete response, I'm ripping out of my mechanize module the essential components of the solution I got to work. The main body of the code passes a URL to the scrape_records function. The function attempts to open the URL up to five times. If the URL is opened, a values dictionary is populated and returned to the calling statement. If the URL cannot be opened, a fatal error is printed and the module terminates. There's a little sleep call in the function to leave time for any errant connection problem to resolve itself.

Thanks to all for your replies. I hope this helps someone else:

import urllib2, time
from mechanize import Browser

def scrape_records(url):
    maxattempts = 5
    br = Browser()
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) ' \
                 'Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
    br.addheaders = [('User-agent', user_agent)]
    for count in xrange(maxattempts):
        try:
            print url, count
            br.open(url)
            break
        except urllib2.URLError:
            print 'URL error', count
            # Pretend a failed connection was fixed
            if count == 2:
                url = 'http://www.google.com'
            time.sleep(1)
    else:
        print 'Fatal URL error. Process terminated.'
        return None
    # Scrape page and populate valuesDict
    valuesDict = {}
    return valuesDict

url = 'http://badurl'
valuesDict = scrape_records(url)
if valuesDict is None:
    print 'Failed to retrieve valuesDict'
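As a closing note, the retry loop in scrape_records generalizes: it can be factored into a helper that wraps any flaky call and returns its result or None. A hypothetical sketch; the retry name and signature are invented, not from the thread:

import time

def retry(func, args=(), max_attempts=5, delay=1):
    # Call func(*args) up to max_attempts times, sleeping between
    # failures; return the result on success, or None if all fail.
    for count in xrange(max_attempts):
        try:
            return func(*args)
        except StandardError:
            print 'attempt', count + 1, 'failed'
            time.sleep(delay)
    return None

# e.g., response = retry(br.open, (url,)) inside scrape_records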