From: John Nagle on 12 Aug 2010 16:17

I'm reading a URL which is a .gz file, and decompressing it. This works, but it seems far too complex. Yet none of the "wrapping" you might expect to work actually does. You can't wrap a GzipFile around an HTTP connection, because GzipFile, reasonably enough, needs random access, and tries to do "seek" and "tell". Nor is the output descriptor from gzip general; it fails on "readline", but accepts "read". (No good reason for that.) So I had to make a second copy.

John Nagle

    import gzip
    import tempfile
    import urllib2

    # TIMEOUTSECS is assumed to be defined elsewhere in the module.

    def readurl(url):
        if url.endswith(".gz"):
            nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
            td1 = tempfile.TemporaryFile()               # compressed file
            td1.write(nd.read())                         # fetch and copy file
            nd.close()                                   # done with network
            td2 = tempfile.TemporaryFile()               # decompressed file
            td1.seek(0)                                  # rewind
            gd = gzip.GzipFile(fileobj=td1, mode="rb")   # wrap unzip
            td2.write(gd.read())                         # decompress file
            td1.close()                                  # done with compressed copy
            td2.seek(0)                                  # rewind
            return td2                                   # file object for decompressed data
        else:
            return urllib2.urlopen(url, timeout=TIMEOUTSECS)
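
One possible way to avoid both temporary files, offered as a sketch and not as the approach used in the post: zlib can decode gzip-framed data incrementally when its decompressor is created with wbits set to 16 + zlib.MAX_WBITS, so the input only ever needs "read". The names gunzip_stream and CHUNK_SIZE are hypothetical, the result is held in memory in an io.BytesIO (which does support "readline"), and a file made of several concatenated gzip members would only have its first member decoded.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not from the original post


def gunzip_stream(fileobj, chunk_size=CHUNK_SIZE):
    """Decompress gzip data from a read-only file-like object.

    Only fileobj.read(n) is called, so seek() and tell() are not needed.
    Returns an in-memory file object positioned at the start.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = io.BytesIO()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of input
        out.write(decomp.decompress(chunk))
    out.write(decomp.flush())  # emit anything still buffered
    out.seek(0)                # rewind so the caller can read from the top
    return out
```

Under these assumptions, the ".gz" branch of readurl could pass the object returned by urllib2.urlopen straight to gunzip_stream and return the result. For very large downloads the in-memory buffer could be swapped for a tempfile.TemporaryFile, which still saves the first (compressed) copy.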