From: Josh Cheek on 12 Mar 2010 16:22

[Note: parts of this message were removed to make it a legal post.]

I'm trying to write a script that pulls out an image from a yfrog page. So this is what I have:

  require 'rubygems'
  require 'hpricot'
  require 'open-uri'

  url = 'http://yfrog.com/03gssacj'
  doc = Hpricot(open(url))
  (doc%"#main_image").attributes['src'] # => "/img3/7036/gssac.jpg"

The problem is that the path is relative.
I've done a little googling, queried my ruby and rails ML archives, glanced at hpricot code, and looked through the method lists for open-uri and hpricot.
So far, I don't see anything that looks very useful.
Is there a way to have it give me the absolute path so that I can reference the picture later?

The only thing I've found that works so far involves string manipulation, which seems like a brittle workaround to replace something that probably exists if I could just find it.

  url = 'http://yfrog.com/03gssacj'
  page = open(url)
  base = page.base_uri.to_s[ /(?:http:\/\/)?[^\/]*\// ] # => "http://img3.yfrog.com/"
  relative = (Hpricot(page)%"#main_image").attributes['src'] # => "/img3/7036/gssac.jpg"
  absolute = URI.join( base , relative )
  absolute.to_s # => "http://img3.yfrog.com/img3/7036/gssac.jpg"

Anyone know of a better solution?

From: Ben Bleything on 12 Mar 2010 16:54

On Fri, Mar 12, 2010 at 1:22 PM, Josh Cheek <josh.cheek(a)gmail.com> wrote:
> The problem is that the path is relative.
> I've done a little googling, queried my ruby and rails ML archives, glanced
> at hpricot code, and looked through the method lists for open-uri and
> hpricot.
> So far, I don't see anything that looks very useful.
> Is there a way to have it give me the absolute path so that I can reference
> the picture later?

Hpricot is just telling you what's in the HTML. Munging the document's contents are your responsibility, not the parser's :)

> The only thing I've found that works so far involves string manipulation,
> which seems like a brittle workaround to replace something that probably
> exists if I could just find it.

Look into the URI library.

  require 'uri'

  uri = URI.parse( "http://yfrog.com/03gssacj" )
  uri.path = # your hpricot magic to get the image path goes here

Ben

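To make the suggestion above concrete, here is a minimal sketch of the URI approach with the placeholder filled in. The image path is hard-coded from the value shown earlier in the thread instead of being read with Hpricot, and the names `image_path` and `image_uri` are illustrative only.

```ruby
require 'uri'

# Page URL from the thread. The image path is hard-coded here for
# illustration; in the real script it would come from Hpricot.
uri = URI.parse('http://yfrog.com/03gssacj')
image_path = '/img3/7036/gssac.jpg'

# Work on a copy so the original page URI stays intact,
# then swap the whole path for the image's path.
image_uri = uri.dup
image_uri.path = image_path

puts image_uri.to_s  # prints "http://yfrog.com/img3/7036/gssac.jpg"
```

Note that assigning `path` replaces the entire path, so this only suits a root-relative `src` (one starting with `/`). It also keeps the host of whatever URI was parsed; in the original post the page's `base_uri` pointed at `img3.yfrog.com`, so parsing `page.base_uri` rather than the literal URL is probably what is wanted there.
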
From: Josh Cheek on 12 Mar 2010 18:23

[Note: parts of this message were removed to make it a legal post.]

On Fri, Mar 12, 2010 at 3:54 PM, Ben Bleything <ben(a)bleything.net> wrote:
> On Fri, Mar 12, 2010 at 1:22 PM, Josh Cheek <josh.cheek(a)gmail.com> wrote:
> > The problem is that the path is relative.
> > I've done a little googling, queried my ruby and rails ML archives, glanced
> > at hpricot code, and looked through the method lists for open-uri and
> > hpricot.
> > So far, I don't see anything that looks very useful.
> > Is there a way to have it give me the absolute path so that I can reference
> > the picture later?
>
> Hpricot is just telling you what's in the HTML. Munging the
> document's contents are your responsibility, not the parser's :)
>
> > The only thing I've found that works so far involves string manipulation,
> > which seems like a brittle workaround to replace something that probably
> > exists if I could just find it.
>
> Look into the URI library.
>
> require 'uri'
>
> uri = URI.parse( "http://yfrog.com/03gssacj" )
> uri.path = # your hpricot magic to get the image path goes here
>
> Ben

Thanks, this is what I am using now:

  page = open url
  image_path = URI.parse page.base_uri.to_s.sub( %r(/$) , '' )
  image_path.path = (Hpricot(page)%"#main_image").attributes['src']
  image_path.to_s

It still seems a little excessive, but it's a lot better than what I had before.

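A possibly shorter variant, sketched here as an assumption and not something posted in the thread: `URI.join` resolves a reference against a full base URL, so the page's `base_uri` could be passed to it directly without trimming it down to the host first. The helper name `absolutize` and the base URL below are hypothetical; the base stands in for whatever `page.base_uri` returns after the redirect.

```ruby
require 'uri'

# Hypothetical helper: resolve an image's src attribute against the URL
# the page was actually served from (e.g. page.base_uri from open-uri).
def absolutize(base, src)
  URI.join(base.to_s, src).to_s
end

# Stand-in for page.base_uri; the real value is not shown in the thread.
base = 'http://img3.yfrog.com/some/page/'

puts absolutize(base, '/img3/7036/gssac.jpg')
# prints "http://img3.yfrog.com/img3/7036/gssac.jpg"
puts absolutize(base, 'thumb.jpg')
# prints "http://img3.yfrog.com/some/page/thumb.jpg"
puts absolutize(base, 'http://example.com/a.png')
# prints "http://example.com/a.png"
```

Unlike assigning to `path`, this also copes with a `src` that is relative to the page's directory or already absolute, which may matter if the markup ever changes.
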