Hpricot Relative Path [Ruby]

From: Josh Cheek on 12 Mar 2010 16:22

I'm trying to write a script that pulls an image out of a yfrog page.

This is what I have so far:

require 'rubygems'
require 'hpricot'
require 'open-uri'

url = 'http://yfrog.com/03gssacj'
doc = Hpricot(open(url))

(doc%"#main_image").attributes['src'] # => "/img3/7036/gssac.jpg"

The problem is that the path is relative.
I've done a little googling, queried my ruby and rails ML archives, glanced
at hpricot code, and looked through the method lists for open-uri and
hpricot.
So far, I don't see anything that looks very useful.
Is there a way to have it give me the absolute path so that I can reference
the picture later?

The only thing I've found that works so far involves string manipulation,
which seems like a brittle workaround to replace something that probably
exists if I could just find it.

url = 'http://yfrog.com/03gssacj'
page = open(url)
base = page.base_uri.to_s[ /(?:http:\/\/)?[^\/]*\// ] # => "http://img3.yfrog.com/"
relative = (Hpricot(page)%"#main_image").attributes['src'] # => "/img3/7036/gssac.jpg"
absolute = URI.join( base , relative )
absolute.to_s # => "http://img3.yfrog.com/img3/7036/gssac.jpg"

Anyone know of a better solution?

From: Ben Bleything on 12 Mar 2010 16:54

On Fri, Mar 12, 2010 at 1:22 PM, Josh Cheek <josh.cheek(a)gmail.com> wrote:
> The problem is that the path is relative.
> I've done a little googling, queried my ruby and rails ML archives, glanced
> at hpricot code, and looked through the method lists for open-uri and
> hpricot.
> So far, I don't see anything that looks very useful.
> Is there a way to have it give me the absolute path so that I can reference
> the picture later?

Hpricot is just telling you what's in the HTML. Munging the
document's contents is your responsibility, not the parser's :)

> The only thing I've found that works so far involves string manipulation,
> which seems like a brittle workaround to replace something that probably
> exists if I could just find it.

Look into the URI library.

require 'uri'

uri = URI.parse( "http://yfrog.com/03gssacj" )
uri.path = "/img3/7036/gssac.jpg" # in practice, the src value Hpricot extracts
uri.to_s # => "http://yfrog.com/img3/7036/gssac.jpg"

Ben
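
A note on the `uri.path =` approach: it assumes the src value is always a root-relative path. If a page ever used a document-relative path or a full URL instead, `URI.join` from the same standard library resolves all three cases. The sketch below is only an illustration; `absolutize` is a hypothetical helper name, and the `thumbs/gssac.jpg` value is made up to show the document-relative case.

```ruby
require 'uri'

# Hypothetical helper: resolve a src/href value against the page URL.
def absolutize(page_url, src)
  URI.join(page_url, src).to_s
end

page_url = 'http://yfrog.com/03gssacj'

# Root-relative path (the case in this thread).
absolutize(page_url, '/img3/7036/gssac.jpg')
# => "http://yfrog.com/img3/7036/gssac.jpg"

# Document-relative path (illustrative value, not from the real page).
absolutize(page_url, 'thumbs/gssac.jpg')
# => "http://yfrog.com/thumbs/gssac.jpg"

# Already-absolute URL passes through unchanged.
absolutize(page_url, 'http://img3.yfrog.com/img3/7036/gssac.jpg')
# => "http://img3.yfrog.com/img3/7036/gssac.jpg"
```

Note that the host here is the one in `page_url`; if the server redirects, the base to resolve against is whatever `base_uri` reports after the redirect.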

From: Josh Cheek on 12 Mar 2010 18:23

Thanks, this is what I am using now:

page = open url
image_path = URI.parse page.base_uri.to_s.sub( %r(/$) , '' )
image_path.path = (Hpricot(page)%"#main_image").attributes['src']
image_path.to_s

It still seems a little excessive, but it's a lot better than what I had
before.
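
One possible trim, offered as a sketch rather than something tested against this page: `base_uri` from open-uri is already a URI object, so the `to_s` / `sub` / `parse` round trip can probably be dropped in favour of `merge`, which returns a new URI instead of mutating one. `resolve_src` is a hypothetical helper name, and the base below is the "http://img3.yfrog.com/" value from earlier in the thread, hard-coded so the example runs without a network call.

```ruby
require 'uri'

# Hypothetical helper: resolve a src attribute against a base URI object.
# merge accepts a String or a URI and leaves `base` untouched.
def resolve_src(base, src)
  base.merge(src).to_s
end

# Stand-in for page.base_uri (hard-coded here; open-uri would supply it).
base = URI.parse('http://img3.yfrog.com/')
resolve_src(base, '/img3/7036/gssac.jpg')
# => "http://img3.yfrog.com/img3/7036/gssac.jpg"

# With the real page this would be roughly:
#   page = open(url)   # URI.open(url) on Ruby 3.0 and later
#   resolve_src(page.base_uri, (Hpricot(page) % "#main_image").attributes['src'])
```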
