From: Newb Newb on
I need to extract img tags from an HTML page using regular expressions:
<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1
Would this pattern be OK?

Can anyone suggest another regexp for extracting img tags?
--
Posted via http://www.ruby-forum.com/.
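
For illustration, here is a minimal Ruby sketch of how a pattern like the one posted above might be applied with String#scan. The helper name img_tags_with_src and the sample markup are made up for this sketch, and the three tweaks noted in the comments are assumptions rather than part of the original post. It still misses unquoted attribute values, also matches attributes such as data-src, and can be fooled by a ">" inside an attribute value, so treat it as a rough approximation only.

```ruby
# Hypothetical helper (not from the thread): returns [full_tag, src] pairs.
def img_tags_with_src(html)
  # Based on the posted pattern, with three assumed tweaks:
  #   \b instead of a literal space after "img",
  #   /i so that <IMG SRC=...> also matches,
  #   a trailing [^>]*> so the match runs to the end of the tag.
  pattern = /<\s*img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1[^>]*>/i
  pairs = []
  html.scan(pattern) do
    m = Regexp.last_match          # MatchData for the current match
    pairs << [m[0], m[2]]          # m[0] is the whole tag, m[2] the src value
  end
  pairs
end

sample = %(<p><img src="logo.png" alt="Logo"></p> <IMG class='wide' SRC='photos/cat.jpg' />)
img_tags_with_src(sample).each { |tag, src| puts "#{src}: #{tag}" }
```

The original pattern stops at the closing quote of the src value, which is why the sketch appends [^>]*> to reach the end of the tag.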

From: Lex Williams on
Instead of using a regular expression, you could use an HTML parser
and/or an XPath search to retrieve the images. Check out Hpricot.

From: Thomas Wieczorek on
On Thu, Aug 21, 2008 at 12:50 PM, Lex Williams <etaern(a)yahoo.com> wrote:
>
> Instead of using a regular expression, you could use an HTML parser
> and/or an XPath search to retrieve the images. Check out Hpricot.
>

Yeah, it is quite easy with Hpricot:

require 'open-uri'
require 'hpricot'

site = Hpricot(open("http://code.google.com/edu/submissions/SedgewickWayne/index.html"))
site.search("//img") #=> returns an array-like collection of all img elements

From: Newb Newb on
Yes, I used it like this:

doc = Hpricot.parse(item.description)
imgs = doc.search("//img")
@src_array = imgs.collect { |img| img.attributes["src"] }

But this returns only the image URLs, and I need the full
<img src=" "> tag. Any help?

From: Jan Pilz on
Then do

@src_array = imgs.collect { |img| "<img src=\"#{img.attributes["src"]}\">" }

?
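
One caveat with rebuilding the tag from src alone is that any other attributes (alt, width, and so on) are dropped and the value is not escaped. Below is a minimal sketch of rebuilding the tag from all attributes, assuming they are available as a plain Hash of names to values. build_img_tag is a hypothetical helper written for this example, not part of Hpricot. Hpricot elements should also respond to to_html, which returns the element's own markup and may be all that is needed; that call is not exercised here.

```ruby
# Hypothetical helper: rebuild an <img> tag from a Hash of
# attribute name => value pairs.
def build_img_tag(attrs)
  rendered = attrs.map do |name, value|
    # Escape & and " so the value cannot break out of its quotes.
    escaped = value.to_s.gsub('&', '&amp;').gsub('"', '&quot;')
    %(#{name}="#{escaped}")
  end
  rendered.empty? ? '<img>' : "<img #{rendered.join(' ')}>"
end

puts build_img_tag('src' => 'logo.png', 'alt' => 'Tom & "Jerry"')
# prints: <img src="logo.png" alt="Tom &amp; &quot;Jerry&quot;">
```

The attribute order follows the Hash's insertion order, so the rebuilt tag may not match the original markup character for character.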

