From: Santosh Turamari on
Hi,

I am using Sanitize.clean() to strip HTML tags from some content, but
I want to keep certain tags from being removed. This is what I have so
far:

html = File.new(file).read
soup = BeautifulSoup.new(html)
soup.title.contents = ['']
soup.find_all.each do |tag|
  if tag.string != nil
    tag.contents = ['<strong>' + tag.contents.to_s + '</strong>'] if tag['style'] =~ /bold/
    tag.contents = ['<em>' + tag.contents.to_s + '</em>'] if tag['style'] =~ /italic/
    tag.contents = ['<u>' + tag.contents.to_s + '</u>'] if tag['style'] =~ /underline/
  end
end
soup_string = str_replace(soup.html.to_s)

return Sanitize.clean(soup_string.to_s, :elements =>
  ['div', 'p', 'span', 'center', 'table', 'tr', 'th', 'td', 'blockquote',
   'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em', 'i', 'li', 'ol', 'pre',
   'q', 'small', 'strike', 'strong', 'sub', 'sup', 'u', 'ul', 'tbody'])

The problem is that I also want to preserve center and right
justification, and that does not happen even though I list 'center'
here. If anybody knows how to preserve justification, please help me.

Thanks In Advance,
Santosh
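
One possible direction, as a rough sketch rather than a tested fix:
justification usually lives in attributes (align="center", or
style="text-align: right"), and a config that lists only :elements has
no attribute whitelist, so those attributes are stripped. The names
ALIGNABLE, ALIGN_CONFIG and text_align_from_style below are made up for
illustration and are not part of Sanitize or the soup library.

```ruby
# Sketch only: ALIGNABLE, ALIGN_CONFIG and text_align_from_style are
# hypothetical names, not part of Sanitize or the HTML parser.

# Elements on which an align attribute should survive sanitizing.
ALIGNABLE = %w[div p table tr th td]

# Same element list as in the post, plus an :attributes whitelist that
# keeps align on the elements above.
ALIGN_CONFIG = {
  :elements   => %w[div p span center table tr th td blockquote br cite code
                    dd dl dt em i li ol pre q small strike strong sub sup u
                    ul tbody],
  :attributes => Hash[ALIGNABLE.map { |name| [name, ['align']] }]
}

# Pull a text-align value out of an inline style string.
# Returns a lowercase value such as "center", or nil if none is present.
def text_align_from_style(style)
  return nil if style.nil?
  value = style[/text-align\s*:\s*(left|right|center|justify)/i, 1]
  value && value.downcase
end
```

Assuming the parser lets you assign attributes, the existing loop could
copy text_align_from_style(tag['style']) into tag['align'] whenever it
is not nil, and the final call would become
Sanitize.clean(soup_string.to_s, ALIGN_CONFIG). Whitelisting 'style'
itself would also keep the alignment, but it lets every other inline
style through as well.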




Jun Young Kim wrote:
> You can also use the Ruby library Sanitize (http://wonko.com/post/sanitize).
>
> This library lets you parse HTML templates very easily.
>
> See the following examples.
>
> Using Sanitize is easy. First, install it:
> sudo gem install sanitize
>
> Then call it like so:
>
> require 'rubygems'
> require 'sanitize'
>
> html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>
> Sanitize.clean(html) # => 'foo'
>
> By default, Sanitize removes all HTML. You can use one of the built-in
> configs to tell Sanitize to allow certain attributes and elements:
>
> Sanitize.clean(html, Sanitize::Config::RESTRICTED)
> # => '<b>foo</b>'
>
> Sanitize.clean(html, Sanitize::Config::BASIC)
> # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
>
> Sanitize.clean(html, Sanitize::Config::RELAXED)
> # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>
> Or, if you'd like more control over what's allowed, you can provide
> your own custom configuration:
>
> Sanitize.clean(html, :elements => ['a', 'span'],
> :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
> :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
>
> good one :)
>
> On 2009. 01. 02, at 6:42, Vivek Netha wrote:

--
Posted via http://www.ruby-forum.com/.