From: Santosh Turamari on 4 Nov 2009 23:29

Hi,

I am using Sanitize.clean() to strip HTML tags from content, but I want to keep some of the tags. This is what I have:

    html = File.new(file).read
    soup = BeautifulSoup.new(html)
    soup.title.contents = ['']
    soup.find_all.each do |tag|
      if tag.string != nil
        tag.contents = ['<strong>' + tag.contents.to_s + '</strong>'] if (tag['style'] =~ /bold/)
        tag.contents = ['<em>' + tag.contents.to_s + '</em>'] if (tag['style'] =~ /italic/)
        tag.contents = ['<u>' + tag.contents.to_s + '</u>'] if (tag['style'] =~ /underline/)
      end
    end
    soup_string = str_replace(soup.html.to_s)
    return Sanitize.clean(soup_string.to_s,
      :elements => ['div', 'p', 'span', 'center', 'table', 'tr', 'th', 'td',
                    'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
                    'i', 'li', 'ol', 'pre', 'q', 'small', 'strike', 'strong',
                    'sub', 'sup', 'u', 'ul', 'tbody'])

The problem is that I also want to preserve center and right justification, and that does not happen even though I list 'center' here. If anybody knows how to preserve justification, please help me.

Thanks in advance,
Santosh

Jun Young Kim wrote:
> you can also use the Ruby library Sanitize (http://wonko.com/post/sanitize)
>
> This library makes it very easy to parse HTML templates.
>
> Let's look at the following examples.
>
> Using Sanitize is easy. First, install it:
>
>   sudo gem install sanitize
>
> Then call it like so:
>
>   require 'rubygems'
>   require 'sanitize'
>
>   html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>
>   Sanitize.clean(html) # => 'foo'
>
> By default, Sanitize removes all HTML. You can use one of the built-in
> configs to tell Sanitize to allow certain attributes and elements:
>
>   Sanitize.clean(html, Sanitize::Config::RESTRICTED)
>   # => '<b>foo</b>'
>
>   Sanitize.clean(html, Sanitize::Config::BASIC)
>   # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
>
>   Sanitize.clean(html, Sanitize::Config::RELAXED)
>   # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
>
> Or, if you'd like more control over what's allowed, you can provide
> your own custom configuration:
>
>   Sanitize.clean(html, :elements => ['a', 'span'],
>     :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
>     :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
>
> good one :)
>
> On 2009. 01. 02, at 6:42, Vivek Netha wrote:

--
Posted via http://www.ruby-forum.com/.
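One possible direction, going by the custom configuration in the quoted reply: :elements only whitelists tag names, so any attribute that carries the justification is still stripped unless it is also listed under :attributes. The sketch below only builds such a config hash. It assumes the justification lives in an align attribute (if it is in style, it would first have to be copied over, much like the bold/italic checks above). ALIGNABLE and alignment_config are made-up names for illustration, not part of Sanitize.

```ruby
# Sketch only: builds a Sanitize-style config hash that also keeps 'align'.
# ALIGNABLE and alignment_config are hypothetical names, not part of Sanitize.

# Elements on which an 'align' attribute is assumed to be meaningful here.
ALIGNABLE = %w[div p span table tr th td].freeze

def alignment_config(elements)
  # Allow 'align' only on elements that are both whitelisted and alignable.
  attributes = {}
  (elements & ALIGNABLE).each { |name| attributes[name] = ['align'] }
  { :elements => elements, :attributes => attributes }
end

config = alignment_config(%w[div p span center table tr th td strong em u])

# With the sanitize gem loaded, the hash would be passed as in the post:
#   Sanitize.clean(soup_string.to_s, config)
```

Keeping the allowed attribute list this narrow is deliberate: whitelisting the whole style attribute would let through far more than justification.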