From: Chris Colbert on 13 Apr 2010 14:12

On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <varnikat22(a)gmail.com> wrote:
>
> Hi,
> Can anyone tell me how to get text from an HTML file? I am trying to display
> the text of an HTML file in a TextView (of Glade). If I directly display the
> file, it shows with HTML tags and attributes, etc. in the TextView. I don't want
> that. I just want the text.
> Can someone help me with this?
>
> Regards
> Varnika Tewari
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

You should look into Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/
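Beautiful Soup is a third-party package. If pulling in a dependency isn't an option, a rough stand-in can be built on the standard library's html.parser module. The sketch below is that swapped-in technique, not Beautiful Soup's own API; it assumes Python 3, and the names TextExtractor and html_to_text are hypothetical. It skips the contents of <script> and <style> and decodes entities such as &amp;.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns &amp;, &lt; etc. into plain characters
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result reads like plain prose
        return " ".join("".join(self._chunks).split())


def html_to_text(markup):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<p>Hello <b>world</b> &amp; friends</p>"))
```

Because all whitespace is collapsed, paragraph breaks are lost, and adjacent block elements with no whitespace between them (e.g. "<p>a</p><p>b</p>") run together as "ab". A real parser such as Beautiful Soup or lxml gives more control over that.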
From: Grant Edwards on 13 Apr 2010 14:22

On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <varnikat22(a)gmail.com> wrote:
> Can anyone tell me how to get text from an HTML file? I am trying to display
> the text of an HTML file in a TextView (of Glade). If I directly display the
> file, it shows with HTML tags and attributes, etc. in the TextView. I don't want
> that. I just want the text.

[Parent article is unavailable on gmane, so my reply isn't quite in the right place in the tree]

I generally just use something like this:

    from subprocess import Popen, PIPE

    # w3m -dump renders the page and writes the plain text to stdout
    text = Popen(['w3m', '-dump', filename], stdout=PIPE).stdout.read()

I'm sure there are more complex ways...

--
Grant Edwards               grant.b.edwards        Yow! I'm having fun
                                  at               HITCHHIKING to CINCINNATI
                              gmail.com            or FAR ROCKAWAY!!
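A Python 3 variant of the w3m approach, sketched with subprocess.run so that a failing w3m raises an error instead of silently returning nothing. It assumes w3m is installed and on PATH; render_with_w3m and its command parameter are hypothetical names, and the parameter exists only so another text-mode dumper can be substituted.

```python
import subprocess


def render_with_w3m(filename, command=("w3m", "-dump")):
    """Return the text a console browser renders for an HTML file.

    `command` defaults to w3m's -dump mode (assumes w3m is on PATH);
    it is a parameter only so that another dumper can be swapped in.
    """
    result = subprocess.run(
        [*command, filename],
        stdout=subprocess.PIPE,  # capture the rendered text
        check=True,              # raise CalledProcessError on a non-zero exit
        text=True,               # decode stdout to str (Python 3.7+)
    )
    return result.stdout
```

Typical use is `text = render_with_w3m("page.html")`. Letting a browser do the rendering means lists and tables come out laid out roughly as they would on screen, which a plain tag stripper does not attempt.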
From: Stefan Behnel on 13 Apr 2010 14:26

varnikat t, 13.04.2010 19:58:
> Can anyone tell me how to get text from an HTML file? I am trying to display
> the text of an HTML file in a TextView (of Glade). If I directly display the
> file, it shows with HTML tags and attributes, etc. in the TextView. I don't want
> that. I just want the text.
> Can someone help me with this?

E.g. using lxml.html:

    import lxml.html as H

    html = H.parse("the_html_file.html")
    # method="text" serializes only the text content, dropping all tags;
    # encoding="unicode" returns a str rather than bytes
    print(H.tostring(html, method="text", encoding="unicode"))

Stefan
From: rake on 13 Apr 2010 20:45

On Apr 13, 2:12 pm, Chris Colbert <sccolb...(a)gmail.com> wrote:
> On Tue, Apr 13, 2010 at 1:58 PM, varnikat t <varnika...(a)gmail.com> wrote:
>
> > Hi,
> > Can anyone tell me how to get text from an HTML file? I am trying to display
> > the text of an HTML file in a TextView (of Glade). If I directly display the
> > file, it shows with HTML tags and attributes, etc. in the TextView. I don't want
> > that. I just want the text.
> > Can someone help me with this?
>
> > Regards
> > Varnika Tewari
>
> You should look into Beautiful Soup:
>
> http://www.crummy.com/software/BeautifulSoup/

For more complex parsing, Beautiful Soup is definitely the way to go. However, if all you want to do is strip the HTML and keep all the remaining text, I'd recommend the pyparsing package with this short script:

http://pyparsing.wikispaces.com/file/view/htmlStripper.py
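The htmlStripper.py script itself isn't reproduced here. As a swapped-in, dependency-free illustration of the same "strip the tags, keep the rest" idea, here is a crude sketch using the standard library's re and html modules (html.unescape needs Python 3.4+). strip_tags is a hypothetical name. Unlike a real parser, it keeps the text inside <script> and <style>, and a ">" inside an attribute value will fool it.

```python
import html
import re

# Remove comments first, so a '>' inside a comment can't end a tag match early
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Any remaining tag: '<', an optional '/', a letter or '!', then up to '>'
_TAG = re.compile(r"</?[A-Za-z!][^>]*>")


def strip_tags(markup):
    """Crudely remove markup and decode entities, keeping all other text."""
    no_comments = _COMMENT.sub("", markup)
    no_tags = _TAG.sub("", no_comments)
    # Turn &amp;, &lt;, &#39; etc. back into plain characters
    return html.unescape(no_tags)
```

Requiring a letter (or "!") right after "<" means a bare comparison such as "x < y" in the text survives; that is the main thing this gets right over the naive `<[^>]+>` pattern.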
From: Stefan Behnel on 14 Apr 2010 02:43

rake, 14.04.2010 02:45:
> On Apr 13, 2:12 pm, Chris Colbert wrote:
>> You should look into Beautiful Soup:
>>
>> http://www.crummy.com/software/BeautifulSoup/
>
> For more complex parsing, Beautiful Soup is definitely the way to go.

Why would a library that even its author has lost interest in be "the way to go"?

Stefan