From: BingYU on 14 Feb 2010 03:44

Hi experts,

I want to use the htmlparse library to convert HTML lines to text, so that it is easier to write regexps against them.

To keep things simple, I convert the input line by line. Here we go (all in the Tcl shell):

    tcl% puts $line
    <tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD (Australian Dollar vs. United States Dollar)</td></tr>

    tcl% ::htmlparse::parse $line
    ==> hmstart {} {} {}
    ==> tr {} align=left {}
    ==> td {} colspan=2 Symbol
    ==> td / {} {}
    ==> td {} colspan=4 {AUDUSD (Australian Dollar vs. United States Dollar)}
    ==> td / {} {}
    ==> tr / {} {}
    ==> hmstart / {} {}

How can I capture the output lines in a list or array?
The documentation says we can build a tree structure. Can you give a practical example?

How can we get the line "td {} colspan=2 Symbol", and the text "Symbol"?

Thank you

From: George Petasis on 14 Feb 2010 05:38

On 14/2/2010 10:44, BingYU wrote:
> Hi experts
>
> I want to use htmlparse lib to convert html lines to text, so that it
> is more easy to write regexp.
>
> To make things simple, I would convert it line by line, here we go(all
> in the tcl shell)
>
> tcl% puts $line
> <tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD
> (Australian Dollar vs. United States Dollar)</td></tr>
>
> tcl% ::htmlparse::parse $line
> ==> hmstart {} {} {}
> ==> tr {} align=left {}
> ==> td {} colspan=2 Symbol
> ==> td / {} {}
> ==> td {} colspan=4 {AUDUSD (Australian Dollar vs. United States
> Dollar)}
> ==> td / {} {}
> ==> tr / {} {}
> ==> hmstart / {} {}
>
> How can I puts catch the output lines into list/array?
> The documents said we can build tree structure? can you give practical
> example.
>
> How can we get this line "td {} colspan=2 Symbol"
> and "Symbol"
>
> Thank you

    package require htmlparse

    # Return the text of the HTML string $text with all tags removed.
    # The HTML is passed in as an argument and the result is collected
    # in the global variable ::html2text.
    proc getText {text {strip_white_space 0}} {
        set ::html2text {}
        if {$strip_white_space} {
            ::htmlparse::parse -cmd \
                [list accumulateTextStripWhiteSpaceCallback] $text
        } else {
            ::htmlparse::parse -cmd [list accumulateTextCallback] $text
        }
        return $::html2text
    };# getText

    # Called by the parser once per tag; keeps only the text after the tag.
    proc accumulateTextCallback {tag slash param textBehindTheTag} {
        append ::html2text $textBehindTheTag
    };# accumulateTextCallback

    # Same as above, but trims surrounding whitespace from each text piece.
    proc accumulateTextStripWhiteSpaceCallback {tag slash param textBehindTheTag} {
        append ::html2text [string trim $textBehindTheTag]
    };# accumulateTextStripWhiteSpaceCallback

George
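
The same -cmd callback mechanism can also answer the "list/array" part of the question. The following is only a minimal sketch, not something taken from the thread: the proc name collectEvent and the global variable ::events are made up for illustration, and lassign assumes Tcl 8.5 or later. Each parser event is stored as a four-element list {tag slash param text}, so the element for "td {} colspan=2 Symbol" can be picked out afterwards.

```tcl
package require htmlparse

# Hypothetical helper (not part of htmlparse): append each parser event
# to the global list whose name is given in listVar.
proc collectEvent {listVar tag slash param text} {
    upvar #0 $listVar events
    lappend events [list $tag $slash $param $text]
}

set line {<tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD (Australian Dollar vs. United States Dollar)</td></tr>}

set ::events {}
# htmlparse appends the four arguments tag, slash, param and text
# to the command prefix given with -cmd.
::htmlparse::parse -cmd [list collectEvent ::events] $line

# Collect the text of every opening <td> tag.
set cells {}
foreach ev $::events {
    lassign $ev tag slash param text
    if {$tag eq "td" && $slash eq ""} {
        lappend cells $text
    }
}
puts $cells
```

Passing the variable name through the command prefix avoids hard-coding a global inside the callback, so the same proc can fill different lists for different lines.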

From: Andreas Kupries on 14 Feb 2010 23:36

BingYU <yubingem(a)gmail.com> writes:

> Hi experts
>
> I want to use htmlparse lib to convert html lines to text, so that it
> is more easy to write regexp.

> The documents said we can build tree structure?

Yes. See

    http://docs.activestate.com/activetcl/8.5/tcllib/htmlparse/htmlparse.html#4

for the ::htmlparse::2tree command, and

    http://docs.activestate.com/activetcl/8.5/tcllib/struct/struct_tree.html

for the tree objects you need.

For a code example see the 'parse' procedure in the file

    examples/oreilly-oscon2001/oscon

of the Tcllib sources (available on SourceForge), or a release archive.

--
So long,
Andreas Kupries <akupries(a)shaw.ca>
<http://www.purl.org/NET/akupries/>
Developer @ <http://www.activestate.com/>
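
For readers without the Tcllib sources at hand, here is a rough sketch of the tree approach. It is not the oscon example mentioned above. It assumes struct::tree version 2 and relies on the node attributes described in the htmlparse documentation ("type" holds the tag name or PCDATA for text nodes, "data" holds the tag parameters or the text). The proc name collectText is invented for illustration, and {*} needs Tcl 8.5 or later.

```tcl
package require htmlparse
package require struct::tree 2

# Hypothetical helper: walk the tree below $node depth-first and return
# the text of all PCDATA nodes in document order.
proc collectText {t node} {
    set out {}
    foreach child [$t children $node] {
        if {[$t get $child type] eq "PCDATA"} {
            lappend out [$t get $child data]
        } else {
            lappend out {*}[collectText $t $child]
        }
    }
    return $out
}

set line {<tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD (Australian Dollar vs. United States Dollar)</td></tr>}

# 2tree expects an existing, empty tree and fills it with the parsed structure.
set t [::struct::tree]
::htmlparse::2tree $line $t

set texts [collectText $t [$t rootname]]
puts $texts

$t destroy
```

The tree form mostly pays off for whole documents, where nesting matters; for a single table row the flat callback list is usually enough.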