From: BingYU on
Hi experts,

I want to use the htmlparse library to convert HTML lines to text, so that it
is easier to write regexps against them.

To keep things simple, I would convert it line by line. Here we go (all
in the Tcl shell):

tcl% puts $line
<tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD
(Australian Dollar vs. United States Dollar)</td></tr>

tcl% ::htmlparse::parse $line
==> hmstart {} {} {}
==> tr {} align=left {}
==> td {} colspan=2 Symbol
==> td / {} {}
==> td {} colspan=4 {AUDUSD (Australian Dollar vs. United States
Dollar)}
==> td / {} {}
==> tr / {} {}
==> hmstart / {} {}


How can I capture the output lines in a list or array?
The documentation says we can build a tree structure. Can you give a practical
example?

How can we get the line "td {} colspan=2 Symbol"
and the text "Symbol"?

Thank you



From: George Petasis on
On 14/2/2010 10:44, BingYU wrote:
> How can I capture the output lines in a list or array?
> The documentation says we can build a tree structure. Can you give a practical
> example?
>
> How can we get the line "td {} colspan=2 Symbol"
> and the text "Symbol"?
>
> Thank you

package require htmlparse

# Return the plain text of the HTML in $text. If strip_white_space is
# true, the whitespace around each text fragment is trimmed.
proc getText {text {strip_white_space 0}} {
    set ::html2text {}
    if {$strip_white_space} {
        ::htmlparse::parse -cmd \
            [list accumulateTextStripWhiteSpaceCallback] $text
    } else {
        ::htmlparse::parse -cmd [list accumulateTextCallback] $text
    }
    return $::html2text
};# getText

# htmlparse calls the callback once per tag with these four arguments.
proc accumulateTextCallback {tag slash param textBehindTheTag} {
    append ::html2text $textBehindTheTag
};# accumulateTextCallback

proc accumulateTextStripWhiteSpaceCallback {tag slash param
                                            textBehindTheTag} {
    append ::html2text [string trim $textBehindTheTag]
};# accumulateTextStripWhiteSpaceCallback
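
The same -cmd mechanism can also keep the whole callback record rather than only the text, which is what the question about getting "td {} colspan=2 Symbol" and "Symbol" asks for. The following is only a minimal sketch: the proc name collectCallback and the global list ::records are hypothetical names chosen for illustration, the sample row is shortened to a single line, and lassign assumes Tcl 8.5 or later.

```tcl
package require htmlparse

# Hypothetical helper: store every callback invocation as one
# 4-element list {tag slash param text} in the global list ::records.
proc collectCallback {tag slash param textBehindTheTag} {
    lappend ::records [list $tag $slash $param $textBehindTheTag]
}

# Shortened version of the row from the question.
set line {<tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD</td></tr>}

set ::records {}
::htmlparse::parse -cmd collectCallback $line

# Keep the text that follows each opening <td> tag.
set cells {}
foreach rec $::records {
    lassign $rec tag slash param text
    if {$tag eq "td" && $slash eq ""} {
        lappend cells $text
    }
}

# Records 0 and 1 are the hmstart wrapper and <tr>, as in the debug
# output shown in the question, so record 2 is the first <td>.
puts [lindex $::records 2]
puts $cells
```

Each element of ::records has the same four fields that the default debug callback prints after "==>", so ordinary list commands such as lindex and lsearch can be used on it.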

George



From: Andreas Kupries on
BingYU <yubingem(a)gmail.com> writes:

> Hi experts
>
> I want to use the htmlparse library to convert HTML lines to text, so that it
> is easier to write regexps against them.

> The documentation says we can build a tree structure.

Yes. See
http://docs.activestate.com/activetcl/8.5/tcllib/htmlparse/htmlparse.html#4

for the ::htmlparse::2tree command, and

http://docs.activestate.com/activetcl/8.5/tcllib/struct/struct_tree.html

for the tree objects you need.

For a code example see the 'parse' procedure in the file

examples/oreilly-oscon2001/oscon

of the Tcllib sources (available on SourceForge), or a release archive.
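
Independently of that example file, a minimal sketch of the ::htmlparse::2tree route might look as follows. It assumes struct::tree version 2 (for the "walk ... loopvar script" form) and relies on the documented node attributes: "type" holds the tag name, with PCDATA marking text nodes, and "data" holds the tag parameters or the text. The tree name t and the shortened sample row are chosen only for illustration.

```tcl
package require struct::tree
package require htmlparse

# Shortened version of the row from the question.
set line {<tr align=left><td colspan=2>Symbol</td><td colspan=4>AUDUSD</td></tr>}

# 2tree fills an existing, empty tree object; it does not create one.
::struct::tree t
::htmlparse::2tree $line t

# Walk the tree depth-first and collect the text of every PCDATA node.
set texts {}
t walk root -order pre -type dfs node {
    if {[t get $node type] eq "PCDATA"} {
        lappend texts [string trim [t get $node data]]
    }
}
puts $texts

# Release the tree object once it is no longer needed.
t destroy
```

The walk does not depend on how deep the text nodes sit, so it keeps working if the parser adds wrapper nodes above the <tr>.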

--
So long,
Andreas Kupries <akupries(a)shaw.ca>
<http://www.purl.org/NET/akupries/>
Developer @ <http://www.activestate.com/>