From: Raymond Irving on 6 Jun 2010 22:39

Hello,

I'm experiencing another issue when attempting to use DOMDocument::loadXML() to load the following HTML code:

<?php
$html = '
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body>
<script type="text/javascript">
<!--
var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
document.write(html);
i--; // this line causes the parser to fail
alert(html);
-->
</script>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadXML($html);
echo $dom->saveHTML();
?>

The parser throws the following error when it encounters "i--" inside the <script> tag:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: Comment not terminated <!-- var i = 0, html = "<strong>Bold Text< in Entity

If I remove the line "i--", it loads the HTML code just fine.

Any ideas as to why this throws an error?

__
Raymond
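A minimal reproduction, assuming PHP 7.1 or later with the dom extension. Because loadXML() parses its input as XML, the <!-- ... --> inside the script is a real XML comment, and the XML 1.0 specification forbids the string "--" anywhere inside a comment, so "i--" makes the document not well-formed. The helper loadXmlWithErrors() is hypothetical; it uses libxml_use_internal_errors() so the parser messages can be inspected instead of being printed as warnings. The exact message text differs between libxml2 versions.

```php
<?php
// Assumes PHP 7.1+ with ext-dom. loadXmlWithErrors() is a hypothetical helper.

/**
 * Parse $xml with loadXML(), collecting libxml messages instead of
 * emitting PHP warnings. Returns [bool $ok, string[] $messages].
 */
function loadXmlWithErrors(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);

    $messages = array_map(function (LibXMLError $e) {
        return trim($e->message);
    }, libxml_get_errors());

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$ok, $messages];
}

// The script body sits inside an XML comment, and "i--" puts "--" in it.
$xml = '<html><body><script type="text/javascript"><!--
var i = 0;
i--;
--></script></body></html>';

[$ok, $messages] = loadXmlWithErrors($xml);
var_dump($ok);
foreach ($messages as $message) {
    echo $message, "\n";
}
```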
From: Adam Richardson on 7 Jun 2010 00:22

On Sun, Jun 6, 2010 at 10:39 PM, Raymond Irving <xwisdom(a)gmail.com> wrote:
> Any ideas as to why this throws an error?

A comment declaration starts with "<!" and ends with ">", with any number of comments of the form --comment-- in between:
http://htmlhelp.com/reference/wilbur/misc/comment.html

You'll see at the bottom of the article that they advocate a simple rule for comments: an HTML comment begins with "<!--", ends with "-->", and does not contain "--" or ">" anywhere in the comment.

The occurrence of "i--" breaks that rule.

In your case, if you're maintaining the pages, you can place the javascript in a separate file or place the javascript in a CDATA section. If you're parsing pages you don't maintain, you can rip out the javascript before performing DOM tasks and parse it separately as needed to avoid potential issues.

Adam

--
Nephtali: PHP web framework that functions beautifully
http://nephtaliproject.com
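A sketch of the CDATA option, assuming PHP 7.1+ with the dom extension and markup that is meant to be processed as XHTML. Inside <![CDATA[ ... ]]> the XML parser treats "--" and "<strong>" as plain character data, so loadXML() accepts the script unchanged. The "//" in front of each marker is the usual convention for hiding the markers from the JavaScript engine when the page is served as text/html; the section itself must not contain "]]>".

```php
<?php
// Assumes PHP 7.1+ with ext-dom. The script body is wrapped in a CDATA
// section instead of an <!-- ... --> comment.
$xml = '<html><body><script type="text/javascript">//<![CDATA[
var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
document.write(html);
i--;
alert(html);
//]]></script></body></html>';

$dom = new DOMDocument();
$ok = $dom->loadXML($xml);

// The script text survives intact, including "i--" and the markup string.
$script = $dom->getElementsByTagName('script')->item(0);
var_dump($ok);
echo $script->textContent, "\n";
```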
From: Raymond Irving on 7 Jun 2010 15:30

Hi Adam,

Thanks for the update, but I'm thinking it would be much easier if the DOM parser could just ignore the contents of the <script> tags when parsing HTML content. That way we would not have to rip out the JavaScript or force users to move it into a separate file.

What do you think?

__
Raymond Irving

On Sun, Jun 6, 2010 at 11:22 PM, Adam Richardson <simpleshot(a)gmail.com> wrote:
> In your case, if you're maintaining the pages, you can place the javascript
> in a separate file or place the javascript in a CDATA section. If you're
> parsing pages you don't maintain, you can rip out the javascript before
> performing DOM tasks and parse it separately as needed to avoid potential
> issues.
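For comparison, the "rip out the javascript" approach can be sketched as follows, assuming PHP 7.1+ with the dom extension. extractScripts() and restoreScripts() are hypothetical helpers, and the regular expression is an assumption that only holds for simple markup: it is not an HTML parser and would be fooled by, for example, a literal "</script>" inside a script string.

```php
<?php
// Assumes PHP 7.1+ with ext-dom. extractScripts() and restoreScripts()
// are hypothetical helpers; the regex only handles simple markup.

// Replace each <script> body with a placeholder token, recording the
// original bodies in $scripts (token => body).
function extractScripts(string $html, array &$scripts): string
{
    $scripts = [];
    return preg_replace_callback(
        '#(<script\b[^>]*>)(.*?)(</script>)#is',
        function (array $m) use (&$scripts) {
            $token = 'SCRIPT_PLACEHOLDER_' . count($scripts);
            $scripts[$token] = $m[2];
            return $m[1] . $token . $m[3];
        },
        $html
    );
}

// Put the original script bodies back into serialized markup.
function restoreScripts(string $html, array $scripts): string
{
    return strtr($html, $scripts);
}

$html = '<html><body><script type="text/javascript"><!--
var i = 0; i--;
--></script><p>Hello</p></body></html>';

$scripts = [];
$safe = extractScripts($html, $scripts);

$dom = new DOMDocument();
$ok = $dom->loadXML($safe);   // the script no longer contains a comment
// ... DOM work on the script-free document would go here ...

$out = restoreScripts($dom->saveXML($dom->documentElement), $scripts);
echo $out, "\n";
```

The placeholders keep the <script> elements in the tree, so their position and attributes can still be inspected while the JavaScript itself is handled separately.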
From: Andrew Ballard on 7 Jun 2010 15:50

On Mon, Jun 7, 2010 at 3:30 PM, Raymond Irving <xwisdom(a)gmail.com> wrote:
> Thanks for the update, but I'm thinking it would be much easier if the
> DOM parser could just ignore the contents of the <script> tags when parsing
> HTML content.

You didn't tell it to open the contents as HTML; you told it to open the contents as XML.

Andrew
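A sketch of loading the original markup with loadHTML(), which uses libxml2's HTML parser instead of the XML parser, assuming PHP 7.1+ with the dom extension. The messages are collected with libxml_use_internal_errors() rather than printed, because the HTML parser reports problems but still builds a document. Which messages appear, if any, depends on the libxml2 version in use.

```php
<?php
// Assumes PHP 7.1+ with ext-dom; output varies with the libxml2 version.
$html = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body>
<script type="text/javascript">
<!--
var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
document.write(html);
i--;
alert(html);
-->
</script>
</body>
</html>
HTML;

// Collect parser messages instead of printing them as PHP warnings.
$previous = libxml_use_internal_errors(true);
libxml_clear_errors();

$dom = new DOMDocument();
$ok = $dom->loadHTML($html);   // HTML parser, not the XML parser

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($ok);
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
echo $dom->saveHTML();
```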
From: Raymond Irving on 8 Jun 2010 02:50

Well, it actually fails when loadHTML() is used, too. The strange thing is that it fails regardless of the "--" characters:

"Unexpected end tag : strong in Entity"

__
Raymond Irving

On Mon, Jun 7, 2010 at 2:50 PM, Andrew Ballard <aballard(a)gmail.com> wrote:
> You didn't tell it to open the contents as HTML; you told it to open
> the contents as XML.
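One likely explanation for the loadHTML() error, not confirmed in this part of the thread: HTML 4.01 (appendix B.3.2) says script content ends at the first "</" followed by a letter, and recommends that authors escape "</" inside scripts, for example as "<\/". A libxml2 HTML parser that follows this rule would end the script at "</strong>" and then report the stray end tag. The sketch below, assuming PHP 7.1+ with the dom extension, applies that escape in the JavaScript string; whether the unescaped version warns at all depends on the libxml2 version.

```php
<?php
// Assumes PHP 7.1+ with ext-dom. "<\/strong>" is the same JavaScript
// string as "</strong>", but contains no "</" sequence for the HTML
// parser to treat as the end of the script.
$html = <<<'HTML'
<html>
<body>
<script type="text/javascript">
<!--
var i = 0, html = "<strong>Bold Text<\/strong>,Normal Text";
document.write(html);
i--;
alert(html);
-->
</script>
</body>
</html>
HTML;

$previous = libxml_use_internal_errors(true);
libxml_clear_errors();

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The whole script body, including "i--", is kept as text.
$script = $dom->getElementsByTagName('script')->item(0);
echo $script->textContent, "\n";
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```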