From: "Gary ." on 8 Jul 2010 13:52

On 7/8/10, Richard Quadling wrote:
> On 8 July 2010 16:15, Gary wrote:
>> Okay. At least one of the problems with this so called HTML seems to
>> be that the body tag looks like
>> <BODY vlink=#ffffff ...>
>> and xml_parse complains that "> required" on that line (i.e. it is
>> claiming it can't find the end of the tag!).
>>
>> I'm guessing that those attributes "must" be quoted in XML and
>> "should" be in HTML (but patently aren't)? Is there any way to get
>> xml_parse to ignore that? My element_handler functions never even get
>> a chance to see that line.
>>
>> Regex to insert quotes or remove the attributes entirely, perhaps?
>> *gulp* I hope there's a better way than that.
>
> So. Essentially, you want to parse some plain text which may or may
> not be well formed XML.

No. I don't *want* to.... And it isn't plain text, it's just sh*t html
(no doctype, missing closing tags in some cases, etc. It's an absolute
mess). Browsers are pretty good at handling it. XML parsers... less so.

> How badly formed is the file going to be?

It's not a file. It comes from an embedded web server on a device. I
could ask them to change it. I can hear the laughter already.

> If it is things like missing ", then this could be managed with regex.
> Essentially you are going to have to do the clean up that Tidy could
> do for you.

Yeah :(

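For what it's worth, the regex clean-up mentioned above could look something like the sketch below. It is only an illustration, not anything posted in the thread: the function name quote_bare_attributes and the sample tag are made up, it needs PHP 5.3 for the closures, and it only deals with unquoted attribute values. Missing closing tags, the absent doctype and so on would still trip up xml_parse.

```php
<?php
// Hypothetical helper (not from the thread): wrap unquoted attribute
// values in double quotes so an XML parser has a chance with the tag.
function quote_bare_attributes($html)
{
    // Visit every opening tag, i.e. anything shaped like <NAME ...>.
    return preg_replace_callback('/<[A-Za-z][^<>]*>/', function ($tag) {
        // Match name=value pairs. Quoted values are matched as well, so
        // text inside them is consumed whole and never mistaken for a
        // separate attribute.
        return preg_replace_callback(
            '/(\s[\w:-]+)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+)/',
            function ($m) {
                $value = $m[2];
                if ($value[0] === '"' || $value[0] === "'") {
                    return $m[0]; // already quoted, leave untouched
                }
                return $m[1] . '="' . $value . '"';
            },
            $tag[0]
        );
    }, $html);
}

// Made-up sample tag with one bare and one quoted attribute.
echo quote_bare_attributes('<BODY vlink=#ffffff text="black">'), "\n";
// prints: <BODY vlink="#ffffff" text="black">
```

Known limitations of this sketch: a ">" inside an attribute value ends the tag match early, and a bare value immediately followed by "/>" swallows the slash. For input as messy as described here it is a stopgap at best.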
From: "Gary ." on 8 Jul 2010 13:55

On 7/8/10, Marc Guay wrote:
>> And yes, I'd rather use DOM, but I can't.
>
> Could you use this: http://simplehtmldom.sourceforge.net/?

Interesting.

Although I can't use DOM or Tidy (because they're normally built in,
but TPTB decided to recompile PHP and exclude them, and I am not
allowed to recompile it with them in), that's external so might be a
possibility.

Thanks.

From: "Gary ." on 8 Jul 2010 13:57

On 7/8/10, Nisse Engström wrote:
> On Thu, 8 Jul 2010 17:15:02 +0200, "Gary ." wrote:
>> I'm guessing that those attributes "must" be quoted in XML and
>> "should" be in HTML (but patently aren't)?
>
> For that attribute value, it's a "must" in both cases.

Okay. Please tell L******! :)

From: Richard Quadling on 8 Jul 2010 14:55

On 8 July 2010 18:55, Gary . <php-general(a)garydjones.name> wrote:
> On 7/8/10, Marc Guay wrote:
>>> And yes, I'd rather use DOM, but I can't.
>>
>> Could you use this: http://simplehtmldom.sourceforge.net/?
>
> Interesting.
>
> Although I can't use DOM or Tidy (because they're normally built in,
> but TPTB decided to recompile PHP and exclude them, and I am not
> allowed to recompile it with them in), that's external so might be a
> possibility.
>
> Thanks.

If it were Windows, then the Tidy extension would be loadable via php.ini.

You could ask TPTB why they've removed the only tool that can read
this sh*t with any success? Make the case for it. If they still say no,
then tell them that the sh*t is NOT XML and therefore the XML tools
won't read it.
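
Before making that case, it may help to confirm exactly which extensions the recompiled build has. A small sketch, assuming nothing beyond the standard extension_loaded() function; the helper name extension_report is made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): report which of the named
// extensions are compiled in or loaded in this PHP build.
function extension_report(array $names)
{
    $report = array();
    foreach ($names as $name) {
        $report[$name] = extension_loaded($name);
    }
    return $report;
}

// 'xml' provides xml_parse; 'dom' and 'tidy' are the ones said to be missing.
foreach (extension_report(array('xml', 'dom', 'tidy')) as $name => $loaded) {
    printf("%-5s %s\n", $name, $loaded ? 'loaded' : 'missing');
}
```

On a stock Windows build of PHP 5 the php.ini change referred to above is normally the single line extension=php_tidy.dll; whether that applies to the custom build discussed here is unknown.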