Prev: [RELEASED] Python 2.7 alpha 3
Next: How to print all expressions that match a regular expression
From: Lawrence D'Oliveiro on 8 Feb 2010 05:19 In message <4b6fd672$0$6734$9b4e6d93(a)newsspool2.arcor-online.net>, Stefan Behnel wrote: > Jim, 06.02.2010 20:09: > >> I generate some HTML and I want to include in my unit tests a check >> for syntax. So I am looking for a program that will complain at any >> syntax irregularities. > > First thing to note here is that you should consider switching to an HTML > generation tool that does this automatically. I think that's what he's writing.
From: Stefan Behnel on 8 Feb 2010 05:36 Lawrence D'Oliveiro, 08.02.2010 11:19: > In message <4b6fd672$0$6734$9b4e6d93(a)newsspool2.arcor-online.net>, Stefan > Behnel wrote: > >> Jim, 06.02.2010 20:09: >> >>> I generate some HTML and I want to include in my unit tests a check >>> for syntax. So I am looking for a program that will complain at any >>> syntax irregularities. >> First thing to note here is that you should consider switching to an HTML >> generation tool that does this automatically. > > I think that's what he's writing. I don't read it that way. There's a huge difference between - generating HTML manually and validating (some of) it in a unit test and - generating HTML using a tool that guarantees correct HTML output the advantage of the second approach being that others have already done all the debugging for you. Stefan
From: Phlip on 8 Feb 2010 12:12 Stefan Behnel wrote: > I don't read it that way. There's a huge difference between > > - generating HTML manually and validating (some of) it in a unit test > > and > > - generating HTML using a tool that guarantees correct HTML output > > the advantage of the second approach being that others have already done > all the debugging for you. Anyone TDDing around HTML or XML should use or fork my assert_xml() (from django-test-extensions). The current version trivially detects a leading <html> tag and uses etree.HTML(xml); else it goes with the stricter etree.XML(xml). The former will not complain about the provided sample HTML. Sadly, the industry has such a legacy of HTML written in Notepad that well-formed (X)HTML will never be well-formed XML. My own action item here is to apply Stefan's parser_options suggestion to make the etree.HTML() stricter. However, a generator is free to produce arbitrarily restricted XML that avoids the problems with XHTML. It could, for example, push any Javascript that even dreams of using & instead of & out into .js files. So an assert_xml() hot-wired to process only XML - with the true HTML doctype - is still useful to TDD generated code, because its XPath reference will detect that you get the nodes you expect. -- Phlip http://c2.com/cgi/wiki?ZeekLand
From: Phlip on 8 Feb 2010 13:16 and the tweak is: parser = etree.HTMLParser(recover=False) return etree.HTML(xml, parser) That reduces tolerance. The entire assert_xml() is (apologies for wrapping lines!): def _xml_to_tree(self, xml): from lxml import etree self._xml = xml try: if '<html' in xml[:200]: # NOTE the condition COULD suck more! parser = etree.HTMLParser(recover=False) return etree.HTML(xml, parser) return etree.HTML(xml) else: return etree.XML(xml) except ValueError: # TODO don't rely on exceptions for normal control flow tree = xml self._xml = str(tree) # CONSIDER does this reconstitute the nested XML ? return tree def assert_xml(self, xml, xpath, **kw): 'Check that a given extent of XML or HTML contains a given XPath, and return its first node' tree = self._xml_to_tree(xml) nodes = tree.xpath(xpath) self.assertTrue(len(nodes) > 0, xpath + ' not found in ' + self._xml) node = nodes[0] if kw.get('verbose', False): self.reveal_xml(node) # "here have ye been? What have ye seen?"--Morgoth return node def reveal_xml(self, node): 'Spews an XML node as source, for diagnosis' from lxml import etree print etree.tostring(node, pretty_print=True) # CONSIDER does pretty_print work? why not? def deny_xml(self, xml, xpath): 'Check that a given extent of XML or HTML does not contain a given XPath' tree = self._xml_to_tree(xml) nodes = tree.xpath(xpath) self.assertEqual(0, len(nodes), xpath + ' should not appear in ' + self._xml)
From: Lawrence D'Oliveiro on 8 Feb 2010 16:39 In message <4b6fe93d$0$6724$9b4e6d93(a)newsspool2.arcor-online.net>, Stefan Behnel wrote: > - generating HTML using a tool that guarantees correct HTML output Where do you think these tools come from? They don't write themselves, you know.
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 Prev: [RELEASED] Python 2.7 alpha 3 Next: How to print all expressions that match a regular expression |