From: Mike Fowler on
Peter Eisentraut wrote:
> On Sat, 2010-07-03 at 09:26 +0100, Mike Fowler wrote:
>
>> What I will do
>> instead is implement the xml_is_well_formed function and get a patch
>> out in the next day or two.
>>
>
> That sounds very useful.
>
Here's the patch to add the 'xml_is_well_formed' function. Paraphrasing
the SGML, the syntax is:

xml_is_well_formed(text)

The function xml_is_well_formed evaluates whether the text is well-formed
XML content, returning a boolean. I've done some tests (included in the
patch) on tables containing a mixture of well-formed documents and
content, and the function returns the expected result in each case.
Combining it with IS (NOT) DOCUMENT works nicely for pulling out either
content or documents from a table of text.
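
A minimal sketch of that combination, for illustration only. The table
xml_candidates and its rows are hypothetical and are not taken from the
patch; the sketch also assumes a server built with libxml, the function
available, and the default xmloption setting of CONTENT.

```sql
-- Hypothetical table and rows, for illustration only (not from the patch).
CREATE TEMP TABLE xml_candidates (id int, data text);

INSERT INTO xml_candidates VALUES
    (1, '<doc><item>one</item></doc>'),   -- a single-rooted document
    (2, 'some <b>mixed</b> content'),     -- content, but not a document
    (3, '<open>never closed');            -- not well formed

-- Keep only rows that are well formed *and* documents.
-- The CASE guards the cast: data::xml raises an error on malformed input,
-- and conditions joined by AND are not guaranteed to run left to right.
SELECT id, data
FROM xml_candidates
WHERE CASE
          WHEN xml_is_well_formed(data) THEN data::xml IS DOCUMENT
          ELSE false
      END;
```

Swapping IS DOCUMENT for IS NOT DOCUMENT in the CASE branch would select
the well-formed content rows instead.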

Unless I missed something in the original correspondence, I think this
patch will solve the issue.

Regards,

--
Mike Fowler
Registered Linux user: 379787

From: Mike Fowler on
Thom Brown wrote:
> Would a test for mismatched or undefined namespaces be necessary?
>
> For example:
>
> Mismatched namespace:
> <pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>
>
> Undefined namespace when used in conjunction with IS DOCUMENT:
> <pg:foo xmlns:my="http://postgresql.org/stuff">bar</pg:foo>
>

Thanks for looking at my patch, Thom. I hadn't thought of that particular
scenario, and even though I didn't specifically code for it, the
underlying libxml call does correctly reject the mismatched namespace:

template1=# SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>');
 xml_is_well_formed
--------------------
 f
(1 row)
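
For comparison, a sketch of two related checks that could be run the same
way: a control case with a matching, declared prefix, and the
undefined-prefix case from Thom's message. Results are deliberately not
shown, since they are not reported in this thread.

```sql
-- Control case: opening and closing tags use the same, declared prefix.
SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>');

-- Undefined-prefix case: pg is used, but only my is declared.
-- The outcome depends on how libxml reports namespace errors.
SELECT xml_is_well_formed('<pg:foo xmlns:my="http://postgresql.org/stuff">bar</pg:foo>');
```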



In the attached patch I've added the example to the SGML documentation
and the regression tests.

> Also, having a look at the following example from the patch:
> SELECT xml_is_well_formed('<local:data
> xmlns:local="http://127.0.0.1";><local:piece id="1">number
> one</local:piece><local:piece id="2" /></local:data>');
> xml_is_well_formed
> --------------------
> t
> (1 row)
>
> Just wondering about that semi-colon after the namespace definition.
>
> Thom
>

The semi-colon is not supposed to be there, and I'm not sure where it has
come from. In Thunderbird I see the email with my patch as an
attachment, and when I download and view the file there are no instances
of a " followed by a ;. However, if I look at the message in the archive at
http://archives.postgresql.org/message-id/4C3871C2.8000605(a)mlfowler.com
I can see that every URL that ends with a " has a ; following it. Should I
be escaping the " in the patch file in some way, or is this just an artifact
of the HTML rendering of the patch?

Regards,

--
Mike Fowler
Registered Linux user: 379787