From: Robert Haas on
On Thu, Jul 1, 2010 at 12:25 PM, Mike Fowler <mike(a)mlfowler.com> wrote:
> Quoting Mike Fowler <mike(a)mlfowler.com>:
>
>> Should the IS DOCUMENT predicate support this? At the moment you get
>> the following:
>>
>> template1=# SELECT
>> '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>'
>> IS DOCUMENT;
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> template1=# SELECT
>> '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns'
>> IS DOCUMENT;
>> ERROR:  invalid XML content
>> LINE 1: SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</to...
>>              ^
>> DETAIL:  Entity: line 1: parser error : expected '>'
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>>      ^
>> Entity: line 1: parser error : chunk is not well balanced
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>>      ^
>> I would've hoped the second would've returned 'f' rather than failing.
>> I've had a glance at the XML/SQL standard and I don't see anything in
>> the detail of the predicate (8.2) that would specifically prohibit us
>> from changing this behavior, unless the common rule 'Parsing a string
>> as an XML value' (10.16) must always be in force. I'm no standard
>> expert, but IMHO this would be an acceptable change to improve
>> usability. What do others think?
>
> Right, I've answered my own question whilst sitting in the open source
> coding session at CHAR(10). Yes, IS DOCUMENT should return false for a
> non-well formed document, and indeed is coded to do such. However, the
> conversion to the xml type which happens before the underlying
> xml_is_document function is even called fails and exceptions out. I'll work
> on a patch to resolve this behavior such that IS DOCUMENT will give you the
> missing 'xml_is_well_formed' function.

I think the point of "IS DOCUMENT" is to distinguish a document:

<foo>some stuff<bar/><baz/></foo>

from a document fragment:

<bar/><baz/>

A document is allowed exactly one top-level element.

It'd be nice, I think, to have a function that tells you whether
something is legal XML without throwing an error if it isn't, but I
suspect that should be a separate function, rather than trying to jam
it into "IS DOCUMENT".
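
To make the distinction concrete, here is a minimal sketch in Python rather than SQL, so it can be run without a server. It uses the standard library's expat-based parser, not the libxml2 parser PostgreSQL uses, so it only illustrates the idea of "return false instead of raising". The function names and the throwaway `wrapper` element are hypothetical and are not taken from the patch.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text: str) -> bool:
    """Return True if text parses as a document with exactly one top-level element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Report malformed input as False instead of propagating the error.
        return False


def is_well_formed_content(text: str) -> bool:
    """Return True if text parses as a fragment (zero or more top-level nodes).

    Assumption of this sketch: the fragment carries no XML declaration or
    DOCTYPE, so wrapping it in a throwaway root element is enough.
    """
    return is_well_formed_document(f"<wrapper>{text}</wrapper>")


document = "<foo>some stuff<bar/><baz/></foo>"
fragment = "<bar/><baz/>"
broken = "<towns><town>Bristol</town></towns"  # closing '>' missing

checks = {
    "document as document": is_well_formed_document(document),
    "fragment as document": is_well_formed_document(fragment),
    "fragment as content": is_well_formed_content(fragment),
    "broken as document": is_well_formed_document(broken),
    "broken as content": is_well_formed_content(broken),
}
```

In this sketch the fragment fails the document check but passes the content check, while the truncated input fails both without raising, which is the behavior a separate well-formedness function would offer.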

http://developer.postgresql.org/pgdocs/postgres/functions-xml.html#AEN15187

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Fri, Jul 9, 2010 at 4:06 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> On ons, 2010-07-07 at 16:37 +0100, Mike Fowler wrote:
>> Here's the patch to add the 'xml_is_well_formed' function.
>
> I suppose we should remove the function from contrib/xml2 at the same
> time.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Mike Fowler on
Robert Haas wrote:
> On Fri, Jul 9, 2010 at 4:06 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
>
>> On ons, 2010-07-07 at 16:37 +0100, Mike Fowler wrote:
>>
>>> Here's the patch to add the 'xml_is_well_formed' function.
>>>
>> I suppose we should remove the function from contrib/xml2 at the same
>> time.
>>
>
> Yep

Revised patch deleting the contrib/xml2 version of the function attached.

Regards,

--
Mike Fowler
Registered Linux user: 379787

From: Thom Brown on
On 10 July 2010 14:12, Mike Fowler <mike(a)mlfowler.com> wrote:
> Robert Haas wrote:
>>
>> On Fri, Jul 9, 2010 at 4:06 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
>>
>>>
>>> On ons, 2010-07-07 at 16:37 +0100, Mike Fowler wrote:
>>>
>>>>
>>>> Here's the patch to add the 'xml_is_well_formed' function.
>>>>
>>>
>>> I suppose we should remove the function from contrib/xml2 at the same
>>> time.
>>>
>>
>> Yep
>
> Revised patch deleting the contrib/xml2 version of the function attached.
>
> Regards,
>
> --
> Mike Fowler
> Registered Linux user: 379787

Would a test for mismatched or undefined namespaces be necessary?

For example:

Mismatched namespace:
<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>

Undefined namespace when used in conjunction with IS DOCUMENT:
<pg:foo xmlns:my="http://postgresql.org/stuff">bar</pg:foo>
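
As a rough illustration of what such tests would exercise, the sketch below feeds both strings, plus a correctly bound variant, to Python's standard-library parser. This is an assumption-laden stand-in: Python uses expat with namespace processing enabled, whereas PostgreSQL uses libxml2, which may classify an undefined prefix differently, so the results say nothing definite about the patch. The `parses` helper and the `CASES` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two strings from the examples above, plus a correctly bound variant.
CASES = {
    "mismatched": '<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</my:foo>',
    "undefined": '<pg:foo xmlns:my="http://postgresql.org/stuff">bar</pg:foo>',
    "bound": '<pg:foo xmlns:pg="http://postgresql.org/stuff">bar</pg:foo>',
}


def parses(text: str) -> bool:
    """Return True if expat (namespace-aware) accepts text as a document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {name: parses(text) for name, text in CASES.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

With this parser both problem cases are rejected and only the bound variant is accepted; a regression test in the patch would pin down what libxml2 actually does for the same inputs.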

Also, having a look at the following example from the patch:
SELECT xml_is_well_formed('<local:data
xmlns:local="http://127.0.0.1";><local:piece id="1">number
one</local:piece><local:piece id="2" /></local:data>');
xml_is_well_formed
--------------------
t
(1 row)

Just wondering about that semi-colon after the namespace definition.

Thom


From: Thom Brown on
On 12 July 2010 13:07, Mike Fowler <mike(a)mlfowler.com> wrote:
> Thom Brown wrote:
>>
>> Just wondering about that semi-colon after the namespace definition.
>>
>> Thom
>>
>
> The semi-colon is not supposed to be there, and I'm not sure where it's come
> from. With Thunderbird I see the email with my patch as an attachment, and
> having downloaded and viewed the file, there are no instances of a " followed
> by a ;. However, if I look at the message on the archive at
> http://archives.postgresql.org/message-id/4C3871C2.8000605(a)mlfowler.com I
> can see every URL that ends with a " has a ; following it. Should I be
> escaping the " in the patch file in some way, or is this just an artifact of
> HTML parsing a patch?

Yeah, I guess it's a parsing issue related to the archive viewer. I
arrived there from the commitfest page and should have really looked
directly at the patch. No problem there then I guess.

Thanks for the work you've done on this. :)

Thom
