From: Tom Lane on
Andrew Dunstan <andrew(a)dunslane.net> writes:
>> http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items

> I have just been looking at the xmlconcat bug on that list. I can't
> think of any better solution than parsing the resulting string to make
> sure it is well-formed before we return,

That might be a reasonable thing to do as a safety check, but I can't
escape the feeling that what this fundamentally is is a data typing
error, traceable to the lack of differentiation between xml documents
and xml fragments. Is there a way to attack it based on saying that the
inputs can't be documents, or stripping the document overhead if they are?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Dunstan on


Tom Lane wrote:
> Andrew Dunstan <andrew(a)dunslane.net> writes:
>
>>> http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items
>>>
>
>
>> I have just been looking at the xmlconcat bug on that list. I can't
>> think of any better solution than parsing the resulting string to make
>> sure it is well-formed before we return,
>>
>
> That might be a reasonable thing to do as a safety check, but I can't
> escape the feeling that what this fundamentally is is a data typing
> error, traceable to the lack of differentiation between xml documents
> and xml fragments. Is there a way to attack it based on saying that the
> inputs can't be documents, or stripping the document overhead if they are?
>

Yeah, maybe. According to
<http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html> the only
legal child of an XML Document node that is not also a legal child of a
DocumentFragment node is a DocumentType node. So we could probably just
look for one of those in each argument node and strip it out. That
should be fairly lightweight in the common case where it's not present -
we'd just be searching for a fixed string. Removing it if found would be
more complex. We'd have to parse the node to remove it, since a legal
DocumentType node string could appear legally inside a CDATA node.
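
To illustrate the CDATA concern, here is a minimal Python sketch. It is not PostgreSQL code: it uses the standard library's xml.dom.minidom as a stand-in for libxml, and the helper names has_doctype_by_search and has_doctype_by_parse are hypothetical. It compares a fixed-string search with an actual parse on inputs where the text "<!DOCTYPE" appears without being a document type declaration.

```python
from xml.dom import minidom


def has_doctype_by_search(xml_text: str) -> bool:
    """Naive check: look for the fixed string anywhere in the input."""
    return "<!DOCTYPE" in xml_text


def has_doctype_by_parse(xml_text: str) -> bool:
    """Parse the input as a document and ask the DOM for its doctype node."""
    return minidom.parseString(xml_text).doctype is not None


# A genuine document type declaration.
real = "<!DOCTYPE a [<!ELEMENT a ANY>]><a/>"
# The same characters as plain character data inside a CDATA section.
in_cdata = "<a><![CDATA[<!DOCTYPE a>]]></a>"
# The same characters inside a comment.
in_comment = "<!-- <!DOCTYPE a> --><a/>"

for sample in (real, in_cdata, in_comment):
    print(has_doctype_by_search(sample), has_doctype_by_parse(sample))
```

The string search reports a match for all three inputs, while only the first one has a doctype node after parsing, which is why removal (as opposed to a cheap first-pass check) seems to need a parser.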

That has the advantage that it would fix the error rather than failing,
but I'm slightly nervous about silently mangling user supplied XML. I
guess we do that in a few other cases to make other combinations
function sanely.

cheers

andrew



From: Peter Eisentraut on
On Sun, 2010-03-21 at 13:07 -0400, Andrew Dunstan wrote:
> Yeah, maybe. According to
> <http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html> the only
> legal child of an XML Document node that is not also a legal child of a
> DocumentFragment node is a DocumentType node. So we could probably just
> look for one of those in each argument node and strip it out. That
> should be fairly lightweight in the common case where it's not present -
> we'd just be searching for a fixed string. Removing it if found would be
> more complex. We'd have to parse the node to remove it, since a legal
> DocumentType node string could appear legally inside a CDATA node.

According to the SQL/XML standard, the document type declaration should
apparently be stripped when doing a concatenation. (This makes sense
because the result of a concatenation can never be valid according to a
DTD.)

But if we are not comfortable about being able to do that safely, I
would be OK with just raising an error if a concatenation is attempted
where one value contains a DTD. The impact in practice should be low.
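
The two options above (strip the document type declaration, or raise an error) can be sketched as follows. This is a minimal illustration in Python using xml.dom.minidom, not the PostgreSQL implementation and not the libxml API; the names concat_sketch and on_doctype are hypothetical. It assumes each input is either a single-rooted document or a content fragment, and it does not handle entities that are declared in a stripped DTD and referenced in the body.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError


def _without_doctype(value: str, on_doctype: str) -> str:
    try:
        doc = minidom.parseString(value)
    except ExpatError:
        # Not a single-rooted document. Treat it as a content fragment,
        # which cannot carry a doctype; wrapping checks well-formedness.
        minidom.parseString("<wrapper>" + value + "</wrapper>")
        return value
    if doc.doctype is not None and on_doctype == "error":
        raise ValueError("cannot concatenate a value that contains a DTD")
    # Re-serialize every top-level node except the doctype node.
    return "".join(
        child.toxml()
        for child in doc.childNodes
        if child.nodeType != Node.DOCUMENT_TYPE_NODE
    )


def concat_sketch(values, on_doctype="strip"):
    """Concatenate XML values, stripping or rejecting doctype declarations."""
    return "".join(_without_doctype(v, on_doctype) for v in values)


print(concat_sketch(["<!DOCTYPE a><a/>", "<b>x</b>"]))
```

Note that this approach re-serializes documents, so the output is not guaranteed to be byte-identical to the input even when no DTD is present.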



From: Andrew Dunstan on


Peter Eisentraut wrote:
> On Sun, 2010-03-21 at 13:07 -0400, Andrew Dunstan wrote:
>
>> Yeah, maybe. According to
>> <http://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html> the only
>> legal child of an XML Document node that is not also a legal child of a
>> DocumentFragment node is a DocumentType node. So we could probably just
>> look for one of those in each argument node and strip it out. That
>> should be fairly lightweight in the common case where it's not present -
>> we'd just be searching for a fixed string. Removing it if found would be
>> more complex. We'd have to parse the node to remove it, since a legal
>> DocumentType node string could appear legally inside a CDATA node.
>>
>
> According to the SQL/XML standard, the document type declaration should
> apparently be stripped when doing a concatenation. (This makes sense
> because the result of a concatenation can never be valid according to a
> DTD.)
>
> But if we are not comfortable about being able to do that safely, I
> would be OK with just raising an error if a concatenation is attempted
> where one value contains a DTD. The impact in practice should be low.
>

Right. Can you find a way to do that using the libxml API? I haven't
managed to, and I'm pretty sure I can construct XML that fails every
simple string search test I can think of, either with a false negative
or a false positive.

cheers

andrew


From: Peter Eisentraut on
On Mon, 2010-03-22 at 19:38 -0400, Andrew Dunstan wrote:
> > But if we are not comfortable about being able to do that safely, I
> > would be OK with just raising an error if a concatenation is attempted
> > where one value contains a DTD. The impact in practice should be low.
> >
>
> Right. Can you find a way to do that using the libxml API? I haven't
> managed to, and I'm pretty sure I can construct XML that fails every
> simple string search test I can think of, either with a false negative
> or a false positive.

The documentation on that is terse as usual. In any case, you will need
to XML parse the input values, and so you might as well resort to
parsing the output value to see if it is well-formed, which should catch
this mistake and possibly others.
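
As a rough sketch of that output check, again in Python with xml.dom.minidom rather than libxml: the helper name is_wellformed_content is hypothetical, and wrapping the result in a dummy root element is an assumption about how one might test "well-formed as content". Under that approach, an XML declaration inside the concatenated value would also be rejected.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_wellformed_content(xml_text: str) -> bool:
    """Return True if xml_text parses as the content of a single element."""
    try:
        # A doctype declaration is only legal in a document prolog, so it
        # becomes a parse error once it sits inside the wrapper element.
        minidom.parseString("<wrapper>" + xml_text + "</wrapper>")
    except ExpatError:
        return False
    return True


# Plain string concatenation of a document with a DTD and a fragment.
naive_result = "<!DOCTYPE a><a/>" + "<b/>"

print(is_wellformed_content(naive_result))
print(is_wellformed_content("<a/><b/>"))
```

The check rejects the naive result that still carries the doctype and accepts the concatenation without it.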

