From: Andrew Dunstan on


Peter Eisentraut wrote:
> On mån, 2010-03-22 at 19:38 -0400, Andrew Dunstan wrote:
>
>>> But if we are not comfortable about being able to do that safely, I
>>> would be OK with just raising an error if a concatenation is
>>>
>> attempted
>>
>>> where one value contains a DTD. The impact in practice should be
>>>
>> low.
>>
>>>
>>>
>> Right. Can you find a way to do that using the libxml API? I haven't
>> managed to, and I'm pretty sure I can construct XML that fails every
>> simple string search test I can think of, either with a false negative
>> or a false positive.
>>
>
> The documentation on that is terse as usual. In any case, you will need
> to XML parse the input values, and so you might as well resort to
> parsing the output value to see if it is well-formed, which should catch
> this mistake and possibly others.
>
>

Actually, I have come to the conclusion that the biggest problem in this
area is that we accept XML documents with a leading DOCTYPE node at all.
Our docs state:

The xml type can store well-formed "documents", as defined by the
XML standard, as well as "content" fragments, which are defined by
the production XMLDecl? content in the XML standard.

A document with a leading DOCTYPE node matches neither of these rules,
and when we strip the XMLDecl from a piece of XML where it's followed by
a DOCTYPE node we turn something that is legal XML into something that
isn't, even by our own (or possibly the standard's) relaxed definition.
A doctypedecl can only follow an XMLDecl, see
<http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-prolog-dtd>.

So I think we need to go back to the drawing board a bit, rather than
patch a particular reported error case. But these problems are not at
all new to 9.0, and coming up to beta as I hope we are is not the time
for it. I think it will have to wait to 9.1.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on
On ons, 2010-03-24 at 14:51 -0400, Andrew Dunstan wrote:
> Actually, I have come to the conclusion that the biggest problem in
> this
> area is that we accept XML documents with a leading DOCTYPE node at
> all.
> Our docs state:
>
> The xml type can store well-formed "documents", as defined by the
> XML standard, as well as "content" fragments, which are defined by
> the production XMLDecl? content in the XML standard.
>
> A document with a leading DOCTYPE node matches neither of these
> rules,
> and when we strip the XMLDecl from a piece of XML where it's followed
> by
> a DOCTYPE node we turn something that is legal XML into something
> that
> isn't, even by our own (or possibly the standard's) relaxed
> definition.
> A doctypedecl can only follow an XMLDecl, see
> <http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-prolog-dtd>.

Our version of SQL/XML support references SQL:2003 which references XML
1.0, where omitting the XMLDecl is legal. You can't omit the XMLDecl in
XML 1.1, because you need it to communicate the fact that it's version
1.1.

But note that that is correctly supported:

=# select xmlconcat('<?xml version="1.0"?><foo/>', '<?xml
version="1.0"?><bar/>');
xmlconcat
--------------
<foo/><bar/>

and

=# select xmlconcat('<?xml version="1.1"?><foo/>', '<?xml
version="1.1"?><bar/>');
xmlconcat
-----------------------------------
<?xml version="1.1"?><foo/><bar/>



--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Dunstan on


Peter Eisentraut wrote:
> Our version of SQL/XML support references SQL:2003 which references XML
> 1.0, where omitting the XMLDecl is legal. You can't omit the XMLDecl in
> XML 1.1, because you need it to communicate the fact that it's version
> 1.1.
>
>
>

Hmm. OK. Well here is a patch that tries to fix the xmlconcat error,
anyway. It seems to work, but maybe could stand a little tightening.

cheers

andrew
From: Tom Lane on
Andrew Dunstan <andrew(a)dunslane.net> writes:
> Hmm. OK. Well here is a patch that tries to fix the xmlconcat error,
> anyway. It seems to work, but maybe could stand a little tightening.

I liked your previous idea (rethink the whole mess in 9.1) better.

As far as the patch itself is concerned, the complete lack of error
checks seems scary, and I wonder whether the case sensitivity and
lack of whitespace tolerance in the string comparisons is OK.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Dunstan on


Tom Lane wrote:
> Andrew Dunstan <andrew(a)dunslane.net> writes:
>
>> Hmm. OK. Well here is a patch that tries to fix the xmlconcat error,
>> anyway. It seems to work, but maybe could stand a little tightening.
>>
>
> I liked your previous idea (rethink the whole mess in 9.1) better.
>
> As far as the patch itself is concerned, the complete lack of error
> checks seems scary,

Yes, this wasn't intended as the final patch. If it's not wanted right
now, that's fine too. I just wanted to get it on the record as possibly
something useful when we do come to reconsider the whole mess. Getting
to grips with the libxml2 API is no fun, and it's better not to have to
repeat it if possible ;-)

> and I wonder whether the case sensitivity and
> lack of whitespace tolerance in the string comparisons is OK.
>
>
>

The tokens were chosen with some care to be such that no whitespace
tolerance would be needed (or correct). XML is case sensitive, so that's
not an issue either.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers