From: Robert Haas on 28 Mar 2010 17:19

On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
<joeyadams3.14159(a)gmail.com> wrote:
> I'm wondering whether the internal representation of JSON should be
> plain JSON text, or some binary code that's easier to traverse and
> whatnot. For the sake of code size, just keeping it in text is
> probably best.

+1 for text.

> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>
> It should be built-in, rather than relying on an external library
> (like XML does).

Why? I'm not saying you aren't right, but you need to make an
argument rather than an assertion. This is a community, so no one is
entitled to decide anything unilaterally, and people want to be
convinced - including me.

> As far as character encodings, I'd rather keep that out of the JSON
> parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
> I'll just throw encode/decode/validate operations at it.

I think you need to assume that the encoding will be the server
encoding, not UTF-8. Although others on this list are better
qualified to speak to that than I am.

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Andrew Dunstan on 28 Mar 2010 17:42

Robert Haas wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
> <joeyadams3.14159(a)gmail.com> wrote:
>> I'm wondering whether the internal representation of JSON should be
>> plain JSON text, or some binary code that's easier to traverse and
>> whatnot. For the sake of code size, just keeping it in text is
>> probably best.
>
> +1 for text.

Agreed.

>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library
>> (like XML does).
>
> Why? I'm not saying you aren't right, but you need to make an
> argument rather than an assertion. This is a community, so no one is
> entitled to decide anything unilaterally, and people want to be
> convinced - including me.

Yeah, why? We should not be in the business of reinventing the wheel
(and then maintaining the reinvented wheel), unless the code in
question is *really* small.

>> As far as character encodings, I'd rather keep that out of the JSON
>> parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
>> I'll just throw encode/decode/validate operations at it.
>
> I think you need to assume that the encoding will be the server
> encoding, not UTF-8. Although others on this list are better
> qualified to speak to that than I am.

The trouble is that JSON is defined to be specifically Unicode, and in
practice for us that means UTF8 on the server side. It could get a bit
hairy, and it's definitely not something I think you can wave away with
a simple "I'll just throw some encoding/decoding function calls at it."

cheers

andrew
From: Tom Lane on 28 Mar 2010 19:08

Andrew Dunstan <andrew(a)dunslane.net> writes:
> Robert Haas wrote:
>> I think you need to assume that the encoding will be the server
>> encoding, not UTF-8. Although others on this list are better
>> qualified to speak to that than I am.

> The trouble is that JSON is defined to be specifically Unicode, and in
> practice for us that means UTF8 on the server side. It could get a bit
> hairy, and it's definitely not something I think you can wave away with
> a simple "I'll just throw some encoding/decoding function calls at it."

It's just text, no? Are there any operations where this actually makes
a difference?

Like Robert, I'm *very* wary of trying to introduce any text storage
into the backend that is in an encoding different from server_encoding.
Even the best-case scenarios for that will involve multiple new places
for encoding conversion failures to happen.

			regards, tom lane
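[Editor's note: the conversion-failure hazard Tom describes can be sketched in a few lines. This is an illustration only, in Python rather than backend C: a JSON value held internally as UTF-8 may have no representation at all in a non-Unicode server encoding such as LATIN1, so every boundary where the datatype converts text is a new place an error can surface.]

```python
# Illustration (not PostgreSQL code): if a json datatype stored UTF-8
# internally while server_encoding were LATIN1, each handoff between the
# two would be a potential conversion failure.
utf8_json = '{"name": "\u65e5\u672c\u8a9e"}'  # valid JSON, valid UTF-8 text

try:
    # The hypothetical "convert to server encoding" step: Japanese
    # characters simply do not exist in Latin-1.
    utf8_json.encode("latin-1")
except UnicodeEncodeError as exc:
    print("conversion to server encoding failed:", exc)
```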
From: Andrew Dunstan on 28 Mar 2010 19:22

Tom Lane wrote:
> Andrew Dunstan <andrew(a)dunslane.net> writes:
>> Robert Haas wrote:
>>> I think you need to assume that the encoding will be the server
>>> encoding, not UTF-8. Although others on this list are better
>>> qualified to speak to that than I am.
>
>> The trouble is that JSON is defined to be specifically Unicode, and in
>> practice for us that means UTF8 on the server side. It could get a bit
>> hairy, and it's definitely not something I think you can wave away with
>> a simple "I'll just throw some encoding/decoding function calls at it."
>
> It's just text, no? Are there any operations where this actually makes
> a difference?

If we're going to provide operations on it that might involve some. I
don't know.

> Like Robert, I'm *very* wary of trying to introduce any text storage
> into the backend that is in an encoding different from server_encoding.
> Even the best-case scenarios for that will involve multiple new places
> for encoding conversion failures to happen.

I agree entirely. All I'm suggesting is that there could be many
wrinkles here.

Here's another thought. Given that JSON is actually specified to consist
of a string of Unicode characters, what will we deliver to the client
where the client encoding is, say Latin1? Will it actually be a legal
JSON byte stream?

cheers

andrew
From: Tom Lane on 28 Mar 2010 19:36
Andrew Dunstan <andrew(a)dunslane.net> writes:
> Here's another thought. Given that JSON is actually specified to consist
> of a string of Unicode characters, what will we deliver to the client
> where the client encoding is, say Latin1? Will it actually be a legal
> JSON byte stream?

No, it won't. We will *not* be sending anything but latin1 in such a
situation, and I really couldn't care less what the JSON spec says
about it. Delivering wrongly-encoded data to a client is a good recipe
for all sorts of problems, since the client-side code is very unlikely
to be expecting that. A datatype doesn't get to make up its own mind
whether to obey those rules. Likewise, data on input had better match
client_encoding, because it's otherwise going to fail the encoding
checks long before a json datatype could have any say in the matter.

While I've not read the spec, I wonder exactly what "consist of a
string of Unicode characters" should actually be taken to mean. Perhaps
it only means that all the characters must be members of the Unicode
set, not that the string can never be represented in any other
encoding. There's more than one Unicode encoding anyway...

			regards, tom lane
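[Editor's note: Tom's reading is consistent with what the JSON grammar permits: any character in a string may be written as a `\uXXXX` escape, so a serializer can emit pure ASCII, which survives unchanged in Latin1 or any other ASCII-superset client encoding. A quick sketch using Python's json module, purely as an illustration of the escaping behavior under discussion:]

```python
import json

doc = {"greeting": "h\u00e9llo", "lang": "\u65e5\u672c\u8a9e"}

# ensure_ascii=True (json.dumps's default) replaces every non-ASCII
# character with a \uXXXX escape, so the serialized text is plain ASCII.
escaped = json.dumps(doc)
print(escaped)

# Because nothing outside ASCII remains, the result is representable in
# Latin1 (and any other ASCII-superset encoding) without conversion errors.
escaped.encode("latin-1")
```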