From: Peter Eisentraut on 29 Mar 2010 05:20
On Sun, 2010-03-28 at 23:24 -0400, Joseph Adams wrote:
> Thus, here's an example of how (in my opinion) character sets and such
> should be handled in the JSON code:
>
> Suppose the client's encoding is UTF-16, and the server's encoding is
> Latin-1. When JSON is stored to the database:
> 1. The client is responsible and sends a valid UTF-16 JSON string.
> 2. PostgreSQL checks to make sure it is valid UTF-16, then converts
> it to UTF-8.
> 3. The JSON code parses it (to ensure it's valid).
> 4. The JSON code unparses it (to get a representation without
> needless whitespace). It is given a flag indicating it should only
> output ASCII text.
> 5. The ASCII is stored in the server, since it is valid Latin-1.
>
> When JSON is retrieved from the database:
> 1. ASCII is retrieved from the server
> 2. If user needs to extract one or more fields, the JSON is parsed,
> and the fields are extracted.
> 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

The problem I see here is that a data type output function is normally
not aware of the client encoding. The alternatives that I see are that
you always escape everything you see to plain ASCII, so it's valid in
every server encoding, but that would result in pretty sad behavior for
users of languages that don't use a lot of ASCII characters, or you
decree a nonstandard JSON variant that momentarily uses whatever
encoding you decide.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
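The store and retrieve steps quoted in the message above, combined with the "escape everything to plain ASCII" alternative, can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the names `store_json` and `retrieve_json` are hypothetical, and the standard `json` module stands in for whatever parser the JSON type would use.

```python
import json


def store_json(client_bytes: bytes,
               client_encoding: str = "utf-16",
               server_encoding: str = "latin-1") -> bytes:
    """Hypothetical sketch of the quoted 'store' path."""
    # Steps 1-2: decoding fails if the bytes are not valid in the
    # client encoding.
    text = client_bytes.decode(client_encoding)
    # Step 3: parsing rejects input that is not valid JSON.
    value = json.loads(text)
    # Step 4: unparse without needless whitespace, ASCII output only
    # (non-ASCII characters become \uXXXX escapes).
    ascii_text = json.dumps(value, ensure_ascii=True, separators=(",", ":"))
    # Step 5: pure ASCII is valid in the server encoding.
    return ascii_text.encode(server_encoding)


def retrieve_json(stored: bytes,
                  client_encoding: str = "utf-16",
                  server_encoding: str = "latin-1") -> bytes:
    """Hypothetical sketch of the quoted 'retrieve' path (no field extraction)."""
    return stored.decode(server_encoding).encode(client_encoding)


stored = store_json('{ "name": "日本語" }'.encode("utf-16"))
print(stored)
```

Note that going through `json.loads` and `json.dumps` is not byte-preserving (for example, duplicate keys and some number spellings are normalized), and every non-ASCII character costs six or twelve bytes once stored, which is the "pretty sad behavior" the message mentions for non-ASCII-heavy text.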
From: Andrew Dunstan on 29 Mar 2010 08:06
Robert Haas wrote:
> I feel pretty strongly that the data should be stored in the database
> in the format in which it will be returned to the user - any
> conversion which is necessary should happen on the way in. I am not
> 100% sure to what extent we should attempt to canonicalize the input
> and to what extent we should simply store it in whichever way the user
> chooses to provide it.
>
>

ISTM that implies that, with a possible exception when the server
encoding is utf8, you would have to \u escape the data on the way in
fairly pessimistically.

I'd be inclined to say we should store and validate it exactly as the
client gives it to us (converted to the server encoding, as it would be,
of course). In practice that would mean that for non-utf8 databases the
client would need to \u escape it. I suspect most uses of this would be
in utf8-encoded databases anyway.

I also think we should provide a function to do the escaping, so users
could do something like:

    insert into foo (myjson) values (json_escape('some json text here'));

I also thought about a switch to turn on \u escaping on output - that
might be useful for pg_dump for instance.

cheers

andrew
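As a rough illustration of what a `json_escape` helper like the one proposed above might do, here is a minimal Python sketch. It is an assumption about the intended behavior, not an existing PostgreSQL function: it rewrites every non-ASCII character as a `\u` escape and leaves the rest of the text untouched. This relies on the fact that in valid JSON, non-ASCII characters can only appear inside string literals.

```python
import json


def json_escape(text: str) -> str:
    """Hypothetical helper: \\u-escape all non-ASCII characters in JSON text."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII passes through unchanged, including existing escapes.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Characters outside the BMP need a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


escaped = json_escape('{"city": "Malmö"}')
print(escaped)
```

Unlike a parse-and-unparse round trip, this keeps whitespace, key order, and number spellings exactly as the client sent them, which matches the "store it exactly as the client gives it to us" position. It does not validate the input; that would remain a separate step.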
From: Tom Lane on 29 Mar 2010 12:02
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
> <joeyadams3.14159(a)gmail.com> wrote:
>> My reasoning for "It should be built-in" is:
>> * It would be nice to have a built-in serialization format that's
>> available by default.
>> * It might be a little faster because it doesn't have to link to an
>> external library.

> I don't think either of these reasons is valid.

FWIW, our track record with relying on external libraries has been less
than great --- "upstream will maintain it" sounds good but has fallen
over with respect to both the regex engine and the snowball stemmers,
to take two examples. And libxml2 has been nothing but a source of pain.

If this is going to end up being one fairly small C file implementing
a spec that is not a moving target, I'd vote against depending on an
external library instead, no matter how spiffy and license-compatible
the external library might be.

regards, tom lane
From: Robert Haas on 29 Mar 2010 12:06
On Mon, Mar 29, 2010 at 12:02 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
>> <joeyadams3.14159(a)gmail.com> wrote:
>>> My reasoning for "It should be built-in" is:
>>> * It would be nice to have a built-in serialization format that's
>>> available by default.
>>> * It might be a little faster because it doesn't have to link to an
>>> external library.
>
>> I don't think either of these reasons is valid.
>
> FWIW, our track record with relying on external libraries has been less
> than great --- "upstream will maintain it" sounds good but has fallen
> over with respect to both the regex engine and the snowball stemmers,
> to take two examples. And libxml2 has been nothing but a source of pain.
>
> If this is going to end up being one fairly small C file implementing
> a spec that is not a moving target, I'd vote against depending on an
> external library instead, no matter how spiffy and license-compatible
> the external library might be.

Fair enough. Note that I did go on to say which reasons I did think
were potentially valid. ;-)

...Robert
From: Josh Berkus on 29 Mar 2010 13:25
On 3/28/10 8:52 PM, Hitoshi Harada wrote:
> There's another choice, called BSON.
>
> http://www.mongodb.org/display/DOCS/BSON
>
> I've not researched it yet deeply, it seems reasonable to be stored in
> databases as it is invented for MongoDB.

I wouldn't take that for granted. The MongoDB project involves a lot of
"re-inventing the wheel" and I'd scrutinize any of their innovations
pretty thoroughly.

Besides, I thought the point of a JSON type was to be compatible with
the *majority* of JSON users?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com