From: Peter Eisentraut on
On Sun, 2010-03-28 at 23:24 -0400, Joseph Adams wrote:
> Thus, here's an example of how (in my opinion) character sets and such
> should be handled in the JSON code:
>
> Suppose the client's encoding is UTF-16, and the server's encoding is
> Latin-1. When JSON is stored to the database:
> 1. The client is responsible and sends a valid UTF-16 JSON string.
> 2. PostgreSQL checks to make sure it is valid UTF-16, then converts
> it to UTF-8.
> 3. The JSON code parses it (to ensure it's valid).
> 4. The JSON code unparses it (to get a representation without
> needless whitespace). It is given a flag indicating it should only
> output ASCII text.
> 5. The ASCII is stored in the server, since it is valid Latin-1.
>
> When JSON is retrieved from the database:
> 1. ASCII is retrieved from the server
> 2. If user needs to extract one or more fields, the JSON is parsed,
> and the fields are extracted.
> 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

The problem I see here is that a data type output function is normally
not aware of the client encoding. The alternatives that I see are these:
either you always escape everything to plain ASCII, so it's valid in
every server encoding, but that would result in pretty sad behavior for
users of languages that don't use a lot of ASCII characters; or you
decree a nonstandard JSON variant that momentarily uses whatever
encoding you decide on.
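
To make the first alternative concrete, here is a minimal Python sketch of
the escape-everything approach. It uses Python's standard json module as a
stand-in for the proposed JSON code (an assumption for illustration; none of
this is PostgreSQL code) and invented sample data. It follows the UTF-16
client / Latin-1 server example quoted above and compares the stored size
with plain UTF-8:

```python
import json

# Hypothetical client payload: JSON text sent as UTF-16 (sample data is invented).
client_bytes = '{ "city": "\u6771\u4eac" }'.encode("utf-16")

# Convert from the client encoding, then parse to validate.
value = json.loads(client_bytes.decode("utf-16"))

# Unparse without needless whitespace, emitting ASCII only.
stored_text = json.dumps(value, ensure_ascii=True, separators=(",", ":"))

# Pure ASCII is also valid Latin-1, so it can be stored in a Latin-1 server.
stored_bytes = stored_text.encode("latin-1")

# Retrieval: the escaped text parses back to the same value.
assert json.loads(stored_bytes.decode("latin-1")) == value

# Cost: each non-ASCII BMP character becomes a six-byte \uXXXX escape.
utf8_text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
print(stored_text)
print(len(stored_bytes), "bytes escaped vs",
      len(utf8_text.encode("utf-8")), "bytes as UTF-8")
```

The size difference grows with the share of non-ASCII characters, which is
the "pretty sad behavior" for languages that use few ASCII characters.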



--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew Dunstan on


Robert Haas wrote:
> I feel pretty strongly that the data should be stored in the database
> in the format in which it will be returned to the user - any
> conversion which is necessary should happen on the way in. I am not
> 100% sure to what extent we should attempt to canonicalize the input
> and to what extent we should simply store it in whichever way the user
> chooses to provide it.
>
>

ISTM that implies that, with a possible exception when the server
encoding is utf8, you would have to \u escape the data on the way in
fairly pessimistically.

I'd be inclined to say we should store and validate it exactly as the
client gives it to us (converted to the server encoding, as it would be,
of course). In practice that would mean that for non-utf8 databases the
client would need to \u escape it. I suspect most uses of this would be
in utf8-encoded databases anyway.

I also think we should provide a function to do the escaping, so users
could do something like:

insert into foo (myjson) values (json_escape('some json text here'));

I also thought about a switch to turn on \u escaping on output - that
might be useful for pg_dump for instance.
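
As a rough idea of what such a function might do, here is a minimal Python
sketch. The name json_escape comes from the example above; its behavior here
is my assumption, not an existing PostgreSQL function. It relies on the fact
that in valid JSON, non-ASCII characters can only appear inside string
literals, so they can be replaced with \u escapes without parsing. It does
not validate its input.

```python
import json

def json_escape(text: str) -> str:
    """Hypothetical json_escape: rewrite non-ASCII characters as \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII passes through unchanged.
            out.append(ch)
        elif cp < 0x10000:
            # Basic Multilingual Plane: a single \uXXXX escape.
            out.append(f"\\u{cp:04x}")
        else:
            # Outside the BMP: JSON escapes use a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)

# Invented sample: one Latin-1 character and one character outside the BMP.
sample = '{"name":"Zo\u00eb \U0001F600"}'
escaped = json_escape(sample)
print(escaped)

# The escaped text is pure ASCII and still parses to the same value.
assert escaped.isascii()
assert json.loads(escaped) == json.loads(sample)
```

The same routine could back an output switch, since the escaped form is
valid in any ASCII-compatible server encoding.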

cheers

andrew







From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
> <joeyadams3.14159(a)gmail.com> wrote:
>> My reasoning for "It should be built-in" is:
>>  * It would be nice to have a built-in serialization format that's
>> available by default.
>>  * It might be a little faster because it doesn't have to link to an
>> external library.

> I don't think either of these reasons is valid.

FWIW, our track record with relying on external libraries has been less
than great --- "upstream will maintain it" sounds good but has fallen
over with respect to both the regex engine and the snowball stemmers,
to take two examples. And libxml2 has been nothing but a source of pain.

If this is going to end up being one fairly small C file implementing
a spec that is not a moving target, I'd vote against depending on an
external library instead, no matter how spiffy and license-compatible
the external library might be.

regards, tom lane


From: Robert Haas on
On Mon, Mar 29, 2010 at 12:02 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
>> <joeyadams3.14159(a)gmail.com> wrote:
>>> My reasoning for "It should be built-in" is:
>>>  * It would be nice to have a built-in serialization format that's
>>> available by default.
>>>  * It might be a little faster because it doesn't have to link to an
>>> external library.
>
>> I don't think either of these reasons is valid.
>
> FWIW, our track record with relying on external libraries has been less
> than great --- "upstream will maintain it" sounds good but has fallen
> over with respect to both the regex engine and the snowball stemmers,
> to take two examples.  And libxml2 has been nothing but a source of pain.
>
> If this is going to end up being one fairly small C file implementing
> a spec that is not a moving target, I'd vote against depending on an
> external library instead, no matter how spiffy and license-compatible
> the external library might be.

Fair enough. Note that I did go on to say which reasons I did think
were potentially valid. ;-)

...Robert


From: Josh Berkus on
On 3/28/10 8:52 PM, Hitoshi Harada wrote:
> There's another choice, called BSON.
>
> http://www.mongodb.org/display/DOCS/BSON
>
> I haven't researched it deeply yet, but it seems reasonable to store in
> databases, since it was invented for MongoDB.

I wouldn't take that for granted. The MongoDB project involves a lot of
"re-inventing the wheel" and I'd scrutinize any of their innovations
pretty thoroughly.

Besides, I thought the point of a JSON type was to be compatible with
the *majority* of JSON users?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
