From: Joseph Adams
Another JSON strictness issue: the JSON standard (
http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
be an array or object. However, my implementation currently accepts
any valid value. Thus, '3', '"hello"', 'true', 'false', and 'null'
are all accepted by my implementation, but are not strictly JSON text.
The question is: should the JSON datatype accept atomic values (those
that aren't arrays or objects) as valid JSON?
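
For concreteness, here is what the difference would look like at the
SQL level (hypothetical cast syntax; the datatype is still being
designed):

-- Unambiguously valid "JSON text" per RFC 4627:
SELECT '{"pi": 3.14159}'::JSON;
SELECT '[1, 2, 3]'::JSON;

-- Atomic values: accepted by my implementation, but not
-- strictly "JSON text" according to the RFC:
SELECT '3'::JSON;
SELECT '"hello"'::JSON;
SELECT 'true'::JSON;
SELECT 'null'::JSON;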

I tried a few other JSON implementations to see where they stand
regarding atomic types as input:

JSON_checker (C) does not accept them.
JSON.parse() (JavaScript) accepts them.
json_decode() (PHP) accepts them. However, support is currently buggy
(e.g. '1' is accepted, but '1 ' is not).
cJSON (C) accepts them.
JSON.pm (Perl) accepts them if you specify the allow_nonref option.
Otherwise, it accepts 'true' and 'false', but not 'null', a number, or
a string by itself.

In my opinion, we should accept an atomic value as valid JSON content.
I suppose we could get away with calling it a "content" fragment as
is done with XML without a doctype.

Accepting atomic values as valid JSON would be more orthogonal, as it
would be possible to have a function like this:

json_values(object_or_array JSON) RETURNS SETOF JSON
-- extracts values from an object or members from an array,
-- returning them as JSON fragments.
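
For example (hypothetical output; json_values is only a proposal at
this point):

SELECT json_values('[1, "two", {"three": 3}]'::JSON);
-- would return three rows of type JSON:
--   '1'
--   '"two"'
--   '{"three": 3}'
-- the first two being exactly the kind of atomic
-- fragments at issue here.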

Also, should we go even further and accept key:value pairs by themselves?

'"key":"value"'::JSON

I don't think we should because doing so would be rather zany. It
would mean JSON content could be invalid in value context, as in:

// JavaScript
var content = "key" : "value";

I improved my JSON library. It now only accepts strict, UTF-8 encoded
JSON values (that is, objects, arrays, strings, numbers, true, false,
and null). It also has a json_decode_liberal() function that accepts
a string, cleans it up, and passes it through the stricter
json_decode(). json_decode_liberal() filters out comments, allows
single quoted strings, and accepts a lax number format compared to
strict JSON. I may add Unicode repair to it later on, but
implementing that well really depends on what type of Unicode errors
appear in real life, I think.
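
For illustration (a made-up example; the exact cleanup rules are
whatever the library implements), json_decode_liberal() would take an
input like:

// comments like this are stripped
{'retries': .5}

and clean it up to strict JSON along the lines of {"retries": 0.5}
before handing it to json_decode().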

http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2

My json.c is now 1161 lines long, so I can't quite call it "small" anymore.


From: Robert Haas
On Mon, Apr 5, 2010 at 11:50 PM, Joseph Adams
<joeyadams3.14159(a)gmail.com> wrote:
> In my opinion, we should accept an atomic value as valid JSON content.

That seems right to me.

> Also, should we go even further and accept key:value pairs by themselves?
>
> '"key":"value"'::JSON

Definitely not.

....Robert


From: Petr Jelinek
On 6.4.2010 5:50, Joseph Adams wrote:
> Another JSON strictness issue: the JSON standard (
> http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
> be an array or object. However, my implementation currently accepts
> any valid value. Thus, '3', '"hello"', 'true', 'false', and 'null'
> are all accepted by my implementation, but are not strictly JSON text.
> The question is: should the JSON datatype accept atomic values (those
> that aren't arrays or objects) as valid JSON?
>

Not really sure about this myself, but keep in mind that NULL has
special meaning in SQL.

> Also, should we go even further and accept key:value pairs by themselves?
>
> '"key":"value"'::JSON
>
>

No, especially considering that '{"key":"value"}' is a valid JSON value.

> I improved my JSON library. It now only accepts strict, UTF-8 encoded
> JSON values (that is, objects, arrays, strings, numbers, true, false,
> and null).
>

Just a note: PostgreSQL already has UTF-8 validation code, which you
might want to look at, at least once you start the actual integration
into core, so that you are not reinventing too many wheels. I can see
how your own code is a good thing for the general-purpose library this
can (and I am sure will) be used as, but for the datatype itself it
might be a better idea to use what's already there, unless it's somehow
incompatible, of course.

--
Regards
Petr Jelinek (PJMODOS)



From: Petr Jelinek
On 6.4.2010 7:57, Joseph Adams wrote:
> On Tue, Apr 6, 2010 at 1:00 AM, Petr Jelinek<pjmodos(a)pjmodos.net> wrote:
>
>> Not really sure about this myself, but keep in mind that NULL has special
>> meaning in SQL.
>>
> To me, the most logical approach is to do the obvious thing: make
> JSON's 'null' be SQL's NULL. For instance, SELECTing on a table with
> NULLs in it and converting the result set to JSON would yield a
> structure with 'null's in it. 'null'::JSON would yield NULL. I'm not
> sure what startling results would come of this approach, but I'm
> guessing this would be most intuitive and useful.
>

+1
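
To spell out the proposed behavior (hypothetical; none of this is
implemented yet):

SELECT 'null'::JSON;  -- would yield SQL NULL
-- and converting a result set containing NULLs to JSON would render
-- each SQL NULL as a JSON 'null', e.g. the row (1, NULL) becoming
-- '[1, null]'.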

>> Just a note: PostgreSQL already has UTF-8 validation code, which you might
>> want to look at, at least once you start the actual integration into core,
>> so that you are not reinventing too many wheels. I can see how your own
>> code is a good thing for the general-purpose library this can (and I am
>> sure will) be used as, but for the datatype itself it might be a better
>> idea to use what's already there, unless it's somehow incompatible, of course.
>>
> Indeed. My plan is to first get a strong standalone JSON library
> written and tested so it can be used as a general-purpose library. As
> the JSON code is merged into PostgreSQL, it can be adapted. Part of
> this adaptation would most likely be removing the UTF-8 validation
> function I wrote and using PostgreSQL's Unicode support code instead.
>
> There are probably other bits that could be PostgreSQLified as well.
> I wonder if I should consider leveraging PostgreSQL's regex support or
> if it would be a bad fit/waste of time/slower/not worth it.
>

Regex? What for? You certainly don't need it for parsing; you have a
good parser already IMHO, and regex would probably be all of the above.

--
Regards
Petr Jelinek (PJMODOS)



From: Mike Rylander
On Mon, Apr 5, 2010 at 11:50 PM, Joseph Adams
<joeyadams3.14159(a)gmail.com> wrote:
> Another JSON strictness issue:  the JSON standard (
> http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
> be an array or object.  However, my implementation currently accepts
> any valid value.  Thus, '3', '"hello"', 'true', 'false', and 'null'
> are all accepted by my implementation, but are not strictly JSON text.
>  The question is: should the JSON datatype accept atomic values (those
> that aren't arrays or objects) as valid JSON?
>
> I tried a few other JSON implementations to see where they stand
> regarding atomic types as input:
>
> JSON_checker (C) does not accept them.
> JSON.parse() (JavaScript) accepts them.
> json_decode() (PHP) accepts them.  However, support is currently buggy
> (e.g. '1' is accepted, but '1 ' is not).
> cJSON (C) accepts them.
> JSON.pm (Perl) accepts them if you specify the allow_nonref option.
> Otherwise, it accepts 'true' and 'false', but not 'null', a number, or
> a string by itself.
>
> In my opinion, we should accept an atomic value as valid JSON content.
>  I suppose we could get away with calling it a "content" fragment as
> is done with XML without a doctype.
>
> Accepting atomic values as valid JSON would be more orthogonal, as it
> would be possible to have a function like this:
>
> json_values(object_or_array JSON) RETURNS SETOF JSON
> -- extracts values from an object or members from an array,
> -- returning them as JSON fragments.
>

For these reasons, and the fact that my project uses atomic values ;),
I think yes, we should support them.

IIUC, the reason for requiring an array or object is that the "O" in
JSON is taken to mean "some sort of collection of atomic values". But
in ECMAScript (JavaScript), instances of strings, numbers, booleans and
null are, indeed, objects. IOW, I think the spec is using a faulty
definition of "object". It's the one part of the spec that doesn't
make sense to me at all.

> Also, should we go even further and accept key:value pairs by themselves?
>
> '"key":"value"'::JSON
>

This, though, is probably a step too far. It violates the JS part of JSON ...

> I don't think we should because doing so would be rather zany.  It
> would mean JSON content could be invalid in value context, as in:
>
> // JavaScript
> var content = "key" : "value";
>

Right.

Thanks, Joseph. I think this will be a great addition!

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(a)esilibrary.com
| web: http://www.esilibrary.com
