Draft PEP on RSON configuration file format [Python]

Prev: [ANNC] pynguin-0.1 (python-based turtle graphics application)
Next: Detecting new removable drives in Linux

From: Erik Max Francis on 1 Mar 2010 18:57

Patrick Maupin wrote:
> On Feb 28, 9:18 pm, Steven D'Aprano wrote:
>> Wait a minute... if JSON is too hard to edit, and RSON is a *superset* of
>> JSON, that means by definition every JSON file is also a valid RSON file.
>> Since JSON is too hard to manually edit, so is RSON.
>
> Well, Python is essentially a superset of JSON, with string escape
> handling being ever so slightly different, and using True instead of
> true, False instead of false, and None instead of null. YMMV, but I
> find it possible, even probable, to write Python that is far easier to
> edit than JSON, and in fact, I have used Python for configuration
> files that are only to be edited by programmers or other technical
> types.

This is not only seriously stretching the meaning of the term "superset"
(as Python is most definitely not even remotely a superset of JSON), but
still doesn't address the question. Is RSON an _actual_ superset of
JSON, or are you just misusing the term there, as well? If it is, then
your rationale for not using JSON makes no sense if you're making a new
format that's merely a superset of it. Obviously JSON can't be that
unreadable if you're _extending_ it to make your own "more readable"
format. If JSON is unreadable, so must be RSON.

--
Erik Max Francis && max(a)alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
It's better to be quotable than to be honest.
-- Tom Stoppard

From: Patrick Maupin on 1 Mar 2010 19:45

On Mar 1, 5:33 pm, Erik Max Francis <m...(a)alcyone.com> wrote:
> Psst. That you're allowed to present the idea that you think is good
> doesn't mean that other people aren't allowed to respond and point out
> that in their opinion it's not such a good idea. You don't own this or
> any other thread.

Absolutely, but I still do (and will always) express a clear
preference for opinions that have at least a modicum of reasoning
behind them.

Regards,
Pat

From: Patrick Maupin on 1 Mar 2010 19:56

On Mar 1, 5:57 pm, Erik Max Francis <m...(a)alcyone.com> wrote:
> Patrick Maupin wrote:
> This is not only seriously stretching the meaning of the term "superset"
> (as Python is most definitely not even remotely a superset of JSON), but

Well, you are entitled to that opinion, but seriously, if I take valid
JSON, replace unquoted true with True, unquoted false with False,
replace unquoted null with None, and take the quoted strings and
replace occurrences of \uXXXX with the appropriate unicode, then I do,
in fact, have valid Python. But don't take my word for it -- try it
out!
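
A quick way to try this out, as a minimal sketch assuming Python 3 (where
string literals already understand \uXXXX, so only the three keywords need
swapping). The helper name json_text_to_python_literal is made up for this
illustration; it is not part of any library.

```python
import ast
import json
import re

# Match either a complete JSON string (left untouched) or a bare keyword.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\b(true|false|null)\b')
_PYTHON_NAMES = {"true": "True", "false": "False", "null": "None"}


def json_text_to_python_literal(text):
    """Rewrite unquoted true/false/null as their Python spellings."""
    def swap(match):
        keyword = match.group(1)
        # group(1) is None when the match was a quoted string.
        return _PYTHON_NAMES[keyword] if keyword else match.group(0)
    return _TOKEN.sub(swap, text)


doc = ('{"name": "caf\\u00e9", "on": true, "off": false, "none": null, '
       '"note": "true stays quoted", "n": [1, 2.5, -3e2]}')

as_python = ast.literal_eval(json_text_to_python_literal(doc))
as_json = json.loads(doc)
print(as_python == as_json)
```

The "ever so slightly different" escape handling still applies: for example,
JSON accepts the escape \/ for a slash, while Python keeps the backslash, so
documents using it will not compare equal after this substitution.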

But if you really want to be pedantic about it, JavaScript (rather
than Python) is, in fact, a superset of JSON, and, despite the
disparagement JavaScript receives, in my opinion, it is possible to
write much better looking JavaScript than JSON for many tasks.

YAML, also, is a superset of JSON, and IMO, it is possible to write
much better looking YAML than JSON.

> still doesn't address the question. Is RSON an _actual_ superset of
> JSON, or are you just misusing the term there, as well?

Yes, the RSON definition is, in fact, a superset of JSON, just like the
YAML definition. But RSON is a much smaller grammar than YAML.

> If it is, then
> your rationale for not using JSON makes no sense if you're making a new
> format that's merely a superset of it. Obviously JSON can't be that
> unreadable if you're _extending_ it to make your own "more readable"
> format. If JSON is unreadable, so must be RSON.

Well, we'll have to agree to disagree here. Bearing in mind that the
definition of "unreadable" depends on the target application and user,
obviously, it will be *possible* to write unreadable RSON, just as it
is *possible* to write unreadable JavaScript or Python or YAML, but it
will be *possible* to write better looking RSON than is possible to
achieve with JSON, just as it is *possible* to write better looking
JavaScript or YAML or Python than it is *possible* to achieve with
pure JSON.

Best regards,
Pat

From: Kirill Simonov on 1 Mar 2010 20:07

Erik Max Francis wrote:
> Daniel Fetchinson wrote:
>>>> it is my goal (which I may or may not be smart enough to reach) to
>>>> write a module that anybody would want to use;
>>> But you are working on a solution in search of a problem. The really
>>> smart thing to do would be pick something more useful to work on. We
>>> don't need another configuration language. I can't even say "yet
>>> another" because there's already a "yet another" called yaml.
>>
>> And in case you are new here let me assure you that Paul is saying
>> this with his full intention of being helpful to you. I also would
>> think that working on such a project might be fun and educational for
>> you but completely useless if you have users other than yourself in
>> mind. Again, I'm trying to be helpful here, so you can focus on a
>> project that is both fun/educational for you and also potentially
>> useful for others. This RSON business is not one of them.
>
> Agreed. Even YAML's acronym indicates that it is already a bridge too
> far; we don't need more.
>

Note that YA in the acronym doesn't mean Yet Another, YAML = YAML Ain't
Markup Language.

Thanks,
Kirill

From: Kirill Simonov on 1 Mar 2010 21:02

Patrick Maupin wrote:
> All:
>
> Finding .ini configuration files too limiting, JSON and XML too hard to
> manually edit, and YAML too complex to parse quickly, I have started
> work on a new configuration file parser.

I'd like to note that with the optional libyaml bindings, the PyYAML
parser is pretty fast.

> I call the new format RSON (for "Readable Serial Object Notation"),
> and it is designed to be a superset of JSON.
>
> I would love for it to be considered valuable enough to be a part of
> the standard library, but even if that does not come to pass, I would
> be very interested in feedback to help me polish the specification,
> and then possibly help for implementation and testing.
>
> The documentation is in rst PEP form, at:
>
> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt

=== cut ===
Because YAML does allow for highly readable configuration files, it
is tempting to overlook its other flaws for the task. But a fully
(or almost) compliant parser has to understand the whole YAML
specification, and this is apparently expensive. Running the rst2pdf
testsuite, without sphinx or most of the other optional packages, in
"fast" mode (preloading all the modules, and then forking for every
test) generates 161 smallish PDF files, totaling around 2.5 MB. On
one test system this process takes 22 seconds. Disabling the _json C
scanner and reading the configuration files using the json pure Python
implementation adds about 0.3 seconds to the 22 seconds. But using
pyyaml v. 3.09 instead of json adds 33 seconds to the 22 second process!
It might seem that this is an edge case, but it makes it unacceptable to
use YAML for this sort of testing, and taking 200 ms to read in 1000
lines of simple JSON will be unacceptable in many other application
domains as well.
=== cut ===

I'd question your testing methodology. From your description, it looks
like the _json speedup never was enabled. Also PyYAML provides optional
bindings to libyaml, which makes parsing and emitting yaml much faster.
In my tests, it parses a 10 MB file in 3 sec.
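
One way to check whether the _json speedup is actually in play is to time the
default decoder against one forced onto the pure-Python code paths. The
following is a rough sketch under stated assumptions, not the methodology used
in the draft: it relies on the py_scanstring and py_make_scanner fallbacks that
CPython's json package exposes, and the helper names and sample data are
invented here.

```python
import json
import json.decoder
import json.scanner
import timeit


def c_speedups_available():
    """True when json managed to import its C accelerator (_json)."""
    return json.scanner.c_make_scanner is not None


def make_pure_python_decoder():
    """Build a JSONDecoder that bypasses the C scanner, if there is one."""
    decoder = json.JSONDecoder()
    decoder.parse_string = json.decoder.py_scanstring
    decoder.scan_once = json.scanner.py_make_scanner(decoder)
    return decoder


def compare(text, repeat=5):
    """Return the best-of-`repeat` decode time in seconds for each decoder."""
    default = json.JSONDecoder()
    pure = make_pure_python_decoder()
    # Both decoders must agree before their timings are worth comparing.
    assert default.decode(text) == pure.decode(text)
    return {
        "default": min(timeit.repeat(lambda: default.decode(text),
                                     number=1, repeat=repeat)),
        "pure_python": min(timeit.repeat(lambda: pure.decode(text),
                                         number=1, repeat=repeat)),
    }


# Hypothetical sample: roughly a few thousand lines of simple JSON.
sample = json.dumps(
    {"key%d" % i: [i, str(i), {"nested": i * 0.5}] for i in range(1000)},
    indent=1,
)

print("C speedups available:", c_speedups_available())
print(compare(sample))
```

If the two timings come out nearly identical, the accelerated scanner is
probably not being used on that system.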

=== cut ===
RSON semantics are based on JSON. Like JSON, an RSON document represents
either a single scalar object, or a DAG (Directed Acyclic Graph), which
may contain only a few simple data types.
=== cut ===

JSON doesn't represent a DAG, at least, not an arbitrary DAG since each
node in the document has no more than one parent. It would be more
accurate to say that it represents a tree-like structure.
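
A small illustration of the point, using only the standard json module (the
"ship to"/"bill to" data is made up): an in-memory Python object can share one
node between two parents, but that sharing does not survive a trip through
JSON text.

```python
import json

# One address object referenced from two places: a DAG in memory.
address = {"city": "San Jose"}
order = {"ship_to": address, "bill_to": address}
print(order["ship_to"] is order["bill_to"])

# JSON text can only spell the address out twice, so after a round trip
# the two entries are equal in value but are separate objects: a tree.
round_tripped = json.loads(json.dumps(order))
print(round_tripped["ship_to"] == round_tripped["bill_to"])
print(round_tripped["ship_to"] is round_tripped["bill_to"])
```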

=== cut ===
The YAML syntax for supporting back-references was considered and deemed
unsatisfactory. A human user who wants to put identical information in a
"ship to" and "bill to" address is much more likely to use cut and paste
than he is to understand and use backreferences, so the additional overhead
of supporting more complex document structures is unwarranted.

The concept of a "merge" in YAML, where two sub-trees of data can be
merged together (similar to a recursive Python dictionary update)
is quite useful, though, and will be copied. This does not alter the
outcome that parsing a RSON file will result in a DAG, but does give
more flexibility in the syntax that can be used to achieve a particular
output DAG.
=== cut ===

This paragraph assumes the reader is familiar with intricate details of
the YAML grammar and semantics. I bet most of your audience are
completely lost here.
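
For readers who have not met the idea, "a recursive Python dictionary update"
can be sketched roughly as below. This only illustrates that phrase from the
draft; recursive_update is a hypothetical helper, and it does not claim to
reproduce the actual merge rules of either YAML or RSON.

```python
import copy


def recursive_update(base, overrides):
    """Return a new dict: `base` with `overrides` merged in, level by level."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = recursive_update(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"ship_to": {"name": "A. Customer", "city": "San Jose"}, "gift": False}
changes = {"ship_to": {"city": "Oakland"}, "note": "ring twice"}

print(recursive_update(defaults, changes))
```

A plain dict.update() would replace the whole "ship_to" entry and lose the
name; the recursive version changes only the city.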

=== cut ===
Enhanced example::

    key1/key2a
        key3a = Some random string
        key3b = 42
    key1/key2a
        key3c
            1
            2
            {}
                key4a = anything
                key4b = something else
            []
                a
                b
                c
            3
            4
    key1/key2b = [1, 2, 3, 4]
    key5 = ""
        This is a multi-line string. It is
        dedented to the farthest left
        column that is indented from
        the line containing "".
    key6 = [""]
        This is an array of strings, one per line.
        Each string is dedented appropriately.
=== cut ===

Frankly, this is an example that only a mother could love. I'd suggest
you add some real-world examples, make sure they look nice, and put
them in the introductory part of the document. Examples are how the
format will be evaluated by the readers, and yours don't stand a chance.

Seriously, the only reason YAML enjoys its moderate popularity despite
its overcomplicated grammar, chronic lack of manpower and deficient
implementations is because it's so cute.

Disclaimer: I'm the author of PyYAML and libyaml.

Thanks,
Kirill
