From: Patrick Maupin on
Kirill:

Thank you for your constructive criticism. This is the gem that made
it worthwhile to post my document. I think all of your points are
spot-on, and I will be fixing the documentation.

I can well believe that the C implementation of YAML is much faster
than the Python one, but I am aiming for something that will be
reasonably quick in pure Python. I will double-check the JSON C test
results, but something I probably did not make clear is that the 22
seconds is not spent parsing -- that is for the entire test, which
involves reading restructured text and generating some 160 separate
PDF files.
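
One way to double-check the JSON numbers in isolation is sketched below. It is only an illustration, assuming a current Python 3 standard library (the module layout in 2010-era Python may differ); the names PurePythonDecoder, c_scanner_available, make_document and time_parsers are made up for this sketch and are not part of rst2pdf or RSON. It reports whether the _json C scanner was importable and times the default decoder against one forced onto the pure-Python scanner.

```python
import json
import json.decoder
import json.scanner
import timeit


class PurePythonDecoder(json.JSONDecoder):
    """JSONDecoder forced onto the pure-Python scanner and string parser."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Replace whatever the default constructor picked (possibly the C
        # versions) with the pure-Python implementations.
        self.parse_string = json.decoder.py_scanstring
        self.scan_once = json.scanner.py_make_scanner(self)


def c_scanner_available():
    """Return True if the _json C scanner could be imported."""
    return json.scanner.c_make_scanner is not None


def make_document(lines=1000):
    """Build an illustrative JSON text of roughly `lines` lines."""
    data = {"key%d" % i: "value %d" % i for i in range(lines)}
    return json.dumps(data, indent=0)


def time_parsers(text, repeat=5):
    """Return (default_seconds, pure_python_seconds), best of `repeat` runs."""
    default = min(timeit.repeat(lambda: json.loads(text),
                                number=1, repeat=repeat))
    pure = min(timeit.repeat(lambda: json.loads(text, cls=PurePythonDecoder),
                             number=1, repeat=repeat))
    return default, pure


if __name__ == "__main__":
    doc = make_document()
    default, pure = time_parsers(doc)
    print("C scanner available:", c_scanner_available())
    print("default decoder: %.6f s" % default)
    print("pure-Python decoder: %.6f s" % pure)
```

Timing only the parse step this way separates it from the cost of the surrounding test run.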

Best regards,
Pat


On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <xi(a)gamma.dn.ua> wrote:
> Patrick Maupin wrote:
>>
>> All:
>>
>> Finding .ini configuration files too limiting, JSON and XML too hard to
>> manually edit, and YAML too complex to parse quickly, I have started
>> work on a new configuration file parser.
>
> I'd like to note that with the optional libyaml bindings, the PyYAML parser
> is pretty fast.
>
>> I call the new format RSON (for "Readable Serial Object Notation"),
>> and it is designed to be a superset of JSON.
>>
>> I would love for it to be considered valuable enough to be a part of
>> the standard library, but even if that does not come to pass, I would
>> be very interested in feedback to help me polish the specification,
>> and then possibly help for implementation and testing.
>>
>> The documentation is in rst PEP form, at:
>>
>> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt
>
> === cut ===
> Because YAML does allow for highly readable configuration files, it
> is tempting to overlook its other flaws for the task.  But a fully
> (or almost) compliant parser has to understand the whole YAML
> specification, and this is apparently expensive.  Running the rst2pdf
> testsuite, without sphinx or most of the other optional packages, in
> "fast" mode (preloading all the modules, and then forking for every
> test) generates 161 smallish PDF files, totaling around 2.5 MB.  On
> one test system this process takes 22 seconds.  Disabling the _json C
> scanner and reading the configuration files using the json pure Python
> implementation adds about 0.3 seconds to the 22 seconds.  But using
> pyyaml v. 3.09 instead of json adds 33 seconds to the 22 second process!
> It might seem that this is an edge case, but it makes it unacceptable to
> use YAML for this sort of testing, and taking 200 ms to read in 1000
> lines of simple JSON will be unacceptable in many other application
> domains as well.
> === cut ===
>
> I'd question your testing methodology.  From your description, it looks like
> the _json speedup never was enabled.  Also PyYAML provides optional bindings
> to libyaml, which makes parsing and emitting yaml much faster.  In my tests,
> it parses a 10 MB file in 3 sec.
>
> === cut ===
> RSON semantics are based on JSON.  Like JSON, an RSON document represents
> either a single scalar object, or a DAG (Directed Acyclic Graph), which
> may contain only a few simple data types.
> === cut ===
>
> JSON doesn't represent a DAG, at least, not an arbitrary DAG since each node
> in the document has no more than one parent.  It would be more accurate to
> say that it represents a tree-like structure.
>
> === cut ===
> The YAML syntax for supporting back-references was considered and deemed
> unsatisfactory. A human user who wants to put identical information in a
> "ship to" and "bill to" address is much more likely to use cut and paste
> than he is to understand and use backreferences, so the additional overhead
> of supporting more complex document structures is unwarranted.
>
> The concept of a "merge" in YAML, where two sub-trees of data can be
> merged together (similar to a recursive Python dictionary update)
> is quite useful, though, and will be copied.  This does not alter the
> outcome that parsing an RSON file will result in a DAG, but does give
> more flexibility in the syntax that can be used to achieve a particular
> output DAG.
> === cut ===
>
> This paragraph assumes the reader is familiar with intricate details of the
> YAML grammar and semantics.  I bet most of your audience are completely lost
> here.
>
> === cut ===
> Enhanced example::
>
>    key1/key2a
>        key3a = Some random string
>        key3b = 42
>    key1/key2a
>        key3c
>            1
>            2
>            {}
>                key4a = anything
>                key4b = something else
>            []
>                a
>                b
>                c
>            3
>            4
>    key1/key2b = [1, 2, 3, 4]
>    key5 = ""
>       This is a multi-line string.  It is
>          dedented to the farthest left
>          column that is indented from
>          the line containing "".
>    key6 = [""]
>       This is an array of strings, one per line.
>       Each string is dedented appropriately.
> === cut ===
>
> Frankly, this is an example that only a mother could love.  I'd suggest you
> add some real-world examples, make sure they look nice, and put them in
> the introductory part of the document.  Examples are how the format will be
> evaluated by readers, and yours don't stand a chance.
>
> Seriously, the only reason YAML enjoys its moderate popularity despite its
> overcomplicated grammar, chronic lack of manpower and deficient
> implementations is because it's so cute.
>
>
>
> Disclaimer: I'm the author of PyYAML and libyaml.
>
> Thanks,
> Kirill
>
From: Patrick Maupin on
On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <xi(a)gamma.dn.ua> wrote:

BTW, congratulations on slogging through the YAML grammar to generate
such a good working C library!

That must have been a tremendous effort.

Regards,
Pat
From: Kirill Simonov on
Patrick Maupin wrote:
> Kirill:
>
> Thank you for your constructive criticism. This is the gem that made
> it worthwhile to post my document. I think all of your points are
> spot-on, and I will be fixing the documentation.

You are welcome. Despite what others have been saying, I don't think
this area is closed to innovations.


> I can well believe that the C implementation of YAML is much faster
> than the Python one, but I am aiming for something that will be
> reasonably quick in pure Python. I will double-check the JSON C test
> results, but something I probably did not make clear is that the 22
> seconds is not spent parsing -- that is for the entire test, which
> involves reading restructured text and generating some 160 separate
> PDF files.

Yes, this makes more sense. It's quite possible that the pure-Python
PyYAML parser is much slower than the pure-Python JSON parser.

At the same time, semantically meaningful whitespace will likely hinder
pure-Python performance. To make it fast, you'll need to convert
the inner loops of the parser to regexps, and it is hard to support
variable-length indentation with static regular expressions.
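
A minimal sketch of one common way around this, assuming space-only indentation: a single static regular expression scans each line, and the variable indentation depth is tracked outside the regexp with a stack, much as Python's own tokenizer emits INDENT and DEDENT tokens. The names LINE_RE and tokenize are illustrative and do not come from PyYAML or RSON.

```python
import re

# One static pattern per line: leading spaces, then the stripped body.
LINE_RE = re.compile(r"^(?P<indent> *)(?P<body>\S.*?)\s*$")


def tokenize(text):
    """Yield (kind, value) tuples: INDENT/DEDENT carry the new indent
    width, LINE carries the line body."""
    stack = [0]  # indentation widths of the currently open blocks
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # blank lines do not affect indentation
        match = LINE_RE.match(line)
        if match is None:
            # Assumption of this sketch: tabs in indentation are rejected.
            raise ValueError("line %d: only spaces may indent" % lineno)
        width = len(match.group("indent"))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", stack[-1])
            if width != stack[-1]:
                raise ValueError("line %d: inconsistent dedent" % lineno)
        yield ("LINE", match.group("body"))
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])
```

The regexp never needs to know how deep the nesting is; it only measures the current line, and the comparison against the stack does the rest.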


Thanks,
Kirill
From: Kirill Simonov on
Patrick Maupin wrote:
> On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <xi(a)gamma.dn.ua> wrote:
>
> BTW, congratulations on slogging through the YAML grammar to generate
> such a good working C library!
>
> That must have been a tremendous effort.

The trick was to completely ignore the grammar described in the
specification. In fact, the syntax of YAML is pretty close to the Python
syntax and, with some effort, the Python scanner and parser could be
adapted to parsing YAML. Once I realized it, I got a working parser in
a week or so.

Thanks,
Kirill
From: Terry Reedy on
On 3/1/2010 7:56 PM, Patrick Maupin wrote:
> On Mar 1, 5:57 pm, Erik Max Francis<m...(a)alcyone.com> wrote:
>> Patrick Maupin wrote:
>> This is not only seriously stretching the meaning of the term "superset"
>> (as Python is most definitely not even remotely a superset of JSON), but
>
> Well, you are entitled to that opinion, but seriously, if I take valid
> JSON, replace unquoted true with True, unquoted false with False,
> replace unquoted null with None, and take the quoted strings and
> replace occurrences of \uXXXX with the appropriate unicode, then I do,
> in fact, have valid Python. But don't take my word for it -- try it
> out!

To me this is so strained that I do not see why you are arguing the
point. So what? The resulting Python 'program' will be equivalent, I
believe, to 'pass'. I.e., construct objects and then discard them with no
computation or output. I suggest dropping this red-herring distraction.
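
For anyone who wants to try the substitution from the quoted text, here is a rough sketch. It is a variant rather than the literal text replacement described above: instead of rewriting the JSON, it binds the names true, false and null when evaluating, and it relies on Python 3 string literals understanding \uXXXX escapes natively. The helper eval_json_as_python is hypothetical, and eval() should never be used on untrusted input; json.loads is the real tool. Edge cases such as surrogate pairs may not match json.loads.

```python
import json

# JSON's three literal names bound to their Python counterparts.
JSON_NAMES = {"true": True, "false": False, "null": None}


def eval_json_as_python(text):
    """Evaluate JSON text as a Python expression (illustration only)."""
    # Builtins are emptied so only the three JSON names resolve.
    return eval(text, {"__builtins__": {}}, dict(JSON_NAMES))


sample = '{"name": "caf\\u00e9", "tags": [1, 2.5, true, false, null]}'
# For this sample, both routes build the same Python objects.
assert eval_json_as_python(sample) == json.loads(sample)
```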

> But if you really want to be pedantic about it, JavaScript (rather
> than Python) is, in fact a superset of JSON, and, despite the
> disparagement JavaScript receives, in my opinion, it is possible to
> write much better looking JavaScript than JSON for many tasks.
>
> YAML, also, is a superset of JSON, and IMO, it is possible to write
> much better looking YAML than JSON.

I agree that adding a bit of syntax to something can sometimes make it
easier to write readable text. This is hardly a new idea and should not
be controversial. That is why people developed 'macro assembly' as a
superset of 'assembly' languages and why, for instance, Python added the
'with' statement.

I read your proposal. I have not needed config files and have never
written json or yaml and so cannot really evaluate your proposal for
something 'in between'. It does seem plausible that it could be useful.

While using the PEP format is great, calling your currently vaporware
module proposal a 'standards track' PEP is somewhat off-putting and
confusing. If Guido rejected it, would you simply drop the idea? If not,
if you would continue it as a third-party module that would eventually
be released and announced on PyPI, I seriously suggest renaming it to
what it is.

Terry Jan Reedy