S-expression I/O in Ada [ADA]

From: Ludovic Brenta on 12 Aug 2010 08:46

Natacha Kerensikova wrote on comp.lang.ada:
> On Aug 12, 12:55 pm, Ludovic Brenta <ludo...(a)ludovic-brenta.org>
> wrote:
>
>> Natacha Kerensikova wrote on comp.lang.ada:
>> [...]
>
>>> Sexp_Stream is supposed to perform S-expression I/O over streams,
>>> without ever constructing an S-expression in memory. It is supposed
>>> to do only the encoding and decoding of the S-expression format, and
>>> to expose only atoms and relations to its clients.
>
>> But how can it expose atoms and relations without an in-memory tree
>> representation? Honestly, I do not think that such a concept is
>> viable.
[...]
> The reading part is a bit more tricky, and I admitted when I proposed
> Sexp_Stream I didn't know how to make it. Having thought (maybe too
> much) since then, here is what the interface might look like:
>
> type Node_Type is (S_None, S_List, S_Atom);
>
> function Current_Node_Type(sstream: in Sexp_Stream) return Node_Type;
>
> procedure Get_Atom(sstream: in Sexp_Stream; contents: out octet_array);
> -- raises an exception when Current_Node_Type is not S_Atom
> -- not sure "out octet_array" works, but that's the idea
> -- maybe turn it into a function for easier atom-to-object conversion
>
> procedure Move_Next(sstream: in out Sexp_Stream);
> -- when the client is finished with the current node, it calls this
> -- procedure to update stream internal state to reflect the next
> -- node in list
>
> procedure Move_Lower(sstream: in out Sexp_Stream);
> -- raises an exception when Current_Node_Type is not S_List
> -- update the internal state to reflect the first child of the
> -- current list
>
> procedure Move_Upper(sstream: in out Sexp_Stream);
> -- update the internal state to reflect the node following the list
> -- containing the current node. sortof "uncle" node
> -- the implementation is roughly skipping whatever nodes exist in the
> -- current list until reading its ')' and reading the following node
>
> Such an interface supports data streams, i.e. reading new data without
> seek or pushback. The data would probably be read octet-by-octet
> (unless reading a verbatim encoded atom), hoping for an efficient
> buffering in the underlying stream. If that's a problem I guess some
> buffering can be transparently implemented in the private part of
> Sexp_Stream.
>
> Of course atoms are to be kept in memory, which means Sexp_Stream will
> have to contain a dynamic array (Vector in Ada terminology, right?)
> populated from the underlying stream until the atom is completely
> read. Get_Atom would hand over a (static) array built from the private
> dynamic array.
>
> The implementation would for example rely on a procedure to advance
> the underlying stream to the next node (i.e. skipping white space), and from
> the first character know whether it's the beginning of a new list
> (when '(' is encountered) and update the state to S_List, and it's
> finished; whether it's the end of the list (when ')' is encountered)
> and update the state to S_None; or whether it's the beginning of an
> atom, so read it completely, updating the internal dynamic array and
> setting the state to S_Atom.
>
> Well actually, it's probably not the best idea, I'm not yet clear
> enough on the specifications and on stream I/O to clearly think about
> implementation, but that should be enough to make you understand the
> idea of S-expression reading without S-expression objects.
>
> Now regarding the actual use of this interface, I think my previous
> writing example is enough, so here is the reading part.
>
> My usual way of reading S-expression configuration files is to read
> sequentially, one at a time, a list whose first element is an atom.
> Atoms, empty lists and lists beginning with a list are considered as
> comments and silently dropped. "(tcp-connect )" is meant to be one of
> these, processed by TCP_Info's client, which will hand over only the
> "" part to TCP_Info.Read (or other initialization subprogram). So
> just like TCP_Info's client processes something like "(what-ever-
> config ) (tcp-connect ) (other-config )", TCP_Info.Read will
> process only "(host foo.example) (port 80)".
>
> So TCP_Info's client, after having read "tcp-connect" atom, will call
> Move_Next on the Sexp_Stream and pass it to TCP_Info.Read. Then
> TCP_Info.Read proceeds with:
>
> loop
>    case Current_Node_Type(sstream) is
>       when S_None => return; -- TCP_Info configuration is over
>       when S_Atom => Move_Next(sstream); -- silent atom drop
>       when S_List =>
>          Move_Lower(sstream);
>          Get_Atom(sstream, atom);
>          -- make command of type String from atom
>          -- if Get_Atom was successful
>          Move_Next(sstream);
>          if command = "host" then
>             -- Get_Atom and turn it into host string
>          elsif command = "port" then
>             -- Get_Atom and turn it into port number
>          else
>             -- complain about unrecognized command
>          end if;
>          Move_Upper(sstream);
>    end case;
> end loop;
>
> TCP_Info's client S-expression parsing would be very similar, except
> if command = would be followed by a call to TCP_Info.Read rather
> than a Get_Atom.
>
> So where are the problems with my Sexp_Stream without in memory
> object? What am I missing?

The "problem" is that, without admitting it, you have reintroduced a
full S-Expression parser. Most of it is hidden in the Sexp_Stream
implementation, but it has to be there. Otherwise, how can the
Move_Lower, Move_Next, and Move_Upper operations work, keeping track of
how many levels deep you are at all times?
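
To make that concrete, here is a minimal sketch of the decoding state a
Sexp_Stream-like reader has to carry, following the description quoted
above: skip white space, then classify the next node from its first
character. It is only an illustration under loud assumptions: a constant
String stands in for the underlying stream, only space-delimited token
atoms are recognised, and the names Scan_Nodes, Read_Node, Input, Pos,
Current and More are hypothetical, not part of either proposed package.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Scan_Nodes is

   type Node_Type is (S_None, S_List, S_Atom);

   --  Assumption: a string and a read position stand in for the stream.
   Input : constant String := "(host foo.example) (port 80)";
   Pos   : Positive := Input'First;

   --  State the reader keeps between calls.
   Current    : Node_Type := S_None;
   Atom_First : Positive  := 1;
   Atom_Last  : Natural   := 0;
   More       : Boolean;

   --  Skips spaces, then classifies the next node from its first
   --  character.  Found is False once the input is exhausted.
   procedure Read_Node (Found : out Boolean) is
   begin
      while Pos <= Input'Last and then Input (Pos) = ' ' loop
         Pos := Pos + 1;
      end loop;

      Found := Pos <= Input'Last;
      if not Found then
         return;
      end if;

      case Input (Pos) is
         when '(' =>
            Current := S_List;
            Pos := Pos + 1;
         when ')' =>
            Current := S_None;
            Pos := Pos + 1;
         when others =>
            --  Read the whole atom and remember where it lies.
            Current := S_Atom;
            Atom_First := Pos;
            while Pos <= Input'Last
              and then Input (Pos) /= ' '
              and then Input (Pos) /= '('
              and then Input (Pos) /= ')'
            loop
               Pos := Pos + 1;
            end loop;
            Atom_Last := Pos - 1;
      end case;
   end Read_Node;

begin
   loop
      Read_Node (More);
      exit when not More;
      case Current is
         when S_List =>
            Put_Line ("list begins");
         when S_None =>
            Put_Line ("list ends");
         when S_Atom =>
            Put_Line ("atom: " & Input (Atom_First .. Atom_Last));
      end case;
   end loop;
end Scan_Nodes;
```

The sketch holds only the bounds of the current atom and keeps no record
of nesting depth; a client that wants something like Move_Upper would
have to count parentheses on top of it.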

Note also that your TCP_Info.Read looks quite similar to mine, except
that mine takes an S-Expression as the input, rather than a stream.
Afterwards, it traverses the S-Expression using pretty much the same
algorithm as yours. The S-Expression itself comes from the stream. So,
the only difference between your concept and my implementation is that
I expose the S-Expression memory tree and you don't.

The reason why I prefer to expose the S-Expression is because, in the
general (arbitrarily complex) case, you cannot traverse an
S-Expression linearly; you need to traverse it as what it really is, a
tree. A stream suggests linear traversal only.

[...]
>> TCP_Info : constant String := "(tcp-connect (host foo.bar) (port 80))";
>> TCP_Info_Structured := constant To_TCP_Info (To_Sexp (TCP_Info));
>
> That's an interesting idea, which conceptually boils down to serializing
> the S-expression by hand into a String, in order to unserialize it as
> an in-memory object, in order to serialize it back into a Stream.
>
> Proofreading my post, the above might sound sarcastic though actually
> it is not. It's a twist I haven't thought of, but it might indeed turn
> out to be the simplest practical way of doing it.

Right. I was not being sarcastic either. The Cons (), To_Atom () and
Append () operations are needed only when creating arbitrary and
dynamic S-Expressions. For simple cases where most of the expression
is hardcoded, the textual representation of the S-Expression is much
more compact, readable and maintainable than the Ada procedural
representation. In fact, you could also conceivably write something
like:

TCP_Info_Sexp : S_Expression := To_Sexp ("(tcp-info (host *) (port *))");

and programmatically change the values of the atoms containing the
actual data. Once you have the in-memory S-Expression, there is no
limit to what you can do with it. You can *change* the S-Expression as
you traverse it, deleting, replacing or adding nodes as you wish. You
cannot do that with a Sexp_Stream.

> Actually for a very long time I used to write S-expressions to file
> using only string facilities and a special sx_print_atom() function to
> handle escaping of unsafe data. By then I would have handled
> TCP_Info.Write with the following C fragment (sorry I don't know yet
> how to turn it into Ada, but I'm sure an equally simple equivalent exists):
>
> fprintf(outfile, "(tcp-connect\n\t(host ");
> sx_print_atom(outfile, host);
> fprintf(outfile, ")\n\t(port %d)\n)\n", port);

Sure, that can also be done just as easily in Ada. You can write S-
Expressions as easily as any blob or string; it is reading them back
and understanding their structure that is tricky.
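
For illustration, here is a rough Ada counterpart of the C fragment
above, using only Ada.Text_IO. It is a sketch, not anybody's actual
package: Put_Atom is a hypothetical stand-in for sx_print_atom, and it
assumes that writing the atom as a quoted string with '"' and '\'
backslash-escaped is sufficient, which ignores control characters and
binary data. The host and port values are made up.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO; use Ada.Text_IO;

procedure Write_TCP_Info is

   Tab : constant Character := Ada.Characters.Latin_1.HT;

   --  Hypothetical stand-in for sx_print_atom: writes Atom as a quoted
   --  string, backslash-escaping '"' and '\'.
   procedure Put_Atom (File : in File_Type; Atom : in String) is
   begin
      Put (File, '"');
      for I in Atom'Range loop
         if Atom (I) = '"' or else Atom (I) = '\' then
            Put (File, '\');
         end if;
         Put (File, Atom (I));
      end loop;
      Put (File, '"');
   end Put_Atom;

   --  Made-up example values.
   Host : constant String  := "foo.example";
   Port : constant Natural := 80;

   --  'Image puts a leading space before non-negative values; the
   --  slice below drops it.
   Port_Image : constant String := Natural'Image (Port);

begin
   Put_Line ("(tcp-connect");
   Put (Tab & "(host ");
   Put_Atom (Standard_Output, Host);
   Put_Line (")");
   Put_Line (Tab & "(port "
             & Port_Image (Port_Image'First + 1 .. Port_Image'Last)
             & ")");
   Put_Line (")");
end Write_TCP_Info;
```

As in the C version, nothing here checks that the parentheses balance;
that is left to whoever writes the string literals.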

--
Ludovic Brenta.

From: Natacha Kerensikova on 12 Aug 2010 09:23

On Aug 12, 2:46 pm, Ludovic Brenta <ludo...(a)ludovic-brenta.org> wrote:
> Natacha Kerensikova wrote on comp.lang.ada:
> > So where are the problems with my Sexp_Stream without in memory
> > object? What am I missing?
>
> The "problem" is that, without admitting it, you have reintroduced a
> full S-Expression parser. Most of it is hidden in the Sexp_Stream
> implementation, but it has to be there.

Actually, I *do* admit it. That's even the whole point of Sexp_Stream:
it does the parsing, the whole parsing, and nothing more.

As soon as I realized (thanks to Dmitry) that the parser can be
meaningfully split from everything else (memory management, object
construction, etc), I was seduced by the idea.

> Otherwise, how can the
> Move_Lower, Move_Next, and Move_Upper operations work, keeping track of
> how many levels deep you are at all times?

In my idea it doesn't keep track of level depth, though if needed (for
example, to assert Move_Lower and Move_Upper are correctly balanced)
it can be easily added, using an integer in the Sexp_Stream object
that is incremented when moving up and decremented when moving down.
That way it can trigger an exception when encountering an unmatched
')' (the other mismatch being already handled by the end-of-stream
exception). I would rather drop unmatched parentheses silently, but
that is a personal preference; it is probably better design to raise
the exception in the library and to catch and drop it in the
application if I'm so inclined.
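
A minimal sketch of that counter, under stated assumptions: it works on
a String instead of a stream, it ignores quoted and verbatim atoms (so
parentheses inside atoms would be miscounted), and the names
Depth_Check, Open_Lists and Unmatched_Close are hypothetical. It counts
open lists, so it goes up on '(' and down on ')'; the opposite sign
convention would work just as well.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Depth_Check is

   Unmatched_Close : exception;  --  hypothetical exception name

   Balanced  : constant String :=
     "(tcp-connect (host foo.example) (port 80))";
   Truncated : constant String := "(tcp-connect (host foo.example";
   Stray     : constant String := "(port 80))";

   --  Returns the number of lists still open at the end of Text.
   --  Raises Unmatched_Close on a ')' that closes nothing.
   function Open_Lists (Text : String) return Natural is
      Depth : Natural := 0;
   begin
      for I in Text'Range loop
         case Text (I) is
            when '(' =>
               Depth := Depth + 1;
            when ')' =>
               if Depth = 0 then
                  raise Unmatched_Close;
               end if;
               Depth := Depth - 1;
            when others =>
               null;
         end case;
      end loop;
      return Depth;
   end Open_Lists;

begin
   Put_Line ("open lists:" & Natural'Image (Open_Lists (Balanced)));
   Put_Line ("open lists:" & Natural'Image (Open_Lists (Truncated)));

   --  The library raises; the application decides to drop the error.
   begin
      Put_Line ("open lists:" & Natural'Image (Open_Lists (Stray)));
   exception
      when Unmatched_Close =>
         Put_Line ("unmatched ')' dropped by the application");
   end;
end Depth_Check;
```

The last block shows the split mentioned above: the checking code raises
the exception, and the caller is free to catch it and carry on.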

> Note also that your TCP_Info.Read looks quite similar to mine, except
> that mine takes an S-Expression as the input, rather than a stream.
> Afterwards, it traverses the S-Expression using pretty much the same
> algorithm as yours. The S-Expression itself comes from the stream. So,
> the only difference between your concept and my implementation is that
> I expose the S-Expression memory tree and you don't.

Exactly. It's also very similar to my C clients, which all work on in-
memory S-expression objects.

> The reason why I prefer to expose the S-Expression is because, in the
> general (arbitrarily complex) case, you cannot traverse an
> S-Expression linearly; you need to traverse it as what it really is, a
> tree. A stream suggests linear traversal only.

Yes, linear traversal only is indeed a limitation of Sexp_Stream.
That's the reason why I had in mind another package of object and
memory handling, which would rely on Sexp_Stream for the parsing.

I prefer (for now) the Sexp_Stream/memory-object split because 1. it
makes things more modular, which means more possibilities of localized
changes and of implementation overhaul without changing specification,
and 2. in my real-life applications the linear traversal is often
enough. Far from always though, especially when one counts cases where
the application can traverse linearly but one of the subobjects needs
to store its sub-S-expression in memory.

> > Actually for a very long time I used to write S-expressions to file
> > using only string facilities and a special sx_print_atom() function to
> > handle escaping of unsafe data. By then I would have handled
> > TCP_Info.Write with the following C fragment (sorry I don't know yet
> > how to turn it into Ada, but I'm sure an equally simple equivalent exists):
>
> > fprintf(outfile, "(tcp-connect\n\t(host ");
> > sx_print_atom(outfile, host);
> > fprintf(outfile, ")\n\t(port %d)\n)\n", port);
>
> Sure, that can also be done just as easily in Ada. You can write S-
> Expressions as easily as any blob or string;

And even nowadays I still wonder, all things considered, whether it
might be the best way of doing it after all.

As far as I know the three lines above are the simplest way of writing
an S-expression, and I often have a hard time finding justifications for
using a more complex way. The only added value I found in S-expression
writing facilities over String I/O is ensuring the written
S-expression is actually what I meant, with parentheses correctly
matching and no stray character or anything messing up the parsing. As
that never happened to me despite my quite heavy use of S-expressions, I
often doubt the necessity of such extra checks, especially compared to
their complexity cost.

> it is reading them back
> and understanding their structure that is tricky.

I completely agree.

Thanks for your discussion,
Natacha

From: Ludovic Brenta on 12 Aug 2010 12:19

Natacha Kerensikova wrote on comp.lang.ada:
>> The reason why I prefer to expose the S-Expression is because, in the
>> general (arbitrarily complex) case, you cannot traverse an
>> S-Expression linearly; you need to traverse it as what it really is, a
>> tree. A stream suggests linear traversal only.
>
> Yes, linear traversal only is indeed a limitation of Sexp_Stream.
> That's the reason why I had in mind another package of object and
> memory handling, which would rely on Sexp_Stream for the parsing.
>
> I prefer (for now) the Sexp_Stream/memory-object split because 1. it
> makes things more modular, which means more possibilities of localized
> changes and of implementation overhaul without changing specification,
> and 2. in my real-life applications the linear traversal is often
> enough. Far from always though, especially when one counts cases where
> the application can traverse linearly but one of the subobjects needs
> to store its sub-S-expression in memory.

OK, let's assume this can work technically. Let's do a cost-benefit
analysis of your preferred solution.

Cost: you limit the use of the S-Expression parser to linear traversal
only; this makes it unsuitable for a general-purpose library, which I
thought was one of your goals.

Cost: while the S-Expression parser has to construct S-Expressions in
memory anyway, you need artificial measures to hide the tree structure
from clients, thereby cluttering the implementation of the S-
Expression parser and making any future changes more difficult.

Benefit: in memory-constrained systems, you may deallocate memory soon
(after the client calls Move_Upper, e.g.) and allocate it late (only when
the client calls Get_Atom or Move_Lower), thereby holding only one node
at a time in memory. But if you're planning to write a web server, I
don't think you were concerned about the ~100 bytes necessary for the
tree of "(tcp-connect (host foo.bar) (port 80))", were you?

Benefit: I can't see any for the client, honestly, since parsing your
stream is not easier than parsing the S-Expression tree directly, as
you can see when comparing our respective implementations of
TCP_Info.Read.

So, I do think the more general solution is better.

--
Ludovic Brenta.

From: Natacha Kerensikova on 12 Aug 2010 13:17

On Aug 12, 6:19 pm, Ludovic Brenta <ludo...(a)ludovic-brenta.org> wrote:
> Natacha Kerensikova wrote on comp.lang.ada:
> >> The reason why I prefer to expose the S-Expression is because, in the
> >> general (arbitrarily complex) case, you cannot traverse an
> >> S-Expression linearly; you need to traverse it as what it really is, a
> >> tree. A stream suggests linear traversal only.
>
> > Yes, linear traversal only is indeed a limitation of Sexp_Stream.
> > That's the reason why I had in mind another package of object and
> > memory handling, which would rely on Sexp_Stream for the parsing.
>
> > I prefer (for now) the Sexp_Stream/memory-object split because 1. it
> > makes things more modular, which means more possibilities of localized
> > changes and of implementation overhaul without changing specification,
> > and 2. in my real-life applications the linear traversal is often
> > enough. Far from always though, especially when one counts cases where
> > the application can traverse linearly but one of the subobjects needs
> > to store its sub-S-expression in memory.
>
> OK, let's assume this can work technically. Let's do a cost-benefit
> analysis of your preferred solution.

This is turning a bit too much into a childish "my idea is better than
yours" for my taste.

However please note that I'm afraid you're misjudging my project by
taking into consideration only what I call Sexp_Stream. I've never
thought of Sexp_Stream without the other (still unnamed) package I keep
talking about, which handles the representations in memory, because
Sexp_Stream alone won't fit all my needs. These two packages are
functionally equivalent to what I wanted initially, and they are also
equivalent to the package you're writing. So from a client point of
view, there wouldn't be much (if any) difference between my second
package and yours. The possibility of using Sexp_Stream directly is
just an option for the particular cases where it is as easy as using
in-memory objects. So I'd say ex aequo on functionality and
generality.

Now from an implementation point of view, my two-package idea is also
equivalent to my original plans, in that the glue between the two
packages is so thin that coding both packages takes about as much effort
as my original idea. I suspect your package is very similar to my
original idea, which means ex aequo too on implementation effort and
complexity.

So what are the differences then?

My two-package idea makes it possible to use the memory-efficient
Sexp_Stream interface instead of building objects. Honestly I don't
expect ever to encounter a situation where it matters, so let's say it
doesn't.

My two-package idea implies a well-defined interface between what
would be two halves of your package. Well, it boils down to whether two
independent packages are better or not than a single double-sized
package. I personally have a clear preference for two halves rather
than one package of the same total size; but again, that's just my
taste, I'm not sure there is really any objective technical advantage.

My two-package idea would be implemented under an ISC license, while
your package is GPL. I don't want to start a license troll or
anything, my point here is just that at this point of the analysis
both projects look so similar to me that license preferences might be
the only thing left to tip the balance.

My two-package idea is so far just an idea. Your package already has
lots of real code. I won't be surprised if I'm so busy here learning
about what people think about my ideas and learning about Ada that you
end up finishing your package before I actually write my first line in
Ada. We've got a clear winner here.

Thanks for your attention,
Natacha

From: Jeffrey Carter on 12 Aug 2010 14:51

On 08/12/2010 02:26 AM, Natacha Kerensikova wrote:
>
> There are situations when a memory representation of S-expressions is
> not needed, and the tcp-connect example here seems to be such a case.
> That's why I imagined TCP_Info as a client of Sexp_Stream instead of a
> client of the second package.

Not needed, perhaps, but it makes things clearer and more reusable if it is used
anyway. The result might not be acceptable for a memory-constrained system, but
general, reusable pkgs are usually not acceptable when you have special
requirements.

>> S : S_Expression := Some_Initialization;
>>
>> Put (File => F, Expression => S);
>
> To achieve such an interface, the client has to build an in-memory S-
> expression object. In the tcp-connect example, there are eight nodes
> (five atoms and three lists). They have to be somehow built, and it
> doesn't look like a simple initialization.

Perhaps not. But for each kind of thing you want to store using this
representation, it need be done only once, and then reused:

S : S_Expression := To_S_Expression (TCP_Info);

> The second interface I proposed, with a lot of nested calls, builds
> the S-expression object with functions looking like:
> function NewList(Contents, Next: S_node_access) return S_node_access;
> function AtomFromWhatever(Contents: whatever; Next: S_node_access)
> return S_node_access;

An aside on Ada style: Ada is not case sensitive, and many of us reformat code
to make it easier for us to read. The resulting identifiers such as Newlist and
Atomfromwidget are not very readable. This is why the convention is to use
underlines between words.

> For the TCP_info stuff particular case, the only simplification I can
> imagine is leveraging the fact that only 2 among the 8 nodes change.
> So one could construct a global S-expression skeleton with the 6
> constant nodes, and when actually writing data append to the "host"
> and "port" atoms the relevant variable atoms. However that means
> having a read/write global variable, which is bad for tasking, so one
> might prefer a constant 6-node skeleton, duplicated in the declarative
> part of the writing procedure, and then appending the variable nodes.
> However that appending would probably end up with lines like:
> root_sexp.FirstChild.Next.FirstChild.Append(host_node);
> root_Sexp.FirstChild.Next.Next.FirstChild.Append(port_node);
> which looks very ugly to me.

Then I would suggest that your client interface for the TCP-info stuff would
only deal with (host, port) pairs.

I would suggest a constant in the pkg body that you would use to initialize an
S-expression variable, then substitute the host and port values into the
variable before writing. On reading, you would read the S-expression, then
extract the host and port values and return them to the client.

Ugly is relative; having code like that sprinkled throughout your application is
ugly; having it occur once in a pkg body is less so. Having it be less ugly and
occur once in a pkg body would be even better, though.

>> Your TCP_Info-handling pkg would convert the record into an S-expression, and
>> call a single operation from your S-expression pkg to output the S-expression.
>
> That's the tricky part. At least so tricky that I can't imagine how to
> do it properly.

I'm not sure what you think is tricky about it. Clearly you see how to output an
S-expression:

>> Your S-expression library would implement the writing of the expression as a
>> series of steps.
>
> Indeed, as a series of calls similar to my Sexp_Stream example. I'm
> glad to have at least that part right.

--
Jeff Carter
"What I wouldn't give for a large sock with horse manure in it."
Annie Hall
42
