From: Janis Papanagnou on 7 Aug 2010 08:25

On 07/08/10 13:48, patata wrote:
> On Aug 7, 1:10 pm, Janis Papanagnou <janis_papanag...(a)hotmail.com>
> wrote:
>> On 07/08/10 12:01, patata wrote:
>>>
>>> lack of orthogonality (an example, a command "ls" with some sorting
>>> rule):
>>> "ls" has some intrinsic sort capabilities, but not very complete.
>>
>> ...which is a Good Thing. Sorting is a function interesting for many
>> of those hundreds and thousands of utilities in the Unix environment.
>> Duplicating that (and/or other) function(s) is mostly a Bad Thing.
>>
>>> In the same way, lots of commands have several sorting capabilities,
>>> always limited and duplicated among the others.
>>> The "sort" command could have a complete set of capabilities, but then
>>> it must parse the "ls" output, something not always easy.
>>> It seems we have no shell that allows a powerful "sort" command or
>>> library to merge easily with any "data provider" (ls).
>>
>> Well, the existing sort(1) command *is* extremely powerful. And the
>> sort command is luckily also independent from the shell and can thus
>> be used in any other Unix tool context as well.
>
> I agree, the orthogonal approach is better: a powerful "sort" command
> instead of sort capabilities in the other ones. But if commands like
> "ls" or "ps" currently have their own sort capabilities, it is because
> it is not easy to merge them with the standard sort. This is the open
> point.

Wait. The point was not to implement sorting into whatever command, the
point was to have well defined output formats - this is an issue for the
individual commands, and the shell can't do anything here -, or to have
the option to format output individually. Using the existing sort(1) for
post-processing is crucial; I wouldn't want to duplicate that function.

>> Let the commands produce well defined (textual and simply parsable)
>> output using format specifiers, as in ps with ps -o, then you can
>> use the standard sort command and all the existing column and row
>> extraction tools easily.
>>
>> Say, with your ls example, you want to sort by time; don't use ls -t,
>> rather use the (hypothetical) ls -o "%t %n" | sort ...
>>
>> [...]
>> [***] It would even be good to have a standard option letter for that,
>> but given the historical uncontrolled growth of options I don't see a
>> chance here.
>
> Here is where the shell extension could help us. Instead of modifying
> existing commands,

You are aware, I hope, that you are already suggesting to modify all
existing commands with your approach. (This has also been pointed out
by others, so you may want to re-think it if you don't see that
consequence.)

> add a formatting capability to the shell.

But we already have that! Though not in the shell, where it doesn't
belong, but in separate commands. Use awk(1). In simpler cases use
cut(1). The problem is not a lacking formatter, the problem is getting
a _well defined_ format out of a historically grown set of commands.

We disagree on the way to achieve a consistent transfer syntax, i.e. a
well defined output format. Your approach would actually have to modify
all programs *and* the shell to support a new transfer syntax (probably
some TLV format like BER), while that's completely unnecessary and
makes certain details even worse. While I say that you just need to add
another format option to a couple of commands that have inappropriate
formats. And continue to use the powerful mechanisms we have already
had for 40 years.
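As a concrete illustration of the "format first, then use the generic
sort" idea: a minimal sketch of sorting files by modification time with
today's tools, assuming GNU coreutils stat(1) (BSD stat uses -f instead
of -c) and a bash-style shell for the $'...' tab quoting:

  # Emit "mtime-in-seconds<TAB>name" for each entry, let the generic
  # sort(1) order it numerically, then strip the key again -- no sorting
  # logic inside ls is needed.  File names containing newlines still
  # break this, which is part of the thread's point.
  stat -c $'%Y\t%n' -- * | sort -n | cut -f2-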
Even if you don't implement the capability of defining clean formats,
you can often, with some more effort on the regexps, already format in
arbitrary ways; just use awk as the predominant example of a formatter.

> This extension will wrap the command and format its output in the
> standard syntax. Something like:
>
> reformat ls ...... | sort ...
>
> or even a sugar syntax:
>
> @ls ... | sort
>
> The "standard" syntax could be, like in your examples, a tab separated
> one, where each item is on one single line and fields are split by
> tabs. In this way the output is standard, formatted and textual.

Sorry, no. This is even worse than I thought it could be made. Actually
what we need is control over the format. So your examples would have to
be extended to something like

  reformat <format-specifier> ls ...... | sort ...

and you can also just have

  ls ...... | reformat <format-specifier> | sort ...

which nowadays already exists, e.g. as

  ls ...... | awk <format-specifier> | sort ...

But that is inferior to creating the desired set of command specific
attributes in the first place, as in

  ls <format-specifier> ...... | sort ...

The problem is that ls (in the example) or any other program, for
historical reasons, doesn't have an adequate output format defined, or
that the formats even vary between OSes, as in the case of ancient (but
still existing) versions of ps(1). The problem is not that we are
lacking a new transfer syntax format.

The additional criticism mentioned upthread still applies.

Janis

> (However, in more complex cases, it could be necessary to group data
> items with something like {}.)
>
> Thanks a lot for your collaboration.
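To make the ls ...... | awk <format-specifier> | sort ... route above
concrete, here is a small sketch; the field positions assume a typical
long-listing layout and are not portable, which is exactly the weakness
under discussion:

  # awk acts as the "reformat" step: pull size (field 5) and name (field 9)
  # out of ls -l, put the sort key first, and let the generic sort(1) do
  # the ordering.  NR > 1 skips the "total" line; file names with blanks
  # break it -- hence the call for a real format option in ls itself.
  ls -l | awk 'NR > 1 { print $5 "\t" $9 }' | sort -n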
From: Barry Margolin on 7 Aug 2010 18:52

In article <i3jeua$9on$1(a)news.m-online.net>,
 Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:

> > I'm wondering if it is not better to have "binary" input/output for
> > commands (TLV, XML or something similar).
>
> Why pass binary information; there's really no necessity it seems.[*]
>
> (And, BTW, XML is not binary; rather it's bloated text.)

What he's actually getting at is that the data is structured, rather
than free-form.

One of the problems with the usual plain text data used with Unix pipes
is that it's difficult to deal with fields that contain the field
delimiter. Most text processing tools (e.g. sed, cut, awk) don't
support any kind of quoting or escaping in the input data.

25 years ago, when Multics development was winding down and our team was
assigned a new project, we came up with a similar idea of replacing the
text-based output of commands with a database approach. But the new OS
we were designing never really got off the ground.

--
Barry Margolin, barmar(a)alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
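One answer that already exists for the embedded-delimiter problem Barry
describes is to use NUL as the separator, since it cannot occur in a
file name. A sketch relying on the -print0, -z and -0 extensions found
in GNU and BSD userlands (not strictly POSIX):

  # Names with spaces, tabs or even newlines survive this pipeline intact,
  # because NUL -- not whitespace -- delimits the records.
  find . -maxdepth 1 -type f -print0 | sort -z | xargs -0 ls -ld --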
From: Janis Papanagnou on 8 Aug 2010 04:12

On 08/08/10 00:52, Barry Margolin wrote:
> In article <i3jeua$9on$1(a)news.m-online.net>,
>  Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>
>>> I'm wondering if it is not better to have "binary" input/output for
>>> commands (TLV, XML or something similar).
>>
>> Why pass binary information; there's really no necessity it seems.[*]
>>
>> (And, BTW, XML is not binary; rather it's bloated text.)
>
> What he's actually getting at is that the data is structured, rather
> than free-form.
>
> One of the problems with the usual plain text data used with Unix pipes
> is that it's difficult to deal with fields that contain the field
> delimiter. Most text processing tools (e.g. sed, cut, awk) don't
> support any kind of quoting or escaping in the input data.

This is correct and I am aware of that. Though my experience in this
respect is that it would already be enormously helpful if we had cleaner
output syntax in a couple more of the Unix tools, and the delimiter
issue could then be covered as well (probably in the form of an option)
where it's necessary. Defining only those elementary attributes in the
format specifier (as proposed; with the given example ps) that are
necessary for further processing will make things really easy to handle.

The inconvenient part is (as you briefly mention) when complex data are
output as a single unit, e.g. the complete command with arguments that
is (on Linux) output by ps -x, or the standard issue of file names with
blanks (or even more pathological characters). I've used tools where the
delimiters could be optionally defined, and even complex or composed
attributes could be easily handled that way. With formal syntaxes and
selectable delimiters we could also avoid most (or even all?) of those
escapes and quoting which would otherwise lead us to a nightmare like
CSV.

Janis

> 25 years ago, when Multics development was winding down and our team was
> assigned a new project, we came up with a similar idea of replacing the
> text-based output of commands with a database approach. But the new OS
> we were designing never really got off the ground.
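A small sketch of "selectable delimiters" with an existing tool: awk
lets both the input and the output separator be chosen per invocation,
so a colon-delimited source can be re-emitted as tab-separated records
without any quoting or escaping rules (bash's $'\t' hands sort a
literal tab):

  # Read /etc/passwd colon-separated, write only the fields of interest
  # (user, uid, shell) tab-separated, then sort numerically on the uid.
  awk -F: -v OFS='\t' '{ print $1, $3, $7 }' /etc/passwd | sort -t $'\t' -k2,2n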
From: patata on 8 Aug 2010 07:06

On Aug 7, 12:01 pm, patata <qpat...(a)gmail.com> wrote:
> Hello,
>
> First, let me thank all people involved in this group for your
> continuous support, which saves me a lot of trouble.
>
> After years as a user of shells, several problems appear again and
> again with all of them. Some examples of these problems are:
>
> Lack of orthogonality (an example, a command "ls" with some sorting
> rule):
> "ls" has some intrinsic sort capabilities, but not very complete. In
> the same way, lots of commands have several sorting capabilities,
> always limited and duplicated among the others.
> The "sort" command could have a complete set of capabilities, but then
> it must parse the "ls" output, something not always easy.
> It seems we have no shell that allows a powerful "sort" command or
> library to merge easily with any "data provider" (ls).
>
> Too much text oriented:
> Most commands try to provide output in a nice textual, human readable
> form. That is good if the command is used in a standalone way, but bad
> when this output must be used by the next command.
> I'm wondering if it is not better to have "binary" input/output for
> commands (TLV, XML or something similar). The shell would be
> responsible for converting this binary data from/to human readable
> form at the start/end of the pipelines.
>
> Thinking on how to solve these and similar problems, or even on the
> development of a new shell variant, any comment, suggestion or link
> about these subjects is welcome.
>
> Kind regards.

Another problem related to increasing orthogonality is that most
commands use only one input stream, when their functionality could be
improved by using two or more (if the shell supported them in an easy
way).

One simplified example could be: list (ls) some files and filter the
ones that match some characteristic (like a size in some range, or
being a file of some kind according to the "file" command, or an inode
in some range according to the "stat" result). The logical steps are:

a) Get the initial list of files using "ls". I will name it "stream1".

b) For each item in "stream1", get the value of the characteristic to
be evaluated, directly from the "ls" output or using another command
(stat, file, ...).

c) Make a calculation over the previous characteristic (the size
belongs to some range, the file type is equal to ASCII, ...) using a
command that evaluates expressions. The result of the expression must
be "true" or "false". I will name the output of this evaluation
"stream2".

d) Now, a nice "grep" command should take into account two streams. The
first one is the stream with data to be filtered (stream1), the second
one is stream2, which controls the filtering.

However, current "grep" allows only one input stream. That is, current
"grep" is restricted to filtering its input stream according to data in
that stream, and to using only its own capability to evaluate
characteristics of this data (matching a regular expression). That is a
lack of orthogonality, because a) it doesn't allow the use of an
external and powerful expression evaluator (arithmetic, regular
expressions, ...); b) it requires that the stream to be filtered
contain the values of the expression to evaluate.

It seems we could need commands with more input streams, and shells
that support them in an easy way.

Kind regards.
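For what it's worth, the two-stream filter described in steps a) to d)
can be approximated today by materializing both streams and letting
paste(1) plus awk play the role of the hypothetical two-input grep. A
rough sketch, assuming GNU stat(1) for the external size test:

  # a) stream1: the initial list of files
  ls > stream1
  # b)+c) stream2: one "1" or "0" per file, produced by an external
  #        evaluator (here: "is the size larger than 1 MiB?")
  while IFS= read -r f; do
      if [ "$(stat -c %s -- "$f")" -gt 1048576 ]; then echo 1; else echo 0; fi
  done < stream1 > stream2
  # d) the "two-stream grep": keep the stream1 lines whose stream2 value
  #    is 1.  Names containing tabs or newlines still confuse it -- the
  #    delimiter problem discussed above.
  paste stream2 stream1 | awk -F'\t' '$1 == 1 { print $2 }'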
From: patata on 8 Aug 2010 07:22
On Aug 8, 12:52 am, Barry Margolin <bar...(a)alum.mit.edu> wrote:
> In article <i3jeua$9o...(a)news.m-online.net>,
>  Janis Papanagnou <janis_papanag...(a)hotmail.com> wrote:
>
> > > I'm wondering if it is not better to have "binary" input/output for
> > > commands (TLV, XML or something similar).
>
> > Why pass binary information; there's really no necessity it seems.[*]
>
> > (And, BTW, XML is not binary; rather it's bloated text.)
>
> What he's actually getting at is that the data is structured, rather
> than free-form.
>
> One of the problems with the usual plain text data used with Unix pipes
> is that it's difficult to deal with fields that contain the field
> delimiter. Most text processing tools (e.g. sed, cut, awk) don't
> support any kind of quoting or escaping in the input data.
>
> 25 years ago, when Multics development was winding down and our team was
> assigned a new project, we came up with a similar idea of replacing the
> text-based output of commands with a database approach. But the new OS
> we were designing never really got off the ground.
>
> --
> Barry Margolin, bar...(a)alum.mit.edu
> Arlington, MA
> *** PLEASE post questions in newsgroups, not directly to me ***
> *** PLEASE don't copy me on replies, I'll read them in the group ***

Thanks for your post. It is very interesting to know this subject has
been around for years.

According to my understanding of these posts, it seems everybody agrees
on the advantages of having a standard syntax for the input/output of
commands. It seems that the initial idea for this syntax is lines with
fields, using some separator character. However, that means data with
only two levels of abstraction: we have a list of items (each one on a
line) and each item is composed of a list of fields (using some
character to split one field from the next). That is, a stream is
logically a "list of lists of strings".

I wonder if this is enough or we need something that allows, if
necessary, more levels of abstraction.

Kind regards.
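To make the "list of lists of strings" question concrete, here is a
hypothetical sketch of how a third level could be squeezed into the
line/tab model: one tab-separated field carries its own comma-separated
sub-list, recovered with awk's split(). The record layout (name, size,
permission list) is invented purely for illustration:

  # Two records, three tab-separated fields each; the third field is
  # itself a comma-separated sub-list -- a third level of abstraction
  # hiding inside one field.
  printf 'file.c\t1024\tread,write\nnotes.txt\t2048\tread\n' |
  awk -F'\t' '{ n = split($3, perms, ","); print $1 " has " n " permission(s)" }'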