ANSI terminal escape sequence regexp [Shell]

From: John DuBois on 25 May 2010 14:41

In article <htgnf1$s5d$1(a)news.m-online.net>,
Janis Papanagnou <janis_papanagnou(a)hotmail.com> wrote:
>It seems that ANSI sequences can terminate in a digit. How could one
>distinguish in a sequence like, say, \x1b[0A whether the A is part of
>the ANSI sequence or part of the subsequent data.

No, I don't think they can. The patterns I've used in the past for excising
ANSI sequences:

gsub(/\033\[[^a-zA-Z]*./, "")
gsub(/\033./, "")

Apparently the terminating character can actually be any of characters 64
through 126, not just a letter, though I haven't seen that.
And of course you may also encounter the single-character CSI, character 155,
in place of \033[.
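
For anyone not working in awk, the two substitutions above might be sketched
in Python roughly as follows. This is only a translation under assumptions:
the function name strip_ansi_loose is made up for illustration, the input is
treated as raw bytes, and re.DOTALL is used on the assumption that "." should
match a newline as it does in awk.

```python
import re

# Rough equivalents of the two awk gsub() patterns, applied in the same order.
# DOTALL is an assumption: it makes "." match newline, as awk's "." does.
CSI_LOOSE = re.compile(rb"\x1b\[[^a-zA-Z]*.", re.DOTALL)  # ESC [ ... first letter
ESC_PAIR = re.compile(rb"\x1b.", re.DOTALL)               # any leftover ESC + 1 byte


def strip_ansi_loose(data: bytes) -> bytes:
    """Remove CSI-style sequences first, then any remaining two-byte ESC pair."""
    data = CSI_LOOSE.sub(b"", data)
    return ESC_PAIR.sub(b"", data)


if __name__ == "__main__":
    print(strip_ansi_loose(b"\x1b[1;31mred\x1b[0m"))
    print(strip_ansi_loose(b"\x1b[0Afoo"))
```

Note that a three-byte sequence such as ESC ( B is only partly removed by
this approach: the second pattern takes ESC and "(" and leaves the "B" behind.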

John

--
John DuBois spcecdt(a)armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/

From: Ben Bacarisse on 25 May 2010 17:15

pk <pk(a)pk.invalid> writes:
<snip>
> For reference, here are some tables with most ANSI escape sequences:
>
> http://isthe.com/chongo/tech/comp/ansi_escapes.html
> http://ascii-table.com/ansi-escape-sequences.php

Yes, I found both of those but they seem less than comprehensive (my
test being whether they tell you about \e[J and \e[1J as well as \e[2J).

ECMA-48 seems to be the most definitive reference I can find online. It
gives a more restrictive pattern:

(\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]

In fact, trailing bytes in the range \x70-\x7e ('p' to '~' in ASCII) are
reserved for private or experimental use so this could be made even more
restricted.
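
As a sketch of how the pattern above could be tried out, here it is as a
Python bytes regex. The names CSI, CSI_FULL and strip_csi are invented for
this example. It assumes the data is a plain 8-bit byte stream; in UTF-8 text
the byte \x9b is a continuation byte, so that alternative may need dropping.

```python
import re

# The pattern as quoted: CSI, parameter bytes 0x30-0x3f, final byte 0x40-0x7e.
CSI = re.compile(rb"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]")

# Variant that also accepts ECMA-48 intermediate bytes (0x20-0x2f), which may
# appear between the parameter bytes and the final byte.
CSI_FULL = re.compile(rb"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def strip_csi(data: bytes, pattern: re.Pattern = CSI) -> bytes:
    """Delete every control sequence matched by the given pattern."""
    return pattern.sub(b"", data)


if __name__ == "__main__":
    # The first byte in 0x40-0x7e ends the sequence, so in ESC [ 0 A A the
    # first "A" is the final byte and the second "A" is ordinary data.
    print(strip_csi(b"\x1b[0AA"))
```

This also bears on the original question: because parameter bytes and final
bytes come from disjoint ranges, a control sequence of this form cannot end
in a digit, and the data resumes immediately after the final byte.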

--
Ben.

From: Janis Papanagnou on 25 May 2010 18:25

Ben Bacarisse wrote:
> pk <pk(a)pk.invalid> writes:
> <snip>
>> For reference, here are some tables with most ANSI escape sequences:
>>
>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>> http://ascii-table.com/ansi-escape-sequences.php
>
> Yes, I found both of those but they seem less than comprehensive (my
> test being whether they tell you about \e[J and \e[1J as well as \e[2J).
>
> ECMA-48 seems to be the most definitive reference I can find online. It
> gives a more restrictive pattern:
>
> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]

I wonder, though, why, e.g.,

ESC ( B
ESC =
ESC >

(which, incidentally, are all in the data that I parse) are not covered
by the pattern that you've found in the ECMA-48 reference.

> In fact, trailing bytes in the range \x70-\x7e ('p' to '~' in ASCII) are
> reserved for private or experimental use so this could be made even more
> restricted.
>

BTW, in one of the references there are also escape sequences that seem
to be terminated by a digit; ESC 7 and ESC 8, for example.

Janis

From: Ben Bacarisse on 25 May 2010 19:06

Janis Papanagnou <janis_papanagnou(a)hotmail.com> writes:

> Ben Bacarisse wrote:
>> pk <pk(a)pk.invalid> writes:
>> <snip>
>>> For reference, here are some tables with most ANSI escape sequences:
>>>
>>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>>> http://ascii-table.com/ansi-escape-sequences.php
>>
>> Yes, I found both of those but they seem less than comprehensive (my
>> test being whether they tell you about \e[J and \e[1J as well as \e[2J).
>>
>> ECMA-48 seems to be the most definitive reference I can find online. It
>> gives a more restrictive pattern:
>>
>> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
>
> I wonder, though, why, e.g.,
>
> ESC ( B
> ESC =
> ESC >
>
> (which, incidentally, are all in the data that I parse) are not covered
> by the pattern that you've found in the ECMA-48 reference.

What I quoted was a pattern for what ECMA-48 calls control sequences.
There are four other categories (the C0 set, the C1 set, independent
control functions and control strings) and I have not gone through and
worked them all out. I think there is a lot of history being codified
here.
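
To make the distinction concrete, here is a rough Python sketch that handles
control sequences plus the plain escape sequences mentioned in this thread
(ESC ( B, ESC =, ESC >, ESC 7, ESC 8). It assumes the usual escape-sequence
shape of ESC, optional intermediate bytes 0x20-0x2f, then one final byte
0x30-0x7e. The names ANSI and strip_escapes are made up, and control strings
(which carry a payload up to a terminator) are deliberately not handled.

```python
import re

ANSI = re.compile(
    rb"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"  # control sequences
    rb"|\x1b[\x20-\x2f]*[\x30-\x7e]"                         # other ESC sequences
)


def strip_escapes(data: bytes) -> bytes:
    """Remove control sequences and simple escape sequences from a byte string."""
    return ANSI.sub(b"", data)


if __name__ == "__main__":
    sample = b"\x1b(Bok\x1b=\x1b>\x1b7\x1b8"
    print(strip_escapes(sample))
```

The control-sequence alternative is listed first so that ESC [ is tried as a
CSI before the more general escape-sequence alternative gets a chance to
consume it as a two-byte sequence.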

>> In fact, trailing bytes in the range \x70-\x7e ('p' to '~' in ASCII) are
>> reserved for private or experimental use so this could be made even more
>> restricted.
>>
>
> BTW, in one of the references there are also escape sequences that seem
> to be terminated by a digit; ESC 7 and ESC 8, for example.

That may well be possible. I was only describing "control sequences" --
those that start with CSI (the Control Sequence Introducer) \e[.

There ought to be an ANSI document, of course, but they are not always
easily available. It might be easier to read than ECMA-48, though, which
is rather hard going.

--
Ben.

From: stan on 26 May 2010 17:23

Janis Papanagnou wrote:
> Ben Bacarisse wrote:
>> pk <pk(a)pk.invalid> writes:
>> <snip>
>>> For reference, here are some tables with most ANSI escape sequences:
>>>
>>> http://isthe.com/chongo/tech/comp/ansi_escapes.html
>>> http://ascii-table.com/ansi-escape-sequences.php
>>
>> Yes, I found both of those but they seem less than comprehensive (my
>> test being whether they tell you about \e[J and \e[1J as well as \e[2J).
>>
>> ECMA-48 seems to be the most definitive reference I can find online. It
>> gives a more restrictive pattern:
>>
>> (\x1b\[|\x9b)[\x30-\x3f]*[\x40-\x7e]
>
> I wonder, though, why, e.g.,
>
> ESC ( B
> ESC =
> ESC >

I don't know of a handy online reference but I have an old copy of an
actual VT100 user guide with a pretty good description that seems
comprehensive. For example

ESC ( B is shown as the ANSI SCS (Select Character Set) control, which
selects the ASCII set as the G0 char set.

ESC = is shown as DECKPAM Keypad App Mode (DEC private)
ESC > is shown as DECKPNM Keypad Numeric Mode (DEC private)
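
If it helps when inspecting captured data, the names above could be put in a
small lookup table, sketched here in Python. The table and the function
label_escapes are hypothetical helpers, not anything from the guide; the
DECSC/DECRC entries for ESC 7 and ESC 8 are added here and are not part of
the list above. The pattern only recognises simple escape sequences (ESC,
optional bytes 0x20-0x2f, one final byte 0x30-0x7e).

```python
import re

# Illustrative name table; entries beyond the three listed above are additions.
NAMES = {
    b"\x1b(B": "SCS (select character set)",
    b"\x1b=": "DECKPAM (keypad application mode)",
    b"\x1b>": "DECKPNM (keypad numeric mode)",
    b"\x1b7": "DECSC (save cursor)",
    b"\x1b8": "DECRC (restore cursor)",
}

ESC_SEQ = re.compile(rb"\x1b[\x20-\x2f]*[\x30-\x7e]")


def label_escapes(data: bytes) -> list[str]:
    """Return a name for each escape sequence found, or 'unknown'."""
    return [NAMES.get(m.group(), "unknown") for m in ESC_SEQ.finditer(data)]


if __name__ == "__main__":
    for name in label_escapes(b"\x1b=abc\x1b(B\x1bZ"):
        print(name)
```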

> (which, incidentally, are all in the data that I parse) are not covered
> by the pattern that you've found in the ECMA-48 reference.
>
>> In fact, trailing bytes in the range \x70-\x7e ('p' to '~' in ASCII) are
>> reserved for private or experimental use so this could be made even more
>> restricted.
>>
> BTW, in one of the references there are also escape sequences that seem
> to be terminated by a digit; ESC 7 and ESC 8, for example.

Ok, I'm back and it seems there is a copy at:

www.piesoftwareinc.co.uk/textonly/VT100_User_Guide.pdf

I don't know if it helps but it has a lot of pages :)
