From: pk on 25 May 2010 09:32

Janis Papanagnou wrote:
> pk wrote:
>> Janis Papanagnou wrote:
>>
>>> I am looking for a regexp that matches the ANSI terminal escape
>>> sequences (ESC [ ...) (for xterm), or alternatively for a tool (Linux)
>>> that replaces ANSI terminal sequences by an arbitrary chosen fixed
>>> replacement. Thanks.
>>
>> I've never done that, but I suppose any regex flavor that can match the
>> escape character would do, so for example with GNU sed's ERE to match
>> coloring sequences:
>>
>> \x1b\[[0-9]+;[0-9]+m
>>
>> or something similar.
>>
>> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
>> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
>> FOO - FOO
>>
>> Apologies if I didn't understand correctly what you're after.
>
> Sorry for having been unclear.
>
> I know that I just need some BRE/ERE tool, like sed, to substitute the
> actual ANSI codes. I was interested in a regexp that covers all ANSI
> sequences in one regexp expression because, actually, I don't know what
> the telnet server will emit. (Please see also my response to Andrew.)

See if this expect tip helps:

http://wiki.tcl.tk/9673
From: Janis Papanagnou on 25 May 2010 10:39

pk wrote:
> Janis Papanagnou wrote:
>
>> pk wrote:
>>> Janis Papanagnou wrote:
>>>
>>>> I am looking for a regexp that matches the ANSI terminal escape
>>>> sequences (ESC [ ...) (for xterm), or alternatively for a tool (Linux)
>>>> that replaces ANSI terminal sequences by an arbitrary chosen fixed
>>>> replacement. Thanks.
>>> I've never done that, but I suppose any regex flavor that can match the
>>> escape character would do, so for example with GNU sed's ERE to match
>>> coloring sequences:
>>>
>>> \x1b\[[0-9]+;[0-9]+m
>>>
>>> or something similar.
>>>
>>> $ GREEN='\033[01;32m'; YELLOW='\033[01;33m'
>>> $ printf "$GREEN - $YELLOW\n" | sed -r 's/\x1b\[[0-9]+;[0-9]+m/FOO/g'
>>> FOO - FOO
>>>
>>> Apologies if I didn't understand correctly what you're after.
>> Sorry for having been unclear.
>>
>> I know that I just need some BRE/ERE tool, like sed, to substitute the
>> actual ANSI codes. I was interested in a regexp that covers all ANSI
>> sequences in one regexp expression because, actually, I don't know what
>> the telnet server will emit. (Please see also my response to Andrew.)
>
> See if this expect tip helps:
>
> http://wiki.tcl.tk/9673

Not sure. Quoting from the link (first example)...

regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match

It seems that ANSI sequences can terminate in a digit. How could one
distinguish, in a sequence like, say, \x1b[0A, whether the A is part of
the ANSI sequence or part of the subsequent data?

Janis
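
To make the question concrete, here is a small Python sketch (not from the thread) that transcribes the wiki pattern quoted above and prints what it treats as the escape sequence for two inputs. The helper name wiki_match is hypothetical, and Tcl and Python regex flavors may differ in other cases, so read it as an illustration of this one pattern only.

```python
import re

# Python transcription of the wiki pattern quoted above (an assumption:
# only this pattern is covered, not Tcl regex behavior in general).
WIKI_RE = re.compile(rb'^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]')

def wiki_match(data: bytes) -> bytes:
    """Return the bytes the pattern treats as the escape sequence."""
    m = WIKI_RE.match(data)
    return m.group(0) if m else b''

# The greedy [;?0-9]* takes the "0", so the final class consumes the "A".
print(wiki_match(b'\x1b[0Abc'))   # b'\x1b[0A'
# The match ends in a digit only after backtracking, when the digit run
# is not followed by a letter or digit.
print(wiki_match(b'\x1b[12+'))    # b'\x1b[12'
```

With this pattern the A in \x1b[0A is always taken as part of the sequence; the pattern alone cannot say whether that is what the sender meant.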
From: Ben Bacarisse on 25 May 2010 12:43

Janis Papanagnou <janis_papanagnou(a)hotmail.com> writes:
> pk wrote:
<snip>
>> See if this expect tip helps:
>>
>> http://wiki.tcl.tk/9673
>
> Not sure. Quoting from the link (first example)...
>
> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>
> It seems that ANSI sequences can terminate in a digit.

A quick scan of some online documents suggests that this is not so. All
the sequences I've seen end in a letter. Wikipedia suggests the last byte
must be between ASCII @ and ~ inclusive.

If you are prepared to use a very general regexp that will strip out
ill-formed escape sequences you could start with

\x1b\[[^@-~]*[@-~]

You then need to catch the two-byte sequences:

\x1b\[[^@-~]*[@-~]|\x1b[@-~]

This will go wrong for those sequences that can include quoted strings
like those that set key mappings. Maybe you can ignore these.

There is also a one-byte alternative to \x1b[ which is \x9b so you might
want to try:

(\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]

--
Ben.
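
For readers who want to try the last expression outside sed, the following is a minimal Python sketch (not from the thread) that applies it to raw bytes. The names ANSI_RE and replace_ansi are hypothetical, and the same caveat applies as in the post above: sequences that carry quoted strings are not handled.

```python
import re

# The final pattern from the post above, written for Python bytes:
# ESC [ (or the single byte 0x9b), then any bytes outside @..~,
# then one final byte in @..~; or a two-byte ESC sequence.
ANSI_RE = re.compile(rb'(?:\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]')

def replace_ansi(data: bytes, replacement: bytes = b'FOO') -> bytes:
    """Replace every matched escape sequence with a fixed replacement."""
    # A function is used so the replacement is inserted literally,
    # without backslash processing.
    return ANSI_RE.sub(lambda m: replacement, data)

# Same input as the earlier printf/sed example in this thread.
sample = b'\x1b[01;32m - \x1b[01;33m\n'
print(replace_ansi(sample))             # b'FOO - FOO\n'
print(replace_ansi(b'\x1b[0Abc', b''))  # b'bc'
```

Working on bytes rather than decoded text keeps the \x9b alternative meaningful as a single byte, which is how the thread's sed expression treats it under a C locale.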
From: pk on 25 May 2010 12:49

Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou(a)hotmail.com> writes:
>> pk wrote:
> <snip>
>>> See if this expect tip helps:
>>>
>>> http://wiki.tcl.tk/9673
>>
>> Not sure. Quoting from the link (first example)...
>>
>> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>>
>> It seems that ANSI sequences can terminate in a digit.
>
> A quick scan of some online documents suggests that this is not so. All
> the sequences I've seen end in a letter. Wikipedia suggests the last byte
> must be between ASCII @ and ~ inclusive.
>
> If you are prepared to use a very general regexp that will strip out
> ill-formed escape sequences you could start with
>
> \x1b\[[^@-~]*[@-~]
>
> You then need to catch the two-byte sequences:
>
> \x1b\[[^@-~]*[@-~]|\x1b[@-~]
>
> This will go wrong for those sequences that can include quoted strings
> like those that set key mappings. Maybe you can ignore these.
>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]

For reference, here are some tables with most ANSI escape sequences:

http://isthe.com/chongo/tech/comp/ansi_escapes.html
http://ascii-table.com/ansi-escape-sequences.php
From: Janis Papanagnou on 25 May 2010 13:52

Ben Bacarisse wrote:
> Janis Papanagnou <janis_papanagnou(a)hotmail.com> writes:
>> pk wrote:
> <snip>
>>> See if this expect tip helps:
>>>
>>> http://wiki.tcl.tk/9673
>> Not sure. Quoting from the link (first example)...
>>
>> regexp -- {^\x1b(\[|\(|\))[;?0-9]*[0-9A-Za-z]} ${data} match
>>
>> It seems that ANSI sequences can terminate in a digit.
>
> A quick scan of some online documents suggests that this is not so. All
> the sequences I've seen end in a letter. Wikipedia suggests the last byte
> must be between ASCII @ and ~ inclusive.
>
> If you are prepared to use a very general regexp that will strip out
> ill-formed escape sequences you could start with
>
> \x1b\[[^@-~]*[@-~]
>
> You then need to catch the two-byte sequences:
>
> \x1b\[[^@-~]*[@-~]|\x1b[@-~]
>
> This will go wrong for those sequences that can include quoted strings
> like those that set key mappings. Maybe you can ignore these.

Yes, I think I can ignore those.

>
> There is also a one-byte alternative to \x1b[ which is \x9b so you might
> want to try:
>
> (\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]
>

Looks good, and seems to work. Thanks, Ben.
Thanks also to Andrew and pk.

Just an additional note for those who try that expression and observe
problems: setting LANG=C might fix some issues in non-C locales.

Janis
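
As a rough illustration of why byte-wise matching (as under LANG=C) and character-wise matching can give different results, here is a Python sketch, not from the thread, that runs the same expression in both modes. The names strip_text and strip_bytes are hypothetical, and this is not necessarily the problem observed above; it only shows one way the \x9b alternative depends on how the input is interpreted.

```python
import re

PATTERN = r'(?:\x1b\[|\x9b)[^@-~]*[@-~]|\x1b[@-~]'
TEXT_RE = re.compile(PATTERN)                  # matches code points
BYTE_RE = re.compile(PATTERN.encode('ascii'))  # matches raw bytes

def strip_text(s: str) -> str:
    """Remove escape sequences from decoded text."""
    return TEXT_RE.sub('', s)

def strip_bytes(b: bytes) -> bytes:
    """Remove escape sequences from raw bytes."""
    return BYTE_RE.sub(b'', b)

# Plain ESC [ sequences behave the same in both modes.
colored = '\x1b[01;32mok\x1b[0m'
print(strip_text(colored))                   # ok
print(strip_bytes(colored.encode('utf-8')))  # b'ok'

# U+011B encodes in UTF-8 as the bytes C4 9B, so the byte-wise pattern
# sees a stray 0x9B and removes it together with the following "a".
word = '\u011babc'
print(ascii(strip_text(word)))               # '\u011babc'
print(strip_bytes(word.encode('utf-8')))     # b'\xc4bc'
```

If the data is known to be UTF-8 text, dropping the \x9b alternative from the byte-wise pattern avoids this particular mismatch.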