Stuck on a very basic regex. [Shell]


From: Harry on 11 Apr 2010 02:49

Hi,

My intent here is to be able to look for fully upper-cased words only,
and filter out the rest.
I was, thus, expecting the following egrep to fail.

$ echo "Abby" | egrep '^[A-Z]+$'
Abby

But for some reason, egrep is able to match the extended regex for the
mixed-case input.

Does anyone know what I'm missing here?

Many thanks in advance,
/HS

PS:
I'm using GNU bash 4.0.23 and GNU grep 2.5.3 on Fedora 11.

From: Sidney Lambe on 11 Apr 2010 03:08

On comp.unix.shell, Harry <simonsharry(a)gmail.com> wrote:
> Hi,
>
> My intent here is to be able to look for fully upper-cased words only,
> and filter out the rest.
> I was, thus, expecting the following egrep to fail.
>
> $ echo "Abby" | egrep '^[A-Z]+$'
> Abby
>
> But for some reason, egrep is able to match the extended regex for the
> mixed-case input.
>
> Does anyone know what I'm missing here?
>
> Many thanks in advance,
> /HS
>
> PS:
> I'm using GNU bash 4.0.23 and GNU grep 2.5.3 on Fedora 11.

$ echo Abby | egrep -v '[a-z]+'
$ echo ABBY | egrep -v '[a-z]+'
ABBY

Sid

From: Huibert Bol on 11 Apr 2010 03:28

Harry wrote:

> My intent here is to be able to look for fully upper-cased words only,
> and filter out the rest.
> I was, thus, expecting the following egrep to fail.
>
> $ echo "Abby" | egrep '^[A-Z]+$'
> Abby
>
> But for some reason, egrep is able to match the extended regex for the
> mixed-case input.

Ranges are only meaningful in the "POSIX" locale; use the character
classes instead (with -E or egrep, since + needs an extended regex):

grep -E '^[[:upper:]]+$'
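
A quick way to check this, as a minimal sketch: the sample words and the helper name `upper_only` are made up for illustration, and `-E` is used so that `+` means "one or more".

```shell
# Hypothetical helper: keep only lines made up entirely of upper-case letters.
upper_only() { grep -E '^[[:upper:]]+$'; }

# Hypothetical sample input; only ABBY should be printed.
printf '%s\n' Abby ABBY abby 12345 AB12 | upper_only
```

Mixed-case, lower-case, numeric and alphanumeric lines are all dropped, whatever the current locale's collation order is.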

--
Huibert
"Okay... really not something I needed to see." --Raven

From: Harry on 11 Apr 2010 03:55

On Apr 11, 12:28 pm, Huibert Bol <huibert....(a)quicknet.nl> wrote:
> Ranges are only meaningful in the "POSIX" locale; use the character
> classes instead (with -E or egrep, since + needs an extended regex):
>
> grep -E '^[[:upper:]]+$'

Didn't know that, thanks! (Also wondering btw, how come I never ran
into this issue before!)

Here's what my locale is:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I tried the easier way, didn't work:

$ LC_ALL=POSIX echo "Abby" | egrep '^[A-Z]+$'
Abby
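
A likely reason this still matches: a `VAR=value` prefix applies only to the command it precedes, here `echo`, so egrep still runs under en_US.UTF-8. A minimal sketch of the reordered version, assuming the goal is to run only the grep in the POSIX locale (the function name `posix_upper` is made up for illustration):

```shell
# The assignment prefix affects just the command directly after it,
# so it has to sit in front of grep, not in front of echo.
posix_upper() { LC_ALL=POSIX grep -E '^[A-Z]+$'; }

echo "Abby" | posix_upper || echo "no match"   # prints: no match
echo "ABBY" | posix_upper                      # prints: ABBY
```

Running `export LC_ALL=POSIX` first would have the same effect on grep, but it changes the locale for everything else in the session too.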

From: Harry on 11 Apr 2010 04:00

On Apr 11, 12:08 pm, Sidney Lambe <sidneyla...(a)nospam.invalid> wrote:
> $ echo Abby | egrep -v '[a-z]+'
> $ echo ABBY | egrep -v '[a-z]+'
> ABBY

Both fail to match on my system. I get an exit code of 1.

Secondly, the above regex would also let pure numbers through! As I
said, I'm looking to match upper-case words and nothing else.
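
A small sketch of the difference, assuming a made-up three-line sample (the `words` helper is hypothetical) and `LC_ALL=C` so that `[a-z]` means ASCII lower-case only:

```shell
# Hypothetical sample input for the comparison.
words() { printf '%s\n' Abby ABBY 12345; }

# Excluding lines that contain a lower-case letter still passes digits.
words | LC_ALL=C grep -v '[a-z]'            # prints ABBY and 12345

# Requiring every character to be upper-case passes only ABBY.
words | LC_ALL=C grep -E '^[[:upper:]]+$'   # prints ABBY
```

The first form says what a line must not contain; the second says what the whole line must consist of, which is what rules out the numbers.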
