From: Andre Majorel on 3 Aug 2010 08:50

On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
> On 2010-08-03 17:53, Andre Majorel wrote:
> >> > $ printf 'a\nb' | grep -zo a.*b
> >> >
> >> > (The above should output something /if/ -z would make egrep
> >> > not consider \n as string terminator. But it has produced no
> >> > output)
> >>
> > But grep -z does. This would seem to be an undocumented
> > limitation of -o.
> >
>
> No it doesn't.
>
> $ printf 'a\nb' | grep -z 'a.*b'
> $

You're welcome. What version of grep ?

--
André Majorel <http://www.teaser.fr/~amajorel/>
If the Debian project published their users' email addresses,
we'd be getting spam. So I'm glad they don't.

--
To UNSUBSCRIBE, email to debian-user-REQUEST(a)lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster(a)lists.debian.org
Archive: http://lists.debian.org/20100803123949.GA4007(a)aym.net2.nerim.net

From: Bob McGowan on 3 Aug 2010 13:10

On 08/03/2010 05:39 AM, Andre Majorel wrote:
> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>> On 2010-08-03 17:53, Andre Majorel wrote:
>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>
>>>>> (The above should output something /if/ -z would make egrep
>>>>> not consider \n as string terminator. But it has produced no
>>>>> output)
>>>>
>>> But grep -z does. This would seem to be an undocumented
>>> limitation of -o.
>>>
>>
>> No it doesn't.
>>
>> $ printf 'a\nb' | grep -z 'a.*b'
>> $
>
> You're welcome. What version of grep ?
>

The -z "sort of" does/doesn't work for me. If I do this:

$ perl -e 'print "a\nb\0"'| grep -z 'a.*b'
$

There's no output. But change it like this:

$ perl -e 'print "a\nb\0"'| grep -z 'a'
a
b$

It found, and printed, the newline containing string. I would suspect
the regex engine is still honoring '. (dot) does not match newline'
convention but is OK with literals, if present.

If, instead of using the '.*' pattern, I embed a literal newline, it
also works:

$ perl -e 'print "a\nb\0"'| grep -z 'a
> b'
a
b$

And just to prove the point, it does work with multiple null terminated
lines:

perl -e 'print "a\nb\0not here\0"'| grep -z 'a
> b'
a
b$

I'm using GNU grep 2.5.3

--
Bob McGowan

--
Archive: http://lists.debian.org/4C584A92.70102(a)symantec.com

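One thing worth checking before relying on the literal-newline trick: the POSIX grep specification describes the pattern argument as one or more patterns separated by newline characters, and GNU grep documents its PATTERNS the same way. So 'a<newline>b' may be read as two alternatives, "a" or "b", rather than as a pattern containing a newline. The "not here" record contains neither letter, so it cannot tell the two readings apart. A minimal sketch, assuming a POSIX sh and a grep that supports -z, adds a record that has an "a" but no newline and no "b":

```sh
#!/bin/sh
# Does a newline inside the pattern match a literal newline, or does it
# separate two patterns ("a" OR "b")?

# The quoted string below contains a real newline between a and b.
pattern='a
b'

# Record 1 has a newline between a and b; record 2 has only an "a".
# tr turns the NUL record terminators into '|' so they are visible.
result=$(printf 'a\nb\0only a\0' | grep -z "$pattern" | tr '\0' '|')
printf '%s\n' "$result"
```

If "only a" comes back as well, the pattern was split at the newline and the earlier test only showed that the record contains an "a" or a "b". For matching across a newline with -z, the GNU grep manual's FAQ uses a character class instead, e.g. grep -z -q 'foo[[:space:]]\+bar'.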
From: Andre Majorel on 3 Aug 2010 14:30

On 2010-08-03 09:57 -0700, Bob McGowan wrote:
> On 08/03/2010 05:39 AM, Andre Majorel wrote:
> > On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
> >> On 2010-08-03 17:53, Andre Majorel wrote:
> >>>>> $ printf 'a\nb' | grep -zo a.*b
> >>>>>
> >>>>> (The above should output something /if/ -z would make egrep
> >>>>> not consider \n as string terminator. But it has produced no
> >>>>> output)
> >>>>
> >>> But grep -z does. This would seem to be an undocumented
> >>> limitation of -o.
> >>
> >> No it doesn't.
> >>
> >> $ printf 'a\nb' | grep -z 'a.*b'
> >> $
> >
> > You're welcome. What version of grep ?
>
> The -z "sort of" does/doesn't work for me. If I do this:
>
> $ perl -e 'print "a\nb\0"'| grep -z 'a.*b'
> $

$ printf 'a\nb\0'| grep -z 'a.*b'
a
b$ grep --version
GNU grep 2.5.3

Fun, eh ? Maybe the answer is in there :

$ locale
LANG=
LC_CTYPE=en_US
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE=C
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

> There's no output. But change it like this:
>
> $ perl -e 'print "a\nb\0"'| grep -z 'a'
> a
> b$
>
> It found, and printed, the newline containing string. I would suspect
> the regex engine is still honoring '. (dot) does not match newline'
> convention but is OK with literals, if present.

My grep -z acts like it used a regexp engine where "." matches
newline. Only when -o is in effect and there is a newline in the match,
there's no output. But the exit status is still good :

$ printf 'a\nb\0'| (grep -z 'a.*b' && printf 'st=%d chars=' $? >&2) | wc -c
st=0 chars=4
$ printf 'a\nb\0'| (grep -oz 'a.*b' && printf 'st=%d chars=' $? >&2) | wc -c
st=0 chars=0

--
André Majorel <http://www.teaser.fr/~amajorel/>
No one ever sends you any email ? Report a bug in Debian !

--
Archive: http://lists.debian.org/20100803182837.GC4007(a)aym.net2.nerim.net

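Andre's exit-status check can be wrapped in a small function so the same input is easy to rerun under different options and locales. This is a sketch, assuming a POSIX sh, mktemp and a grep that supports -z; the function name check_match is made up for illustration. It prints grep's exit status next to the number of bytes grep wrote, so a "status 0 but no output" case like the -o one above stands out.

```sh
#!/bin/sh
# Hypothetical helper: feed the NUL-terminated record "a<newline>b" to
# grep and report its exit status alongside the bytes it printed.
check_match() {
    loc=$1       # value for LC_ALL, e.g. C or en_US.UTF-8
    opts=$2      # grep options, e.g. -z or "-z -o"
    pattern=$3   # the regular expression to try
    tmp=$(mktemp) || return 1
    status=0
    # $opts is left unquoted on purpose so "-z -o" becomes two options;
    # "|| status=$?" records a non-zero grep status without aborting a
    # script that runs under "set -e".
    printf 'a\nb\0' | LC_ALL=$loc grep $opts -- "$pattern" > "$tmp" || status=$?
    bytes=$(wc -c < "$tmp" | tr -d ' ')
    rm -f "$tmp"
    printf 'LC_ALL=%s opts=%s status=%d bytes=%s\n' "$loc" "$opts" "$status" "$bytes"
}

# Compare plain -z with -z -o for the thread's pattern in two locales.
for loc in C en_US.UTF-8; do
    check_match "$loc" -z 'a.*b'
    check_match "$loc" '-z -o' 'a.*b'
done
```

No particular output is claimed for the 'a.*b' lines: they depend on the grep version (the thread is about 2.5.3) and on which locales are installed. A literal pattern such as 'a' gives a fixed baseline: 4 bytes with -z ("a", newline, "b", NUL) and 2 bytes with -z -o ("a", NUL).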
From: Bob McGowan on 3 Aug 2010 17:00

On 08/03/2010 11:28 AM, Andre Majorel wrote:
> On 2010-08-03 09:57 -0700, Bob McGowan wrote:
>> On 08/03/2010 05:39 AM, Andre Majorel wrote:
>>> On 2010-08-03 19:37 +0800, Zhang Weiwu wrote:
>>>> On 2010-08-03 17:53, Andre Majorel wrote:
>>>>>>> $ printf 'a\nb' | grep -zo a.*b
>>>>>>>
<--deleted-->
> Fun, eh ? Maybe the answer is in there :
>
> $ locale
> LANG=
> LC_CTYPE=en_US
> LC_NUMERIC="POSIX"
> LC_TIME="POSIX"
> LC_COLLATE=C
> LC_MONETARY="POSIX"
> LC_MESSAGES="POSIX"
> LC_PAPER="POSIX"
> LC_NAME="POSIX"
> LC_ADDRESS="POSIX"
> LC_TELEPHONE="POSIX"
> LC_MEASUREMENT="POSIX"
> LC_IDENTIFICATION="POSIX"
> LC_ALL=

This does appear to be the "issue". My settings are:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

>
>> There's no output. But change it like this:
>>
>> $ perl -e 'print "a\nb\0"'| grep -z 'a'
>> a
>> b$
>>
>> It found, and printed, the newline containing string. I would suspect
>> the regex engine is still honoring '. (dot) does not match newline'
>> convention but is OK with literals, if present.
>

I did a sub-shell and reset all the variables to match yours, and,
bingo, the wildcard worked.

Looking through the list of names, nothing seems 'obvious' as a single
contributor. In fact, the LC_ names all seem to be specific to things
that would not necessarily impact the regex operation.

So, I picked LANG as a starting point and reset it, *only*, to empty.
And got lucky. That is, apparently, the variable that affects how the
regex is handled.

--
Bob McGowan
Symantec US Internationalization

--
Archive: http://lists.debian.org/4C588249.8010604(a)symantec.com

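Bob's result fits the POSIX precedence rules for locale variables: LC_ALL overrides everything, then the category's own LC_* variable, then LANG. In the two locale listings above, values in double quotes are ones glibc's locale command derived rather than found set explicitly, so Bob's LC_CTYPE was inherited from LANG=en_US.UTF-8, while Andre had LC_CTYPE=en_US set directly. Clearing LANG therefore also changed Bob's effective LC_CTYPE, the category that governs character encoding and classification, which suggests LC_CTYPE rather than LANG itself is what the regex engine reacted to. The helper below is hypothetical and only mirrors that precedence for LC_CTYPE; it does not query the C library, treats an empty value as unset, and assumes POSIX as the default.

```sh
#!/bin/sh
# Hypothetical helper: mirror the POSIX precedence used to choose the
# LC_CTYPE category. Arguments are the values of LC_ALL, LC_CTYPE and
# LANG, in that order; an empty string is treated as "unset".
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"      # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"      # then the category's own variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"      # then LANG
    else
        printf '%s\n' POSIX     # assumed implementation default
    fi
}

# The three situations from this thread:
effective_ctype '' en_US ''          # Andre's settings
effective_ctype '' '' en_US.UTF-8    # Bob's original settings
effective_ctype '' '' ''             # Bob after clearing LANG

# The current shell's own settings:
effective_ctype "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"
```

For a one-off test, prefixing a single command with LC_ALL=C (e.g. LC_ALL=C grep -z 'a.*b') is usually simpler than clearing LANG, since LC_ALL overrides the other variables for that command only.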
From: Zhang Weiwu on 5 Aug 2010 21:50

On 2010-08-04 04:55, Bob McGowan wrote:
> In fact, the LC_ names all seem to be specific to things
> that would not necessarily impact the regex operation.
>
That is not entirely true. The encoding part (LC_CTYPE) might. If it is
UTF-8, in theory, [:digit:] should match more than 0-9. It might, for
example, match 一-十 (Chinese digits).

--
Archive: http://lists.debian.org/4C5B6A10.3070702(a)realss.com

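A minimal sketch of the point about encoding, assuming GNU grep and a POSIX sh: the same bracket expression can classify the same bytes differently depending on LC_CTYPE. It uses [[:alpha:]] and the word "café" (as UTF-8 bytes) rather than digits, simply because an accented letter is an easy case to reproduce. The locale name C.UTF-8 is an assumption; it exists on many current Linux systems but may be missing elsewhere, in which case grep falls back to C behaviour.

```sh
#!/bin/sh
# "café" in UTF-8: the é is the two bytes 0xC3 0xA9 (octal \303\251).
word=$(printf 'caf\303\251')

# Print how many input lines consist only of [[:alpha:]] characters
# when grep runs under the locale given as $1.
count_alpha_lines() {
    # grep -c exits with status 1 when the count is 0; "|| true" keeps
    # that from counting as an error. The count is still printed.
    printf '%s\n' "$word" | LC_ALL=$1 grep -c '^[[:alpha:]]*$' || true
}

# C locale: only A-Z and a-z are letters, so this prints 0.
count_alpha_lines C
# UTF-8 locale: é is a letter, so this is expected to print 1
# where that locale is installed.
count_alpha_lines C.UTF-8
```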