From: Stephane CHAZELAS on 22 Feb 2010 09:06

2010-02-21, 16:16(+00), pk:
> Stephane CHAZELAS wrote:
>
>> 2010-02-21, 10:35(+00), pk:
>> [...]
>>> awk -F '{|}' ...
>>
>> That's incorrect POSIX syntax (leads to unspecified results),
>> you want:
>>
>> awk -F '\{|\}'
>>
>> with a POSIX awk (like with gawk when POSIXLY_CORRECT is on).
>
> That's incorrect as well, and takes you back to the '{|}' case; if you go
> that route, you need
>
> awk -F '\\{|\\}'
>
> due to the way awk scans strings.

s/awk/gawk/

> I used just '{|}' because most awk nowadays either do NOT support {} as

s/most/GNU/ (except in POSIX mode).

> regex characters (though it's mandated by POSIX), and those that do are
> smart enough to see that there's nothing to "quantify" there and take the {
> and } literally.

Except GNU awk:

$ POSIXLY_CORRECT=1 gawk -F '{|}' '{print $1}'
gawk: fatal: Invalid preceding regular expression: /{|}/

awk -F '\{|\}'

should be OK with every POSIX awk except GNU awk. I'm
not sure if it's a gawk bug or not as the POSIX spec is unclear
to me on that point, but I agree that

awk -F '\\{|\\}'

is better as it works on all POSIX awks including GNU awk.

-- 
Stéphane
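A minimal sketch of the portable form recommended above, assuming a POSIX sh and whichever awk comes first in PATH. POSIX defines `-F sepstring` as equivalent to `-v FS=sepstring`, and a `-v` value is read as if it were an awk STRING token, so the shell-quoted `'\\{|\\}'` should reach the regex engine as the ERE `\{|\}`, which matches a literal `{` or `}`. The function name `split_braces` is hypothetical, used only for illustration.

```shell
#!/bin/sh
# split_braces: hypothetical helper that splits each input line on a
# literal "{" or "}". Single quotes keep the shell from touching the
# backslashes; awk's string escape processing of the -F value turns
# \\{|\\} into the ERE \{|\}, i.e. "a literal { or a literal }".
split_braces() {
  awk -F '\\{|\\}' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Print each field with its number.
printf 'foo{bar}baz\n' | split_braces
```

For the input `foo{bar}baz` this prints three numbered fields: `foo`, `bar` and `baz`. The unescaped `'{|}'` form is deliberately not used, since, as shown above, gawk rejects it in POSIX mode.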
From: pk on 22 Feb 2010 09:33

Stephane CHAZELAS wrote:

>> awk -F '\\{|\\}'
>>
>> due to the way awk scans strings.
>
> s/awk/gawk/

I must admit that I had always thought that the "double pass" on strings as
described in the GNU awk manual was the default for awk in general, not just
gawk. But it seems indeed that other awks do accept the version with single
backslashes, so I stand corrected. Thanks.

> should be OK with every POSIX awk except GNU awk. I'm
> not sure if it's a gawk bug or not as the POSIX spec is unclear
> to me on that point,

POSIX seems indeed to mandate that:

"...If the right-hand operand is any expression other than the lexical token
ERE, the string value of the expression shall be interpreted as an extended
regular expression, including the escape conventions described above. Note
that these same escape conventions shall also be applied in determining the
value of a string literal (the lexical token STRING), and thus shall be
applied a second time when a string literal is used in this context."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, it's true that the text above refers only to when a string (not a
literal ERE) is used in the context of the "~" and "!~" operators. I guess
each awk implementation had its own take on whether the above should apply
in other contexts (like FS) or not.

> but I agree that
>
> awk -F '\\{|\\}'
>
> is better as it works on all POSIX awks including GNU awk.

Agreed. That should work in any case.
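The "second pass" in the POSIX text quoted above can be seen in the one context it explicitly covers: a string literal on the right-hand side of `~`. A minimal sketch, assuming a POSIX sh and any POSIX awk: the awk source `"a\\.b"` is first unescaped as a STRING to `a\.b`, and that value is then interpreted as an ERE in which `\.` matches only a literal dot. The function name `has_literal_dot` is hypothetical, used only for illustration.

```shell
#!/bin/sh
# has_literal_dot: hypothetical helper. The awk program contains the
# string literal "a\\.b"; as a STRING it becomes a\.b, and when used
# as the right-hand operand of ~ that value is read as the ERE a\.b,
# where \. matches a literal "." and nothing else.
has_literal_dot() {
  printf '%s\n' "$1" |
    awk '{ if ($0 ~ "a\\.b") print "match"; else print "no match" }'
}

has_literal_dot 'a.b'   # the dot is literal, so the ERE matches
has_literal_dot 'axb'   # \. does not match "x", so no match
```

Whether the same second pass applies to a string assigned to FS (or passed with `-F`) is exactly the point the thread leaves unsettled, so this sketch stays within the `~` case.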
From: Geoff Clare on 25 Feb 2010 12:33
Stephane CHAZELAS wrote:

> not sure if it's a gawk bug or not as the POSIX spec is unclear
> to me on that point, but I agree that
>
> awk -F '\\{|\\}'
>
> is better as it works on all POSIX awks including GNU awk.

Looks like a defect in POSIX to me. I have reported it.

http://austingroupbugs.net/view.php?id=224

-- 
Geoff Clare <netnews(a)gclare.org.uk>