Dealing with different number of fields in a file [Shell]

From: ezhil on 28 Apr 2010 09:29

Hi,

I have a file with 4000 rows. The file consists of a row with one field
followed by rows with 4 fields (multiple rows, but not a fixed number
of them). Something like:

g1
fs01 7 800 0.01
fs03 7 805 0.5
fs05 7 900 0.001
g2
as1 10 231 0.06
as7 10 335 0.01
as11 10 400 0.8

I would like to print g1, then find the minimum value in column 4
within that group and print the line that contains it:

g1
fs05 7 900 0.001
g2
as7 10 335 0.01

I was struggling a bit, so I started by trying to print just the
minimum value rather than the whole line (I need to remember the line
where the minimum occurs, but I don't know how to do this):
g1
0.001

So, I have tried,

awk '{if(NF==1) {print $0} else {min=1; while(NF > 1) {if($4 < min)
min=$4}; {printf("%s\n", min)}} }' file1

This prints g1 and then hangs forever with no further output. Do I need
to set up RS and NF differently for this? Could you please help me to do this?

Thanks in advance.

Kind regards,
Ezhil

From: pk on 28 Apr 2010 09:43

Try this, assuming that the values in the 4th column are always less than 1000
(an arbitrary value; use another if it's not appropriate):

awk 'NF==1{if(min)print minline;min=1000;print;next}
$4 < min {min=$4;minline=$0}
END{if(min)print minline}' file

If you don't know how big the values can be, then you can do this to base
them only on real data:

awk 'NF==1{if(min)print minline;min="";new = 1;print;next}
new { min = $4; minline = $0; new = 0; next }
$4 < min {min=$4;minline=$0}
END{if(min)print minline}' file

This way, the value in the first line of each block is taken as the
initial minimum, and it may or may not be replaced by values in subsequent
lines.
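
As a quick check of the data-driven version above, here is a small run on made-up input (the rows and values are invented for illustration, not taken from the original file) where every value in column 4 is larger than a guessed bound like 1000 in one block and small in another. It is only a sketch of how the "new" flag seeds the minimum from the first data line of each block:

```shell
# Hypothetical sample data: one-field header lines followed by 4-field data lines.
out=$(printf '%s\n' 'g1' 'x 1 2 5000' 'y 1 2 2500' 'g2' 'z 1 2 7' |
awk 'NF==1{if(min)print minline;min="";new = 1;print;next}
new { min = $4; minline = $0; new = 0; next }
$4 < min {min=$4;minline=$0}
END{if(min)print minline}')

# Show each header followed by the line holding its smallest column-4 value.
printf '%s\n' "$out"
```

No upper bound has to be chosen in advance, since the first data line of each block supplies the starting value.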
You can do the same thing by using the dreaded getline:

awk 'NF==1{if(min)print minline;print;getline;min=$4;minline=$0;next}
$4 < min {min=$4;minline=$0}
END{if(min)print minline}' file

The caveat with the getline version is that it requires that each block has
at least one data line.

From: pk on 28 Apr 2010 09:52

Also, all that code assumes that the minimum is never exactly 0.
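
To illustrate why: awk treats a numeric 0 as false, so the if(min) test skips the print when the smallest value in a block is exactly 0. A minimal demonstration with the first command and made-up input (the rows below are invented for illustration):

```shell
# Hypothetical block whose minimum in column 4 is exactly 0.
out=$(printf '%s\n' 'g1' 'a 1 2 0' 'b 1 2 0.5' |
awk 'NF==1{if(min)print minline;min=1000;print;next}
$4 < min {min=$4;minline=$0}
END{if(min)print minline}')

# Only the header comes out; the line "a 1 2 0" is silently dropped
# because if(min) is false when min is 0.
printf '%s\n' "$out"
```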

From: Jon LaBadie on 28 Apr 2010 09:57

I'm sure others will supply something tighter, but here is a framework
to try.
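awk '
# Header line: report the previous group (if it had data), then reset.
NF == 1 {
    if (minline != "")
        print label, minval
    label = $1
    minval = 0
    minline = ""
}

# Data line: the first one in a group seeds the minimum,
# later ones replace it only if they are smaller.
NF == 4 {
    if (minline == "") {
        minline = $0
        minval = $4
    } else if ($4 < minval) {
        minline = $0
        minval = $4
    }
}

# Report the last group.
END {
    if (minline != "")
        print label, minval
}' "$1"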

From: pk on 28 Apr 2010 09:56

pk wrote:

> Also, all that code assumes that the minimum is never exactly 0.

...which means that to lift that assumption it can be modified by changing
all the

if(min)

tests into

if(minline"")

and adding minline="" where min is initialized.
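
Put together, the second (data-driven) command with those changes applied might look like the sketch below. The input rows are made up for illustration, with one block whose minimum is exactly 0:

```shell
# Hypothetical sample data; the first block has a minimum of exactly 0.
out=$(printf '%s\n' 'g1' 'a 1 2 0' 'b 1 2 0.5' 'g2' 'c 3 4 0.2' |
awk 'NF==1{if(minline"")print minline;minline="";min="";new=1;print;next}
new { min = $4; minline = $0; new = 0; next }
$4 < min {min=$4;minline=$0}
END{if(minline"")print minline}')

# minline"" forces a string test, so a saved line is printed
# even when its column-4 value is 0.
printf '%s\n' "$out"
```

Testing minline as a string rather than min as a number means the decision to print depends only on whether a data line was seen in the block.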

(sorry for the multiple follow-ups)
