From: Rahul on
I'm trying to join two unsorted files and print lines common to both based
on a "key" field. The first field is the key.

cat file1:
A;Ablah
B;Bblah

cat file2:
A;Ablahblah
B;Bblahblah

A join seems to work:
join -t ';' -j 1 file1 file2

A;Ablah;Ablahblah
B;Bblah;Bblahblah

But the moment there is a non-matching line (which also leaves file1 unsorted
on the key), the join fails:

e.g.

cat file1:
C;Cblah
A;Ablah
B;Bblah

Is there any way around this? To still output the lines where field1
matches? If not, then can awk etc. handle this situation? I've only used
awk on single files before so not sure....
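
The only workaround I can think of is to sort both files on the key first,
something like (untested, and assuming a shell with process substitution):

join -t ';' -j 1 <(sort -t ';' -k1,1 file1) <(sort -t ';' -k1,1 file2)

but ideally I'd like to handle the unsorted files as they are.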



--
Rahul
From: The Natural Philosopher on
Rahul wrote:
> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> A join seems to work:
> join -t ';' -j 1 file1 file2
>
> A;Ablah;Ablahblah
> B;Bblah;Bblahblah
>
> But the moment there is a non matching line the join fails:
>
> e.g.
>
> cat file1:
> C;Cblah
> A;Ablah
> B;Bblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....
>
>
>
I vaguely remember doing this with awk years ago...
From: Anton Treuenfels on

"Rahul" <nospam(a)nospam.invalid> wrote in message
news:Xns9D8FAB1CD93556650A1FC0D7811DDBC81(a)188.40.43.230...
> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....

I'm going to assume that any particular key can appear any number of times in
either file, that the rest of each line can vary, and that you only want one
copy of each matching line from either file.

One way is to read the first file twice and the second file once:

# Process file1, then file2, then file1 a second time (appended to ARGV here).
# FS is set to ";" so that $1 is the key field.
BEGIN { FS = ";"; ARGV[ARGC++] = ARGV[1] }

FILENAME == "file1" {
    file1keys[$1] = ".T."       # remember this key was seen in file1
    if ($1 in file2keys)        # only true on the second pass over file1
        print
}

FILENAME == "file2" {
    file2keys[$1] = ".T."       # remember this key was seen in file2
    if ($1 in file1keys)        # file1's keys were all collected on pass one
        print
}

Of course this will print all the matching lines from file2 before any from
file1. You can also, of course, put the filenames on the command line in
whatever order you want.
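
For illustration, if the program above is saved as, say, common.awk (the name
is just an assumption), it can be run as

awk -f common.awk file1 file2

and with the sample files above it should print the file2 matches first, then
the file1 matches:

A;Ablahblah
B;Bblahblah
A;Ablah
B;Bblah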

- Anton Treuenfels

From: pk on
Rahul wrote:

> I'm trying to join two unsorted files and print lines common to both based
> on a "key" field. The first field is the key.
>
> cat file1:
> A;Ablah
> B;Bblah
>
> cat file2:
> A;Ablahblah
> B;Bblahblah
>
> A join seems to work:
> join -t ';' -j 1 file1 file2
>
> A;Ablah;Ablahblah
> B;Bblah;Bblahblah
>
> But the moment there is a non matching line the join fails:
>
> e.g.
>
> cat file1:
> C;Cblah
> A;Ablah
> B;Bblah
>
> Is there any way around this? To still output the lines where field1
> matches? If not, then can awk etc. handle this situation? I've only used
> awk on single files before so not sure....

Assuming no repeated keys, try

awk -F \; -v OFS=\; 'NR==FNR{a[$1]=$2;next}
$1 in a{print $1, a[$1], $2}' file1 file2
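
With the second set of sample files above (file1 containing the extra C;Cblah
line), this should print the same two joined lines as the original join did:

A;Ablah;Ablahblah
B;Bblah;Bblahblah

The C key is stored in a[] but never printed, since file2 has no line with
that key.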
From: Rahul on
"Anton Treuenfels" <teamtempest(a)yahoo.com> wrote in
news:buqdnWDccvgqH5HRnZ2dnUVZ_i2dnZ2d(a)earthlink.com:

> I'm going to assume any particular key field can appear any number of
> times in either file in any line and that the rest of each line can
> vary and that you only want one copy of each line from either file.
>

Thanks Anton for this general solution. My problem is actually simpler, since
the keys are never repeated. My bad, I should have mentioned that.

--
Rahul