From: nobody on 10 Nov 2009 19:19

Given the following data in a flat file:

09020000251   Joe Smith           54 Abbey Road
05020033486   John Jones          98 New York Ave.
07020000279   George Washington   234 Washington Ave.
06020004293   Fred Flintstone     123 Bedrock Road
03020004472   Fred Jones          98 New York Ave.
06020004293   Wilma Flintstone    123 Bedrock Road

You can see that Fred and Wilma both share the same customer number and
street address. I'm traversing the file and looking for any duplicate
customer numbers, such as Wilma and Fred. If there is no duplicate, then
just print the record and move on. When I do encounter a duplicate I'm
trying to print the records one after the other so the output looks like:

Name: Fred Flintstone
Address 123 Bedrock Road
Customer#: 06020004293

Name: Wilma Flintstone
Address 123 Bedrock Road
Customer#: 06020004293

Here's what I have so far, which almost works, though I doubt it is the
best technique regardless.

#!/usr/bin/perl

use strict;
use warnings;

my $datafile = "$ARGV[0]";

my @file = ();
my @fields = ();
my $line;
my $custno;
my $name1;
my $addr1;

my $line2;
my $custno2;
my $name2;
my $addr2;

my $count;

open(HFILE, "<$datafile") || die "Cannot open $datafile: $!\n";

while ( <HFILE> ) {
    push(@file, $_) if $_ =~ /[A-Za-z0-9]/;
}

close(HFILE);

my @sortedFile = sort { $a cmp $b } @file;

foreach $line (@sortedFile) {

    $custno = substr($line, 0, 11);
    $name1  = substr($line, 14, 19);
    $addr1  = substr($line, 34, 20);

    #print "$custno\n";
    #print "$name1\n";
    #print "$addr1\n";

    $count = 0;
    foreach $line2 (@sortedFile) {
        $custno2 = substr($line2, 0, 11);
        if ($custno eq $custno2) {
            $count++;
        }
        if ($count == 2) {
            print "$custno2\n";
            $count = 0;
            $custno2 = substr($line2, 0, 11);
            $name2   = substr($line2, 14, 19);
            $addr2   = substr($line2, 34, 20);
            print "$custno2\n";
            print "$name2\n";
            print "$addr2\n";
            $count++;
            last;
        }
    }
}
From: Dr.Ruud on 10 Nov 2009 20:10

nobody wrote:
> Given the following data in a flat file:
>
> 09020000251   Joe Smith           54 Abbey Road
> 05020033486   John Jones          98 New York Ave.
> 07020000279   George Washington   234 Washington Ave.
> 06020004293   Fred Flintstone     123 Bedrock Road
> 03020004472   Fred Jones          98 New York Ave.
> 06020004293   Wilma Flintstone    123 Bedrock Road
>
> You can see that Fred and Wilma both share the same customer number and
> street address. I'm traversing the file and looking for any duplicate
> customer numbers, such as Wilma and Fred.

perl -MData::Dumper -aF'\s\s+' -nle '
    push @{ $d{$F[0]} }, [ @F[1,2] ]
  }{
    @{ $d{$_} } > 1 or delete $d{$_} for keys %d;
    print Dumper \%d
' flat.txt

$VAR1 = {
          '06020004293' => [
                             [
                               'Fred Flintstone',
                               '123 Bedrock Road'
                             ],
                             [
                               'Wilma Flintstone',
                               '123 Bedrock Road'
                             ]
                           ]
        };

-- 
Ruud
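[Editor's note: for anyone decoding the one-liner, a longhand equivalent might look like the sketch below. It is not Dr.Ruud's code: the sample records are inlined (replacing his assumed flat.txt), and the split pattern comes from his -F'\s\s+' switch.]

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Sample records from the thread, inlined so the sketch is self-contained.
my @input = (
    "09020000251   Joe Smith           54 Abbey Road",
    "05020033486   John Jones          98 New York Ave.",
    "07020000279   George Washington   234 Washington Ave.",
    "06020004293   Fred Flintstone     123 Bedrock Road",
    "03020004472   Fred Jones          98 New York Ave.",
    "06020004293   Wilma Flintstone    123 Bedrock Road",
);

my %d;
for (@input) {
    # -aF'\s\s+' autosplits each line on runs of two or more
    # whitespace characters into @F.
    my @F = split /\s\s+/;
    push @{ $d{$F[0]} }, [ @F[1, 2] ];
}

# The }{ trick ends the implicit -n loop, so this part of the
# one-liner runs once, after all input: keep only customer numbers
# that occurred more than once.
@{ $d{$_} } > 1 or delete $d{$_} for keys %d;

print Dumper \%d;
```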
From: Tad McClellan on 11 Nov 2009 00:18

nobody <nobody(a)nowhere.com> wrote:

> my $datafile = "$ARGV[0]";

perldoc -q vars

    What's wrong with always quoting "$vars"?

So then:

    my $datafile = $ARGV[0];

-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
From: Jürgen Exner on 11 Nov 2009 01:18
nobody <nobody(a)nowhere.com> wrote:
>Given the following data in a flat file:
>
>09020000251   Joe Smith           54 Abbey Road
>05020033486   John Jones          98 New York Ave.
>07020000279   George Washington   234 Washington Ave.
>06020004293   Fred Flintstone     123 Bedrock Road
>03020004472   Fred Jones          98 New York Ave.
>06020004293   Wilma Flintstone    123 Bedrock Road
>
>You can see that Fred and Wilma both share the same customer number and
>street address. I'm traversing the file and looking for any duplicate
>customer numbers, such as Wilma and Fred. If there is no duplicate, then
>just print the record and move on. When I do encounter a duplicate I'm
>trying to print the records one after the other so the output looks like:
>
>Name: Fred Flintstone
>Address 123 Bedrock Road
>Customer#: 06020004293
>
>Name: Wilma Flintstone
>Address 123 Bedrock Road
>Customer#: 06020004293
>
>Here's what I have so far which almost works, which I doubt is the best
>technique regardless.

So you don't really care if or how many customers are sharing the same
customer id. And each record is printed the same way, no matter if
duplicate or not. In that case, yes, your approach of simply sorting the
lines seems quite adequate.

>#!/usr/bin/perl
>
>use strict;
>use warnings;
>
>my $datafile = "$ARGV[0]";

Don't quote variables, there is no good reason for it.

>my @file = ();
>my @fields = ();
>my $line;
>my $custno;
>my $name1;
>my $addr1;
>
>my $line2;
>my $custno2;
>my $name2;
>my $addr2;

Don't use global variables unless there is a good reason for it. For
almost all of these there is no good reason.

>my $count;
>
>open(HFILE, "<$datafile") || die "Cannot open $datafile: $!\n";
>
>    while ( <HFILE> ) {
>        push(@file, $_) if $_ =~ /[A-Za-z0-9]/;
>    }
>
>close(HFILE);
>
>my @sortedFile = sort { $a cmp $b } @file;

Sorting lexically is the default behaviour of sort() already, no reason
to mention it explicitly.
>foreach $line (@sortedFile) {
>
>    $custno = substr($line, 0, 11);
>    $name1  = substr($line, 14, 19);
>    $addr1  = substr($line, 34, 20);

There are other ways to split the line, but this works and looks ok to me.

>    #print "$custno\n";
>    #print "$name1\n";
>    #print "$addr1\n";

Are these lines relevant in any way?

>    $count = 0;
>    foreach $line2 (@sortedFile) {

What on earth are you doing with this inner loop? And what is $count
about? You don't use it for anything useful. Your output doesn't
distinguish between the first occurrence of a customer ID and subsequent
occurrences. So don't bother about it, just print each record in the
sequence as it appears in the sorted array.

[snipped]

A different approach (just in case you do care whether a customer ID is
duplicated or not) would be to read all customers into an HoA, using the
customer ID as the key for the hash. And then just traverse the whole
hash and print each array.

jue
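[Editor's note: jue's HoA (hash of arrays) suggestion could be sketched roughly as follows. This is an illustrative sketch, not jue's code: the sample data is inlined so the script is self-contained, and the substr offsets are taken from the original post.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample fixed-width records from the thread: customer number at
# column 0 (11 chars), name at column 14 (19 chars), address at 34.
my @lines = (
    "09020000251   Joe Smith           54 Abbey Road",
    "05020033486   John Jones          98 New York Ave.",
    "07020000279   George Washington   234 Washington Ave.",
    "06020004293   Fred Flintstone     123 Bedrock Road",
    "03020004472   Fred Jones          98 New York Ave.",
    "06020004293   Wilma Flintstone    123 Bedrock Road",
);

# Build the HoA: customer number => list of [name, address] pairs.
my %records;
for my $line (@lines) {
    my $custno = substr($line, 0, 11);
    my $name   = substr($line, 14, 19);
    my $addr   = substr($line, 34, 20);
    s/\s+$// for $name, $addr;    # strip the fixed-width padding
    push @{ $records{$custno} }, [ $name, $addr ];
}

# Traverse the hash and print each array; records sharing a customer
# number come out grouped, whether duplicated or not.
for my $custno (sort keys %records) {
    for my $rec (@{ $records{$custno} }) {
        print "Name: $rec->[0]\n";
        print "Address: $rec->[1]\n";
        print "Customer#: $custno\n\n";
    }
}
```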