From: greg6363 on
I know how to remove duplicates from a file by using the following
code:

proc sort data=dataset1 out=dataset2 nodupkey;
by AccountNumber;
run;

But now, I have a situation where I have duplicate records but I want
to keep the last record of the duplicate according to a particular
variable. I can't seem to figure out how to code it. Anyone run
across this situation before? Any assistance would be greatly
appreciated. Thanks.
From: Reeza on
On Apr 26, 11:03 am, greg6363 <gregtlaugh...(a)gmail.com> wrote:
> I know how to remove duplicates from a file by using the following
> code:
>
> proc sort data=dataset1 out=dataset2 nodupkey;
> by AccountNumber;
> run;
>
> But now, I have a situation where I have duplicate records but I want
> to keep the last record of the duplicate according to a particular
> variable.  I can't seem to figure out how to code it.  Anyone run
> across this situation before?  Any assistance would be greatly
> appreciated.  Thanks.

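Can you sort so that the record you want to keep comes first within
each account? Then run the sort again with NODUPKEY, which keeps the
first record for each key.

I.e., sort by AccountNumber and then by the other field, descending?

I don't recall offhand whether DESCENDING goes before or after the
variable name.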
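
For what it's worth, here is a rough sketch of that two-step approach.
In a SAS BY statement the DESCENDING keyword goes before the variable
it applies to. DateVar is a made-up name standing in for whatever
variable decides which record counts as "last", and "sorted" is just a
scratch data set name.

/* Step 1: within each account, put the record to keep first */
proc sort data=dataset1 out=sorted;
by AccountNumber descending DateVar;
run;

/* Step 2: NODUPKEY keeps the first record of each AccountNumber */
proc sort data=sorted out=dataset2 nodupkey;
by AccountNumber;
run;

Step 2 relies on PROC SORT's default EQUALS behavior, which leaves
records with the same BY value in their incoming order.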
From: Jim Groeneveld on
Hi Greg,

proc sort data=dataset1 out=dataset2;
by AccountNumber;
run;

DATA Dataset2;
SET Dataset2;
by AccountNumber;
IF (LAST.AccountNumber);
run;

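On the other hand, what is the "last" record? Sorting by AccountNumber
alone only groups the records; within an account they stay in their
original input order. If you have a date or time variable as well, you
should add it to the BY statement of the PROC SORT:
BY AccountNumber DateVar TimeVar;
With the DATA step still testing LAST.AccountNumber, this forces the
chronologically last record of each account to be kept.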
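
Put together, a rough sketch might look like this (DateVar and TimeVar
are placeholders for whatever date and time variables you actually
have):

proc sort data=dataset1 out=dataset2;
by AccountNumber DateVar TimeVar;
run;

DATA Dataset2;
SET Dataset2;
by AccountNumber DateVar TimeVar;
/* LAST.AccountNumber flags the final record of each account, */
/* which after the sort is the one with the latest date/time  */
IF (LAST.AccountNumber);
run;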

Regards - Jim.
--
Jim Groeneveld, Netherlands
Statistician, SAS consultant
http://jim.groeneveld.eu.tf




From: Dav Vandenbroucke on
On Mon, 26 Apr 2010 11:03:35 -0700 (PDT), greg6363
<gregtlaughlin(a)gmail.com> wrote:

>But now, I have a situation where I have duplicate records but I want
>to keep the last record of the duplicate according to a particular
>variable.

Sort the dataset in an order that will put the duplicate records you
want to keep last. Then do something like:

DATA want;
SET have;
BY sortVar;
IF LAST.sortVar AND NOT FIRST.sortVar;
RUN;

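That keeps the last record of each BY group, but only when the group
has more than one record; a sortVar value that occurs only once is
dropped. If you also want to keep the unduplicated records, use
IF LAST.sortVar; on its own.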
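
A small made-up example may help show the difference between the two
subsetting IFs. The data set names, the Amount variable and the values
are all invented for illustration; the input is already in sortVar
order, so no PROC SORT is needed here.

DATA have;
INPUT sortVar Amount;
DATALINES;
101 10
101 20
102 30
103 40
103 50
103 60
;
RUN;

/* Last record of every group, single-record groups included */
DATA last_of_all;
SET have;
BY sortVar;
IF LAST.sortVar;
RUN;
/* keeps 101 20, 102 30 and 103 60 */

/* Last record of duplicated groups only */
DATA last_of_dups;
SET have;
BY sortVar;
IF LAST.sortVar AND NOT FIRST.sortVar;
RUN;
/* keeps 101 20 and 103 60; 102 is dropped because it occurs once */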

Dav Vandenbroucke
davanden at cox dot net