From: Jack Hamilton on
I would use a double DOW loop with NOTSORTED in the BY statements.

See Paul Dorfman's paper "The Magnificent Do".




--
Jack Hamilton
jfh(a)alumni.stanford.org

-----Original Message-----
From: Haigang Zhou <haigang.zhou(a)GMAIL.COM>
Sent: Monday, November 09, 2009 5:29 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: [SAS-L] Delete adjacent obs with repeated info

Hi Barcelona,

I want to delete the first 3 rows of A, middle four rows of B, and the last
five rows of C.

The condition to delete them is that three or more adjacent days report
the same stock price.

Many thanks.

Haigang

2009/11/9 Fernández Rodríguez, Dani <DFernandez(a)cst.cat>

> Hi Haigang,
>
> I use SAS to 'play' with stock price datasets but I don't understand
> what you want to get; you want to delete the first x rows for 'A' shares
> and you want to delete the last x rows for 'C' shares at the same time.
> Could you explain in more detail the conditions for rejecting those rows, please?
>
> Daniel Fernandez
> Barcelona
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Haigang Zhou
> Sent: Monday, November 9, 2009 05:20
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Delete adjacent obs with repeated info
>
> I have a dataset of daily stock prices with three variables: ID, date, and
> price.
>
> For some reason, the dataset reports the same stock price for some adjacent
> days. I want to write a program that deletes those days once the number of
> adjacent days with the same price reaches a certain threshold, say 3. For
> example, the first three days for "a" should be deleted, but not the 6th
> and 7th days. Similarly, the last 5 days for stock "c" should be deleted but
> not the first 2 days.
>
> Can someone help me write the code? Many thanks.
>
> data mydata;
> input id $ date :MMDDYY10. price;
> cards;
> a 1/1/2009 3
> a 1/2/2009 3
> a 1/3/2009 3
> a 1/4/2009 0.30
> a 1/5/2009 0.29
> a 1/6/2009 3
> a 1/7/2009 3
> a 1/8/2009 0.62
> a 1/9/2009 0.84
> a 1/10/2009 0.33
> b 1/1/2009 0.48
> b 1/2/2009 0.09
> b 1/3/2009 0.67
> b 1/4/2009 0.91
> b 1/5/2009 4
> b 1/6/2009 4
> b 1/7/2009 4
> b 1/8/2009 4
> b 1/9/2009 0.66
> b 1/10/2009 0.18
> c 1/1/2009 5
> c 1/2/2009 5
> c 1/3/2009 0.30
> c 1/4/2009 0.78
> c 1/5/2009 0.08
> c 1/6/2009 5
> c 1/7/2009 5
> c 1/8/2009 5
> c 1/9/2009 5
> c 1/10/2009 5
> ;
> run;
>
From: yingtao on
By using DOW tech:

data have;
do until(last.id);
set mydata ;
by id price notsorted;
if first.price then gp=1; else gp+1;
output;
end;
run;

data need;
do _n_=1 by 1 until (last.id or last.price);
set have;
by id price notsorted;
end;
do _n_=1 to _n_;
set mydata;
by id price notsorted;
if gp<3 then output;
end;
run;

Tao
From: Jack Hamilton on
I think this is a bit simpler:

=====
data have;
input id :$1. date :mmddyy10. price;
format date date9.;
cards;
a 1/1/2009 3
a 1/2/2009 3
a 1/3/2009 3
a 1/4/2009 0.30
a 1/5/2009 0.29
a 1/6/2009 3
a 1/7/2009 3
a 1/8/2009 0.62
a 1/9/2009 0.84
a 1/10/2009 0.33
b 1/1/2009 0.48
b 1/2/2009 0.09
b 1/3/2009 0.67
b 1/4/2009 0.91
b 1/5/2009 4
b 1/6/2009 4
b 1/7/2009 4
b 1/8/2009 4
b 1/9/2009 0.66
b 1/10/2009 0.18
c 1/1/2009 5
c 1/2/2009 5
c 1/3/2009 0.30
c 1/4/2009 0.78
c 1/5/2009 0.08
c 1/6/2009 5
c 1/7/2009 5
c 1/8/2009 5
c 1/9/2009 5
c 1/10/2009 5
;

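data want;

   /* Reset the counter for each run of identical prices */
   repeated_prices = 0;
   drop repeated_prices;

   /* First pass: count the observations in the current run */
   do until (last.price);
      set have;
      by id price notsorted;
      repeated_prices + 1;
   end;

   /* Second pass: re-read the same run; keep it only if it has fewer than 3 obs */
   do until (last.price);
      set have;
      by id price notsorted;
      if repeated_prices < 3 then output;
   end;

run;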
=====

It's only one data step, and it's possible that the second read of each
group will come from cache, which would mean fewer I/Os.


Both of our solutions are arguably better than the ones involving SQL:
the SQL is harder to write, and the data step logic seems more obvious to me.
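
The original question asked for a threshold of "say 3", so here is a minimal sketch of the same double DOW loop with the cutoff pulled out into a macro variable. The names min_run, run_length, and want_param are made up for illustration, and it assumes the have dataset created above.

```sas
%let min_run = 3;   /* hypothetical: shortest run of equal prices to delete */

data want_param;
   /* Pass 1: count the observations in the current run of equal prices */
   run_length = 0;
   do until (last.price);
      set have;
      by id price notsorted;
      run_length + 1;
   end;

   /* Pass 2: re-read the same run; keep it only if shorter than the cutoff */
   do until (last.price);
      set have;
      by id price notsorted;
      if run_length < &min_run then output;
   end;

   drop run_length;
run;

/* Rows kept per stock */
proc freq data=want_param;
   tables id / nocum nopercent;
run;
```

With the sample data and a cutoff of 3, the counts should come out as 7 for a, 6 for b, and 5 for c.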



On Mon, 9 Nov 2009 09:49:12 -0600, "yingtao" <yingtaoliu(a)GMAIL.COM>
said:
> By using DOW tech:
>
> data have;
> do until(last.id);
> set mydata ;
> by id price notsorted;
> if first.price then gp=1; else gp+1;
> output;
> end;
> run;
>
> data need;
> do _n_=1 by 1 until (last.id or last.price);
> set have;
> by id price notsorted;
> end;
> do _n_=1 to _n_;
> set mydata;
> by id price notsorted;
> if gp<3 then output;
> end;
> run;
>
> Tao


--
Jack Hamilton
Sacramento, California
jfh(a)alumni.stanford.org <== Use this, not jfh @ stanfordalumni.org

Tots units fem força!