From: Jack Hamilton on 9 Nov 2009 08:55

I would use a double DOW loop with NOTSORTED in the BY statements. See Paul Dorfman's paper "The Magnificent Do".

--
Jack Hamilton
jfh(a)alumni.stanford.org

-----Original Message-----
From: Haigang Zhou <haigang.zhou(a)GMAIL.COM>
Sent: Monday, November 09, 2009 5:29 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: [SAS-L] Delete adjacent obs with repeated info

Hi Barcelona,

I want to delete the first 3 rows of A, the middle four rows of B, and the last five rows of C. The condition for deleting them is that three or more adjacent days report the same stock price.

Many thanks.
Haigang

2009/11/9 Fernández Rodríguez, Dani <DFernandez(a)cst.cat>

> Hi Haigang,
>
> I use SAS to 'play' with stock price datasets, but I don't understand
> what you want to get; you want to delete the first x rows for 'A' shares
> and you want to delete the last x rows for 'C' shares at the same time.
> Could you explain in more detail the conditions for rejecting those rows, please?
>
> Daniel Fernandez
> Barcelona
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On behalf of Haigang Zhou
> Sent: Monday, 9 November 2009 05:20
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Delete adjacent obs with repeated info
>
> I have a dataset of daily stock prices with three variables: ID, date, and
> price.
>
> For some reason, the dataset reports the same stock price for some adjacent
> days. I want to write a program that deletes those days once the number of
> adjacent days with the same price reaches a certain threshold, say 3. For
> example, the first three days for "a" should be deleted, but not the 6th and
> 7th days. Similarly, the last 5 days for stock "c" should be deleted, but not
> the first 2 days.
>
> Can someone help me write the code? Many thanks.
>
> data mydata;
> input id $ date MMDDYY12. price;
> cards;
> a 1/1/2009 3
> a 1/2/2009 3
> a 1/3/2009 3
> a 1/4/2009 0.30
> a 1/5/2009 0.29
> a 1/6/2009 3
> a 1/7/2009 3
> a 1/8/2009 0.62
> a 1/9/2009 0.84
> a 1/10/2009 0.33
> b 1/1/2009 0.48
> b 1/2/2009 0.09
> b 1/3/2009 0.67
> b 1/4/2009 0.91
> b 1/5/2009 4
> b 1/6/2009 4
> b 1/7/2009 4
> b 1/8/2009 4
> b 1/9/2009 0.66
> b 1/10/2009 0.18
> c 1/1/2009 5
> c 1/2/2009 5
> c 1/3/2009 0.30
> c 1/4/2009 0.78
> c 1/5/2009 0.08
> c 1/6/2009 5
> c 1/7/2009 5
> c 1/8/2009 5
> c 1/9/2009 5
> c 1/10/2009 5
> ;
> run;
>
From: yingtao on 9 Nov 2009 10:49

By using DOW tech:

data have;
   do until(last.id);
      set mydata;
      by id price notsorted;
      if price=lag(price) then gp+1;
      else gp=1;
      output;
   end;
run;

data need;
   do _n_=1 by 1 until (last.id or last.price);
      set have;
      by id price notsorted;
   end;
   do _n_=1 to _n_;
      set mydata;
      by id price notsorted;
      if gp<3 then output;
   end;
run;

Tao
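
One thing to watch in the first step: LAG returns the value from the previous observation regardless of BY groups, so if one id ends on the same price that the next id starts with, gp keeps counting across the boundary. The sample data does not hit this case. A minimal sketch of one way to avoid it, assuming the input is the mydata dataset from the original question and that it is grouped by id, is to reset the counter on first.price instead of comparing against LAG:

```sas
/* Sketch only: number each observation within its run of equal prices. */
/* Assumes mydata is grouped by id and in date order within each id.    */
data have;
   set mydata;
   by id price notsorted;
   /* first.price is 1 whenever id or price changes, */
   /* so a run can never span two different ids      */
   if first.price then gp = 1;
   else gp + 1;
run;
```

The resulting gp has the same meaning as in the step above (position within the current run), so the second step can be used unchanged.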
From: Jack Hamilton on 9 Nov 2009 11:38

I think this is a bit simpler:

=====
data have;
   /* List input with the colon modifier reads the space-delimited */
   /* fields without depending on column alignment                 */
   input id $ date :mmddyy10. price;
   format date date9.;
cards;
a 1/1/2009 3
a 1/2/2009 3
a 1/3/2009 3
a 1/4/2009 0.30
a 1/5/2009 0.29
a 1/6/2009 3
a 1/7/2009 3
a 1/8/2009 0.62
a 1/9/2009 0.84
a 1/10/2009 0.33
b 1/1/2009 0.48
b 1/2/2009 0.09
b 1/3/2009 0.67
b 1/4/2009 0.91
b 1/5/2009 4
b 1/6/2009 4
b 1/7/2009 4
b 1/8/2009 4
b 1/9/2009 0.66
b 1/10/2009 0.18
c 1/1/2009 5
c 1/2/2009 5
c 1/3/2009 0.30
c 1/4/2009 0.78
c 1/5/2009 0.08
c 1/6/2009 5
c 1/7/2009 5
c 1/8/2009 5
c 1/9/2009 5
c 1/10/2009 5
;

data want;
   repeated_prices = 0;
   drop repeated_prices;
   /* First pass: count the observations in the current run of equal prices */
   do until (last.price);
      set have;
      by id price notsorted;
      repeated_prices + 1;
   end;
   /* Second pass: re-read the same run and output it only if it is short */
   do until (last.price);
      set have;
      by id price notsorted;
      if repeated_prices < 3 then output;
   end;
run;
=====

It's only one data step, and it's possible that the data will be cached, resulting in fewer I/Os.

Both of our solutions are arguably better than the ones involving SQL: SQL is harder here, and the data step logic seems more obvious to me.

On Mon, 9 Nov 2009 09:49:12 -0600, "yingtao" <yingtaoliu(a)GMAIL.COM> said:

> By using DOW tech:
>
> data have;
> do until(last.id);
> set mydata ;
> by id price notsorted;
> if price=lag(price) then gp+1;else gp=1;
> output;
> end;
> run;
>
> data need;
> do _n_=1 by 1 until (last.id or last.price);
> set have;
> by id price notsorted;
> end;
> do _n_=1 to _n_;
> set mydata;
> by id price notsorted;
> if gp<3 then output;
> end;
> run;
>
> Tao

--
Jack Hamilton
Sacramento, California
jfh(a)alumni.stanford.org <== Use this, not jfh @ stanfordalumni.org

Tots units fem força!
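
The original question describes the cutoff as "a certain threshold, say 3", so the hard-coded 3 in the double DOW loop could be pulled out into a parameter. The following is a minimal sketch under that assumption, not a tested replacement: the macro variable name threshold and the variable name run_length are hypothetical, and the step expects the have dataset created above.

```sas
/* Hypothetical parameter: runs of this many equal prices (or more) are deleted */
%let threshold = 3;

data want;
   run_length = 0;
   drop run_length;
   /* First pass: count the observations in the current run (same id and price) */
   do until (last.price);
      set have;
      by id price notsorted;
      run_length + 1;
   end;
   /* Second pass: re-read the same run and keep it only if it is */
   /* shorter than the threshold                                  */
   do until (last.price);
      set have;
      by id price notsorted;
      if run_length < &threshold then output;
   end;
run;
```

With the threshold left at 3, this should behave the same way as the step above; raising it to 4, for example, would keep the three-day run at the start of stock a while still removing the longer runs in b and c.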