Prev: SAS - proc nlmixed & Cholesky Decomposition
Next: BI Publisher Consultants - Woodland Hills CA - 10+ months
From: billyiqing on 5 Aug 2010 02:22 Hi All, I'm trying to create a SQL query, which finds the previous 52-week's maximum price for each day. My stock prices are daily. The variables are p (price), ticker and date. I have written the following code: %let J=52; proc sql noprint; create table _52mom.umd4 as select a.ticker, a.date, max(p) as high from _52mom.intreturns_s4 (keep = ticker date) as a, _52mom.intreturns_s4 as b where a.ticker = b.ticker and b.date between a.date and intnx('week', a.date, -&J) group by a.ticker, a.date having count(a.date)>=200; quit; However, my dataset is quite large (the whole set is about 4.6G, only 3 variables - Price, Ticker, and Date). I have subdivided it into four subsamples. The smallest is about 500M. I have tested the above code on a 8M data, and it works out fine. However, when I run it on the 500M, it becomes extremely slow (I have been running for 6 hours now, still going, not sure when it will finish). Also I'm not sure about the spillover table that proc SQL creates, and how large that will become. Could someone please kindly help me on this issue? I didn't think the data would take this long to run, is there an easier/more efficient way to do this? Sorry I am kind of new to SAS, don't know much about it yet. Thanks very much in advance!!!
From: Patrick on 5 Aug 2010 05:36 Hi Joining such a huge table with itself is for sure very resource intensive. Assuming that your source is a SAS table I'd expect that the code below will perform better. If the max price was reached on several days then you'll get several obs per ticker. data intreturns_s4; format date date9.; stop=today(); drop stop; do ticker=1 to 3; do date='01JAN2009'd to stop; p=ceil(ranuni(1)*100); output; end; end; run; %let J=52; data _null_; start_date=intnx('week',today(), -&J,'b'); stop_date =intnx('week',today(), -1,'e'); put start_date= weekdate.; put stop_date = weekdate.; call symput('start_date',cats(start_date)); call symput('stop_date',cats(stop_date)); run; proc sql; create view work.intreturns_s4_SORTED as select ticker, date, p from work.intreturns_s4 where date between &start_date and &stop_date order by ticker, p DESC ; quit; data work.umd4; array p_retained {1} 8 _temporary_; set intreturns_s4_SORTED; by ticker; if first.ticker then p_retained{1}=p; if p=p_retained{1} then output; run; proc print data=work.umd4; run; If you need everything done in SQL then search this forum with keywords SQL and TOP. There is more than one thread discussing how to select the top record using SQL. HTH Patrick
From: Andrew Karp Sierra Info Services on 5 Aug 2010 10:26 On Aug 5, 2:36 am, Patrick <patrick.mat...(a)gmx.ch> wrote: > Hi > > Joining such a huge table with itself is for sure very resource > intensive. > > Assuming that your source is a SAS table I'd expect that the code > below will perform better. If the max price was reached on several > days then you'll get several obs per ticker. > > data intreturns_s4; > format date date9.; > stop=today(); > drop stop; > do ticker=1 to 3; > do date='01JAN2009'd to stop; > p=ceil(ranuni(1)*100); > output; > end; > end; > run; > > %let J=52; > data _null_; > start_date=intnx('week',today(), -&J,'b'); > stop_date =intnx('week',today(), -1,'e'); > put start_date= weekdate.; > put stop_date = weekdate.; > call symput('start_date',cats(start_date)); > call symput('stop_date',cats(stop_date)); > run; > > proc sql; > create view work.intreturns_s4_SORTED as > select ticker, date, p > from work.intreturns_s4 > where date between &start_date and &stop_date > order by ticker, p DESC > ; > quit; > > data work.umd4; > array p_retained {1} 8 _temporary_; > set intreturns_s4_SORTED; > by ticker; > if first.ticker then p_retained{1}=p; > if p=p_retained{1} then output; > run; > > proc print data=work.umd4; > run; > > If you need everything done in SQL then search this forum with > keywords SQL and TOP. There is more than one thread discussing how to > select the top record using SQL. > > HTH > Patrick I think this is something PROC EXPAND, in the ETS module, is uniquely qualified to do without a lot of tedious Data Step and / or SQL coding. Check out my paper, "Time Series Magic: Using PROC EXPAND with Time Series Data" which is available for free download at http://www.sierrainformation.com . Click on the Free Downloads link on the left hand side of the page and take things from there. Thanks, Andrew Karp Sierra Information Services http://www.sierrainformation.com
From: billyiqing on 5 Aug 2010 21:49 Thank you very much for your help Patrick! However, one thing that I didn't make clear in the original thread was that I need a rolling window, i.e. I need to calculate this 52-week high for EVERY observation in my dataset. Hence the big join in my SQL. Any thoughts on this? Thanks Bill On Aug 5, 6:22 pm, billyiqing <billyiq...(a)gmail.com> wrote: > Hi All, > > I'm trying to create a SQL query, which finds the previous 52-week's > maximum price for each day. My stock prices are daily. The variables > are p (price), ticker and date. I have written the following code: > > %let J=52; > proc sql noprint; > create table _52mom.umd4 > as select a.ticker, a.date, max(p) as high > from _52mom.intreturns_s4 (keep = ticker date) as a, > _52mom.intreturns_s4 as b > where a.ticker = b.ticker and b.date between a.date and intnx('week', > a.date, -&J) > group by a.ticker, a.date > having count(a.date)>=200; > quit; > > However, my dataset is quite large (the whole set is about 4.6G, only > 3 variables - Price, Ticker, and Date). I have subdivided it into four > subsamples. The smallest is about 500M. > > I have tested the above code on a 8M data, and it works out fine. > However, when I run it on the 500M, it becomes extremely slow (I have > been running for 6 hours now, still going, not sure when it will > finish). Also I'm not sure about the spillover table that proc SQL > creates, and how large that will become. > > Could someone please kindly help me on this issue? I didn't think the > data would take this long to run, is there an easier/more efficient > way to do this? Sorry I am kind of new to SAS, don't know much about > it yet. > > Thanks very much in advance!!!
From: Patrick on 6 Aug 2010 03:38 Hi Bill Just to clarify before I'm coming up with something - please correct the following statements if wrong: - The result data set contains all the observations from the source data set which fall into the date range of the last 52 weeks. - The combination of ticker and date is not unique. - The variable 'high' (max. price) contains the max value of 'p' (price) for each combination of ticker and date. i.e. Have Ticker Date P 1 01jan2010 5 1 01jan2010 4 1 01jan2010 1 1 02jan2010 6 1 02jan2010 9 1 02jan2010 3 Want: Ticker Date High 1 01jan2010 5 1 01jan2010 5 1 01jan2010 5 1 02jan2010 9 1 02jan2010 9 1 02jan2010 9 Thanks Patrick
|
Next
|
Last
Pages: 1 2 Prev: SAS - proc nlmixed & Cholesky Decomposition Next: BI Publisher Consultants - Woodland Hills CA - 10+ months |