From: "Howard Schreier <hs AT dc-sug DOT org>" on 21 Apr 2006 08:44 On Thu, 20 Apr 2006 15:39:58 -0700, lisiqi77(a)YAHOO.COM wrote: >Hey, > >I have a dataset with 5 variables like the following: > > code year month bench_mo x; > > a 1994 1 12 0.21 > a 1994 2 12 0.22 > a 1994 3 12 0.26 > ... > a 1994 12 12 0.20 > a 1995 1 12 0.11 > ... > a 1995 12 12 0.20 > ... > b 1994 1 8 0.20 > b 1994 2 8 0.32 > ... > >In other words, for each code, I have x data for every month of >yr1994-2005. And bench_month differs for each code. > >My goal is to calculate the variance of x during a year with its >yearend defined as 10 months after the bench_month. For example, for >code a with bench_month equal 12, I'd like to know the variance of x >from 1994 month 11 to 1995 month 10, etc. > >Please help if you have any good idea... SQL makes this fairly simple. First generate test data: data have; input code $ bench_mo; do year = 1994 to 2005; do month = 1 to 12; ym = year*100 + month; x = round(ranuni(1),0.01); output; end; end; cards; a 12 b 0 c -2 ; Variable YM is there only to help track behavior of the code. It can be eliminated (along with references to it below). Solution: proc sql; create table goal as select code, bench_mo, year(intnx('month',mdy(month,1,year),2-bench_mo) ) as year, var(x) as var, n(x) as many, min(ym) as from, max(ym) as to from have group by code, bench_mo, calculated year /* having n(x)=12 */ order by code, bench_mo, calculated year ; The HAVING BY clause can be used to exclude the partial-year tails which otherwise are part of the result. The CALCULATED keyword is critical, since there is also a source column neamed YEAR. > >Thanks, >Tracy
From: Ya Huang on 21 Apr 2006 01:56 What happens if bench_mo=2? It seems that both of the 12-month periods in 1994 and 1995 will meet the criteria. On Thu, 20 Apr 2006 18:48:17 -0700, lisiqi77(a)YAHOO.COM wrote: >Hi, Dale, > >Thanks for your reply. Actually my data ranges from 1994 to 2004, which >somehow conplicates the problem. The solution you proposed works well >for two years of data, i.e., 1994 and 1995. But for my sample period, >would it be adequate? > >And upon the request of Muthia, I include a more complete sample data >as follows: >code year month bench_mo x beg_mo end_mo >a 1994 1 8 0.84 7 6 >a 1994 2 8 0.01 7 6 >a 1994 3 8 0.38 7 6 >a 1994 4 8 0.85 7 6 >a 1994 5 8 0.68 7 6 >a 1994 6 8 0.25 7 6 >a 1994 7 8 0.88 7 6 >a 1994 8 8 0.75 7 6 >a 1994 9 8 0.13 7 6 >a 1994 10 8 0.30 7 6 >a 1994 11 8 0.04 7 6 >a 1994 12 8 0.77 7 6 >a 1995 1 8 0.57 7 6 >a 1995 2 8 0.94 7 6 >a 1995 3 8 0.03 7 6 >a 1995 4 8 0.50 7 6 >a 1995 5 8 0.69 7 6 >a 1995 6 8 0.72 7 6 >a 1995 7 8 0.62 7 6 >a 1995 8 8 0.40 7 6 >a 1995 9 8 0.08 7 6 >a 1995 10 8 0.69 7 6 >a 1995 11 8 0.39 7 6 >a 1995 12 8 0.84 7 6 >b 2001 1 1 0.24 12 11 >b 2001 2 1 0.77 12 11 >b 2001 3 1 0.49 12 11 >b 2001 4 1 0.40 12 11 >b 2001 5 1 0.71 12 11 >b 2001 6 1 0.25 12 11 >b 2001 7 1 0.71 12 11 >b 2001 8 1 0.63 12 11 >b 2001 9 1 0.35 12 11 >b 2001 10 1 0.30 12 11 >b 2001 11 1 0.63 12 11 >b 2001 12 1 0.25 12 11 >b 2002 1 1 0.15 12 11 >b 2002 2 1 0.99 12 11 >b 2002 3 1 0.37 12 11 >b 2002 4 1 0.00 12 11 >b 2002 5 1 0.50 12 11 >b 2002 6 1 0.42 12 11 >b 2002 7 1 0.09 12 11 >b 2002 8 1 0.84 12 11 >b 2002 9 1 0.97 12 11 >b 2002 10 1 0.97 12 11 >b 2002 11 1 0.85 12 11 >b 2002 12 1 0.68 12 11 > >Note that I constructed two additional variables, beg_mo and end_mo >based on the defintion of yearend as 10+ of benchmark month. My goal is >to calculate the variance of X during the self-defined year. Those >months not in the year range are discarded. The output variance is >0.11 for code a and 0.14 for code b. > >Thanks, >Tracy
From: lisiqi77 on 24 Apr 2006 14:56 Hi, Howard, Your solution is magically simple! I tested it in my own dataset and it worked well! I think the clause " year(intnx('month',mdy(month,1,year),2-bench_mo) ) as year" is key but I actually don't quite understand how it works. Do you mind explainning it in more details? Thanks very much, Tracy
From: "Howard Schreier <hs AT dc-sug DOT org>" on 24 Apr 2006 20:04 On Mon, 24 Apr 2006 11:56:58 -0700, lisiqi77(a)YAHOO.COM wrote: >Hi, Howard, > >Your solution is magically simple! I tested it in my own dataset and it >worked well! > >I think the clause " year(intnx('month',mdy(month,1,year),2-bench_mo) >) as year" is key but I actually don't quite understand how it works. >Do you mind explainning it in more details? Just work through an example. We can use the one in your initial post: "My goal is to calculate the variance of x during a year with its yearend defined as 10 months after the bench_month. For example, for code a with bench_month equal 12, I'd like to know the variance of x from 1994 month 11 to 1995 month 10, etc." 1994 month 11: MDY(11,1,1994) yields the number of days from 1 Jan. 1960 to 1 Nov. 1994. Then INTNX() shifts that back by 10 months (to the number of days from 1 Jan. 1960 to 1 Jan. 1994), since 2-12 is -10. Finally, YEAR() extracts 1994 from that. 1995 month 10: the same process yields 1994 as the year including 1 Dec. 1994. Months between these 2 endpoints are mapped to the interior months (Feb. through Nov.) of 1994. All yield 1994 as the new YEAR value. Months outside this 12-month interval yield the appropriate years before/after 1994. > >Thanks very much, >Tracy
First
|
Prev
|
Pages: 1 2 Prev: Example: using infile magic to parse SYSPARM for named parameters Next: jackknife concept |