Prev: Numeric ID Values with decimals -- probably not the best idea in
Next: nps74.tmp.pdf, nps75.tmp.pdf
From: xlr82sas on 21 Nov 2009 20:15 On Nov 21, 10:07 am, Henry <chchanghe...(a)gmail.com> wrote: > Hi Tom, > > Thanks for your reply. I guess I know why it is confused. I have a > 900000 unique firms I denoted as i. As each firm has a value with > another firm, denoted as j(including itself). Then, I will have > 900000*900000 observations. > > F1 here denoted i firm and F2 denoted as the corresponding firm j for > i. That's why I wrote something like: > > F1 F2 V1 > > 1 1 0.1 > 1 2 0.3 > 1 3 0.5 > 2 1 0.56 > 2 2 0.99 > 2 3 0.78 > > Now, I tried to get the correlation between these firms except > themselves which is when F1 = F2. > > So, in brief, I want the correlation between these two columns > > 0.3 0.56 > 0.5 0.78 > > that's why I put : > > F1 F2 V1 F1 F2 V1 > 1 2 0.3 2 1 0.56 > 1 3 0.5 2 3 0.78 > > As I am thinking about reshaping the values so I can get a pair of > values as you mentioned to calculate the correlation. > > then as firm1(when F1 = 1) will also need a correlation with firm 3 > (when F1=3) So, I put: > > F1 F2 V1 F1 F2 V1 > 1 2 0.3 3 1 0.84 > 1 3 0.5 3 2 0.83 > > Meaning I need the correlation between > 0.3 0.84 > 0.5 0.83 > > This will have to be done for this 900000 unique firms which has its > another 900000 corresponding firms (including itself). > > Then, after obtaining the correlation between firm i and j, I will > have to merge them back to the original data. So, I think I still > need the value for F1 and F2 so I can do the merge. Of course, the F1 > and F2 they are characteristic values (as they also contain a, b ,c > within numbers. i.e. firm i's F1 = 200c00). > > I am not sure if you can understand my issues here? Thanks again for > your reply! > > Henry Hi, I think you have to work with one firm against the thousands of other at a time. I might macrotize this code create a massive long and skinny and process with one datastep. I decided to use a datastep to get the corr coef, because I had some old code. Also I think it might be more flexible and quicker than 'proc corr'. /* reshape for 'proc corr or datastep */ /* this reshaping can easily be done by creating a datastep and then using proc sql or merge to add the common firm. I think this is more efficient */ /* you might want to use arrays */ data res; retain cnt 0 grp frm1x2 frm1x3 frm1x4 frm1x24; drop cnt frm1x2 frm1x3 frm1x4 ; input f1 f2 v1; if f1 ne f2; cnt+1; grp="Firm &frm1"; select; when(_n_=2) frm1x2=v1; when(_n_=3) frm1x3=v1; when(_n_=4) do; frm1x4=v1;cnt=0;end; when(mod(cnt,3)=1) do; frm1x24=frm1x2; output;end; when(mod(cnt,3)=2) do; frm1x24=frm1x3; output;end; when(mod(cnt,3)=0) do; frm1x24=frm1x4; output;end; end; cards; 1 1 0.1 1 2 0.3 1 3 0.5 1 4 0.9 2 1 0.56 2 2 0.55 2 3 0.78 2 4 0.99 3 1 0.84 3 2 0.83 3 3 0.65 3 4 0.34 ; run; /* I added the slope and intercept. I aslo added a fourth F2 since two points give a corr coef of 1 or -1 */ data res2nd; retain sumx sumy sumxx sumxy sumyy n; set res(rename=(frm1x24=x v1=y)); by f1 notsorted; if first.f1 then do; sumx = 0; sumy = 0; sumxx = 0; sumxy = 0; sumyy = 0; n = 0; end; sumx = sum(sumx, x); sumy = sum(sumy, y); sumxx = sum(sumxx, x ** 2); sumxy = sum(sumxy, x * y); sumyy = sum(sumyy, y * y); n = sum(n, 1); if last.f1 then do; nmr = sumxy - sumx * sumy /n; den = sqrt ( sumxx -(sumx)**2/n ) * sqrt ( sumyy -(sumy)**2/n ); slope = (sumxy - sumx * sumy / n)/(sumxx - sumx ** 2 / n); intercept = sumy / n - slope * sumx / n; corr=nmr/den; put slope = ; put intercept = ; put corr=; end; run;
From: Henry on 23 Nov 2009 13:08 Hello, Thanks for sharing your codes with me!! I will work on it to see how it goes. Henry On Nov 21, 6:15 pm, xlr82sas <xlr82...(a)aol.com> wrote: > On Nov 21, 10:07 am, Henry <chchanghe...(a)gmail.com> wrote: > > > > > Hi Tom, > > > Thanks for your reply. I guess I know why it is confused. I have a > > 900000 unique firms I denoted as i. As each firm has a value with > > another firm, denoted as j(including itself). Then, I will have > > 900000*900000 observations. > > > F1 here denoted i firm and F2 denoted as the corresponding firm j for > > i. That's why I wrote something like: > > > F1 F2 V1 > > > 1 1 0.1 > > 1 2 0.3 > > 1 3 0.5 > > 2 1 0.56 > > 2 2 0.99 > > 2 3 0.78 > > > Now, I tried to get the correlation between these firms except > > themselves which is when F1 = F2. > > > So, in brief, I want the correlation between these two columns > > > 0.3 0.56 > > 0.5 0.78 > > > that's why I put : > > > F1 F2 V1 F1 F2 V1 > > 1 2 0.3 2 1 0.56 > > 1 3 0.5 2 3 0.78 > > > As I am thinking about reshaping the values so I can get a pair of > > values as you mentioned to calculate the correlation. > > > then as firm1(when F1 = 1) will also need a correlation with firm 3 > > (when F1=3) So, I put: > > > F1 F2 V1 F1 F2 V1 > > 1 2 0.3 3 1 0.84 > > 1 3 0.5 3 2 0.83 > > > Meaning I need the correlation between > > 0.3 0.84 > > 0.5 0.83 > > > This will have to be done for this 900000 unique firms which has its > > another 900000 corresponding firms (including itself). > > > Then, after obtaining the correlation between firm i and j, I will > > have to merge them back to the original data. So, I think I still > > need the value for F1 and F2 so I can do the merge. Of course, the F1 > > and F2 they are characteristic values (as they also contain a, b ,c > > within numbers. i.e. firm i's F1 = 200c00). > > > I am not sure if you can understand my issues here? Thanks again for > > your reply! > > > Henry > > Hi, > > I think you have to work with one firm against the thousands of other > at a time. I might macrotize this code create a massive > long and skinny and process with one datastep. I decided to use a > datastep to get the corr coef, because I had some old code. Also I > think it might be more flexible and quicker than 'proc corr'. > > /* reshape for 'proc corr or datastep */ > /* this reshaping can easily be done by creating a datastep and then > using proc sql or merge > to add the common firm. I think this is more efficient */ > /* you might want to use arrays */ > data res; > retain cnt 0 grp frm1x2 frm1x3 frm1x4 frm1x24; > drop cnt frm1x2 frm1x3 frm1x4 ; > input f1 f2 v1; > if f1 ne f2; > cnt+1; > grp="Firm &frm1"; > select; > when(_n_=2) frm1x2=v1; > when(_n_=3) frm1x3=v1; > when(_n_=4) do; frm1x4=v1;cnt=0;end; > when(mod(cnt,3)=1) do; frm1x24=frm1x2; output;end; > when(mod(cnt,3)=2) do; frm1x24=frm1x3; output;end; > when(mod(cnt,3)=0) do; frm1x24=frm1x4; output;end; > end; > cards; > 1 1 0.1 > 1 2 0.3 > 1 3 0.5 > 1 4 0.9 > 2 1 0.56 > 2 2 0.55 > 2 3 0.78 > 2 4 0.99 > 3 1 0.84 > 3 2 0.83 > 3 3 0.65 > 3 4 0.34 > ; > run; > > /* I added the slope and intercept. I aslo added a fourth F2 since two > points give a corr coef of 1 or -1 */ > data res2nd; > retain sumx sumy sumxx sumxy sumyy n; > set res(rename=(frm1x24=x v1=y)); > by f1 notsorted; > if first.f1 then do; > sumx = 0; > sumy = 0; > sumxx = 0; > sumxy = 0; > sumyy = 0; > n = 0; > end; > sumx = sum(sumx, x); > sumy = sum(sumy, y); > sumxx = sum(sumxx, x ** 2); > sumxy = sum(sumxy, x * y); > sumyy = sum(sumyy, y * y); > n = sum(n, 1); > if last.f1 then do; > nmr = sumxy - sumx * sumy /n; > den = sqrt ( sumxx -(sumx)**2/n ) * sqrt ( sumyy -(sumy)**2/n ); > slope = (sumxy - sumx * sumy / n)/(sumxx - sumx ** 2 / n); > intercept = sumy / n - slope * sumx / n; > corr=nmr/den; > put slope = ; > put intercept = ; > put corr=; > end; > run;
First
|
Prev
|
Pages: 1 2 Prev: Numeric ID Values with decimals -- probably not the best idea in Next: nps74.tmp.pdf, nps75.tmp.pdf |