Reshaping the data to run the simple correlation analysis [SAS]

Prev: Numeric ID Values with decimals -- probably not the best idea in
Next: nps74.tmp.pdf, nps75.tmp.pdf

From: Henry on 20 Nov 2009 19:31

Hello there,

I have a data set looks like:

F1 F2 V1

1 1 0.1
1 2 0.3
1 3 0.5
2 1 0.56
2 2 0.99
2 3 0.78
3 1 0.84
3 2 0.83
3 3 0.65

I will like to run the correlation on each pair of F1 group which
means that for F1=1, I will run its v1 values with
the values when F1=2 and also the values when F1=3.
Then, I will also run the values for F1=2 with the values when F1=1
and also with the values when F1=3
Same for the values for F1=3 with values when F1=1 and with the values
when F1=2;

In brief, the data to be prepared for the proc corr analysis should be
like:

1 2 0.3 2 1 0.56
1 3 0.5 2 3 0.78

1 2 0.3 3 1 0.84
1 3 0.5 3 2 0.83

2 1 0.56 1 2 0.3
2 3 0.78 1 3 0.5

2 1 0.56 3 1 0.84
2 3 0.78 3 2 0.83

3 1 0.84 1 2 0.3
3 2 0.83 1 3 0.5

3 1 0.84 2 1 0.56
3 2 0.83 2 3 0.78

I was wondering that is there an efficient way to do it? Any
thoughts? Please let me know if possible.
Thanks in advance!!!

Henry

From: Tom Abernathy on 20 Nov 2009 21:38

Henry -
You need to come up with more meaningful variable names.
Did you try tranposing you data to make it more like a matrix?

proc transpose data=orig out=x prefix=NEW;
by f1;
id f2;
var v1;
run;
proc corr data=x;
var NEW:;
run;

- Tom

On Nov 20, 7:31 pm, Henry <chchanghe...(a)gmail.com> wrote:
> Hello there,
>
> I have a data set looks like:
>
> F1 F2 V1
>
> 1 1 0.1
> 1 2 0.3
> 1 3 0.5
> 2 1 0.56
> 2 2 0.99
> 2 3 0.78
> 3 1 0.84
> 3 2 0.83
> 3 3 0.65
>
> I will like to run the correlation on each pair of F1 group which
> means that for F1=1, I will run its v1 values with
> the values when F1=2 and also the values when F1=3.
> Then, I will also run the values for F1=2 with the values when F1=1
> and also with the values when F1=3
> Same for the values for F1=3 with values when F1=1 and with the values
> when F1=2;
>
> In brief, the data to be prepared for the proc corr analysis should be
> like:
>
> 1 2 0.3 2 1 0.56
> 1 3 0.5 2 3 0.78
>
> 1 2 0.3 3 1 0.84
> 1 3 0.5 3 2 0.83
>
> 2 1 0.56 1 2 0.3
> 2 3 0.78 1 3 0.5
>
> 2 1 0.56 3 1 0.84
> 2 3 0.78 3 2 0.83
>
> 3 1 0.84 1 2 0.3
> 3 2 0.83 1 3 0.5
>
> 3 1 0.84 2 1 0.56
> 3 2 0.83 2 3 0.78
>
> I was wondering that is there an efficient way to do it? Any
> thoughts? Please let me know if possible.
> Thanks in advance!!!
>
> Henry

From: Henry on 21 Nov 2009 01:29

Hello,

I would like to make my data set like this. So, each time, I can just
run a pair to get one correlation (but if F1 = F2 then I will delete
it from the regression). Also, as I would like to put this
correlation back to my original data (create new column to put this
correlation.), so, I will need the F1 and F2 variables in order to
merge back to the original data set.

The first three columns just the first two observations from the
original data (after dropping the observation when F1=1 = F2)
The second three columns just got from the 4th and 6th observations
from original data after dropping the observation when F1 = 2 = F2. so
far and forth.

FIRST PAIR when F1=1 (as I have three group I will run the
correlation when F1=1 with F1=2)
F1 F2 V1 F1 F2 V1
1 2 0.3 2 1 0.56
1 3 0.5 2 3 0.78

SECOND PAIR when F1=1 (the correlation for F1 =1 and F1 =3)
F1 F2 V1 F1 F2 V1
1 2 0.3 3 1 0.84
1 3 0.5 3 2 0.83

THIRD PAIR when F1=2 (the correlation for F1=2 and F1 =2)
F1 F2 V1 F1 F2 V1
2 1 0.56 1 2 0.3
2 3 0.78 1 3 0.5

FOURTH PAIR when F1=2 (the correlation for F1=2 and F1 =3)
F1 F2 V1 F1 F2 V1
2 1 0.56 3 1 0.84
2 3 0.78 3 2 0.83

FIFTH PAIR when F1=3 (the correlation for F1=3 and F1 =1)
F1 F2 V1 F1 F2 V1
3 1 0.84 1 2 0.3
3 2 0.83 1 3 0.5

SIXTH PAIR when F1=3 (the correlation for F1=3 and F1 =2)
F1 F2 V1 F1 F2 V1
3 1 0.84 2 1 0.56
3 2 0.83 2 3 0.78

I tried the following codes:
proc transpose data=orig out=x prefix=NEW;
by f1;
id f2;
var v1;
run;

But, it didn't give the format I am looking for. So, I was wondering
if there is way to do it? Thank you so much for helping me out.

Henry

From: Tom Abernathy on 21 Nov 2009 10:59

Henry -
I still do not understand. Are you trying to calculate a
correlation? Or are you trying to merge in correlation values that you
have calculated in some other way?
To calculate a correlation you need two or more pairs of observed
values. For example if you sampled the Height and Weight of 10 people
you could calculate the correlation between height and weight. In
that case you would have 10 observations and 3 rows (one for an ID
variable to identify the person). You appear to be asking for the
correlation of 0.3 with 0.56 and that just cannot be done.
I still have no idea what F1 and F2 are. You talk of pairs and
groups. Do the values of F1 and F2 represent the groups? So does F1=1
and F2=2 mean a comparison between Group #1 and Group #2. It so it
would reduce confusion if you renamed F1 and F2 to FirstGroup and
SecondGroup or something more meaningful. Also if you changed the
values from numbers to characters so that you did not get confused and
calculate the mean of the group numbers.
But what is the meaning of the numbers stored in the variable V1?

- Tom

On Nov 21, 1:29 am, Henry <chchanghe...(a)gmail.com> wrote:
> Hello,
>
> I would like to make my data set like this. So, each time, I can just
> run a pair to get one correlation (but if F1 = F2 then I will delete
> it from the regression). Also, as I would like to put this
> correlation back to my original data (create new column to put this
> correlation.), so, I will need the F1 and F2 variables in order to
> merge back to the original data set.
>
> The first three columns just the first two observations from the
> original data (after dropping the observation when F1=1 = F2)
> The second three columns just got from the 4th and 6th observations
> from original data after dropping the observation when F1 = 2 = F2. so
> far and forth.
>
> FIRST PAIR when F1=1 (as I have three group I will run the
> correlation when F1=1 with F1=2)
> F1 F2 V1 F1 F2 V1
> 1 2 0.3 2 1 0.56
> 1 3 0.5 2 3 0.78
>
> SECOND PAIR when F1=1 (the correlation for F1 =1 and F1 =3)
> F1 F2 V1 F1 F2 V1
> 1 2 0.3 3 1 0.84
> 1 3 0.5 3 2 0.83
>
> THIRD PAIR when F1=2 (the correlation for F1=2 and F1 =2)
> F1 F2 V1 F1 F2 V1
> 2 1 0.56 1 2 0.3
> 2 3 0.78 1 3 0.5
>
> FOURTH PAIR when F1=2 (the correlation for F1=2 and F1 =3)
> F1 F2 V1 F1 F2 V1
> 2 1 0.56 3 1 0.84
> 2 3 0.78 3 2 0.83
>
> FIFTH PAIR when F1=3 (the correlation for F1=3 and F1 =1)
> F1 F2 V1 F1 F2 V1
> 3 1 0.84 1 2 0.3
> 3 2 0.83 1 3 0.5
>
> SIXTH PAIR when F1=3 (the correlation for F1=3 and F1 =2)
> F1 F2 V1 F1 F2 V1
> 3 1 0.84 2 1 0.56
> 3 2 0.83 2 3 0.78
>
> I tried the following codes:
> proc transpose data=orig out=x prefix=NEW;
> by f1;
> id f2;
> var v1;
> run;
>
> But, it didn't give the format I am looking for. So, I was wondering
> if there is way to do it? Thank you so much for helping me out.
>
> Henry

From: Henry on 21 Nov 2009 13:07

Hi Tom,

Thanks for your reply. I guess I know why it is confused. I have a
900000 unique firms I denoted as i. As each firm has a value with
another firm, denoted as j(including itself). Then, I will have
900000*900000 observations.

F1 here denoted i firm and F2 denoted as the corresponding firm j for
i. That's why I wrote something like:

F1 F2 V1

1 1 0.1
1 2 0.3
1 3 0.5
2 1 0.56
2 2 0.99
2 3 0.78

Now, I tried to get the correlation between these firms except
themselves which is when F1 = F2.

So, in brief, I want the correlation between these two columns

0.3 0.56
0.5 0.78

that's why I put :

F1 F2 V1 F1 F2 V1
1 2 0.3 2 1 0.56
1 3 0.5 2 3 0.78

As I am thinking about reshaping the values so I can get a pair of
values as you mentioned to calculate the correlation.

then as firm1(when F1 = 1) will also need a correlation with firm 3
(when F1=3) So, I put:

F1 F2 V1 F1 F2 V1
1 2 0.3 3 1 0.84
1 3 0.5 3 2 0.83

Meaning I need the correlation between
0.3 0.84
0.5 0.83

This will have to be done for this 900000 unique firms which has its
another 900000 corresponding firms (including itself).

Then, after obtaining the correlation between firm i and j, I will
have to merge them back to the original data. So, I think I still
need the value for F1 and F2 so I can do the merge. Of course, the F1
and F2 they are characteristic values (as they also contain a, b ,c
within numbers. i.e. firm i's F1 = 200c00).

I am not sure if you can understand my issues here? Thanks again for
your reply!

Henry

| Next | Last
Pages: 1 2
Prev: Numeric ID Values with decimals -- probably not the best idea in
Next: nps74.tmp.pdf, nps75.tmp.pdf