Restructure dataset [SAS]

Prev: fitting a regression line
Next: Stock market competition: request for data

From: muriel on 22 Jun 2010 12:15

Hi Richard,
How about using a macro (see below)?

*********** program **********
data have;
input ID Q X T$ 15-43 U;
cards;
1 2 6 Greece 4
1 8 6 United Kingdom 6
1 4 6 Spain 6
2 7 3 Norway 5
3 2 3 Finland 5
3 1 3 Greece 6
3 2 2 Germany 8
3 4 1 Democratic Republic of Congo 6
;

data have1;
set have;
by id;
if first.id then n_=0;
n_+1;
run;

%macro trans(indsn= , outdsn= , transvar= );
proc transpose data=&indsn. out=&outdsn. prefix=&transvar.;
by id;
var &transvar.;
id n_;
run;
%mend;

%trans(indsn=have1 , outdsn=set1, transvar=X );
%trans(indsn=have1 , outdsn=set2, transvar=T );
%trans(indsn=have1 , outdsn=set3, transvar=U );

data result;
merge set1 (drop=_name_) set2 (drop =_name_) set3 (drop=_name_);
by id ;
run;

proc print;
run;

************* print result **************

Obs ID X1 X2 X3 X4 T1 T2 T3 T4 U1 U2 U3 U4
1 1 6 6 6 . Greece United Kingdom Spain 4 6 6 .
2 2 3 . . . Norway 5 . . .
3 3 3 3 2 1 Finland Greece Germany Democratic Republic of
Congo 5 6 8 6

***********************************

Muriel

On Jun 22, 2:40 am, Richard Stat <richardstatis...(a)gmail.com> wrote:
> Dear Ya and others,
>
> the solution presented by Ya worked for most of my datasets, but then
> I realized that I also have datasets with character variables. By
> using Ya's code, the look like this:
>
> data have;
> input ID Q X T U;
> cards;
> 1 2 6 Greece 4
> 1 8 6 United Kingdom 6
> 1 4 6 Spain 6
> 2 7 3 Norway 5
> 3 2 3 Finland 5
> 3 1 3 Greece 6
> 3 2 2 Germany 8
> 3 4 1 Democratic Republic of Congo 6
> ;
>
> The datasets can have both numeric and character variables, and they
> can be one or many. The character variables can consist of several
> words and be of different length. I still want the same end result
> with one observation per ID, but with character variables. In the
> example above, T1-T4 would be character variables and should be of
> such length that they include the longest value in T.
>
> Any help would be greatly appreciated.
>
> All the best,
> Richard
>
> On 21 Juni, 23:56, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
> > That was a very nice solution. It solved by problem in a very good
> > way. Thank you so much, Ya!
>
> > All the best, Richard
>
> > On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote:
>
> > > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
> > > > Dear all,
>
> > > > I have a dataset that I want to restructure. It has multiple
> > > > observations for each ID, but I don't know how many duplicates the
> > > > dataset will have for each ID and it differs from ID to ID. It looks
> > > > something like this:
>
> > > > ID Q X Y Z
> > > > 1 2 6 8 4
> > > > 1 8 6 9 6
> > > > 1 4 6 9 6
> > > > 2 7 3 8 5
> > > > 3 2 3 2 5
> > > > 3 1 3 2 6
> > > > 3 2 2 2 8
> > > > 3 4 1 4 6
> > > > [...]
>
> > > > And I want to restructure it so that I get one observation for each
> > > > ID, i.e.
>
> > > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
> > > > Z1 Z2 Z3 Z4
> > > > 1 2 8 4 . 6 6 6 . 8
> > > > 9 9 . 4 6 6 5
> > > > 2 7 . . . 3 . . .
> > > > 8 . . . 5 . . .
> > > > 3 2 1 2 4 3 3 2 1 2
> > > > 2 2 4 5 6 8 6
> > > > [...]
>
> > > > I have several datasets like this and they are big. Furthermore, I
> > > > don't know how many duplicates I have per ID. I have tried to solve
> > > > this by splitting the dataset into many datasets, i.e. one for the
> > > > first observation for each ID, one for the second observation and so
> > > > on, and then add a suffix to each variable, and then merger them
> > > > together. But it ended up with many steps and not so automated. I
> > > > assume I could use proc transpose in some way, but I am not on
> > > > friendly terms with proc transpose.
>
> > > > To sum up, any help would be greatly appreciated.
>
> > > > All the best,
> > > > Richard
>
> > > data have;
> > > input ID Q X Y Z;
> > > cards;
> > > 1 2 6 8 4
> > > 1 8 6 9 6
> > > 1 4 6 9 6
> > > 2 7 3 8 5
> > > 3 2 3 2 5
> > > 3 1 3 2 6
> > > 3 2 2 2 8
> > > 3 4 1 4 6
> > > ;
>
> > > data have1;
> > > set have;
> > > array vv q x y z;
> > > by id;
> > > if first.id then n_=0;
> > > n_+1;
> > > do over vv;
> > > colnm=compress(vname(vv)||put(n_,best.));
> > > val=vv;
> > > output;
> > > end;
> > > keep id colnm val;
> > > run;
>
> > > proc transpose data=have1 out=want (drop=_name_);
> > > by id;
> > > var val;
> > > id colnm;
> > > run;
>
> > > proc print;
> > > run;
>
> > > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4
> > > X4 Y4 Z4
>
> > > 1 2 6 8 4 8 6 9 6 4 6 9
> > > 6 . . . .
> > > 2 7 3 8
> > > 5 . . . . . . . . . . . .
> > > 3 2 3 2 5 1 3 2 6 2 2 2 8 4
> > > 1 4 6
>
> > > HTH
>
> > > Ya

First | Prev |
Pages: 1 2
Prev: fitting a regression line
Next: Stock market competition: request for data