Restructure dataset [SAS]

Prev: fitting a regression line
Next: Stock market competition: request for data

From: Richard Stat on 21 Jun 2010 15:30

Dear all,

I have a dataset that I want to restructure. It has multiple
observations for each ID, but I don't know how many duplicates the
dataset will have for each ID and it differs from ID to ID. It looks
something like this:

ID Q X Y Z
1 2 6 8 4
1 8 6 9 6
1 4 6 9 6
2 7 3 8 5
3 2 3 2 5
3 1 3 2 6
3 2 2 2 8
3 4 1 4 6
[...]

And I want to restructure it so that I get one observation for each
ID, i.e.

ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
Z1 Z2 Z3 Z4
1 2 8 4 . 6 6 6 . 8
9 9 . 4 6 6 5
2 7 . . . 3 . . .
8 . . . 5 . . .
3 2 1 2 4 3 3 2 1 2
2 2 4 5 6 8 6
[...]

I have several datasets like this and they are big. Furthermore, I
don't know how many duplicates I have per ID. I have tried to solve
this by splitting the dataset into many datasets, i.e. one for the
first observation for each ID, one for the second observation and so
on, and then add a suffix to each variable, and then merger them
together. But it ended up with many steps and not so automated. I
assume I could use proc transpose in some way, but I am not on
friendly terms with proc transpose.

To sum up, any help would be greatly appreciated.

All the best,
Richard

From: Ya on 21 Jun 2010 16:49

On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote:
> Dear all,
>
> I have a dataset that I want to restructure. It has multiple
> observations for each ID, but I don't know how many duplicates the
> dataset will have for each ID and it differs from ID to ID. It looks
> something like this:
>
> ID Q X Y Z
> 1 2 6 8 4
> 1 8 6 9 6
> 1 4 6 9 6
> 2 7 3 8 5
> 3 2 3 2 5
> 3 1 3 2 6
> 3 2 2 2 8
> 3 4 1 4 6
> [...]
>
> And I want to restructure it so that I get one observation for each
> ID, i.e.
>
> ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
> Z1 Z2 Z3 Z4
> 1 2 8 4 . 6 6 6 . 8
> 9 9 . 4 6 6 5
> 2 7 . . . 3 . . .
> 8 . . . 5 . .. .
> 3 2 1 2 4 3 3 2 1 2
> 2 2 4 5 6 8 6
> [...]
>
> I have several datasets like this and they are big. Furthermore, I
> don't know how many duplicates I have per ID. I have tried to solve
> this by splitting the dataset into many datasets, i.e. one for the
> first observation for each ID, one for the second observation and so
> on, and then add a suffix to each variable, and then merger them
> together. But it ended up with many steps and not so automated. I
> assume I could use proc transpose in some way, but I am not on
> friendly terms with proc transpose.
>
> To sum up, any help would be greatly appreciated.
>
> All the best,
> Richard

data have;
input ID Q X Y Z;
cards;
1 2 6 8 4
1 8 6 9 6
1 4 6 9 6
2 7 3 8 5
3 2 3 2 5
3 1 3 2 6
3 2 2 2 8
3 4 1 4 6
;

data have1;
set have;
array vv q x y z;
by id;
if first.id then n_=0;
n_+1;
do over vv;
colnm=compress(vname(vv)||put(n_,best.));
val=vv;
output;
end;
keep id colnm val;
run;

proc transpose data=have1 out=want (drop=_name_);
by id;
var val;
id colnm;
run;

proc print;
run;

ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4
X4 Y4 Z4

1 2 6 8 4 8 6 9 6 4 6 9
6 . . . .
2 7 3 8
5 . . . . . . . . . . . .
3 2 3 2 5 1 3 2 6 2 2 2 8 4
1 4 6

HTH

Ya

From: Richard Stat on 21 Jun 2010 17:56

That was a very nice solution. It solved by problem in a very good
way. Thank you so much, Ya!

All the best, Richard

On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote:
> On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
>
>
> > Dear all,
>
> > I have a dataset that I want to restructure. It has multiple
> > observations for each ID, but I don't know how many duplicates the
> > dataset will have for each ID and it differs from ID to ID. It looks
> > something like this:
>
> > ID Q X Y Z
> > 1 2 6 8 4
> > 1 8 6 9 6
> > 1 4 6 9 6
> > 2 7 3 8 5
> > 3 2 3 2 5
> > 3 1 3 2 6
> > 3 2 2 2 8
> > 3 4 1 4 6
> > [...]
>
> > And I want to restructure it so that I get one observation for each
> > ID, i.e.
>
> > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
> > Z1 Z2 Z3 Z4
> > 1 2 8 4 . 6 6 6 . 8
> > 9 9 . 4 6 6 5
> > 2 7 . . . 3 . . .
> > 8 . . . 5 . . .
> > 3 2 1 2 4 3 3 2 1 2
> > 2 2 4 5 6 8 6
> > [...]
>
> > I have several datasets like this and they are big. Furthermore, I
> > don't know how many duplicates I have per ID. I have tried to solve
> > this by splitting the dataset into many datasets, i.e. one for the
> > first observation for each ID, one for the second observation and so
> > on, and then add a suffix to each variable, and then merger them
> > together. But it ended up with many steps and not so automated. I
> > assume I could use proc transpose in some way, but I am not on
> > friendly terms with proc transpose.
>
> > To sum up, any help would be greatly appreciated.
>
> > All the best,
> > Richard
>
> data have;
> input ID Q X Y Z;
> cards;
> 1 2 6 8 4
> 1 8 6 9 6
> 1 4 6 9 6
> 2 7 3 8 5
> 3 2 3 2 5
> 3 1 3 2 6
> 3 2 2 2 8
> 3 4 1 4 6
> ;
>
> data have1;
> set have;
> array vv q x y z;
> by id;
> if first.id then n_=0;
> n_+1;
> do over vv;
> colnm=compress(vname(vv)||put(n_,best.));
> val=vv;
> output;
> end;
> keep id colnm val;
> run;
>
> proc transpose data=have1 out=want (drop=_name_);
> by id;
> var val;
> id colnm;
> run;
>
> proc print;
> run;
>
> ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4
> X4 Y4 Z4
>
> 1 2 6 8 4 8 6 9 6 4 6 9
> 6 . . . .
> 2 7 3 8
> 5 . . . . . . . . . . . .
> 3 2 3 2 5 1 3 2 6 2 2 2 8 4
> 1 4 6
>
> HTH
>
> Ya

From: Richard Stat on 22 Jun 2010 02:40

Dear Ya and others,

the solution presented by Ya worked for most of my datasets, but then
I realized that I also have datasets with character variables. By
using Ya's code, the look like this:

data have;
input ID Q X T U;
cards;
1 2 6 Greece 4
1 8 6 United Kingdom 6
1 4 6 Spain 6
2 7 3 Norway 5
3 2 3 Finland 5
3 1 3 Greece 6
3 2 2 Germany 8
3 4 1 Democratic Republic of Congo 6
;

The datasets can have both numeric and character variables, and they
can be one or many. The character variables can consist of several
words and be of different length. I still want the same end result
with one observation per ID, but with character variables. In the
example above, T1-T4 would be character variables and should be of
such length that they include the longest value in T.

Any help would be greatly appreciated.

All the best,
Richard

On 21 Juni, 23:56, Richard Stat <richardstatis...(a)gmail.com> wrote:
> That was a very nice solution. It solved by problem in a very good
> way. Thank you so much, Ya!
>
> All the best, Richard
>
> On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote:
>
> > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
> > > Dear all,
>
> > > I have a dataset that I want to restructure. It has multiple
> > > observations for each ID, but I don't know how many duplicates the
> > > dataset will have for each ID and it differs from ID to ID. It looks
> > > something like this:
>
> > > ID Q X Y Z
> > > 1 2 6 8 4
> > > 1 8 6 9 6
> > > 1 4 6 9 6
> > > 2 7 3 8 5
> > > 3 2 3 2 5
> > > 3 1 3 2 6
> > > 3 2 2 2 8
> > > 3 4 1 4 6
> > > [...]
>
> > > And I want to restructure it so that I get one observation for each
> > > ID, i.e.
>
> > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
> > > Z1 Z2 Z3 Z4
> > > 1 2 8 4 . 6 6 6 . 8
> > > 9 9 . 4 6 6 5
> > > 2 7 . . . 3 . . .
> > > 8 . . . 5 . . .
> > > 3 2 1 2 4 3 3 2 1 2
> > > 2 2 4 5 6 8 6
> > > [...]
>
> > > I have several datasets like this and they are big. Furthermore, I
> > > don't know how many duplicates I have per ID. I have tried to solve
> > > this by splitting the dataset into many datasets, i.e. one for the
> > > first observation for each ID, one for the second observation and so
> > > on, and then add a suffix to each variable, and then merger them
> > > together. But it ended up with many steps and not so automated. I
> > > assume I could use proc transpose in some way, but I am not on
> > > friendly terms with proc transpose.
>
> > > To sum up, any help would be greatly appreciated.
>
> > > All the best,
> > > Richard
>
> > data have;
> > input ID Q X Y Z;
> > cards;
> > 1 2 6 8 4
> > 1 8 6 9 6
> > 1 4 6 9 6
> > 2 7 3 8 5
> > 3 2 3 2 5
> > 3 1 3 2 6
> > 3 2 2 2 8
> > 3 4 1 4 6
> > ;
>
> > data have1;
> > set have;
> > array vv q x y z;
> > by id;
> > if first.id then n_=0;
> > n_+1;
> > do over vv;
> > colnm=compress(vname(vv)||put(n_,best.));
> > val=vv;
> > output;
> > end;
> > keep id colnm val;
> > run;
>
> > proc transpose data=have1 out=want (drop=_name_);
> > by id;
> > var val;
> > id colnm;
> > run;
>
> > proc print;
> > run;
>
> > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4
> > X4 Y4 Z4
>
> > 1 2 6 8 4 8 6 9 6 4 6 9
> > 6 . . . .
> > 2 7 3 8
> > 5 . . . . . . . . . . . .
> > 3 2 3 2 5 1 3 2 6 2 2 2 8 4
> > 1 4 6
>
> > HTH
>
> > Ya

From: Arthur Tabachneck on 22 Jun 2010 11:27

Richard,

You could use the same approach separately for numeric and
alphanumeric variables, and then interleave the two resulting files.
E.g.:

data have;
informat T $40.;
input ID Q X T & U;
cards;
1 2 6 Greece 4
1 8 6 United Kingdom 6
1 4 6 Spain 6
2 7 3 Norway 5
3 2 3 Finland 5
3 1 3 Greece 6
3 2 2 Germany 8
3 4 1 Democratic Republic of Congo 6
;

data have1 (keep=id colnm val)
have2 (keep=id colnm valt);
set have;
array vv q x u;
array tt $40. t;
by id;
if first.id then n_=0;
n_+1;
do over vv;
colnm=compress(vname(vv)||put(n_,best.));
val=vv;
output have1;
end;
do over tt;
colnm=compress(vname(tt)||put(n_,best.));
valt=tt;
output have2;
end;
run;

proc transpose data=have1 out=want1 (drop=_name_);
by id;
var val;
id colnm;
run;

proc transpose data=have2 out=want2 (drop=_name_);
by id;
var valt;
id colnm;
run;

data want;
set want1;
set want2;
run;

HTH,
Art
------------
On Jun 22, 2:40 am, Richard Stat <richardstatis...(a)gmail.com> wrote:
> Dear Ya and others,
>
> the solution presented by Ya worked for most of my datasets, but then
> I realized that I also have datasets with character variables. By
> using Ya's code, the look like this:
>
> data have;
> input ID Q X T U;
> cards;
> 1 2 6 Greece 4
> 1 8 6 United Kingdom 6
> 1 4 6 Spain 6
> 2 7 3 Norway 5
> 3 2 3 Finland 5
> 3 1 3 Greece 6
> 3 2 2 Germany 8
> 3 4 1 Democratic Republic of Congo 6
> ;
>
> The datasets can have both numeric and character variables, and they
> can be one or many. The character variables can consist of several
> words and be of different length. I still want the same end result
> with one observation per ID, but with character variables. In the
> example above, T1-T4 would be character variables and should be of
> such length that they include the longest value in T.
>
> Any help would be greatly appreciated.
>
> All the best,
> Richard
>
> On 21 Juni, 23:56, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
>
>
> > That was a very nice solution. It solved by problem in a very good
> > way. Thank you so much, Ya!
>
> > All the best, Richard
>
> > On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote:
>
> > > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote:
>
> > > > Dear all,
>
> > > > I have a dataset that I want to restructure. It has multiple
> > > > observations for each ID, but I don't know how many duplicates the
> > > > dataset will have for each ID and it differs from ID to ID. It looks
> > > > something like this:
>
> > > > ID Q X Y Z
> > > > 1 2 6 8 4
> > > > 1 8 6 9 6
> > > > 1 4 6 9 6
> > > > 2 7 3 8 5
> > > > 3 2 3 2 5
> > > > 3 1 3 2 6
> > > > 3 2 2 2 8
> > > > 3 4 1 4 6
> > > > [...]
>
> > > > And I want to restructure it so that I get one observation for each
> > > > ID, i.e.
>
> > > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4
> > > > Z1 Z2 Z3 Z4
> > > > 1 2 8 4 . 6 6 6 . 8
> > > > 9 9 . 4 6 6 5
> > > > 2 7 . . . 3 . . .
> > > > 8 . . . 5 . . .
> > > > 3 2 1 2 4 3 3 2 1 2
> > > > 2 2 4 5 6 8 6
> > > > [...]
>
> > > > I have several datasets like this and they are big. Furthermore, I
> > > > don't know how many duplicates I have per ID. I have tried to solve
> > > > this by splitting the dataset into many datasets, i.e. one for the
> > > > first observation for each ID, one for the second observation and so
> > > > on, and then add a suffix to each variable, and then merger them
> > > > together. But it ended up with many steps and not so automated. I
> > > > assume I could use proc transpose in some way, but I am not on
> > > > friendly terms with proc transpose.
>
> > > > To sum up, any help would be greatly appreciated.
>
> > > > All the best,
> > > > Richard
>
> > > data have;
> > > input ID Q X Y Z;
> > > cards;
> > > 1 2 6 8 4
> > > 1 8 6 9 6
> > > 1 4 6 9 6
> > > 2 7 3 8 5
> > > 3 2 3 2 5
> > > 3 1 3 2 6
> > > 3 2 2 2 8
> > > 3 4 1 4 6
> > > ;
>
> > > data have1;
> > > set have;
> > > array vv q x y z;
> > > by id;
> > > if first.id then n_=0;
> > > n_+1;
> > > do over vv;
> > > colnm=compress(vname(vv)||put(n_,best.));
> > > val=vv;
> > > output;
> > > end;
> > > keep id colnm val;
> > > run;
>
> > > proc transpose data=have1 out=want (drop=_name_);
> > > by id;
> > > var val;
> > > id colnm;
> > > run;
>
> > > proc print;
> > > run;
>
> > > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4
> > > X4 Y4 Z4
>
> > > 1 2 6 8 4 8 6 9 6 4 6 9
> > > 6 . . . .
> > > 2 7 3 8
> > > 5 . . . . . . . . . . . .
> > > 3 2 3 2 5 1 3 2 6 2 2 2 8 4
> > > 1 4 6
>
> > > HTH
>
> > > Ya- Hide quoted text -
>
> - Show quoted text -

| Next | Last
Pages: 1 2
Prev: fitting a regression line
Next: Stock market competition: request for data