From: Richard Stat on 21 Jun 2010 15:30 Dear all, I have a dataset that I want to restructure. It has multiple observations for each ID, but I don't know how many duplicates the dataset will have for each ID and it differs from ID to ID. It looks something like this: ID Q X Y Z 1 2 6 8 4 1 8 6 9 6 1 4 6 9 6 2 7 3 8 5 3 2 3 2 5 3 1 3 2 6 3 2 2 2 8 3 4 1 4 6 [...] And I want to restructure it so that I get one observation for each ID, i.e. ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4 1 2 8 4 . 6 6 6 . 8 9 9 . 4 6 6 5 2 7 . . . 3 . . . 8 . . . 5 . . . 3 2 1 2 4 3 3 2 1 2 2 2 4 5 6 8 6 [...] I have several datasets like this and they are big. Furthermore, I don't know how many duplicates I have per ID. I have tried to solve this by splitting the dataset into many datasets, i.e. one for the first observation for each ID, one for the second observation and so on, and then add a suffix to each variable, and then merger them together. But it ended up with many steps and not so automated. I assume I could use proc transpose in some way, but I am not on friendly terms with proc transpose. To sum up, any help would be greatly appreciated. All the best, Richard
From: Ya on 21 Jun 2010 16:49 On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote: > Dear all, > > I have a dataset that I want to restructure. It has multiple > observations for each ID, but I don't know how many duplicates the > dataset will have for each ID and it differs from ID to ID. It looks > something like this: > > ID Q X Y Z > 1 2 6 8 4 > 1 8 6 9 6 > 1 4 6 9 6 > 2 7 3 8 5 > 3 2 3 2 5 > 3 1 3 2 6 > 3 2 2 2 8 > 3 4 1 4 6 > [...] > > And I want to restructure it so that I get one observation for each > ID, i.e. > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4 > Z1 Z2 Z3 Z4 > 1 2 8 4 . 6 6 6 . 8 > 9 9 . 4 6 6 5 > 2 7 . . . 3 . . . > 8 . . . 5 . .. . > 3 2 1 2 4 3 3 2 1 2 > 2 2 4 5 6 8 6 > [...] > > I have several datasets like this and they are big. Furthermore, I > don't know how many duplicates I have per ID. I have tried to solve > this by splitting the dataset into many datasets, i.e. one for the > first observation for each ID, one for the second observation and so > on, and then add a suffix to each variable, and then merger them > together. But it ended up with many steps and not so automated. I > assume I could use proc transpose in some way, but I am not on > friendly terms with proc transpose. > > To sum up, any help would be greatly appreciated. > > All the best, > Richard data have; input ID Q X Y Z; cards; 1 2 6 8 4 1 8 6 9 6 1 4 6 9 6 2 7 3 8 5 3 2 3 2 5 3 1 3 2 6 3 2 2 2 8 3 4 1 4 6 ; data have1; set have; array vv q x y z; by id; if first.id then n_=0; n_+1; do over vv; colnm=compress(vname(vv)||put(n_,best.)); val=vv; output; end; keep id colnm val; run; proc transpose data=have1 out=want (drop=_name_); by id; var val; id colnm; run; proc print; run; ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4 X4 Y4 Z4 1 2 6 8 4 8 6 9 6 4 6 9 6 . . . . 2 7 3 8 5 . . . . . . . . . . . . 3 2 3 2 5 1 3 2 6 2 2 2 8 4 1 4 6 HTH Ya
From: Richard Stat on 21 Jun 2010 17:56 That was a very nice solution. It solved by problem in a very good way. Thank you so much, Ya! All the best, Richard On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote: > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote: > > > > > Dear all, > > > I have a dataset that I want to restructure. It has multiple > > observations for each ID, but I don't know how many duplicates the > > dataset will have for each ID and it differs from ID to ID. It looks > > something like this: > > > ID Q X Y Z > > 1 2 6 8 4 > > 1 8 6 9 6 > > 1 4 6 9 6 > > 2 7 3 8 5 > > 3 2 3 2 5 > > 3 1 3 2 6 > > 3 2 2 2 8 > > 3 4 1 4 6 > > [...] > > > And I want to restructure it so that I get one observation for each > > ID, i.e. > > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4 > > Z1 Z2 Z3 Z4 > > 1 2 8 4 . 6 6 6 . 8 > > 9 9 . 4 6 6 5 > > 2 7 . . . 3 . . . > > 8 . . . 5 . . . > > 3 2 1 2 4 3 3 2 1 2 > > 2 2 4 5 6 8 6 > > [...] > > > I have several datasets like this and they are big. Furthermore, I > > don't know how many duplicates I have per ID. I have tried to solve > > this by splitting the dataset into many datasets, i.e. one for the > > first observation for each ID, one for the second observation and so > > on, and then add a suffix to each variable, and then merger them > > together. But it ended up with many steps and not so automated. I > > assume I could use proc transpose in some way, but I am not on > > friendly terms with proc transpose. > > > To sum up, any help would be greatly appreciated. > > > All the best, > > Richard > > data have; > input ID Q X Y Z; > cards; > 1 2 6 8 4 > 1 8 6 9 6 > 1 4 6 9 6 > 2 7 3 8 5 > 3 2 3 2 5 > 3 1 3 2 6 > 3 2 2 2 8 > 3 4 1 4 6 > ; > > data have1; > set have; > array vv q x y z; > by id; > if first.id then n_=0; > n_+1; > do over vv; > colnm=compress(vname(vv)||put(n_,best.)); > val=vv; > output; > end; > keep id colnm val; > run; > > proc transpose data=have1 out=want (drop=_name_); > by id; > var val; > id colnm; > run; > > proc print; > run; > > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4 > X4 Y4 Z4 > > 1 2 6 8 4 8 6 9 6 4 6 9 > 6 . . . . > 2 7 3 8 > 5 . . . . . . . . . . . . > 3 2 3 2 5 1 3 2 6 2 2 2 8 4 > 1 4 6 > > HTH > > Ya
From: Richard Stat on 22 Jun 2010 02:40 Dear Ya and others, the solution presented by Ya worked for most of my datasets, but then I realized that I also have datasets with character variables. By using Ya's code, the look like this: data have; input ID Q X T U; cards; 1 2 6 Greece 4 1 8 6 United Kingdom 6 1 4 6 Spain 6 2 7 3 Norway 5 3 2 3 Finland 5 3 1 3 Greece 6 3 2 2 Germany 8 3 4 1 Democratic Republic of Congo 6 ; The datasets can have both numeric and character variables, and they can be one or many. The character variables can consist of several words and be of different length. I still want the same end result with one observation per ID, but with character variables. In the example above, T1-T4 would be character variables and should be of such length that they include the longest value in T. Any help would be greatly appreciated. All the best, Richard On 21 Juni, 23:56, Richard Stat <richardstatis...(a)gmail.com> wrote: > That was a very nice solution. It solved by problem in a very good > way. Thank you so much, Ya! > > All the best, Richard > > On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote: > > > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote: > > > > Dear all, > > > > I have a dataset that I want to restructure. It has multiple > > > observations for each ID, but I don't know how many duplicates the > > > dataset will have for each ID and it differs from ID to ID. It looks > > > something like this: > > > > ID Q X Y Z > > > 1 2 6 8 4 > > > 1 8 6 9 6 > > > 1 4 6 9 6 > > > 2 7 3 8 5 > > > 3 2 3 2 5 > > > 3 1 3 2 6 > > > 3 2 2 2 8 > > > 3 4 1 4 6 > > > [...] > > > > And I want to restructure it so that I get one observation for each > > > ID, i.e. > > > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4 > > > Z1 Z2 Z3 Z4 > > > 1 2 8 4 . 6 6 6 . 8 > > > 9 9 . 4 6 6 5 > > > 2 7 . . . 3 . . . > > > 8 . . . 5 . . . > > > 3 2 1 2 4 3 3 2 1 2 > > > 2 2 4 5 6 8 6 > > > [...] > > > > I have several datasets like this and they are big. Furthermore, I > > > don't know how many duplicates I have per ID. I have tried to solve > > > this by splitting the dataset into many datasets, i.e. one for the > > > first observation for each ID, one for the second observation and so > > > on, and then add a suffix to each variable, and then merger them > > > together. But it ended up with many steps and not so automated. I > > > assume I could use proc transpose in some way, but I am not on > > > friendly terms with proc transpose. > > > > To sum up, any help would be greatly appreciated. > > > > All the best, > > > Richard > > > data have; > > input ID Q X Y Z; > > cards; > > 1 2 6 8 4 > > 1 8 6 9 6 > > 1 4 6 9 6 > > 2 7 3 8 5 > > 3 2 3 2 5 > > 3 1 3 2 6 > > 3 2 2 2 8 > > 3 4 1 4 6 > > ; > > > data have1; > > set have; > > array vv q x y z; > > by id; > > if first.id then n_=0; > > n_+1; > > do over vv; > > colnm=compress(vname(vv)||put(n_,best.)); > > val=vv; > > output; > > end; > > keep id colnm val; > > run; > > > proc transpose data=have1 out=want (drop=_name_); > > by id; > > var val; > > id colnm; > > run; > > > proc print; > > run; > > > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4 > > X4 Y4 Z4 > > > 1 2 6 8 4 8 6 9 6 4 6 9 > > 6 . . . . > > 2 7 3 8 > > 5 . . . . . . . . . . . . > > 3 2 3 2 5 1 3 2 6 2 2 2 8 4 > > 1 4 6 > > > HTH > > > Ya
From: Arthur Tabachneck on 22 Jun 2010 11:27 Richard, You could use the same approach separately for numeric and alphanumeric variables, and then interleave the two resulting files. E.g.: data have; informat T $40.; input ID Q X T & U; cards; 1 2 6 Greece 4 1 8 6 United Kingdom 6 1 4 6 Spain 6 2 7 3 Norway 5 3 2 3 Finland 5 3 1 3 Greece 6 3 2 2 Germany 8 3 4 1 Democratic Republic of Congo 6 ; data have1 (keep=id colnm val) have2 (keep=id colnm valt); set have; array vv q x u; array tt $40. t; by id; if first.id then n_=0; n_+1; do over vv; colnm=compress(vname(vv)||put(n_,best.)); val=vv; output have1; end; do over tt; colnm=compress(vname(tt)||put(n_,best.)); valt=tt; output have2; end; run; proc transpose data=have1 out=want1 (drop=_name_); by id; var val; id colnm; run; proc transpose data=have2 out=want2 (drop=_name_); by id; var valt; id colnm; run; data want; set want1; set want2; run; HTH, Art ------------ On Jun 22, 2:40 am, Richard Stat <richardstatis...(a)gmail.com> wrote: > Dear Ya and others, > > the solution presented by Ya worked for most of my datasets, but then > I realized that I also have datasets with character variables. By > using Ya's code, the look like this: > > data have; > input ID Q X T U; > cards; > 1 2 6 Greece 4 > 1 8 6 United Kingdom 6 > 1 4 6 Spain 6 > 2 7 3 Norway 5 > 3 2 3 Finland 5 > 3 1 3 Greece 6 > 3 2 2 Germany 8 > 3 4 1 Democratic Republic of Congo 6 > ; > > The datasets can have both numeric and character variables, and they > can be one or many. The character variables can consist of several > words and be of different length. I still want the same end result > with one observation per ID, but with character variables. In the > example above, T1-T4 would be character variables and should be of > such length that they include the longest value in T. > > Any help would be greatly appreciated. > > All the best, > Richard > > On 21 Juni, 23:56, Richard Stat <richardstatis...(a)gmail.com> wrote: > > > > > That was a very nice solution. It solved by problem in a very good > > way. Thank you so much, Ya! > > > All the best, Richard > > > On 21 Juni, 22:49, Ya <huang8...(a)gmail.com> wrote: > > > > On Jun 21, 12:30 pm, Richard Stat <richardstatis...(a)gmail.com> wrote: > > > > > Dear all, > > > > > I have a dataset that I want to restructure. It has multiple > > > > observations for each ID, but I don't know how many duplicates the > > > > dataset will have for each ID and it differs from ID to ID. It looks > > > > something like this: > > > > > ID Q X Y Z > > > > 1 2 6 8 4 > > > > 1 8 6 9 6 > > > > 1 4 6 9 6 > > > > 2 7 3 8 5 > > > > 3 2 3 2 5 > > > > 3 1 3 2 6 > > > > 3 2 2 2 8 > > > > 3 4 1 4 6 > > > > [...] > > > > > And I want to restructure it so that I get one observation for each > > > > ID, i.e. > > > > > ID Q1 Q2 Q3 Q4 X1 X2 X3 X4 Y1 Y2 Y3 Y4 > > > > Z1 Z2 Z3 Z4 > > > > 1 2 8 4 . 6 6 6 . 8 > > > > 9 9 . 4 6 6 5 > > > > 2 7 . . . 3 . . . > > > > 8 . . . 5 . . . > > > > 3 2 1 2 4 3 3 2 1 2 > > > > 2 2 4 5 6 8 6 > > > > [...] > > > > > I have several datasets like this and they are big. Furthermore, I > > > > don't know how many duplicates I have per ID. I have tried to solve > > > > this by splitting the dataset into many datasets, i.e. one for the > > > > first observation for each ID, one for the second observation and so > > > > on, and then add a suffix to each variable, and then merger them > > > > together. But it ended up with many steps and not so automated. I > > > > assume I could use proc transpose in some way, but I am not on > > > > friendly terms with proc transpose. > > > > > To sum up, any help would be greatly appreciated. > > > > > All the best, > > > > Richard > > > > data have; > > > input ID Q X Y Z; > > > cards; > > > 1 2 6 8 4 > > > 1 8 6 9 6 > > > 1 4 6 9 6 > > > 2 7 3 8 5 > > > 3 2 3 2 5 > > > 3 1 3 2 6 > > > 3 2 2 2 8 > > > 3 4 1 4 6 > > > ; > > > > data have1; > > > set have; > > > array vv q x y z; > > > by id; > > > if first.id then n_=0; > > > n_+1; > > > do over vv; > > > colnm=compress(vname(vv)||put(n_,best.)); > > > val=vv; > > > output; > > > end; > > > keep id colnm val; > > > run; > > > > proc transpose data=have1 out=want (drop=_name_); > > > by id; > > > var val; > > > id colnm; > > > run; > > > > proc print; > > > run; > > > > ID Q1 X1 Y1 Z1 Q2 X2 Y2 Z2 Q3 X3 Y3 Z3 Q4 > > > X4 Y4 Z4 > > > > 1 2 6 8 4 8 6 9 6 4 6 9 > > > 6 . . . . > > > 2 7 3 8 > > > 5 . . . . . . . . . . . . > > > 3 2 3 2 5 1 3 2 6 2 2 2 8 4 > > > 1 4 6 > > > > HTH > > > > Ya- Hide quoted text - > > - Show quoted text -
|
Next
|
Last
Pages: 1 2 Prev: fitting a regression line Next: Stock market competition: request for data |