From: Tom Abernathy on 7 Mar 2010 13:45 Why not just merge by ID and VISNO? You will not lose any information. Perhaps you want to carry-forward the values for X or Y onto the missing visits? You can do that by generating new variables and using the RETAIN statement. data one; input id visno x $ @@; cards; 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N run; data two; input id visno y $ @@; cards; 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N run; data new; merge one(in=in1) two(in=in2); by id visno; if first.id then do; newx=' ' ; newy=' '; end; retain newx newy; format visno z2.; if in1 then newx=x; if in2 then newy=y; put id visno newx newy; if last.id then put; run; 1 00 N 1 01 N N 1 02 N N 1 03 Y N 1 04 N Y 1 05 N Y 1 06 N N 1 08 N N 2 00 N 2 01 N N 2 02 Y N 2 03 N N 2 04 N Y 2 05 N Y 2 06 N N 2 08 N N 2 10 N N On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote: > Hi, > > I am having difficulty with a many-to-many merge between two > datasets. > The two datasets actually have 2 variables in common (ID and VISNO), > but I want to merge on ID only since I would lose some data from the > first dataset if I merge with ID and VISNO. Here are both my merge and > procSQL codes: > > proc sort data=dataset1; by ID; > proc sort data=dataset2; by ID; > Data want; > Merge > dataset1 (IN=in1) > dataset2; > By ID; > if in1; > Run; > > OR > > PROC SQL; > CREATE TABLE want AS > SELECT * > FROM dataset1,dataset2 > WHERE dataset1.ID=dataset2.ID > ; > QUIT; > > Here is a snapshot of how the data looks: > dataset1: > > ID VISNO Varx > 1 1 N > 1 2 N > 1 3 Y > 1 4 N > 1 5 N > 2 1 N > 2 2 Y > 2 3 N > 2 4 N > 2 5 N > > dataset2: > > ID VISNO Varx > 1 0 N > 1 3 N > 1 4 Y > 1 6 N > 1 8 N > 2 0 N > 2 4 Y > 2 6 N > 2 8 N > 2 10 N
From: Arthur Tabachneck on 7 Mar 2010 13:01 Joelle, There is a time delay (sometimes a large time delay) between when posts are appear on the listserv and Google group's copy of the listserv. I can't recall on which I had placed my original post. However, your problem doesn't appear to be a many-to-many merge but, rather, a straight forward merge using both id and visno. e.g.,: data dataset1; input ID VISNO Varx $; cards; 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N ; data dataset2; input ID VISNO Vary $; cards; 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N ; proc sort data=dataset1; by ID VISNO;run; proc sort data=dataset2; by ID VISNO;run; data want; Merge dataset1 dataset2; By ID VISNO; run; HTH, Art -----Original Message----- From: Joelle Hallak <joelleahallak(a)gmail.com> To: Arthur Tabachneck <art297(a)netscape.net> Sent: Sun, Mar 7, 2010 11:52 am Subject: Re: Many-to-many merge Hi Arthur, for some reason I do not see your question posted on the discussion page. Anyway, i posted my answer to your question there I should have changed VarX to Var Y in dataset2. The two datasets do not have variables in common. I want the final dataset to look something like this ID � � VISNO � � Varx � � � Vary 1 � � � � 0 � � � � � � . � � � � � N 1 � � � � 1 � � � � � � N � � � � � . 1 � � � � 2 � � � � � � N � � � � � . 1 � � � � 3 � � � � � � Y � � � � N 1 � � � � 4 � � � � � � N � � � � Y 1 � � � � 5 � � � � � � N � � � � . 1 � � � � 6 � � � � � � . � � � � � N 1 � � � � 8 � � � � � � . � � � � � N Thanks, On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck <art297(a)netscape.net> wrote: jiji, Given the sample data you provided, what do you want the resulting file to look like? � There are many possibilities and, for both ids, visno 4 has conflicting values for varx. Art ------------- On Mar 7, 10:56� am, jiji <joelleahal...(a)gmail.com> wrote: > Hi, > > I am having difficulty with a many-to-many merge between two > datasets. > The two datasets actually have 2 variables in common (ID and VISNO), > but I want to merge on ID only since I would lose some data from the > first dataset if I merge with ID and VISNO. Here are both my merge and > procSQL codes: > > proc sort data=dataset1; by ID; > proc sort data=dataset2; by ID; > Data want; > Merge > dataset1 (IN=in1) > dataset2; > By ID; > if in1; > Run; > > OR > > PROC SQL; > CREATE TABLE want AS > SELECT * > FROM dataset1,dataset2 > WHERE dataset1.ID=dataset2.ID > ; > QUIT; > > Here is a snapshot of how the data looks: > dataset1: > > ID � � VISNO � � � Varx > 1 � � � � 1 � � � � � � N > 1 � � � � 2 � � � � � � N > 1 � � � � 3 � � � � � � Y > 1 � � � � 4 � � � � � � N > 1 � � � � 5 � � � � � � N > 2 � � � � 1 � � � � � � N > 2 � � � � 2 � � � � � � Y > 2 � � � � 3 � � � � � � N > 2 � � � � 4 � � � � � � N > 2 � � � � 5 � � � � � � N > > dataset2: > > ID � � VISNO � � � Varx > 1 � � � � 0 � � � � � � N > 1 � � � � 3 � � � � � � N > 1 � � � � 4 � � � � � � Y > 1 � � � � 6 � � � � � � N > 1 � � � � 8 � � � � � � N > 2 � � � � 0 � � � � � � N > 2 � � � � 4 � � � � � � Y > 2 � � � � 6 � � � � � � N > 2 � � � � 8 � � � � � � N > 2 � � � 10 � � � � � � N
From: jiji on 7 Mar 2010 15:17 On Mar 7, 12:45 pm, Tom Abernathy <tom.aberna...(a)gmail.com> wrote: > Why not just merge by ID and VISNO? You will not lose any > information. > Perhaps you want to carry-forward the values for X or Y onto the > missing visits? > You can do that by generating new variables and using the RETAIN > statement. > > data one; > input id visno x $ @@; > cards; > 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N > 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N > run; > > data two; > input id visno y $ @@; > cards; > 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N > 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N > run; > > data new; > merge one(in=in1) two(in=in2); > by id visno; > if first.id then do; newx=' ' ; newy=' '; end; > retain newx newy; > format visno z2.; > if in1 then newx=x; > if in2 then newy=y; > put id visno newx newy; > if last.id then put; > run; > > 1 00 N > 1 01 N N > 1 02 N N > 1 03 Y N > 1 04 N Y > 1 05 N Y > 1 06 N N > 1 08 N N > > 2 00 N > 2 01 N N > 2 02 Y N > 2 03 N N > 2 04 N Y > 2 05 N Y > 2 06 N N > 2 08 N N > 2 10 N N > > On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote: > > > > > Hi, > > > I am having difficulty with a many-to-many merge between two > > datasets. > > The two datasets actually have 2 variables in common (ID and VISNO), > > but I want to merge on ID only since I would lose some data from the > > first dataset if I merge with ID and VISNO. Here are both my merge and > > procSQL codes: > > > proc sort data=dataset1; by ID; > > proc sort data=dataset2; by ID; > > Data want; > > Merge > > dataset1 (IN=in1) > > dataset2; > > By ID; > > if in1; > > Run; > > > OR > > > PROC SQL; > > CREATE TABLE want AS > > SELECT * > > FROM dataset1,dataset2 > > WHERE dataset1.ID=dataset2.ID > > ; > > QUIT; > > > Here is a snapshot of how the data looks: > > dataset1: > > > ID VISNO Varx > > 1 1 N > > 1 2 N > > 1 3 Y > > 1 4 N > > 1 5 N > > 2 1 N > > 2 2 Y > > 2 3 N > > 2 4 N > > 2 5 N > > > dataset2: > > > ID VISNO Varx > > 1 0 N > > 1 3 N > > 1 4 Y > > 1 6 N > > 1 8 N > > 2 0 N > > 2 4 Y > > 2 6 N > > 2 8 N > > 2 10 N- Hide quoted text - > > - Show quoted text - Thank you Tom. How can I apply this to all the dataset. Since I have 102 variables. I tried the following, however it still did not retain the missing VISNO in either dataset proc sort data=dataset1; by ID VISNO; proc sort data=dataset2; by ID VISNO; data new; merge dataset1(in=in1) dataset2(in=in2); by id visno; if first.id then do; end; retain; if in1; if in2; put id; if last.id then put; run; run;
From: jiji on 7 Mar 2010 15:37 On Mar 7, 2:17 pm, jiji <joelleahal...(a)gmail.com> wrote: > On Mar 7, 12:45 pm, Tom Abernathy <tom.aberna...(a)gmail.com> wrote: > > > > > > > Why not just merge by ID and VISNO? You will not lose any > > information. > > Perhaps you want to carry-forward the values for X or Y onto the > > missing visits? > > You can do that by generating new variables and using the RETAIN > > statement. > > > data one; > > input id visno x $ @@; > > cards; > > 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N > > 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N > > run; > > > data two; > > input id visno y $ @@; > > cards; > > 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N > > 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N > > run; > > > data new; > > merge one(in=in1) two(in=in2); > > by id visno; > > if first.id then do; newx=' ' ; newy=' '; end; > > retain newx newy; > > format visno z2.; > > if in1 then newx=x; > > if in2 then newy=y; > > put id visno newx newy; > > if last.id then put; > > run; > > > 1 00 N > > 1 01 N N > > 1 02 N N > > 1 03 Y N > > 1 04 N Y > > 1 05 N Y > > 1 06 N N > > 1 08 N N > > > 2 00 N > > 2 01 N N > > 2 02 Y N > > 2 03 N N > > 2 04 N Y > > 2 05 N Y > > 2 06 N N > > 2 08 N N > > 2 10 N N > > > On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote: > > > > Hi, > > > > I am having difficulty with a many-to-many merge between two > > > datasets. > > > The two datasets actually have 2 variables in common (ID and VISNO), > > > but I want to merge on ID only since I would lose some data from the > > > first dataset if I merge with ID and VISNO. Here are both my merge and > > > procSQL codes: > > > > proc sort data=dataset1; by ID; > > > proc sort data=dataset2; by ID; > > > Data want; > > > Merge > > > dataset1 (IN=in1) > > > dataset2; > > > By ID; > > > if in1; > > > Run; > > > > OR > > > > PROC SQL; > > > CREATE TABLE want AS > > > SELECT * > > > FROM dataset1,dataset2 > > > WHERE dataset1.ID=dataset2.ID > > > ; > > > QUIT; > > > > Here is a snapshot of how the data looks: > > > dataset1: > > > > ID VISNO Varx > > > 1 1 N > > > 1 2 N > > > 1 3 Y > > > 1 4 N > > > 1 5 N > > > 2 1 N > > > 2 2 Y > > > 2 3 N > > > 2 4 N > > > 2 5 N > > > > dataset2: > > > > ID VISNO Varx > > > 1 0 N > > > 1 3 N > > > 1 4 Y > > > 1 6 N > > > 1 8 N > > > 2 0 N > > > 2 4 Y > > > 2 6 N > > > 2 8 N > > > 2 10 N- Hide quoted text - > > > - Show quoted text - > > Thank you Tom. > How can I apply this to all the dataset. Since I have 102 variables. I > tried the following, however it still did not retain the missing VISNO > in either dataset > > proc sort data=dataset1; by ID VISNO; > proc sort data=dataset2; by ID VISNO; > data new; > merge dataset1(in=in1) dataset2(in=in2); > by id visno; > if first.id then do; end; > retain; > if in1; > if in2; > put id; > if last.id then put; > run; > run;- Hide quoted text - > > - Show quoted text - It actually worked with the following code: data want; merge dataset1(in=in1) dataset2(in=in2); by id visno; retain if in1 or in2; run; Thanks!!
From: Marianne Weires on 7 Mar 2010 15:42
Hey, you could also try a FULL JOIN on ID and VISNO or a NATURAL FULL JOIN. The PROC SQL code posted earlier was an INNER JOIN. /*full join*/ proc sql; select coalesce(d1.id,d2.id) as id, coalesce(d1.visno,d2.visno) as visno,varx,vary from dataset1 d1 full join dataset2 d2 on d1.id = d2.id and d1.visno = d2.visno; quit; /*equivalent to the code above, but less coding for you if the key variables have the same names*/ proc sql; select * from dataset1 natural full join dataset2; quit; HTH, Marianne On Sun, 7 Mar 2010 13:01:32 -0500, Arthur Tabachneck <art297(a)NETSCAPE.NET> wrote: >Joelle, > >There is a time delay (sometimes a large time delay) between when posts >are appear on the listserv and Google group's copy of the listserv. I >can't recall on which I had placed my original post. > >However, your problem doesn't appear to be a many-to-many merge but, >rather, a straight forward merge using both id and visno. e.g.,: > >data dataset1; > input ID VISNO Varx $; > cards; >1 1 N >1 2 N >1 3 Y >1 4 N >1 5 N >2 1 N >2 2 Y >2 3 N >2 4 N >2 5 N >; > >data dataset2; > input ID VISNO Vary $; > cards; >1 0 N >1 3 N >1 4 Y >1 6 N >1 8 N >2 0 N >2 4 Y >2 6 N >2 8 N >2 10 N >; > >proc sort data=dataset1; by ID VISNO;run; >proc sort data=dataset2; by ID VISNO;run; > >data want; > Merge dataset1 dataset2; > By ID VISNO; >run; > >HTH, >Art >-----Original Message----- >From: Joelle Hallak <joelleahallak(a)gmail.com> >To: Arthur Tabachneck <art297(a)netscape.net> >Sent: Sun, Mar 7, 2010 11:52 am >Subject: Re: Many-to-many merge > >Hi Arthur, >for some reason I do not see your question posted on the discussion >page. >Anyway, i posted my answer to your question there > >I should have changed VarX to Var Y in dataset2. The two datasets do > not have variables in common. > >I want the final dataset to look something like this > ID � � VISNO � � Varx � � � Vary > 1 � � � � 0 � � � � � � . � � � � � N > 1 � � � � 1 � � � � � � N � � � � � . > 1 � � � � 2 � � � � � � N � � � � � . > 1 � � � � 3 � � � � � � Y � � � � N > 1 � � � � 4 � � � � � � N � � � � Y > 1 � � � � 5 � � � � � � N � � � � . > 1 � � � � 6 � � � � � � . � � � � � N > 1 � � � � 8 � � � � � � . � � � � � N > >Thanks, > >On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck ><art297(a)netscape.net> wrote: >jiji, > >Given the sample data you provided, what do you want the resulting >file to look like? � There are many possibilities and, for both ids, >visno 4 has conflicting values for varx. > >Art >------------- >On Mar 7, 10:56� am, jiji <joelleahal...(a)gmail.com> wrote: >> Hi, >> >> I am having difficulty with a many-to-many merge between two >> datasets. >> The two datasets actually have 2 variables in common (ID and >VISNO), >> but I want to merge on ID only since I would lose some data from >the >> first dataset if I merge with ID and VISNO. Here are both my merge >and >> procSQL codes: >> >> proc sort data=dataset1; by ID; >> proc sort data=dataset2; by ID; >> Data want; >> Merge >> dataset1 (IN=in1) >> dataset2; >> By ID; >> if in1; >> Run; >> >> OR >> >> PROC SQL; >> CREATE TABLE want AS >> SELECT * >> FROM dataset1,dataset2 >> WHERE dataset1.ID=dataset2.ID >> ; >> QUIT; >> >> Here is a snapshot of how the data looks: >> dataset1: >> >> ID � � VISNO � � � Varx >> 1 � � � � 1 � � � � � � N >> 1 � � � � 2 � � � � � � N >> 1 � � � � 3 � � � � � � Y >> 1 � � � � 4 � � � � � � N >> 1 � � � � 5 � � � � � � N >> 2 � � � � 1 � � � � � � N >> 2 � � � � 2 � � � � � � Y >> 2 � � � � 3 � � � � � � N >> 2 � � � � 4 � � � � � � N >> 2 � � � � 5 � � � � � � N >> >> dataset2: >> >> ID � � VISNO � � � Varx >> 1 � � � � 0 � � � � � � N >> 1 � � � � 3 � � � � � � N >> 1 � � � � 4 � � � � � � Y >> 1 � � � � 6 � � � � � � N >> 1 � � � � 8 � � � � � � N >> 2 � � � � 0 � � � � � � N >> 2 � � � � 4 � � � � � � Y >> 2 � � � � 6 � � � � � � N >> 2 � � � � 8 � � � � � � N >> 2 � � � 10 � � � � � � N |