From: Tom Abernathy on
Why not just merge by ID and VISNO? You will not lose any
information.
Perhaps you want to carry-forward the values for X or Y onto the
missing visits?
You can do that by generating new variables and using the RETAIN
statement.

data one;
input id visno x $ @@;
cards;
1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
run;

data two;
input id visno y $ @@;
cards;
1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
run;

data new;
merge one(in=in1) two(in=in2);
by id visno;
if first.id then do; newx=' ' ; newy=' '; end;
retain newx newy;
format visno z2.;
if in1 then newx=x;
if in2 then newy=y;
put id visno newx newy;
if last.id then put;
run;


1 00 N
1 01 N N
1 02 N N
1 03 Y N
1 04 N Y
1 05 N Y
1 06 N N
1 08 N N

2 00 N
2 01 N N
2 02 Y N
2 03 N N
2 04 N Y
2 05 N Y
2 06 N N
2 08 N N
2 10 N N

On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and VISNO),
> but I want to merge on ID only since I would lose some data from the
> first dataset if I merge with ID and VISNO. Here are both my merge and
> procSQL codes:
>
> proc sort data=dataset1; by ID;
> proc sort data=dataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=in1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=dataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID    VISNO      Varx
> 1        1             N
> 1        2             N
> 1        3             Y
> 1        4             N
> 1        5             N
> 2        1             N
> 2        2             Y
> 2        3             N
> 2        4             N
> 2        5             N
>
> dataset2:
>
> ID    VISNO      Varx
> 1        0             N
> 1        3             N
> 1        4             Y
> 1        6             N
> 1        8             N
> 2        0             N
> 2        4             Y
> 2        6             N
> 2        8             N
> 2      10             N

From: Arthur Tabachneck on
Joelle,

There is a time delay (sometimes a large time delay) between when posts
are appear on the listserv and Google group's copy of the listserv. I
can't recall on which I had placed my original post.

However, your problem doesn't appear to be a many-to-many merge but,
rather, a straight forward merge using both id and visno. e.g.,:

data dataset1;
input ID VISNO Varx $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
;

data dataset2;
input ID VISNO Vary $;
cards;
1 0 N
1 3 N
1 4 Y
1 6 N
1 8 N
2 0 N
2 4 Y
2 6 N
2 8 N
2 10 N
;

proc sort data=dataset1; by ID VISNO;run;
proc sort data=dataset2; by ID VISNO;run;

data want;
Merge dataset1 dataset2;
By ID VISNO;
run;

HTH,
Art
-----Original Message-----
From: Joelle Hallak <joelleahallak(a)gmail.com>
To: Arthur Tabachneck <art297(a)netscape.net>
Sent: Sun, Mar 7, 2010 11:52 am
Subject: Re: Many-to-many merge

Hi Arthur,
for some reason I do not see your question posted on the discussion
page.
Anyway, i posted my answer to your question there

I should have changed VarX to Var Y in dataset2. The two datasets do
not have variables in common.

I want the final dataset to look something like this
ID � � VISNO � � Varx � � � Vary
1 � � � � 0 � � � � � � . � � � � � N
1 � � � � 1 � � � � � � N � � � � � .
1 � � � � 2 � � � � � � N � � � � � .
1 � � � � 3 � � � � � � Y � � � � N
1 � � � � 4 � � � � � � N � � � � Y
1 � � � � 5 � � � � � � N � � � � .
1 � � � � 6 � � � � � � . � � � � � N
1 � � � � 8 � � � � � � . � � � � � N

Thanks,

On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
&lt;art297(a)netscape.net&gt; wrote:
jiji,

Given the sample data you provided, what do you want the resulting
file to look like? � There are many possibilities and, for both ids,
visno 4 has conflicting values for varx.

Art
-------------
On Mar 7, 10:56� am, jiji &lt;joelleahal...(a)gmail.com&gt; wrote:
&gt; Hi,
&gt;
&gt; I am having difficulty with a many-to-many merge between two
&gt; datasets.
&gt; The two datasets actually have 2 variables in common (ID and
VISNO),
&gt; but I want to merge on ID only since I would lose some data from
the
&gt; first dataset if I merge with ID and VISNO. Here are both my merge
and
&gt; procSQL codes:
&gt;
&gt; proc sort data=dataset1; by ID;
&gt; proc sort data=dataset2; by ID;
&gt; Data want;
&gt; Merge
&gt; dataset1 (IN=in1)
&gt; dataset2;
&gt; By ID;
&gt; if in1;
&gt; Run;
&gt;
&gt; OR
&gt;
&gt; PROC SQL;
&gt; CREATE TABLE want AS
&gt; SELECT *
&gt; FROM dataset1,dataset2
&gt; WHERE dataset1.ID=dataset2.ID
&gt; ;
&gt; QUIT;
&gt;
&gt; Here is a snapshot of how the data looks:
&gt; dataset1:
&gt;
&gt; ID � � VISNO � � � Varx
&gt; 1 � � � � 1 � � � � � � N
&gt; 1 � � � � 2 � � � � � � N
&gt; 1 � � � � 3 � � � � � � Y
&gt; 1 � � � � 4 � � � � � � N
&gt; 1 � � � � 5 � � � � � � N
&gt; 2 � � � � 1 � � � � � � N
&gt; 2 � � � � 2 � � � � � � Y
&gt; 2 � � � � 3 � � � � � � N
&gt; 2 � � � � 4 � � � � � � N
&gt; 2 � � � � 5 � � � � � � N
&gt;
&gt; dataset2:
&gt;
&gt; ID � � VISNO � � � Varx
&gt; 1 � � � � 0 � � � � � � N
&gt; 1 � � � � 3 � � � � � � N
&gt; 1 � � � � 4 � � � � � � Y
&gt; 1 � � � � 6 � � � � � � N
&gt; 1 � � � � 8 � � � � � � N
&gt; 2 � � � � 0 � � � � � � N
&gt; 2 � � � � 4 � � � � � � Y
&gt; 2 � � � � 6 � � � � � � N
&gt; 2 � � � � 8 � � � � � � N
&gt; 2 � � � 10 � � � � � � N
From: jiji on
On Mar 7, 12:45 pm, Tom Abernathy <tom.aberna...(a)gmail.com> wrote:
> Why not just merge by ID and VISNO?  You will not lose any
> information.
> Perhaps you want to carry-forward the values for X or Y onto the
> missing visits?
> You can do that by generating new variables and using the RETAIN
> statement.
>
> data one;
>  input id visno x $ @@;
> cards;
> 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> run;
>
> data two;
>  input id visno y $ @@;
> cards;
> 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> run;
>
> data new;
>   merge one(in=in1) two(in=in2);
>   by id visno;
>   if first.id then do; newx=' ' ; newy=' '; end;
>   retain newx newy;
>   format visno z2.;
>   if in1 then newx=x;
>   if in2 then newy=y;
>   put id visno newx newy;
>   if last.id then put;
> run;
>
> 1 00   N
> 1 01 N N
> 1 02 N N
> 1 03 Y N
> 1 04 N Y
> 1 05 N Y
> 1 06 N N
> 1 08 N N
>
> 2 00   N
> 2 01 N N
> 2 02 Y N
> 2 03 N N
> 2 04 N Y
> 2 05 N Y
> 2 06 N N
> 2 08 N N
> 2 10 N N
>
> On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote:
>
>
>
> > Hi,
>
> > I am having difficulty with a many-to-many merge between two
> > datasets.
> > The two datasets actually have 2 variables in common (ID and VISNO),
> > but I want to merge on ID only since I would lose some data from the
> > first dataset if I merge with ID and VISNO. Here are both my merge and
> > procSQL codes:
>
> > proc sort data=dataset1; by ID;
> > proc sort data=dataset2; by ID;
> > Data want;
> > Merge
> > dataset1 (IN=in1)
> > dataset2;
> > By ID;
> > if in1;
> > Run;
>
> > OR
>
> > PROC SQL;
> > CREATE TABLE want AS
> > SELECT *
> > FROM dataset1,dataset2
> > WHERE dataset1.ID=dataset2.ID
> > ;
> > QUIT;
>
> > Here is a snapshot of how the data looks:
> > dataset1:
>
> > ID    VISNO      Varx
> > 1        1             N
> > 1        2             N
> > 1        3             Y
> > 1        4             N
> > 1        5             N
> > 2        1             N
> > 2        2             Y
> > 2        3             N
> > 2        4             N
> > 2        5             N
>
> > dataset2:
>
> > ID    VISNO      Varx
> > 1        0             N
> > 1        3             N
> > 1        4             Y
> > 1        6             N
> > 1        8             N
> > 2        0             N
> > 2        4             Y
> > 2        6             N
> > 2        8             N
> > 2      10             N- Hide quoted text -
>
> - Show quoted text -

Thank you Tom.
How can I apply this to all the dataset. Since I have 102 variables. I
tried the following, however it still did not retain the missing VISNO
in either dataset

proc sort data=dataset1; by ID VISNO;
proc sort data=dataset2; by ID VISNO;
data new;
merge dataset1(in=in1) dataset2(in=in2);
by id visno;
if first.id then do; end;
retain;
if in1;
if in2;
put id;
if last.id then put;
run;
run;
From: jiji on
On Mar 7, 2:17 pm, jiji <joelleahal...(a)gmail.com> wrote:
> On Mar 7, 12:45 pm, Tom Abernathy <tom.aberna...(a)gmail.com> wrote:
>
>
>
>
>
> > Why not just merge by ID and VISNO?  You will not lose any
> > information.
> > Perhaps you want to carry-forward the values for X or Y onto the
> > missing visits?
> > You can do that by generating new variables and using the RETAIN
> > statement.
>
> > data one;
> >  input id visno x $ @@;
> > cards;
> > 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> > 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> > run;
>
> > data two;
> >  input id visno y $ @@;
> > cards;
> > 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> > 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> > run;
>
> > data new;
> >   merge one(in=in1) two(in=in2);
> >   by id visno;
> >   if first.id then do; newx=' ' ; newy=' '; end;
> >   retain newx newy;
> >   format visno z2.;
> >   if in1 then newx=x;
> >   if in2 then newy=y;
> >   put id visno newx newy;
> >   if last.id then put;
> > run;
>
> > 1 00   N
> > 1 01 N N
> > 1 02 N N
> > 1 03 Y N
> > 1 04 N Y
> > 1 05 N Y
> > 1 06 N N
> > 1 08 N N
>
> > 2 00   N
> > 2 01 N N
> > 2 02 Y N
> > 2 03 N N
> > 2 04 N Y
> > 2 05 N Y
> > 2 06 N N
> > 2 08 N N
> > 2 10 N N
>
> > On Mar 7, 10:56 am, jiji <joelleahal...(a)gmail.com> wrote:
>
> > > Hi,
>
> > > I am having difficulty with a many-to-many merge between two
> > > datasets.
> > > The two datasets actually have 2 variables in common (ID and VISNO),
> > > but I want to merge on ID only since I would lose some data from the
> > > first dataset if I merge with ID and VISNO. Here are both my merge and
> > > procSQL codes:
>
> > > proc sort data=dataset1; by ID;
> > > proc sort data=dataset2; by ID;
> > > Data want;
> > > Merge
> > > dataset1 (IN=in1)
> > > dataset2;
> > > By ID;
> > > if in1;
> > > Run;
>
> > > OR
>
> > > PROC SQL;
> > > CREATE TABLE want AS
> > > SELECT *
> > > FROM dataset1,dataset2
> > > WHERE dataset1.ID=dataset2.ID
> > > ;
> > > QUIT;
>
> > > Here is a snapshot of how the data looks:
> > > dataset1:
>
> > > ID    VISNO      Varx
> > > 1        1             N
> > > 1        2             N
> > > 1        3             Y
> > > 1        4             N
> > > 1        5             N
> > > 2        1             N
> > > 2        2             Y
> > > 2        3             N
> > > 2        4             N
> > > 2        5             N
>
> > > dataset2:
>
> > > ID    VISNO      Varx
> > > 1        0             N
> > > 1        3             N
> > > 1        4             Y
> > > 1        6             N
> > > 1        8             N
> > > 2        0             N
> > > 2        4             Y
> > > 2        6             N
> > > 2        8             N
> > > 2      10             N- Hide quoted text -
>
> > - Show quoted text -
>
> Thank you Tom.
> How can I apply this to all the dataset. Since I have 102 variables. I
> tried the following, however it still did not retain the missing VISNO
> in either dataset
>
> proc sort data=dataset1; by ID VISNO;
> proc sort data=dataset2; by ID VISNO;
> data new;
>   merge dataset1(in=in1) dataset2(in=in2);
>   by id visno;
>   if first.id then do; end;
>   retain;
>   if in1;
>   if in2;
>   put id;
>   if last.id then put;
>   run;
> run;- Hide quoted text -
>
> - Show quoted text -

It actually worked with the following code:
data want;
merge dataset1(in=in1) dataset2(in=in2);
by id visno;
retain if in1 or in2;
run;

Thanks!!
From: Marianne Weires on
Hey,

you could also try a FULL JOIN on ID and VISNO or a NATURAL FULL JOIN.
The PROC SQL code posted earlier was an INNER JOIN.

/*full join*/
proc sql;
select coalesce(d1.id,d2.id) as id, coalesce(d1.visno,d2.visno) as visno,varx,vary
from dataset1 d1 full join dataset2 d2
on d1.id = d2.id and d1.visno = d2.visno;
quit;


/*equivalent to the code above, but less coding for you if the key variables have the same
names*/
proc sql;
select * from dataset1 natural full join dataset2;
quit;

HTH,
Marianne

On Sun, 7 Mar 2010 13:01:32 -0500, Arthur Tabachneck <art297(a)NETSCAPE.NET> wrote:

>Joelle,
>
>There is a time delay (sometimes a large time delay) between when posts
>are appear on the listserv and Google group's copy of the listserv. I
>can't recall on which I had placed my original post.
>
>However, your problem doesn't appear to be a many-to-many merge but,
>rather, a straight forward merge using both id and visno. e.g.,:
>
>data dataset1;
> input ID VISNO Varx $;
> cards;
>1 1 N
>1 2 N
>1 3 Y
>1 4 N
>1 5 N
>2 1 N
>2 2 Y
>2 3 N
>2 4 N
>2 5 N
>;
>
>data dataset2;
> input ID VISNO Vary $;
> cards;
>1 0 N
>1 3 N
>1 4 Y
>1 6 N
>1 8 N
>2 0 N
>2 4 Y
>2 6 N
>2 8 N
>2 10 N
>;
>
>proc sort data=dataset1; by ID VISNO;run;
>proc sort data=dataset2; by ID VISNO;run;
>
>data want;
> Merge dataset1 dataset2;
> By ID VISNO;
>run;
>
>HTH,
>Art
>-----Original Message-----
>From: Joelle Hallak <joelleahallak(a)gmail.com>
>To: Arthur Tabachneck <art297(a)netscape.net>
>Sent: Sun, Mar 7, 2010 11:52 am
>Subject: Re: Many-to-many merge
>
>Hi Arthur,
>for some reason I do not see your question posted on the discussion
>page.
>Anyway, i posted my answer to your question there
>
>I should have changed VarX to Var Y in dataset2. The two datasets do
> not have variables in common.
>
>I want the final dataset to look something like this
> ID � � VISNO � � Varx � � � Vary
> 1 � � � � 0 � � � � � � . � � � � � N
> 1 � � � � 1 � � � � � � N � � � � � .
> 1 � � � � 2 � � � � � � N � � � � � .
> 1 � � � � 3 � � � � � � Y � � � � N
> 1 � � � � 4 � � � � � � N � � � � Y
> 1 � � � � 5 � � � � � � N � � � � .
> 1 � � � � 6 � � � � � � . � � � � � N
> 1 � � � � 8 � � � � � � . � � � � � N
>
>Thanks,
>
>On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
>&lt;art297(a)netscape.net&gt; wrote:
>jiji,
>
>Given the sample data you provided, what do you want the resulting
>file to look like? � There are many possibilities and, for both ids,
>visno 4 has conflicting values for varx.
>
>Art
>-------------
>On Mar 7, 10:56� am, jiji &lt;joelleahal...(a)gmail.com&gt; wrote:
>&gt; Hi,
>&gt;
>&gt; I am having difficulty with a many-to-many merge between two
>&gt; datasets.
>&gt; The two datasets actually have 2 variables in common (ID and
>VISNO),
>&gt; but I want to merge on ID only since I would lose some data from
>the
>&gt; first dataset if I merge with ID and VISNO. Here are both my merge
>and
>&gt; procSQL codes:
>&gt;
>&gt; proc sort data=dataset1; by ID;
>&gt; proc sort data=dataset2; by ID;
>&gt; Data want;
>&gt; Merge
>&gt; dataset1 (IN=in1)
>&gt; dataset2;
>&gt; By ID;
>&gt; if in1;
>&gt; Run;
>&gt;
>&gt; OR
>&gt;
>&gt; PROC SQL;
>&gt; CREATE TABLE want AS
>&gt; SELECT *
>&gt; FROM dataset1,dataset2
>&gt; WHERE dataset1.ID=dataset2.ID
>&gt; ;
>&gt; QUIT;
>&gt;
>&gt; Here is a snapshot of how the data looks:
>&gt; dataset1:
>&gt;
>&gt; ID � � VISNO � � � Varx
>&gt; 1 � � � � 1 � � � � � � N
>&gt; 1 � � � � 2 � � � � � � N
>&gt; 1 � � � � 3 � � � � � � Y
>&gt; 1 � � � � 4 � � � � � � N
>&gt; 1 � � � � 5 � � � � � � N
>&gt; 2 � � � � 1 � � � � � � N
>&gt; 2 � � � � 2 � � � � � � Y
>&gt; 2 � � � � 3 � � � � � � N
>&gt; 2 � � � � 4 � � � � � � N
>&gt; 2 � � � � 5 � � � � � � N
>&gt;
>&gt; dataset2:
>&gt;
>&gt; ID � � VISNO � � � Varx
>&gt; 1 � � � � 0 � � � � � � N
>&gt; 1 � � � � 3 � � � � � � N
>&gt; 1 � � � � 4 � � � � � � Y
>&gt; 1 � � � � 6 � � � � � � N
>&gt; 1 � � � � 8 � � � � � � N
>&gt; 2 � � � � 0 � � � � � � N
>&gt; 2 � � � � 4 � � � � � � Y
>&gt; 2 � � � � 6 � � � � � � N
>&gt; 2 � � � � 8 � � � � � � N
>&gt; 2 � � � 10 � � � � � � N