about full join problem [SAS]

Prev: Windows client connecting to Linux server (SAS/Connect)
Next: Excel Workbook Setting Error Message

From: "Choate, Paul on 8 Dec 2009 17:14

That wasn't cheating, it was creativity. Lateral thinking is a key component of good programming. ;p

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: Toby Dunn [mailto:tobydunn(a)HOTMAIL.COM]
Sent: Tuesday, December 08, 2009 1:35 PM
To: SAS-L(a)LISTSERV.UGA.EDU; Choate, Paul(a)DDS
Subject: Re: about full join problem

Paul,

You cheated in so much as you also used the Unique key on the data step
merge. If you try to use your data in SQL and mimic the posters original
data step your SQL code wont work.

On Tue, 8 Dec 2009 13:24:59 -0800, Choate, Paul(a)DDS
<Paul.Choate(a)DDS.CA.GOV> wrote:

>As Toby said - SQL and the data step work differently - and while it's
not to hard to make a data step mimic a SQL join the reverse isn't always
so easy.
>
>The key is the uniqueness of the rows - so the most straightforward
solution is to make your rows unique...
>
>data a;
>input code manager $;
>cards;
>145 max
>145 xam
>155 paul
>165 kobe
>;
>
>data b;
>input code assistant $;
>cards;
>145 jerry
>145 tracy
>145 wade
>155 chen
>170 yi
>;
>
>data a;
> set a;
> by code;
> if first.code then n=0;
> n+1;
>run;
>
>data b;
> set b;
> by code;
> if first.code then n=0;
> n+1;
>run;
>
>data C_DATASTEP(drop=n);
> merge a b;
> by code n;
>run;
>
>proc sql;
> create TABLE C_SQL as
> select COALESCE(a.code,b.code) as code,
> a.manager, b.assistant
> from a full join b
> on a.code=b.code and a.n=b.n;
>quit;
>
>
>
>Paul Choate
>DDS Data Extraction
>(916) 654-2160
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
siyuan li
>Sent: Tuesday, December 08, 2009 4:44 AM
>To: SAS-L(a)LISTSERV.UGA.EDU
>Subject: about full join problem
>
>hi, I am sorry first for that I ask the same problem.
>here has 2 datasets:
> data a;
>input code manager$;
>cards;
>145 max
>145 xam
>155 paul
>165 kobe
>;
>run;
>data b;
>input code assistant$;
>cards;
>145 jerry
>145 tracy
>145 wade
>155 chen
>170 yi
>;
>run;
>data c;
>merge a b;
>by code;
>run;
>the question is that I want to yield the same output to use proc sql
>full join procedure, I try to use the procudure below,but it did not
>yield the same output,can anyone help me?
> proc sql;
>select a.code,a.manager,b.assistant from a full join b
>on a.code=b.code;
>quit;
>
>thanks in advance

From: "Keintz, H. Mark" on 8 Dec 2009 21:38

Paul:

Your DATA step and proc sql results match each other, but they don't actually reproduce the merge specified by the OP, which was:

data c;
merge a b;
by code;
run;

The feature to note is that data set B has three records with code=145, and data set A has only two such records. The OP's merged dataset will have three records for this code, as does yours. But, in the OP's program, the last code-145 record of A (i.e. the 2nd record) is matched with all the "extra" records in dataset B.

As a result, variables in the last (i.e. 2nd) code-145 record from A will have their values propagated to those "extra" code-145 records (i.e. the 3rd) generated by B. But your results will have missing values for the variables from A in such extra records.

(I wonder if the OP realizes this property of many-to-many merge in the data step).

Anyhow, I imagine it would be a very recherche' SQL program that would reproduce that property, and I hereby pre-emptively lavish high praise on the individual who can do it in a single CREATE TABLE statement, (and no preparatory DATA steps either).

Regards,
Mark

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> Choate, Paul(a)DDS
> Sent: Tuesday, December 08, 2009 5:15 PM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: about full join problem
>
> That wasn't cheating, it was creativity. Lateral thinking is a key
> component of good programming. ;p
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160
>
> -----Original Message-----
> From: Toby Dunn [mailto:tobydunn(a)HOTMAIL.COM]
> Sent: Tuesday, December 08, 2009 1:35 PM
> To: SAS-L(a)LISTSERV.UGA.EDU; Choate, Paul(a)DDS
> Subject: Re: about full join problem
>
> Paul,
>
> You cheated in so much as you also used the Unique key on the data step
> merge. If you try to use your data in SQL and mimic the posters
> original
> data step your SQL code wont work.
>
> On Tue, 8 Dec 2009 13:24:59 -0800, Choate, Paul(a)DDS
> <Paul.Choate(a)DDS.CA.GOV> wrote:
>
> >As Toby said - SQL and the data step work differently - and while it's
> not to hard to make a data step mimic a SQL join the reverse isn't
> always
> so easy.
> >
> >The key is the uniqueness of the rows - so the most straightforward
> solution is to make your rows unique...
> >
> >data a;
> >input code manager $;
> >cards;
> >145 max
> >145 xam
> >155 paul
> >165 kobe
> >;
> >
> >data b;
> >input code assistant $;
> >cards;
> >145 jerry
> >145 tracy
> >145 wade
> >155 chen
> >170 yi
> >;
> >
> >data a;
> > set a;
> > by code;
> > if first.code then n=0;
> > n+1;
> >run;
> >
> >data b;
> > set b;
> > by code;
> > if first.code then n=0;
> > n+1;
> >run;
> >
> >data C_DATASTEP(drop=n);
> > merge a b;
> > by code n;
> >run;
> >
> >proc sql;
> > create TABLE C_SQL as
> > select COALESCE(a.code,b.code) as code,
> > a.manager, b.assistant
> > from a full join b
> > on a.code=b.code and a.n=b.n;
> >quit;
> >
> >
> >
> >Paul Choate
> >DDS Data Extraction
> >(916) 654-2160
> >
> >-----Original Message-----
> >From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
> siyuan li
> >Sent: Tuesday, December 08, 2009 4:44 AM
> >To: SAS-L(a)LISTSERV.UGA.EDU
> >Subject: about full join problem
> >
> >hi, I am sorry first for that I ask the same problem.
> >here has 2 datasets:
> > data a;
> >input code manager$;
> >cards;
> >145 max
> >145 xam
> >155 paul
> >165 kobe
> >;
> >run;
> >data b;
> >input code assistant$;
> >cards;
> >145 jerry
> >145 tracy
> >145 wade
> >155 chen
> >170 yi
> >;
> >run;
> >data c;
> >merge a b;
> >by code;
> >run;
> >the question is that I want to yield the same output to use proc sql
> >full join procedure, I try to use the procudure below,but it did not
> >yield the same output,can anyone help me?
> > proc sql;
> >select a.code,a.manager,b.assistant from a full join b
> >on a.code=b.code;
> >quit;
> >
> >thanks in advance

From: Craig Johnson on 8 Dec 2009 22:33

I'll heap some praise on someone if they can figure this out because I'm
convinced it's impossible. Frankly, I'm confused on how one knows if the
original data step merge is correct. If you look at dataset A you have code
145 for max and xam. In dataset B you have 145 for jerry, tracy, and wade.
After the merge they end up with max with jerry and xam with tracy and
wade. How is max only being grouped with jerry while xam is grouped with
two? IE how does SAS know that xam has two and max only has one? IMO there
is no way for SAS how to know what to merge what manager together with what
assistant. Here is an example, if you add another case in for code 145 in
dataset B (145 bob in line one of the cards) it is grouped with Max and
Jerry is now grouped with xam. To me that indicates that SAS is merging the
first 145 in A with the first 145 in B and then merging the rest of the 145s
in B with the second 145 in A. In other words, the example was a fluke and
the merge that is wanted can't be done because there is no PK/FK
relationship between manager and assistant. Instead a secondary FK (code)
is being improperly used to try and do a full join on a many-to-many
relationship. In which case, the Cartesian product is technically correct.

From: siyuan li on 9 Dec 2009 08:24

On 12æ9æ¥, ä¸å5æ¶24å, Paul.Cho....(a)DDS.CA.GOV ("Choate, Paul(a)DDS") wrote:
> As Toby said - SQL and the data step work differently - and while it's not to hard to make a data step mimic a SQL join the reverse isn't always so easy.
>
> The key is the uniqueness of the rows - so the most straightforward solution is to make your rows unique...
>
> data a;
> input code manager $;
> cards;
> 145 max
> 145 xam
> 155 paul
> 165 kobe
> ;
>
> data b;
> input code assistant $;
> cards;
> 145 jerry
> 145 tracy
> 145 wade
> 155 chen
> 170 yi
> ;
>
> data a;
> Â Â Â set a;
> Â Â Â by code;
> Â Â Â if first.code then n=0;
> Â Â Â n+1;
> run;
>
> data b;
> Â Â Â set b;
> Â Â Â by code;
> Â Â Â if first.code then n=0;
> Â Â Â n+1;
> run;
>
> data C_DATASTEP(drop=n);
> Â Â Â merge a b;
> Â Â Â by code n;
> run;
>
> proc sql;
> Â create TABLE C_SQL as
> Â Â Â select COALESCE(a.code,b.code) as code,
> Â Â Â Â Â Â a.manager, b.assistant
> Â Â Â from a full join b
> Â Â Â on a.code=b.code and a.n=b.n;
> quit;
>
> Paul Choate
> DDS Data Extraction
> (916) 654-2160
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of siyuan li
> Sent: Tuesday, December 08, 2009 4:44 AM
> To: SA...(a)LISTSERV.UGA.EDU
> Subject: about full join problem
>
> hi, I am sorry first for that I ask the same problem.
> here has 2 datasets:
> Â data a;
> input code manager$;
> cards;
> 145 max
> 145 xam
> 155 paul
> 165 kobe
> ;
> run;
> data b;
> input code assistant$;
> cards;
> 145 jerry
> 145 tracy
> 145 wade
> 155 chen
> 170 yi
> ;
> run;
> data c;
> merge a b;
> by code;
> run;
> the question is that I want to yield the same output to use proc sql
> full join procedure, I try to use the procudure below,but it did not
> yield the same output,can anyone help me?
> Â proc sql;
> select a.code,a.manager,b.assistant from a full join b
> on a.code=b.code;
> quit;
>
> thanks in advance- éèè¢«å¼ç¨æå -
>
> - æ¾ç¤ºå¼ç¨çæå -

hi,paul, this n variables are then generated by your own, the original
data set has no n variable, so the significance of such a merge merger
is very slim. In addition, you such a situation, than those without
when the merge should be simple n a lot of programs from your sql
view, you can illustrate this point

From: "Keintz, H. Mark" on 9 Dec 2009 12:01

Proper is in the eyes of the beholder. And I agree that the behavior of me=
rge In this
case may not be what the OP wanted.

But I can imagine cases where the behavior is appropriate. Maybe you have =
two
Datasets sorted by ID and DATE, and both datasets have the same first (say)=
4
dates for a given ID. But dataset B can have extra records, and you want =
the most
recent record from A to be matched with those records.

I'm not saying this is why SAS made merge work the way it does, but it does=
have
its uses.

From: Craig Johnson [mailto:cjohns38(a)gmail.com]
Sent: Wednesday, December 09, 2009 11:49 AM
To: Keintz, H. Mark
Cc: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: Do we need a NORETAIN statement? was: about full join problem

That's true but it still doesn't answer my original question, how do we kno=
w that this is properly merging the manager with the appropriate assistant?=
Assuming that this is a subset of a dataset it seems unlikely that every =
time there is more than one manager that the last manager should be matched=
with all the remaining cases in the second data set. My gut tells me that=
this type of merge isn't correct but without knowing how they are suppose=
to properly merge (Manager to Assistant) I can't prove it.

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: Windows client connecting to Linux server (SAS/Connect)
Next: Excel Workbook Setting Error Message