about full join problem [SAS]

Prev: Windows client connecting to Linux server (SAS/Connect)
Next: Excel Workbook Setting Error Message

From: Chang Chung on 9 Dec 2009 12:12

On Tue, 8 Dec 2009 21:38:21 -0500, Keintz, H. Mark
<mkeintz(a)WHARTON.UPENN.EDU> wrote:

>Paul:
>
>Your DATA step and proc sql results match each other, but they don't
actually reproduce the merge specified by the OP, which was:
>
> data c;
> merge a b;
> by code;
> run;
>
>The feature to note is that data set B has three records with code=145, and
data set A has only two such records. The OP's merged dataset will have
three records for this code, as does yours. But, in the OP's program, the
last code-145 record of A (i.e. the 2nd record) is matched with all the
"extra" records in dataset B.
>
>As a result, variables in the last (i.e. 2nd) code-145 record from A will
have their values propagated to those "extra" code-145 records (i.e. the
3rd) generated by B. But your results will have missing values for the
variables from A in such extra records.
>
>(I wonder if the OP realizes this property of many-to-many merge in the
data step).
>
>Anyhow, I imagine it would be a very recherche' SQL program that would
reproduce that property, and I hereby pre-emptively lavish high praise on
the individual who can do it in a single CREATE TABLE statement, (and no
preparatory DATA steps either).

Hi, Mark,
Thanks for your pre-emptive lavishing high praise! Refined and tasteful, I
don't know about, though. Below ran on 9.2 TS1M0, W32_VSPRO platform.
Cheers,
Chang

/* test data */
data a;
input code manager$;
cards;
145 max
145 xam
155 paul
165 kobe
;
run;
data b;
input code assistant$;
cards;
145 jerry
145 tracy
145 wade
155 chen
170 yi
;
run;

/* assumes that a and b are sorted by code */
data c_merge;
merge a b;
by code;
run;

/* also assumes that the undocumented monotonic() works */
proc sql;
create table c_sql as
select coalesce(a3.code, b3.code) as code, a3.manager, b3.assistant
from
(select a2.id, a2.obs, max(a2.obs) as max, a2.code, a2.manager
from (
select monotonic() as id, min(calculated id) as min,
calculated id - calculated min as obs, *
from (select * from a
union
(select distinct(b.code) as code, ' ' as manager from b
except
select distinct(a.code) as code, ' ' as manager from a))
group by code) as a2
group by code) as a3,
(select b2.id, b2.obs, max(b2.obs) as max, b2.code, b2.assistant
from (
select monotonic() as id, min(calculated id) as min,
calculated id - calculated min as obs, *
from (select * from b
union
(select distinct(a.code) as code, ' ' as assistant from a
except
select distinct(b.code) as code, ' ' as assistant from b))
group by code) as b2
group by code) as b3
where (a3.code = b3.code) and (
(a3.obs = b3.obs) or
(a3.obs>b3.obs and b3.obs=b3.max) or
(a3.obs<b3.obs and a3.obs=a3.max))
order by a3.id, b3.id;
quit;

proc compare base=c_merge compare=c_sql;
run;
/* on log
NOTE: No unequal values were found. All values compared are exactly equal.
*/

From: "Keintz, H. Mark" on 9 Dec 2009 12:28

So, after all the email exchanges, Chang has provided an
answer to the OP's request, which was how to use PROC SQL
to reproduce the behavior of the SAS MERGE statement
when matching many-to-many.

ALL HAIL CHANG

We now know it can be done, (and, looking at the code below,
I know I never want to do it.)

Chang's solution once again demonstates the high level of
skill and sense of humor intrinsic to this listserve.

Regards,
Mark

> -----Original Message-----
> From: Chang Chung [mailto:chang_y_chung(a)HOTMAIL.COM]
> Sent: Wednesday, December 09, 2009 12:13 PM
> To: SAS-L(a)LISTSERV.UGA.EDU; Keintz, H. Mark
> Subject: Re: about full join problem
>
> On Tue, 8 Dec 2009 21:38:21 -0500, Keintz, H. Mark
> <mkeintz(a)WHARTON.UPENN.EDU> wrote:
>
> >Paul:
> >
> >Your DATA step and proc sql results match each other, but they don't
> actually reproduce the merge specified by the OP, which was:
> >
> > data c;
> > merge a b;
> > by code;
> > run;
> >
> >The feature to note is that data set B has three records with
> code=145, and
> data set A has only two such records. The OP's merged dataset will
> have
> three records for this code, as does yours. But, in the OP's program,
> the
> last code-145 record of A (i.e. the 2nd record) is matched with all the
> "extra" records in dataset B.
> >
> >As a result, variables in the last (i.e. 2nd) code-145 record from A
> will
> have their values propagated to those "extra" code-145 records (i.e.
> the
> 3rd) generated by B. But your results will have missing values for the
> variables from A in such extra records.
> >
> >(I wonder if the OP realizes this property of many-to-many merge in
> the
> data step).
> >
> >Anyhow, I imagine it would be a very recherche' SQL program that would
> reproduce that property, and I hereby pre-emptively lavish high praise
> on
> the individual who can do it in a single CREATE TABLE statement, (and
> no
> preparatory DATA steps either).
>
> Hi, Mark,
> Thanks for your pre-emptive lavishing high praise! Refined and
> tasteful, I
> don't know about, though. Below ran on 9.2 TS1M0, W32_VSPRO platform.
> Cheers,
> Chang
>
> /* test data */
> data a;
> input code manager$;
> cards;
> 145 max
> 145 xam
> 155 paul
> 165 kobe
> ;
> run;
> data b;
> input code assistant$;
> cards;
> 145 jerry
> 145 tracy
> 145 wade
> 155 chen
> 170 yi
> ;
> run;
>
> /* assumes that a and b are sorted by code */
> data c_merge;
> merge a b;
> by code;
> run;
>
> /* also assumes that the undocumented monotonic() works */
> proc sql;
> create table c_sql as
> select coalesce(a3.code, b3.code) as code, a3.manager, b3.assistant
> from
> (select a2.id, a2.obs, max(a2.obs) as max, a2.code, a2.manager
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from a
> union
> (select distinct(b.code) as code, ' ' as manager from b
> except
> select distinct(a.code) as code, ' ' as manager from a))
> group by code) as a2
> group by code) as a3,
> (select b2.id, b2.obs, max(b2.obs) as max, b2.code, b2.assistant
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from b
> union
> (select distinct(a.code) as code, ' ' as assistant from a
> except
> select distinct(b.code) as code, ' ' as assistant from b))
> group by code) as b2
> group by code) as b3
> where (a3.code = b3.code) and (
> (a3.obs = b3.obs) or
> (a3.obs>b3.obs and b3.obs=b3.max) or
> (a3.obs<b3.obs and a3.obs=a3.max))
> order by a3.id, b3.id;
> quit;
>
> proc compare base=c_merge compare=c_sql;
> run;
> /* on log
> NOTE: No unequal values were found. All values compared are exactly
> equal.
> */

From: Sigurd Hermansen on 9 Dec 2009 18:18

Mark:
I'll join the chorus. Even though I have gone to great lengths in the past to demonstrate SQL solutions to tricky database programming problems, I would not have had the patience to piece together this solution. I've admired Chang's programming skills for some time. This feat adds luster to his laurels.

I'll agree with your evaluation and will go beyond it to suggest a caveat: don't use Chang's solution for production work. Aside from the fact that it requires the undocumented and deprecated MONOTONIC() function that I once carelessly exposed on SAS-L, it implements a really bad idea; that is, a many-many MERGE of two datasets. Ian Whitlock posted some years ago the only coherent explanation of what the SAS MERGE does with a many-many join problem that I have seen. His explanation made sense, but I found the MERGE itself in this context both overly complicated and difficult to remember. As best I recall, a long thread on the topic in SAS-L concluded with a near consensus (including Ian) rule. Don't use the SAS MERGE for a many-many join.

I'll argue the case for that rule in more depth if anyone can give me sufficient reason for doing so. Though an interesting technical challenge, replicating the many-many MERGE result in a logical programming context (e.g. SQL) has as much value to database programmers as replicating the
predictions of Ptolemaic astronomy with Copernican astronomy: http://galileo.rice.edu/sci/theories/ptolemaic_system.html . Thanks, Chang, for demonstrated just how convoluted the solution becomes. Reminds me of something that I might have done some years ago if only I had had the patience and skill .... In any event, now that I have the solution on paper, I may be able to use your parts of your solution in a SAS SQL course.

This construct looks particularly interesting:

select * from a
union
(select distinct(b.key) as key, ' ' as d from b
except
select distinct(a.key) as key, ' ' as d from a)

It sets up a BY variable group for dataset that has the key intersection set b complement of a appended to distinct rows of a, with data values from b set to missing. (A longer list of non-key variables would require more assignments of missing values.) So in effect Chang is substituting a SAS BY group structure for the original many-many structure prior to a Cartesian product of two BY groups. Bravo!

How well does the two dataset solution generalize to multiple variable key or three or more dataset cases? Even the mighty Chang may pass on that challenge!
S

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Keintz, H. Mark
Sent: Wednesday, December 09, 2009 12:29 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: about full join problem

So, after all the email exchanges, Chang has provided an
answer to the OP's request, which was how to use PROC SQL
to reproduce the behavior of the SAS MERGE statement
when matching many-to-many.

ALL HAIL CHANG

We now know it can be done, (and, looking at the code below,
I know I never want to do it.)

Chang's solution once again demonstates the high level of
skill and sense of humor intrinsic to this listserve.

Regards,
Mark

> -----Original Message-----
> From: Chang Chung [mailto:chang_y_chung(a)HOTMAIL.COM]
> Sent: Wednesday, December 09, 2009 12:13 PM
> To: SAS-L(a)LISTSERV.UGA.EDU; Keintz, H. Mark
> Subject: Re: about full join problem
>
> On Tue, 8 Dec 2009 21:38:21 -0500, Keintz, H. Mark
> <mkeintz(a)WHARTON.UPENN.EDU> wrote:
>
> >Paul:
> >
> >Your DATA step and proc sql results match each other, but they don't
> actually reproduce the merge specified by the OP, which was:
> >
> > data c;
> > merge a b;
> > by code;
> > run;
> >
> >The feature to note is that data set B has three records with
> code=145, and
> data set A has only two such records. The OP's merged dataset will
> have
> three records for this code, as does yours. But, in the OP's program,
> the
> last code-145 record of A (i.e. the 2nd record) is matched with all the
> "extra" records in dataset B.
> >
> >As a result, variables in the last (i.e. 2nd) code-145 record from A
> will
> have their values propagated to those "extra" code-145 records (i.e.
> the
> 3rd) generated by B. But your results will have missing values for the
> variables from A in such extra records.
> >
> >(I wonder if the OP realizes this property of many-to-many merge in
> the
> data step).
> >
> >Anyhow, I imagine it would be a very recherche' SQL program that would
> reproduce that property, and I hereby pre-emptively lavish high praise
> on
> the individual who can do it in a single CREATE TABLE statement, (and
> no
> preparatory DATA steps either).
>
> Hi, Mark,
> Thanks for your pre-emptive lavishing high praise! Refined and
> tasteful, I
> don't know about, though. Below ran on 9.2 TS1M0, W32_VSPRO platform.
> Cheers,
> Chang
>
> /* test data */
> data a;
> input code manager$;
> cards;
> 145 max
> 145 xam
> 155 paul
> 165 kobe
> ;
> run;
> data b;
> input code assistant$;
> cards;
> 145 jerry
> 145 tracy
> 145 wade
> 155 chen
> 170 yi
> ;
> run;
>
> /* assumes that a and b are sorted by code */
> data c_merge;
> merge a b;
> by code;
> run;
>
> /* also assumes that the undocumented monotonic() works */
> proc sql;
> create table c_sql as
> select coalesce(a3.code, b3.code) as code, a3.manager, b3.assistant
> from
> (select a2.id, a2.obs, max(a2.obs) as max, a2.code, a2.manager
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from a
> union
> (select distinct(b.code) as code, ' ' as manager from b
> except
> select distinct(a.code) as code, ' ' as manager from a))
> group by code) as a2
> group by code) as a3,
> (select b2.id, b2.obs, max(b2.obs) as max, b2.code, b2.assistant
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from b
> union
> (select distinct(a.code) as code, ' ' as assistant from a
> except
> select distinct(b.code) as code, ' ' as assistant from b))
> group by code) as b2
> group by code) as b3
> where (a3.code = b3.code) and (
> (a3.obs = b3.obs) or
> (a3.obs>b3.obs and b3.obs=b3.max) or
> (a3.obs<b3.obs and a3.obs=a3.max))
> order by a3.id, b3.id;
> quit;
>
> proc compare base=c_merge compare=c_sql;
> run;
> /* on log
> NOTE: No unequal values were found. All values compared are exactly
> equal.
> */

From: siyuan li on 9 Dec 2009 23:57

On 12ÔÂ10ÈÕ, ÉÏÎç1Ê±12·Ö, chang_y_ch...(a)HOTMAIL.COM (Chang Chung) wrote:
> On Tue, 8 Dec 2009 21:38:21 -0500, Keintz, H. Mark
>
> <mkei...(a)WHARTON.UPENN.EDU> wrote:
> >Paul:
>
> >Your DATA step and proc sql results match each other, but they don't
>
> actually reproduce the merge specified by the OP, which was:
>
> > data c;
> > merge a b;
> > by code;
> > run;
>
> >The feature to note is that data set B has three records with code=145, and
>
> data set A has only two such records. The OP's merged dataset will have
> three records for this code, as does yours. But, in the OP's program, the
> last code-145 record of A (i.e. the 2nd record) is matched with all the
> "extra" records in dataset B.
>
> >As a result, variables in the last (i.e. 2nd) code-145 record from A will
>
> have their values propagated to those "extra" code-145 records (i.e. the
> 3rd) generated by B. But your results will have missing values for the
> variables from A in such extra records.
>
> >(I wonder if the OP realizes this property of many-to-many merge in the
> data step).
>
> >Anyhow, I imagine it would be a very recherche' SQL program that would
>
> reproduce that property, and I hereby pre-emptively lavish high praise on
> the individual who can do it in a single CREATE TABLE statement, (and no
> preparatory DATA steps either).
>
> Hi, Mark,
> Thanks for your pre-emptive lavishing high praise! Refined and tasteful, I
> don't know about, though. Below ran on 9.2 TS1M0, W32_VSPRO platform.
> Cheers,
> Chang
>
> /* test data */
> data a;
> input code manager$;
> cards;
> 145 max
> 145 xam
> 155 paul
> 165 kobe
> ;
> run;
> data b;
> input code assistant$;
> cards;
> 145 jerry
> 145 tracy
> 145 wade
> 155 chen
> 170 yi
> ;
> run;
>
> /* assumes that a and b are sorted by code */
> data c_merge;
> merge a b;
> by code;
> run;
>
> /* also assumes that the undocumented monotonic() works */
> proc sql;
> create table c_sql as
> select coalesce(a3.code, b3.code) as code, a3.manager, b3.assistant
> from
> (select a2.id, a2.obs, max(a2.obs) as max, a2.code, a2.manager
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from a
> union
> (select distinct(b.code) as code, ' ' as manager from b
> except
> select distinct(a.code) as code, ' ' as manager from a))
> group by code) as a2
> group by code) as a3,
> (select b2.id, b2.obs, max(b2.obs) as max, b2.code, b2.assistant
> from (
> select monotonic() as id, min(calculated id) as min,
> calculated id - calculated min as obs, *
> from (select * from b
> union
> (select distinct(a.code) as code, ' ' as assistant from a
> except
> select distinct(b.code) as code, ' ' as assistant from b))
> group by code) as b2
> group by code) as b3
> where (a3.code = b3.code) and (
> (a3.obs = b3.obs) or
> (a3.obs>b3.obs and b3.obs=b3.max) or
> (a3.obs<b3.obs and a3.obs=a3.max))
> order by a3.id, b3.id;
> quit;
>
> proc compare base=c_merge compare=c_sql;
> run;
> /* on log
> NOTE: No unequal values were found. All values compared are exactly equal..
> */

thank you very much.thanks for helping me

From: giorgio taddia on 22 Dec 2009 12:47

good morning,
i ve got a environment SAS 9.13 on unix, i 'd like to create another ENV ,
it s enough to copy the folder /meta/env/SASmain... to
/meta/env2/SASmain... and then change all the config files like
metadataserver.sh sas9.cfg...ect?
regards

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: Windows client connecting to Linux server (SAS/Connect)
Next: Excel Workbook Setting Error Message