From: Regi on
Hi,

Recently I was trying to merge 2 datasets as below. My objective to
get the values of 'pop' by 'num' in the dataset 'sur'. This works, but
zip 11 is being replaced by 1.

Can somebody explain why it so?? I thought using 'if a;' will ignore
'zip' in the second dataset.

Thanks, Regi

DATA sur;
INPUT zip num sales;
DATALINES;
11 1453 26425.94
16 1453 6161.03
17 1453 24813.87
18 1453 9084.55
19 1453 24462.37
11 1454 55020.42
14 1454 26304.23
16 1454 4248.89
17 1454 23762.24
18 1454 10328.93
19 1454 19316.61
;

DATA pop;
INPUT zip num pop;
DATALINES;
1 1453 2
1 1454 4
;

data surpop;
merge sur (in=a) pop;
by num; if a;
run;

proc print data=surpop;

Obs zip num sales pop

1 1 1453 26425.94 2
2 16 1453 6161.03 2
3 17 1453 24813.87 2
4 18 1453 9084.55 2
5 19 1453 24462.37 2
6 1 1454 55020.42 4
7 14 1454 26304.23 4
8 16 1454 4248.89 4
9 17 1454 23762.24 4
10 18 1454 10328.93 4
11 19 1454 19316.61 4
From: Arthur Tabachneck on
Regi,

Try reversing the order in which you are merging the two files.
i.e.,:

data surpop;
merge pop sur (in=a);
by num; if a;
run;

HTH,
Art
-------------
On Jul 5, 7:00 am, Regi <reg...(a)gmail.com> wrote:
> Hi,
>
> Recently I was trying to merge 2 datasets as below. My objective to
> get the values of 'pop' by 'num' in the dataset 'sur'. This works, but
> zip 11 is being replaced by 1.
>
> Can somebody explain why it so?? I thought using 'if a;' will ignore
> 'zip' in the second dataset.
>
> Thanks, Regi
>
> DATA sur;
>    INPUT zip num sales;
>    DATALINES;
> 11   1453   26425.94
> 16   1453    6161.03
> 17   1453   24813.87
> 18   1453    9084.55
> 19   1453   24462.37
> 11   1454   55020.42
> 14   1454   26304.23
> 16   1454    4248.89
> 17   1454   23762.24
> 18   1454   10328.93
> 19   1454   19316.61
> ;
>
> DATA pop;
>    INPUT zip num pop;
>    DATALINES;
> 1   1453  2
> 1   1454  4
> ;
>
> data surpop;
>         merge sur (in=a) pop;
>         by num; if a;
> run;
>
> proc print data=surpop;
>
> Obs    zip     num      sales     pop
>
>   1      1    1453    26425.94     2
>   2     16    1453     6161.03     2
>   3     17    1453    24813.87     2
>   4     18    1453     9084.55     2
>   5     19    1453    24462.37     2
>   6      1    1454    55020.42     4
>   7     14    1454    26304.23     4
>   8     16    1454     4248.89     4
>   9     17    1454    23762.24     4
>  10     18    1454    10328.93     4
>  11     19    1454    19316.61     4

From: kvasikonkav on
It looks like the variable "zip" in data "pop" is not needed. If this
is true, just drop the "zip" variable in the "pop" data, and try
merging.

hth
From: Regi on
Thanks you very much for your responses. As I mentioned, I was not
looking for a working solution.

But CAN SOMEBODY EXPLAIN WHY ZIP 11 IS REPLACED BY 1?? I like to
understand the logic so that I will not repeat this in future.

Thanks, Regi
From: Arthur Tabachneck on
Regi,

The explanation you are looking for was provided by Ian and shown in
my last response. He wrote:

>>> Both POP and SUR have the same variable ZIP, but ZIP is not
>>> in the by group. Only one of the zip variables can win. In
>>> SAS the first data set with ZIP determines the characteristics
>>> and the last determines the value. Hence 1 beats 11. (If
>>> there were two records with ZIP=11 for the same NUM, the 11
>>> would trump 1 on the second record.)

>>> Rule 1: Never have a common variable name that is not in the
>>> by group.

>>> Cor: If you have the same variable name in more than one data
>>> set, then either MERGE is the wrong tool, or you should do
>>> something to avoid the collision, i.e rename or drop.

>>> The proper answer is

>>>> data surpop;
>>> merge sur (in=a) pop ( drop = zip );
>>>> by num; if a;
>>>> run;

>>> Cor: The order of data sets in a properly constructed merge
>>> should never matter.

>>> If it does it is invitation to problems and maintenance nightmares.

>>> Ian Whitlock

Art
-------------
On Jul 6, 1:37 am, Regi <reg...(a)gmail.com> wrote:
> Thanks you very much for your responses. As I mentioned, I was not
> looking for a working solution.
>
> But CAN SOMEBODY EXPLAIN WHY ZIP 11 IS REPLACED BY 1??  I like to
> understand the logic so that I will not repeat this in future.
>
> Thanks, Regi