From: Regi on 5 Jul 2010 07:00 Hi, Recently I was trying to merge 2 datasets as below. My objective to get the values of 'pop' by 'num' in the dataset 'sur'. This works, but zip 11 is being replaced by 1. Can somebody explain why it so?? I thought using 'if a;' will ignore 'zip' in the second dataset. Thanks, Regi DATA sur; INPUT zip num sales; DATALINES; 11 1453 26425.94 16 1453 6161.03 17 1453 24813.87 18 1453 9084.55 19 1453 24462.37 11 1454 55020.42 14 1454 26304.23 16 1454 4248.89 17 1454 23762.24 18 1454 10328.93 19 1454 19316.61 ; DATA pop; INPUT zip num pop; DATALINES; 1 1453 2 1 1454 4 ; data surpop; merge sur (in=a) pop; by num; if a; run; proc print data=surpop; Obs zip num sales pop 1 1 1453 26425.94 2 2 16 1453 6161.03 2 3 17 1453 24813.87 2 4 18 1453 9084.55 2 5 19 1453 24462.37 2 6 1 1454 55020.42 4 7 14 1454 26304.23 4 8 16 1454 4248.89 4 9 17 1454 23762.24 4 10 18 1454 10328.93 4 11 19 1454 19316.61 4
From: Arthur Tabachneck on 5 Jul 2010 08:04 Regi, Try reversing the order in which you are merging the two files. i.e.,: data surpop; merge pop sur (in=a); by num; if a; run; HTH, Art ------------- On Jul 5, 7:00 am, Regi <reg...(a)gmail.com> wrote: > Hi, > > Recently I was trying to merge 2 datasets as below. My objective to > get the values of 'pop' by 'num' in the dataset 'sur'. This works, but > zip 11 is being replaced by 1. > > Can somebody explain why it so?? I thought using 'if a;' will ignore > 'zip' in the second dataset. > > Thanks, Regi > > DATA sur; > INPUT zip num sales; > DATALINES; > 11 1453 26425.94 > 16 1453 6161.03 > 17 1453 24813.87 > 18 1453 9084.55 > 19 1453 24462.37 > 11 1454 55020.42 > 14 1454 26304.23 > 16 1454 4248.89 > 17 1454 23762.24 > 18 1454 10328.93 > 19 1454 19316.61 > ; > > DATA pop; > INPUT zip num pop; > DATALINES; > 1 1453 2 > 1 1454 4 > ; > > data surpop; > merge sur (in=a) pop; > by num; if a; > run; > > proc print data=surpop; > > Obs zip num sales pop > > 1 1 1453 26425.94 2 > 2 16 1453 6161.03 2 > 3 17 1453 24813.87 2 > 4 18 1453 9084.55 2 > 5 19 1453 24462.37 2 > 6 1 1454 55020.42 4 > 7 14 1454 26304.23 4 > 8 16 1454 4248.89 4 > 9 17 1454 23762.24 4 > 10 18 1454 10328.93 4 > 11 19 1454 19316.61 4
From: kvasikonkav on 5 Jul 2010 17:06 It looks like the variable "zip" in data "pop" is not needed. If this is true, just drop the "zip" variable in the "pop" data, and try merging. hth
From: Regi on 6 Jul 2010 01:37 Thanks you very much for your responses. As I mentioned, I was not looking for a working solution. But CAN SOMEBODY EXPLAIN WHY ZIP 11 IS REPLACED BY 1?? I like to understand the logic so that I will not repeat this in future. Thanks, Regi
From: Arthur Tabachneck on 6 Jul 2010 07:38
Regi, The explanation you are looking for was provided by Ian and shown in my last response. He wrote: >>> Both POP and SUR have the same variable ZIP, but ZIP is not >>> in the by group. Only one of the zip variables can win. In >>> SAS the first data set with ZIP determines the characteristics >>> and the last determines the value. Hence 1 beats 11. (If >>> there were two records with ZIP=11 for the same NUM, the 11 >>> would trump 1 on the second record.) >>> Rule 1: Never have a common variable name that is not in the >>> by group. >>> Cor: If you have the same variable name in more than one data >>> set, then either MERGE is the wrong tool, or you should do >>> something to avoid the collision, i.e rename or drop. >>> The proper answer is >>>> data surpop; >>> merge sur (in=a) pop ( drop = zip ); >>>> by num; if a; >>>> run; >>> Cor: The order of data sets in a properly constructed merge >>> should never matter. >>> If it does it is invitation to problems and maintenance nightmares. >>> Ian Whitlock Art ------------- On Jul 6, 1:37 am, Regi <reg...(a)gmail.com> wrote: > Thanks you very much for your responses. As I mentioned, I was not > looking for a working solution. > > But CAN SOMEBODY EXPLAIN WHY ZIP 11 IS REPLACED BY 1?? I like to > understand the logic so that I will not repeat this in future. > > Thanks, Regi |