From: Sierra Information Services on
I am glad a solution to this problem has already been identified,
but I wanted to point out that the SOUNDEX function, and the
underlying Soundex algorithm it implements, is a fairly weak way of
measuring similarity/dissimilarity between two text strings. It has
a lot of limitations, especially when used on family names.

You may want to explore the SPEDIS (spelling distance), COMPGED
(compute generalized edit distance) and COMPLEV (compute Levenshtein
edit distance) functions as more powerful tools for your project.
SPEDIS was added in V8 and the other two were added in SAS 9.0. You
can also use the CALL COMPCOST routine in SAS 9 to assign your own
"penalty costs" if you don't like the ones that are implemented by
default in the COMPGED function.
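
As a rough illustration of how these functions can be put side by side,
here is a minimal sketch. The data set name, the variable names and the
sample name pairs are made up for this example, and it uses the default
costs rather than anything set with CALL COMPCOST.

```sas
/* Sketch only: data set, variable names and sample values are hypothetical. */
data name_compare;
   length name1 name2 $16;
   input name1 $ name2 $;
   same_soundex = (soundex(name1) = soundex(name2)); /* 1 if the Soundex codes match */
   spedis_score = spedis(name1, name2);   /* spelling distance (query, keyword) */
   gen_dist     = compged(name1, name2);  /* generalized edit distance, default costs */
   lev_dist     = complev(name1, name2);  /* Levenshtein edit distance */
   datalines;
TAMARI TAMARA
DEVIN DEVON
JULIO JULIA
;
run;

proc print data=name_compare;
run;
```

Lower distance values mean the two strings are closer. Note that SPEDIS
is not symmetric, so swapping the two arguments can change the score.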

There are examples of how to use SOUNDEX, SPEDIS, COMPGED and COMPLEV
in the PDF of my paper "Becoming More FUNCTIONal in SAS 9 Software,"
available for free download at http://www.sierrainformation.com. From
the home page click on "Free Downloads" and take things from there.

Hope this helps

Andrew Karp
Sierra Information Services
http://www.sierrainformation.com



On Mar 11, 11:42 am, Nancy <nancy0...(a)gmail.com> wrote:
> Yes, that is the reason.
> I checked the code again.
> And found that I used the soundex function before I separated the first
> name and middle name for some names.
>
> Thank you very much!
>
> Xiaohong
>
> On Mar 11, 2:16 pm, "data _null_;" <datan...(a)gmail.com> wrote:
>
>
>
> > On Mar 11, 12:58 pm, Nancy <nancy0...(a)gmail.com> wrote:
>
> > > I used the
>
> > > IDF=soundex(first_name)
>
> > > to get the soundex ID for the first name.
>
> > > Is there anything wrong?
>
> > > Thank you!
>
> > > On Mar 11, 12:46 pm, "Lou" <lpog...(a)hotmail.com> wrote:
>
> > > > From the description of the function in the documentation, "TAMARI" should
> > > > encode as T56 - if you're getting anything else, it would appear that you
> > > > have a problem.  But whether it's a problem with your installation or your
> > > > code, we can't tell.  It might be helpful if you posted an example of your
> > > > code.
>
> > > > "Nancy" <nancy0...(a)gmail.com> wrote in message
>
> > > >news:c6d64752-83c8-4db6-972f-138850277d73(a)a18g2000yqc.googlegroups.com...
>
> > > > > Hello All,
>
> > > > > I just found a problem when using the soundex function. It assigned
> > > > > different values for the same names in my data sets. I am wondering
> > > > > whether there is something wrong with my operation or something else.
>
> > > > > Thank you,
>
> > > > > Please see the examples:
>
> > > > > Obs    First_name         IDF
>
> > > > >  1    TAMARI             T5623
> > > > >  2    TAMARI             T56
> > > > >  3    DEVIN              D151
> > > > >  4    DEVIN              D15
> > > > >  5    JULIO              J42
> > > > >  6    JULIO              J4
> > > > >  7    NGOC               N221
> > > > >  8    NGOC               N22
> > > > >  9    TAMARI             T5623
> > > > > 10    TAMARI             T562
>
>
> > I ran the data you posted and got the same soundex values for each
> > pair.  So I don't think the problem is SOUNDEX.  But what could it
> > be?  Different soundex values imply that some of the words were longer,
> > but you see the names as being equal.  I can think of one way that
> > could happen; I'm sure others can think of other ways.  Perhaps the
> > NAMES are formatted with a format that does not display the entire
> > value, as in this example.
>
> > data test;
> >    input First_name $16.  IDF $;
> >    s = soundex(first_name);              /* encodes the full stored value */
> >    s2 = soundex(scan(first_name,1,' ')); /* encodes the first word only */
> >    format First_name $6.;                /* displays only the first 6 characters */
> >    Name = First_name;                    /* unformatted copy shows the full value */
> >    cards;
> > TAMARI J          T5623
> > TAMARI            T56
> > DEVIN  S          D151
> > DEVIN             D15
> > JULIO  H          J42
> > JULIO             J4
> > NGOC   P          N221
> > NGOC              N22
> > TAMARI            T5623
> > TAMARI            T562
> > ;;;;
> >    run;
> > proc print;
> >    run;