From: Nancy on
Hello All,

I just found a problem when using the soundex function. It assigned
different values to the same names in my data sets. I am wondering
whether there is something wrong with my operation or something else.

Thank you,

Please see the examples:

Obs    First_name         IDF

 1    TAMARI             T5623
 2    TAMARI             T56
 3    DEVIN              D151
 4    DEVIN              D15
 5    JULIO              J42
 6    JULIO              J4
 7    NGOC               N221
 8    NGOC               N22
 9    TAMARI             T5623
10    TAMARI             T562

From: Lou on
From the description of the function in the documentation, "TAMARI" should
encode as T56 - if you're getting anything else, it would appear that you
have a problem. But whether it's a problem with your installation or your
code, we can't tell. It might be helpful if you posted an example of your
code.

From: Nancy on

I used the statement

IDF=soundex(first_name)

to get the soundex ID for the first name.

Is there anything wrong?

Thank you!
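
One way to separate the two possibilities Lou raises (installation versus code) is to call SOUNDEX on a literal, outside the data set. This is only a sketch; the variable name s is arbitrary. If the log shows T56, which is what Lou reads from the documentation, the function itself is probably fine and the stored values of first_name are the next thing to check.

```sas
data _null_;
   /* Encode a literal so nothing in the data set can interfere */
   s = soundex('TAMARI');
   put s=;   /* writes the encoded value to the log */
run;
```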





From: data _null_; on
I ran the data you posted and got the same soundex values for each
pair, so I don't think the problem is SOUNDEX. But what could it
be? Different soundex values imply that some of the stored values are
longer than they look, even though you see the names as being equal.
I can think of one way that could happen; I'm sure others can think of
other ways. Perhaps the names are formatted with a format that does
not display the entire value, as in this example.


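data test;
   /* Read 16 columns into First_name so any middle initial is kept */
   input First_name $16.  IDF $;
   s = soundex(first_name);                /* encodes the full stored value */
   s2 = soundex(scan(first_name,1,' '));   /* encodes the first word only */
   format First_name $6.;                  /* display only the first 6 characters */
   Name = First_name;                      /* copy without the $6. format */
   cards;
TAMARI J          T5623
TAMARI            T56
DEVIN  S          D151
DEVIN             D15
JULIO  H          J42
JULIO             J4
NGOC   P          N221
NGOC              N22
TAMARI            T5623
TAMARI            T562
;;;;
   run;
proc print;
   run;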
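
To confirm that a format is hiding part of a value, it can help to print the variable with its format removed, next to its stored length. The following is a sketch that assumes the TEST data set above; the data set name check and the variable len are introduced here only for illustration.

```sas
data check;
   set test;
   len = lengthn(First_name);   /* length excluding trailing blanks */
run;

proc print data=check;
   format First_name;           /* FORMAT with no format name removes $6. for this step */
   var First_name len s s2;
run;
```

Rows where len is larger than the displayed name suggests are the ones carrying extra characters, such as a middle initial.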

From: Nancy on
Yes, that is the reason.
I checked the code again and found that I used the soundex function
before I separated the first name and middle name for some names.

Thank you very much!

Xiaohong
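
A fix along the lines Nancy describes might look like the sketch below, which takes the first word of the name before encoding it, using the same SCAN call as in the example from data _null_. The data set names names and names_fixed, the variable first_only, and its width of 16 are assumptions for illustration, not taken from Nancy's actual code.

```sas
data names_fixed;                /* hypothetical output data set */
   set names;                    /* hypothetical input data set */
   length first_only $ 16;       /* assumed width */
   first_only = scan(first_name, 1, ' ');   /* keep the first word only */
   IDF = soundex(first_only);    /* encode after the middle name is removed */
run;
```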


