Scan Function [SAS]


From: Tan on 13 Apr 2010 16:09

I have a quick question. I am hoping someone has an answer to help me,
since I can't seem to find one in any textbook or website.

So basically, I have a dataset with a variable like this:

Andover town, Tolland County
North Canaan town, Litchfield County,

So what I had in mind was:

Town1= scan(Town,1);

But this would only grab the first word, and some towns have two words
(like North Canaan above). If I set it to 2, then towns with one
word would come out as "Andover town". Is there a way to tell the scan
function to grab whatever comes before the word town? Then I can use
the trim function to delete the space between the town name and town.

Thanks

From: Reeza on 13 Apr 2010 22:59

It looks like it's comma-delimited, so try scan(Town,1,",").

If it's not comma-delimited, search for the first instance of town and
then substring based on that:

substr(town, 1, find(town, " town", "i") - 1). Keep an eye on the case;
the "i" modifier makes FIND ignore it.
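A minimal sketch of the two-step idea above: take the piece before the first comma, then cut it at " town". The dataset name towns, the variable lengths, and the helper variable pos are made up for illustration, and the sketch assumes the word "town" is always preceded by a single blank.

```sas
data towns;
  length Town $60 Town1 $40;
  input;
  Town = _infile_;                        /* whole input line */

  /* first comma-delimited piece, e.g. "North Canaan town" */
  Town1 = scan(Town, 1, ',');

  /* position of the first " town", ignoring case (0 if absent) */
  pos = find(Town1, ' town', 'i');

  /* keep only what comes before it; leave Town1 alone if not found */
  if pos > 1 then Town1 = substr(Town1, 1, pos - 1);

  drop pos;
datalines;
Andover town, Tolland County
North Canaan town, Litchfield County,
;
run;
```

For the two sample rows this should leave Town1 as "Andover" and "North Canaan". The pos > 1 check is there because SUBSTR complains about a zero or negative length when " town" is missing.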

I'm sure you can do it in one step using regular expressions, but I
don't know how to do that :)

HTH,
Reeza

From: Barry Schwarz on 14 Apr 2010 00:08

Use INDEXW to find the first occurrence of "town" in variable town.
The substring you are interested in runs from position 1 to the return
value - 2. If the return value is 0, use INDEX to find the comma and
then extract the substring.
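The INDEXW approach could be sketched as below. One assumption to flag: as far as I know INDEXW only treats a blank as a word boundary by default, so "town," would not count as the word "town" unless the comma is passed as an extra delimiter in the third argument. The dataset name towns and the helper variables pos and comma are hypothetical.

```sas
data towns;
  length Town $60 Town1 $40;
  input;
  Town = _infile_;                        /* whole input line */

  /* blank and comma both count as word delimiters here */
  pos = indexw(lowcase(Town), 'town', ' ,');

  if pos > 2 then
    Town1 = substr(Town, 1, pos - 2);     /* drop the blank before "town" */
  else do;
    /* no word "town": fall back to the text before the comma */
    comma = index(Town, ',');
    if comma > 1 then Town1 = substr(Town, 1, comma - 1);
    else Town1 = Town;
  end;

  drop pos comma;
datalines;
Andover town, Tolland County
North Canaan town, Litchfield County,
;
run;
```

LOWCASE is used only for the search, so the extracted name keeps its original capitalization.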

--
Remove del for email

From: Patrick on 14 Apr 2010 05:56

scan() is fine. Just pass ',' as the third argument (the delimiter), as
Reeza suggests.

The documentation is here:

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000214639.htm

From: Richard A. DeVenezia on 14 Apr 2010 15:56

Regular expressions are powerful tools for processing text and are
available in SAS.

data foo;
  input;
  myVariable = _infile_;   /* keep the whole input line */
datalines;
Andover town, Tolland County
North Canaan town, Litchfield County,
run;

data foofoo;
  set foo;
  retain prxid;

  /* compile the pattern once, on the first observation */
  if _n_ = 1 then
    prxid = prxParse('/\s*(.*)\s*town\s*,/i');

  /* capture group 1 is the text before "town," */
  if prxMatch(prxid, myVariable) then
    townname = prxPosN(prxid, 1, myVariable);

  drop prxid;
run;
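For a one-step variant, PRXCHANGE can do the match and the extraction in a single call. This is only a sketch, not part of the code above: the dataset name foofoo2, the length of townname, and the slightly different pattern (anchored at the start, with a lazy capture so the blank before "town" is not captured) are my own choices.

```sas
data foo;
  input;
  myVariable = _infile_;                  /* whole input line */
datalines;
Andover town, Tolland County
North Canaan town, Litchfield County,
;
run;

data foofoo2;
  set foo;
  length townname $40;

  /* replace the whole value with capture group 1, the text before " town,";
     a value that does not match the pattern is returned unchanged */
  townname = prxchange('s/^\s*(.*?)\s+town\s*,.*$/$1/i', 1, myVariable);
run;
```

Because an unmatched value comes back unchanged, it may be worth checking with PRXMATCH first if the data contains rows without "town,".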

--
Richard A. DeVenezia
http://www.devenezia.com
