problems selecting subst [SAS]

Prev: Interesting finding: Proc SQL create quoted name dataset!
Next: sas data xport problem

From: Mary on 26 Jan 2010 19:32

Trish,

The way I see these done is using Proc Format, then output a data set that is long rather than wide; which you can then deal with:

proc format;
value $diags
'171'-'17199',
'172'-'17299'='Disease'
other='Other';
run;

data diags;
set z.namc94;
informat diagnosis $30.;
array diag{3} diag1-diag3;
do x=1 to 3;
diagnosis=put(diag[x],$diags.);
output;
end;
keep id date_of_service diagnosis;
run;

Other formats could be added if you have multiple diseases of interest, then you can aggregate this resulting data by diagnosis

proc sql noprint;
create table ids_with_disease
as select distinct id from diags
where diagnosis='Disease'
order by id;
quit;

Not sure why yours would not work, however.

-Mary

--- tboussard(a)GMAIL.COM wrote:

From: Trish Bous <tboussard(a)GMAIL.COM>
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: problems selecting subst
Date: Tue, 26 Jan 2010 19:17:14 -0500

The data are numeric:
Hi All,

I am having some problems selecting a substring of data, and I was hoping
someone might have some insight on the problem.

I am trying to select ICD-9 codes. My code works successfully in other
datasets, but I am not able to select my observations in the below dataset
and I am not sure why.

My code that works on other datasets is as follows:

data z.NMSC;
set z.namc94;
tag = 0;
array prx (3) diag1-diag3;
do i = 1 to 3;
if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
end;
run;

In the dataset described below, the code does not select any observations.
However, if I change the code to the below, I select a few observations, but
not the entire set I want:

data z.NMSC;
set z.namc94;
tag = 0;
array prx (3) diag1-diag3;
do i = 1 to 3;
if prx{i} in ('172100', '172110') then tag = 1;
end;
run;

Any help on the problem? As a work around, I could list all possible codes
in the (), but there has to be a better way.

Dataset:

26 DIAG1 Num 8
27 DIAG2 Num 8

Example of data:

Obs DIAG1 DIAG2 DIAG3

1 172100 147790 900000
2 202509 900000 900000
3 184490 140190 144390
4 159900 207000 900000
5 172110 900000 900000
6 143590 900000 900000
7 141490 141390 900000
8 156400 145580 140190
9 147390 137230 149390
10 173530 147790 900000
11 149600 116200 900000
12 171900 100880 118500

Thank you!

From: Arthur Tabachneck on 26 Jan 2010 19:36

Trish,

What are you trying to select? Your current code selects exactly what you
specified, namely any records that contain either 172100 or 172110.

Art
--------
On Tue, 26 Jan 2010 19:17:14 -0500, Trish Bous <tboussard(a)GMAIL.COM> wrote:

>The data are numeric:
>Hi All,
>
>I am having some problems selecting a substring of data, and I was hoping
>someone might have some insight on the problem.
>
>I am trying to select ICD-9 codes. My code works successfully in other
>datasets, but I am not able to select my observations in the below dataset
>and I am not sure why.
>
>My code that works on other datasets is as follows:
>
>
>data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
> end;
>run;
>
>In the dataset described below, the code does not select any observations.
>However, if I change the code to the below, I select a few observations,
but
>not the entire set I want:
>
>
>data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if prx{i} in ('172100', '172110') then tag = 1;
> end;
>run;
>
>Any help on the problem? As a work around, I could list all possible
codes
>in the (), but there has to be a better way.
>
>
>Dataset:
>
> 26 DIAG1 Num 8
> 27 DIAG2 Num 8
>
>Example of data:
>
> Obs DIAG1 DIAG2 DIAG3
>
> 1 172100 147790 900000
> 2 202509 900000 900000
> 3 184490 140190 144390
> 4 159900 207000 900000
> 5 172110 900000 900000
> 6 143590 900000 900000
> 7 141490 141390 900000
> 8 156400 145580 140190
> 9 147390 137230 149390
> 10 173530 147790 900000
> 11 149600 116200 900000
> 12 171900 100880 118500
>
>
>Thank you!

From: Muthia Kachirayan on 26 Jan 2010 20:05

Trish,

Could this be due to the wrong data type used? SUBSTR() works with a string
and you are checking for the equality of '171 OR '172'. Your array PRX[ ] is
numeric as well as variables DIAG1 - DIAG3. Be consistent in the use of data
type and that would fix your issue.

On Tue, Jan 26, 2010 at 8:17 PM, Trish Bous <tboussard(a)gmail.com> wrote:

> The data are numeric:
> Hi All,
>
> I am having some problems selecting a substring of data, and I was hoping
> someone might have some insight on the problem.
>
> I am trying to select ICD-9 codes. My code works successfully in other
> datasets, but I am not able to select my observations in the below dataset
> and I am not sure why.
>
> My code that works on other datasets is as follows:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
> end;
> run;
>
> In the dataset described below, the code does not select any observations.
> However, if I change the code to the below, I select a few observations,
> but
> not the entire set I want:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if prx{i} in ('172100', '172110') then tag = 1;
> end;
> run;
>
> Any help on the problem? As a work around, I could list all possible codes
> in the (), but there has to be a better way.
>
>
> Dataset:
>
> 26 DIAG1 Num 8
> 27 DIAG2 Num 8
>
> Example of data:
>
> Obs DIAG1 DIAG2 DIAG3
>
> 1 172100 147790 900000
> 2 202509 900000 900000
> 3 184490 140190 144390
> 4 159900 207000 900000
> 5 172110 900000 900000
> 6 143590 900000 900000
> 7 141490 141390 900000
> 8 156400 145580 140190
> 9 147390 137230 149390
> 10 173530 147790 900000
> 11 149600 116200 900000
> 12 171900 100880 118500
>
>
> Thank you!
>

From: Mike Zdeb on 27 Jan 2010 09:13

hi ... as others have pointed out, you have numeric data and are using a character function

so, since your diagnosis codes are numeric and you want to look at the 1st three characters
here's an idea that allows you to look at the 1st three characters and not get any of those
type conversion notes that you should see in the LOG if you used your original code

***
array prx (3) diag1-diag3;
do i = 1 to 3;
if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
***

the CAT function allows you to treat the array elements as character values and the ":" allows
you to just look at the first three characters of various diagnoses ... you can stop the loop
once you find a hit ... the equation

tag = (cat(prx(_n_)) in : ('171' '172'));

produces a 1 if you find a diagnosis and a 0 if you do not

* your data;
data dx;
input diag1-diag3;
datalines;
172100 147790 900000
202509 900000 900000
184490 140190 144390
159900 207000 900000
172110 900000 900000
143590 900000 900000
141490 141390 900000
156400 145580 140190
147390 137230 149390
173530 147790 900000
149600 116200 900000
171900 100880 118500
;
run;

* a suggestion;
data dxplus;
set dx;
array prx(3) diag1-diag3;
do _n_ = 1 to 3 until (tag eq 1);
tag = (cat(prx(_n_)) in : ('171' '172'));
end;
run;

no nasty LOG messages ...
408 data dxplus;
409 set dx;
410 array prx(3) diag1-diag3;
411 do _n_ = 1 to 3 until (tag eq 1);
412 tag = (cat(prx(_n_)) in : ('171' '172'));
413 end;
414 run;
NOTE: There were 12 observations read from the data set WORK.DX.
NOTE: The data set WORK.DXPLUS has 12 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

and the data set...
Obs diag1 diag2 diag3 tag
1 172100 147790 900000 1
2 202509 900000 900000 0
3 184490 140190 144390 0
4 159900 207000 900000 0
5 172110 900000 900000 1
6 143590 900000 900000 0
7 141490 141390 900000 0
8 156400 145580 140190 0
9 147390 137230 149390 0
10 173530 147790 900000 0
11 149600 116200 900000 0
12 171900 100880 118500 1

it also works fine with your 5-character strings
you can leave the ":" in the tag equation or take it out since
you are looking for all 5 characters (works either way)

data dxplus;
set dx;
array prx(3) diag1-diag3;
do _n_ = 1 to 3 until (tag eq 1);
tag = (cat(prx(_n_)) in ('172100', '172110'));
end;
run;

--
Mike Zdeb
U(a)Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475

> The data are numeric:
> Hi All,
>
> I am having some problems selecting a substring of data, and I was hoping someone might have some insight on the problem.
>
> I am trying to select ICD-9 codes. My code works successfully in other datasets, but I am not able to select my observations in the below dataset
and I am not sure why.
>
> My code that works on other datasets is as follows:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
> end;
> run;
>
> In the dataset described below, the code does not select any observations. However, if I change the code to the below, I select a few
observations, but not the entire set I want:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if prx{i} in ('172100', '172110') then tag = 1;
> end;
> run;
>
> Any help on the problem? As a work around, I could list all possible codes in the (), but there has to be a better way.
>
>
> Dataset:
>
> 26 DIAG1 Num 8
> 27 DIAG2 Num 8
>
> Example of data:
>
> Obs DIAG1 DIAG2 DIAG3
>
> 1 172100 147790 900000 2 202509 900000 900000 3 184490 140190 144390 4 159900
207000 900000 5 172110 900000 900000 6 143590 900000 900000 7 141490 141390
900000 8 156400 145580 140190 9 147390 137230 149390
> 10 173530 147790 900000 11 149600 116200 900000 12 171900 100880 118500
>
>
> Thank you!
>

|
Pages: 1
Prev: Interesting finding: Proc SQL create quoted name dataset!
Next: sas data xport problem