From: Mary on 26 Jan 2010 19:32 Trish, The way I see these done is using Proc Format, then output a data set that is long rather than wide; which you can then deal with: proc format; value $diags '171'-'17199', '172'-'17299'='Disease' other='Other'; run; data diags; set z.namc94; informat diagnosis $30.; array diag{3} diag1-diag3; do x=1 to 3; diagnosis=put(diag[x],$diags.); output; end; keep id date_of_service diagnosis; run; Other formats could be added if you have multiple diseases of interest, then you can aggregate this resulting data by diagnosis proc sql noprint; create table ids_with_disease as select distinct id from diags where diagnosis='Disease' order by id; quit; Not sure why yours would not work, however. -Mary --- tboussard(a)GMAIL.COM wrote: From: Trish Bous <tboussard(a)GMAIL.COM> To: SAS-L(a)LISTSERV.UGA.EDU Subject: problems selecting subst Date: Tue, 26 Jan 2010 19:17:14 -0500 The data are numeric: Hi All, I am having some problems selecting a substring of data, and I was hoping someone might have some insight on the problem. I am trying to select ICD-9 codes. My code works successfully in other datasets, but I am not able to select my observations in the below dataset and I am not sure why. My code that works on other datasets is as follows: data z.NMSC; set z.namc94; tag = 0; array prx (3) diag1-diag3; do i = 1 to 3; if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; end; run; In the dataset described below, the code does not select any observations. However, if I change the code to the below, I select a few observations, but not the entire set I want: data z.NMSC; set z.namc94; tag = 0; array prx (3) diag1-diag3; do i = 1 to 3; if prx{i} in ('172100', '172110') then tag = 1; end; run; Any help on the problem? As a work around, I could list all possible codes in the (), but there has to be a better way. Dataset: 26 DIAG1 Num 8 27 DIAG2 Num 8 Example of data: Obs DIAG1 DIAG2 DIAG3 1 172100 147790 900000 2 202509 900000 900000 3 184490 140190 144390 4 159900 207000 900000 5 172110 900000 900000 6 143590 900000 900000 7 141490 141390 900000 8 156400 145580 140190 9 147390 137230 149390 10 173530 147790 900000 11 149600 116200 900000 12 171900 100880 118500 Thank you!
From: Arthur Tabachneck on 26 Jan 2010 19:36 Trish, What are you trying to select? Your current code selects exactly what you specified, namely any records that contain either 172100 or 172110. Art -------- On Tue, 26 Jan 2010 19:17:14 -0500, Trish Bous <tboussard(a)GMAIL.COM> wrote: >The data are numeric: >Hi All, > >I am having some problems selecting a substring of data, and I was hoping >someone might have some insight on the problem. > >I am trying to select ICD-9 codes. My code works successfully in other >datasets, but I am not able to select my observations in the below dataset >and I am not sure why. > >My code that works on other datasets is as follows: > > >data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; > end; >run; > >In the dataset described below, the code does not select any observations. >However, if I change the code to the below, I select a few observations, but >not the entire set I want: > > >data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if prx{i} in ('172100', '172110') then tag = 1; > end; >run; > >Any help on the problem? As a work around, I could list all possible codes >in the (), but there has to be a better way. > > >Dataset: > > 26 DIAG1 Num 8 > 27 DIAG2 Num 8 > >Example of data: > > Obs DIAG1 DIAG2 DIAG3 > > 1 172100 147790 900000 > 2 202509 900000 900000 > 3 184490 140190 144390 > 4 159900 207000 900000 > 5 172110 900000 900000 > 6 143590 900000 900000 > 7 141490 141390 900000 > 8 156400 145580 140190 > 9 147390 137230 149390 > 10 173530 147790 900000 > 11 149600 116200 900000 > 12 171900 100880 118500 > > >Thank you!
From: Muthia Kachirayan on 26 Jan 2010 20:05 Trish, Could this be due to the wrong data type used? SUBSTR() works with a string and you are checking for the equality of '171 OR '172'. Your array PRX[ ] is numeric as well as variables DIAG1 - DIAG3. Be consistent in the use of data type and that would fix your issue. On Tue, Jan 26, 2010 at 8:17 PM, Trish Bous <tboussard(a)gmail.com> wrote: > The data are numeric: > Hi All, > > I am having some problems selecting a substring of data, and I was hoping > someone might have some insight on the problem. > > I am trying to select ICD-9 codes. My code works successfully in other > datasets, but I am not able to select my observations in the below dataset > and I am not sure why. > > My code that works on other datasets is as follows: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; > end; > run; > > In the dataset described below, the code does not select any observations. > However, if I change the code to the below, I select a few observations, > but > not the entire set I want: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if prx{i} in ('172100', '172110') then tag = 1; > end; > run; > > Any help on the problem? As a work around, I could list all possible codes > in the (), but there has to be a better way. > > > Dataset: > > 26 DIAG1 Num 8 > 27 DIAG2 Num 8 > > Example of data: > > Obs DIAG1 DIAG2 DIAG3 > > 1 172100 147790 900000 > 2 202509 900000 900000 > 3 184490 140190 144390 > 4 159900 207000 900000 > 5 172110 900000 900000 > 6 143590 900000 900000 > 7 141490 141390 900000 > 8 156400 145580 140190 > 9 147390 137230 149390 > 10 173530 147790 900000 > 11 149600 116200 900000 > 12 171900 100880 118500 > > > Thank you! >
From: Mike Zdeb on 27 Jan 2010 09:13 hi ... as others have pointed out, you have numeric data and are using a character function so, since your diagnosis codes are numeric and you want to look at the 1st three characters here's an idea that allows you to look at the 1st three characters and not get any of those type conversion notes that you should see in the LOG if you used your original code *** array prx (3) diag1-diag3; do i = 1 to 3; if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; *** the CAT function allows you to treat the array elements as character values and the ":" allows you to just look at the first three characters of various diagnoses ... you can stop the loop once you find a hit ... the equation tag = (cat(prx(_n_)) in : ('171' '172')); produces a 1 if you find a diagnosis and a 0 if you do not * your data; data dx; input diag1-diag3; datalines; 172100 147790 900000 202509 900000 900000 184490 140190 144390 159900 207000 900000 172110 900000 900000 143590 900000 900000 141490 141390 900000 156400 145580 140190 147390 137230 149390 173530 147790 900000 149600 116200 900000 171900 100880 118500 ; run; * a suggestion; data dxplus; set dx; array prx(3) diag1-diag3; do _n_ = 1 to 3 until (tag eq 1); tag = (cat(prx(_n_)) in : ('171' '172')); end; run; no nasty LOG messages ... 408 data dxplus; 409 set dx; 410 array prx(3) diag1-diag3; 411 do _n_ = 1 to 3 until (tag eq 1); 412 tag = (cat(prx(_n_)) in : ('171' '172')); 413 end; 414 run; NOTE: There were 12 observations read from the data set WORK.DX. NOTE: The data set WORK.DXPLUS has 12 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds and the data set... Obs diag1 diag2 diag3 tag 1 172100 147790 900000 1 2 202509 900000 900000 0 3 184490 140190 144390 0 4 159900 207000 900000 0 5 172110 900000 900000 1 6 143590 900000 900000 0 7 141490 141390 900000 0 8 156400 145580 140190 0 9 147390 137230 149390 0 10 173530 147790 900000 0 11 149600 116200 900000 0 12 171900 100880 118500 1 it also works fine with your 5-character strings you can leave the ":" in the tag equation or take it out since you are looking for all 5 characters (works either way) data dxplus; set dx; array prx(3) diag1-diag3; do _n_ = 1 to 3 until (tag eq 1); tag = (cat(prx(_n_)) in ('172100', '172110')); end; run; -- Mike Zdeb U(a)Albany School of Public Health One University Place Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475 > The data are numeric: > Hi All, > > I am having some problems selecting a substring of data, and I was hoping someone might have some insight on the problem. > > I am trying to select ICD-9 codes. My code works successfully in other datasets, but I am not able to select my observations in the below dataset and I am not sure why. > > My code that works on other datasets is as follows: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; > end; > run; > > In the dataset described below, the code does not select any observations. However, if I change the code to the below, I select a few observations, but not the entire set I want: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if prx{i} in ('172100', '172110') then tag = 1; > end; > run; > > Any help on the problem? As a work around, I could list all possible codes in the (), but there has to be a better way. > > > Dataset: > > 26 DIAG1 Num 8 > 27 DIAG2 Num 8 > > Example of data: > > Obs DIAG1 DIAG2 DIAG3 > > 1 172100 147790 900000 2 202509 900000 900000 3 184490 140190 144390 4 159900 207000 900000 5 172110 900000 900000 6 143590 900000 900000 7 141490 141390 900000 8 156400 145580 140190 9 147390 137230 149390 > 10 173530 147790 900000 11 149600 116200 900000 12 171900 100880 118500 > > > Thank you! >
|
Pages: 1 Prev: Interesting finding: Proc SQL create quoted name dataset! Next: sas data xport problem |