Rename columns using lookup table..Plz Help [SAS]

Prev: Macro with variable in text string
Next: help on macro calculation

From: Amar Mundankar on 10 May 2010 07:22

Hi all,
I have following 2 datasets.

1) Dataset AHU:

It has columns as:

AHUTNRX1-AHUTNRX24
ADUMNRX1-ADUMNRX24
DCUMNRX1-DCUMNRX24
U20FNRX1-U20FNRX24
ARUFNRX1-ARUFNRX24
U21DNRX1-U21DNRX24
U22DNRX1-U22DNRX24

Total number of columns = 24 * 7 = 168 columns

The first 3 letters are Product Name. Fourth letter is level code
(T=TC, M=MARKET, F=PRODUCT FAMILY, D=DOSES LEVEL).
The last 4 OR 5 letters are NRX values from 1 through 24 months.

2) Dataset Segment_Definition:
It has columns as TC, MARKET, FAMILY, DOSES, TC_CODE, MARKET_CODE,
FAMILY_CODE, DOSES_CODE.

The dataset is as follows:
TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
FAMILY_CODE DOSES_CODE
AHY ACD ACU ACU AHU
ADU ADU ADU
AHY DTR B30 B30 AHU
DCU U41 U41
AHY CCA ABZ AB2 AHU
CAU U20 U22
AHY ACE ACC ACC AHU
AEU ARU ARU
AHY CCA ABZ AB1 AHU
CAU U20 U22

If we look at AHU dataset then by looking at the 4th letter of every
column, we can say
AHU is TC, ADU and DCU are MARKET, U20 and ARU are PRODUCT FAMILY and
U21, U22 are PRODUCTs.

Now I have to create a new dataseet AHU_NEW.
AHU_NEW will be exact copy of AHU. Only the column names will be
renamed by using the dataset Segment_Definition.
e.g. AHUTNRX1-AHUTNRX24 will be renamed as AHYTNRX1 through AHYTNRX24.

DCUMNRX1-DCUMNRX24 will be renamed as DTRMNRX1 through DTRMNRX24.

If the fouth letter of a column in AHU Dataset is 'F', then consider
columns FAMILY_CODE and FAMILY.
i.e. for U20FNRX1 column in AHU, cosider columns FAMILY_CODE and
FAMILY from Segment_Definition table.
Look for the value U20 in column FAMILY_CODE and the respective
value of FAMILY column is ABZ.
Therefore in the AHU_NEW dataset U20FNRX1-U20FNRX24 columns will be
renamed as ABZFNRX1 through ABZFNRX24.

If the 4th letter is 'D','T' or 'M' then take DOSES_CODE and DOSES,
TC_CODE and TC, MARKET_CODE and MARKET
columns into consideration respectively.

3) Required Dataset AHU_NEW:

It should have columns as:

AHYTNRX1-AHYTNRX24
ACDMNRX1-ACDMNRX24
DTRMNRX1-DTRMNRX24
ABZFNRX1-ABZFNRX24
ACCFNRX1-ACCFNRX24
AB1DNRX1-AB1DNRX24
AB2DNRX1-AB2DNRX24

Any help will be appreciated.
Thanks in Advance.

Regards,
Amar Mundankar.

From: data _null_; on 10 May 2010 08:59

On May 10, 6:22 am, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
> Hi all,
> I have following 2 datasets.
>
> 1) Dataset AHU:
>
> It has columns as:
>
> AHUTNRX1-AHUTNRX24
> ADUMNRX1-ADUMNRX24
> DCUMNRX1-DCUMNRX24
> U20FNRX1-U20FNRX24
> ARUFNRX1-ARUFNRX24
> U21DNRX1-U21DNRX24
> U22DNRX1-U22DNRX24
>
> Total number of columns = 24 * 7 = 168 columns
>
> The first 3 letters are Product Name. Fourth letter is level code
> (T=TC, M=MARKET, F=PRODUCT FAMILY, D=DOSES LEVEL).
> The last 4 OR 5 letters are NRX values from 1 through 24 months.
>
> 2) Dataset Segment_Definition:
> It has columns as TC, MARKET, FAMILY, DOSES, TC_CODE, MARKET_CODE,
> FAMILY_CODE, DOSES_CODE.
>
> The dataset is as follows:
> TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> FAMILY_CODE DOSES_CODE
> AHY ACD ACU ACU AHU
> ADU ADU ADU
> AHY DTR B30 B30 AHU
> DCU U41 U41
> AHY CCA ABZ AB2 AHU
> CAU U20 U22
> AHY ACE ACC ACC AHU
> AEU ARU ARU
> AHY CCA ABZ AB1 AHU
> CAU U20 U22
>
> If we look at AHU dataset then by looking at the 4th letter of every
> column, we can say
> AHU is TC, ADU and DCU are MARKET, U20 and ARU are PRODUCT FAMILY and
> U21, U22 are PRODUCTs.
>
> Now I have to create a new dataseet AHU_NEW.
> AHU_NEW will be exact copy of AHU. Only the column names will be
> renamed by using the dataset Segment_Definition.
> e.g. AHUTNRX1-AHUTNRX24 will be renamed as AHYTNRX1 through AHYTNRX24.
>
> DCUMNRX1-DCUMNRX24 will be renamed as DTRMNRX1 through DTRMNRX24.
>
> If the fouth letter of a column in AHU Dataset is 'F', then consider
> columns FAMILY_CODE and FAMILY.
> i.e. for U20FNRX1 column in AHU, cosider columns FAMILY_CODE and
> FAMILY from Segment_Definition table.
> Look for the value U20 in column FAMILY_CODE and the respective
> value of FAMILY column is ABZ.
> Therefore in the AHU_NEW dataset U20FNRX1-U20FNRX24 columns will be
> renamed as ABZFNRX1 through ABZFNRX24.
>
> If the 4th letter is 'D','T' or 'M' then take DOSES_CODE and DOSES,
> TC_CODE and TC, MARKET_CODE and MARKET
> columns into consideration respectively.
>
> 3) Required Dataset AHU_NEW:
>
> It should have columns as:
>
> AHYTNRX1-AHYTNRX24
> ACDMNRX1-ACDMNRX24
> DTRMNRX1-DTRMNRX24
> ABZFNRX1-ABZFNRX24
> ACCFNRX1-ACCFNRX24
> AB1DNRX1-AB1DNRX24
> AB2DNRX1-AB2DNRX24
>
> Any help will be appreciated.
> Thanks in Advance.
>
> Regards,
> Amar Mundankar.

"storing" so much data in variable names is not a good idea.

To do the rename you need
1) list of names.
2) a lookup table

I used INFORMATS for the lookup table. You could just as well used a
data set. I just sorta got it in my head the formats would be good.

The lookup table you provided was not unique. I changed it, don't
know if I fixed it.

data test;
length
AHUTNRX1-AHUTNRX24
ADUMNRX1-ADUMNRX24
DCUMNRX1-DCUMNRX24
U20FNRX1-U20FNRX24
ARUFNRX1-ARUFNRX24
U21DNRX1-U21DNRX24
U22DNRX1-U22DNRX24 $1;
stop;
call missing(of _all_);
run;
proc transpose data=test(obs=0) out=vars;
var _all_;
run;
proc datasets nowarn;
delete formats / memtype=catalog;
run;
quit;
data control;
retain fmtname 'switch' type 'J' hlo 'UJ ';
infile cards firstobs=2;
array to[4] $3;
array fr[4] $3;
drop to: fr:;
input to[*] fr[*];

do _n_ = 1 to dim(to);
fmtname = cats('SWT_',substr('TMFD',_n_,1));
start = fr[_n_];
label = to[_n_];
output;
end;
output;
cards;
TC MARKET FAMILY DOSES TC_CODE MARKET_CODE FAMILY_CODE DOSES_CODE
AHY ACD ACU ACU AHU ADU ADU ADU
AHY DTR B30 B30 AHU DCU U41 U41
AHY CCA ABZ AB2 AHU CAU U20 U22
AHY ACE ACC ACC AHU AEU ARU ARU
AHY CCA ABZ AB1 AHU CAU U22 U21
;;;;
run;
proc sort data=control nodupkey;
by fmtname start label;
run;
proc format cntlin=control cntlout=cntlout;
run;
proc print;
run;
filename FT44F001 temp;
data _null_;
file FT44F001;
set vars end=eof;
if _n_ eq 1 then put +3 'rename';
newname = _name_;
substr(newname,1,3) = inputC(substr(_name_,
1,3),cats('SWT_',substr(_name_,4,1)));
put +9 _name_ '=' newname;
if eof then put +9 ';';
run;

proc datasets;
modify test;
%inc FT44F001 / source2;
run;
quit;

From: RolandRB on 10 May 2010 10:05

On May 10, 1:22 pm, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
> Hi all,
> I have following 2 datasets.
>
> 1) Dataset AHU:
>
> It has columns as:
>
> AHUTNRX1-AHUTNRX24
> ADUMNRX1-ADUMNRX24
> DCUMNRX1-DCUMNRX24
> U20FNRX1-U20FNRX24
> ARUFNRX1-ARUFNRX24
> U21DNRX1-U21DNRX24
> U22DNRX1-U22DNRX24
>
> Total number of columns = 24 * 7 = 168 columns
>
> The first 3 letters are Product Name. Fourth letter is level code
> (T=TC, M=MARKET, F=PRODUCT FAMILY, D=DOSES LEVEL).
> The last 4 OR 5 letters are NRX values from 1 through 24 months.
>
> 2) Dataset Segment_Definition:
> It has columns as TC, MARKET, FAMILY, DOSES, TC_CODE, MARKET_CODE,
> FAMILY_CODE, DOSES_CODE.
>
> The dataset is as follows:
> TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> FAMILY_CODE DOSES_CODE
> AHY ACD ACU ACU AHU
> ADU ADU ADU
> AHY DTR B30 B30 AHU
> DCU U41 U41
> AHY CCA ABZ AB2 AHU
> CAU U20 U22
> AHY ACE ACC ACC AHU
> AEU ARU ARU
> AHY CCA ABZ AB1 AHU
> CAU U20 U22
>
> If we look at AHU dataset then by looking at the 4th letter of every
> column, we can say
> AHU is TC, ADU and DCU are MARKET, U20 and ARU are PRODUCT FAMILY and
> U21, U22 are PRODUCTs.
>
> Now I have to create a new dataseet AHU_NEW.
> AHU_NEW will be exact copy of AHU. Only the column names will be
> renamed by using the dataset Segment_Definition.
> e.g. AHUTNRX1-AHUTNRX24 will be renamed as AHYTNRX1 through AHYTNRX24.
>
> DCUMNRX1-DCUMNRX24 will be renamed as DTRMNRX1 through DTRMNRX24.
>
> If the fouth letter of a column in AHU Dataset is 'F', then consider
> columns FAMILY_CODE and FAMILY.
> i.e. for U20FNRX1 column in AHU, cosider columns FAMILY_CODE and
> FAMILY from Segment_Definition table.
> Look for the value U20 in column FAMILY_CODE and the respective
> value of FAMILY column is ABZ.
> Therefore in the AHU_NEW dataset U20FNRX1-U20FNRX24 columns will be
> renamed as ABZFNRX1 through ABZFNRX24.
>
> If the 4th letter is 'D','T' or 'M' then take DOSES_CODE and DOSES,
> TC_CODE and TC, MARKET_CODE and MARKET
> columns into consideration respectively.
>
> 3) Required Dataset AHU_NEW:
>
> It should have columns as:
>
> AHYTNRX1-AHYTNRX24
> ACDMNRX1-ACDMNRX24
> DTRMNRX1-DTRMNRX24
> ABZFNRX1-ABZFNRX24
> ACCFNRX1-ACCFNRX24
> AB1DNRX1-AB1DNRX24
> AB2DNRX1-AB2DNRX24
>
> Any help will be appreciated.
> Thanks in Advance.
>
> Regards,
> Amar Mundankar.

Can you do without the lookup table and just use a rename staement
such as the following?

RENAME
AHUTNRX1-AHUTNRX24=AHYTNRX1-AHYTNRX24
ADUMNRX1-ADUMNRX24=ACDMNRX1-ACDMNRX24
DCUMNRX1-DCUMNRX24=DTRMNRX1-DTRMNRX24
U20FNRX1-U20FNRX24=ABZFNRX1-ABZFNRX24
ARUFNRX1-ARUFNRX24=ACCFNRX1-ACCFNRX24
U21DNRX1-U21DNRX24=AB1DNRX1-AB1DNRX24
U22DNRX1-U22DNRX24=AB2DNRX1-AB2DNRX24
;

If you need it to be more complicated and to use the lookup dataset
then you could always create formats from it and use the formats to
generate the rename statement in a macro.

From: RolandRB on 10 May 2010 11:14

On May 10, 1:22 pm, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
> Hi all,
> I have following 2 datasets.
>
> 1) Dataset AHU:
>
> It has columns as:
>
> AHUTNRX1-AHUTNRX24
> ADUMNRX1-ADUMNRX24
> DCUMNRX1-DCUMNRX24
> U20FNRX1-U20FNRX24
> ARUFNRX1-ARUFNRX24
> U21DNRX1-U21DNRX24
> U22DNRX1-U22DNRX24
>
> Total number of columns = 24 * 7 = 168 columns
>
> The first 3 letters are Product Name. Fourth letter is level code
> (T=TC, M=MARKET, F=PRODUCT FAMILY, D=DOSES LEVEL).
> The last 4 OR 5 letters are NRX values from 1 through 24 months.
>
> 2) Dataset Segment_Definition:
> It has columns as TC, MARKET, FAMILY, DOSES, TC_CODE, MARKET_CODE,
> FAMILY_CODE, DOSES_CODE.
>
> The dataset is as follows:
> TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> FAMILY_CODE DOSES_CODE
> AHY ACD ACU ACU AHU
> ADU ADU ADU
> AHY DTR B30 B30 AHU
> DCU U41 U41
> AHY CCA ABZ AB2 AHU
> CAU U20 U22
> AHY ACE ACC ACC AHU
> AEU ARU ARU
> AHY CCA ABZ AB1 AHU
> CAU U20 U22
>
> If we look at AHU dataset then by looking at the 4th letter of every
> column, we can say
> AHU is TC, ADU and DCU are MARKET, U20 and ARU are PRODUCT FAMILY and
> U21, U22 are PRODUCTs.
>
> Now I have to create a new dataseet AHU_NEW.
> AHU_NEW will be exact copy of AHU. Only the column names will be
> renamed by using the dataset Segment_Definition.
> e.g. AHUTNRX1-AHUTNRX24 will be renamed as AHYTNRX1 through AHYTNRX24.
>
> DCUMNRX1-DCUMNRX24 will be renamed as DTRMNRX1 through DTRMNRX24.
>
> If the fouth letter of a column in AHU Dataset is 'F', then consider
> columns FAMILY_CODE and FAMILY.
> i.e. for U20FNRX1 column in AHU, cosider columns FAMILY_CODE and
> FAMILY from Segment_Definition table.
> Look for the value U20 in column FAMILY_CODE and the respective
> value of FAMILY column is ABZ.
> Therefore in the AHU_NEW dataset U20FNRX1-U20FNRX24 columns will be
> renamed as ABZFNRX1 through ABZFNRX24.
>
> If the 4th letter is 'D','T' or 'M' then take DOSES_CODE and DOSES,
> TC_CODE and TC, MARKET_CODE and MARKET
> columns into consideration respectively.
>
> 3) Required Dataset AHU_NEW:
>
> It should have columns as:
>
> AHYTNRX1-AHYTNRX24
> ACDMNRX1-ACDMNRX24
> DTRMNRX1-DTRMNRX24
> ABZFNRX1-ABZFNRX24
> ACCFNRX1-ACCFNRX24
> AB1DNRX1-AB1DNRX24
> AB2DNRX1-AB2DNRX24
>
> Any help will be appreciated.
> Thanks in Advance.
>
> Regards,
> Amar Mundankar.

I think this works. Note that you mapped U21 wrongly in your lookup
dataset so I fixed it for you. Now for a beer.

data ahu;
retain
AHUTNRX1-AHUTNRX24
ADUMNRX1-ADUMNRX24
DCUMNRX1-DCUMNRX24
U20FNRX1-U20FNRX24
ARUFNRX1-ARUFNRX24
U21DNRX1-U21DNRX24
U22DNRX1-U22DNRX24
99
;
run;
proc contents data=ahu out=ahucont(keep=name) noprint;
run;
data ahucont2;
length code $ 4;
set ahucont;
code=name;
run;
proc print data=ahucont2;
run;

data dseg;
informat
TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
FAMILY_CODE DOSES_CODE
$3.
;
input
TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
FAMILY_CODE DOSES_CODE
;
cards;
AHY ACD ACU ACU AHU
ADU ADU ADU
AHY DTR B30 B30 AHU
DCU U41 U41
AHY CCA ABZ AB2 AHU
CAU U20 U22
AHY ACE ACC ACC AHU
AEU ARU ARU
AHY CCA ABZ AB1 AHU
CAU U20 U21
;
run;

proc print data=dseg;
run;

data dseg2;
length code $ 4 fmt $ 4;
set dseg;
code=tc_code||"T";fmt=tc||"T";output;
code=market_code||"M";fmt=market||"M";output;
code=family_code||"F";fmt=family||"F";output;
code=doses_code||"D";fmt=doses||"D";output;
keep code fmt;
run;
proc sort data=dseg2;
by code;
run;
proc print data=dseg2;
run;
data dseg3;
length newname $ 9;
merge dseg2 ahucont2(in=_ahu);
by code;
if _ahu;
newname=fmt||substr(name,5);
keep name newname;
run;
proc print data=dseg3;
run;
filename renlist temp;
data _null_;
set dseg3;
file renlist;
put name "=" newname;
run;
data ahu_new;
set ahu;
run;
proc datasets;
modify ahu_new;
rename
%inc renlist / source2;
;
run;
quit;
proc print data=ahu_new;
run;

From: Amar Mundankar on 11 May 2010 02:33

On May 10, 8:14 pm, RolandRB <rolandbe...(a)hotmail.com> wrote:
> On May 10, 1:22 pm, Amar Mundankar <amarmundan...(a)gmail.com> wrote:
>
>
>
>
>
> > Hi all,
> > I have following 2 datasets.
>
> > 1) Dataset AHU:
>
> > It has columns as:
>
> > AHUTNRX1-AHUTNRX24
> > ADUMNRX1-ADUMNRX24
> > DCUMNRX1-DCUMNRX24
> > U20FNRX1-U20FNRX24
> > ARUFNRX1-ARUFNRX24
> > U21DNRX1-U21DNRX24
> > U22DNRX1-U22DNRX24
>
> > Total number of columns = 24 * 7 = 168 columns
>
> > The first 3 letters are Product Name. Fourth letter is level code
> > (T=TC, M=MARKET, F=PRODUCT FAMILY, D=DOSES LEVEL).
> > The last 4 OR 5 letters are NRX values from 1 through 24 months.
>
> > 2) Dataset Segment_Definition:
> > It has columns as TC, MARKET, FAMILY, DOSES, TC_CODE, MARKET_CODE,
> > FAMILY_CODE, DOSES_CODE.
>
> > The dataset is as follows:
> > TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> > FAMILY_CODE DOSES_CODE
> > AHY ACD ACU ACU AHU
> > ADU ADU ADU
> > AHY DTR B30 B30 AHU
> > DCU U41 U41
> > AHY CCA ABZ AB2 AHU
> > CAU U20 U22
> > AHY ACE ACC ACC AHU
> > AEU ARU ARU
> > AHY CCA ABZ AB1 AHU
> > CAU U20 U22
>
> > If we look at AHU dataset then by looking at the 4th letter of every
> > column, we can say
> > AHU is TC, ADU and DCU are MARKET, U20 and ARU are PRODUCT FAMILY and
> > U21, U22 are PRODUCTs.
>
> > Now I have to create a new dataseet AHU_NEW.
> > AHU_NEW will be exact copy of AHU. Only the column names will be
> > renamed by using the dataset Segment_Definition.
> > e.g. AHUTNRX1-AHUTNRX24 will be renamed as AHYTNRX1 through AHYTNRX24.
>
> > DCUMNRX1-DCUMNRX24 will be renamed as DTRMNRX1 through DTRMNRX24.
>
> > If the fouth letter of a column in AHU Dataset is 'F', then consider
> > columns FAMILY_CODE and FAMILY.
> > i.e. for U20FNRX1 column in AHU, cosider columns FAMILY_CODE and
> > FAMILY from Segment_Definition table.
> > Look for the value U20 in column FAMILY_CODE and the respective
> > value of FAMILY column is ABZ.
> > Therefore in the AHU_NEW dataset U20FNRX1-U20FNRX24 columns will be
> > renamed as ABZFNRX1 through ABZFNRX24.
>
> > If the 4th letter is 'D','T' or 'M' then take DOSES_CODE and DOSES,
> > TC_CODE and TC, MARKET_CODE and MARKET
> > columns into consideration respectively.
>
> > 3) Required Dataset AHU_NEW:
>
> > It should have columns as:
>
> > AHYTNRX1-AHYTNRX24
> > ACDMNRX1-ACDMNRX24
> > DTRMNRX1-DTRMNRX24
> > ABZFNRX1-ABZFNRX24
> > ACCFNRX1-ACCFNRX24
> > AB1DNRX1-AB1DNRX24
> > AB2DNRX1-AB2DNRX24
>
> > Any help will be appreciated.
> > Thanks in Advance.
>
> > Regards,
> > Amar Mundankar.
>
> I think this works. Note that you mapped U21 wrongly in your lookup
> dataset so I fixed it for you. Now for a beer.
>
> data ahu;
> retain
> AHUTNRX1-AHUTNRX24
> ADUMNRX1-ADUMNRX24
> DCUMNRX1-DCUMNRX24
> U20FNRX1-U20FNRX24
> ARUFNRX1-ARUFNRX24
> U21DNRX1-U21DNRX24
> U22DNRX1-U22DNRX24
> 99
> ;
> run;
> proc contents data=ahu out=ahucont(keep=name) noprint;
> run;
> data ahucont2;
> length code $ 4;
> set ahucont;
> code=name;
> run;
> proc print data=ahucont2;
> run;
>
> data dseg;
> informat
> TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> FAMILY_CODE DOSES_CODE
> $3.
> ;
> input
> TC MARKET FAMILY DOSES TC_CODE MARKET_CODE
> FAMILY_CODE DOSES_CODE
> ;
> cards;
> AHY ACD ACU ACU AHU
> ADU ADU ADU
> AHY DTR B30 B30 AHU
> DCU U41 U41
> AHY CCA ABZ AB2 AHU
> CAU U20 U22
> AHY ACE ACC ACC AHU
> AEU ARU ARU
> AHY CCA ABZ AB1 AHU
> CAU U20 U21
> ;
> run;
>
> proc print data=dseg;
> run;
>
> data dseg2;
> length code $ 4 fmt $ 4;
> set dseg;
> code=tc_code||"T";fmt=tc||"T";output;
> code=market_code||"M";fmt=market||"M";output;
> code=family_code||"F";fmt=family||"F";output;
> code=doses_code||"D";fmt=doses||"D";output;
> keep code fmt;
> run;
> proc sort data=dseg2;
> by code;
> run;
> proc print data=dseg2;
> run;
> data dseg3;
> length newname $ 9;
> merge dseg2 ahucont2(in=_ahu);
> by code;
> if _ahu;
> newname=fmt||substr(name,5);
> keep name newname;
> run;
> proc print data=dseg3;
> run;
> filename renlist temp;
> data _null_;
> set dseg3;
> file renlist;
> put name "=" newname;
> run;
> data ahu_new;
> set ahu;
> run;
> proc datasets;
> modify ahu_new;
> rename
> %inc renlist / source2;
> ;
> run;
> quit;
> proc print data=ahu_new;
> run;- Hide quoted text -
>
> - Show quoted text -

Hi RolandDB, data_null,
thanks for your help. It is working for me.
You both are correct. In the last record of Segment_Definition dataset
the value should be U21.
Thanks for correcting.
Cheers.....

Amar Mundankar.

|
Pages: 1
Prev: Macro with variable in text string
Next: help on macro calculation