From: Kevin Myers on 23 Jan 2010 06:37 Right Ted, these are the types of problems that my proposed enhancement is intended to solve. There is no way to prevent conversion to character on the first transpose, because values for multiple columns with differing types are being forced into a single output column. But upon transposing the data a second time, the proposed enhancements would generally allow the original variables types and other attributes to be restored. However, your examples raise a couple of potential problems that I failed to account for previously: 1) What if there is no one-to-one correspondence between the formatted and informatted values, or 2) what it there is a format but no corresponding informat or vice-versa? In these cases, my proposal as it currently stands would not produce the desired result, because there would be no way to convert the formatted values back into the original raw data values. This problem suggests the need for two additional options on the PROC FORMAT statement in order to completely implement the desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to force default formats when outputting values and NOINFORMAT would be used to force default informats when reading values. Note that these options would only prevent non-default formats and informats from being actively *used* for converting values during processing, they would NOT prevent the format and informat attribute values from being written to or read from variables identified by using the proposed FORMAT= and INFORMAT= options or the IDFORMAT and IDINFORMAT statements. I will add this discussion to my proposal. s/KAM ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Friday, January 22, 2010 23:57 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Hi Kevin, > > The default behavior of Proc Transpose is making it difficult to achieve > the > preservation of variable attributes. The problem is with the values > themselves. > > Take a look at this: > > proc format; > value nfmt 1='One%' 2='Two$two' 3='%three'; > value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; > run; > > Data testing; > Input Group $ N C $ D mmddyy10.; > label N='Numeric var' > C='Text' > D='Birthday'; > format n nfmt. c $cfmt. D mmddyy10.; > Datalines; > G1 1 A 01/15/2010 > ; > proc transpose data=testing out=Tran; > var N C D; > run; > > proc print data=tran noobs; > run; > > PRODUCES: > > _NAME_ _LABEL_ COL1 > > N Numeric var One% > C Text A > D Birthday 01/15/2010 > > The problem is that the numeric variable N has been converted to character > using the format we assigned to it. The original numeric value is lost. > If > we prevent this by removing the format, we lose the format attribute. Is > this Catch-22.0? > > Ted > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Kevin > Myers > Sent: Thursday, January 21, 2010 1:30 PM > To: SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Hi Ted, > > The proposed SUGI ballot item enhancements to PROC TRANSPOSE could > potentially help with your process to some degree, even though > fundamentally > my proposal only deals with variable attributes, whereas your needs > involve > an actual rearrangement of the data into different observations and > variables. > > If you first transposed your data into "long-skinny" form using a BY > statement, then provided a distinct _NAME_ variable for each individual > observation in each BY group of the long-skinny dataset using logic of > your > own, then transposed that result back into short-wide form once again > using > a BY statement, then you could achieve roughly what you are looking for, > and > perhaps this is what you have already tried. > > The contribution that my proposal could make to this process is that you > would be able to use the new PROC TRANSPOSE statements and options to > force > the resulting variables in the final data set (after transposing twice) to > have all of the same types, lengths, formats, and informats as they did in > the original data set, rather than all being changed to character > variables > with none of the other original attributes. If maintaining the original > variable types and attributes is important, then my proposal could > substantially simplify your process. > > s/KAM > > P.S. - Hope you don't mind that I copied this reply back to the list. I > thought others might also be interested. > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> > Sent: Thursday, January 21, 2010 13:50 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> Thanks for filling the gaps. Yes, what I am describing creates a >> differently >> shaped output data set, so is not equivalent to your proposal. Let any >> confusion on that point end here. >> >> The ability to create and work with a "self-defining" skinny data set >> would >> be great. >> >> The situation I run into all the time (say, weekly) is where the data is >> in >> groups, and I have a mixture of variable types: numeric, character, >> numeric >> date. I want to get all the data in a group onto one observation. I >> sometimes have an ID variable, or at other times just want numeric >> suffixes >> (1,2,3..). >> Would your proposal make that easier to do than it is now? My hunch is >> yes, >> but it would not be a slam-dunk: >> Proc transpose, with the additional options you propose on the Proc >> statement. >> Data step to calculate new variable names, and possibly labels. >> (Calculate >> numeric suffixes?) >> Proc transpose, with the additional ID statements you propose. >> >> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >> guess my dog "Spot" was tugging on my pants-leg. >> Proc transpose data=testing out=trans ONEOUT; >> By group; >> Var N D; >> ID C; >> Run; >> >> Ted >> >> >> >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 9:27 AM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Ted and all - >> >> The current item on the SASware ballot was put on there largely based on >> my >> input. I provided a detailed description of the suggested enhancements >> via >> SAS Institute's online Tracks system: >> >> 7610240654 - Base SAS: Additional options and statements for PROC >> TRANSPOSE >> to support transposing additional variable attributes, including format, >> informat, type, and length. >> >> I don't know whether you will be able to view that, but am short on time >> at >> the moment, so below is an excerpt from my original submission. Please >> note >> that my proposal does NOT require any changes to existing PROC TRANSPOSE >> behavior, but rather adds support for additional options and statements >> to >> support handling additional variable attributes analogous to the existing >> support for variable lablels. >> >> I haven't had time to review your suggestion in detail, but after briefly >> scanning through it I believe that you are asking for something that is >> substantially different in concept. If so, then I would suggest that you >> submit a separate enhancement suggestion, because I wouldn't want your >> proposal to be confused with or complicate the implementation of mine, or >> vice-versa. >> >> Hope this helps. >> Kevin M. >> >> from: kevinmyers(a)austin.rr.com >> name: Kevin A. Myers >> company: Myers Engineering >> product: BASE >> os: none >> suggestion text: PROC TRANSPOSE should be enhanced to support transposing >> additional variable attributes including format, >> informat, type, and length. >> >> The following new STATEMENTS should be added. These are similar to the >> existing idlabel statement, but provide input variable >> names for the other variable attributes: >> >> idformat statement - variable name to provide formats >> idinformat statement - variable name to provide informats >> idtype statement - variable name to provide variable types >> idlength statement - variable name to provide variable lengths >> >> The PROC TRANSPOSE statement should be enhanced to add the following >> additional OPTIONS, which all perform similar functions to the >> existing label option, but for the other variable attributes: >> >> format - output variable name for formats (default _FORMAT_) >> informat - output variable name for informats (default _INFORMAT_) >> type - output variable name for types (default _TYPE_) >> length - output variable name for lengths (default _LENGTH_) >> >> Providing the new statements and options listed above would allow for >> simple, complete transformation of a data set from >> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >> vice-versa, >> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >> information. It would also greatly simplify >> input and output of data from/to "self-defining" external file formats >> where >> the format, informat, type, and/or length attributes >> are included within the external data file. >> >> Availability of the proposed new statements and options would allow >> simple >> loops based on "long-skinny" data representation >> to be used for reading/writing data in self-defining formats, >> followed/preceded by proc transpose steps to convert the data to/from a >> more >> typical "short-wide" representation. >> >> Code to implement the proposed statements and options should be >> relatively >> easy to develop by simply duplicating the existing >> support for IDLABEL/LABEL= and making relatively minor modifications as >> needed for each of the other specific attributes (format, >> informat, type, length). >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >> To: <SAS-L(a)LISTSERV.UGA.EDU> >> Sent: Thursday, January 21, 2010 09:33 >> Subject: SASware ballot #18 Proc Transpose preserving variable attributes >> >> >>> SASware ballot #18 refers to Proc Transpose preserving variable >>> attributes. >>> >>> I'm familiar with the problem and frequently use a macro to solve it, >>> but >>> would be delighted to see this feature migrate into Base SAS. >>> >>> >>> >>> I am soliciting input from SAS-L in the hopes that this change will be >>> implemented in the most useful way. >>> >>> Posting your comments may have some influence on a future enhancement of >>> Proc Transpose. >>> >>> >>> >>> First off, I think this requires limiting the output to one observation >>> per >>> BY-group. I can't see any way around that. >>> >>> >>> >>> Assuming that is true, one idea for implementing the change to Proc >>> Transpose would be to add a "ONEOUT" option on the proc statement. >>> >>> >>> >>> Example: >>> >>> Data testing; >>> >>> Input Group N C $ D mmddyy10.; >>> >>> format d mmddyy10.; >>> >>> label N='Numeric var' >>> >>> C='Text' >>> >>> D='Birthday'; >>> >>> Datalines; >>> >>> 1 57 A 01/15/2010 >>> >>> 1 84 B 02/15/2010 >>> >>> 2 33 B 03/15/2010 >>> >>> ; >>> >>> Proc transpose data=testing out=trans OUTOUT; >>> >>> By group; >>> >>> Var N D; >>> >>> Run; >>> >>> >>> >>> Proc print data=trans; >>> >>> Would produce: >>> >>> Obs Group N1 D1 N2 D2 >>> >>> >>> >>> 1 1 57 01/15/2010 84 02/15/2010 >>> >>> 2 2 33 03/15/2010 >>> >>> >>> >>> With the ID statement: >>> >>> Proc transpose data=testing out=trans OUTOUT; >>> >>> By group; >>> >>> Var N D; >>> >>> ID C; >>> >>> Run; >>> >>> >>> >>> Would produce: >>> >>> Obs Group NA DA NB DB >>> >>> >>> >>> 1 1 57 01/15/2010 84 02/15/2010 >>> >>> 2 2 33 03/15/2010 >>> >>> >>> >>> Other possible options come up in naming the output variables: (1) ID >>> values >>> before names (2) a separator between ID values and names, and (3) >>> whether >>> and how to modify labels. >>> >> >> >> > > >
From: "Data _null_;" on 23 Jan 2010 08:20 You boys now see why TRANSPOSE does not try to preserve information that is no longer relevant. Please don't mess with TRANSPOSE. It ain't broke and don't need fixin'. On 1/23/10, Kevin Myers <KevinMyers(a)austin.rr.com> wrote: > Right Ted, these are the types of problems that my proposed enhancement is > intended to solve. There is no way to prevent conversion to character on > the first transpose, because values for multiple columns with differing > types are being forced into a single output column. But upon transposing > the data a second time, the proposed enhancements would generally allow the > original variables types and other attributes to be restored. > > However, your examples raise a couple of potential problems that I failed to > account for previously: 1) What if there is no one-to-one correspondence > between the formatted and informatted values, or 2) what it there is a > format but no corresponding informat or vice-versa? In these cases, my > proposal as it currently stands would not produce the desired result, > because there would be no way to convert the formatted values back into the > original raw data values. This problem suggests the need for two additional > options on the PROC FORMAT statement in order to completely implement the > desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to force > default formats when outputting values and NOINFORMAT would be used to force > default informats when reading values. Note that these options would only > prevent non-default formats and informats from being actively *used* for > converting values during processing, they would NOT prevent the format and > informat attribute values from being written to or read from variables > identified by using the proposed FORMAT= and INFORMAT= options or the > IDFORMAT and IDINFORMAT statements. > > I will add this discussion to my proposal. > > s/KAM > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Friday, January 22, 2010 23:57 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > > > Hi Kevin, > > > > The default behavior of Proc Transpose is making it difficult to achieve > > the > > preservation of variable attributes. The problem is with the values > > themselves. > > > > Take a look at this: > > > > proc format; > > value nfmt 1='One%' 2='Two$two' 3='%three'; > > value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; > > run; > > > > Data testing; > > Input Group $ N C $ D mmddyy10.; > > label N='Numeric var' > > C='Text' > > D='Birthday'; > > format n nfmt. c $cfmt. D mmddyy10.; > > Datalines; > > G1 1 A 01/15/2010 > > ; > > proc transpose data=testing out=Tran; > > var N C D; > > run; > > > > proc print data=tran noobs; > > run; > > > > PRODUCES: > > > > _NAME_ _LABEL_ COL1 > > > > N Numeric var One% > > C Text A > > D Birthday 01/15/2010 > > > > The problem is that the numeric variable N has been converted to character > > using the format we assigned to it. The original numeric value is lost. > > If > > we prevent this by removing the format, we lose the format attribute. Is > > this Catch-22.0? > > > > Ted > > > > -----Original Message----- > > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] > On Behalf Of Kevin > > Myers > > Sent: Thursday, January 21, 2010 1:30 PM > > To: SAS-L(a)LISTSERV.UGA.EDU > > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > > attributes > > > > Hi Ted, > > > > The proposed SUGI ballot item enhancements to PROC TRANSPOSE could > > potentially help with your process to some degree, even though > > fundamentally > > my proposal only deals with variable attributes, whereas your needs > > involve > > an actual rearrangement of the data into different observations and > > variables. > > > > If you first transposed your data into "long-skinny" form using a BY > > statement, then provided a distinct _NAME_ variable for each individual > > observation in each BY group of the long-skinny dataset using logic of > > your > > own, then transposed that result back into short-wide form once again > > using > > a BY statement, then you could achieve roughly what you are looking for, > > and > > perhaps this is what you have already tried. > > > > The contribution that my proposal could make to this process is that you > > would be able to use the new PROC TRANSPOSE statements and options to > > force > > the resulting variables in the final data set (after transposing twice) to > > have all of the same types, lengths, formats, and informats as they did in > > the original data set, rather than all being changed to character > > variables > > with none of the other original attributes. If maintaining the original > > variable types and attributes is important, then my proposal could > > substantially simplify your process. > > > > s/KAM > > > > P.S. - Hope you don't mind that I copied this reply back to the list. I > > thought others might also be interested. > > > > > > ----- Original Message ----- > > From: "Ted Clay" <tclay(a)ashlandhome.net> > > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> > > Sent: Thursday, January 21, 2010 13:50 > > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > > attributes > > > > > > > > > Hi Kevin, > > > Thanks for filling the gaps. Yes, what I am describing creates a > > > differently > > > shaped output data set, so is not equivalent to your proposal. Let any > > > confusion on that point end here. > > > > > > The ability to create and work with a "self-defining" skinny data set > > > would > > > be great. > > > > > > The situation I run into all the time (say, weekly) is where the data is > > > in > > > groups, and I have a mixture of variable types: numeric, character, > > > numeric > > > date. I want to get all the data in a group onto one observation. I > > > sometimes have an ID variable, or at other times just want numeric > > > suffixes > > > (1,2,3..). > > > Would your proposal make that easier to do than it is now? My hunch is > > > yes, > > > but it would not be a slam-dunk: > > > Proc transpose, with the additional options you propose on the Proc > > > statement. > > > Data step to calculate new variable names, and possibly labels. > > > (Calculate > > > numeric suffixes?) > > > Proc transpose, with the additional ID statements you propose. > > > > > > I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I > > > guess my dog "Spot" was tugging on my pants-leg. > > > Proc transpose data=testing out=trans ONEOUT; > > > By group; > > > Var N D; > > > ID C; > > > Run; > > > > > > Ted > > > > > > > > > > > > > > > -----Original Message----- > > > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] > On Behalf Of > > > Kevin > > > Myers > > > Sent: Thursday, January 21, 2010 9:27 AM > > > To: SAS-L(a)LISTSERV.UGA.EDU > > > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > > > attributes > > > > > > Ted and all - > > > > > > The current item on the SASware ballot was put on there largely based on > > > my > > > input. I provided a detailed description of the suggested enhancements > > > via > > > SAS Institute's online Tracks system: > > > > > > 7610240654 - Base SAS: Additional options and statements for PROC > > > TRANSPOSE > > > to support transposing additional variable attributes, including format, > > > informat, type, and length. > > > > > > I don't know whether you will be able to view that, but am short on time > > > at > > > the moment, so below is an excerpt from my original submission. Please > > > note > > > that my proposal does NOT require any changes to existing PROC TRANSPOSE > > > behavior, but rather adds support for additional options and statements > > > to > > > support handling additional variable attributes analogous to the > existing > > > support for variable lablels. > > > > > > I haven't had time to review your suggestion in detail, but after > briefly > > > scanning through it I believe that you are asking for something that is > > > substantially different in concept. If so, then I would suggest that > you > > > submit a separate enhancement suggestion, because I wouldn't want your > > > proposal to be confused with or complicate the implementation of mine, > or > > > vice-versa. > > > > > > Hope this helps. > > > Kevin M. > > > > > > from: kevinmyers(a)austin.rr.com > > > name: Kevin A. Myers > > > company: Myers Engineering > > > product: BASE > > > os: none > > > suggestion text: PROC TRANSPOSE should be enhanced to support > transposing > > > additional variable attributes including format, > > > informat, type, and length. > > > > > > The following new STATEMENTS should be added. These are similar to the > > > existing idlabel statement, but provide input variable > > > names for the other variable attributes: > > > > > > idformat statement - variable name to provide formats > > > idinformat statement - variable name to provide informats > > > idtype statement - variable name to provide variable types > > > idlength statement - variable name to provide variable lengths > > > > > > The PROC TRANSPOSE statement should be enhanced to add the following > > > additional OPTIONS, which all perform similar functions to the > > > existing label option, but for the other variable attributes: > > > > > > format - output variable name for formats (default _FORMAT_) > > > informat - output variable name for informats (default _INFORMAT_) > > > type - output variable name for types (default _TYPE_) > > > length - output variable name for lengths (default _LENGTH_) > > > > > > Providing the new statements and options listed above would allow for > > > simple, complete transformation of a data set from > > > "long-skinny" to "short-wide" representations AND BACK AGAIN, or > > > vice-versa, > > > WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH > > > information. It would also greatly simplify > > > input and output of data from/to "self-defining" external file formats > > > where > > > the format, informat, type, and/or length attributes > > > are included within the external data file. > > > > > > Availability of the proposed new statements and options would allow > > > simple > > > loops based on "long-skinny" data representation > > > to be used for reading/writing data in self-defining formats, > > > followed/preceded by proc transpose steps to convert the data to/from a > > > more > > > typical "short-wide" representation. > > > > > > Code to implement the proposed statements and options should be > > > relatively > > > easy to develop by simply duplicating the existing > > > support for IDLABEL/LABEL= and making relatively minor modifications as > > > needed for each of the other specific attributes (format, > > > informat, type, length). > > > > > > > > > ----- Original Message ----- > > > From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> > > > To: <SAS-L(a)LISTSERV.UGA.EDU> > > > Sent: Thursday, January 21, 2010 09:33 > > > Subject: SASware ballot #18 Proc Transpose preserving variable > attributes > > > > > > > > > > > > > SASware ballot #18 refers to Proc Transpose preserving variable > > > > attributes. > > > > > > > > I'm familiar with the problem and frequently use a macro to solve it, > > > > but > > > > would be delighted to see this feature migrate into Base SAS. > > > > > > > > > > > > > > > > I am soliciting input from SAS-L in the hopes that this change will be > > > > implemented in the most useful way. > > > > > > > > Posting your comments may have some influence on a future enhancement > of > > > > Proc Transpose. > > > > > > > > > > > > > > > > First off, I think this requires limiting the output to one > observation > > > > per > > > > BY-group. I can't see any way around that. > > > > > > > > > > > > > > > > Assuming that is true, one idea for implementing the change to Proc > > > > Transpose would be to add a "ONEOUT" option on the proc statement. > > > > > > > > > > > > > > > > Example: > > > > > > > > Data testing; > > > > > > > > Input Group N C $ D mmddyy10.; > > > > > > > > format d mmddyy10.; > > > > > > > > label N='Numeric var' > > > > > > > > C='Text' > > > > > > > > D='Birthday'; > > > > > > > > Datalines; > > > > > > > > 1 57 A 01/15/2010 > > > > > > > > 1 84 B 02/15/2010 > > > > > > > > 2 33 B 03/15/2010 > > > > > > > > ; > > > > > > > > Proc transpose data=testing out=trans OUTOUT; > > > > > > > > By group; > > > > > > > > Var N D; > > > > > > > > Run; > > > > > > > > > > > > > > > > Proc print data=trans; > > > > > > > > Would produce: > > > > > > > > Obs Group N1 D1 N2 D2 > > > > > > > > > > > > > > > > 1 1 57 01/15/2010 84 02/15/2010 > > > > > > > > 2 2 33 03/15/2010 > > > > > > > > > > > > > > > > With the ID statement: > > > > > > > > Proc transpose data=testing out=trans OUTOUT; > > > > > > > > By group; > > > > > > > > Var N D; > > > > > > > > ID C; > > > > > > > > Run; > > > > > > > > > > > > > > > > Would produce: > > > > > > > > Obs Group NA DA NB DB > > > > > > > > > > > > > > > > 1 1 57 01/15/2010 84 02/15/2010 > > > > > > > > 2 2 33 03/15/2010 > > > > > > > > > > > > > > > > Other possible options come up in naming the output variables: (1) ID > > > > values > > > > before names (2) a separator between ID values and names, and (3) > > > > whether > > > > and how to modify labels. > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
From: Kevin Myers on 23 Jan 2010 11:33 Sorry Null, but I strongly disagree with you on this one. I have long been aware of why TRANSPOSE doesn't currently try to preserve certain information, but IMNSHO that doesn't mean that it shouldn't be able to. If you don't use the proposed enhancements, then TRANSPOSE processing won't change one single bit. So just don't use 'em, and these changes won't affect you whatsoever. But if you do need the proposed enhancements, then they will save you a bunch of code, greatly improve processing speed, and also improve the size of datasets that can be handled compared to complex macro based solutions or overly simplistic array based alternatives. In the end I may not be able to convince you on this one, because obviously you don't believe that you have ever needed this capability, whereas I have run up against this need countless times over the past 30 years. I guess that each of us along with all of the other SUGI voters will just cast our ballots according to our own experiences and needs and see what happens. s/KAM ----- Original Message ----- From: "Data _null_;" <iebupdte(a)gmail.com> To: "Kevin Myers" <KevinMyers(a)austin.rr.com> Cc: <SAS-L(a)listserv.uga.edu> Sent: Saturday, January 23, 2010 07:20 Subject: Re: SASware ballot #18 Proc Transpose preserving variable attributes > You boys now see why TRANSPOSE does not try to preserve information > that is no longer relevant. Please don't mess with TRANSPOSE. It > ain't broke and don't need fixin'. > > On 1/23/10, Kevin Myers <KevinMyers(a)austin.rr.com> wrote: >> Right Ted, these are the types of problems that my proposed enhancement >> is >> intended to solve. There is no way to prevent conversion to character on >> the first transpose, because values for multiple columns with differing >> types are being forced into a single output column. But upon transposing >> the data a second time, the proposed enhancements would generally allow >> the >> original variables types and other attributes to be restored. >> >> However, your examples raise a couple of potential problems that I failed >> to >> account for previously: 1) What if there is no one-to-one correspondence >> between the formatted and informatted values, or 2) what it there is a >> format but no corresponding informat or vice-versa? In these cases, my >> proposal as it currently stands would not produce the desired result, >> because there would be no way to convert the formatted values back into >> the >> original raw data values. This problem suggests the need for two >> additional >> options on the PROC FORMAT statement in order to completely implement the >> desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to >> force >> default formats when outputting values and NOINFORMAT would be used to >> force >> default informats when reading values. Note that these options would >> only >> prevent non-default formats and informats from being actively *used* for >> converting values during processing, they would NOT prevent the format >> and >> informat attribute values from being written to or read from variables >> identified by using the proposed FORMAT= and INFORMAT= options or the >> IDFORMAT and IDINFORMAT statements. >> >> I will add this discussion to my proposal. >> >> s/KAM >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> >> Sent: Friday, January 22, 2010 23:57 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >> > Hi Kevin, >> > >> > The default behavior of Proc Transpose is making it difficult to >> > achieve >> > the >> > preservation of variable attributes. The problem is with the values >> > themselves. >> > >> > Take a look at this: >> > >> > proc format; >> > value nfmt 1='One%' 2='Two$two' 3='%three'; >> > value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >> > run; >> > >> > Data testing; >> > Input Group $ N C $ D mmddyy10.; >> > label N='Numeric var' >> > C='Text' >> > D='Birthday'; >> > format n nfmt. c $cfmt. D mmddyy10.; >> > Datalines; >> > G1 1 A 01/15/2010 >> > ; >> > proc transpose data=testing out=Tran; >> > var N C D; >> > run; >> > >> > proc print data=tran noobs; >> > run; >> > >> > PRODUCES: >> > >> > _NAME_ _LABEL_ COL1 >> > >> > N Numeric var One% >> > C Text A >> > D Birthday 01/15/2010 >> > >> > The problem is that the numeric variable N has been converted to >> > character >> > using the format we assigned to it. The original numeric value is >> > lost. >> > If >> > we prevent this by removing the format, we lose the format attribute. >> > Is >> > this Catch-22.0? >> > >> > Ted >> > >> > -----Original Message----- >> > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] >> On Behalf Of Kevin >> > Myers >> > Sent: Thursday, January 21, 2010 1:30 PM >> > To: SAS-L(a)LISTSERV.UGA.EDU >> > Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> > attributes >> > >> > Hi Ted, >> > >> > The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >> > potentially help with your process to some degree, even though >> > fundamentally >> > my proposal only deals with variable attributes, whereas your needs >> > involve >> > an actual rearrangement of the data into different observations and >> > variables. >> > >> > If you first transposed your data into "long-skinny" form using a BY >> > statement, then provided a distinct _NAME_ variable for each individual >> > observation in each BY group of the long-skinny dataset using logic of >> > your >> > own, then transposed that result back into short-wide form once again >> > using >> > a BY statement, then you could achieve roughly what you are looking >> > for, >> > and >> > perhaps this is what you have already tried. >> > >> > The contribution that my proposal could make to this process is that >> > you >> > would be able to use the new PROC TRANSPOSE statements and options to >> > force >> > the resulting variables in the final data set (after transposing twice) >> > to >> > have all of the same types, lengths, formats, and informats as they did >> > in >> > the original data set, rather than all being changed to character >> > variables >> > with none of the other original attributes. If maintaining the >> > original >> > variable types and attributes is important, then my proposal could >> > substantially simplify your process. >> > >> > s/KAM >> > >> > P.S. - Hope you don't mind that I copied this reply back to the list. >> > I >> > thought others might also be interested. >> > >> > >> > ----- Original Message ----- >> > From: "Ted Clay" <tclay(a)ashlandhome.net> >> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >> > Sent: Thursday, January 21, 2010 13:50 >> > Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> > attributes >> > >> > >> > >> > > Hi Kevin, >> > > Thanks for filling the gaps. Yes, what I am describing creates a >> > > differently >> > > shaped output data set, so is not equivalent to your proposal. Let >> > > any >> > > confusion on that point end here. >> > > >> > > The ability to create and work with a "self-defining" skinny data set >> > > would >> > > be great. >> > > >> > > The situation I run into all the time (say, weekly) is where the data >> > > is >> > > in >> > > groups, and I have a mixture of variable types: numeric, character, >> > > numeric >> > > date. I want to get all the data in a group onto one observation. I >> > > sometimes have an ID variable, or at other times just want numeric >> > > suffixes >> > > (1,2,3..). >> > > Would your proposal make that easier to do than it is now? My hunch >> > > is >> > > yes, >> > > but it would not be a slam-dunk: >> > > Proc transpose, with the additional options you propose on the Proc >> > > statement. >> > > Data step to calculate new variable names, and possibly labels. >> > > (Calculate >> > > numeric suffixes?) >> > > Proc transpose, with the additional ID statements you propose. >> > > >> > > I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". >> > > I >> > > guess my dog "Spot" was tugging on my pants-leg. >> > > Proc transpose data=testing out=trans ONEOUT; >> > > By group; >> > > Var N D; >> > > ID C; >> > > Run; >> > > >> > > Ted >> > > >> > > >> > > >> > > >> > > -----Original Message----- >> > > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] >> On Behalf Of >> > > Kevin >> > > Myers >> > > Sent: Thursday, January 21, 2010 9:27 AM >> > > To: SAS-L(a)LISTSERV.UGA.EDU >> > > Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> > > attributes >> > > >> > > Ted and all - >> > > >> > > The current item on the SASware ballot was put on there largely based >> > > on >> > > my >> > > input. I provided a detailed description of the suggested >> > > enhancements >> > > via >> > > SAS Institute's online Tracks system: >> > > >> > > 7610240654 - Base SAS: Additional options and statements for PROC >> > > TRANSPOSE >> > > to support transposing additional variable attributes, including >> > > format, >> > > informat, type, and length. >> > > >> > > I don't know whether you will be able to view that, but am short on >> > > time >> > > at >> > > the moment, so below is an excerpt from my original submission. >> > > Please >> > > note >> > > that my proposal does NOT require any changes to existing PROC >> > > TRANSPOSE >> > > behavior, but rather adds support for additional options and >> > > statements >> > > to >> > > support handling additional variable attributes analogous to the >> existing >> > > support for variable lablels. >> > > >> > > I haven't had time to review your suggestion in detail, but after >> briefly >> > > scanning through it I believe that you are asking for something that >> > > is >> > > substantially different in concept. If so, then I would suggest that >> you >> > > submit a separate enhancement suggestion, because I wouldn't want >> > > your >> > > proposal to be confused with or complicate the implementation of >> > > mine, >> or >> > > vice-versa. >> > > >> > > Hope this helps. >> > > Kevin M. >> > > >> > > from: kevinmyers(a)austin.rr.com >> > > name: Kevin A. Myers >> > > company: Myers Engineering >> > > product: BASE >> > > os: none >> > > suggestion text: PROC TRANSPOSE should be enhanced to support >> transposing >> > > additional variable attributes including format, >> > > informat, type, and length. >> > > >> > > The following new STATEMENTS should be added. These are similar to >> > > the >> > > existing idlabel statement, but provide input variable >> > > names for the other variable attributes: >> > > >> > > idformat statement - variable name to provide formats >> > > idinformat statement - variable name to provide informats >> > > idtype statement - variable name to provide variable types >> > > idlength statement - variable name to provide variable lengths >> > > >> > > The PROC TRANSPOSE statement should be enhanced to add the following >> > > additional OPTIONS, which all perform similar functions to the >> > > existing label option, but for the other variable attributes: >> > > >> > > format - output variable name for formats (default _FORMAT_) >> > > informat - output variable name for informats (default _INFORMAT_) >> > > type - output variable name for types (default _TYPE_) >> > > length - output variable name for lengths (default _LENGTH_) >> > > >> > > Providing the new statements and options listed above would allow for >> > > simple, complete transformation of a data set from >> > > "long-skinny" to "short-wide" representations AND BACK AGAIN, or >> > > vice-versa, >> > > WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >> > > information. It would also greatly simplify >> > > input and output of data from/to "self-defining" external file >> > > formats >> > > where >> > > the format, informat, type, and/or length attributes >> > > are included within the external data file. >> > > >> > > Availability of the proposed new statements and options would allow >> > > simple >> > > loops based on "long-skinny" data representation >> > > to be used for reading/writing data in self-defining formats, >> > > followed/preceded by proc transpose steps to convert the data to/from >> > > a >> > > more >> > > typical "short-wide" representation. >> > > >> > > Code to implement the proposed statements and options should be >> > > relatively >> > > easy to develop by simply duplicating the existing >> > > support for IDLABEL/LABEL= and making relatively minor modifications >> > > as >> > > needed for each of the other specific attributes (format, >> > > informat, type, length). >> > > >> > > >> > > ----- Original Message ----- >> > > From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >> > > To: <SAS-L(a)LISTSERV.UGA.EDU> >> > > Sent: Thursday, January 21, 2010 09:33 >> > > Subject: SASware ballot #18 Proc Transpose preserving variable >> attributes >> > > >> > > >> > > >> > > > SASware ballot #18 refers to Proc Transpose preserving variable >> > > > attributes. >> > > > >> > > > I'm familiar with the problem and frequently use a macro to solve >> > > > it, >> > > > but >> > > > would be delighted to see this feature migrate into Base SAS. >> > > > >> > > > >> > > > >> > > > I am soliciting input from SAS-L in the hopes that this change will >> > > > be >> > > > implemented in the most useful way. >> > > > >> > > > Posting your comments may have some influence on a future >> > > > enhancement >> of >> > > > Proc Transpose. >> > > > >> > > > >> > > > >> > > > First off, I think this requires limiting the output to one >> observation >> > > > per >> > > > BY-group. I can't see any way around that. >> > > > >> > > > >> > > > >> > > > Assuming that is true, one idea for implementing the change to Proc >> > > > Transpose would be to add a "ONEOUT" option on the proc statement. >> > > > >> > > > >> > > > >> > > > Example: >> > > > >> > > > Data testing; >> > > > >> > > > Input Group N C $ D mmddyy10.; >> > > > >> > > > format d mmddyy10.; >> > > > >> > > > label N='Numeric var' >> > > > >> > > > C='Text' >> > > > >> > > > D='Birthday'; >> > > > >> > > > Datalines; >> > > > >> > > > 1 57 A 01/15/2010 >> > > > >> > > > 1 84 B 02/15/2010 >> > > > >> > > > 2 33 B 03/15/2010 >> > > > >> > > > ; >> > > > >> > > > Proc transpose data=testing out=trans OUTOUT; >> > > > >> > > > By group; >> > > > >> > > > Var N D; >> > > > >> > > > Run; >> > > > >> > > > >> > > > >> > > > Proc print data=trans; >> > > > >> > > > Would produce: >> > > > >> > > > Obs Group N1 D1 N2 D2 >> > > > >> > > > >> > > > >> > > > 1 1 57 01/15/2010 84 02/15/2010 >> > > > >> > > > 2 2 33 03/15/2010 >> > > > >> > > > >> > > > >> > > > With the ID statement: >> > > > >> > > > Proc transpose data=testing out=trans OUTOUT; >> > > > >> > > > By group; >> > > > >> > > > Var N D; >> > > > >> > > > ID C; >> > > > >> > > > Run; >> > > > >> > > > >> > > > >> > > > Would produce: >> > > > >> > > > Obs Group NA DA NB DB >> > > > >> > > > >> > > > >> > > > 1 1 57 01/15/2010 84 02/15/2010 >> > > > >> > > > 2 2 33 03/15/2010 >> > > > >> > > > >> > > > >> > > > Other possible options come up in naming the output variables: (1) >> > > > ID >> > > > values >> > > > before names (2) a separator between ID values and names, and (3) >> > > > whether >> > > > and how to modify labels. >> > > > >> > > > >> > > >> > > >> > > >> > > >> > >> > >> > >> > >> >
From: Kevin Myers on 23 Jan 2010 11:45 P.S. - The additional attributes to be handled are no more and no less relevant than the variable name and label attributes that are already handled by comparable existing statements and options in PROC TRANSPOSE. ----- Original Message ----- From: "Kevin Myers" <KevinMyers(a)AUSTIN.RR.COM> To: <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 10:33 Subject: Re: SASware ballot #18 Proc Transpose preserving variable attributes > Sorry Null, but I strongly disagree with you on this one. I have long > been > aware of why TRANSPOSE doesn't currently try to preserve certain > information, but IMNSHO that doesn't mean that it shouldn't be able to. > > If you don't use the proposed enhancements, then TRANSPOSE processing > won't > change one single bit. So just don't use 'em, and these changes won't > affect you whatsoever. > > But if you do need the proposed enhancements, then they will save you a > bunch of code, greatly improve processing speed, and also improve the size > of datasets that can be handled compared to complex macro based solutions > or > overly simplistic array based alternatives. > > In the end I may not be able to convince you on this one, because > obviously > you don't believe that you have ever needed this capability, whereas I > have > run up against this need countless times over the past 30 years. I guess > that each of us along with all of the other SUGI voters will just cast our > ballots according to our own experiences and needs and see what happens. > > s/KAM > > > ----- Original Message ----- > From: "Data _null_;" <iebupdte(a)gmail.com> > To: "Kevin Myers" <KevinMyers(a)austin.rr.com> > Cc: <SAS-L(a)listserv.uga.edu> > Sent: Saturday, January 23, 2010 07:20 > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> You boys now see why TRANSPOSE does not try to preserve information >> that is no longer relevant. Please don't mess with TRANSPOSE. It >> ain't broke and don't need fixin'. >> >> On 1/23/10, Kevin Myers <KevinMyers(a)austin.rr.com> wrote: >>> Right Ted, these are the types of problems that my proposed enhancement >>> is >>> intended to solve. There is no way to prevent conversion to character >>> on >>> the first transpose, because values for multiple columns with differing >>> types are being forced into a single output column. But upon >>> transposing >>> the data a second time, the proposed enhancements would generally allow >>> the >>> original variables types and other attributes to be restored. >>> >>> However, your examples raise a couple of potential problems that I >>> failed >>> to >>> account for previously: 1) What if there is no one-to-one >>> correspondence >>> between the formatted and informatted values, or 2) what it there is a >>> format but no corresponding informat or vice-versa? In these cases, my >>> proposal as it currently stands would not produce the desired result, >>> because there would be no way to convert the formatted values back into >>> the >>> original raw data values. This problem suggests the need for two >>> additional >>> options on the PROC FORMAT statement in order to completely implement >>> the >>> desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to >>> force >>> default formats when outputting values and NOINFORMAT would be used to >>> force >>> default informats when reading values. Note that these options would >>> only >>> prevent non-default formats and informats from being actively *used* for >>> converting values during processing, they would NOT prevent the format >>> and >>> informat attribute values from being written to or read from variables >>> identified by using the proposed FORMAT= and INFORMAT= options or the >>> IDFORMAT and IDINFORMAT statements. >>> >>> I will add this discussion to my proposal. >>> >>> s/KAM >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ashlandhome.net> >>> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> >>> Sent: Friday, January 22, 2010 23:57 >>> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>> > Hi Kevin, >>> > >>> > The default behavior of Proc Transpose is making it difficult to >>> > achieve >>> > the >>> > preservation of variable attributes. The problem is with the values >>> > themselves. >>> > >>> > Take a look at this: >>> > >>> > proc format; >>> > value nfmt 1='One%' 2='Two$two' 3='%three'; >>> > value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >>> > run; >>> > >>> > Data testing; >>> > Input Group $ N C $ D mmddyy10.; >>> > label N='Numeric var' >>> > C='Text' >>> > D='Birthday'; >>> > format n nfmt. c $cfmt. D mmddyy10.; >>> > Datalines; >>> > G1 1 A 01/15/2010 >>> > ; >>> > proc transpose data=testing out=Tran; >>> > var N C D; >>> > run; >>> > >>> > proc print data=tran noobs; >>> > run; >>> > >>> > PRODUCES: >>> > >>> > _NAME_ _LABEL_ COL1 >>> > >>> > N Numeric var One% >>> > C Text A >>> > D Birthday 01/15/2010 >>> > >>> > The problem is that the numeric variable N has been converted to >>> > character >>> > using the format we assigned to it. The original numeric value is >>> > lost. >>> > If >>> > we prevent this by removing the format, we lose the format attribute. >>> > Is >>> > this Catch-22.0? >>> > >>> > Ted >>> > >>> > -----Original Message----- >>> > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] >>> On Behalf Of Kevin >>> > Myers >>> > Sent: Thursday, January 21, 2010 1:30 PM >>> > To: SAS-L(a)LISTSERV.UGA.EDU >>> > Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> > attributes >>> > >>> > Hi Ted, >>> > >>> > The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >>> > potentially help with your process to some degree, even though >>> > fundamentally >>> > my proposal only deals with variable attributes, whereas your needs >>> > involve >>> > an actual rearrangement of the data into different observations and >>> > variables. >>> > >>> > If you first transposed your data into "long-skinny" form using a BY >>> > statement, then provided a distinct _NAME_ variable for each >>> > individual >>> > observation in each BY group of the long-skinny dataset using logic of >>> > your >>> > own, then transposed that result back into short-wide form once again >>> > using >>> > a BY statement, then you could achieve roughly what you are looking >>> > for, >>> > and >>> > perhaps this is what you have already tried. >>> > >>> > The contribution that my proposal could make to this process is that >>> > you >>> > would be able to use the new PROC TRANSPOSE statements and options to >>> > force >>> > the resulting variables in the final data set (after transposing >>> > twice) >>> > to >>> > have all of the same types, lengths, formats, and informats as they >>> > did >>> > in >>> > the original data set, rather than all being changed to character >>> > variables >>> > with none of the other original attributes. If maintaining the >>> > original >>> > variable types and attributes is important, then my proposal could >>> > substantially simplify your process. >>> > >>> > s/KAM >>> > >>> > P.S. - Hope you don't mind that I copied this reply back to the list. >>> > I >>> > thought others might also be interested. >>> > >>> > >>> > ----- Original Message ----- >>> > From: "Ted Clay" <tclay(a)ashlandhome.net> >>> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >>> > Sent: Thursday, January 21, 2010 13:50 >>> > Subject: RE: SASware ballot #18 Proc Transpose preserving variable >>> > attributes >>> > >>> > >>> > >>> > > Hi Kevin, >>> > > Thanks for filling the gaps. Yes, what I am describing creates a >>> > > differently >>> > > shaped output data set, so is not equivalent to your proposal. Let >>> > > any >>> > > confusion on that point end here. >>> > > >>> > > The ability to create and work with a "self-defining" skinny data >>> > > set >>> > > would >>> > > be great. >>> > > >>> > > The situation I run into all the time (say, weekly) is where the >>> > > data >>> > > is >>> > > in >>> > > groups, and I have a mixture of variable types: numeric, character, >>> > > numeric >>> > > date. I want to get all the data in a group onto one observation. I >>> > > sometimes have an ID variable, or at other times just want numeric >>> > > suffixes >>> > > (1,2,3..). >>> > > Would your proposal make that easier to do than it is now? My hunch >>> > > is >>> > > yes, >>> > > but it would not be a slam-dunk: >>> > > Proc transpose, with the additional options you propose on the Proc >>> > > statement. >>> > > Data step to calculate new variable names, and possibly labels. >>> > > (Calculate >>> > > numeric suffixes?) >>> > > Proc transpose, with the additional ID statements you propose. >>> > > >>> > > I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". >>> > > I >>> > > guess my dog "Spot" was tugging on my pants-leg. >>> > > Proc transpose data=testing out=trans ONEOUT; >>> > > By group; >>> > > Var N D; >>> > > ID C; >>> > > Run; >>> > > >>> > > Ted >>> > > >>> > > >>> > > >>> > > >>> > > -----Original Message----- >>> > > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] >>> On Behalf Of >>> > > Kevin >>> > > Myers >>> > > Sent: Thursday, January 21, 2010 9:27 AM >>> > > To: SAS-L(a)LISTSERV.UGA.EDU >>> > > Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> > > attributes >>> > > >>> > > Ted and all - >>> > > >>> > > The current item on the SASware ballot was put on there largely >>> > > based >>> > > on >>> > > my >>> > > input. I provided a detailed description of the suggested >>> > > enhancements >>> > > via >>> > > SAS Institute's online Tracks system: >>> > > >>> > > 7610240654 - Base SAS: Additional options and statements for PROC >>> > > TRANSPOSE >>> > > to support transposing additional variable attributes, including >>> > > format, >>> > > informat, type, and length. >>> > > >>> > > I don't know whether you will be able to view that, but am short on >>> > > time >>> > > at >>> > > the moment, so below is an excerpt from my original submission. >>> > > Please >>> > > note >>> > > that my proposal does NOT require any changes to existing PROC >>> > > TRANSPOSE >>> > > behavior, but rather adds support for additional options and >>> > > statements >>> > > to >>> > > support handling additional variable attributes analogous to the >>> existing >>> > > support for variable lablels. >>> > > >>> > > I haven't had time to review your suggestion in detail, but after >>> briefly >>> > > scanning through it I believe that you are asking for something that >>> > > is >>> > > substantially different in concept. If so, then I would suggest >>> > > that >>> you >>> > > submit a separate enhancement suggestion, because I wouldn't want >>> > > your >>> > > proposal to be confused with or complicate the implementation of >>> > > mine, >>> or >>> > > vice-versa. >>> > > >>> > > Hope this helps. >>> > > Kevin M. >>> > > >>> > > from: kevinmyers(a)austin.rr.com >>> > > name: Kevin A. Myers >>> > > company: Myers Engineering >>> > > product: BASE >>> > > os: none >>> > > suggestion text: PROC TRANSPOSE should be enhanced to support >>> transposing >>> > > additional variable attributes including format, >>> > > informat, type, and length. >>> > > >>> > > The following new STATEMENTS should be added. These are similar to >>> > > the >>> > > existing idlabel statement, but provide input variable >>> > > names for the other variable attributes: >>> > > >>> > > idformat statement - variable name to provide formats >>> > > idinformat statement - variable name to provide informats >>> > > idtype statement - variable name to provide variable types >>> > > idlength statement - variable name to provide variable lengths >>> > > >>> > > The PROC TRANSPOSE statement should be enhanced to add the following >>> > > additional OPTIONS, which all perform similar functions to the >>> > > existing label option, but for the other variable attributes: >>> > > >>> > > format - output variable name for formats (default _FORMAT_) >>> > > informat - output variable name for informats (default _INFORMAT_) >>> > > type - output variable name for types (default _TYPE_) >>> > > length - output variable name for lengths (default _LENGTH_) >>> > > >>> > > Providing the new statements and options listed above would allow >>> > > for >>> > > simple, complete transformation of a data set from >>> > > "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>> > > vice-versa, >>> > > WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>> > > information. It would also greatly simplify >>> > > input and output of data from/to "self-defining" external file >>> > > formats >>> > > where >>> > > the format, informat, type, and/or length attributes >>> > > are included within the external data file. >>> > > >>> > > Availability of the proposed new statements and options would allow >>> > > simple >>> > > loops based on "long-skinny" data representation >>> > > to be used for reading/writing data in self-defining formats, >>> > > followed/preceded by proc transpose steps to convert the data >>> > > to/from >>> > > a >>> > > more >>> > > typical "short-wide" representation. >>> > > >>> > > Code to implement the proposed statements and options should be >>> > > relatively >>> > > easy to develop by simply duplicating the existing >>> > > support for IDLABEL/LABEL= and making relatively minor modifications >>> > > as >>> > > needed for each of the other specific attributes (format, >>> > > informat, type, length). >>> > > >>> > > >>> > > ----- Original Message ----- >>> > > From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>> > > To: <SAS-L(a)LISTSERV.UGA.EDU> >>> > > Sent: Thursday, January 21, 2010 09:33 >>> > > Subject: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> > > >>> > > >>> > > >>> > > > SASware ballot #18 refers to Proc Transpose preserving variable >>> > > > attributes. >>> > > > >>> > > > I'm familiar with the problem and frequently use a macro to solve >>> > > > it, >>> > > > but >>> > > > would be delighted to see this feature migrate into Base SAS. >>> > > > >>> > > > >>> > > > >>> > > > I am soliciting input from SAS-L in the hopes that this change >>> > > > will >>> > > > be >>> > > > implemented in the most useful way. >>> > > > >>> > > > Posting your comments may have some influence on a future >>> > > > enhancement >>> of >>> > > > Proc Transpose. >>> > > > >>> > > > >>> > > > >>> > > > First off, I think this requires limiting the output to one >>> observation >>> > > > per >>> > > > BY-group. I can't see any way around that. >>> > > > >>> > > > >>> > > > >>> > > > Assuming that is true, one idea for implementing the change to >>> > > > Proc >>> > > > Transpose would be to add a "ONEOUT" option on the proc statement. >>> > > > >>> > > > >>> > > > >>> > > > Example: >>> > > > >>> > > > Data testing; >>> > > > >>> > > > Input Group N C $ D mmddyy10.; >>> > > > >>> > > > format d mmddyy10.; >>> > > > >>> > > > label N='Numeric var' >>> > > > >>> > > > C='Text' >>> > > > >>> > > > D='Birthday'; >>> > > > >>> > > > Datalines; >>> > > > >>> > > > 1 57 A 01/15/2010 >>> > > > >>> > > > 1 84 B 02/15/2010 >>> > > > >>> > > > 2 33 B 03/15/2010 >>> > > > >>> > > > ; >>> > > > >>> > > > Proc transpose data=testing out=trans OUTOUT; >>> > > > >>> > > > By group; >>> > > > >>> > > > Var N D; >>> > > > >>> > > > Run; >>> > > > >>> > > > >>> > > > >>> > > > Proc print data=trans; >>> > > > >>> > > > Would produce: >>> > > > >>> > > > Obs Group N1 D1 N2 D2 >>> > > > >>> > > > >>> > > > >>> > > > 1 1 57 01/15/2010 84 02/15/2010 >>> > > > >>> > > > 2 2 33 03/15/2010 >>> > > > >>> > > > >>> > > > >>> > > > With the ID statement: >>> > > > >>> > > > Proc transpose data=testing out=trans OUTOUT; >>> > > > >>> > > > By group; >>> > > > >>> > > > Var N D; >>> > > > >>> > > > ID C; >>> > > > >>> > > > Run; >>> > > > >>> > > > >>> > > > >>> > > > Would produce: >>> > > > >>> > > > Obs Group NA DA NB DB >>> > > > >>> > > > >>> > > > >>> > > > 1 1 57 01/15/2010 84 02/15/2010 >>> > > > >>> > > > 2 2 33 03/15/2010 >>> > > > >>> > > > >>> > > > >>> > > > Other possible options come up in naming the output variables: (1) >>> > > > ID >>> > > > values >>> > > > before names (2) a separator between ID values and names, and (3) >>> > > > whether >>> > > > and how to modify labels. >>> > > > >>> > > > >>> > > >>> > > >>> > > >>> > > >>> > >>> > >>> > >>> > >>> >> >
From: Ted Clay on 23 Jan 2010 12:34
Yes, this would be necessary. In fact, this is what currently happens if a format is associated with a variable but is not available at the time of execution, with the NOFMTERR option turned on. In terms of default behavior, currently the output data set has _LABEL_ if and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created under the same conditions, that is, if any VAR has one defined? What about _TYPE_? Every variable has a type. I guess I'm leading up to saying I don't think you want to be creating anything new unless the user specifies one of the added features. How about a single option to "turn on" all these features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. Something like "FULLINFO". No other additional statements or options. If you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ and _TYPE_ variables as needed. (_TYPE_ only if not all numeric) Ted -----Original Message----- From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] Sent: Saturday, January 23, 2010 3:38 AM To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU Subject: Re: SASware ballot #18 Proc Transpose preserving variable attributes Right Ted, these are the types of problems that my proposed enhancement is intended to solve. There is no way to prevent conversion to character on the first transpose, because values for multiple columns with differing types are being forced into a single output column. But upon transposing the data a second time, the proposed enhancements would generally allow the original variables types and other attributes to be restored. However, your examples raise a couple of potential problems that I failed to account for previously: 1) What if there is no one-to-one correspondence between the formatted and informatted values, or 2) what it there is a format but no corresponding informat or vice-versa? In these cases, my proposal as it currently stands would not produce the desired result, because there would be no way to convert the formatted values back into the original raw data values. This problem suggests the need for two additional options on the PROC FORMAT statement in order to completely implement the desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to force default formats when outputting values and NOINFORMAT would be used to force default informats when reading values. Note that these options would only prevent non-default formats and informats from being actively *used* for converting values during processing, they would NOT prevent the format and informat attribute values from being written to or read from variables identified by using the proposed FORMAT= and INFORMAT= options or the IDFORMAT and IDINFORMAT statements. I will add this discussion to my proposal. s/KAM ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Friday, January 22, 2010 23:57 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Hi Kevin, > > The default behavior of Proc Transpose is making it difficult to achieve > the > preservation of variable attributes. The problem is with the values > themselves. > > Take a look at this: > > proc format; > value nfmt 1='One%' 2='Two$two' 3='%three'; > value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; > run; > > Data testing; > Input Group $ N C $ D mmddyy10.; > label N='Numeric var' > C='Text' > D='Birthday'; > format n nfmt. c $cfmt. D mmddyy10.; > Datalines; > G1 1 A 01/15/2010 > ; > proc transpose data=testing out=Tran; > var N C D; > run; > > proc print data=tran noobs; > run; > > PRODUCES: > > _NAME_ _LABEL_ COL1 > > N Numeric var One% > C Text A > D Birthday 01/15/2010 > > The problem is that the numeric variable N has been converted to character > using the format we assigned to it. The original numeric value is lost. > If > we prevent this by removing the format, we lose the format attribute. Is > this Catch-22.0? > > Ted > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Kevin > Myers > Sent: Thursday, January 21, 2010 1:30 PM > To: SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Hi Ted, > > The proposed SUGI ballot item enhancements to PROC TRANSPOSE could > potentially help with your process to some degree, even though > fundamentally > my proposal only deals with variable attributes, whereas your needs > involve > an actual rearrangement of the data into different observations and > variables. > > If you first transposed your data into "long-skinny" form using a BY > statement, then provided a distinct _NAME_ variable for each individual > observation in each BY group of the long-skinny dataset using logic of > your > own, then transposed that result back into short-wide form once again > using > a BY statement, then you could achieve roughly what you are looking for, > and > perhaps this is what you have already tried. > > The contribution that my proposal could make to this process is that you > would be able to use the new PROC TRANSPOSE statements and options to > force > the resulting variables in the final data set (after transposing twice) to > have all of the same types, lengths, formats, and informats as they did in > the original data set, rather than all being changed to character > variables > with none of the other original attributes. If maintaining the original > variable types and attributes is important, then my proposal could > substantially simplify your process. > > s/KAM > > P.S. - Hope you don't mind that I copied this reply back to the list. I > thought others might also be interested. > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> > Sent: Thursday, January 21, 2010 13:50 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> Thanks for filling the gaps. Yes, what I am describing creates a >> differently >> shaped output data set, so is not equivalent to your proposal. Let any >> confusion on that point end here. >> >> The ability to create and work with a "self-defining" skinny data set >> would >> be great. >> >> The situation I run into all the time (say, weekly) is where the data is >> in >> groups, and I have a mixture of variable types: numeric, character, >> numeric >> date. I want to get all the data in a group onto one observation. I >> sometimes have an ID variable, or at other times just want numeric >> suffixes >> (1,2,3..). >> Would your proposal make that easier to do than it is now? My hunch is >> yes, >> but it would not be a slam-dunk: >> Proc transpose, with the additional options you propose on the Proc >> statement. >> Data step to calculate new variable names, and possibly labels. >> (Calculate >> numeric suffixes?) >> Proc transpose, with the additional ID statements you propose. >> >> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >> guess my dog "Spot" was tugging on my pants-leg. >> Proc transpose data=testing out=trans ONEOUT; >> By group; >> Var N D; >> ID C; >> Run; >> >> Ted >> >> >> >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 9:27 AM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Ted and all - >> >> The current item on the SASware ballot was put on there largely based on >> my >> input. I provided a detailed description of the suggested enhancements >> via >> SAS Institute's online Tracks system: >> >> 7610240654 - Base SAS: Additional options and statements for PROC >> TRANSPOSE >> to support transposing additional variable attributes, including format, >> informat, type, and length. >> >> I don't know whether you will be able to view that, but am short on time >> at >> the moment, so below is an excerpt from my original submission. Please >> note >> that my proposal does NOT require any changes to existing PROC TRANSPOSE >> behavior, but rather adds support for additional options and statements >> to >> support handling additional variable attributes analogous to the existing >> support for variable lablels. >> >> I haven't had time to review your suggestion in detail, but after briefly >> scanning through it I believe that you are asking for something that is >> substantially different in concept. If so, then I would suggest that you >> submit a separate enhancement suggestion, because I wouldn't want your >> proposal to be confused with or complicate the implementation of mine, or >> vice-versa. >> >> Hope this helps. >> Kevin M. >> >> from: kevinmyers(a)austin.rr.com >> name: Kevin A. Myers >> company: Myers Engineering >> product: BASE >> os: none >> suggestion text: PROC TRANSPOSE should be enhanced to support transposing >> additional variable attributes including format, >> informat, type, and length. >> >> The following new STATEMENTS should be added. These are similar to the >> existing idlabel statement, but provide input variable >> names for the other variable attributes: >> >> idformat statement - variable name to provide formats >> idinformat statement - variable name to provide informats >> idtype statement - variable name to provide variable types >> idlength statement - variable name to provide variable lengths >> >> The PROC TRANSPOSE statement should be enhanced to add the following >> additional OPTIONS, which all perform similar functions to the >> existing label option, but for the other variable attributes: >> >> format - output variable name for formats (default _FORMAT_) >> informat - output variable name for informats (default _INFORMAT_) >> type - output variable name for types (default _TYPE_) >> length - output variable name for lengths (default _LENGTH_) >> >> Providing the new statements and options listed above would allow for >> simple, complete transformation of a data set from >> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >> vice-versa, >> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >> information. It would also greatly simplify >> input and output of data from/to "self-defining" external file formats >> where >> the format, informat, type, and/or length attributes >> are included within the external data file. >> >> Availability of the proposed new statements and options would allow >> simple >> loops based on "long-skinny" data representation >> to be used for reading/writing data in self-defining formats, >> followed/preceded by proc transpose steps to convert the data to/from a >> more >> typical "short-wide" representation. >> >> Code to implement the proposed statements and options should be >> relatively >> easy to develop by simply duplicating the existing >> support for IDLABEL/LABEL= and making relatively minor modifications as >> needed for each of the other specific attributes (format, >> informat, type, length). >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >> To: <SAS-L(a)LISTSERV.UGA.EDU> >> Sent: Thursday, January 21, 2010 09:33 >> Subject: SASware ballot #18 Proc Transpose preserving variable attributes >> >> >>> SASware ballot #18 refers to Proc Transpose preserving variable >>> attributes. >>> >>> I'm familiar with the problem and frequently use a macro to solve it, >>> but >>> would be delighted to see this feature migrate into Base SAS. >>> >>> >>> >>> I am soliciting input from SAS-L in the hopes that this change will be >>> implemented in the most useful way. >>> >>> Posting your comments may have some influence on a future enhancement of >>> Proc Transpose. >>> >>> >>> >>> First off, I think this requires limiting the output to one observation >>> per >>> BY-group. I can't see any way around that. >>> >>> >>> >>> Assuming that is true, one idea for implementing the change to Proc >>> Transpose would be to add a "ONEOUT" option on the proc statement. >>> >>> >>> >>> Example: >>> >>> Data testing; >>> >>> Input Group N C $ D mmddyy10.; >>> >>> format d mmddyy10.; >>> >>> label N='Numeric var' >>> >>> C='Text' >>> >>> D='Birthday'; >>> >>> Datalines; >>> >>> 1 57 A 01/15/2010 >>> >>> 1 84 B 02/15/2010 >>> >>> 2 33 B 03/15/2010 >>> >>> ; >>> >>> Proc transpose data=testing out=trans OUTOUT; >>> >>> By group; >>> >>> Var N D; >>> >>> Run; >>> >>> >>> >>> Proc print data=trans; >>> >>> Would produce: >>> >>> Obs Group N1 D1 N2 D2 >>> >>> >>> >>> 1 1 57 01/15/2010 84 02/15/2010 >>> >>> 2 2 33 03/15/2010 >>> >>> >>> >>> With the ID statement: >>> >>> Proc transpose data=testing out=trans OUTOUT; >>> >>> By group; >>> >>> Var N D; >>> >>> ID C; >>> >>> Run; >>> >>> >>> >>> Would produce: >>> >>> Obs Group NA DA NB DB >>> >>> >>> >>> 1 1 57 01/15/2010 84 02/15/2010 >>> >>> 2 2 33 03/15/2010 >>> >>> >>> >>> Other possible options come up in naming the output variables: (1) ID >>> values >>> before names (2) a separator between ID values and names, and (3) >>> whether >>> and how to modify labels. >>> >> >> >> > > > |