From: Kevin Myers on 23 Jan 2010 12:55 According to my current proposal, NO new variables should be added to the output data set unless any of the corresponding TYPE, LENGTH, FORMAT, and INFORMAT options are used. If any of these options are used without providing a variable name, then the default variable names for those options would be used. You could add an option to turn on all of them at once with the default variable names as you have suggested, but I have NOT proposed anything along those lines. Feel free to submit a suggestion to that effect, if desired. ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 11:34 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Yes, this would be necessary. In fact, this is what currently happens if > a > format is associated with a variable but is not available at the time of > execution, with the NOFMTERR option turned on. > > In terms of default behavior, currently the output data set has _LABEL_ if > and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created > under the same conditions, that is, if any VAR has one defined? What > about > _TYPE_? Every variable has a type. I guess I'm leading up to saying I > don't think you want to be creating anything new unless the user specifies > one of the added features. How about a single option to "turn on" all > these > features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. > Something like "FULLINFO". No other additional statements or options. If > you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ > and > _TYPE_ variables as needed. (_TYPE_ only if not all numeric) > > Ted > > > > > -----Original Message----- > From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] > Sent: Saturday, January 23, 2010 3:38 AM > To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Right Ted, these are the types of problems that my proposed enhancement is > intended to solve. There is no way to prevent conversion to character on > the first transpose, because values for multiple columns with differing > types are being forced into a single output column. But upon transposing > the data a second time, the proposed enhancements would generally allow > the > original variables types and other attributes to be restored. > > However, your examples raise a couple of potential problems that I failed > to > > account for previously: 1) What if there is no one-to-one correspondence > between the formatted and informatted values, or 2) what it there is a > format but no corresponding informat or vice-versa? In these cases, my > proposal as it currently stands would not produce the desired result, > because there would be no way to convert the formatted values back into > the > original raw data values. This problem suggests the need for two > additional > > options on the PROC FORMAT statement in order to completely implement the > desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to > force > default formats when outputting values and NOINFORMAT would be used to > force > > default informats when reading values. Note that these options would only > prevent non-default formats and informats from being actively *used* for > converting values during processing, they would NOT prevent the format and > informat attribute values from being written to or read from variables > identified by using the proposed FORMAT= and INFORMAT= options or the > IDFORMAT and IDINFORMAT statements. > > I will add this discussion to my proposal. > > s/KAM > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Friday, January 22, 2010 23:57 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> >> The default behavior of Proc Transpose is making it difficult to achieve >> the >> preservation of variable attributes. The problem is with the values >> themselves. >> >> Take a look at this: >> >> proc format; >> value nfmt 1='One%' 2='Two$two' 3='%three'; >> value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >> run; >> >> Data testing; >> Input Group $ N C $ D mmddyy10.; >> label N='Numeric var' >> C='Text' >> D='Birthday'; >> format n nfmt. c $cfmt. D mmddyy10.; >> Datalines; >> G1 1 A 01/15/2010 >> ; >> proc transpose data=testing out=Tran; >> var N C D; >> run; >> >> proc print data=tran noobs; >> run; >> >> PRODUCES: >> >> _NAME_ _LABEL_ COL1 >> >> N Numeric var One% >> C Text A >> D Birthday 01/15/2010 >> >> The problem is that the numeric variable N has been converted to >> character >> using the format we assigned to it. The original numeric value is lost. >> If >> we prevent this by removing the format, we lose the format attribute. Is >> this Catch-22.0? >> >> Ted >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 1:30 PM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Hi Ted, >> >> The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >> potentially help with your process to some degree, even though >> fundamentally >> my proposal only deals with variable attributes, whereas your needs >> involve >> an actual rearrangement of the data into different observations and >> variables. >> >> If you first transposed your data into "long-skinny" form using a BY >> statement, then provided a distinct _NAME_ variable for each individual >> observation in each BY group of the long-skinny dataset using logic of >> your >> own, then transposed that result back into short-wide form once again >> using >> a BY statement, then you could achieve roughly what you are looking for, >> and >> perhaps this is what you have already tried. >> >> The contribution that my proposal could make to this process is that you >> would be able to use the new PROC TRANSPOSE statements and options to >> force >> the resulting variables in the final data set (after transposing twice) >> to >> have all of the same types, lengths, formats, and informats as they did >> in >> the original data set, rather than all being changed to character >> variables >> with none of the other original attributes. If maintaining the original >> variable types and attributes is important, then my proposal could >> substantially simplify your process. >> >> s/KAM >> >> P.S. - Hope you don't mind that I copied this reply back to the list. I >> thought others might also be interested. >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >> Sent: Thursday, January 21, 2010 13:50 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >>> Hi Kevin, >>> Thanks for filling the gaps. Yes, what I am describing creates a >>> differently >>> shaped output data set, so is not equivalent to your proposal. Let any >>> confusion on that point end here. >>> >>> The ability to create and work with a "self-defining" skinny data set >>> would >>> be great. >>> >>> The situation I run into all the time (say, weekly) is where the data is >>> in >>> groups, and I have a mixture of variable types: numeric, character, >>> numeric >>> date. I want to get all the data in a group onto one observation. I >>> sometimes have an ID variable, or at other times just want numeric >>> suffixes >>> (1,2,3..). >>> Would your proposal make that easier to do than it is now? My hunch is >>> yes, >>> but it would not be a slam-dunk: >>> Proc transpose, with the additional options you propose on the Proc >>> statement. >>> Data step to calculate new variable names, and possibly labels. >>> (Calculate >>> numeric suffixes?) >>> Proc transpose, with the additional ID statements you propose. >>> >>> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >>> guess my dog "Spot" was tugging on my pants-leg. >>> Proc transpose data=testing out=trans ONEOUT; >>> By group; >>> Var N D; >>> ID C; >>> Run; >>> >>> Ted >>> >>> >>> >>> >>> -----Original Message----- >>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>> Kevin >>> Myers >>> Sent: Thursday, January 21, 2010 9:27 AM >>> To: SAS-L(a)LISTSERV.UGA.EDU >>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> Ted and all - >>> >>> The current item on the SASware ballot was put on there largely based on >>> my >>> input. I provided a detailed description of the suggested enhancements >>> via >>> SAS Institute's online Tracks system: >>> >>> 7610240654 - Base SAS: Additional options and statements for PROC >>> TRANSPOSE >>> to support transposing additional variable attributes, including format, >>> informat, type, and length. >>> >>> I don't know whether you will be able to view that, but am short on time >>> at >>> the moment, so below is an excerpt from my original submission. Please >>> note >>> that my proposal does NOT require any changes to existing PROC TRANSPOSE >>> behavior, but rather adds support for additional options and statements >>> to >>> support handling additional variable attributes analogous to the >>> existing >>> support for variable lablels. >>> >>> I haven't had time to review your suggestion in detail, but after >>> briefly >>> scanning through it I believe that you are asking for something that is >>> substantially different in concept. If so, then I would suggest that >>> you >>> submit a separate enhancement suggestion, because I wouldn't want your >>> proposal to be confused with or complicate the implementation of mine, >>> or >>> vice-versa. >>> >>> Hope this helps. >>> Kevin M. >>> >>> from: kevinmyers(a)austin.rr.com >>> name: Kevin A. Myers >>> company: Myers Engineering >>> product: BASE >>> os: none >>> suggestion text: PROC TRANSPOSE should be enhanced to support >>> transposing >>> additional variable attributes including format, >>> informat, type, and length. >>> >>> The following new STATEMENTS should be added. These are similar to the >>> existing idlabel statement, but provide input variable >>> names for the other variable attributes: >>> >>> idformat statement - variable name to provide formats >>> idinformat statement - variable name to provide informats >>> idtype statement - variable name to provide variable types >>> idlength statement - variable name to provide variable lengths >>> >>> The PROC TRANSPOSE statement should be enhanced to add the following >>> additional OPTIONS, which all perform similar functions to the >>> existing label option, but for the other variable attributes: >>> >>> format - output variable name for formats (default _FORMAT_) >>> informat - output variable name for informats (default _INFORMAT_) >>> type - output variable name for types (default _TYPE_) >>> length - output variable name for lengths (default _LENGTH_) >>> >>> Providing the new statements and options listed above would allow for >>> simple, complete transformation of a data set from >>> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>> vice-versa, >>> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>> information. It would also greatly simplify >>> input and output of data from/to "self-defining" external file formats >>> where >>> the format, informat, type, and/or length attributes >>> are included within the external data file. >>> >>> Availability of the proposed new statements and options would allow >>> simple >>> loops based on "long-skinny" data representation >>> to be used for reading/writing data in self-defining formats, >>> followed/preceded by proc transpose steps to convert the data to/from a >>> more >>> typical "short-wide" representation. >>> >>> Code to implement the proposed statements and options should be >>> relatively >>> easy to develop by simply duplicating the existing >>> support for IDLABEL/LABEL= and making relatively minor modifications as >>> needed for each of the other specific attributes (format, >>> informat, type, length). >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>> To: <SAS-L(a)LISTSERV.UGA.EDU> >>> Sent: Thursday, January 21, 2010 09:33 >>> Subject: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>>> SASware ballot #18 refers to Proc Transpose preserving variable >>>> attributes. >>>> >>>> I'm familiar with the problem and frequently use a macro to solve it, >>>> but >>>> would be delighted to see this feature migrate into Base SAS. >>>> >>>> >>>> >>>> I am soliciting input from SAS-L in the hopes that this change will be >>>> implemented in the most useful way. >>>> >>>> Posting your comments may have some influence on a future enhancement >>>> of >>>> Proc Transpose. >>>> >>>> >>>> >>>> First off, I think this requires limiting the output to one observation >>>> per >>>> BY-group. I can't see any way around that. >>>> >>>> >>>> >>>> Assuming that is true, one idea for implementing the change to Proc >>>> Transpose would be to add a "ONEOUT" option on the proc statement. >>>> >>>> >>>> >>>> Example: >>>> >>>> Data testing; >>>> >>>> Input Group N C $ D mmddyy10.; >>>> >>>> format d mmddyy10.; >>>> >>>> label N='Numeric var' >>>> >>>> C='Text' >>>> >>>> D='Birthday'; >>>> >>>> Datalines; >>>> >>>> 1 57 A 01/15/2010 >>>> >>>> 1 84 B 02/15/2010 >>>> >>>> 2 33 B 03/15/2010 >>>> >>>> ; >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> Run; >>>> >>>> >>>> >>>> Proc print data=trans; >>>> >>>> Would produce: >>>> >>>> Obs Group N1 D1 N2 D2 >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> With the ID statement: >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> ID C; >>>> >>>> Run; >>>> >>>> >>>> >>>> Would produce: >>>> >>>> Obs Group NA DA NB DB >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> Other possible options come up in naming the output variables: (1) ID >>>> values >>>> before names (2) a separator between ID values and names, and (3) >>>> whether >>>> and how to modify labels. >>>> >>> >>> >>> >> >> >> > > > > >
From: Kevin Myers on 23 Jan 2010 13:03 P.S. Just in case it wasn't already obvious, with the proposed behavior, specifying: PROC TRANSPOSE TYPE LENGTH FORMAT INFORMAT; would be equivalent to your proposed: PROC TRANSPOSE FULLINFO; Personally, I don't feel that amount of shortening (three words) deserves yet another option, but that's just my opinion. ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 11:34 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Yes, this would be necessary. In fact, this is what currently happens if > a > format is associated with a variable but is not available at the time of > execution, with the NOFMTERR option turned on. > > In terms of default behavior, currently the output data set has _LABEL_ if > and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created > under the same conditions, that is, if any VAR has one defined? What > about > _TYPE_? Every variable has a type. I guess I'm leading up to saying I > don't think you want to be creating anything new unless the user specifies > one of the added features. How about a single option to "turn on" all > these > features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. > Something like "FULLINFO". No other additional statements or options. If > you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ > and > _TYPE_ variables as needed. (_TYPE_ only if not all numeric) > > Ted > > > > > -----Original Message----- > From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] > Sent: Saturday, January 23, 2010 3:38 AM > To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Right Ted, these are the types of problems that my proposed enhancement is > intended to solve. There is no way to prevent conversion to character on > the first transpose, because values for multiple columns with differing > types are being forced into a single output column. But upon transposing > the data a second time, the proposed enhancements would generally allow > the > original variables types and other attributes to be restored. > > However, your examples raise a couple of potential problems that I failed > to > > account for previously: 1) What if there is no one-to-one correspondence > between the formatted and informatted values, or 2) what it there is a > format but no corresponding informat or vice-versa? In these cases, my > proposal as it currently stands would not produce the desired result, > because there would be no way to convert the formatted values back into > the > original raw data values. This problem suggests the need for two > additional > > options on the PROC FORMAT statement in order to completely implement the > desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to > force > default formats when outputting values and NOINFORMAT would be used to > force > > default informats when reading values. Note that these options would only > prevent non-default formats and informats from being actively *used* for > converting values during processing, they would NOT prevent the format and > informat attribute values from being written to or read from variables > identified by using the proposed FORMAT= and INFORMAT= options or the > IDFORMAT and IDINFORMAT statements. > > I will add this discussion to my proposal. > > s/KAM > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Friday, January 22, 2010 23:57 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> >> The default behavior of Proc Transpose is making it difficult to achieve >> the >> preservation of variable attributes. The problem is with the values >> themselves. >> >> Take a look at this: >> >> proc format; >> value nfmt 1='One%' 2='Two$two' 3='%three'; >> value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >> run; >> >> Data testing; >> Input Group $ N C $ D mmddyy10.; >> label N='Numeric var' >> C='Text' >> D='Birthday'; >> format n nfmt. c $cfmt. D mmddyy10.; >> Datalines; >> G1 1 A 01/15/2010 >> ; >> proc transpose data=testing out=Tran; >> var N C D; >> run; >> >> proc print data=tran noobs; >> run; >> >> PRODUCES: >> >> _NAME_ _LABEL_ COL1 >> >> N Numeric var One% >> C Text A >> D Birthday 01/15/2010 >> >> The problem is that the numeric variable N has been converted to >> character >> using the format we assigned to it. The original numeric value is lost. >> If >> we prevent this by removing the format, we lose the format attribute. Is >> this Catch-22.0? >> >> Ted >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 1:30 PM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Hi Ted, >> >> The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >> potentially help with your process to some degree, even though >> fundamentally >> my proposal only deals with variable attributes, whereas your needs >> involve >> an actual rearrangement of the data into different observations and >> variables. >> >> If you first transposed your data into "long-skinny" form using a BY >> statement, then provided a distinct _NAME_ variable for each individual >> observation in each BY group of the long-skinny dataset using logic of >> your >> own, then transposed that result back into short-wide form once again >> using >> a BY statement, then you could achieve roughly what you are looking for, >> and >> perhaps this is what you have already tried. >> >> The contribution that my proposal could make to this process is that you >> would be able to use the new PROC TRANSPOSE statements and options to >> force >> the resulting variables in the final data set (after transposing twice) >> to >> have all of the same types, lengths, formats, and informats as they did >> in >> the original data set, rather than all being changed to character >> variables >> with none of the other original attributes. If maintaining the original >> variable types and attributes is important, then my proposal could >> substantially simplify your process. >> >> s/KAM >> >> P.S. - Hope you don't mind that I copied this reply back to the list. I >> thought others might also be interested. >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >> Sent: Thursday, January 21, 2010 13:50 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >>> Hi Kevin, >>> Thanks for filling the gaps. Yes, what I am describing creates a >>> differently >>> shaped output data set, so is not equivalent to your proposal. Let any >>> confusion on that point end here. >>> >>> The ability to create and work with a "self-defining" skinny data set >>> would >>> be great. >>> >>> The situation I run into all the time (say, weekly) is where the data is >>> in >>> groups, and I have a mixture of variable types: numeric, character, >>> numeric >>> date. I want to get all the data in a group onto one observation. I >>> sometimes have an ID variable, or at other times just want numeric >>> suffixes >>> (1,2,3..). >>> Would your proposal make that easier to do than it is now? My hunch is >>> yes, >>> but it would not be a slam-dunk: >>> Proc transpose, with the additional options you propose on the Proc >>> statement. >>> Data step to calculate new variable names, and possibly labels. >>> (Calculate >>> numeric suffixes?) >>> Proc transpose, with the additional ID statements you propose. >>> >>> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >>> guess my dog "Spot" was tugging on my pants-leg. >>> Proc transpose data=testing out=trans ONEOUT; >>> By group; >>> Var N D; >>> ID C; >>> Run; >>> >>> Ted >>> >>> >>> >>> >>> -----Original Message----- >>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>> Kevin >>> Myers >>> Sent: Thursday, January 21, 2010 9:27 AM >>> To: SAS-L(a)LISTSERV.UGA.EDU >>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> Ted and all - >>> >>> The current item on the SASware ballot was put on there largely based on >>> my >>> input. I provided a detailed description of the suggested enhancements >>> via >>> SAS Institute's online Tracks system: >>> >>> 7610240654 - Base SAS: Additional options and statements for PROC >>> TRANSPOSE >>> to support transposing additional variable attributes, including format, >>> informat, type, and length. >>> >>> I don't know whether you will be able to view that, but am short on time >>> at >>> the moment, so below is an excerpt from my original submission. Please >>> note >>> that my proposal does NOT require any changes to existing PROC TRANSPOSE >>> behavior, but rather adds support for additional options and statements >>> to >>> support handling additional variable attributes analogous to the >>> existing >>> support for variable lablels. >>> >>> I haven't had time to review your suggestion in detail, but after >>> briefly >>> scanning through it I believe that you are asking for something that is >>> substantially different in concept. If so, then I would suggest that >>> you >>> submit a separate enhancement suggestion, because I wouldn't want your >>> proposal to be confused with or complicate the implementation of mine, >>> or >>> vice-versa. >>> >>> Hope this helps. >>> Kevin M. >>> >>> from: kevinmyers(a)austin.rr.com >>> name: Kevin A. Myers >>> company: Myers Engineering >>> product: BASE >>> os: none >>> suggestion text: PROC TRANSPOSE should be enhanced to support >>> transposing >>> additional variable attributes including format, >>> informat, type, and length. >>> >>> The following new STATEMENTS should be added. These are similar to the >>> existing idlabel statement, but provide input variable >>> names for the other variable attributes: >>> >>> idformat statement - variable name to provide formats >>> idinformat statement - variable name to provide informats >>> idtype statement - variable name to provide variable types >>> idlength statement - variable name to provide variable lengths >>> >>> The PROC TRANSPOSE statement should be enhanced to add the following >>> additional OPTIONS, which all perform similar functions to the >>> existing label option, but for the other variable attributes: >>> >>> format - output variable name for formats (default _FORMAT_) >>> informat - output variable name for informats (default _INFORMAT_) >>> type - output variable name for types (default _TYPE_) >>> length - output variable name for lengths (default _LENGTH_) >>> >>> Providing the new statements and options listed above would allow for >>> simple, complete transformation of a data set from >>> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>> vice-versa, >>> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>> information. It would also greatly simplify >>> input and output of data from/to "self-defining" external file formats >>> where >>> the format, informat, type, and/or length attributes >>> are included within the external data file. >>> >>> Availability of the proposed new statements and options would allow >>> simple >>> loops based on "long-skinny" data representation >>> to be used for reading/writing data in self-defining formats, >>> followed/preceded by proc transpose steps to convert the data to/from a >>> more >>> typical "short-wide" representation. >>> >>> Code to implement the proposed statements and options should be >>> relatively >>> easy to develop by simply duplicating the existing >>> support for IDLABEL/LABEL= and making relatively minor modifications as >>> needed for each of the other specific attributes (format, >>> informat, type, length). >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>> To: <SAS-L(a)LISTSERV.UGA.EDU> >>> Sent: Thursday, January 21, 2010 09:33 >>> Subject: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>>> SASware ballot #18 refers to Proc Transpose preserving variable >>>> attributes. >>>> >>>> I'm familiar with the problem and frequently use a macro to solve it, >>>> but >>>> would be delighted to see this feature migrate into Base SAS. >>>> >>>> >>>> >>>> I am soliciting input from SAS-L in the hopes that this change will be >>>> implemented in the most useful way. >>>> >>>> Posting your comments may have some influence on a future enhancement >>>> of >>>> Proc Transpose. >>>> >>>> >>>> >>>> First off, I think this requires limiting the output to one observation >>>> per >>>> BY-group. I can't see any way around that. >>>> >>>> >>>> >>>> Assuming that is true, one idea for implementing the change to Proc >>>> Transpose would be to add a "ONEOUT" option on the proc statement. >>>> >>>> >>>> >>>> Example: >>>> >>>> Data testing; >>>> >>>> Input Group N C $ D mmddyy10.; >>>> >>>> format d mmddyy10.; >>>> >>>> label N='Numeric var' >>>> >>>> C='Text' >>>> >>>> D='Birthday'; >>>> >>>> Datalines; >>>> >>>> 1 57 A 01/15/2010 >>>> >>>> 1 84 B 02/15/2010 >>>> >>>> 2 33 B 03/15/2010 >>>> >>>> ; >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> Run; >>>> >>>> >>>> >>>> Proc print data=trans; >>>> >>>> Would produce: >>>> >>>> Obs Group N1 D1 N2 D2 >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> With the ID statement: >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> ID C; >>>> >>>> Run; >>>> >>>> >>>> >>>> Would produce: >>>> >>>> Obs Group NA DA NB DB >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> Other possible options come up in naming the output variables: (1) ID >>>> values >>>> before names (2) a separator between ID values and names, and (3) >>>> whether >>>> and how to modify labels. >>>> >>> >>> >>> >> >> >> > > > > >
From: Ted Clay on 23 Jan 2010 13:16 So for clarify, in your proposal format - output variable name for formats (default _FORMAT_) means one could specify the following on the proc transpose statement: 1. Nothing. Get what we get now. 2. "FORMAT" Get a new variable _FORMAT_ 3. "FORMAT=myformat" Get a new variable MYFORMAT And similarly for INFORMAT, TYPE, LENGTH. Ted -----Original Message----- From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] Sent: Saturday, January 23, 2010 9:56 AM To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU Subject: Re: SASware ballot #18 Proc Transpose preserving variable attributes According to my current proposal, NO new variables should be added to the output data set unless any of the corresponding TYPE, LENGTH, FORMAT, and INFORMAT options are used. If any of these options are used without providing a variable name, then the default variable names for those options would be used. You could add an option to turn on all of them at once with the default variable names as you have suggested, but I have NOT proposed anything along those lines. Feel free to submit a suggestion to that effect, if desired. ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 11:34 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Yes, this would be necessary. In fact, this is what currently happens if > a > format is associated with a variable but is not available at the time of > execution, with the NOFMTERR option turned on. > > In terms of default behavior, currently the output data set has _LABEL_ if > and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created > under the same conditions, that is, if any VAR has one defined? What > about > _TYPE_? Every variable has a type. I guess I'm leading up to saying I > don't think you want to be creating anything new unless the user specifies > one of the added features. How about a single option to "turn on" all > these > features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. > Something like "FULLINFO". No other additional statements or options. If > you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ > and > _TYPE_ variables as needed. (_TYPE_ only if not all numeric) > > Ted > > > > > -----Original Message----- > From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] > Sent: Saturday, January 23, 2010 3:38 AM > To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Right Ted, these are the types of problems that my proposed enhancement is > intended to solve. There is no way to prevent conversion to character on > the first transpose, because values for multiple columns with differing > types are being forced into a single output column. But upon transposing > the data a second time, the proposed enhancements would generally allow > the > original variables types and other attributes to be restored. > > However, your examples raise a couple of potential problems that I failed > to > > account for previously: 1) What if there is no one-to-one correspondence > between the formatted and informatted values, or 2) what it there is a > format but no corresponding informat or vice-versa? In these cases, my > proposal as it currently stands would not produce the desired result, > because there would be no way to convert the formatted values back into > the > original raw data values. This problem suggests the need for two > additional > > options on the PROC FORMAT statement in order to completely implement the > desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to > force > default formats when outputting values and NOINFORMAT would be used to > force > > default informats when reading values. Note that these options would only > prevent non-default formats and informats from being actively *used* for > converting values during processing, they would NOT prevent the format and > informat attribute values from being written to or read from variables > identified by using the proposed FORMAT= and INFORMAT= options or the > IDFORMAT and IDINFORMAT statements. > > I will add this discussion to my proposal. > > s/KAM > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Friday, January 22, 2010 23:57 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> >> The default behavior of Proc Transpose is making it difficult to achieve >> the >> preservation of variable attributes. The problem is with the values >> themselves. >> >> Take a look at this: >> >> proc format; >> value nfmt 1='One%' 2='Two$two' 3='%three'; >> value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >> run; >> >> Data testing; >> Input Group $ N C $ D mmddyy10.; >> label N='Numeric var' >> C='Text' >> D='Birthday'; >> format n nfmt. c $cfmt. D mmddyy10.; >> Datalines; >> G1 1 A 01/15/2010 >> ; >> proc transpose data=testing out=Tran; >> var N C D; >> run; >> >> proc print data=tran noobs; >> run; >> >> PRODUCES: >> >> _NAME_ _LABEL_ COL1 >> >> N Numeric var One% >> C Text A >> D Birthday 01/15/2010 >> >> The problem is that the numeric variable N has been converted to >> character >> using the format we assigned to it. The original numeric value is lost. >> If >> we prevent this by removing the format, we lose the format attribute. Is >> this Catch-22.0? >> >> Ted >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 1:30 PM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Hi Ted, >> >> The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >> potentially help with your process to some degree, even though >> fundamentally >> my proposal only deals with variable attributes, whereas your needs >> involve >> an actual rearrangement of the data into different observations and >> variables. >> >> If you first transposed your data into "long-skinny" form using a BY >> statement, then provided a distinct _NAME_ variable for each individual >> observation in each BY group of the long-skinny dataset using logic of >> your >> own, then transposed that result back into short-wide form once again >> using >> a BY statement, then you could achieve roughly what you are looking for, >> and >> perhaps this is what you have already tried. >> >> The contribution that my proposal could make to this process is that you >> would be able to use the new PROC TRANSPOSE statements and options to >> force >> the resulting variables in the final data set (after transposing twice) >> to >> have all of the same types, lengths, formats, and informats as they did >> in >> the original data set, rather than all being changed to character >> variables >> with none of the other original attributes. If maintaining the original >> variable types and attributes is important, then my proposal could >> substantially simplify your process. >> >> s/KAM >> >> P.S. - Hope you don't mind that I copied this reply back to the list. I >> thought others might also be interested. >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >> Sent: Thursday, January 21, 2010 13:50 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >>> Hi Kevin, >>> Thanks for filling the gaps. Yes, what I am describing creates a >>> differently >>> shaped output data set, so is not equivalent to your proposal. Let any >>> confusion on that point end here. >>> >>> The ability to create and work with a "self-defining" skinny data set >>> would >>> be great. >>> >>> The situation I run into all the time (say, weekly) is where the data is >>> in >>> groups, and I have a mixture of variable types: numeric, character, >>> numeric >>> date. I want to get all the data in a group onto one observation. I >>> sometimes have an ID variable, or at other times just want numeric >>> suffixes >>> (1,2,3..). >>> Would your proposal make that easier to do than it is now? My hunch is >>> yes, >>> but it would not be a slam-dunk: >>> Proc transpose, with the additional options you propose on the Proc >>> statement. >>> Data step to calculate new variable names, and possibly labels. >>> (Calculate >>> numeric suffixes?) >>> Proc transpose, with the additional ID statements you propose. >>> >>> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >>> guess my dog "Spot" was tugging on my pants-leg. >>> Proc transpose data=testing out=trans ONEOUT; >>> By group; >>> Var N D; >>> ID C; >>> Run; >>> >>> Ted >>> >>> >>> >>> >>> -----Original Message----- >>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>> Kevin >>> Myers >>> Sent: Thursday, January 21, 2010 9:27 AM >>> To: SAS-L(a)LISTSERV.UGA.EDU >>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> Ted and all - >>> >>> The current item on the SASware ballot was put on there largely based on >>> my >>> input. I provided a detailed description of the suggested enhancements >>> via >>> SAS Institute's online Tracks system: >>> >>> 7610240654 - Base SAS: Additional options and statements for PROC >>> TRANSPOSE >>> to support transposing additional variable attributes, including format, >>> informat, type, and length. >>> >>> I don't know whether you will be able to view that, but am short on time >>> at >>> the moment, so below is an excerpt from my original submission. Please >>> note >>> that my proposal does NOT require any changes to existing PROC TRANSPOSE >>> behavior, but rather adds support for additional options and statements >>> to >>> support handling additional variable attributes analogous to the >>> existing >>> support for variable lablels. >>> >>> I haven't had time to review your suggestion in detail, but after >>> briefly >>> scanning through it I believe that you are asking for something that is >>> substantially different in concept. If so, then I would suggest that >>> you >>> submit a separate enhancement suggestion, because I wouldn't want your >>> proposal to be confused with or complicate the implementation of mine, >>> or >>> vice-versa. >>> >>> Hope this helps. >>> Kevin M. >>> >>> from: kevinmyers(a)austin.rr.com >>> name: Kevin A. Myers >>> company: Myers Engineering >>> product: BASE >>> os: none >>> suggestion text: PROC TRANSPOSE should be enhanced to support >>> transposing >>> additional variable attributes including format, >>> informat, type, and length. >>> >>> The following new STATEMENTS should be added. These are similar to the >>> existing idlabel statement, but provide input variable >>> names for the other variable attributes: >>> >>> idformat statement - variable name to provide formats >>> idinformat statement - variable name to provide informats >>> idtype statement - variable name to provide variable types >>> idlength statement - variable name to provide variable lengths >>> >>> The PROC TRANSPOSE statement should be enhanced to add the following >>> additional OPTIONS, which all perform similar functions to the >>> existing label option, but for the other variable attributes: >>> >>> format - output variable name for formats (default _FORMAT_) >>> informat - output variable name for informats (default _INFORMAT_) >>> type - output variable name for types (default _TYPE_) >>> length - output variable name for lengths (default _LENGTH_) >>> >>> Providing the new statements and options listed above would allow for >>> simple, complete transformation of a data set from >>> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>> vice-versa, >>> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>> information. It would also greatly simplify >>> input and output of data from/to "self-defining" external file formats >>> where >>> the format, informat, type, and/or length attributes >>> are included within the external data file. >>> >>> Availability of the proposed new statements and options would allow >>> simple >>> loops based on "long-skinny" data representation >>> to be used for reading/writing data in self-defining formats, >>> followed/preceded by proc transpose steps to convert the data to/from a >>> more >>> typical "short-wide" representation. >>> >>> Code to implement the proposed statements and options should be >>> relatively >>> easy to develop by simply duplicating the existing >>> support for IDLABEL/LABEL= and making relatively minor modifications as >>> needed for each of the other specific attributes (format, >>> informat, type, length). >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>> To: <SAS-L(a)LISTSERV.UGA.EDU> >>> Sent: Thursday, January 21, 2010 09:33 >>> Subject: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>>> SASware ballot #18 refers to Proc Transpose preserving variable >>>> attributes. >>>> >>>> I'm familiar with the problem and frequently use a macro to solve it, >>>> but >>>> would be delighted to see this feature migrate into Base SAS. >>>> >>>> >>>> >>>> I am soliciting input from SAS-L in the hopes that this change will be >>>> implemented in the most useful way. >>>> >>>> Posting your comments may have some influence on a future enhancement >>>> of >>>> Proc Transpose. >>>> >>>> >>>> >>>> First off, I think this requires limiting the output to one observation >>>> per >>>> BY-group. I can't see any way around that. >>>> >>>> >>>> >>>> Assuming that is true, one idea for implementing the change to Proc >>>> Transpose would be to add a "ONEOUT" option on the proc statement. >>>> >>>> >>>> >>>> Example: >>>> >>>> Data testing; >>>> >>>> Input Group N C $ D mmddyy10.; >>>> >>>> format d mmddyy10.; >>>> >>>> label N='Numeric var' >>>> >>>> C='Text' >>>> >>>> D='Birthday'; >>>> >>>> Datalines; >>>> >>>> 1 57 A 01/15/2010 >>>> >>>> 1 84 B 02/15/2010 >>>> >>>> 2 33 B 03/15/2010 >>>> >>>> ; >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> Run; >>>> >>>> >>>> >>>> Proc print data=trans; >>>> >>>> Would produce: >>>> >>>> Obs Group N1 D1 N2 D2 >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> With the ID statement: >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> ID C; >>>> >>>> Run; >>>> >>>> >>>> >>>> Would produce: >>>> >>>> Obs Group NA DA NB DB >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> Other possible options come up in naming the output variables: (1) ID >>>> values >>>> before names (2) a separator between ID values and names, and (3) >>>> whether >>>> and how to modify labels. >>>> >>> >>> >>> >> >> >> > > > > >
From: Kevin Myers on 23 Jan 2010 13:39 Correct. ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 12:16 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > So for clarify, in your proposal > > format - output variable name for formats (default _FORMAT_) > > means one could specify the following on the proc transpose statement: > 1. Nothing. Get what we get now. > 2. "FORMAT" Get a new variable _FORMAT_ > 3. "FORMAT=myformat" Get a new variable MYFORMAT > > And similarly for INFORMAT, TYPE, LENGTH. > > Ted > > > > -----Original Message----- > From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] > Sent: Saturday, January 23, 2010 9:56 AM > To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > According to my current proposal, NO new variables should be added to the > output data set unless any of the corresponding TYPE, LENGTH, FORMAT, and > INFORMAT options are used. If any of these options are used without > providing a variable name, then the default variable names for those > options > > would be used. > > You could add an option to turn on all of them at once with the default > variable names as you have suggested, but I have NOT proposed anything > along > > those lines. Feel free to submit a suggestion to that effect, if desired. > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Saturday, January 23, 2010 11:34 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Yes, this would be necessary. In fact, this is what currently happens if >> a >> format is associated with a variable but is not available at the time of >> execution, with the NOFMTERR option turned on. >> >> In terms of default behavior, currently the output data set has _LABEL_ >> if >> and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created >> under the same conditions, that is, if any VAR has one defined? What >> about >> _TYPE_? Every variable has a type. I guess I'm leading up to saying I >> don't think you want to be creating anything new unless the user >> specifies >> one of the added features. How about a single option to "turn on" all >> these >> features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. >> Something like "FULLINFO". No other additional statements or options. >> If >> you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ >> and >> _TYPE_ variables as needed. (_TYPE_ only if not all numeric) >> >> Ted >> >> >> >> >> -----Original Message----- >> From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] >> Sent: Saturday, January 23, 2010 3:38 AM >> To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Right Ted, these are the types of problems that my proposed enhancement >> is >> intended to solve. There is no way to prevent conversion to character on >> the first transpose, because values for multiple columns with differing >> types are being forced into a single output column. But upon transposing >> the data a second time, the proposed enhancements would generally allow >> the >> original variables types and other attributes to be restored. >> >> However, your examples raise a couple of potential problems that I failed >> to >> >> account for previously: 1) What if there is no one-to-one correspondence >> between the formatted and informatted values, or 2) what it there is a >> format but no corresponding informat or vice-versa? In these cases, my >> proposal as it currently stands would not produce the desired result, >> because there would be no way to convert the formatted values back into >> the >> original raw data values. This problem suggests the need for two >> additional >> >> options on the PROC FORMAT statement in order to completely implement the >> desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to >> force >> default formats when outputting values and NOINFORMAT would be used to >> force >> >> default informats when reading values. Note that these options would >> only >> prevent non-default formats and informats from being actively *used* for >> converting values during processing, they would NOT prevent the format >> and >> informat attribute values from being written to or read from variables >> identified by using the proposed FORMAT= and INFORMAT= options or the >> IDFORMAT and IDINFORMAT statements. >> >> I will add this discussion to my proposal. >> >> s/KAM >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> >> Sent: Friday, January 22, 2010 23:57 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >>> Hi Kevin, >>> >>> The default behavior of Proc Transpose is making it difficult to achieve >>> the >>> preservation of variable attributes. The problem is with the values >>> themselves. >>> >>> Take a look at this: >>> >>> proc format; >>> value nfmt 1='One%' 2='Two$two' 3='%three'; >>> value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >>> run; >>> >>> Data testing; >>> Input Group $ N C $ D mmddyy10.; >>> label N='Numeric var' >>> C='Text' >>> D='Birthday'; >>> format n nfmt. c $cfmt. D mmddyy10.; >>> Datalines; >>> G1 1 A 01/15/2010 >>> ; >>> proc transpose data=testing out=Tran; >>> var N C D; >>> run; >>> >>> proc print data=tran noobs; >>> run; >>> >>> PRODUCES: >>> >>> _NAME_ _LABEL_ COL1 >>> >>> N Numeric var One% >>> C Text A >>> D Birthday 01/15/2010 >>> >>> The problem is that the numeric variable N has been converted to >>> character >>> using the format we assigned to it. The original numeric value is lost. >>> If >>> we prevent this by removing the format, we lose the format attribute. >>> Is >>> this Catch-22.0? >>> >>> Ted >>> >>> -----Original Message----- >>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>> Kevin >>> Myers >>> Sent: Thursday, January 21, 2010 1:30 PM >>> To: SAS-L(a)LISTSERV.UGA.EDU >>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> Hi Ted, >>> >>> The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >>> potentially help with your process to some degree, even though >>> fundamentally >>> my proposal only deals with variable attributes, whereas your needs >>> involve >>> an actual rearrangement of the data into different observations and >>> variables. >>> >>> If you first transposed your data into "long-skinny" form using a BY >>> statement, then provided a distinct _NAME_ variable for each individual >>> observation in each BY group of the long-skinny dataset using logic of >>> your >>> own, then transposed that result back into short-wide form once again >>> using >>> a BY statement, then you could achieve roughly what you are looking for, >>> and >>> perhaps this is what you have already tried. >>> >>> The contribution that my proposal could make to this process is that you >>> would be able to use the new PROC TRANSPOSE statements and options to >>> force >>> the resulting variables in the final data set (after transposing twice) >>> to >>> have all of the same types, lengths, formats, and informats as they did >>> in >>> the original data set, rather than all being changed to character >>> variables >>> with none of the other original attributes. If maintaining the original >>> variable types and attributes is important, then my proposal could >>> substantially simplify your process. >>> >>> s/KAM >>> >>> P.S. - Hope you don't mind that I copied this reply back to the list. I >>> thought others might also be interested. >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ashlandhome.net> >>> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >>> Sent: Thursday, January 21, 2010 13:50 >>> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>>> Hi Kevin, >>>> Thanks for filling the gaps. Yes, what I am describing creates a >>>> differently >>>> shaped output data set, so is not equivalent to your proposal. Let any >>>> confusion on that point end here. >>>> >>>> The ability to create and work with a "self-defining" skinny data set >>>> would >>>> be great. >>>> >>>> The situation I run into all the time (say, weekly) is where the data >>>> is >>>> in >>>> groups, and I have a mixture of variable types: numeric, character, >>>> numeric >>>> date. I want to get all the data in a group onto one observation. I >>>> sometimes have an ID variable, or at other times just want numeric >>>> suffixes >>>> (1,2,3..). >>>> Would your proposal make that easier to do than it is now? My hunch is >>>> yes, >>>> but it would not be a slam-dunk: >>>> Proc transpose, with the additional options you propose on the Proc >>>> statement. >>>> Data step to calculate new variable names, and possibly labels. >>>> (Calculate >>>> numeric suffixes?) >>>> Proc transpose, with the additional ID statements you propose. >>>> >>>> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >>>> guess my dog "Spot" was tugging on my pants-leg. >>>> Proc transpose data=testing out=trans ONEOUT; >>>> By group; >>>> Var N D; >>>> ID C; >>>> Run; >>>> >>>> Ted >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>>> Kevin >>>> Myers >>>> Sent: Thursday, January 21, 2010 9:27 AM >>>> To: SAS-L(a)LISTSERV.UGA.EDU >>>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>>> attributes >>>> >>>> Ted and all - >>>> >>>> The current item on the SASware ballot was put on there largely based >>>> on >>>> my >>>> input. I provided a detailed description of the suggested enhancements >>>> via >>>> SAS Institute's online Tracks system: >>>> >>>> 7610240654 - Base SAS: Additional options and statements for PROC >>>> TRANSPOSE >>>> to support transposing additional variable attributes, including >>>> format, >>>> informat, type, and length. >>>> >>>> I don't know whether you will be able to view that, but am short on >>>> time >>>> at >>>> the moment, so below is an excerpt from my original submission. Please >>>> note >>>> that my proposal does NOT require any changes to existing PROC >>>> TRANSPOSE >>>> behavior, but rather adds support for additional options and statements >>>> to >>>> support handling additional variable attributes analogous to the >>>> existing >>>> support for variable lablels. >>>> >>>> I haven't had time to review your suggestion in detail, but after >>>> briefly >>>> scanning through it I believe that you are asking for something that is >>>> substantially different in concept. If so, then I would suggest that >>>> you >>>> submit a separate enhancement suggestion, because I wouldn't want your >>>> proposal to be confused with or complicate the implementation of mine, >>>> or >>>> vice-versa. >>>> >>>> Hope this helps. >>>> Kevin M. >>>> >>>> from: kevinmyers(a)austin.rr.com >>>> name: Kevin A. Myers >>>> company: Myers Engineering >>>> product: BASE >>>> os: none >>>> suggestion text: PROC TRANSPOSE should be enhanced to support >>>> transposing >>>> additional variable attributes including format, >>>> informat, type, and length. >>>> >>>> The following new STATEMENTS should be added. These are similar to the >>>> existing idlabel statement, but provide input variable >>>> names for the other variable attributes: >>>> >>>> idformat statement - variable name to provide formats >>>> idinformat statement - variable name to provide informats >>>> idtype statement - variable name to provide variable types >>>> idlength statement - variable name to provide variable lengths >>>> >>>> The PROC TRANSPOSE statement should be enhanced to add the following >>>> additional OPTIONS, which all perform similar functions to the >>>> existing label option, but for the other variable attributes: >>>> >>>> format - output variable name for formats (default _FORMAT_) >>>> informat - output variable name for informats (default _INFORMAT_) >>>> type - output variable name for types (default _TYPE_) >>>> length - output variable name for lengths (default _LENGTH_) >>>> >>>> Providing the new statements and options listed above would allow for >>>> simple, complete transformation of a data set from >>>> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>>> vice-versa, >>>> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>>> information. It would also greatly simplify >>>> input and output of data from/to "self-defining" external file formats >>>> where >>>> the format, informat, type, and/or length attributes >>>> are included within the external data file. >>>> >>>> Availability of the proposed new statements and options would allow >>>> simple >>>> loops based on "long-skinny" data representation >>>> to be used for reading/writing data in self-defining formats, >>>> followed/preceded by proc transpose steps to convert the data to/from a >>>> more >>>> typical "short-wide" representation. >>>> >>>> Code to implement the proposed statements and options should be >>>> relatively >>>> easy to develop by simply duplicating the existing >>>> support for IDLABEL/LABEL= and making relatively minor modifications as >>>> needed for each of the other specific attributes (format, >>>> informat, type, length). >>>> >>>> >>>> ----- Original Message ----- >>>> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>>> To: <SAS-L(a)LISTSERV.UGA.EDU> >>>> Sent: Thursday, January 21, 2010 09:33 >>>> Subject: SASware ballot #18 Proc Transpose preserving variable >>>> attributes >>>> >>>> >>>>> SASware ballot #18 refers to Proc Transpose preserving variable >>>>> attributes. >>>>> >>>>> I'm familiar with the problem and frequently use a macro to solve it, >>>>> but >>>>> would be delighted to see this feature migrate into Base SAS. >>>>> >>>>> >>>>> >>>>> I am soliciting input from SAS-L in the hopes that this change will be >>>>> implemented in the most useful way. >>>>> >>>>> Posting your comments may have some influence on a future enhancement >>>>> of >>>>> Proc Transpose. >>>>> >>>>> >>>>> >>>>> First off, I think this requires limiting the output to one >>>>> observation >>>>> per >>>>> BY-group. I can't see any way around that. >>>>> >>>>> >>>>> >>>>> Assuming that is true, one idea for implementing the change to Proc >>>>> Transpose would be to add a "ONEOUT" option on the proc statement. >>>>> >>>>> >>>>> >>>>> Example: >>>>> >>>>> Data testing; >>>>> >>>>> Input Group N C $ D mmddyy10.; >>>>> >>>>> format d mmddyy10.; >>>>> >>>>> label N='Numeric var' >>>>> >>>>> C='Text' >>>>> >>>>> D='Birthday'; >>>>> >>>>> Datalines; >>>>> >>>>> 1 57 A 01/15/2010 >>>>> >>>>> 1 84 B 02/15/2010 >>>>> >>>>> 2 33 B 03/15/2010 >>>>> >>>>> ; >>>>> >>>>> Proc transpose data=testing out=trans OUTOUT; >>>>> >>>>> By group; >>>>> >>>>> Var N D; >>>>> >>>>> Run; >>>>> >>>>> >>>>> >>>>> Proc print data=trans; >>>>> >>>>> Would produce: >>>>> >>>>> Obs Group N1 D1 N2 D2 >>>>> >>>>> >>>>> >>>>> 1 1 57 01/15/2010 84 02/15/2010 >>>>> >>>>> 2 2 33 03/15/2010 >>>>> >>>>> >>>>> >>>>> With the ID statement: >>>>> >>>>> Proc transpose data=testing out=trans OUTOUT; >>>>> >>>>> By group; >>>>> >>>>> Var N D; >>>>> >>>>> ID C; >>>>> >>>>> Run; >>>>> >>>>> >>>>> >>>>> Would produce: >>>>> >>>>> Obs Group NA DA NB DB >>>>> >>>>> >>>>> >>>>> 1 1 57 01/15/2010 84 02/15/2010 >>>>> >>>>> 2 2 33 03/15/2010 >>>>> >>>>> >>>>> >>>>> Other possible options come up in naming the output variables: (1) ID >>>>> values >>>>> before names (2) a separator between ID values and names, and (3) >>>>> whether >>>>> and how to modify labels. >>>>> >>>> >>>> >>>> >>> >>> >>> >> >> >> >> >> > > > > >
From: Ted Clay on 23 Jan 2010 13:47
Kevin, If there were a "FULLINFO" option in your proposal, my preference would be to have it cover TYPE LENGTH FORMAT INFORMAT NOFORMAT NOINFORMAT on the proc statement, plus the statements IDTYPE IDLENGTH IDFORMAT and IDINFORMAT with default names. But yes one more option is one more option. Ted -----Original Message----- From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] Sent: Saturday, January 23, 2010 10:04 AM To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU Subject: Re: SASware ballot #18 Proc Transpose preserving variable attributes P.S. Just in case it wasn't already obvious, with the proposed behavior, specifying: PROC TRANSPOSE TYPE LENGTH FORMAT INFORMAT; would be equivalent to your proposed: PROC TRANSPOSE FULLINFO; Personally, I don't feel that amount of shortening (three words) deserves yet another option, but that's just my opinion. ----- Original Message ----- From: "Ted Clay" <tclay(a)ashlandhome.net> To: "'Kevin Myers'" <KevinMyers(a)austin.rr.com>; <SAS-L(a)LISTSERV.UGA.EDU> Sent: Saturday, January 23, 2010 11:34 Subject: RE: SASware ballot #18 Proc Transpose preserving variable attributes > Yes, this would be necessary. In fact, this is what currently happens if > a > format is associated with a variable but is not available at the time of > execution, with the NOFMTERR option turned on. > > In terms of default behavior, currently the output data set has _LABEL_ if > and only if any VAR has a label. Would _FORMAT_, _INFORMAT_ be created > under the same conditions, that is, if any VAR has one defined? What > about > _TYPE_? Every variable has a type. I guess I'm leading up to saying I > don't think you want to be creating anything new unless the user specifies > one of the added features. How about a single option to "turn on" all > these > features, with "NOFORMAT" and "NOINFORMAT" being the only behavior. > Something like "FULLINFO". No other additional statements or options. If > you say "FULLINFO" the proc would create or use the _FORMAT_, _INFORMAT_ > and > _TYPE_ variables as needed. (_TYPE_ only if not all numeric) > > Ted > > > > > -----Original Message----- > From: Kevin Myers [mailto:KevinMyers(a)austin.rr.com] > Sent: Saturday, January 23, 2010 3:38 AM > To: Ted Clay; SAS-L(a)LISTSERV.UGA.EDU > Subject: Re: SASware ballot #18 Proc Transpose preserving variable > attributes > > Right Ted, these are the types of problems that my proposed enhancement is > intended to solve. There is no way to prevent conversion to character on > the first transpose, because values for multiple columns with differing > types are being forced into a single output column. But upon transposing > the data a second time, the proposed enhancements would generally allow > the > original variables types and other attributes to be restored. > > However, your examples raise a couple of potential problems that I failed > to > > account for previously: 1) What if there is no one-to-one correspondence > between the formatted and informatted values, or 2) what it there is a > format but no corresponding informat or vice-versa? In these cases, my > proposal as it currently stands would not produce the desired result, > because there would be no way to convert the formatted values back into > the > original raw data values. This problem suggests the need for two > additional > > options on the PROC FORMAT statement in order to completely implement the > desired behavior: NOFORMAT and NOINFORMAT. NOFORMAT would be used to > force > default formats when outputting values and NOINFORMAT would be used to > force > > default informats when reading values. Note that these options would only > prevent non-default formats and informats from being actively *used* for > converting values during processing, they would NOT prevent the format and > informat attribute values from being written to or read from variables > identified by using the proposed FORMAT= and INFORMAT= options or the > IDFORMAT and IDINFORMAT statements. > > I will add this discussion to my proposal. > > s/KAM > > > ----- Original Message ----- > From: "Ted Clay" <tclay(a)ashlandhome.net> > To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM>; <SAS-L(a)LISTSERV.UGA.EDU> > Sent: Friday, January 22, 2010 23:57 > Subject: RE: SASware ballot #18 Proc Transpose preserving variable > attributes > > >> Hi Kevin, >> >> The default behavior of Proc Transpose is making it difficult to achieve >> the >> preservation of variable attributes. The problem is with the values >> themselves. >> >> Take a look at this: >> >> proc format; >> value nfmt 1='One%' 2='Two$two' 3='%three'; >> value $cfmt 'A'='AOne%' 'B'='BTwo$two' 'C'='%three#C'; >> run; >> >> Data testing; >> Input Group $ N C $ D mmddyy10.; >> label N='Numeric var' >> C='Text' >> D='Birthday'; >> format n nfmt. c $cfmt. D mmddyy10.; >> Datalines; >> G1 1 A 01/15/2010 >> ; >> proc transpose data=testing out=Tran; >> var N C D; >> run; >> >> proc print data=tran noobs; >> run; >> >> PRODUCES: >> >> _NAME_ _LABEL_ COL1 >> >> N Numeric var One% >> C Text A >> D Birthday 01/15/2010 >> >> The problem is that the numeric variable N has been converted to >> character >> using the format we assigned to it. The original numeric value is lost. >> If >> we prevent this by removing the format, we lose the format attribute. Is >> this Catch-22.0? >> >> Ted >> >> -----Original Message----- >> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >> Kevin >> Myers >> Sent: Thursday, January 21, 2010 1:30 PM >> To: SAS-L(a)LISTSERV.UGA.EDU >> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> Hi Ted, >> >> The proposed SUGI ballot item enhancements to PROC TRANSPOSE could >> potentially help with your process to some degree, even though >> fundamentally >> my proposal only deals with variable attributes, whereas your needs >> involve >> an actual rearrangement of the data into different observations and >> variables. >> >> If you first transposed your data into "long-skinny" form using a BY >> statement, then provided a distinct _NAME_ variable for each individual >> observation in each BY group of the long-skinny dataset using logic of >> your >> own, then transposed that result back into short-wide form once again >> using >> a BY statement, then you could achieve roughly what you are looking for, >> and >> perhaps this is what you have already tried. >> >> The contribution that my proposal could make to this process is that you >> would be able to use the new PROC TRANSPOSE statements and options to >> force >> the resulting variables in the final data set (after transposing twice) >> to >> have all of the same types, lengths, formats, and informats as they did >> in >> the original data set, rather than all being changed to character >> variables >> with none of the other original attributes. If maintaining the original >> variable types and attributes is important, then my proposal could >> substantially simplify your process. >> >> s/KAM >> >> P.S. - Hope you don't mind that I copied this reply back to the list. I >> thought others might also be interested. >> >> >> ----- Original Message ----- >> From: "Ted Clay" <tclay(a)ashlandhome.net> >> To: "'Kevin Myers'" <KevinMyers(a)AUSTIN.RR.COM> >> Sent: Thursday, January 21, 2010 13:50 >> Subject: RE: SASware ballot #18 Proc Transpose preserving variable >> attributes >> >> >>> Hi Kevin, >>> Thanks for filling the gaps. Yes, what I am describing creates a >>> differently >>> shaped output data set, so is not equivalent to your proposal. Let any >>> confusion on that point end here. >>> >>> The ability to create and work with a "self-defining" skinny data set >>> would >>> be great. >>> >>> The situation I run into all the time (say, weekly) is where the data is >>> in >>> groups, and I have a mixture of variable types: numeric, character, >>> numeric >>> date. I want to get all the data in a group onto one observation. I >>> sometimes have an ID variable, or at other times just want numeric >>> suffixes >>> (1,2,3..). >>> Would your proposal make that easier to do than it is now? My hunch is >>> yes, >>> but it would not be a slam-dunk: >>> Proc transpose, with the additional options you propose on the Proc >>> statement. >>> Data step to calculate new variable names, and possibly labels. >>> (Calculate >>> numeric suffixes?) >>> Proc transpose, with the additional ID statements you propose. >>> >>> I had a typo on my example -- I meant to type "ONEOUT" not "OUTOUT". I >>> guess my dog "Spot" was tugging on my pants-leg. >>> Proc transpose data=testing out=trans ONEOUT; >>> By group; >>> Var N D; >>> ID C; >>> Run; >>> >>> Ted >>> >>> >>> >>> >>> -----Original Message----- >>> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of >>> Kevin >>> Myers >>> Sent: Thursday, January 21, 2010 9:27 AM >>> To: SAS-L(a)LISTSERV.UGA.EDU >>> Subject: Re: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> Ted and all - >>> >>> The current item on the SASware ballot was put on there largely based on >>> my >>> input. I provided a detailed description of the suggested enhancements >>> via >>> SAS Institute's online Tracks system: >>> >>> 7610240654 - Base SAS: Additional options and statements for PROC >>> TRANSPOSE >>> to support transposing additional variable attributes, including format, >>> informat, type, and length. >>> >>> I don't know whether you will be able to view that, but am short on time >>> at >>> the moment, so below is an excerpt from my original submission. Please >>> note >>> that my proposal does NOT require any changes to existing PROC TRANSPOSE >>> behavior, but rather adds support for additional options and statements >>> to >>> support handling additional variable attributes analogous to the >>> existing >>> support for variable lablels. >>> >>> I haven't had time to review your suggestion in detail, but after >>> briefly >>> scanning through it I believe that you are asking for something that is >>> substantially different in concept. If so, then I would suggest that >>> you >>> submit a separate enhancement suggestion, because I wouldn't want your >>> proposal to be confused with or complicate the implementation of mine, >>> or >>> vice-versa. >>> >>> Hope this helps. >>> Kevin M. >>> >>> from: kevinmyers(a)austin.rr.com >>> name: Kevin A. Myers >>> company: Myers Engineering >>> product: BASE >>> os: none >>> suggestion text: PROC TRANSPOSE should be enhanced to support >>> transposing >>> additional variable attributes including format, >>> informat, type, and length. >>> >>> The following new STATEMENTS should be added. These are similar to the >>> existing idlabel statement, but provide input variable >>> names for the other variable attributes: >>> >>> idformat statement - variable name to provide formats >>> idinformat statement - variable name to provide informats >>> idtype statement - variable name to provide variable types >>> idlength statement - variable name to provide variable lengths >>> >>> The PROC TRANSPOSE statement should be enhanced to add the following >>> additional OPTIONS, which all perform similar functions to the >>> existing label option, but for the other variable attributes: >>> >>> format - output variable name for formats (default _FORMAT_) >>> informat - output variable name for informats (default _INFORMAT_) >>> type - output variable name for types (default _TYPE_) >>> length - output variable name for lengths (default _LENGTH_) >>> >>> Providing the new statements and options listed above would allow for >>> simple, complete transformation of a data set from >>> "long-skinny" to "short-wide" representations AND BACK AGAIN, or >>> vice-versa, >>> WITHOUT LOSS OF ANY FORMAT, INFORMAT, TYPE, and LENGTH >>> information. It would also greatly simplify >>> input and output of data from/to "self-defining" external file formats >>> where >>> the format, informat, type, and/or length attributes >>> are included within the external data file. >>> >>> Availability of the proposed new statements and options would allow >>> simple >>> loops based on "long-skinny" data representation >>> to be used for reading/writing data in self-defining formats, >>> followed/preceded by proc transpose steps to convert the data to/from a >>> more >>> typical "short-wide" representation. >>> >>> Code to implement the proposed statements and options should be >>> relatively >>> easy to develop by simply duplicating the existing >>> support for IDLABEL/LABEL= and making relatively minor modifications as >>> needed for each of the other specific attributes (format, >>> informat, type, length). >>> >>> >>> ----- Original Message ----- >>> From: "Ted Clay" <tclay(a)ASHLANDHOME.NET> >>> To: <SAS-L(a)LISTSERV.UGA.EDU> >>> Sent: Thursday, January 21, 2010 09:33 >>> Subject: SASware ballot #18 Proc Transpose preserving variable >>> attributes >>> >>> >>>> SASware ballot #18 refers to Proc Transpose preserving variable >>>> attributes. >>>> >>>> I'm familiar with the problem and frequently use a macro to solve it, >>>> but >>>> would be delighted to see this feature migrate into Base SAS. >>>> >>>> >>>> >>>> I am soliciting input from SAS-L in the hopes that this change will be >>>> implemented in the most useful way. >>>> >>>> Posting your comments may have some influence on a future enhancement >>>> of >>>> Proc Transpose. >>>> >>>> >>>> >>>> First off, I think this requires limiting the output to one observation >>>> per >>>> BY-group. I can't see any way around that. >>>> >>>> >>>> >>>> Assuming that is true, one idea for implementing the change to Proc >>>> Transpose would be to add a "ONEOUT" option on the proc statement. >>>> >>>> >>>> >>>> Example: >>>> >>>> Data testing; >>>> >>>> Input Group N C $ D mmddyy10.; >>>> >>>> format d mmddyy10.; >>>> >>>> label N='Numeric var' >>>> >>>> C='Text' >>>> >>>> D='Birthday'; >>>> >>>> Datalines; >>>> >>>> 1 57 A 01/15/2010 >>>> >>>> 1 84 B 02/15/2010 >>>> >>>> 2 33 B 03/15/2010 >>>> >>>> ; >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> Run; >>>> >>>> >>>> >>>> Proc print data=trans; >>>> >>>> Would produce: >>>> >>>> Obs Group N1 D1 N2 D2 >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> With the ID statement: >>>> >>>> Proc transpose data=testing out=trans OUTOUT; >>>> >>>> By group; >>>> >>>> Var N D; >>>> >>>> ID C; >>>> >>>> Run; >>>> >>>> >>>> >>>> Would produce: >>>> >>>> Obs Group NA DA NB DB >>>> >>>> >>>> >>>> 1 1 57 01/15/2010 84 02/15/2010 >>>> >>>> 2 2 33 03/15/2010 >>>> >>>> >>>> >>>> Other possible options come up in naming the output variables: (1) ID >>>> values >>>> before names (2) a separator between ID values and names, and (3) >>>> whether >>>> and how to modify labels. >>>> >>> >>> >>> >> >> >> > > > > > |