Multiple data sets based on a variable [SAS]


From: Melissa on 23 Jan 2010 18:32

On Dec 23 2009, 4:09 pm, RHOAD...(a)WESTAT.COM (Mike Rhoads) wrote:
> This is a fairly common problem. As Ron noted, it may not always be necessary/desirable to create separate data sets, but there certainly are times when there is no way around this. For instance, there may be a requirement to supply an appropriate "subset" data set to each of a group of regional managers.
>
> And, as the responses make clear, there is no easy, direct way to do this in SAS. "Traditional" DATA step techniques require the output data set to be specified at compile time. The hash object does provide a way around this, but at the cost of requiring all of the data to be stored in memory, which certainly isn't an inherent requirement of the problem.
>
> Accepting the need to at least specify all possible output data sets up front in the DATA statement, I'd like to see something like:
>
> /* Note this is a proposal, not working code */
> DATA ... /* list of possible data sets, presumably generated */ ;
> SET Original;
> OUTPUT _TEMPLATE_ (DSNAME = 'B_' || B);
> RUN;
>
> This proposes a new keyword, _TEMPLATE_, which would provide a dynamic capability to the OUTPUT statement. Whenever the OUTPUT statement is executed at run time, SAS would evaluate the character expression associated with DSNAME, and write an observation out to the data set that matches. It would be an error if the evaluated character string did not match one of the data set names listed on the DATA statement. This would significantly simplify the code, compared to a SELECT block or IF-THEN-ELSE IF sequence, and might also allow the compiler to generate more efficient code.
>
> I suspect it might require more changes to the underlying architecture to allow new output data sets to be opened on the fly, outside of the DATA step object extensions. If that were possible, the syntax could be even simpler:
>
> /* Note this is a proposal, not working code */
> DATA _TEMPLATE_;
> SET Original;
> OUTPUT _TEMPLATE_ (DSNAME = 'B_' || B);
> RUN;
>
> With this proposal, SAS would create output data sets as needed, based on the values in the original data sets, eliminating the need to specify all of the data set names on the DATA statement.
>
> Who knows -- maybe this will be a welcome stocking-stuffer in SAS 9.x one of these years. ;-)
>
> Mike Rhoads
> Rhoad...(a)Westat.com
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...(a)LISTSERV.UGA.EDU] On Behalf Of MikeS
> Sent: Tuesday, December 22, 2009 1:52 PM
> To: SA...(a)LISTSERV.UGA.EDU
> Subject: Multiple data sets based on a variable
>
> I need to create separate data sets from a single data set based on a variable (column B), like this:
>
> Original Data Set 1:
> 19404,1987,4200439
> 19404,2329,5160848
> 19404,1987,6127673
> 19404,1987,7077287
> 19404,2329,5066985
> 19404,1987,6127673
> 19404,2329,4781209
> 19404,1987,0868172
> 19404,2329,6030901
> 19404,1987,9897579
>
> Need:
>
> Data Set 2:
> 19404,1987,4200439
> 19404,1987,6127673
> 19404,1987,7077287
> 19404,1987,6127673
> 19404,1987,0868172
> 19404,1987,9897579
>
> Data Set 3:
> 19404,2329,5160848
> 19404,2329,5066985
> 19404,2329,4781209
> 19404,2329,6030901
>
> There could be hundreds of data sets so I need a way to capture the unique values and write those out to separate data sets.
>
> Thanks for any help.
>
> Mike S.

Hi everyone,

I'm currently taking a SAS course and am having some difficulty. In
this example I am supposed to create two data sets and I'm not sure
how.

Here is the problem:

Problem B
The raw data file Mixed_Recs.txt contains two types of records.
Records with a 1 in Column 16 are sales records, with Date (in
mmddyy10. format) starting in Column 1 and Amount in Columns 11-15.
Records with a 2 in Column 16 are inventory records and contain
two values: a part number (character, 5 bytes) starting in Column 1 and
a quantity. These two values are separated by a space. Write a SAS
program to read this file and create two SAS data sets: Sales and
Inventory.

Here is the raw data:

10/21/2005 1001
11/15/2005 2001
A13688 250 2
B11112 300 2
01/03/2005 50001
A88778 19 2

This is what I have so far, but it does not create two data sets
(one for sales and one for inventory):

DATA PROB_B;
  INPUT @16 RECORD 1. @;
  IF RECORD = 1 THEN
    INPUT @1 DATE MMDDYY10.
          @12 AMOUNT 4.;
  ELSE IF RECORD = 2 THEN
    INPUT @1 PARTNUM $6.
          @8 QUANTITY 3.;
  DROP RECORD;
  FORMAT DATE MMDDYY10.;
DATALINES;
10/21/2005 1001
11/15/2005 2001
A13688 250 2
B11112 300 2
01/03/2005 50001
A88778 19 2
;
RUN;
PROC PRINT DATA=PROB_B;
RUN;

Please help!!

From: Tom Abernathy on 23 Jan 2010 19:22

You have done a good job of solving the issue of reading the two
different types of records.
Now you need to solve the issue of generating two data sets.
Look at the syntax for the DATA and OUTPUT statements. Also look up
data set options.

Good Luck.
- Tom
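
For anyone reading this later, here is a minimal sketch of where those hints lead; it is one possible approach, not the course's official answer. It names both data sets on the DATA statement, routes each record with an explicit OUTPUT statement, and uses the KEEP= data set option so each data set carries only its own variables. The column positions come from the problem statement. The spacing of the data lines is an assumption: the posted lines look as if their blanks were collapsed, so they are padded here to put the record type in column 16. The variable names (RecType, PartNum, and so on) are illustrative, and PartNum is read with $6. because the posted part numbers have six characters even though the problem text says five.

```sas
/* Sketch only: data-line spacing is assumed (padded so the type is in column 16) */
data Sales     (keep=Date Amount)
     Inventory (keep=PartNum Quantity);
  infile datalines truncover;
  input @16 RecType 1. @;            /* read the record type, hold the line */
  if RecType = 1 then do;
    input @1 Date mmddyy10.          /* columns 1-10  */
          @11 Amount 5.;             /* columns 11-15 */
    output Sales;                    /* write to Sales only */
  end;
  else if RecType = 2 then do;
    input @1 PartNum :$6. Quantity;  /* two values separated by a space */
    output Inventory;                /* write to Inventory only */
  end;
  format Date mmddyy10.;
datalines;
10/21/2005  1001
11/15/2005  2001
A13688 250     2
B11112 300     2
01/03/2005 50001
A88778 19      2
;
run;

proc print data=Sales;     run;
proc print data=Inventory; run;
```

Without the explicit OUTPUT statements, every record would be written to both data sets, which is why the OUTPUT statement and the data set options are the two things to look up.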


From: Richard A. DeVenezia on 25 Jan 2010 13:09

On Dec 23 2009, 12:31 pm, dynamicpa...(a)YAHOO.COM (oloolo) wrote:
> I thought this issue over again and came up with the following solution:
> HoH (hash of hashes) means we do not need to know a priori how many output
> data sets to list in the DATA statement, so it does not really matter if
> the WHOLE file is too large to read into memory at once
> (say >2GB on a 32-bit machine).
>
> We can split the data by reading, say, the first 25% and writing out all
> the necessary sub data sets, then process the remaining 75% in further
> 25% increments.
>
> As a final step, we use PROC APPEND to glue these small data sets together.
> The complexity is still linear in the number of observations.

See "Hash based data splitter for limited resources"
http://groups.google.com/group/comp.soft-sys.sas/search?group=comp.soft-sys.sas&q=Hash+based+data+splitter+for+limited+resources

Should probably put all this HoH stuff in sascommunity and just point
to it when needed.
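
For reference, here is a minimal sketch of a simpler relative of that idea. It is not the hash-of-hashes splitter from the post linked above: it sorts first and holds only one BY group in a single hash object at a time, so memory use is bounded by the largest group rather than by the whole file. The data set name Original and the split variable B come from the original question; the variable names A and C and the data set name Sorted are hypothetical. It assumes SAS 9.2 or later (for MULTIDATA and the CLEAR method) and that the values of B yield valid data set names when prefixed with B_.

```sas
/* Sketch only: A and C are hypothetical variable names */
proc sort data=Original out=Sorted;
  by B;
run;

data _null_;
  if _n_ = 1 then do;
    declare hash h (multidata: 'y');   /* allow repeated key values */
    h.defineKey('B');
    h.defineData('A', 'B', 'C');
    h.defineDone();
  end;
  /* load one BY group into the hash */
  do until (last.B);
    set Sorted;
    by B;
    rc = h.add();
  end;
  /* data set name is evaluated at run time, e.g. B_1987 */
  rc = h.output(dataset: cats('B_', B));
  rc = h.clear();                      /* empty the hash for the next group */
run;
```

The trade-off is the extra sort pass; in return, no list of output data sets has to be generated ahead of time.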

--
Richard A. DeVenezia
http://www.devenezia.com
