Prev: Normalization in SAS
Next: Unzip a file
From: "Choate, Paul@DDS" on 22 Sep 2006 15:26 Jue - try this - PROC IMPORT OUT= WORK.A DATAFILE= "c:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; GUESSINGROWS=3276; RUN; hth Paul Choate DDS Data Extraction (916) 654-2160 -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jue Wang Sent: Friday, September 22, 2006 11:01 AM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Variable length when using Proc Import I am using the following program to read in data. The data has a variable "name" whose length varies from 3 to 18, unfortunately, since the first observation's name is 3-character, the read-in SAS dataset truncates all other names to have length 3. I understand that the easiest resolution is to put the longest name in the first row of the original data set, but is there any other way to correctly read in data without changing the original data set? Thank You PROC IMPORT OUT= WORK.A DATAFILE= "C:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; RUN;
From: Jue Wang on 22 Sep 2006 15:41 -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU]On Behalf Of Choate, Paul(a)DDS Sent: Friday, September 22, 2006 3:26 PM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Re: [SAS-L] Variable length when using Proc Import Jue - try this - PROC IMPORT OUT= WORK.A DATAFILE= "c:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; GUESSINGROWS=3276; RUN; hth Paul Choate DDS Data Extraction (916) 654-2160 -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jue Wang Sent: Friday, September 22, 2006 11:01 AM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Variable length when using Proc Import I am using the following program to read in data. The data has a variable "name" whose length varies from 3 to 18, unfortunately, since the first observation's name is 3-character, the read-in SAS dataset truncates all other names to have length 3. I understand that the easiest resolution is to put the longest name in the first row of the original data set, but is there any other way to correctly read in data without changing the original data set? Thank You PROC IMPORT OUT= WORK.A DATAFILE= "C:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; RUN;
From: "Xiao, Steven" on 22 Sep 2006 15:53 That might works for 9.1 ver, but it seems to be that 8.x under Window OS doesn't have that option. Any other way to do it in 8.x ver? Steven -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Choate, Paul(a)DDS Sent: Friday, September 22, 2006 3:26 PM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Re: Variable length when using Proc Import Jue - try this - PROC IMPORT OUT= WORK.A DATAFILE= "c:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; GUESSINGROWS=3276; RUN; hth Paul Choate DDS Data Extraction (916) 654-2160 -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jue Wang Sent: Friday, September 22, 2006 11:01 AM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Variable length when using Proc Import I am using the following program to read in data. The data has a variable "name" whose length varies from 3 to 18, unfortunately, since the first observation's name is 3-character, the read-in SAS dataset truncates all other names to have length 3. I understand that the easiest resolution is to put the longest name in the first row of the original data set, but is there any other way to correctly read in data without changing the original data set? Thank You PROC IMPORT OUT= WORK.A DATAFILE= "C:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; RUN;
From: Jue Wang on 22 Sep 2006 15:41 Dear Paul, Thank you for the Help. But somehow I got error message in the log, and no data was read in: 519 PROC IMPORT OUT= WORK.A 520 DATAFILE= "C:\data.csv" 521 DBMS=CSV REPLACE; NOTE: The previous statement has been deleted. 522 GETNAMES=YES; 523 DATAROW=2; NOTE: SCL source line. 524 GUESSINGROWS=3276; ------------ 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 525 RUN; -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU]On Behalf Of Choate, Paul(a)DDS Sent: Friday, September 22, 2006 3:26 PM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Re: [SAS-L] Variable length when using Proc Import Jue - try this - PROC IMPORT OUT= WORK.A DATAFILE= "c:\data.csv" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; GUESSINGROWS=3276; RUN; hth Paul Choate DDS Data Extraction (916) 654-2160
From: "Choate, Paul@DDS" on 22 Sep 2006 16:40
Well - it works on v9. I guess you are using SAS 8? There is a v8 tweak that may be useful to you.... SN-V8+-001075 How to scan more than 20 records to determine variable attributes in EFI ------------------------------------------------------------------------ In Versions 7 and 8 of the SAS System, by default the Import Wizard, PROC IMPORT and the External File Interface (EFI) scan 20 records to determine variable attributes when reading delimited text files. Changing the default setting can only be done for EFI in Version 7, Release 8 and Release 8.1. Beginning in Release 8.2 changing the default setting is applicable to the Import Wizard, PROC IMPORT and EFI. Follow the steps below to change the default behavior: 1) Type regedit on the command line (white box with a check mark) 2) When the Registry Editor window opens, double click on the PRODUCTS icon 3) Double click on the BASE icon 4) Double click on the EFI icon 5) In the window on the right the Contents of EFI will be populated with EFI options 6) Double click on GuessingRows 7) When the new window opens with the old value of 20, delete it, enter the new value, and click on OK 8) Close the Registry Editor window 9) Invoke the Import Wizard, PROC IMPORT or EFI to use the new GuessingRows value The new value entered for GuessingRows will remain until you change it. Beginning in SAS 9.1 there is a new statement, GUESSINGROWS=, that can be specified with PROC IMPORT. By specifying the GUESSINGROWS= statement with PROC IMPORT, you do not have to change the GuessingRows value in the SAS Registry. Also beginning in SAS 9.1 you can specify the Number of Rows to Guess when using the Import Wizard in the SAS Import: Delimited File Options Window or when using EFI in the Options for Import Window. ------------------------------------------------------------------------ ---- Product: Base SAS Component: External File Interface(OS) Priority: N/A Note Type: Usage Issue Date: Fri, 15 Apr 2005 Here's post you might be interested in: From: Bruce Gilsen Date: Thurs, Nov 17 2005 8:53 am Groups: comp.soft-sys.sas By default, the Import Wizard, PROC IMPORT, and the External File Interface (EFI) scan the first 20 records to determine variable attributes such as field length when reading delimited text files (CSV, TAB, and DLM files). If character fields have longer values past the first 20 records, PROC IMPORT, the Import Wizard, and the EFI truncate them. Before Version 9.1, you had to update the SAS registry to fix this problem (SAS Note 1075, http://support.sas.com/techsup/unotes/SN/001/001075.html). Starting in Version 9.1, the PROC IMPORT GUESSINGROWS= option tells SAS how many records to scan. The Import Wizard and the EFI also have ways to do this (see SAS Note 1075). One problem is that SAS always checks all variables; there's no way to request that just that some variables are checked, and GUESSINGROWS slows down program execution for large files. Here's a test I ran on Linux 9.1.3 with CSV files. In all cases, all but 1 variable was numeric (so that no checking beyond the default 20 records was needed), and a single character variable, COMPANY, was "ibm" or "aol" in the first 20 observations and "microsoft" in all other observations. Without GUESSINGROWS=21 or higher, PROC IMPORT sets the length of COMPANY to 3 based on the first 20 records and truncates "microsoft" to "mic". All tests had 16,000 observations, and I either omitted GUESSINGROWS or set it to 16,000 to check the whole file. Number of variables GUESSINGROWS=16,000? SECONDS (cpu/real) 3 no .11/.12 3 yes 5.92/6.00 35 no .31/.33 35 yes 50.63/50.76 So, most of the time in the last test is presumably spent checking variables that I know don't need to be checked. My suggestion: an option paired with GUESSINGROWS called GUESSINGCOLUMNS that would specify which columns to check past the default 20 records. If omitted, the current behavior, check all columns, takes place. Here's the code I used. One of the first 2 DATA steps would be commented out, based on which tests I am doing. /* Create data for tests 1 and 2 (3 variables) */ data one; retain i company itimes10; length company $10; do i=1 to 16000; itimes10 = i * 10; if i le 10 then company="ibm"; else if i le 20 then company="aol"; else company="microsoft"; output; end; run; /* Create data for tests 3 and 4 (35 variables) */ data one; retain i company itimes10 i1-i32; array iall (*) i1-i32; drop j; length company $10; do i=1 to 16000; itimes10 = i * 10; /* in 1st obs, i1=1 i2=2 ... i32=32. in 2nd obs, i1=2 i2=3 ... i32=33. etc. */ do j = 1 to dim(iall); iall(j) = i+j-1; end; if i le 10 then company="ibm"; else if i le 20 then company="aol"; else company="microsoft"; output; end; run; /* create a CSV file for testing */ proc export data=one outfile = '/it/support/it/m1bfg00/csv4.csv' dbms=csv; run; /* read the CSV file */ proc import out=csvin datafile = '/it/support/it/m1bfg00/csv4.csv' dbms=csv; getnames=yes; guessingrows=16000; datarow=2; run; ...................................... Paul Choate DDS Data Extraction (916) 654-2160 -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jue Wang Sent: Friday, September 22, 2006 12:42 PM To: SAS-L(a)LISTSERV.UGA.EDU Subject: Re: Variable length when using Proc Import Dear Paul, Thank you for the Help. But somehow I got error message in the log, and no data was read in: 519 PROC IMPORT OUT= WORK.A 520 DATAFILE= "C:\data.csv" 521 DBMS=CSV REPLACE; NOTE: The previous statement has been deleted. 522 GETNAMES=YES; 523 DATAROW=2; NOTE: SCL source line. 524 GUESSINGROWS=3276; ------------ 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 525 RUN; -----Original Message----- F |