From: John on 24 Sep 2009 10:49 How can a field be properly read that has a newline character within double quotes? Right now I'm using the INFILE statement below and it loads the observation but shifts values of the fields. Raw File ------------------------ Field 1,Field 2,"Field 3 Spans 2 lines",Field4,Field5,,,,,, Field 1,"Field 2 Spans 2 lines",Field 3,Field4,Field5,,,,,, INFILE "/&fileRef" DLM=',' dsd; Resulting Obs Field1 Field 2 Field 3 Field 4 Field 5 etc... ------------------------------------------------------------------------------------------------ Field 1 Field 2 "Field 3 Spans 2 lines" Field4 etc.... Field 1 "Field 2 Spans 2 lines" Field 3 Field4 etc....
From: "Terjeson, Mark" on 24 Sep 2009 11:33 Hi John, A couple clarification questions: 1) Do you want to keep those line terminations embedded in Field 3? or drop them? (there can be two approaches, a) do all the custom handling during the INPUT stage, or b) write a few line utility to strip out line terminations that would be inside a running set of quotes and then just use very basic dsd input operations on the corrected file) 2) Will there always be the same number of delimited fields for each record? (just doublechecking)(necessary if anyone decides to do more of an incoming stream approach...) 3) What platform(o/s) are you on? i.e. what are the line termination character(s) in your file? (e.g. LF, CR, CRLF, other?) Hope this is helpful. Mark Terjeson Investment Business Intelligence Investment Management & Research Russell Investments 253-439-2367 Russell Global Leaders in Multi-Manager Investing -----Original Message----- From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of John Sent: Thursday, September 24, 2009 7:50 AM To: SAS-L(a)LISTSERV.UGA.EDU Subject: SAS DSD Option on a delimited file How can a field be properly read that has a newline character within double quotes? Right now I'm using the INFILE statement below and it loads the observation but shifts values of the fields. Raw File ------------------------ Field 1,Field 2,"Field 3 Spans 2 lines",Field4,Field5,,,,,, Field 1,"Field 2 Spans 2 lines",Field 3,Field4,Field5,,,,,, INFILE "/&fileRef" DLM=',' dsd; Resulting Obs Field1 Field 2 Field 3 Field 4 Field 5 etc... ------------------------------------------------------------------------ ------------------------ Field 1 Field 2 "Field 3 Spans 2 lines" Field4 etc.... Field 1 "Field 2 Spans 2 lines" Field 3 Field4 etc....
From: Andre Wielki on 24 Sep 2009 11:50 John this is a begin of solution as your contents is varying character or numeric? you have to rework your sas dataset concatenate or other HTH Andre filename a "d:\Field11.txt"; data readcorrect; length field1 8 field2 field3 field3b $15 field4 8; infile a dlm=',' dsd pad truncover; input field1 :10. (field2-field3)(:$15.)/ field3b field4-field10 ; run; for data like 1, 2,"Field 3 Spans 2 lines", 4, 5,,,,,, 1,"Field 2 Spans 2 lines", 3, 4, 5,,,,,, John a �crit : > How can a field be properly read that has a newline character within > double quotes? > Right now I'm using the INFILE statement below and it loads the > observation but shifts values of the fields. > > Raw File > ------------------------ > Field 1,Field 2,"Field 3 > Spans 2 lines",Field4,Field5,,,,,, > Field 1,"Field 2 > Spans 2 lines",Field 3,Field4,Field5,,,,,, > > INFILE "/&fileRef" DLM=',' dsd; > > Resulting Obs > Field1 Field 2 Field 3 Field 4 Field > 5 etc... > ------------------------------------------------------------------------------------------------ > Field 1 Field 2 "Field 3 Spans 2 lines" Field4 > etc.... > Field 1 "Field 2 Spans 2 lines" Field 3 Field4 > etc.... > > -- Andr� WIELKI INED (Institut National d'Etudes D�mographiques) Service Informatique 133 Boulevard Davout 75980 Paris Cedex 20 m�l : wielki(a)ined.fr t�l : 33 (0) 1 56 06 21 54
From: "Data _null_;" on 24 Sep 2009 13:49 This is a common problem when EXCEL files are saved as CSV or other flat file format. Your file contains '0A'x character(s) LF(linefeed) that it are in the "middle" of the record. I have never found an INFILE option that allows this character to be read properly. The INFILE options MAC holds some promise, but as you can see below the '0A'x associated with CR the PC termstring also becomes part of the input. 2479 filename FT56F001 'testlf.csv'; 2480 data _null_; 2481 infile FT56F001 mac; 2482 input; 2483 list; 2484 run; NOTE: The infile FT56F001 is: File Name=C:\Documents and Settings\....\My Documents\My SAS Files\9.1\testlf.csv, RECFM=V,LRECL=256 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+- 1 CHAR Field 1,Field 2,"Field 3 .Spans 2 lines",Field4,Field5,,,,,, 61 ZONE 4666623246666232246666232057667232666672246666324666632222222 NUMR 695C401C695C402C2695C4030A301E3020C9E532C695C44C695C45CCCCCC0 2 CHAR .Field 1,"Field 2 .Spans 2 lines",Field 3,Field4,Field5,,,,,, 62 ZONE 04666623224666623205766723266667224666623246666324666632222222 NUMR A695C401C2695C4020A301E3020C9E532C695C403C695C44C695C45CCCCCC0 3 CHAR . 1 ZONE 0 NUMR A NOTE: 3 records were read from the infile FT56F001. I prefer to just "bite the bytes" and clean the records and get on with it. This data step can be used for that purpose. This program updates the file in place, using INFILE option SHAREBUFFERS, which may be new to you. The program replaces the LFs with SPACES but you could used some other character if you need to "find them later". filename FT56F001 'testlf0.csv'; data _null_; length filename $256; infile FT56F001 filename=filename recfm=N lrecl=256 sharebuffers eof=eof; file FT56F001; lbyte = '00'x; c = 0; do while(1); input byte $char1.; if byte eq '0a'x and lbyte ne '0d'x then do; c + 1; put '20'x; end; lbyte = byte; end; eof: putlog 'NOTE: There were ' c 'maddening line feed characters cleaned from: ' filename; return; run; If anyone knows the MAGIC option please let us know. On 9/24/09, John <jpovey(a)gmail.com> wrote: > How can a field be properly read that has a newline character within > double quotes? > Right now I'm using the INFILE statement below and it loads the > observation but shifts values of the fields. > > Raw File > ------------------------ > Field 1,Field 2,"Field 3 > Spans 2 lines",Field4,Field5,,,,,, > Field 1,"Field 2 > Spans 2 lines",Field 3,Field4,Field5,,,,,, > > INFILE "/&fileRef" DLM=',' dsd; > > Resulting Obs > Field1 Field 2 Field 3 Field 4 Field > 5 etc... > ------------------------------------------------------------------------------------------------ > Field 1 Field 2 "Field 3 Spans 2 lines" Field4 > etc.... > Field 1 "Field 2 Spans 2 lines" Field 3 Field4 > etc.... >
From: John on 24 Sep 2009 19:49 Thanks for all your suggestion. I was able to read the file and do a little datastep cleansing. While I didn't find an infile option, I left the infile I had and created and array of columns from 1 ro 250 or so that excel maxes out at. I was then able to loop through and search for fields that started with '"' or ended with a '"' and combined them to a single field. A nested do loop did the trick, but it doesn't preserve the newline character, which is fine for my purposes. Thanks again John
|
Pages: 1 Prev: modify MS SQL tabel using ODBC Next: Variable reduction method |