From: Paul M Foster on 18 Mar 2010 12:12

On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:

<snip>

> Personally, I find working with fixed widths is best. The text file
> might be larger but I don't have worry about escaping any type of
> characters ;)

I find this impossible, since I never know the largest width of all the
fields in a file. And a simple explode() call allows pulling all the
fields into an array, based on a common delimiter.

Paul

-- 
Paul M. Foster

From: Mattias Thorslund on 18 Mar 2010 12:16

Paul M Foster wrote:

> I process a lot of CSV files, and what I typically see is that Excel
> will enclose fields which might contain commas in quotes. This gets
> messy. So I finally wrote a C utility which parses the file and yields
> tab-delimited records without the quotes.
>
> Paul

And fgetcsv() didn't work for you?

http://www.php.net/fgetcsv

Cheers,
Mattias

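For readers following the fgetcsv() suggestion, here is a minimal sketch of reading an Excel-style file with quoted fields. The helper name read_csv_rows() and the sample data are made up for illustration and are not from the thread; the escape argument is passed explicitly because newer PHP versions warn about relying on its default.

```php
<?php
// Hypothetical helper (not from the thread): read every record of a CSV
// file into an array of rows, letting fgetcsv() deal with the quoting.
function read_csv_rows(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // Length 0 means "no line length limit"; delimiter, enclosure and
    // escape characters are spelled out rather than left to defaults.
    while (($fields = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        $rows[] = $fields;
    }
    fclose($fh);

    return $rows;
}

// Made-up sample: both fields of the second record contain commas.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,address\n\"Smith, John\",\"12 Main St, Apt 4\"\n");

$rows = read_csv_rows($path);
unlink($path);

print_r($rows);
```

The quotes are consumed by the parser, so each row arrives as a plain array of field values with any embedded commas intact.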
From: Ashley Sheridan on 18 Mar 2010 12:15

On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:

> On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
>
> <snip>
>
> > Personally, I find working with fixed widths is best. The text file
> > might be larger but I don't have worry about escaping any type of
> > characters ;)
>
> I find this impossible, since I never know the largest width of all the
> fields in a file. And a simple explode() call allows pulling all the
> fields into an array, based on a common delimiter.
>
> Paul
>
> --
> Paul M. Foster

Explode won't work in the case of a comma in a field value.

Also, newlines can exist within a field value, so a line in the file
doesn't equate to a row of data.

The best way is just to start parsing at the beginning of the file and
break it into fields one by one from there.

The bit I don't like about characters other than a comma being used in a
"comma separated values" file is that you can't automatically tell what
character has been used as the delimiter. That is why spreadsheet
programs ask what the delimiter is when splitting on a comma doesn't
yield what they recognise as valid fields.

Thanks,
Ash
http://www.ashleysheridan.co.uk

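Both failure modes described in this post can be reproduced in a few lines. This is only an illustrative sketch with made-up sample data: it compares explode() with str_getcsv() (available from PHP 5.3) on a quoted comma, then reads a record containing an embedded newline through fgetcsv().

```php
<?php
// Made-up record: the first field contains a comma inside quotes.
$line = '"Smith, John",Springfield,IL';

// explode() splits on every comma, quoted or not.
$naive = explode(',', $line);

// str_getcsv() honours the quotes and strips them from the result.
$parsed = str_getcsv($line, ',', '"', '\\');

echo count($naive), " pieces from explode()\n";
echo count($parsed), " fields from str_getcsv()\n";

// A quoted field may also contain a newline, so "one line = one row"
// does not hold. This sample has three physical lines but two records.
$data = "id,note\n1,\"first line\nsecond line\"\n";

$fh = fopen('php://memory', 'w+');
fwrite($fh, $data);
rewind($fh);

$rows = [];
while (($fields = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    $rows[] = $fields;
}
fclose($fh);

echo count($rows), " records from fgetcsv()\n";
```

Here explode() returns four pieces for a three-field record, while the CSV-aware functions keep the quoted comma and the quoted newline inside their fields.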
From: Paul M Foster on 18 Mar 2010 12:41

On Thu, Mar 18, 2010 at 09:16:30AM -0700, Mattias Thorslund wrote:

> Paul M Foster wrote:
>> I process a lot of CSV files, and what I typically see is that Excel
>> will enclose fields which might contain commas in quotes. This gets
>> messy. So I finally wrote a C utility which parses the file and yields
>> tab-delimited records without the quotes.
>>
>> Paul
>>
>
> And fgetcsv() didn't work for you?
>
> http://www.php.net/fgetcsv

I wrote my utility (and the infrastructure to process these files) long
before I was working with PHP. For what I do with the files, I must pipe
one operation's results to another process/command to get the final
result. This is impossible with web-based PHP. So I shell out from PHP
to do it. Like this:

    // convert original file to tab-delimited
    cat maillist.csv | cqf | filter.cq3or4 > jones.tab
    // filter unwanted fields and reorder fields
    mlt3.py nady jones.tab jones.rdb
    // build basic DBF file
    dbfsak -r mailers.rdb jones.dbf
    // append rdb records to DBF file
    dbfsak -a jones.rdb jones.dbf

Paul

-- 
Paul M. Foster

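For comparison, the first stage (quoted CSV in, tab-delimited records without quotes out) could be sketched in PHP itself. This is not the cqf utility from the post, whose behaviour is not shown in the thread; csv_to_tab() is a hypothetical stand-in, and flattening tabs and line breaks inside fields to spaces is an assumption made so that a later explode("\t", ...) stays safe.

```php
<?php
// Hypothetical stand-in for the CSV-to-tab step; NOT the cqf utility.
// Reads CSV records from stream $in, writes tab-delimited lines to $out,
// and returns the number of records written.
function csv_to_tab($in, $out): int
{
    $count = 0;
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // skip blank lines
        }
        // Assumption: tabs and line breaks inside a field become a single
        // space, so each output line is exactly one record.
        $clean = array_map(function ($field) {
            return preg_replace('/[\t\r\n]+/', ' ', (string) $field);
        }, $fields);

        fwrite($out, implode("\t", $clean) . "\n");
        $count++;
    }
    return $count;
}

// Usage with in-memory streams and made-up data.
$in = fopen('php://memory', 'w+');
fwrite($in, "\"Smith, John\",\"12 Main St\"\n");
rewind($in);

$out = fopen('php://memory', 'w+');
$written = csv_to_tab($in, $out);

rewind($out);
$tab = stream_get_contents($out);
fclose($in);
fclose($out);

// Once the quotes are gone, a plain explode() on tab is enough.
$fields = explode("\t", rtrim($tab, "\n"));
print_r($fields);
```

Because the function works on stream handles, the same code could read from STDIN and write to STDOUT when run from the command line, which would let it sit in a pipeline like the one above.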
From: Paul M Foster on 18 Mar 2010 12:57

On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
>
> > On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
> >
> > <snip>
> >
> > > Personally, I find working with fixed widths is best. The text file
> > > might be larger but I don't have worry about escaping any type of
> > > characters ;)
> >
> > I find this impossible, since I never know the largest width of all the
> > fields in a file. And a simple explode() call allows pulling all the
> > fields into an array, based on a common delimiter.
> >
> > Paul
> >
> > --
> > Paul M. Foster
>
> Explode won't work in the case of a comma in a field value.

That's why I convert the files to tab-delimited first. explode() does
work in that case.

> Also, newlines can exist within a field value, so a line in the file doesn't
> equate to a row of data

I've never seen this in the files I receive.

> The best way is just to start parsing at the beginning of the file and break it
> into fields one by one from there.
>
> The bit I don't like about characters other than a comma being used in a "comma
> separated values" file is that you can't automatically tell what character has
> been used as the delimiter. Hence being asked by spreadsheet programs what the
> delimiter is if a comma doesn't give up what it recognises as valid fields.

I've honestly never seen a "CSV" or "Comma-separated Values" file which
used tabs for delimiters. At that point, it's really not a *comma*
separated value file.

My application for all this is accepting mailing lists from customers
which I have to convert into DBFs for a commercial mailing list program.
Because most of my customers can barely find the on/off switch on their
computers, I never know what I'm going to get. So before I string
together the filters to process the file, I have to actually look at and
analyze the file to find out what it is. Could be a fixed-field length
file, a CSV, a tab-delimited file, or anything in between. Once I've
selected the filters, the sequence they will be put together in, and the
fields from the file I want to capture, I hit the button. After it's all
done, I still have to look at the result to ensure that the requested
fields ended up where they were supposed to.

Paul

-- 
Paul M. Foster

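A first-pass guess at the file type can be automated to some extent, though it does not replace looking at the file. The sketch below is only a rough heuristic under stated assumptions: guess_delimiter() and its candidate list are hypothetical, and it simply checks whether a candidate character occurs the same non-zero number of times on every sampled line. Quoted fields containing the delimiter will defeat it, and a null result may mean a fixed-width file or something else entirely.

```php
<?php
// Hypothetical heuristic (not from the thread): pick the candidate that
// appears the same non-zero number of times on every sampled line.
// Returns null when no candidate fits.
function guess_delimiter(array $lines, array $candidates = [",", "\t", ";", "|"]): ?string
{
    $lines = array_values($lines);
    if ($lines === []) {
        return null;
    }

    $best = null;
    $bestCount = 0;
    foreach ($candidates as $delimiter) {
        $counts = array_map(function ($line) use ($delimiter) {
            return substr_count($line, $delimiter);
        }, $lines);

        // Consistent and non-zero across all lines; prefer the candidate
        // that splits each line into the most fields.
        if (min($counts) > 0 && min($counts) === max($counts) && $counts[0] > $bestCount) {
            $best = $delimiter;
            $bestCount = $counts[0];
        }
    }
    return $best;
}

// Made-up samples.
$tabGuess   = guess_delimiter(["a\tb\tc", "d\te\tf"]);
$commaGuess = guess_delimiter(["a,b", "c,d"]);
$fixedGuess = guess_delimiter(["John Smith   12 Main", "Jane Doe     4 Elm"]);

var_dump($tabGuess, $commaGuess, $fixedGuess);
```

A guess like this only narrows down which filter to try first; checking the output by eye, as described above, is still needed.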