From: dpb on 1 Feb 2010 20:11

analyst41(a)hotmail.com wrote:
> On Feb 1, 7:40 pm, dpb <n...(a)non.net> wrote:
....
>> Search and count the 0D 0A pairs (CRLF) -- they're the record markers.
>>
> Thanks.
>
> But since these markers are occurring both in the middle of a field
> and also at the end of an actual row from the database - I am still
> not able to separate out true EORs from the others.

Well, they _are_ "true" EORs. _WHY_ you're getting them where you
apparently think you shouldn't is a database export problem, apparently.

Is there a field/line length variable you could set or somesuch, perhaps?

--
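[Editor's sketch, not part of the original post.] The suggestion to search for and count the 0D 0A pairs could look roughly like this in Python. The sample bytes and the function names `count_crlf` and `bare_line_ends` are made up for illustration; a real export file would be read with `open(path, "rb").read()` so that no newline translation happens.

```python
def count_crlf(data: bytes) -> int:
    """Count 0D 0A (CRLF) pairs in a raw byte string."""
    return data.count(b"\r\n")


def bare_line_ends(data: bytes) -> tuple:
    """Return (bare CR count, bare LF count), i.e. bytes not part of a CRLF pair."""
    pairs = data.count(b"\r\n")
    return data.count(b"\r") - pairs, data.count(b"\n") - pairs


# Hypothetical export fragment: one CRLF sits inside a field, two end a row.
sample = b"1,first line\r\nsecond line,x\r\n2,plain,y\r\n"

print(count_crlf(sample))      # number of CRLF pairs in the sample
print(bare_line_ends(sample))  # lone CR / lone LF counts
```

A count like this only says how many record markers the file contains. It cannot by itself tell which of them the database considered end-of-row, which is the point made in the reply above.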
From: Gordon Sande on 1 Feb 2010 22:10

On 2010-02-01 21:11:22 -0400, dpb <none(a)non.net> said:

> analyst41(a)hotmail.com wrote:
>> On Feb 1, 7:40 pm, dpb <n...(a)non.net> wrote:
> ...
>
>>> Search and count the 0D 0A pairs (CRLF) -- they're the record markers.
>>>
>> Thanks.
>>
>> But since these markers are occurring both in the middle of a field
>> and also at the end of an actual row from the database - I am still
>> not able to separate out true EORs from the others.
>
> Well, they _are_ "true" EORs. _WHY_ you're getting them where you
> apparently think you shouldn't is a database export problem, apparently.
>
> Is there a field/line length variable you could set or somesuch, perhaps?

The example you gave looks like HTML or some close technical
relative. Such files typically are intended to have their contents
be independent of whatever line ends they might contain. If they are
stored as strings in a database system the length will be external to
the strings and the number of displayed rows (lines?) will be
dependent on the internal semantics (the <br> </br> thingys)
independent of the line ends.

It sure looks like you are being asked a question which depends on the
semantics of the data and they forgot to tell you what they are. It
may be just that there are two technical usages of some otherwise
innocent term like line or row.
From: dpb on 2 Feb 2010 09:52

Gordon Sande wrote:
....
> The example you gave looks like HTML or some close technical
> relative. Such files typically are intended to have their contents
> be independent of whatever line ends they might contain. If they are
> stored as strings in a database system the length will be external to
> the strings and the number of displayed rows (lines?) will be
> dependent on the internal semantics (the <br> </br> thingys)
> independent of the line ends.
>
> It sure looks like you are being asked a question which depends on the
> semantics of the data and they forgot to tell you what they are. It
> may be just that there are two technical usages of some otherwise
> innocent term like line or row.

Yes, I think that's a fair assumption as well; my response simply
addressed the specifics of the question, as to whether the database
export was embedding some other unexpected control character owing to
the encoding or somesuch. That doesn't appear to be the problem at all;
rather, as you say, it appears the line breaks in the original data are
simply being imported into a text-based database record as found in the
original data source.

--
From: analyst41 on 2 Feb 2010 19:39

On Feb 2, 9:52 am, dpb <n...(a)non.net> wrote:
....
> Yes, I think that's a fair assumption as well; my response simply
> addressed the specifics of the question as whether the database export
> was embedding some other unexpected control character owing to the
> encoding or somesuch. That doesn't appear to be the problem at all;
> rather as you say it appears it's simply the line breaks in the original
> data are likely just being imported into a text-based database record as
> found in the original data source.

But the database access client is able to tell the end-of-rows from
the newline markers embedded in some columns.

In other words, you will see two rows when you are in the client.
When you say 'save results to csv file' it becomes a file which seems
to have 5 rows as seen by Excel, Notepad, Fortran etc.

I am surprised nobody else seems to have faced this problem.
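
[Editor's sketch, not part of the original post.] The "two rows in the client, five rows in the file" symptom can be reproduced with made-up data. The sketch below ASSUMES the export wraps fields containing line breaks in double quotes, as many CSV writers do. The thread does not confirm that this particular export does so, and if it does not, no reader can recover the original rows from the line ends alone. The `raw` string is invented for illustration.

```python
import csv
import io

# Hypothetical export: two database rows; the first has three CRLFs
# embedded in a quoted field, so the file holds five physical lines.
raw = (
    '1,"line one\r\nline two\r\nline three\r\nline four",a\r\n'
    '2,"single line",b\r\n'
)

# A line-oriented reader treats every CRLF as a record marker.
physical_lines = raw.splitlines()

# A quote-aware CSV parser keeps line breaks inside quoted fields.
# newline="" stops Python from translating the CRLFs before parsing.
records = list(csv.reader(io.StringIO(raw, newline="")))

print(len(physical_lines))  # physical lines in the file
print(len(records))         # logical records seen by the CSV parser
```

If the real file turns out to have such quoting, a quote-aware reader would be needed on the consuming side as well; a plain line-by-line read cannot distinguish the two kinds of CRLF.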
From: dpb on 2 Feb 2010 22:42
analyst41(a)hotmail.com wrote:
> On Feb 2, 9:52 am, dpb <n...(a)non.net> wrote:
....
> But the database access client is able to tell the end-of-rows from
> the newline markers embedded in some columns.

Don't think that proves anything...

> In other words, you will see two rows when you are in the client.
> When you say 'save results to csv file' it becomes a file which seems
> to have 5 rows as seen by Excel, Notepad, Fortran etc.

As above, all that says is that the database client is displaying two
rows. It says nothing useful for this issue about what is actually in
the database record.

Can you somehow capture what is actually embedded in the database
record, w/o the user display, by some other export method that doesn't
format it but dumps it as a stream? Or, going back a step, what's in
the input to the database -- is it a single record or multiple lines
from some formatted data stream? Either of those could perhaps show
where the CRLF is coming from.

Not only does the csv file "seem" to have five rows, it _does_ have five
rows (as the dump explicitly showed).

> I am surprised nobody else seems to have faced this problem.

What database/what client?

I've not done enough true database engine work in 40 years to have ever
run into much of any problem with one; when I did have needs in the area
I farmed that portion out religiously.

But, either the data are embedded w/ the CRLF pairs and the export
function is simply echoing them, or it's broken, or it's configured for
short lines that are wrapping or somesuch. W/O more of the rest of the
puzzle I don't think there's anything else that can be said other than
it isn't a Fortran problem; it's the data file itself that's your
problem. How to fix that the way you want I've no clue, specifically.

--
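
[Editor's sketch, not part of the original post.] Dumping the data "as a stream" to see exactly where the 0D 0A bytes sit could be done with a small hex dump like the one below. The function name `hex_dump` and the sample bytes are illustrative only; for a real file the bytes would come from `open(path, "rb").read()`.

```python
def hex_dump(data: bytes, width: int = 16) -> list:
    """Format raw bytes as offset, hex values, and printable text."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes (including CR and LF) are shown as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines


# Hypothetical record with a CRLF in the middle of a field.
for line in hex_dump(b"1,first\r\nsecond,x\r\n"):
    print(line)
```

Comparing such a dump of the export file with a dump of whatever was originally loaded into the database would show whether the CRLF pairs were already in the source data or were added on export.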