From: GizmoGorilla on 9 Feb 2010 12:18

I'm working on a project using a text widget. I've noticed that after saving the data from the text widget that the data file format is exactly as it was displayed in the text widget. So TABS and CRLFs end up taking a huge amount of space. My data file looks like...

Some text tabbed to some column

several CRLF's down the page

So the data file ends up taking 10 lines of data in this example. Is there any way to "compress" this for saving and "decompress" for loading, retaining the original tabbing & CRLF's without using so much space in the data file??

Thanks!

GG
From: Will Duquette on 9 Feb 2010 13:30

On Feb 9, 9:18 am, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
> I'm working on a project using a text widget. I've noticed that
> after saving the data from the text widget that the data file
> format is exactly as it was displayed in the text widget. So
> TABS and CRLFs end up taking a huge amount of space. My data
> file looks like...
>
> Some text tabbed to some column
>
> several CRLF's down the page
>
> So the data file ends up taking 10 lines of data in this
> example. Is there any way to "compress" this for saving
> and "decompress" for loading, retaining the original
> tabbing & CRLF's without using so much space in the
> data file??
>
> Thanks!
>
> GG

I'm missing something. Why is this a problem? Each TAB is one character, and each newline is one character (two characters on the disk, on Windows). How much more can you compress it?

You could choose to replace each tab and each CRLF with some other printable string on output, e.g., "\t" and "\n", using [string map], and convert them back on input; but that's not going to decrease the actual number of bytes.
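A minimal sketch of the escape-and-restore idea described above, written in Python for illustration rather than with Tcl's [string map]. The function names `escape_whitespace` and `unescape_whitespace` are hypothetical, and the sample string is made up. It shows that the mapping is reversible but makes the data longer, not shorter.

```python
def escape_whitespace(text: str) -> str:
    """Replace backslash, tab and newline with printable two-character escapes."""
    # Backslashes are escaped first so the mapping stays reversible.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def unescape_whitespace(escaped: str) -> str:
    """Reverse escape_whitespace by scanning for backslash escapes."""
    table = {"t": "\t", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\" and i + 1 < len(escaped):
            nxt = escaped[i + 1]
            out.append(table.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Made-up sample: one tab and four newlines.
sample = "some text\ttab to here\n\n\n\n"
escaped = escape_whitespace(sample)

# The escaped form fits on one line but is longer than the original.
print(len(sample), len(escaped))
print(escaped)
```

Each escaped character costs one extra character, so this only changes how the file looks in an editor; it does not reduce its size.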
From: GizmoGorilla on 9 Feb 2010 14:26

On 2010-02-09 1:30 PM, Will Duquette wrote:
> I'm missing something. Why is this a problem? Each TAB is one
> character, and each newline is one character (two characters on the
> disk, on Windows). How much more can you compress it?
>
> You could choose to replace each tab and each CRLF with some other
> printable string on output, e.g., "\t" and "\n", using [string map],
> and convert them back on input; but that's not going to decrease the
> actual number of bytes.

The tab is inserting blanks up to the column that was tabbed to. CRLF's will leave a blank line if there's no text on the line. That's fine for the user but not for the data file. If I crlf 5 times with no text on the line, I get 5 blank lines in my data file. This is a huge waste of space, that's why it's a problem. Data is being saved exactly as it is viewed in the text widget, WYSIWYG. So perhaps when I save, for example...

(users view)
some text tab to here
\n
\n
\n
\n

...it could be saved as...

(data view)
some text [5x\t] tab to here
[4x\n]

and then expanded on a load. This is what I mean by compressing, removing the redundant data to save space.

This would remove the spaces inserted by the tabs, and remove the blank lines, so my data now uses 2 lines, not 5.

I was hoping there was an easier way to do this, other than converting during save/load.

I hope that clarifies my post...

Thanks!
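The "[5x\t]" and "[4x\n]" notation above amounts to run-length encoding of whitespace. A rough sketch of one way that could work, in Python for illustration: the token format `[<count>x<code>]`, the one-letter codes, the `min_run` threshold, and the function names are all assumptions made for this example, not anything the text widget provides.

```python
import re

# Assumed one-letter codes for the whitespace characters being packed.
_CODES = {" ": "s", "\t": "t", "\n": "n"}
_CHARS = {code: char for char, code in _CODES.items()}


def rle_encode(text: str, min_run: int = 6) -> str:
    """Pack runs of one repeated whitespace character into [<count>x<code>] tokens."""
    # Escape the characters the token syntax relies on, so literal
    # brackets and backslashes in the user's text survive a round trip.
    text = text.replace("\\", "\\\\").replace("[", "\\[")

    def pack(match):
        run = match.group(0)
        if len(run) < min_run:
            return run  # short runs are left alone; a token would not be smaller
        return f"[{len(run)}x{_CODES[run[0]]}]"

    return re.sub(r" +|\t+|\n+", pack, text)


def rle_decode(data: str) -> str:
    """Expand the tokens written by rle_encode and undo its escaping."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            out.append(data[i + 1])      # escaped literal character
            i += 2
        elif ch == "[":
            end = data.index("]", i)     # end of the token
            count, code = data[i + 1:end].split("x")
            out.append(_CHARS[code] * int(count))
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Made-up sample: 20 spaces of padding and 8 consecutive newlines.
sample = "some text" + " " * 20 + "tab to here" + "\n" * 8
packed = rle_encode(sample)
print(repr(packed))
print(len(sample), len(packed))
```

With the default threshold, a run of four newlines is left as it is, because the token "[4xn]" would take five characters to replace four. The saving only appears for longer runs.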
From: Bryan Oakley on 9 Feb 2010 15:55

On Feb 9, 1:26 pm, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
>
> The tab is inserting blanks up to the column that was tabbed to.

... so, you don't actually have tabs in the files, you have blocks of consecutive spaces. Make sure when you describe a problem that you describe it accurately.

> CRLF's will leave a blank line if there's no text on the line.
> That's fine for the user but not for the data file.
> If I crlf 5 times with no text on the line, I get 5 blank lines,
> in my data file. This is a huge waste of space,

five *bytes* is a huge waste of space? *bytes*? Even if you're talking about one thousand blank lines, that's still only 1K. That's a minuscule amount of disk space in this day and age, probably smaller than the disk block size. Are you on a system that is seriously constrained in space?

If you're concerned about disk space you should consider using a standard compression scheme. Just run your data through zip/gzip and that way any other wasteful characters will also get compressed. Of course, you're doing this all to the inconvenience of the user, who will no longer be able to edit your text files with anything but your tool (or will be forced to unzip the file before editing it with some other tool).

I don't mean to sound preachy, but it sounds like you're making a beginner's mistake of premature optimization. Don't worry about compression until it actually proves to be a problem. Otherwise you'll spend way too much time on something that just doesn't matter.
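A small sketch of the zip/gzip suggestion, using Python's standard `gzip` module for illustration. The helper names `save_compressed` and `load_compressed` and the sample data are invented for this example. It assumes the whole document fits in memory as one string.

```python
import gzip


def save_compressed(path: str, text: str) -> None:
    """Write text to path as gzip-compressed UTF-8."""
    with open(path, "wb") as f:
        f.write(gzip.compress(text.encode("utf-8")))


def load_compressed(path: str) -> str:
    """Read a file written by save_compressed and return the original text."""
    with open(path, "rb") as f:
        return gzip.decompress(f.read()).decode("utf-8")


# Made-up data: 200 records, each padded with 40 spaces and 5 newlines.
record = "some text" + " " * 40 + "tab to here" + "\n" * 5
document = record * 200

raw_size = len(document.encode("utf-8"))
packed_size = len(gzip.compress(document.encode("utf-8")))
print(raw_size, packed_size)
```

Very small inputs can come out larger than they went in, since gzip adds a fixed header and trailer, so the benefit depends on how much repetitive data there really is.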
From: Will Duquette on 9 Feb 2010 16:20

On Feb 9, 11:26 am, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
> The tab is inserting blanks up to the column that was tabbed to.
> CRLF's will leave a blank line if there's no text on the line.
> That's fine for the user but not for the data file.
> If I crlf 5 times with no text on the line, I get 5 blank lines,
> in my data file. This is a huge waste of space, that's why its a
> problem. Data is being saved exactly as it is viewed in the text
> widget, WYSIWYG. So perhaps when I save, for example...
>
> (users view)
> some text tab to here
> \n
> \n
> \n
> \n
>
> ...it could be saved as...
>
> (data view)
> some text [5x\t] tab to here
> [4x\n]
>
> and then expanded on a load. This is what I mean by compressing,
> removing the redundant data to save space.
>
> This would remove the spaces inserted by the tabs, and remove
> the blank lines, So my data now uses 2 lines, not 5.
>
> I was hoping there was an easier way to do this, other than
> converting during save/load.
>
> I hope that clarifies my post...
>
> Thanks!

No, converting during save/load is the way to do it.

But why does the number of lines in the data file matter? You say it's a waste of space; but in what sense is the space being wasted? Are you simply concerned about the amount of screen space the record consumes when you view the file in an editor? Unless you're dealing with truly vast amounts of data, you're not going to save enough bytes to make the disk space consumption matter one way or another.

Will