From: GizmoGorilla on
I'm working on a project using a text widget. I've noticed that
after saving the data from the text widget, the data file
format is exactly as it was displayed in the text widget. So
TABs and CRLFs end up taking a huge amount of space. My data
file looks like...

Some text                  tabbed to some column








several CRLFs down the page


So the data file ends up taking 10 lines of data in this
example. Is there any way to "compress" this for saving
and "decompress" for loading, retaining the original
tabbing and CRLFs, without using so much space in the
data file?

Thanks!

GG
From: Will Duquette on
On Feb 9, 9:18 am, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
> So the data file ends up taking 10 lines of data in this
> example. Is there any way to "compress" this for saving
> and "decompress" for loading, retaining the original
> tabbing & CRLF's without using so much space in the
> data file??

I'm missing something. Why is this a problem? Each TAB is one
character, and each newline is one character (two characters on the
disk, on Windows). How much more can you compress it?

You could choose to replace each tab and each CRLF with some other
printable string on output, e.g., "\t" and "\n", using [string map],
and convert them back on input; but that's not going to decrease the
actual number of bytes.
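
A minimal sketch of that escape/unescape round trip, written in Python rather than Tcl. The function names encode/decode are hypothetical. Unlike a bare two-entry map, this version also escapes the backslash itself, an assumption added here so that text already containing a literal "\t" or "\n" survives the round trip. Decoding is a single left-to-right pass, as Tcl's [string map] also is.

```python
import re

# Escape table: the backslash itself is escaped too (an addition to the
# two-entry tab/newline map), so existing backslash sequences round-trip.
_ENCODE = {"\\": "\\\\", "\t": "\\t", "\n": "\\n"}
_DECODE = {v: k for k, v in _ENCODE.items()}


def encode(text):
    """Make text single-line: tab, newline and backslash become 2-char escapes."""
    return re.sub(r"[\\\t\n]", lambda m: _ENCODE[m.group(0)], text)


def decode(data):
    """Inverse of encode(); one left-to-right pass, so '\\\\t' decodes correctly."""
    return re.sub(r"\\[\\tn]", lambda m: _DECODE[m.group(0)], data)
```

Every tab and newline becomes two characters, so the saved data gets slightly larger, not smaller. That is exactly Will's point.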
From: GizmoGorilla on
On 2010-02-09 1:30 PM, Will Duquette wrote:
> I'm missing something. Why is this a problem? Each TAB is one
> character, and each newline is one character (two characters on the
> disk, on Windows). How much more can you compress it?
>
> You could choose to replace each tab and each CRLF with some other
> printable string on output, e.g., "\t" and "\n", using [string map],
> and convert them back on input; but that's not going to decrease the
> actual number of bytes.

The tab is inserting blanks up to the column that was tabbed to.
CRLFs will leave a blank line if there's no text on the line.
That's fine for the user but not for the data file.
If I enter 5 CRLFs with no text on the line, I get 5 blank lines
in my data file. This is a huge waste of space; that's why it's a
problem. Data is being saved exactly as it is viewed in the text
widget, WYSIWYG. So perhaps when I save, for example...

(users view)
some text                            tab to here
\n
\n
\n
\n

...it could be saved as...

(data view)
some text [5x\t] tab to here
[4x\n]

and then expanded on a load. This is what I mean by compressing:
removing the redundant data to save space.

This would remove the spaces inserted by the tabs and remove
the blank lines, so my data now uses 2 lines, not 5.

I was hoping there was an easier way to do this, other than
converting during save/load.

I hope that clarifies my post...

Thanks!
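
The bracketed [5x\t] / [4x\n] idea above is run-length encoding of whitespace. A minimal sketch of it, in Python rather than Tcl, assuming the text uses "\n" line endings. The token syntax "~<count>s;" / "~<count>n;", the "~" escape character, the threshold of 5 and the names compress/expand are all hypothetical choices made here, not anything from the thread. Literal tildes are doubled, so user text can never be mistaken for a token.

```python
import re

MIN_RUN = 5  # shorter runs are left alone; a token such as "~5s;" is 4 chars

# Hypothetical token syntax: "~<count>s;" = run of spaces,
# "~<count>n;" = run of newlines, "~~" = one literal tilde.
_RUN = re.compile(r" {%d,}|\n{%d,}" % (MIN_RUN, MIN_RUN))
_TOKEN = re.compile(r"~(?:~|(\d+)([sn]);)")


def compress(text):
    """Collapse long runs of spaces/newlines into tokens (on save)."""
    text = text.replace("~", "~~")  # escape literal tildes first

    def to_token(m):
        run = m.group(0)
        kind = "s" if run[0] == " " else "n"
        return f"~{len(run)}{kind};"

    return _RUN.sub(to_token, text)


def expand(data):
    """Inverse of compress(): restore runs and unescape tildes (on load)."""

    def from_token(m):
        if m.group(1) is None:  # matched "~~", an escaped literal tilde
            return "~"
        char = " " if m.group(2) == "s" else "\n"
        return char * int(m.group(1))

    return _TOKEN.sub(from_token, data)
```

For example, "x" followed by ten spaces and "y" is saved as "x~10s;y". As the replies below point out, the price is a file that only this tool can read comfortably.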
From: Bryan Oakley on
On Feb 9, 1:26 pm, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
>
> The tab is inserting blanks up to the column that was tabbed to.

... so you don't actually have tabs in the files; you have blocks of
consecutive spaces. Make sure when you describe a problem that you
describe it accurately.

> CRLF's will leave a blank line if there's no text on the line.
> That's fine for the user but not for the data file.
> If I crlf 5 times with no text on the line, I get 5 blank lines,
> in my data file. This is a huge waste of space,

Five *bytes* is a huge waste of space? *Bytes*? Even if you're talking
about one thousand blank lines, that's still only about 1K. That's a
minuscule amount of disk space in this day and age, probably smaller
than the disk block size. Are you on a system that is seriously
constrained in space?
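
To put those numbers in perspective, here is a quick calculation sketched in Python. The 4096-byte block size is an assumption: a common filesystem default, but not universal. Small-file special cases are ignored. Even a record padded with a thousand blank lines still fits in one block.

```python
import math

BLOCK_SIZE = 4096  # assumption: a common default allocation unit, not universal


def blocks_used(n_bytes, block=BLOCK_SIZE):
    """Whole allocation blocks needed for n_bytes (ignores small-file tricks)."""
    return math.ceil(n_bytes / block)


one_record = "some text" + " " * 28 + "tab to here\n"  # 49 bytes
padded = one_record + "\n" * 1000                     # + 1000 blank lines (LF)
padded_crlf = one_record + "\r\n" * 1000              # + 1000 blank lines (CRLF)

for label, s in [("record", one_record), ("LF", padded), ("CRLF", padded_crlf)]:
    n = len(s.encode("ascii"))
    print(f"{label}: {n} bytes, {blocks_used(n)} block(s)")
```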

If you're concerned about disk space you should consider using a
standard compression scheme. Just run your data through zip/gzip and
that way any other wasteful characters will also get compressed.
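
A minimal sketch of that approach using Python's standard gzip module. The names save_compressed/load_compressed are illustrative. In Tcl, versions 8.6 and later provide the same thing through the built-in zlib command. No custom format is needed, and runs of spaces and newlines compress very well.

```python
import gzip


def save_compressed(path, text):
    """Write text to path as gzip-compressed UTF-8."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)


def load_compressed(path):
    """Read back text written by save_compressed()."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    import os
    import tempfile

    page = ("some text" + " " * 28 + "tab to here\n" + "\n" * 5) * 50
    path = os.path.join(tempfile.mkdtemp(), "notes.txt.gz")
    save_compressed(path, page)
    print(len(page.encode("utf-8")), "bytes raw,", os.path.getsize(path), "on disk")
```

The file is still plain text once decompressed (for example with gunzip), so other tools can read it after one extra step.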

Of course, you're doing all this at the expense of the user, who
will no longer be able to edit your text files with anything but your
tool (or you'll force them to unzip the file before editing it with some
other tool).

I don't mean to sound preachy, but it sounds like you're making a
beginner's mistake of premature optimization. Don't worry about
compression until it actually proves to be a problem. Otherwise you'll
spend way too much time on something that just doesn't matter.


From: Will Duquette on
On Feb 9, 11:26 am, GizmoGorilla <gizmogori...(a)hotmail.com> wrote:
> I was hoping there was an easier way to do this, other than
> converting during save/load.
>
> I hope that clarifies my post...
>
> Thanks!

No, converting during save/load is the way to do it.

But why does the number of lines in the data file matter? You say
it's a waste of space; but in what sense is the space being wasted?
Are you simply concerned about the amount of screen space the record
consumes when you view the file in an editor? Unless you're dealing
with truly vast amounts of data, you're not going to save enough
bytes to make the disk space consumption matter one way or another.

Will