From: MartinLemburg on
Hi tomK,

you are right, but ... we had files generated by a system that
describes the movement of points in 3D, based on a definition file
containing a kind of network of points that describes a geometry.

The definition file was already around 20 MB, but the movement file
was nearly 1 GB - both files in plain text.

And ... it was not really acceptable to load only a (perhaps
configurable) number of movement sets step by step, to be shown inside
a 3D simulation environment driven by a Tcl kernel in a C++/OpenGL
application.
Loading the data for just 2 seconds of simulation at a normal display
update rate was already too much to keep the system usable for
"normal" users with limited RAM.

But ... we never had the software crash because of reading the same
data twice - neither on Unix nor on MS Windows - from Tcl 8.0 onwards.

Best regards,

Martin Lemburg

On 1 Mar, 17:48, tomk <krehbiel....(a)gmail.com> wrote:
> As you have found out, it is sometimes more expedient to recast the
> solution to a problem than to solve it. Databases were invented
> because computers have limited working memory. If you're processing
> the data serially (multi-pass is still serial) then do your own block
> management. If the data access is random then put it in a database
> prior to processing. If you use a database like SQLite then it uses
> mapped file access (I think), which is what you really want and will
> save you the work of writing the code.
>
> If you're interested in trying a different solution, post a description
> of your problem and ask for suggestions.
>
> tomk
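
For anyone wanting to try the database route tomk describes, a minimal
sketch with the sqlite3 Tcl package might look like the following; the
file names and table layout are only placeholders:

package require sqlite3              ;# Tcl binding shipped with SQLite

sqlite3 db movement.db               ;# placeholder database file
db eval {
    CREATE TABLE IF NOT EXISTS movement(
        step  INTEGER,
        point INTEGER,
        x REAL, y REAL, z REAL
    )
}

set f [open movement.txt r]          ;# placeholder plain-text movement file
db transaction {
    while {[gets $f line] >= 0} {
        lassign $line step point x y z
        db eval {INSERT INTO movement VALUES($step, $point, $x, $y, $z)}
    }
}
close $f

# Later, only the rows for the step currently being displayed are
# pulled back into Tcl values:
db eval {SELECT x, y, z FROM movement WHERE step = 42 ORDER BY point} {
    # ... hand $x $y $z to the renderer ...
}
db close

That way the 1 GB of movement data lives in SQLite's file and page cache
instead of in the Tcl heap.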

From: blacksqr on
On Mar 1, 10:30 am, Glenn Jackman <gle...(a)ncf.ca> wrote:
>
> I don't want to derail your issue.  Have you considered not reading the
> whole file into memory at once?  Is it a text file you're processing?
> Can you read the file line-by-line, or by chunks?
>

The core issue for me is not to get a specific piece of code to work,
but to understand a certain behavior and to clarify whether it
qualifies as a bug. I've made just such optimizations in my original
program. The real problem is that under certain circumstances Tcl
seems never to free memory used in data processing, even after all
the variables involved have been unset. Thus when processing large
data sets over a long period, Tcl eventually starves the computer of
usable RAM. So no matter how small the chunks I read a file in,
eventually those chunks will be bigger than the available memory, and
the interpreter will crash.
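
To make that concrete, the processing loop in question is roughly of
this shape (the file name, chunk size, and processing step are
placeholders):

set f [open bigdata.txt r]
while {![eof $f]} {
    set chunk [read $f [expr {16 * 1024 * 1024}]]   ;# 16 MB per read
    # ... process $chunk ...
    unset chunk   ;# the Tcl value is released here
}
close $f

What I observe is that the resident size of the process never comes
back down after the unset; over a long run it only grows until the
machine runs out of RAM.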

The question is, is this a bug? Is it a performance optimization used
too enthusiastically without due thought given to endgame? Is it
behavior forced on the interpreter process by the operating system?
Whatever it is, the upshot for me is that it is preventing me from
using Tcl for long-running programs that process large amounts of
data, tasks that Tcl otherwise is perfect for.
From: Donal K. Fellows on
On 2 Mar, 05:39, blacksqr <stephen.hunt...(a)alum.mit.edu> wrote:
> The question is, is this a bug?  Is it a performance optimization used
> too enthusiastically without due thought given to endgame?  Is it
> behavior forced on the interpreter process by the operating system?

It's a bug. Whose bug, well, we don't know that yet. :-)

It's most certainly possible to write scripts to completely consume
all memory. However, I've had scripts that dealt with very large
datasets for a long time, so it's also possible to have things be OK.
The details really matter. (For example, while Tcl is reckoned to be
leak-free, Tk is *known* to have significant issues, though mitigation
steps can be taken if you know what to look out for.)

Donal.
From: Donal K. Fellows on
On 1 Mar, 20:29, "MartinLemburg(a)Siemens-PLM"
<martin.lemburg.siemens-...(a)gmx.net> wrote:
> you are right, but ... we had files generated by a system that
> describes the movement of points in 3D, based on a definition file
> containing a kind of network of points that describes a geometry.
>
> The definition file was already around 20 MB, but the movement file
> was nearly 1 GB - both files in plain text.

In some cases, the right thing can be to move to working with the data
in packed form with C code. It's good to try to avoid such a step
(it's very inflexible) but sometimes it's the only alternative. Critcl
makes it a much simpler step than it would otherwise be. :-)
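
A minimal sketch of the packed-form idea, staying in pure Tcl with the
binary command (the coordinates and proc names here are made up; a
Critcl cproc could then operate on the resulting byte array directly
in C):

# Pack one time step of 3D points (a flat list {x y z x y z ...})
# into a single byte array of doubles:
proc pack_step {coords} {
    return [binary format d* $coords]
}

# Unpack it again when plain Tcl numbers are needed:
proc unpack_step {bytes} {
    binary scan $bytes d* coords
    return $coords
}

set step  {0.0 1.5 2.25  3.0 4.5 6.75}   ;# two made-up points
set bytes [pack_step $step]              ;# 6 doubles -> 48 bytes in one Tcl object
puts [unpack_step $bytes]

The packed form costs 8 bytes per value instead of one Tcl_Obj per
value, which is what makes the difference on a 1 GB data set.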

Donal.
From: igor.g on
On Mar 1, 12:36 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
wrote:
> The high-water mark method means that a repeated,
> identical allocation should reuse the same resources without a single
> byte of overhead.

proc mem_test {} {
    set str [string repeat 0 [expr {1000*1000}]]  ;# 1 MB
}
mem_test

If I understand the issue correctly, the only way to release the memory
is to use
rename mem_test {}
Am I right?
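
For what it's worth, the high-water-mark behaviour Alexandre describes
can be checked with a loop like this, watching the process size from
outside (e.g. with ps):

# Repeated, identical 1 MB allocations: with the high-water-mark
# behaviour described above, these should reuse the memory released by
# the previous call instead of growing the process each time.
for {set i 0} {$i < 100} {incr i} {
    mem_test
}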