From: Georgios Petasis on
Donald Arseneau wrote:
> On Dec 1, 9:05 am, Alexandre Ferrieux <alexandre.ferri...(a)gmail.com>
> wrote:
>> On Dec 1, 5:33 pm, Donald Arseneau <a...(a)triumf.ca> wrote:
>>
>>
>>
>>> On Nov 30, 1:59 pm, Georgios Petasis <peta...(a)iit.demokritos.gr>
>>> wrote:
>>>> However, I don't know how to serialise and restore such a large
>>>> structure. Just using "array get" needs much more memory, and tcl needs
>>>> more than the 2GB a 32-bit application can use. So, I wrote some code
>>>> that serialises all elements without requiring conversion to strings.
>>> ... array nextelement ...
>> Ahem, the question is about serialization, not iteration, and reuse
>> (sharing) of values. What does the array iterator have to do with
>> that ?
>
> It lets you save the contents of a 1.3GB Tcl array to a file without
> overflowing process memory as [array get] would. I was presuming that
> "serialise and restore" meant "serialize for writing, and restore from
> a file".
>
> I agree that a Tcl array is not ideal for such a big hash table, and
> something more like a database is more appropriate.
>
> Donald Arseneau
>

Yes, I used an array search to walk the hash table, and iterated over
each dict to store its entries. [array get] needed much more memory than
the application could allocate (since it was already using 1.3 GB).
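
For readers following along, here is a minimal sketch of that approach. It
is not the code I actually used: the proc name dump_array and the
one-line-per-word output format are made up for illustration, and it
assumes the words contain no newlines. It walks the array with a search
and writes each dict entry by entry, so no [array get] list of the whole
table is ever built.

```tcl
# Hypothetical helper (name and file format are assumptions, not from the
# thread): stream an array of dicts to a channel, one line per key.
proc dump_array {arrayName chan} {
    upvar 1 $arrayName arr
    set search [array startsearch arr]
    while {[array anymore arr $search]} {
        set word [array nextelement arr $search]
        # Build a short temporary list: the word, then id/count pairs.
        set line [list $word]
        dict for {id count} $arr($word) {
            lappend line $id $count
        }
        puts $chan $line
    }
    array donesearch arr $search
}

# Small usage example on a toy array, writing to stdout.
set demo(tenjin) [dict create 48422 1]
set demo(lightyear) [dict create 4779 1 29113 2 44221 1]
dump_array demo stdout
```

Only one line is held in memory at a time, which is the point of using the
search commands instead of [array get].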

George
From: Georgios Petasis on
Helmut Giese wrote:
> Hi George,
> it could be that MetaKit is your friend. It is not a database (just
> "persistent storage"), but it seems to me that you don't really need
> true DB capabilities (which for me is the possibility to formulate
> complex queries).
> Its performance can be quite astonishing and it probably has less of a
> memory overhead than a database solution.
>
> You already have it installed - it's part of ActiveState's Tcl. I
> haven't used it for a couple of years so I cannot off hand produce an
> example, but if you want to check if it fits your needs, there are
> probably enough knowledgeable people around here to help you get going.
>
> Good luck
> Helmut Giese
>
>
>> Hi all,
>>
>> I have a large hash table, whose keys are words, and the values are
>> dicts, that contain integer pairs.
>> I am creating this structure in memory, taking care to reuse objects as
>> much as possible, with the result occupying ~ 1.3GB of memory.
>>
>> However, I don't know how to serialise and restore such a large
>> structure. Just using "array get" needs much more memory, and tcl needs
>> more than the 2GB a 32-bit application can use. So, I wrote some code
>> that serialises all elements without requiring conversion to strings.
>> The format I chose was Tcl code, to make it easy to load back:
>>
>> set dict [dict create]
>> dict set dict 48422 1
>> set word {tenjin}
>> set word_matrix($word) $dict
>> set dict [dict create]
>> dict set dict 4779 1
>> dict set dict 29113 2
>> dict set dict 44221 1
>> set word {lightyear}
>> set word_matrix($word) $dict
>> set dict [dict create]
>> dict set dict 25399 1
>> set word {salary?}
>> set word_matrix($word) $dict
>> set dict [dict create]
>> dict set dict 366 1
>> dict set dict 819 1
>> dict set dict 1154 2
>> dict set dict 2580 1
>> dict set dict 3164 1
>> dict set dict 3244 2
>> dict set dict 3420 2
>> dict set dict 3833 1
>> ... 313 MB of similar data.
>>
>> However, I cannot load back the data from this file. The problem is that
>> a new object is created for every number in the file, which is memory
>> expensive since there is some repetition.
>>
>> I tried to enclose the data in a proc (hoping that tcl will compile the
>> proc into bytecode internally, and end up reusing the same objects for
>> the same integers), but it didn't work (wish terminated around 1.3 GB
>> with a message of not being able to re-alloc a large memory piece).
>>
>> Any ideas?
>>
>> George
>

Dear Helmut,

Again a proposal I didn't think of :-)
I am not sure I have the courage to test it though, as this would be the
6th implementation from scratch of the same task...

I am currently running something with sqlite. I will check timings and
decide...

Regards,

George
From: Donal K. Fellows on
On 1 Dec, 19:19, drscr...(a)gmail.com wrote:
> This is one of the things about sqlite.  It processes the whole query
> and returns it in one chunk.

I think it does it progressively if you give it a script to execute
each time round.

Donal.
From: Georgios Petasis on
Georgios Petasis wrote:
> Will Duquette wrote:
>> On Dec 1, 11:01 am, Georgios Petasis <peta...(a)iit.demokritos.gr>
>> wrote:
>>> $database onecolumn "SELECT id FROM words WHERE word='$word'"
>>>
>>
>> Others have already mentioned adding an index, which you'll definitely
>> want to do. I just wanted to point out that the usual way to write
>> this query is
>>
>> $database onecolumn {SELECT id FROM words WHERE word=$word}
>>
>> SQLite will do the variable interpolation for you, according to SQL
>> rules rather than Tcl rules, which generally speaking is what you
>> want. Among other things, it prevents SQL injection attacks/errors.
>> For example, in your version if $word is
>>
>> some'word
>>
>> you'll get an SQL syntax error.
>
> This is brilliant!! I would never have thought of this, so I had been
> manually converting single quotes to '' before the SQL statements!
>
> Many thanks,
>
> George

I estimate that this will give a speed-up of ~50%. I suppose this is mainly
because sqlite can cache the queries, so there is no need to reparse them
(as was the case when I performed the variable substitutions at the Tcl
level, where with every query sqlite saw a new string...)
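
A minimal sketch of the difference, assuming the sqlite3 Tcl package, an
in-memory database, and a handle called db (the real code uses $database;
the table layout and the index name here are guesses for illustration).
With the query in braces, SQLite binds the Tcl variable itself, so a word
containing a single quote needs no manual escaping and the SQL text stays
the same from call to call.

```tcl
package require sqlite3

# In-memory database with an assumed schema for the words table.
sqlite3 db :memory:
db eval {CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)}
db eval {CREATE INDEX words_word_idx ON words(word)}

# $word inside braces is bound by SQLite, not substituted by Tcl.
foreach word {tenjin lightyear some'word} {
    db eval {INSERT INTO words (word) VALUES ($word)}
}

# The same SQL string is used for every lookup; only the bound value changes.
set word "some'word"
set id [db onecolumn {SELECT id FROM words WHERE word = $word}]
puts "id of $word: $id"
```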

George
From: Georgios Petasis on
Donal K. Fellows wrote:
> On 1 Dec, 19:19, drscr...(a)gmail.com wrote:
>> This is one of the things about sqlite. It processes the whole query
>> and returns it in one chunk.
>
> I think it does it progressively if you give it a script to execute
> each time round.
>
> Donal.

Yes, it does. And it can store each row either in a Tcl array, or in
variables named after the columns.
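
A short sketch of both forms, again assuming the sqlite3 Tcl package, a
handle named db, and a made-up words table. The script body runs once per
row as the rows are produced, rather than after the whole result has been
collected.

```tcl
package require sqlite3

sqlite3 db :memory:
db eval {CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)}
db eval {INSERT INTO words (word) VALUES ('tenjin')}
db eval {INSERT INTO words (word) VALUES ('lightyear')}

# Form 1: each column value lands in a variable named after the column.
set pairs {}
db eval {SELECT id, word FROM words ORDER BY id} {
    lappend pairs [list $id $word]
}
puts $pairs

# Form 2: each row lands in the array "row", indexed by column name.
set names {}
db eval {SELECT id, word FROM words ORDER BY id} row {
    lappend names $row(word)
}
puts $names
```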

Regards,

George