WIP: preloading of ispell dictionary [PgSql]


From: Takahiro Itagaki on 22 Mar 2010 20:57

Pavel Stehule <pavel.stehule(a)gmail.com> wrote:

> I wrote a small patch that allows preloading of a selected ispell
> dictionary. It solves the problem of slow tsearch initialisation with
> some language configurations.
>
> I am afraid this module doesn't help on MS Windows.

I think it should work on all platforms if we include it into the core.
We should continue to research shared memory or mmap approaches.

The fundamental issue seems to be in the slow initialization of
dictionaries. If so, how about adding a pre-compile tool to convert
a dictionary into a binary file, and each backend simply mmap it?
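
As a rough sketch of what that could look like on a POSIX system: the file layout, `DictHeader`, `dict_precompile()` and `dict_map()` below are hypothetical names made up for illustration, not anything from the patch or from PostgreSQL. The precompile step writes a small header plus payload once, and each process then maps the file read-only with `MAP_SHARED`, so the kernel can back every mapping with the same cached pages.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical file format: a fixed header followed by an opaque payload. */
#define DICT_MAGIC 0x44494354u

typedef struct DictHeader {
    uint32_t magic;        /* identifies the file type */
    uint32_t payload_len;  /* number of payload bytes after the header */
} DictHeader;

typedef struct MappedDict {
    const DictHeader *hdr;      /* start of the mapping */
    const char       *payload;  /* first byte after the header */
    size_t            map_len;  /* total mapped length in bytes */
} MappedDict;

/* "Precompile" step: write header + payload. Returns 0 on success, -1 on error. */
int dict_precompile(const char *path, const void *payload, uint32_t len)
{
    DictHeader hdr = { DICT_MAGIC, len };
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    int ok = fwrite(&hdr, sizeof hdr, 1, f) == 1 &&
             (len == 0 || fwrite(payload, len, 1, f) == 1);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Backend step: map the file read-only. Returns 0 on success, -1 on error. */
int dict_map(const char *path, MappedDict *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size < (off_t) sizeof(DictHeader)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t) st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    const DictHeader *hdr = p;
    if (hdr->magic != DICT_MAGIC || hdr->payload_len > len - sizeof(DictHeader)) {
        munmap(p, len);              /* not a file we wrote, or truncated */
        return -1;
    }
    out->hdr = hdr;
    out->payload = (const char *) p + sizeof(DictHeader);
    out->map_len = len;
    return 0;
}

/* Release the mapping and reset the handle. */
void dict_unmap(MappedDict *d)
{
    munmap((void *) d->hdr, d->map_len);
    d->hdr = NULL;
    d->payload = NULL;
    d->map_len = 0;
}
```

A real format would also have to settle byte order, versioning and how the in-memory structures are laid out inside the payload; the sketch leaves all of that open.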

BTW, SimpleAllocContextCreate() is not used at all in the patch.
Do you still need it?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on 23 Mar 2010 02:07

Takahiro Itagaki wrote:
> Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>
>> I wrote a small patch that allows preloading of a selected ispell
>> dictionary. It solves the problem of slow tsearch initialisation with
>> some language configurations.
>>
>> I am afraid this module doesn't help on MS Windows.
>
> I think it should work on all platforms if we include it into the core.

It will work, as in it will compile and run. It just won't be any
faster there. I think that's enough; otherwise you could argue that we
shouldn't have the shared_preload_libraries option at all because it
won't help on Windows.

> The fundamental issue seems to be in the slow initialization of
> dictionaries. If so, how about adding a pre-compile tool to convert
> a dictionary into a binary file, and each backend simply mmap it?

Yeah, that would be better.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Pavel Stehule on 23 Mar 2010 03:42

2010/3/23 Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp>:
>
> Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>
>> I wrote a small patch that allows preloading of a selected ispell
>> dictionary. It solves the problem of slow tsearch initialisation with
>> some language configurations.
>>
>> I am afraid this module doesn't help on MS Windows.
>
> I think it should work on all platforms if we include it into the core.
> We should continue to research shared memory or mmap approaches.
>
> The fundamental issue seems to be in the slow initialization of
> dictionaries. If so, how about adding a pre-compile tool to convert
> a dictionary into a binary file, and each backend simply mmap it?

That means loading about 25MB from disk for the first tsearch query in
every backend - sorry, I don't believe that can be good.

>
> BTW, SimpleAllocContextCreate() is not used at all in the patch.
> Do you still need it?
>

Yes - I needed it. Without the Simple Allocator the cz configuration
takes 48MB. Only a few parts have to be supported by the Simple
Allocator - the others have no significant impact - so I didn't uglify
more code. In my first patch I verified that the dictionary data are
read only, so I was motivated to use the Simple Allocator everywhere.
It is not necessary for the preload method.
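
For readers unfamiliar with the idea, here is a minimal sketch of a bump-style allocator for data that is built once and then only read. It is not the code behind SimpleAllocContextCreate() in the patch; `SimpleArena`, `arena_init()`, `arena_alloc()` and `arena_free_all()` are hypothetical names. The assumption is that such an allocator saves memory because it keeps no per-chunk bookkeeping and never frees individual chunks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Alignment applied to every chunk; enough for pointers and 64-bit integers. */
#define ARENA_ALIGN 8

/* Hypothetical arena: one malloc'd block handed out front to back. */
typedef struct SimpleArena {
    char  *base;  /* start of the block */
    size_t used;  /* bytes handed out so far, including padding */
    size_t cap;   /* total size of the block */
} SimpleArena;

/* Reserve the whole block up front. Returns 0 on success, -1 on error. */
int arena_init(SimpleArena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base ? 0 : -1;
}

/* Hand out the next aligned chunk, or NULL when the block is exhausted.
 * There is no header in front of the chunk and no way to free it alone. */
void *arena_alloc(SimpleArena *a, size_t size)
{
    size_t start = (a->used + ARENA_ALIGN - 1) & ~(size_t) (ARENA_ALIGN - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Everything is released at once when the data is no longer needed. */
void arena_free_all(SimpleArena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}
```

The trade-off is that nothing can be resized or freed individually, which is only acceptable because the dictionary data are read only after loading.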

Pavel

> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
>


From: Nicolas Barbier on 23 Mar 2010 03:52

2010/3/23 Pavel Stehule <pavel.stehule(a)gmail.com>:

> 2010/3/23 Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp>:
>
>> The fundamental issue seems to be in the slow initialization of
>> dictionaries. If so, how about adding a pre-compile tool to convert
>> a dictionary into a binary file, and each backend simply mmap it?
>
> That means loading about 25MB from disk for the first tsearch query in
> every backend - sorry, I don't believe that can be good.

The operating system's VM subsystem should make that a non-problem.
"Loading" is also not the word I would use to indicate what mmap does.

Nicolas


From: Pavel Stehule on 23 Mar 2010 04:04

2010/3/23 Nicolas Barbier <nicolas.barbier(a)gmail.com>:
> 2010/3/23 Pavel Stehule <pavel.stehule(a)gmail.com>:
>
>> 2010/3/23 Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp>:
>>
>>> The fundamental issue seems to be in the slow initialization of
>>> dictionaries. If so, how about adding a pre-compile tool to convert
>>> a dictionary into a binary file, and each backend simply mmap it?
>>
>> That means loading about 25MB from disk for the first tsearch query in
>> every backend - sorry, I don't believe that can be good.
>
> The operating system's VM subsystem should make that a non-problem.
> "Loading" is also not the word I would use to indicate what mmap does.

Maybe we can do some manipulation in memory - I don't have any
knowledge of mmap. With the Simple Allocator we can have the dictionary
data as one block. The pointers are a problem, but I believe they can
be replaced by offsets.
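
To illustrate the offsets idea with a toy structure (not the real ispell data structures): the nodes below form a singly linked word list inside one block and refer to each other by byte offset from the start of the block. `Node`, `block_push()` and `block_contains()` are hypothetical names. Because nothing inside the block holds an absolute address, the block stays valid when it is copied or mapped at a different address.

```c
#include <stdint.h>
#include <string.h>

#define NO_NODE     0u   /* offset 0 is reserved to mean "no node" */
#define BLOCK_START 16u  /* first usable offset, so that 0 is never a node */
#define WORD_MAX    12   /* toy limit on word length, including the NUL */

/* A node refers to its successor by offset, not by pointer. */
typedef struct Node {
    uint32_t next;            /* offset of the next node, or NO_NODE */
    char     word[WORD_MAX];  /* NUL-terminated word */
} Node;

/* Turn an offset into a pointer relative to wherever the block lives now. */
static const Node *node_at(const void *block, uint32_t off)
{
    return off == NO_NODE ? NULL : (const Node *) ((const char *) block + off);
}

/* Append a node at offset *used and link it in front of `next`.
 * Returns the new node's offset, or NO_NODE if it does not fit. */
uint32_t block_push(void *block, uint32_t cap, uint32_t *used,
                    const char *word, uint32_t next)
{
    if (*used > cap || cap - *used < sizeof(Node) || strlen(word) >= WORD_MAX)
        return NO_NODE;
    Node *n = (Node *) ((char *) block + *used);
    uint32_t off = *used;
    n->next = next;
    strcpy(n->word, word);
    *used += (uint32_t) sizeof(Node);
    return off;
}

/* Walk the list starting at `head`; returns 1 if `word` is present, else 0. */
int block_contains(const void *block, uint32_t head, const char *word)
{
    for (const Node *n = node_at(block, head); n != NULL; n = node_at(block, n->next))
        if (strcmp(n->word, word) == 0)
            return 1;
    return 0;
}
```

The block is expected to be 4-byte aligned, with `*used` starting at `BLOCK_START`. The cost is one extra addition per dereference, which is the price of making the block position independent.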

Personally I dislike the idea of a dictionary precompiler - it is
another application to maintain, and maybe not a necessary one. And you
would still need another application for loading.

P.S. I am able to serialise the Czech dictionary, because it uses only simple regexps.

Pavel

>
> Nicolas
>

