From: Alvaro Herrera on 12 May 2010 15:04

Excerpts from Alexander Korotkov's message of Mon May 10 11:35:02 -0400 2010:
> Hackers,
>
> The current version of the levenshtein function in the fuzzystrmatch contrib module
> doesn't work properly with multibyte character sets.
> My patch makes this function work properly with multibyte character sets.

Great. Please add it to the next commitfest:
http://commitfest.postgresql.org

On a quick look, I didn't like the way you separated the
"pg_database_encoding_max_length() > 1" cases. There seems to be too
much common code. Can that be refactored a bit better?

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on 12 May 2010 22:03

Alexander Korotkov wrote:
> On Wed, May 12, 2010 at 11:04 PM, Alvaro Herrera <alvherre(a)alvh.no-ip.org> wrote:
>
> > On a quick look, I didn't like the way you separated the
> > "pg_database_encoding_max_length() > 1" cases. There seems to be too
> > much common code. Can that be refactored a bit better?
>
> I did a little refactoring in order to avoid some similar code.
> I'm not quite sure about my CHAR_CMP macro. Is it a good idea?

Well, since it's only used in one place, why are you defining a macro at
all?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From: Alexander Korotkov on 13 May 2010 02:49

On Thu, May 13, 2010 at 6:03 AM, Alvaro Herrera <alvherre(a)commandprompt.com> wrote:

> Well, since it's only used in one place, why are you defining a macro at
> all?

In order to structure the code better. My question was about something else:
is the memcmp function a good choice for comparing very short sequences of
bytes (from 1 to 4 bytes)?

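To make the memcmp question concrete, here is a minimal sketch of the two obvious ways to compare a pair of encoded characters of 1 to 4 bytes each. The helper names char_eq_memcmp and char_eq_loop are hypothetical and are not taken from the patch or its CHAR_CMP macro; the sketch also assumes a well-formed encoding, so that characters with different byte lengths are never equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): compare two encoded characters,
 * each 1 to 4 bytes long, using memcmp.
 */
bool
char_eq_memcmp(const char *a, int a_len, const char *b, int b_len)
{
    assert(a_len >= 1 && a_len <= 4);
    assert(b_len >= 1 && b_len <= 4);

    /* Assumption: in a well-formed encoding, different lengths mean different characters. */
    if (a_len != b_len)
        return false;

    return memcmp(a, b, (size_t) a_len) == 0;
}

/*
 * Same contract as above, but with an explicit byte loop instead of memcmp.
 */
bool
char_eq_loop(const char *a, int a_len, const char *b, int b_len)
{
    int         i;

    assert(a_len >= 1 && a_len <= 4);
    assert(b_len >= 1 && b_len <= 4);

    if (a_len != b_len)
        return false;

    /* At most four iterations, stopping at the first differing byte. */
    for (i = 0; i < a_len; i++)
    {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

Both functions return the same answer for every input. Which one is faster for such short sequences depends on the compiler and platform, so it would have to be measured rather than assumed.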
From: Alexander Korotkov on 12 May 2010 16:13

On Wed, May 12, 2010 at 11:04 PM, Alvaro Herrera <alvherre(a)alvh.no-ip.org> wrote:

> On a quick look, I didn't like the way you separated the
> "pg_database_encoding_max_length() > 1" cases. There seems to be too
> much common code. Can that be refactored a bit better?

I did a little refactoring in order to avoid some similar code.
I'm not quite sure about my CHAR_CMP macro. Is it a good idea?

From: Alexander Korotkov on 6 Jun 2010 16:00

Hello Hackers!

I have extended my patch by introducing a levenshtein_less_equal function.
This function has an additional argument, max_d, and stops calculating when
the distance exceeds max_d. With low values of max_d, the function works much
faster than the original one.

An example of using the original levenshtein function:

test=# select word, levenshtein(word, 'consistent') as dist from words where levenshtein(word, 'consistent') <= 2 order by dist;
    word     | dist
-------------+------
 consistent  |    0
 insistent   |    2
 consistency |    2
 coexistent  |    2
 consistence |    2
(5 rows)

test=# explain analyze select word, levenshtein(word, 'consistent') as dist from words where levenshtein(word, 'consistent') <= 2 order by dist;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Sort  (cost=2779.13..2830.38 rows=20502 width=8) (actual time=203.652..203.658 rows=5 loops=1)
   Sort Key: (levenshtein(word, 'consistent'::text))
   Sort Method:  quicksort  Memory: 25kB
   ->  Seq Scan on words  (cost=0.00..1310.83 rows=20502 width=8) (actual time=19.019..203.601 rows=5 loops=1)
         Filter: (levenshtein(word, 'consistent'::text) <= 2)
 Total runtime: 203.723 ms
(6 rows)

An example of using levenshtein_less_equal in the same case:

test=# select word, levenshtein_less_equal(word, 'consistent', 2) as dist from words where levenshtein_less_equal(word, 'consistent', 2) <= 2 order by dist;
    word     | dist
-------------+------
 consistent  |    0
 insistent   |    2
 consistency |    2
 coexistent  |    2
 consistence |    2

test=# explain analyze select word, levenshtein_less_equal(word, 'consistent', 2) as dist from words where levenshtein_less_equal(word, 'consistent', 2) <= 2 order by dist;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Sort  (cost=2779.13..2830.38 rows=20502 width=8) (actual time=42.198..42.203 rows=5 loops=1)
   Sort Key: (levenshtein_less_equal(word, 'consistent'::text, 2))
   Sort Method:  quicksort  Memory: 25kB
   ->  Seq Scan on words  (cost=0.00..1310.83 rows=20502 width=8) (actual time=5.391..42.143 rows=5 loops=1)
         Filter: (levenshtein_less_equal(word, 'consistent'::text, 2) <= 2)
 Total runtime: 42.292 ms
(6 rows)

In the example above, levenshtein_less_equal is about 5 times faster
(42.292 ms versus 203.723 ms total runtime).

With best regards,
Alexander Korotkov.
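
The early-termination idea behind max_d can be sketched as follows. This is not the code from the patch: the name bounded_levenshtein, the byte-wise comparison with unit costs, the return value of max_d + 1 once the bound is exceeded, and the -1 on allocation failure are all assumptions made for illustration. It relies on the fact that the minimum value in a row of the dynamic-programming matrix never decreases from one row to the next, so once a whole row exceeds max_d the final distance must exceed it too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only (hypothetical name, not the patch's implementation):
 * byte-wise Levenshtein distance with unit costs. Returns the exact
 * distance when it is <= max_d, max_d + 1 when it is larger, and -1 if
 * memory allocation fails.
 */
int
bounded_levenshtein(const char *s, const char *t, int max_d)
{
    int         m = (int) strlen(s);
    int         n = (int) strlen(t);
    int        *prev;
    int        *curr;
    int        *tmp;
    int         i;
    int         j;
    int         result;

    assert(max_d >= 0);

    /* The distance is at least the difference in lengths. */
    if (abs(m - n) > max_d)
        return max_d + 1;

    prev = malloc((size_t) (n + 1) * sizeof(int));
    curr = malloc((size_t) (n + 1) * sizeof(int));
    if (prev == NULL || curr == NULL)
    {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: turning the empty prefix of s into t[0..j) takes j insertions. */
    for (j = 0; j <= n; j++)
        prev[j] = j;

    for (i = 1; i <= m; i++)
    {
        int         row_min;

        curr[0] = i;
        row_min = curr[0];

        for (j = 1; j <= n; j++)
        {
            int         sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
            int         del = prev[j] + 1;
            int         ins = curr[j - 1] + 1;
            int         best = (sub < del) ? sub : del;

            if (ins < best)
                best = ins;
            curr[j] = best;
            if (best < row_min)
                row_min = best;
        }

        /* Every cell in this row exceeds max_d, so the final cell will too. */
        if (row_min > max_d)
        {
            free(prev);
            free(curr);
            return max_d + 1;
        }

        /* Reuse the two row buffers instead of keeping the full matrix. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = (prev[n] > max_d) ? max_d + 1 : prev[n];
    free(prev);
    free(curr);
    return result;
}
```

With max_d = 2, this sketch returns 0 for 'consistent' against itself and 2 for 'insistent' against 'consistent', matching the distances in the query output above. A multibyte-aware version would compare whole characters rather than single bytes.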