From: shiplu on 17 Jul 2010 18:29

There is an algorithm called longest common subsequence (LCS). If you compute the length of the LCS between the given string and each string in the database, then sort by that length, the most closely matched word comes out on top.

I think this algorithm is already implemented and available in your environment, although its name may be different. I am not sure which function in PHP or MySQL serves the purpose.

Shiplu Mokadd.im
My talks, http://talk.cmyweb.net
Follow me, http://twitter.com/shiplu
SUST Programmers, http://groups.google.com/group/p2psust
Innovation distinguishes bet ... ... (ask Steve Jobs the rest)
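The LCS idea in this post can be sketched as follows. This is a minimal illustration in Python, not a function from PHP or MySQL; the names `lcs_length` and `rank_by_lcs` are hypothetical, and the candidate words are invented for the example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_by_lcs(query: str, candidates: list[str]) -> list[str]:
    """Sort candidates so the longest common subsequence comes first."""
    return sorted(candidates, key=lambda c: lcs_length(query, c), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: the misspelling "wrds" against a few stored words.
    print(rank_by_lcs("wrds", ["wars", "words", "cats"]))
```

Raw LCS length favours long candidates, so a real ranking would probably normalise by string length; that refinement is left out of this sketch.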
From: Andrew Ballard on 19 Jul 2010 18:06

On Mon, Jul 19, 2010 at 2:46 PM, tedd <tedd.sperling(a)gmail.com> wrote:
> At 12:39 PM +0100 7/19/10, Richard Quadling wrote:
>>
>> I'm using MS SQL, not mySQL.
>>
>> Found a extended stored procedure with a UDF.
>>
>> Testing it looks excellent.
>>
>> Searching for a match on 30,000 vehicles next to no additional time -
>> a few seconds in total, compared to the over 3 minutes to search using
>> SQL code.
>
> That seems a bit slow.
>
> For example, currently I'm searching over 4,000 records (which contains
> 4,000 paragraphs taken from the text of the King James version of the Bible)
> for matching words, such as %created% and the times are typically around
> 0.009 seconds.
>
> As such, searching ten times that amount should be in the range of tenths of
> a second and not seconds -- so taking a few seconds to search 30,000 records
> seems excessive to me.
>
> Cheers,
>
> tedd

I would be surprised if a Levenshtein or similar_text comparison in a database were NOT slower than even a wildcard search, because of the calculations that have to be performed on each row in the column being compared. That, and the fact that user-defined functions in SQL Server often have a performance penalty of their own.

Just for kicks, you could try loading the values in that column into an array in PHP and then time iterating the array to calculate the Levenshtein distance for each value, to see how it compares.

Andrew
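The timing experiment suggested above could look roughly like this. It is a sketch in Python rather than PHP (where the built-in levenshtein() would do the distance work); `levenshtein`, `time_scan`, and the sample values are all assumptions made for illustration, and no timings are claimed here.

```python
import time


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def time_scan(query: str, values: list[str]) -> tuple[list[int], float]:
    """Compute the distance to every value and report the elapsed seconds."""
    start = time.perf_counter()
    distances = [levenshtein(query, v) for v in values]
    return distances, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for the 30,000 rows loaded from the column.
    values = ["words", "wars", "wrds", "vehicle"] * 7500
    distances, elapsed = time_scan("words", values)
    print(f"{len(distances)} rows scanned in {elapsed:.3f} s")
```

Running the same scan in the database and in application code on the same data would show how much of the cost comes from the distance calculation itself and how much from the UDF call overhead.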
From: Richard Quadling on 20 Jul 2010 05:09

On 19 July 2010 19:46, tedd <tedd.sperling(a)gmail.com> wrote:
> At 12:39 PM +0100 7/19/10, Richard Quadling wrote:
>>
>> I'm using MS SQL, not mySQL.
>>
>> Found a extended stored procedure with a UDF.
>>
>> Testing it looks excellent.
>>
>> Searching for a match on 30,000 vehicles next to no additional time -
>> a few seconds in total, compared to the over 3 minutes to search using
>> SQL code.
>
> That seems a bit slow.
>
> For example, currently I'm searching over 4,000 records (which contains
> 4,000 paragraphs taken from the text of the King James version of the Bible)
> for matching words, such as %created% and the times are typically around
> 0.009 seconds.
>
> As such, searching ten times that amount should be in the range of tenths of
> a second and not seconds -- so taking a few seconds to search 30,000 records
> seems excessive to me.

Tedd, I'm not looking for a "word". I'm looking for similar "wrds".

"words" is closer to the misspelled "wrds" than it is to "wars":

    select dbo.DamerauLevenshteinDistance('words', 'wars'),
           dbo.DamerauLevenshteinDistance('words', 'wrds')

    (No column name)  (No column name)
    2                 1

Lower is better.

Also, I have to compare every row in the set and then sort the results to find the lowest values for the Damerau-Levenshtein distance or the highest for the Jaro-Winkler similarity.

As the value entered is always going to be the unknown, I can't pre-calculate the distances.

I do an exact match test first.
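The approach described in this post (exact match first, then score every row and sort by the lowest distance) can be sketched as below. This is Python, not the SQL Server UDF from the thread, whose implementation is not shown; the sketch assumes the restricted "optimal string alignment" variant of Damerau-Levenshtein, and `osa_distance` and `best_matches` are hypothetical names.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition, e.g. "wrod" -> "word".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def best_matches(query: str, candidates: list[str], limit: int = 5):
    """Exact match first; otherwise score every candidate, lowest distance first."""
    if query in candidates:
        return [(query, 0)]
    scored = [(c, osa_distance(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:limit]


if __name__ == "__main__":
    print(osa_distance("words", "wars"), osa_distance("words", "wrds"))
    print(best_matches("wrds", ["wars", "words", "word"]))
```

For the pair of calls shown in the post, this variant gives the same values as the UDF output (2 and 1). Whether the UDF agrees on inputs where the restricted and unrestricted variants differ is not known from the thread.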