From: Ashley Sheridan on 3 Sep 2009 15:17

On Thu, 2009-09-03 at 12:12 -0700, sono-io(a)fannullone.us wrote:
> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.
>
> Still being new at this, it seems to be the simplest approach, or is
> my thinking flawed? This also keeps me from having to add another
> field in the db to search on.
>
> BTW, this solution needs to work with any db, even ASCII files, so it
> has to happen in PHP.
>
> Thanks again,
> Frank

For speed you might want to consider an extra field in the DB in the future. If the database gets larger, or your query needs to join several tables together, then things will take a noticeable speed hit. I had a similar issue myself where I had to search for names based on mis-spellings of them. In the end I searched with metaphone tags on an extra field in the DB set up for that purpose, but it was the only way to do it that didn't affect the speed of the site.

Thanks,
Ash
http://www.ashleysheridan.co.uk
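A minimal sketch of the extra-field idea described above, assuming PHP's built-in `metaphone()` is what computes the stored key. The array stands in for a database table, and `phonetic_key`, `find_by_sound`, and the `name_key` column are hypothetical names used only for illustration, not details of the actual site.

```php
<?php
// Hypothetical helper: the phonetic key that would be stored in the extra column.
function phonetic_key(string $name): string
{
    return metaphone($name);
}

// Stand-in for table rows; name_key is computed once, when the row is saved.
$rows = [];
foreach (['Thompson', 'Smith', 'Knight'] as $name) {
    $rows[] = ['name' => $name, 'name_key' => phonetic_key($name)];
}

// Compare the key of the user's input against the stored keys.
// In a real database this would be an equality test on the extra column.
function find_by_sound(array $rows, string $input): array
{
    $key = phonetic_key($input);
    $matches = [];
    foreach ($rows as $row) {
        if ($row['name_key'] === $key) {
            $matches[] = $row['name'];
        }
    }
    return $matches;
}
```

Because the key is precomputed, the search becomes a plain equality comparison rather than a per-row pattern match, which is where the speed benefit would come from. A call such as `find_by_sound($rows, 'Smyth')` finds the Smith row only if both spellings reduce to the same metaphone key.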
From: Eddie Drapkin on 3 Sep 2009 15:24

On Thu, Sep 3, 2009 at 3:17 PM, Ashley Sheridan<ash(a)ashleysheridan.co.uk> wrote:
> For speed you might want to consider an extra field in the DB in the
> future. If the database gets larger, or your query needs to join several
> tables together, then things will take a noticeable speed hit. I had a
> similar issue myself where I had to search for names based on
> mis-spellings of them. In the end I searched with metaphone tags on an
> extra field in the DB set up for that purpose, but it was the only way
> to do it that didn't affect the speed of the site.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk

Has anyone considered deploying an actual search engine (Solr, Sphinx, etc.), as they will take care of the stripping, stemming, spelling corrections, etc?
From: Tommy Pham on 3 Sep 2009 16:02

----- Original Message ----
> From: "sono-io(a)fannullone.us" <sono-io(a)fannullone.us>
> To: PHP General List <php-general(a)lists.php.net>
> Sent: Thursday, September 3, 2009 12:12:40 PM
> Subject: Re: [PHP] Searching on AlphaNumeric Content Only
>
> Thanks to everyone who has responded. After reading everyone's response, I
> think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item # 4D-2448-7PS, no
> matter what they type in, I'll take the input, strip out all non-alphanumeric
> characters to make it 4D24487PS, add the wildcard character between each of the
> remaining characters like so, 4*D*2*4*4*8*7*P*S, and then do the search.

The correct wildcard syntax to work in any DB (Oracle, MySQL, MSSQL, etc.) is % and not *, if I remember correctly. Searching like this is OK but won't be efficient when you have a lot of rows.

As for external file processing (txt, csv, etc.), I recommend you create a separate mechanism for it, since each storage medium is meant for a different purpose. txt (both delimited and fixed-format) and csv files are usually meant for importing/exporting between various RDBMS types and different companies. They're not meant for fast searching of data. I suggest you first think about the amount of data you have to deal with and how often the search will be done on that data. It's probably easier and faster just to import the ASCII into the db and do your search on the db if you have to work with any ASCII.

As for adding another field to the db, perhaps your project has just started? If so, wouldn't it be better to do it with the future in mind, so that later you won't have to go back and redesign the db and modify the code because you now have over 100k rows to search and a search occurs on just about every other hit? The time you have now could be used for optimizing code for better performance, adding more features/functionality to the site, etc. :)

Trust me, searching a db table with over 200k rows and returning the results with multi-table joins based on one criterion isn't fun. Keep in mind that you shouldn't keep the users waiting more than 5 seconds. The only exception to that rule is data mining, where you'll have millions of rows to work with ;) Then it's no longer your problem. It's the DBA's :D

Regards,
Tommy
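The strip-and-wildcard approach from the quoted message, with `%` in place of `*`, could be sketched roughly as below. `like_pattern` is a hypothetical helper name, and upper-casing the input is an added assumption (whether `LIKE` ignores case depends on the database and collation).

```php
<?php
// Hypothetical helper: turn free-form user input into a SQL LIKE pattern.
function like_pattern(string $input): string
{
    // Keep only letters and digits, then normalise case (an assumption).
    $clean = strtoupper((string) preg_replace('/[^A-Za-z0-9]/', '', $input));
    if ($clean === '') {
        return ''; // nothing searchable was typed
    }
    // Put the SQL wildcard % between each remaining character.
    return implode('%', str_split($clean));
}

echo like_pattern('4d 2448/7ps'), "\n";
```

The result would normally be bound as a parameter to a prepared statement along the lines of `SELECT ... WHERE item_no LIKE ?` (table and column names are placeholders). Since `%` matches any run of characters and not only punctuation, the pattern can also match longer item numbers that happen to contain the same characters in order.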
From: Paul M Foster on 3 Sep 2009 16:25

On Thu, Sep 03, 2009 at 12:12:40PM -0700, sono-io(a)fannullone.us wrote:
> Thanks to everyone who has responded. After reading everyone's
> response, I think I have a very simple way to solve my "problem".
>
> Using my original example, if someone wants to find item #
> 4D-2448-7PS, no matter what they type in, I'll take the input, strip
> out all non-alphanumeric characters to make it 4D24487PS, add the
> wildcard character between each of the remaining characters like so,
> 4*D*2*4*4*8*7*P*S, and then do the search.

Your expression, if used to directly search in your SQL table, won't work. The '*' character isn't a valid wildcard for SQL. In PostgreSQL, the wildcard for any number of characters is '%', and for a single character is '_'. I don't know that MySQL understands this same convention. And who knows about Oracle.

As others have mentioned, it would be ideal (though not very "normalized") to create a new table column which contains the alphanumerics without the punctuation characters ('-'). In nearly any SQL dialect, you could do a simple SELECT using LIKE to find your item, if you're searching on this extra field.

If you want to do the searching in PHP, then it becomes more complicated. You'll have to strip out the dashes from the user input, and then query all the keys from your table, and test them using a regular expression. As mentioned before, this is time-consuming for a large table.

Here's something else to consider: Could there ever be two items which only differ by the placement of their dashes? Like 4D-2448-7PS versus 4D2-44-87PS? If not, then you should store the item number without punctuation, and use that as the primary key on your table. Have an "extra" field which shows the item number with dashes. You can use this extra field in printing inventory labels or whatever (I don't recall the context of your original post).

Paul

--
Paul M. Foster
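The PHP-side search described above (strip the user input, fetch the keys, test each with a regular expression) might look something like this sketch. `item_regex` and `find_items` are hypothetical names, and the `$keys` array stands in for whatever the table or ASCII file actually returns.

```php
<?php
// Hypothetical helper: build a regex that allows any punctuation
// between (and around) the alphanumeric characters the user typed.
function item_regex(string $clean): string
{
    $sep = '[^A-Za-z0-9]*';
    // $clean contains only letters and digits, so it needs no escaping.
    return '/^' . $sep . implode($sep, str_split($clean)) . $sep . '$/i';
}

// Hypothetical helper: return the stored keys that match the user's input.
function find_items(array $keys, string $input): array
{
    $clean = (string) preg_replace('/[^A-Za-z0-9]/', '', $input);
    if ($clean === '') {
        return []; // nothing searchable was typed
    }
    // preg_grep keeps the original array indexes, so reindex the result.
    return array_values(preg_grep(item_regex($clean), $keys));
}

// Illustrative data only.
$keys = ['4D-2448-7PS', '4D-2448-7PX', 'AB-1000-2CD'];
print_r(find_items($keys, '4d 2448 7ps'));
```

Every key is tested on every search, which is the cost on large tables mentioned above. The pattern also treats 4D-2448-7PS and 4D2-44-87PS as the same item, so it is only safe if item numbers never differ by dash placement alone.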
From: Andrea Giammarchi on 3 Sep 2009 16:33

stripping, stemming, spelling corrections? ... uhm, that's probably why they invented regular expressions, isn't it?

As I said, at the end of the day, this will be a manual, slow, potentially wrong implementation of what we already have and use on a daily basis. But obviously, everybody is free to create his own problems, no doubt about that.

Regards

> Has anyone considered deploying an actual search engine (Solr, Sphinx,
> etc.), as they will take care of the stripping, stemming, spelling
> corrections, etc?