From: bugbear on 6 Jul 2010 11:29

Jerry Stuckle wrote:
> But what you're looking for is to get a computer to be a natural
> language processor, which is still beyond our current programming
> capabilities. IBM has recently come up with a test system ("Watson")
> which does a fair job, but still has a long ways to go. Once we get
> there, we'll have a Star Trek capability :)
>
> With that said, it doesn't mean all is hopeless. Levenshtein can help,
> as can trigram matching and other things mentioned (except SoundEx). But
> it will also require a lot of work on your part to "train" the system as
> to whether two questions are similar or not.

Surely something like concept extraction/matching (like the old Excite ICE model) would be helpful.

BugBear
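For readers following along, the two string-matching techniques named in the quoted post (Levenshtein distance and trigram matching) can be sketched in a few lines of Python. This is an illustrative sketch rather than anything posted in the thread: the function names, the space-padding convention, and the use of Jaccard overlap to compare trigram sets are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance: fewest insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def trigrams(text):
    """Set of 3-character windows; the padding is an assumed convention."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


if __name__ == "__main__":
    q1 = "How do I reset my password?"
    q2 = "How can I reset my pasword?"
    print(levenshtein(q1, q2))
    print(round(trigram_similarity(q1, q2), 3))
```

Both measures work on characters only, so they catch typos and small rewordings but not two questions that ask the same thing in different words. That gap is what the concept-matching suggestion is aimed at.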
From: Jerry Stuckle on 6 Jul 2010 17:08

bugbear wrote:
> Surely something like concept extraction/matching
> (like the old Excite ICE model)
> would be helpful.

It's possible, but I'm not sure it's public domain, is it? And trying to generate your own concept extraction/matching module would be a huge undertaking.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(a)attglobal.net
==================
From: bugbear on 7 Jul 2010 04:08

Jerry Stuckle wrote:
> It's possible, but I'm not sure it's public domain, is it? And trying
> to generate your own concept extraction/matching module would be a huge
> undertaking.

There have been many academic versions (indeed, they came first): google for "latent semantic analysis" and/or "singular value decomposition".

I think the Excite engine's novelty was an efficient and fairly accurate "incremental mode", where the entire SVD didn't have to be fully redone when a document was added to the corpus.

BugBear
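To make the latent semantic analysis idea concrete, here is a minimal batch sketch using only the Python standard library. It is not the Excite engine and has no incremental mode: everything is recomputed from scratch. The whitespace tokenizer, raw term counts, the choice of k=2 concepts, and all function names are assumptions for illustration. Instead of calling a full SVD routine, it runs power iteration on the document-by-document Gram matrix A^T A, whose eigenvectors are the right singular vectors of the term-document matrix A.

```python
import math
from collections import Counter


def gram_matrix(docs):
    """G[i][j] = dot product of the term-count vectors of documents i and j."""
    counts = [Counter(d.lower().split()) for d in docs]
    return [[sum(ci[t] * cj[t] for t in ci) for cj in counts] for ci in counts]


def top_eigenpairs(matrix, k, iterations=500):
    """Largest k eigenpairs of a symmetric PSD matrix: power iteration plus deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + i for i in range(n)]  # arbitrary deterministic start vector
        for _step in range(iterations):
            nxt = [sum(work[r][c] * vec[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                return pairs  # fewer than k non-zero eigenvalues
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        value = sum(vec[r] * sum(work[r][c] * vec[c] for c in range(n))
                    for r in range(n))
        pairs.append((value, vec))
        # Deflate: remove the component just found before looking for the next.
        for r in range(n):
            for c in range(n):
                work[r][c] -= value * vec[r] * vec[c]
    return pairs


def lsa_vectors(docs, k=2):
    """Document coordinates in concept space: rows of V_k * S_k."""
    pairs = top_eigenpairs(gram_matrix(docs), k)
    return [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs]
            for i in range(len(docs))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    questions = [
        "how do I reset my password",
        "I forgot my password how can I reset it",
        "what is the shipping cost to canada",
        "how much does shipping to canada cost",
    ]
    vecs = lsa_vectors(questions, k=2)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

On a corpus this small the result mostly reflects shared words. A realistic setup would need far more documents, stop-word handling and term weighting.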
From: Jerry Stuckle on 7 Jul 2010 07:24
bugbear wrote:
> There have been many academic versions (indeed, they came first):
>
> google for "latent semantic analysis"
> and/or
> "singular value decomposition"
>
> I think the excite engine's novelty was an efficient
> and fairly accurate "incremental mode", where the entire
> SVD didn't have to be fully redone when a document was added to the corpus.

Have you ever used these? Academic versions are not the same as commercial, and generally have restrictions on their use. Also, early versions are comparatively limited in their functionality. And significant training is still required.