From: Rene Veerman on 17 Mar 2010 08:33

I've browsed Wikipedia, sf.net and Google for code and papers on what is commonly known as NLP.

I haven't found thesaurus software for native PHP/MySQL. WordNet, which is apparently the leader, provides OS-native apps and "db files" that have no database structure and are not in any SQL format (they look like CSV without the commas, but I'm not sure yet).
When I asked Princeton staff about SQL releases, they simply replied "we dont do sql here", which I find a bit strange.
The easiest thing for me to do is write a conversion script that puts their "db files" into MySQL, and work from there.

My search on sf.net turned up empty too: all of the projects with relevant descriptions have just the name registered, with no code releases.

From reading http://www.go4expert.com/forums/showthread.php?t=35, "Introduction to Natural Language Processing(NLP)", I gather that NLP as it stands results in much ambiguity on several levels of its operation.

It's an interesting problem though, and probably a profitable one, so I'm going to spend some time trying to come up with something better from scratch.

On Sun, Mar 14, 2010 at 12:04 AM, Rene Veerman <rene7705(a)gmail.com> wrote:
> Hi..
>
> I'm building a newsscraper -> portal.
> Fetching, parsing and storing many links to news items per hour was
> not much of a problem.
> Translations between languages can be done via Google, so that won't be
> much of a problem either, I suspect.
>
> I don't want to reveal too much of my business idea, but I do need to
> do text analysis, to group related items and make "suggestions"
> lists.
> I've had a dabble with creating my own ontology structure (kinda like
> a dictionary + thesaurus datamodel) by scraping existing ontology
> websites, but needless to say natural text analysis is a huge field.
> One that I'm a total noob in.
>
> So in the first place, I'm looking for any free/paid useful existing
> data-mining / text-analysis code that can be run easily from PHP.
> TBH I don't even know my feature requirements really, I'm interested to
> know what's available.
>
> In the second place, I'm looking for free and published-for-a-cost
> data-mining / text-analysis papers/books that explain how to produce
> useful results.
>
> Thanks for your input.
>
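For anyone attempting the conversion script mentioned above, here is a minimal sketch of one way it could start. It is written in Python with an in-memory SQLite database standing in for PHP/MySQL, so it runs without a server. It assumes the `data.*` line layout described in WordNet's wndb documentation (hexadecimal word count, decimal pointer count, gloss after " | "), so check that against the WordNet release you download. The table names, function names and the sample line are illustrative, not anything WordNet ships.

```python
import sqlite3

# Illustrative line in the assumed data.noun layout:
# offset lex_filenum ss_type w_cnt (word lex_id)... p_cnt (pointer)... | gloss
SAMPLE = ("00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 ~ 00002137 n 0000 "
          "~ 04424418 n 0000 | that which is perceived or known or inferred "
          "to have its own distinct existence (living or nonliving)")


def parse_data_line(line):
    """Split one data.* line into offset, part of speech, words, pointers, gloss."""
    fields, _, gloss = line.partition(" | ")
    tok = fields.split()
    offset, pos = tok[0], tok[2]
    w_cnt = int(tok[3], 16)  # word count is a hexadecimal field
    # Each word is followed by a lex_id; multi-word lemmas use underscores.
    words = [tok[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]
    p = 4 + 2 * w_cnt
    p_cnt = int(tok[p])      # pointer count is a decimal field
    pointers = []
    for i in range(p_cnt):
        # Each pointer is: symbol, target offset, target pos, source/target.
        sym, target, target_pos, _src_tgt = tok[p + 1 + 4 * i:p + 5 + 4 * i]
        pointers.append((sym, target, target_pos))
    return {"offset": offset, "pos": pos, "words": words,
            "pointers": pointers, "gloss": gloss.strip()}


def load(lines, conn):
    """Create three hypothetical tables and fill them from data.* lines."""
    conn.executescript("""
        CREATE TABLE synset (synset_offset TEXT, pos TEXT, gloss TEXT,
                             PRIMARY KEY (synset_offset, pos));
        CREATE TABLE word (synset_offset TEXT, pos TEXT, lemma TEXT);
        CREATE TABLE pointer (synset_offset TEXT, pos TEXT, symbol TEXT,
                              target_offset TEXT, target_pos TEXT);
    """)
    for line in lines:
        # Skip blank lines and the indented licence header at the top of a file.
        if not line.strip() or line.startswith(" "):
            continue
        s = parse_data_line(line)
        conn.execute("INSERT INTO synset VALUES (?, ?, ?)",
                     (s["offset"], s["pos"], s["gloss"]))
        conn.executemany("INSERT INTO word VALUES (?, ?, ?)",
                         [(s["offset"], s["pos"], w) for w in s["words"]])
        conn.executemany("INSERT INTO pointer VALUES (?, ?, ?, ?, ?)",
                         [(s["offset"], s["pos"], *ptr) for ptr in s["pointers"]])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load([SAMPLE], conn)
    print(conn.execute("SELECT lemma FROM word").fetchall())
```

The same three-table split (synsets, words, pointers) should carry over to MySQL more or less unchanged; the sketch ignores verb frames and the index.* files entirely.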
From: Rene Veerman on 17 Mar 2010 13:59

Thanks for the links.. But I think I'll keep at it on my own.
I may be interested in setting up a competitor to the companies you linked to.

I've built a nice datamodel today, which I think will return even better results than Zemanta.

But what do you mean by "linked data", Nathan?

On Wed, Mar 17, 2010 at 4:10 PM, Nathan Rixham <nrixham(a)gmail.com> wrote:
> wouldn't be diving right in to full on nlp for this ;) it's pretty easy
> to do term/semantic extraction nowadays.
>
> have you seen opencalais, alchemy, zemanta, yahoo term extraction or the
> like?
>
> honestly I've been doing this for years and would recommend hooking up
> to the opencalais and zemanta APIs - should you muddle your way towards
> linked data in any way from there give me a shout and I'll give you some
> pointers. There are already clients for PHP, as well as the normal cms
> things like drupal, wordpress etc :)
>
> regards!
>
> ps: if you really want to get in to this kind of thing then
> http://gate.ac.uk/ is a good starting (and ending) point
>