continuous deployment - Named entity recognition with a small data set (corpus)

May 15, 2014

I want to develop a named entity recognition system for the Persian language, but we only have a small NER-tagged corpus for training and testing. Maybe in the future we'll have a better, bigger corpus. So I need a solution that incrementally improves performance whenever new data is added, without merging the new data with the old data and training from scratch. Is there such a solution?

Yes. This may help, although it is a work in progress. It is written in JS and needs "no training ..."

Please see https://github.com/redaktor/nlp_compromise/ !

It is a fork. I have worked on NER during the last few days and optimized it for use with different languages.

It is a combination of a dictionary of words, a dictionary of rules, and a build tool. It would be awesome if you worked on Persian support (I am working on German). Planned NER support covers:

'cardinal' -> [ready]
'date' -> calendar-based [Gregorian calendar ready]
'duration' -> see above [date ranges ready]
'measure' -> systems-based [metric system, SI units ready, 80+ categories]
'money' -> currency-based [ready in a few days]
'person' -> word/rules-based [English/European names ready]
'organization'
'location'

I think this could be a starting point. I have not found time to document the new features yet, so feel free to open issues on GitHub.
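
To illustrate why a dictionary-plus-rules approach avoids retraining, here is a minimal sketch in Python. It is not the API of nlp_compromise; the class `DictionaryRuleTagger`, its methods, and the sample words are all hypothetical and chosen only for illustration. The idea is that new data is added as new dictionary entries or rules, so the tagger improves in place without merging corpora or training from scratch.

```python
import re


class DictionaryRuleTagger:
    """Toy dictionary-plus-rules entity tagger (hypothetical, for illustration)."""

    def __init__(self):
        self.lexicon = {}  # lower-cased word -> entity label
        self.rules = []    # list of (compiled regex, entity label)

    def add_words(self, label, words):
        # New entries take effect immediately; there is no training step.
        for word in words:
            self.lexicon[word.lower()] = label

    def add_rule(self, label, pattern):
        # A rule labels any token that the regex matches in full.
        self.rules.append((re.compile(pattern), label))

    def tag(self, text):
        entities = []
        for token in text.split():
            word = token.strip(".,;:!?")
            if not word:
                continue
            # Dictionary lookup first, then fall back to the rules.
            label = self.lexicon.get(word.lower())
            if label is None:
                for regex, rule_label in self.rules:
                    if regex.fullmatch(word):
                        label = rule_label
                        break
            if label is not None:
                entities.append((word, label))
        return entities


tagger = DictionaryRuleTagger()
tagger.add_words("person", ["Anna", "Peter"])
tagger.add_rule("cardinal", r"\d+")
first = tagger.tag("Anna paid 20 euros in Tehran.")

# Later, new data arrives: extend the dictionary, keep everything else.
tagger.add_words("location", ["Tehran"])
second = tagger.tag("Anna paid 20 euros in Tehran.")
```

After the second `add_words` call, the same sentence also yields a location entity, while the earlier entries keep working unchanged. A real system would need proper tokenization and language-specific rules, which matter a lot for Persian.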
