continuous deployment - Named entity recognition with a small data set (corpus)
I want to develop a named entity recognition (NER) system for the Persian language. We have only a small NER-tagged corpus for training and testing. Maybe in the future we will have a better, bigger corpus. Either way, I need a solution whose performance improves incrementally whenever new data is added, without merging the new data with the old data and training from scratch. Is there such a solution?
Yes. This might help: it is a work in progress, in JS, with "no training ..."
Please see https://github.com/redaktor/nlp_compromise/
It is a fork in which I have worked on NER during the last few days, and it is optimized for use with different languages.
It is a combination of dictionary words, dictionary rules, and a build tool. It would be awesome to work on Persian support (I am working on German) ... NER support is planned for:
- 'cardinal' -> [ready]
- 'date' -> calendar based [gregorian calendar ready]
- 'duration' -> see above [date ranges ready]
- 'measure' -> systems based [metric system, SI units ready, 80+ categories]
- 'money' -> currencies based [ready in few days]
- 'person' -> word/rules based [english/european names ready]
- 'organization'
- 'location'
I think it is a starting point? I have not found the time to document the new features yet, so feel free to open issues on GitHub.
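To make the "dictionary words + dictionary rules" idea concrete, here is a minimal Python sketch of how such a tagger can improve as new entries arrive, with no training step at all. This is not the nlp_compromise API (that project is JS). The `DictionaryTagger` class, the `RULES` table, the label names and the sample words are all hypothetical and only illustrate the general approach.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule table (label, pattern), checked in order.
# These patterns are illustrative only, not taken from nlp_compromise.
RULES = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("money", re.compile(r"[$€]\d+(?:\.\d+)?")),
    ("cardinal", re.compile(r"\d+")),
]


class DictionaryTagger:
    """Tags tokens by lexicon lookup first, then by pattern rules."""

    def __init__(self) -> None:
        self.lexicon: Dict[str, str] = {}

    def add_entries(self, entries: Dict[str, str]) -> None:
        # New word -> label pairs are merged into the lexicon.
        # Nothing is retrained and older entries are left untouched.
        for word, label in entries.items():
            self.lexicon[word.lower()] = label

    def tag(self, text: str) -> List[Tuple[str, str]]:
        tagged = []
        for token in text.split():
            word = token.strip(".,;:!?")
            if not word:
                continue
            label = self.lexicon.get(word.lower())
            if label is None:
                # Fall back to the first rule that matches the whole token,
                # or "O" (outside any entity) if none does.
                label = next(
                    (name for name, pattern in RULES if pattern.fullmatch(word)),
                    "O",
                )
            tagged.append((word, label))
        return tagged


tagger = DictionaryTagger()
tagger.add_entries({"tehran": "location"})
first = tagger.tag("Sara moved to Tehran on 2015-03-21 with 3 bags.")

# Later, when more tagged data shows up, only the new entries are added.
tagger.add_entries({"sara": "person"})
second = tagger.tag("Sara moved to Tehran on 2015-03-21 with 3 bags.")
```

In this sketch "Sara" is untagged in `first` and tagged as a person in `second`, while the date and cardinal come from the rules both times. The trade-off is that coverage depends entirely on the dictionary and rules you write, and whitespace tokenization like this is only a placeholder for proper Persian tokenization.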