continuous deployment - Named entity recognition with a small data set (corpus)


I want to develop a named entity recognition (NER) system for the Persian language, but I have only a small NER-tagged corpus for training and testing. We may have a better, bigger corpus in the future. So I need a solution that incrementally improves performance whenever new data is added, without merging the new data with the old data and training from scratch. Is there such a solution?

Yes. This may help, although it is a work in progress. It is written in JS and needs "no training ..."

Please see https://github.com/redaktor/nlp_compromise/

It is a fork in which I have worked on NER during the last few days and optimized it for use with different languages.

It is a combination of dictionary words, dictionary rules, and a build tool. It would be awesome if you worked on Persian support (I am working on German). NER support is planned for:

  • 'cardinal' -> [ready]
  • 'date' -> calendar based [gregorian calendar ready]
  • 'duration' -> see above [date ranges ready]
  • 'measure' -> systems based [metric system , si units ready, 80+ categories]
  • 'money' -> currencies based [ready in a few days]
  • 'person' -> word/rules based [english/european names ready]
  • 'organization'
  • 'location'
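
To make the "dictionary words + dictionary rules" idea concrete, here is a minimal sketch in Python. It is not the nlp_compromise API (that project is JavaScript). The names `LEXICON`, `RULES`, `tag`, and `add_entries`, and the sample entries, are all hypothetical and chosen only for illustration. The point it shows is that with a lexicon-and-rules tagger, new data is added by extending the dictionary, so there is no retraining step.

```python
import re

# Hypothetical sample data for illustration; not taken from nlp_compromise.
LEXICON = {
    "tehran": "location",
    "berlin": "location",
}

# Rules are (compiled pattern, label) pairs, tried in order.
# In Python 3, \d on str patterns also matches Persian digits such as "۳".
RULES = [
    (re.compile(r"^\d+([.,]\d+)?$"), "cardinal"),
]


def tag(tokens, lexicon=LEXICON, rules=RULES):
    """Return (token, label) pairs; label is None when nothing matches."""
    tagged = []
    for token in tokens:
        # Dictionary lookup first, then fall back to the pattern rules.
        label = lexicon.get(token.lower())
        if label is None:
            for pattern, rule_label in rules:
                if pattern.match(token):
                    label = rule_label
                    break
        tagged.append((token, label))
    return tagged


def add_entries(lexicon, new_entries):
    """Incremental update: merge new dictionary entries in place."""
    lexicon.update({word.lower(): label for word, label in new_entries.items()})


tokens = "Ali visited Tehran 3 times".split()
print(tag(tokens))                        # "Ali" is still unlabeled here
add_entries(LEXICON, {"Ali": "person"})   # new data arrives later
print(tag(tokens))                        # "Ali" is now labeled "person"
```

The trade-off of this design is that coverage depends entirely on the entries and rules you supply; it does not generalize to unseen names the way a trained statistical model can.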

I think this could be a starting point. I have not found the time to document the new features yet, so feel free to open issues on GitHub.

