NaiveBayes Classifier: Do I have to concatenate all files of one class?
I am implementing a simple Naive Bayes classifier and I do not understand how to calculate the class-conditional probability P(d|c). For completeness, let me shortly explain the terminology used. In Naive Bayes the probabilities are computed by

    P(c|d) = P(c) * P(d|c) / P(d)
Here c denotes an arbitrary class and d a document. Let x = {x_1, x_2, ..., x_n} be the list of n features (e.g. the 50 most frequent bigrams).
In the training set there are several classes (each represented by a folder called c_i), and each of them contains k documents (represented as normal text files).
The a-priori probability P(c) can be calculated easily:

    P(c) = (number of documents in class c) / (total number of documents)
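A minimal sketch of that prior estimation from the folder layout described above; the directory structure, the function name and the use of Python are assumptions for illustration, not part of the original question:

    import os

    def estimate_priors(train_dir):
        # P(c) = (#documents in class folder c) / (#documents over all classes).
        # Assumes train_dir holds one sub-folder per class (c_1, c_2, ...),
        # each containing the plain-text training documents of that class.
        counts = {}
        for class_name in os.listdir(train_dir):
            class_path = os.path.join(train_dir, class_name)
            if os.path.isdir(class_path):
                counts[class_name] = len(os.listdir(class_path))
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}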
Now I want to calculate P(d|c). Under the naive independence assumption this should be done by

    P(d|c) = P(x_1|c) * P(x_2|c) * ... * P(x_n|c)
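For numerical stability this product is usually evaluated as a sum of logarithms. A small sketch, assuming the per-feature probabilities P(x_i|c) have already been estimated and are stored in a dict (that representation is an assumption):

    import math

    def log_likelihood(doc_features, feature_probs):
        # log P(d|c) = sum_i log P(x_i|c) for one class c.
        # doc_features: features (e.g. bigrams) observed in document d.
        # feature_probs: dict mapping feature -> P(x_i|c) for class c.
        return sum(math.log(feature_probs[x]) for x in doc_features)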
Now I do not understand how to compute P(x_i|c). I take a feature x_i (let's say the bigram "th") and check how often it appears in class c. But how do I do that? Each class is represented by k documents. Do I have to concatenate all files of the class? Later I certainly have to divide by the "total count of features". Is it the frequency of the bigram "th" in the (concatenated) documents?
The naive Bayes approach makes the assumption that a document is a set of words drawn independently from a class-specific probability distribution. Based on this independence assumption, you can indeed concatenate the documents of a class and use the word (here: bigram) frequencies of this union of class documents as the estimate of the class probability distribution.
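A minimal sketch of that estimation, assuming character-bigram features and add-one (Laplace) smoothing for bigrams unseen in a class; the file handling, the smoothing and all names below are assumptions rather than part of the original answer:

    import os
    from collections import Counter

    def bigrams(text):
        # Character bigrams of a text, e.g. 'the' -> ['th', 'he'].
        return [text[i:i + 2] for i in range(len(text) - 1)]

    def estimate_feature_probs(class_dir, vocabulary):
        # Estimate P(x_i|c): counting bigram occurrences over all documents of
        # the class is equivalent to counting them in one big concatenated file.
        counts = Counter()
        for name in os.listdir(class_dir):
            with open(os.path.join(class_dir, name), encoding="utf-8") as f:
                counts.update(bigrams(f.read()))
        total = sum(counts[x] for x in vocabulary)
        # Add-one smoothing so a bigram unseen in this class does not get probability 0.
        return {x: (counts[x] + 1) / (total + len(vocabulary)) for x in vocabulary}

Combined with the prior and the log-sum above, a new document is then assigned to the class c that maximizes log P(c) + sum_i log P(x_i|c).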