NaiveBayes Classifier: Do I have to concatenate all files of one class?
I am implementing a simple Naive Bayes classifier and I do not understand how to calculate the class-conditional probability P(d|c). For completeness, let me shortly explain the terminology used. In Naive Bayes the probabilities are computed by

    P(c|d) = P(c) * P(d|c) / P(d)
Here c denotes an arbitrary class and d a document. Let x = {x_1, x_2, ..., x_n} be the list of n features (e.g. the 50 most frequent bigrams).
In the training set there are several classes (each represented by a folder called c_i), and each of them contains k documents (represented as normal text files).
The a-priori probability P(c) can be calculated easily:

    P(c) = (number of documents in class c) / (total number of documents)
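A minimal sketch of that prior estimation from the folder layout described above; the directory structure, the function name and the use of Python are assumptions for illustration, not part of the original question:

    import os

    def estimate_priors(train_dir):
        # P(c) = (#documents in class folder c) / (#documents over all classes).
        # Assumes train_dir holds one sub-folder per class (c_1, c_2, ...),
        # each containing the plain-text training documents of that class.
        counts = {}
        for class_name in os.listdir(train_dir):
            class_path = os.path.join(train_dir, class_name)
            if os.path.isdir(class_path):
                counts[class_name] = len(os.listdir(class_path))
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}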
Now I want to calculate P(d|c). Under the naive independence assumption this should be done by

    P(d|c) = P(x_1|c) * P(x_2|c) * ... * P(x_n|c)
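For numerical stability this product is usually evaluated as a sum of logarithms. A small sketch, assuming the per-feature probabilities P(x_i|c) have already been estimated and are stored in a dict (that representation is an assumption):

    import math

    def log_likelihood(doc_features, feature_probs):
        # log P(d|c) = sum_i log P(x_i|c) for one class c.
        # doc_features: features (e.g. bigrams) observed in document d.
        # feature_probs: dict mapping feature -> P(x_i|c) for class c.
        return sum(math.log(feature_probs[x]) for x in doc_features)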
Now I do not understand how to compute P(x_i|c). I take a feature x_i (let's say the bigram "th") and check how often it appears in class c. But how do I do that? Each class is represented by k documents. Do I have to concatenate all files of the class? Later I certainly have to divide by the "total count of features". Is it the frequency of the bigram "th" in the (concatenated) documents?
The naive Bayes approach makes the assumption that a document is a set of words drawn independently from a class-specific probability distribution. Based on this independence assumption, you can indeed concatenate the documents of a class and use the word (here: bigram) frequencies of this union of class documents as the estimate of the class probability distribution.
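A minimal sketch of that estimation, assuming character-bigram features and add-one (Laplace) smoothing for bigrams unseen in a class; the file handling, the smoothing and all names below are assumptions rather than part of the original answer:

    import os
    from collections import Counter

    def bigrams(text):
        # Character bigrams of a text, e.g. 'the' -> ['th', 'he'].
        return [text[i:i + 2] for i in range(len(text) - 1)]

    def estimate_feature_probs(class_dir, vocabulary):
        # Estimate P(x_i|c): counting bigram occurrences over all documents of
        # the class is equivalent to counting them in one big concatenated file.
        counts = Counter()
        for name in os.listdir(class_dir):
            with open(os.path.join(class_dir, name), encoding="utf-8") as f:
                counts.update(bigrams(f.read()))
        total = sum(counts[x] for x in vocabulary)
        # Add-one smoothing so a bigram unseen in this class does not get probability 0.
        return {x: (counts[x] + 1) / (total + len(vocabulary)) for x in vocabulary}

Combined with the prior and the log-sum above, a new document is then assigned to the class c that maximizes log P(c) + sum_i log P(x_i|c).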