r - How to explain a higher percentage of point variability using kmeans clustering?
I'm doing k-means clustering.
Regardless of how many clusters I choose to use, the percentage of point variability does not change.
Here's how I'm plotting the data:
# prepare data
mydata <- read.csv("~/student-mat.csv", sep=";")
# let's grab numeric columns
mydata <- mydata[,c("age","medu","fedu","traveltime","studytime","failures","fam
mydata <- na.omit(mydata) # listwise deletion of missing
mydata <- scale(mydata) # standardize variables
library(ggplot2)
# k-means clustering with 5 clusters
fit <- kmeans(mydata, 5) # to change number of clusters, change "5"
# cluster plot against 1st 2 principal components
# vary parameters for most readable graph
library(cluster)
clusplot(mydata, fit$cluster, color=TRUE, shade=TRUE, labels=0, lines=0)
How can I affect the percentage of point variability?
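The amount of variance explained is related to the two principal components calculated to visualize your data. It has nothing to do with the type of clustering algorithm or the accuracy of the algorithm you're using (k-means in this case). The figure is a property of the data matrix you pass in, so changing the number of clusters cannot move it.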
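To see this for yourself, here is a minimal sketch that computes the two-component variance share directly and then fits k-means for several values of k. It assumes the built-in iris data as a stand-in for your file, and it uses prcomp() on standardized data, which should give the same proportions as the principal components drawn by the plot, though that is my assumption about your setup rather than something taken from your code. For contrast it also prints the between-cluster share of the total sum of squares, which does depend on k.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data set
x <- scale(iris[, 1:4])

# Share of total variance captured by the first two principal components.
# No cluster labels are involved in this calculation.
pca <- prcomp(x)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
pct_two_pcs <- 100 * sum(var_share[1:2])

# Fit k-means for several k and record the between-cluster share of the
# total sum of squares, which is a property of the clustering itself.
set.seed(42)
ks <- 2:6
pct_between <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 10)
  100 * fit$betweenss / fit$totss
})

for (i in seq_along(ks)) {
  cat(sprintf("k = %d: two-PC variance = %.2f%%, between_SS / total_SS = %.2f%%\n",
              ks[i], pct_two_pcs, pct_between[i]))
}
```

The first percentage is identical on every line because it is computed once from the data alone; only a different set of input columns would change it. The second percentage is the one that responds to the number of clusters.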
To understand how accurate your clustering algorithm is, at the least you can use table() to construct a cross-classification table against observed data, typically data you've held out of the clustering process. Using the cross-tabulation/confusion matrix you can calculate metrics such as user's/producer's accuracy, etc. There are far more sophisticated approaches of course, but this can get you started thinking about the best way to assess classification accuracy.
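As a rough sketch of that idea, the code below cross-tabulates k-means clusters against known class labels. It assumes the built-in iris data, with Species playing the role of the observed labels that were kept out of the clustering. Because cluster numbers are arbitrary, it labels each cluster by its majority class before building the confusion matrix; that mapping step is one simple choice of mine, not the only way to do it.

```r
# Stand-in data (assumption): iris, with Species as the observed labels
x <- scale(iris[, 1:4])      # Species is not used for clustering
observed <- iris$Species

set.seed(42)
fit <- kmeans(x, centers = 3, nstart = 10)

# Cross-classification table: rows are clusters, columns are observed classes
tab <- table(cluster = fit$cluster, observed = observed)
print(tab)

# Cluster numbers are arbitrary, so label each cluster by its majority class
majority <- colnames(tab)[apply(tab, 1, which.max)]
predicted <- factor(majority[fit$cluster], levels = levels(observed))

# Confusion matrix with matching row and column classes
conf <- table(predicted = predicted, observed = observed)
print(conf)

overall_acc   <- sum(diag(conf)) / sum(conf)
users_acc     <- diag(conf) / rowSums(conf)  # correct / all assigned to the class
producers_acc <- diag(conf) / colSums(conf)  # correct / all observed in the class

print(round(overall_acc, 3))
print(round(users_acc, 3))
print(round(producers_acc, 3))
```

If a class is never the majority of any cluster, its row total is zero and its user's accuracy comes out as NaN, which is itself a sign that the clustering does not recover that class.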