r - How to explain a higher percentage of point variability using kmeans clustering?


I'm doing k-means clustering:

[image]

Regardless of how many clusters I choose to use, the percentage of point variability does not change:

[image]

Here's how I'm plotting the data:

    # prepare data
    mydata <- read.csv("~/student-mat.csv", sep=";")

    # let's grab the numeric columns (the column list is cut off here)
    mydata <- mydata[,c("age","Medu","Fedu","traveltime","studytime","failures","fam

    mydata <- na.omit(mydata) # listwise deletion of missing
    mydata <- scale(mydata)   # standardize variables
    library(ggplot2)

    # k-means clustering with 5 clusters
    fit <- kmeans(mydata, 5) # to change the number of clusters, change "5"

    # cluster plot against the first 2 principal components
    # vary parameters for the most readable graph
    library(cluster)
    clusplot(mydata, fit$cluster, color=TRUE, shade=TRUE,
             labels=0, lines=0)

How can I affect the percentage of point variability?

The amount of variance explained is related to the two principal components calculated to visualize your data. It has nothing to do with the type of clustering algorithm or the accuracy of the algorithm you're using (k-means in this case).
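To see this for yourself, here is a minimal sketch. It assumes the built-in USArrests data as a stand-in for the student data, and it uses prcomp() to recompute the share of variance carried by the first two principal components, which should correspond to the figure clusplot reports. The PCA never sees the cluster assignments, so that percentage is the same for every k. The between-cluster share of the sum of squares that kmeans() returns is shown only as a contrast, since it does depend on k.

```r
# Stand-in data: the built-in USArrests data set (an assumption, not the asker's file)
x <- scale(USArrests)

# Share of total variance captured by the first two principal components
pca <- prcomp(x)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
pct_two_pcs <- 100 * sum(var_share[1:2])

# Fit k-means with several cluster counts; the PCA above never uses these fits
set.seed(1)
fits <- lapply(c(2, 3, 5), function(k) kmeans(x, centers = k, nstart = 10))

# A quantity that does depend on k: between-cluster share of the total sum of squares
between_pct <- sapply(fits, function(f) 100 * f$betweenss / f$totss)

print(round(pct_two_pcs, 2))  # one number, independent of k
print(round(between_pct, 1))  # one number per value of k
```

The only way to change the percentage on the plot is to change the data going into the PCA, for example by using a different set of columns or different scaling.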

To understand how accurate your clustering algorithm is, at the least you can use table() to construct a cross-classification table against observed data, typically data you've held out of the clustering process. Using the cross-tabulation/confusion matrix you can calculate metrics such as user's/producer's accuracy, etc. There are far more sophisticated approaches of course, but this can get you started thinking about the best way to assess classification accuracy.
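As a rough sketch of that cross-tabulation, the example below assumes the built-in iris data, with Species standing in for the observed labels. The labels are kept out of the clustering, but no rows are held out here. Because cluster ids are arbitrary, each cluster is labelled with its most common reference class before the accuracies are computed. That mapping step is one simple choice among several.

```r
# Stand-in data: built-in iris, with Species as the reference labels (an assumption)
x <- scale(iris[, 1:4])
set.seed(1)
fit <- kmeans(x, centers = 3, nstart = 10)

# Cross-classification of reference labels (rows) against cluster ids (columns)
xtab <- table(reference = iris$Species, cluster = fit$cluster)

# Cluster ids are arbitrary, so label each cluster by its most common reference class
cluster_label <- rownames(xtab)[apply(xtab, 2, which.max)]
predicted <- factor(cluster_label[fit$cluster], levels = levels(iris$Species))

# Confusion matrix with the same classes on rows and columns
cm <- table(reference = iris$Species, predicted = predicted)

overall   <- sum(diag(cm)) / sum(cm)   # overall agreement
producers <- diag(cm) / rowSums(cm)    # correct / reference total, per class
users     <- diag(cm) / colSums(cm)    # correct / predicted total, per class

print(cm)
print(round(overall, 3))
```

If no cluster ends up mapped to a given class, its column total is zero and the user's accuracy for that class comes out as NaN.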

