machine learning - How do I choose a linkage method for Hierarchical Agglomerative Clustering?
I understand that HAC has several options in terms of linkage functions. Here is what I have so far:
- Single linkage produces "straggly" clusters.
- Complete linkage produces tight, spherical clusters.
- Average linkage is sort of a compromise between the two.
- Ward's method is based more on variance than on the actual distance.
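To make the first three definitions in the list concrete, here is a minimal pure-Python sketch of naive agglomerative clustering with a pluggable linkage function (single = closest pair, complete = farthest pair, average = mean over all pairs). The function names `naive_hac`, `single`, `complete`, `average` and the 1-D toy points are hypothetical, and the brute-force loop is written for readability, not speed.

```python
import math
from itertools import combinations

# Each linkage turns pairwise point distances into a cluster-to-cluster distance.
def single(A, B):
    return min(math.dist(a, b) for a in A for b in B)    # closest pair

def complete(A, B):
    return max(math.dist(a, b) for a in A for b in B)    # farthest pair

def average(A, B):
    total = sum(math.dist(a, b) for a in A for b in B)
    return total / (len(A) * len(B))                      # mean over all pairs

def naive_hac(points, link):
    """Repeatedly merge the two closest clusters; return the merge heights."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Brute force: score every pair of current clusters under `link`.
        (i, j), h = min(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair_and_height: pair_and_height[1],
        )
        heights.append(h)
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return heights

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
for link in (single, complete, average):
    print(link.__name__, [round(h, 3) for h in naive_hac(points, link)])
# single [1.0, 2.0, 4.0]
# complete [1.0, 3.0, 7.0]
# average [1.0, 2.5, 5.667]
```

On the same four points, single linkage attaches 3 at height 2 through its nearest neighbour 1, while complete linkage waits until 3 is within distance 3 of every point in {0, 1}; average linkage lands in between.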
What I'm trying to figure out is how to know which one of these I want to use. Are there datasets where "straggly" clusters are preferable to spherical ones? Or is it more a function of what I intend to do with the clustered data?
It depends on the data.
Single linkage works reasonably well on clean data.
If you have dirty (noisy) data, other linkages may be better.
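The sensitivity to dirty data largely comes from chaining: single linkage merges two clusters as soon as any one pair of points is close, so a sparse trail of noise points can link otherwise well-separated groups. A flat single-linkage clustering cut at height t is exactly the set of connected components of the graph joining every pair of points at distance <= t, which is what this sketch computes. The function name, the two blobs, the noise "bridge", and the threshold t = 2.0 are all made-up illustrative values.

```python
import math

def single_linkage_clusters(points, t):
    """Flat single-linkage clusters at height t: connected components of the
    graph that joins every pair of points at distance <= t."""
    n = len(points)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first flood fill
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= t:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
blob_b = [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
# Hypothetical noise: a sparse trail of points between the two blobs.
bridge = [(x, 0.5) for x in (2.5, 4.0, 5.5, 7.0, 8.5)]

clean = single_linkage_clusters(blob_a + blob_b, t=2.0)
dirty = single_linkage_clusters(blob_a + blob_b + bridge, t=2.0)
print(len(set(clean)), len(set(dirty)))  # 2 1
```

Five noise points spaced 1.5 apart are enough to collapse the two blobs into one cluster. Complete and average linkage score a merge by the farthest or the mean pair instead, so a thin trail like this does not by itself pull the blobs together in the same way.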
Ward is similar to k-means. It may be the right choice if you want to talk about centroids and have the data partitioned into disjoint subsets.
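The similarity to k-means can be made precise: Ward's method merges the pair of clusters whose union increases the total within-cluster sum of squared errors (the k-means objective) the least, and that increase has the closed form n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2, where c_A and c_B are the centroids. The helper names and toy points below are hypothetical; the sketch just checks the identity numerically.

```python
def centroid(cluster):
    # Coordinate-wise mean of the points in the cluster.
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def sse(cluster):
    # Sum of squared distances from each point to the cluster centroid.
    m = centroid(cluster)
    return sum(sum((x - mi) ** 2 for x, mi in zip(p, m)) for p in cluster)

def ward_cost(A, B):
    # Increase in total SSE caused by merging A and B.
    na, nb = len(A), len(B)
    ca, cb = centroid(A), centroid(B)
    return na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))

A = [(0.0, 0.0), (2.0, 0.0)]
B = [(5.0, 4.0), (7.0, 4.0), (6.0, 1.0)]
print(round(ward_cost(A, B), 6))                   # 40.8
print(round(sse(A + B) - sse(A) - sse(B), 6))      # 40.8
```

Because each merge is chosen by its effect on the SSE, Ward tends to produce compact, centroid-describable clusters, which is the same criterion k-means optimizes.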
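The other issue is speed: only SLINK (for single linkage) is fast, running in O(n^2) time. The others typically work in O(n^3) and are therefore not usable on large data sets. Compare e.g. DBSCAN, which runs in O(n log n) if done well, or k-means in O(n) for a fixed k and number of iterations...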
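For reference, here is a sketch of Sibson's SLINK transcribed into Python, assuming the standard pointer representation: for each point j, lam[j] is the height at which j stops being the highest-indexed point of its cluster, and pi[j] is the highest-indexed point of the cluster it then joins. It uses O(n) memory and two passes over the earlier points per new point, hence O(n^2) time. The function name and the toy points are illustrative.

```python
import math

def slink(points, dist=math.dist):
    """Single-linkage clustering in pointer representation (pi, lam).

    The last point has lam = inf; sorting the finite lam values gives the
    single-linkage merge heights.
    """
    n = len(points)
    pi = [0] * n
    lam = [math.inf] * n
    m = [0.0] * n
    for i in range(n):
        pi[i] = i
        lam[i] = math.inf
        # Distances from the new point i to all earlier points.
        for j in range(i):
            m[j] = dist(points[i], points[j])
        # Update the pointer representation with point i.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])
        # Repoint j to i where its cluster now ends at i.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
pi, lam = slink(points)
print(pi)                                      # [1, 2, 3, 3]
print(sorted(h for h in lam if h != math.inf))  # [1.0, 2.0, 4.0]
```

For points on a line, the single-linkage merge heights are just the gaps between neighbours (1, 2 and 4 here), which makes this an easy sanity check.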