K-Means clustering was my first exposure to clustering as a statistical process. I wish my undergraduate coursework (where I first learned R) had covered this topic, but I suppose the focus and applications of statistics and data science have shifted a lot over the last 10 years as computing technology and culture have evolved together.
Clustering strikes me as one of those data science advances that education, and school data analysis in particular, could really benefit from. I keep thinking of a recent data exercise for a job application: there was no clear overall regression relationship in the data, but there were distinct regions where the relationship was stronger or weaker. It intrigues me enough that I plan to revisit that data on my own, even though I submitted the hiring exercise weeks ago.
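The machine learning course I’ve been following on Udemy also covered hierarchical clustering. It is poorly suited to large datasets, since the standard agglomerative approach compares every pair of points, but it builds the entire hierarchy of clusters first. The optimal number of clusters can then be chosen afterward (for example, by reading the resulting dendrogram) rather than being fixed in advance as it is with K-Means.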
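To make the contrast with K-Means concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. It assumes single linkage and Euclidean distance, and the function names `agglomerate` and `cut` are hypothetical, not taken from the course or any library. The full merge history is computed once, and choosing a number of clusters is just a matter of replaying that history up to a given point. The pairwise search is deliberately naive, which also hints at why the approach struggles on large datasets.

```python
import math
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering (hypothetical helper).

    Returns the full merge history as a list of (cluster_a, cluster_b, distance)
    tuples, where each cluster is a frozenset of point indices. The whole tree
    is built before any number of clusters is chosen.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Naive search over every pair of current clusters; this is the part
        # that becomes expensive as the number of points grows.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


def cut(history, n_points, k):
    """Replay the first n_points - k merges so that exactly k clusters remain."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for a, b, _ in history[: n_points - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Made-up toy data: two tight pairs and one distant point.
    points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    history = agglomerate(points)
    print(cut(history, len(points), 3))  # [[0, 1], [2, 3], [4]]
    print(cut(history, len(points), 2))  # [[0, 1, 2, 3], [4]]
```

Both calls to `cut` reuse the same `history`, whereas K-Means would need a separate run for each value of k. Libraries such as SciPy provide efficient versions of the same idea through `scipy.cluster.hierarchy.linkage` and `fcluster`.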