
K-Means Clustering Algorithm in Machine Learning

K-Means clustering is an unsupervised machine learning algorithm that partitions n instances into k clusters by similarity. Because it is unsupervised, the training instances do not have labels.

To explain how the K-Means clustering algorithm works, I will use a very simple unlabelled dataset with three features.

Instance   x1   x2   x3
1          12   40   30
2          16   35   29
3          14   37   25
4          18   36   21
5          12   40   31

For the sake of simplicity, I will divide the instances into only two clusters. To partition the dataset into two clusters, you need two instances to serve as initial centroids. Let us take instance 1 as the centroid of cluster-0 and instance 2 as the centroid of cluster-1.

cluster-0={1}

It means that instance 1 is in cluster-0.

cluster-1={2}

It means that instance 2 is in cluster-1.

Now let us see which cluster instance 3 will belong to.

To decide this, you need to calculate the distance from instance 1 (the centroid of cluster-0) to instance 3, and from instance 2 (the centroid of cluster-1) to instance 3.

The Euclidean distances are:

d(1,3) = sqrt((12-14)² + (40-37)² + (30-25)²) = sqrt(38)

d(2,3) = sqrt((16-14)² + (35-37)² + (29-25)²) = sqrt(24)
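These distances can be checked with a few lines of Python (the `euclidean` helper is mine, for illustration):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

inst1, inst2, inst3 = [12, 40, 30], [16, 35, 29], [14, 37, 25]
print(euclidean(inst1, inst3))  # sqrt(38) ≈ 6.16
print(euclidean(inst2, inst3))  # sqrt(24) ≈ 4.90
```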

It can be observed that instance 3 is closer to instance 2 than to instance 1, so it joins cluster-1, and the clusters become:

cluster-0={1}

cluster-1={2,3}

Furthermore, a new centroid will be calculated for cluster-1 as the mean of instances 2 and 3:

((16+14)/2, (35+37)/2, (29+25)/2)=(15, 36, 27)
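The centroid update is just the per-feature mean of the cluster's members; a quick sketch (the `centroid` function is my own helper name):

```python
def centroid(points):
    # per-feature mean of the cluster's member instances
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

print(centroid([[16, 35, 29], [14, 37, 25]]))  # (15.0, 36.0, 27.0)
```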

To decide which cluster instance 4 will belong to, you need to calculate the distance from instance 1 (the centroid of cluster-0) to instance 4, and from c1 = (15, 36, 27), the updated centroid of cluster-1, to instance 4.

d(1,4) = sqrt((12-18)² + (40-36)² + (30-21)²) = sqrt(133)

d(c1,4) = sqrt((15-18)² + (36-36)² + (27-21)²) = sqrt(45)

It can be observed that instance 4 is closer to the centroid of cluster-1 than to instance 1, so it joins cluster-1, and the clusters become:

cluster-0={1}

cluster-1={2,3,4}

Now a new centroid will be calculated for cluster-1 as the mean of instances 2, 3 and 4:

((16+14+18)/3, (35+37+36)/3, (29+25+21)/3)=(16, 36, 25)

To decide which cluster instance 5 will belong to, you need to calculate the distance from instance 1 (the centroid of cluster-0) to instance 5, and from c1 = (16, 36, 25), the updated centroid of cluster-1, to instance 5.

d(1,5) = sqrt((12-12)² + (40-40)² + (30-31)²) = sqrt(1)

d(c1,5) = sqrt((16-12)² + (36-40)² + (25-31)²) = sqrt(68)

It can be observed that instance 5 is closer to instance 1 than to the centroid of cluster-1, so it joins cluster-0, and the clusters become:

cluster-0={1, 5}

cluster-1={2,3,4}

Now the centroid of cluster-0 will be recalculated as the mean of instances 1 and 5:

((12+12)/2, (40+40)/2, (30+31)/2) = (12, 40, 30.5)


Finally, we will have two clusters:

cluster-0={1, 5}

cluster-1={2,3,4}

And their centroids are (12, 40, 30.5) and (16, 36, 25), respectively.
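The whole walkthrough, a single sequential pass in which each remaining instance joins its nearest centroid and that cluster's centroid is immediately recomputed, can be sketched as follows (note that standard K-Means instead iterates full assignment and update steps until convergence; the helper names here are my own):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_point(points):
    # per-feature mean of a list of instances
    return [sum(col) / len(points) for col in zip(*points)]

X = [[12, 40, 30], [16, 35, 29], [14, 37, 25], [18, 36, 21], [12, 40, 31]]

# Seed the two clusters with instances 1 and 2 (indices 0 and 1)
clusters = {0: [0], 1: [1]}
centroids = {0: X[0][:], 1: X[1][:]}

# Assign instances 3, 4 and 5 one at a time, updating the winning centroid
for i in range(2, len(X)):
    nearest = min(centroids, key=lambda c: euclidean(centroids[c], X[i]))
    clusters[nearest].append(i)
    centroids[nearest] = mean_point([X[j] for j in clusters[nearest]])

# Report members using 1-based instance numbers, as in the text
print({c: [j + 1 for j in members] for c, members in clusters.items()})
# → {0: [1, 5], 1: [2, 3, 4]}
print(centroids)
# → {0: [12.0, 40.0, 30.5], 1: [16.0, 36.0, 25.0]}
```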

Python code

from sklearn.cluster import KMeans

# Same five instances as in the worked example above
X = [[12, 40, 30], [16, 35, 29], [14, 37, 25], [18, 36, 21], [12, 40, 31]]

# Two clusters, matching the walkthrough; random_state makes the run reproducible
kmc = KMeans(n_clusters=2, n_init=10, random_state=0)
kmc.fit(X)
print(kmc.labels_)           # cluster label of each instance
print(kmc.cluster_centers_)  # final cluster centroids

After running the code above, you will get the cluster label of each instance and the two cluster centers as output.


©Postnetwork-All rights reserved.