K-Nearest Neighbors

K-Nearest Neighbors Algorithm in Python

K-nearest neighbors algorithm is a machine learning algorithm which is used for classification and regression, and it can be applied where other machine learning algorithms are applied. See the typical problem setting ,

Suppose (x1, y1), (x2, y2), (x3, y3), ……………….(xn, yn) pairs in which x’s are attributes and y’s are labels, suppose another instance comes without having any label i.e (xn+1, Unknown )  you can see that there is not label. The machine learning classification task is to predict the unknown label of instance (xn+1, Unknown).

 Data Set Description-

I have taken portion of Irish dataset   in which there are four features sepal length, sepal width, petal length and petal width of a flower based on that features a flower is labeled as Irish-setosa, Irish-versicolor and  Irish-virginica. There are 10 instances in the table in which 9 instances having labels but the last one has not any label that is Unknown. The task is to predict the unknown  label based on 9 instances.

K-Nearest Neighbors

Working of K-Nearest Neighbors Algorithm-

It works  on the voting system in which   majority wins,  suppose there are  9 instances( records) having labels but instance no. 10 has not any label.

In addition, suppose If you decide that 5 neighbors ( Classes)  will vote, here voting means calculating distance from all instances and taking 5 instances having least distance. Furthermore, in 5 instances you will see that which class having majority, then unknown class will be labeled by that class.

For the calculation of distance I have used Euclidean distance formula  for four dimensions.

K-Nearest Neighbors

 

From the data set  you can see that instances 1, 2 and 3 belong to Irish-setosa, 4, 5 and 6 Irish-versicolor and 7, 8 and 9 Irish-virginica.

If you take 3- nearest neighbors   then you will see that 1, 2, and 3 having least distance in which (1, 2, 3) are Irish-setosa.

So, you can say that Unknown   label is Irish-satosa.

If you take 5- nearest neighbors   then you will see that 1, 2, 3, 4 and 5 are having least distance in which (1, 2, 3) are Irish-setosa 4 and 5 are Irish-vericolor.

So, you can say that Unknown   label is Irish-satosa.

For 7-nearest neighbors you can not decide because 3 are Irish-satosa, 3 are Irish-versicolor and one is Irish-virginica.

So, you should know that size of data set number of neighbors many factors  play very important role for deciding a label.

Python Program of K-Nearest Neighbors-

from sklearn.neighbors import KNeighborsClassifier
i1=[4.6,3.4,1.4,0.3]
i2=[5,3.4,1.5,0.2]
i3=[4.4,2.9,1.4,0.2]
i4=[7,3.2,4.7,1.4]
i5=[6.4,3.2,4.5,1.5]
i6=[6.9,3.1,4.9,1.5]
i7=[6.3,3.3,6,2.5]
i8=[5.8,2.7,5.1,1.9]
i9=[7.1,3,5.9,2.1]
X_train=[i1,i2,i3, i4,i5,i6, i7,i8,i9]
irst, irver,irvirg =”Iris-setosa”, “Iris-versicolor”, “Iris-virginica”
y_train=[irst,irst ,irst ,irver , irver,irver,irvirg,irvirg,irvirg]
modal= KNeighborsClassifier(n_neighbors =3, p=2)
modal.fit(X_train, y_train)
X_pred=[[5,3.3,1.4,0.2]]
Unknown= modal.predict(X_pred)
print(“Unknown=”,Unknown)

Program Description-

For the program instances are taken as i1, i2, i3, i4, i5, i6, i7, i8 and in i9 variables and labels are in irst, irver,irvirg variables. Therefore, two dimensions  list  X_train behaves like a table, and label is also provided to the program using y_train.

In KNeighborsClassifier() method n_neighbors is for numbers of neighbors and p=2 Eclidean distance fit() method trains the model and predict() method predict the label.

Output of the program when k=3

Label predicted as Irish-setosa

 3-Nearest-Neighbors-Python-Program-ScreenShot.png

Output of the program when k=5

Label predicted as Irish-setosa

 5-Nearest-Neighbors-Python-Program-ScreenShot

 

 

 

 

 

Leave a Comment

Your email address will not be published. Required fields are marked *

©Postnetwork-All rights reserved.