K – NEAREST NEIGHBOUR
Ram Teja Reddy Gennepally – C0728017
Tejal More – C0728030
Alan Salo – C0727079
Arpita Roy – C0728691
Dhruv Chadha – C0727901
TABLE OF CONTENTS
1. INTRODUCTION
2. CODE
3. ANALYSIS
4. ADVANTAGES AND DISADVANTAGES
5. APPLICATIONS
6. REFERENCES
LIST OF SCREENSHOTS
Installing R package
Installing RStudio
Installing MongoDB
Twitter keys and tokens
Installing libraries
Installing mongolite package
Installing rtweet package
Setting up of working directory and libraries
Authentication to access tweets
Amongst the numerous algorithms used in machine learning, k-Nearest Neighbors (k-NN) is often used in pattern recognition due to its easy implementation and non-parametric nature.
A k-NN classifier predicts the class of an observation from the prevailing class among its k nearest neighbours; “nearest” is determined by a distance metric computed over the observations’ attributes (features), and “k” is the number of nearest neighbours to consider.
In regression, the k-nearest neighbour algorithm can be applied to continuous dependent variables: the predicted value is the average of the values of the k nearest neighbours.
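As a minimal sketch of this averaging step, assuming a single predictor and illustrative data (the function name and values are not from the report):

```r
# k-NN regression sketch: predict by averaging the k nearest
# training responses. Toy data, for illustration only.
knn_regress <- function(train_x, train_y, query, k = 3) {
  dists <- abs(train_x - query)    # 1-D distance to every training point
  nearest <- order(dists)[1:k]     # indices of the k closest points
  mean(train_y[nearest])           # average their responses
}

x <- c(1, 2, 3, 10, 11, 12)
y <- c(2, 4, 6, 20, 22, 24)
knn_regress(x, y, query = 2, k = 3)  # averages y at x = 1, 2, 3, giving 4
```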
It is important that variables are normalized before calculating distances, because the independent variables in the training data are measured in various units. To make them comparable, we standardize them, most commonly by min-max normalization or z-score standardization.
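Since this excerpt does not spell its methods out, the sketch below assumes the two most common ones, min-max normalization and z-score standardization, applied to illustrative values:

```r
# Two common standardization methods (assumed here; the report's own
# list is not shown in this excerpt).
min_max <- function(x) (x - min(x)) / (max(x) - min(x))  # rescales to [0, 1]
z_score <- function(x) (x - mean(x)) / sd(x)             # mean 0, sd 1

income <- c(30000, 45000, 60000, 90000)  # illustrative values
min_max(income)  # 0.00 0.25 0.50 1.00
z_score(income)
```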
A distinctive property of the k-NN algorithm is that it is sensitive to the local structure of the data. Note that k-NN should not be confused with k-means, another popular machine learning algorithm, which is used for clustering rather than classification.
k-NEAREST NEIGHBOR ALGORITHM STEPS
1. Choose ‘k’, the number of nearest neighbours.
2. Calculate the distance between the query instance and all training samples.
3. Sort the distances and determine the k-th minimum distance.
4. Using the k-th minimum distance, identify the k nearest neighbours.
5. Gather the categories (class labels) of the nearest neighbours.
6. The predicted value for the query instance is the simple majority of the categories of its nearest neighbours.
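The steps above can be sketched in R with a toy classification example (the data and function name are illustrative, not from the report):

```r
# Toy training data: two clusters, labelled "A" and "B" (illustrative).
train_x <- matrix(c(1, 1,  1, 2,  5, 5,  6, 5), ncol = 2, byrow = TRUE)
train_y <- factor(c("A", "A", "B", "B"))

knn_classify <- function(train_x, train_y, query, k = 3) {
  # Step 2: Euclidean distance from the query to every training point
  dists <- apply(train_x, 1, function(p) sqrt(sum((p - query)^2)))
  # Steps 3-4: sort the distances and keep the k nearest neighbours
  nearest <- order(dists)[1:k]
  # Steps 5-6: gather the neighbours' categories and take a majority vote
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

knn_classify(train_x, train_y, query = c(1.5, 1.5), k = 3)  # "A"
```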
library(caret)  # for createDataPartition()

data1 = read.csv(file = "US_Presidential_Data.csv")

# Convert the dependent variable to a factor
data1$Win.Loss = as.factor(data1$Win.Loss)

# Split the data 70/30 into training and validation sets
set.seed(101)
index = createDataPartition(data1$Win.Loss, p = 0.7, list = FALSE)
train = data1[index, ]
validation = data1[-index, ]

# Check the levels of the dependent variable in the training data
levels(train$Win.Loss)