A Report on K-NEAREST NEIGHBOUR

Submitted By
Ram Teja Reddy Gennepally – C0728017
Tejal More – C0728030
Alan Salo – C0727079
Arpita Roy – C0728691
Dhruv Chadha – C0727901

CONTENTS
1. INTRODUCTION
2. CODE
3. ANALYSIS
4. ADVANTAGES AND DISADVANTAGES
5. APPLICATIONS
6. REFERENCES

SCREENSHOTS
- Installing R package
- Installing R Studio
- Installing MongoDB
- Twitter keys and tokens
- Installing libraries
- Installing mongolite package
- Installing rtweet package
- Setting up of working directory and libraries
- Authentication to access tweets

INTRODUCTION

Among the numerous algorithms used in machine learning, k-Nearest Neighbours (k-NN) is often used in pattern recognition because of its simple implementation and non-parametric nature. A k-NN classifier predicts the class of an observation from the prevailing class among its k nearest neighbours: "nearest" is determined by a distance metric computed over the attributes (features), and "k" is the number of nearest neighbours to consider.
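The classification rule described above (find the k closest training points under a distance metric, then take a majority vote over their classes) can be sketched in a few lines. The following is a minimal illustration in Python; the report's own implementation later uses R, and the function names and toy data here are my own, not from the report.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Distance metric over feature vectors (Euclidean distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by distance to the query observation.
    dists = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    # Majority vote among the labels of the k nearest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (1.1, 0.9), k=3))  # the nearest neighbours are mostly class "A"
```

Note that the prediction depends directly on the choice of k and of the distance metric, which is why the sections below stress normalizing features before computing distances.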
In regression, the k-nearest neighbour algorithm can be applied to continuous dependent variables: the predicted value is the average of the values of the k nearest neighbours.

It is important to normalize the variables before calculating distances, because the independent variables in the training data may be measured in different units. To make them comparable, we standardize them, for example by min-max normalization.

A distinctive property of the k-NN algorithm is that it is sensitive to the local structure of the data. Note that k-NN is not the same as k-means, another popular machine learning algorithm.

k-NEAREST NEIGHBOUR ALGORITHM STEPS
1. Choose 'k', the number of nearest neighbours.
2. Calculate the distance between the query instance and each training example.
3. Sort the distances and find the k-th minimum distance.
4. Take the training examples within the k-th minimum distance as the nearest neighbours.
5. Gather the categories of these nearest neighbours.
6. Predict the category of the query instance by a simple majority vote among the categories of its nearest neighbours.

CODE
setwd("C:/Users/ramte/Desktop")
data1 = read.csv(file = "US_Presidential_Data.csv")
View(data1)
dim(data1)

# libraries
library(caret)
library(e1071)

# convert the dependent variable to a factor
data1$Win.Loss = as.factor(data1$Win.Loss)
set.seed(101)
index = createDataPartition(data1$Win.Loss, p = 0.7, list = F)
train = data1[index, ]
validation = data1[-index, ]
dim(train)
dim(validation)
names(train)
head(train)
head(validation)

# Training and validation data level setting
levels(train$Win.Loss)
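The text above makes two claims worth seeing concretely: variables should be normalized (e.g. min-max) before distances are computed, and in regression the k-NN prediction is simply the average of the k nearest neighbours' values. The following is a small illustrative sketch in Python; it is not the report's R pipeline, and the function names and sample data are my own.

```python
def min_max_normalize(values):
    # Rescale a list of numbers to the [0, 1] range so that features
    # measured in different units become comparable in a distance metric.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def knn_regress(train_x, train_y, query, k=2):
    # k-NN regression: the prediction is the average of the target
    # values of the k training points closest to the query.
    dists = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / len(nearest)

x = min_max_normalize([10, 20, 30, 40])  # rescaled to [0.0, 1/3, 2/3, 1.0]
y = [1.0, 2.0, 3.0, 4.0]
print(knn_regress(x, y, 0.0, k=2))  # averages the two nearest targets: (1.0 + 2.0) / 2 = 1.5
```

Min-max normalization is only one of the standardization choices mentioned in the text; z-score standardization would serve the same purpose of putting all features on a comparable scale.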