A

Report

on

K – NEAREST NEIGHBOUR

Submitted By

Ram Teja Reddy Gennepally – C0728017

Tejal More – C0728030

Alan Salo – C0727079

Arpita Roy – C0728691

Dhruv Chadha – C0727901

CONTENTS

S.NO. NAME

1 INTRODUCTION

2 CODE

3 ANALYSIS

4 ADVANTAGES AND DISADVANTAGES

5 APPLICATIONS

6 REFERENCES

SCREENSHOTS

SCREENSHOT NAME

Installing R package

Installing R Studio

Installing MongoDB

Twitter keys and tokens

Installing libraries

Installing mongolite package

Installing rtweet package

Setting up of working directory and libraries

Authentication to access tweets

INTRODUCTION

Amongst the numerous algorithms used in machine learning, k-Nearest Neighbors (k-NN) is often used in pattern recognition due to its easy implementation and non-parametric nature.

A k-NN classifier predicts the class of an observation from the prevailing class among its k nearest neighbours; "nearest" is determined by a distance metric computed over the attributes (features) of the observations, and "k" is the number of nearest neighbours to consider.

In regression, the k-nearest neighbour algorithm can be applied to continuous dependent variables: the predicted value for a query point is the average of the values of its k nearest neighbours.
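As a sketch of the regression case, the following illustrative Python snippet (separate from the R code used later in this report; all names are hypothetical) predicts a continuous value as the mean of the k nearest targets:

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a continuous value as the average of the k nearest neighbours."""
    # Euclidean distance from the query point to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort by distance and keep the k closest targets
    dists.sort(key=lambda d: d[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / k

X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(X, y, (2.5,), k=2))  # averages the targets of the two closest points: 2.5
```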

Because the independent variables in the training data are often measured in different units, it is important to normalize them before calculating distances. To make the variables comparable, we can rescale them with either of the following methods:

Min-max normalization: X' = (X - min(X)) / (max(X) - min(X))

Z-score standardization: X' = (X - mean(X)) / sd(X)
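A minimal sketch of min-max normalization, the rescaling method listed above (illustrative Python, separate from the report's R code):

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

After this transformation, every variable lies in [0, 1], so no single variable dominates the distance calculation merely because of its units.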

A distinctive property of the k-NN algorithm is its sensitivity to the local structure of the data. Note that k-NN should not be confused with k-means, a popular clustering algorithm.

k-NEAREST NEIGHBOR ALGORITHM STEPS

1. Choose 'k', the number of nearest neighbours.

2. Calculate the distance between the query instance and every training sample.

3. Sort the distances and determine the kth minimum distance.

4. Using the kth minimum distance, identify the k nearest neighbours.

5. Gather the categories (class labels) of these nearest neighbours.

6. Predict the class of the query instance by taking a simple majority vote over the categories of its nearest neighbours.
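The steps above can be sketched in a few lines of illustrative Python (separate from the R code in the next section; function and variable names are hypothetical):

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    # Step 2: distance from the query instance to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Steps 3-4: sort the distances and keep the k nearest neighbours
    dists.sort(key=lambda d: d[0])
    neighbours = [label for _, label in dists[:k]]
    # Steps 5-6: simple majority vote over the neighbours' categories
    return Counter(neighbours).most_common(1)[0][0]

X = [(1, 1), (1, 2), (5, 5), (6, 5)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (1.5, 1.5), k=3))  # two of the three nearest are "A"
```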

CODE

# Set the working directory and load the data
setwd("C:/Users/ramte/Desktop")
data1 = read.csv(file = "US_Presidential_Data.csv")
View(data1)
dim(data1)

# Libraries
library(caret)
library(e1071)

# Convert the dependent variable to a factor
data1$Win.Loss = as.factor(data1$Win.Loss)

# Split the data into training (70%) and validation (30%) sets
set.seed(101)
index = createDataPartition(data1$Win.Loss, p = 0.7, list = F)
train = data1[index, ]
validation = data1[-index, ]

dim(train)
dim(validation)
names(train)
head(train)
head(validation)

# Training and validation data level setting
levels(train$Win.Loss)