CSE 569 – HW4 – Fall 2018Aditya Vikram Sharma1215126588November 26, 20181 K-MeansK-Means algorithm is used to compute cluster the points given in the dataset into multiple clustersbased on the value of ‘K’. In this assignment, 2 datasets have been given which have 1600 and 1500points respectively. The Simulations have been performed for both the datasets. Initially, to ndthe optimal value of ‘K’, Elbow-rule is used where a graph between ‘K’ values and Sum of SquaredErrors/ Total number of points for each cluster for a xed random initialization are plotted and theoptimal ‘K’ value is the point in the graph where the curve bends like an elbow.

Next, xing this’K’, simulation for various values of ‘r’ which is the random initialization is performed and a graphof ‘r’ vs (Sum of Squared errors)/ (Total number of points) are plotted to identify the best randominitialization value. The cluster of the random initialization is shown is depicted.The algorithm performed for K-means in this assignment is shown here : Algorithm 1:K-Means1k GetInput2 for i 0to kdo 3C entroid i RandomInitialization4 end5 newC entroid Centroid6 for j 0to MaxIter do 7forx 0to TotPoints do 8fory 0to kdo 9dist EucDist(Centroidy, Pointsx)10 end11 ind Index(min(dist))12 C lusterind Pointsx)13 end14 newC entroid avg(Clusuterind)15 if(newCentroid != Centroid) then16newC entroid Centroid17 continue18 else 19break20 end21 end For both the dataset, simulation is performed and the corresponding plots are shown in the nextsubheadings.11.1 Dataset 1This dataset contains 1600 points in total. A depiction of the dataset is shown in the gure below. Figure 1: Dataset 1The optimal ‘K’ value for this dataset was calculated by plotting various values of ‘K’ vs ‘Sum ofSquared errors/ Total Points’ graph for a xed random initialization using the elbow rule. The optimalvalue of ‘K’ turned out to be 2 which can be seen in the below gure.

Figure 2: Elbow RuleNow, that we xed K=2, the algorithm is run ‘r’ times with different random initializations where’r’ is set to 5 to nd the random initialization with the lowest Sum of Square error/ Total Points valueand a graph for that is shown in the gure 3.2Figure 3: Random Initialization vs SSE/Total pointsFrom the above plot, the SSE/ Total Points value for all the random Initializations are :1.762324974077466, 1.762324974077466, 1.7622818440138992, 1.762324974077466, 1.

7622793729899464and it is clear that Random Initialization 5 has the lowest SSE/ Total Points value.Using this random initialization, the clustering for the points is done with K = 2 and the plot forthat is shown below. Figure 4: Clustering of Dataset 1 for K = 2With this random initialization and K value, the algorithm converged at Iteration 4.31.2 Dataset 2This dataset contains 1500 points in total. A depiction of the dataset is shown in the gure below. Figure 5: Dataset 2The optimal ‘K’ value for this dataset was calculated by plotting various values of ‘K’ vs ‘Sum ofSquared errors/ Total Points’ graph for a xed random initialization using the elbow rule. The optimalvalue of ‘K’ turned out to be 3 which can be seen in the below gure.

Figure 6: Elbow RuleNow, that we xed K=3, the algorithm is run ‘r’ times with different random initializations where’r’ is set to 5 to nd the random initialization with the lowest Sum of Square error/ Total Points valueand a graph for that is shown in the gure 3.4Figure 7: Random Initialization vs SSE/Total pointsFrom the above plot, the SSE/ Total Points value for all the random Initializations are :1.0261745127196509, 1.026653218697321, 1.1787796700494195, 1.178782241593976, 1.0261849230725586and it is clear that Random Initialization 1 has the lowest SSE/ Total Points value.

Using this random initialization, the clustering for the points is done with K = 3 and the plot forthat is shown below. Figure 8: Clustering of Dataset 2 for K = 3With this random initialization and K Value, the algorithm converged at Iteration 8.The python code for the K-means algorithm implemented is shown in the next page.5# -*- coding: utf-8 -*-“””Created on Wed Nov 21 11:09:15 [email protected]: 1796a”””importmatplotlib.

pyplotaspltimportnumpyasnpimportmatplotlib.pyplotasplttfromcollectionsimport defaultdictfromcopyimport deepcopyfile=open(C: Users 1796a Documents Sem 1 FSL AssignmentCodes Assignment 4 Dataset_2.txt), !X=np.loadtxt(file)def euc_dist(X, Y): #Function to Calculate Eucledian Distancereturn (np.linalg.norm(X-Y))def sse(X,Y): #Function to Calculate Sum of Squared Errosreturn (X-Y)**2def kmeansClust(k=3, i=1):”””C is a dictionary which stores the centeroids.M is a list which randomly permutates the input, to takeinitial centeroids, !d is a list of eucledian distances between centroid and each ptin dataset, !Cluster is a dictionary which appends each pt to thecorressponding index, !PrevCent stores the previous value of centroid, to check if ithas changed, !after each iteration”””C=defaultdict(list)size=len(X)np.random.

seed(i)M=np.take(X,np.random.permutation(X.shape0),axis=0)plt.scatter(X:,0, X:,1,s=25, cmap=viridis)plt.

savefig(“C: Users 1796a Documents Sem1 FSL Assignment Codes Assignment4 initialPoints2.png”),!, !for iin range(0,k):Ci=Mimax_iter=0PrevCent=deepcopy(C)list(C)for _in range(50): # Run loop max of 50 timesmax_iter=max_iter+1Cluster=defaultdict(list)for iin range(0,len(X)):d=list()for jin range(0,k):d.append(euc_dist(Cj,Xi))ind=d.index(min(d))Clusterind.append(Xi)for clin range(0,k):6Ccl0=0Ccl1=0for lin range(0,len(Clustercl)):Ccl0=Ccl0+ (Clustercll0)/len(Clustercl), !Ccl1=Ccl1+ (Clustercll1)/len(Clustercl), !for p,cinzip(PrevCent, C):if (list(PrevCentp)!=list(Cc) ):PrevCent=deepcopy(C)flag=0breakelse :print (“Converged at Iteration “, max_iter)flag=1breakif flag==1:breakColors=b,y,g,c,m,k,w,rd=0for valinrange(0,k):for pin range(0,len(Clusterval)):pltt.scatter(Clustervalp0,Clustervalp1,s=25, c=Colorsval), !#Calculate SSE hered=d+sse(Clustervalp0, Cval0)+ sse(Clustervalp1,Cval1), !pltt.scatter(Cval0, Cval1, marker= *, c=black,s=200, alpha=0.

5), !pltt.savefig(“C: Users 1796a Documents Sem1 FSL Assignment Codes Assignment4 Cluster_3_r1_Graph.png”),!, !pltt.show()return (d/size)def main():Val={}k=int(input(Enter number of Clusters: ))r=int(input(Enter max r value: ))for iin range(1,r+1):print (“R = “, i)Vali=kmeansClust(k=k,i=i)print (Val)lists=sorted(Val.items())x, y=zip( *lists)plt.plot(x, y)plt.xlabel(“Random Initialization”)plt.

ylabel(“Sum of Squared Errors”)plt.gcf().subplots_adjust(left=0.15)plt.savefig(“C: Users 1796a Documents Sem1 FSL Assignment Codes Assignment 4 SSE_K_Graph2.png”), !plt.

show()if __name__==”__main__”:main()72 Gaussian Mixture ModelGaussian Mixture model is generally the representation of sum of gaussian densities. In this task,there are two datasets on which the EM Algorithm of the GMM is performed which contains 1600and 1500 points respectively. The EM algorithm has two stages – the E-step and the M-step where atthe E-step, which is the expectation step, the initial parameters are evaluated and calculated and at themaximization step the log likelihood function is maximized and the parameters are estimated again.This process is repeated till the algorithm converges which is when the estimated parameters equalthe previous estimate of the iteration. The algorithm implemented in shown below. Algorithm 2:Gaussian Mixture Model – EM Algorithm1 Mean2 P Covariance3 if (initialization =kmeans) then4 Centroid(Kmeans)5 iniLog LogLikelihood6 while (Algo != Converge) do7fori 0to len(df) do 8Ci PMF (Data)9 Ci Ci S um(Ci)10 V al:append (Ci)11 end12 forj 0to len(Val) do 13Cj M aximize (C )14 N ewV al:append (C 1h; C 2h )15 j N ewV al V al+N ewV al16 n (Pj)17 end18 dif f eucledian (; n )19 n20 end The task involves two stages – one to initialize the mean vectors and covariance randomly and thesecond is to use the K-means centroid values as the mean vectors and covariance.

This process isdone for both the datasets and the plots are shown in the respective subheading.2.1 Dataset 1This dataset contains 1600 points in total. A depiction of the dataset is shown in the gure below. Figure 9: Dataset 182.1.

1 Random InitializationHere, the mean vectors and covariance is randomly initialized for the K value set 2 and is used to thethe E-M algorithm. The plot for the clustering and the log likelihood can be seen below. Figure 10: GMM Clustering with Random InitializationFigure 11: Log Likelihood Graph for Random InitializationAs, we can see from the gures that the algorithm converges very well and the log likelihood graphis also converging at a very good rate with respect to the Iterations which can be seen in gure 11.

The Plot and Log likelihood graph for K-means initialization is shown in the next page.92.1.2 K-means InitializationIn this phase, the mean vectors and covariance is initialized using K-Means clustering with K=2which is done in the previous task.

The Centroid of the K-clusters are used during initilization. Figure 12: GMM Clustering with K-Means InitializationFigure 13: Log Likelihood Graph for K-Means InitializationAs it is visible, both the initialization work well and can be used in their own way. Both workwell for the EM Algorithm, however, it is theoratically better to use the K-means initialization as italready has the centroids and the EM algorithm can converge quickly.102.2 Dataset 2This dataset contains 1500 points in total. A depiction of the dataset is shown in the gure below.

Figure 14: Dataset 22.2.1 Random InitializationHere, the mean vectors and covariance is randomly initialized for the K value set 2 and is used to thethe E-M algorithm. The plot for the clustering and the log likelihood can be seen below. Figure 15: GMM Clustering with Random InitializationAs, we can see from the gures that the algorithm converges very well and the log likelihood graph isalso converging at a very good rate with respect to the Iterations which can be seen in gure 16. ThePlot and Log likelihood graph for K-means initialization is shown in the next page.11Figure 16: Log Likelihood Graph for Random Initialization2.2.

2 K-means Initialization In this phase, the mean vectors and covariance is initialized using K-Means clustering with K=3which is done in the previous task. The Centroid of the K-clusters are used during initilization. Figure 17: GMM Clustering with K-Means Initialization12Figure 18: Log Likelihood Graph for K-Means InitializationAs it is visible, both the initialization work well and can be used in their own way. Both workwell for the EM Algorithm, however, it is theoratically better to use the K-means initialization as italready has the centroids and the EM algorithm can converge quickly. The code is implemented inPython 3.6 which is shown in the next page.13# -*- coding: utf-8 -*-“””Created on Mon Nov 26 23:40:23 [email protected]: 1796a”””importnumpyasnpimportpandasaspdimportrandomasrandimportmatplotlib.pyplotaspltfromscipy.

statsimport normfromsysimport maxintnp.random.seed(42)file=open(C: Users 1796a Documents Sem 1 FSL AssignmentCodes Assignment 4 Dataset_1.txt), !X=np.

loadtxt(file)labels=(1 *800)+(2*800)#1600 samplesdata={x: xs,y: ys,label: labels}df=pd.dF(data=X)fig=plt.figure()plt.

scatter(datax, datay,24, c=datalabel)def prob(val, myu, s, lam):p=lamfor iin range(len(val)):p *=norm.pdf(vali, myui, sii)return pdef expectation(dF, param):for iin range(dF.shape0):x=dFxiy=dFyip_cluster1=prob(x, y,list(parammyu1), list(params1), paraml0 ), !p_cluster2=prob(x, y,list(parammyu2), list(params2), paraml1 ), !if p_cluster1>p_cluster2:dFlabeli=1else :dFlabeli=2return dFdef maximization(dF, param):pts_cluster1=dFdFlabel==1pts_cluster2=dFdFlabel==214per_cluster1=len(pts_cluster1)/float(len(dF))per_cluster2=1-per_cluster1paraml=per_cluster1, per_cluster2 parammyu1=pts_cluster1x.mean(), pts_cluster1y.mean(), !parammyu2=pts_cluster2x.

mean(), pts_cluster2y.mean(), !params1= pts_cluster1x.std(),0, 0,pts_cluster1y.std() , !params2= pts_cluster2x.std(),0, 0, pts_cluster2y.

std() , !return paramdef dist(old_params, new_params):dist=0for param inmyu1,myu2:for iin range(len(old_params)):dist+=(old_paramsparami-new_paramsparami) **2return dist**0.5value=maxintepsilon=0.01iters=0df_copy=df.copy()df_copylabel=map(l x: x+1, np.random.choice(2,len(df)))params=pd.

dF(guess)while (value>epsilon):iters+=1newVal=expectation(df_copy.copy(), params)newParams=maximization(newVal, params.copy())value=dist(params, newParams)print (“iteration {}, value {}”.format(iters, value))df_copy=newValparams=newParamsfig=plt.figure()plt.scatter(df_copya, df_copyb,24, c=df_copylabel)153 Binomial Mixture Model Extra CreditBinomial Mixture Model to predict the outcome or values of two unknowns can be approached usingExpectation Maximization algorithm. Here, in this dataset, we have 10000 coin tosses by 2 coins’C1′ and ‘C2’.

There are 1000 sets in the dataset having 100 tosses in each set and each set cor-ressponds to a single coin, but we do not have the label for the coin, hence we can use ExpectationMaximization to estimate the missing parameters/ probabilities of occurance of both the coins C1 andC2.The notations going to be used – 1 = Probability Coin C1 landing a head 1 = Probability of Coin C2 landing a headFor simulation, we set 1 = 0.6 and1 = 0.5 initially just like how its mentioned in 1.

Insteadof calculating which coin would have generated the tosses, it is better to calculate the probability ofeach possible completion of the missing values(which is the label in this case).Let us consider the rst set(rst row) in the dataset. There are 61 Heads (1’s) and 39 Tails (0’s).P(61 HjC 1) = 100C61 611 (11)39= 0 :0798 (1)P (61 HjC 2) = 100C61 612 (12)39= 0 :00711 (2)Taking the weighted average here, we getW1= (1) (1) + (2)= 0:9181 (3)W 2= (2) (1) + (2)= 0:0819 (4)Since we know that, each set of data (row) is performed by a single coin, we have (X1; X2; X3; ::::::; X1000)as the value for each set of data where each X is a vector of 100 coin tosses. We can denote the hiddenvalue of label of coins as ( (Z1; Z2; Z3; ::::::; Z1000).

We know that, L( j X ) = P(X j) = Xz P(X; Z j) (5)If we denote ‘h’ by the number of heads in the i’th set,P(ZijXi; ) = P(Xi; Zij) P(Xij) (6)where, P(Xij) = 100Ch h1 (11)(1h)+ 100Ch h2 (12)(1h)(7)and, Zi2C1; C 2 (8)Therefore, since we know that E(H) = n*P (n=100 here) which isE(N o of H in C1j Xi; 1) = 100P (Zi=C1j Xi; 1)(9)Therefore, at the Expectation step (E-step) of the E-M algorithm we calculate the Binomial ProbabilityMass Function for both the coins C1 and C2 and we append the row probability of both the coins everyset of 100 values (every row).16After the expectation step, we need to perform Maximization. So, at the Maximization step, wemaximize the log likelihood function.The value at each iteration are calculated by normalizing the number of heads in total of all 1000iterations of coin C1 with respect to the sum of number of heads and tails for coin C1.

Similarly the value of the coin C2 is calculated. This process continues and the values keep getting updatedat each iteration and the process will halt when the difference between the current value and theprevious value is less than a threshold value. The threshold value set in this algorithm is 1e-6.The algorithm implemented for this problem is given below. Algorithm 3:Binary Mixture Model1df Load Dataset2 1 0.

63 2 0.54 ( 1 ; 2)5 dif f eucledian (; n ) n6 lim 1e-67 maxI ter 100008 while (iterlim) do 9fori 0to len(df) do 10C1h BinomialPMF (C1)11 C2h BinomialPMF (C2)12 C1h C1h C1h +C 2h13 C2h C2h C1h +C 2h14 V al:append (C 1h; C2h)15 end16 forj 0to len(Val) do 17C1h C1 N o H18 C1t C1 N o T19 C2h C2 N o H20 C2t C2 N o T21 N ewV al:append (C 1h; C 2h )22 end23 n1 S um(C 1h ) S um(C 1h )+ S um (C 1t)24 n2 S um(C 2h ) S um(C 2h )+ S um (C 2t)25 n (n 1; n 2)26 dif f eucledian (; n )27 n28 end The Simulation for the algorithm was performed using Python 3.6. After performing the simula-tions, the results obtained were 1 = 0.5488 2 = 0.4527and it has been observed that, even upon changing the initial parameters, that is, initial values, the -nal values converge at the above specied values. The screenshots of the simulation have been shownin the gure next page.

17The updation of parameters over various iterations is shown in the Figure 19.Figure 19: Parameter Updation over various iterationsThe Final values for both the coins, that is probability of heads landing on coins C1 and C2 isshown in the gure 20. Figure 20: Parameter Updation over various iterationsA simulation was performed over the dataset using the EM Algorithm in Python 3.6 with initialvalues set as 0.6 and 0.5 for the two coins respectively and nal values obtained as 0.5488 and 0.4527respectively.References1https://www.nature.com/articles/nbt1406———————————————————————————————————————- 18