Evaluation of Differential Evolution Algorithm with Various Mutation Strategies for Clustering Problems

ABSTRACT


INTRODUCTION
Evolutionary algorithms (EAs), inspired by Darwin's theory of natural selection, have become a powerful means of solving optimization problems in various domains [1]. Among them, the Differential Evolution (DE) algorithm is a simple, competent, and robust population-based stochastic search strategy that has been successfully used to find the global optimum of high-dimensional continuous problems [2]. Like a standard EA, DE applies evolutionary operations such as mutation, crossover, and selection to move from one generation to the next. It differs significantly from other EAs in that the difference between pairs of randomly selected individuals is used to modify a solution, and the position of the selected solution guides the direction of the search. To employ the mutation operator in DE, several different mutation strategies exist that determine the solution to be modified and the number of difference vectors used for the modification [3,4]. The effectiveness of DE depends heavily on the chosen mutation strategy, because different mutation strategies guide the search toward different balances of exploration and exploitation.
In recent years, DE has been widely utilized to solve clustering problems because of its ability to enhance solution quality. It has been used to perform clustering independently [5][6][7] or incorporated into existing clustering approaches [8][9][10]. Paterlini and Krink described an innovative approach for DE-based clustering [5,6]. They compared the performance of the genetic algorithm (GA), particle swarm optimization (PSO), and DE, and concluded that DE is more suitable for cluster analysis. Some papers proposed combining DE with local search approaches to achieve considerably better efficiency [8][9][10]. Nevertheless, a comparison of the clustering performance of different mutation strategies for cluster analysis is still needed. In this paper, an empirical analysis is presented to compare and examine the performance of DE with different variants of the mutation strategy on clustering problems. The insights gained from the experiments are expected to be useful in selecting an appropriate mutation strategy for future DE research in the clustering domain.
In the next section, a brief explanation of the traditional DE algorithm and the different mutation strategies used in DE is presented. The DE-based clustering method is explained in Section 3. Section 4 presents the experimental results, and Section 5 concludes the paper.

BACKGROUND
In this section, the basic structure of the DE algorithm is first described, and then the different variants of the mutation strategy used in DE are briefly explained.

Differential Evolution Algorithm
DE is a heuristic population-based search approach proposed by Storn and Price in 1995. It has become one of the most successful and widely used EAs for solving real-world continuous global optimization problems in various domains [2,3]. Like a standard EA, DE maintains a population of individuals that represent candidate solutions to an optimization problem. Hence, an initial population is created through random sampling from a uniform distribution at the beginning of the algorithm. DE then iteratively performs three consecutive steps (namely mutation, crossover, and selection) until a stopping condition is reached.
Let X_i,g = {x_i,1,g, x_i,2,g, …, x_i,d,g} be the i-th solution (individual) of the population P_g = {X_1,g, X_2,g, …, X_NP,g} at the g-th iteration, where d is the data dimensionality and NP is the size of the population.

Mutation
A trial vector V_i,g is generated for each parent solution X_i,g by perturbing a target solution X_i1,g with a scaled difference as follows:

V_i,g = X_i1,g + f (X_i2,g − X_i3,g) (1)

where i is an integer within [1, NP]; i1, i2, and i3 are random integers within [1, NP] such that i ≠ i1 ≠ i2 ≠ i3; and f is a scaling factor within (0, 1).
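As a sketch, the DE/rand/1 mutation of eq. (1) can be written as follows. This is an illustrative NumPy implementation, not code from the paper; the function name and array layout are assumptions.

```python
import numpy as np

def de_rand_1(population, i, f, rng):
    """DE/rand/1 mutation (eq. 1): V_i = X_i1 + f * (X_i2 - X_i3).

    population -- (NP, d) array of solution vectors
    i          -- index of the parent, excluded from the random picks
    f          -- scaling factor
    """
    NP = population.shape[0]
    # Pick three distinct indices, all different from the parent index i
    candidates = [j for j in range(NP) if j != i]
    i1, i2, i3 = rng.choice(candidates, size=3, replace=False)
    return population[i1] + f * (population[i2] - population[i3])

rng = np.random.default_rng(0)
pop = rng.random((10, 4))            # NP = 10 solutions in d = 4 dimensions
trial = de_rand_1(pop, 0, 0.5, rng)  # trial vector for parent 0
```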

Crossover
At the crossover phase, an offspring vector U_i,g is usually generated by applying the binomial crossover operator as follows:

u_j,i,g = v_j,i,g if rand(j) ≤ CR or j = j_rand, otherwise x_j,i,g (2)

where i is an integer within [1, NP]; j is an integer within [1, d]; rand(j) ~ U(0,1); j_rand is a randomly chosen index in [1, d] that guarantees at least one component is inherited from the trial vector; and the crossover rate CR ∈ (0, 1).
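The binomial crossover of eq. (2) can be sketched as below (illustrative NumPy code; the `j_rand` trick ensures at least one component comes from the trial vector):

```python
import numpy as np

def binomial_crossover(parent, trial, cr, rng):
    """Binomial crossover (eq. 2): inherit each component from the trial
    vector with probability CR, forcing at least one via j_rand."""
    d = parent.shape[0]
    mask = rng.random(d) <= cr          # rand(j) <= CR, component-wise
    mask[rng.integers(d)] = True        # j_rand: one guaranteed trial component
    return np.where(mask, trial, parent)

rng = np.random.default_rng(1)
parent = np.zeros(5)
trial = np.ones(5)
offspring = binomial_crossover(parent, trial, 0.9, rng)
```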

Selection
In the selection phase, the parent solution in the current population and its offspring vector are compared to determine which will survive into the next generation (iteration). The fitter solution is selected and added to the new population. For a maximization problem, the solution vector for the next iteration is chosen as follows:

X_i,g+1 = U_i,g if fit(U_i,g) ≥ fit(X_i,g), otherwise X_i,g (3)

where fit(U_i,g) indicates the fitness value of the offspring and fit(X_i,g) denotes the fitness value of the i-th parent in the current population.
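The greedy one-to-one selection of eq. (3) amounts to a single comparison per individual; a minimal sketch (the toy fitness function is an assumption for illustration):

```python
import numpy as np

def select(parent, offspring, fitness, maximize=True):
    """Greedy DE selection (eq. 3): keep the fitter of parent and offspring."""
    fp, fo = fitness(parent), fitness(offspring)
    better = fo >= fp if maximize else fo <= fp
    return offspring if better else parent

# Toy fitness: negative squared distance to the origin (maximization)
fitness = lambda x: -np.sum(x ** 2)
kept = select(np.array([1.0, 1.0]), np.array([0.5, 0.5]), fitness)
```

Here the offspring is closer to the origin, so it replaces the parent in the next generation.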

Different Mutation Strategies in Differential Evolution
Recent research has proposed numerous variants of the basic DE. In the literature, the notation DE/x/y/z is commonly used to categorize these variants [1]. In this notation, x indicates the way the target solution is chosen, y specifies the number of pairs of difference vectors applied, and the last symbol, z, identifies the adopted crossover operator. This paper focuses on mutation strategies; thus, the notation DE/x/y is applied and the character z is omitted. The random mutation strategy DE/rand/1 is typically used in a standard DE algorithm. The most frequently used mutation strategies [2,3] are as follows.

Random Mutation Strategy
DE/rand/1 and DE/rand/2 are the random mutation strategies that use one difference vector and two difference vectors, respectively. As mentioned above, DE/rand/1 creates the trial vector from three randomly chosen solution vectors, while DE/rand/2 uses five randomly selected solution vectors to generate the trial vector according to the following equation:

V_i,g = X_i1,g + f1 (X_i2,g − X_i3,g) + f2 (X_i4,g − X_i5,g) (4)

where f1 and f2 are two control parameters that scale the difference vectors, and i1, i2, i3, i4, and i5 are disjoint randomly generated integers within [1, NP].

Best Mutation Strategy
The best mutation strategy applies the fittest solution vector in the population as the target vector. DE/best/1 and DE/best/2 represent the two types of best mutation strategy, which use one difference vector and two difference vectors, respectively. These strategies generate the trial vectors as follows:

V_i,g = X_best,g + f (X_i1,g − X_i2,g) (5)

V_i,g = X_best,g + f1 (X_i1,g − X_i2,g) + f2 (X_i3,g − X_i4,g) (6)

where f, f1, and f2 are the scaling factors within (0, 1), and i1, i2, i3, and i4 are disjoint randomly generated integers within [1, NP].

Current to Random Mutation Strategy
The notation DE/current-to-rand/1 indicates the current-to-random mutation strategy. This strategy uses the parent solution as the target vector and employs two difference vectors to produce a trial vector. The first difference vector is computed between one random solution and the parent solution, whereas the second is computed from two randomly selected vectors. DE/current-to-rand/1 produces the trial vector according to the following equation:

V_i,g = X_i,g + f1 (X_i1,g − X_i,g) + f2 (X_i2,g − X_i3,g) (7)

where f1 and f2 ∈ (0, 1) are the scaling factors, and i1, i2, and i3 are distinct randomly generated indexes within [1, NP].

Current to Best Mutation Strategy
This strategy is also known as the target-to-best mutation strategy and is represented by the notation DE/current-to-best/1. It uses the parent solution as the target vector and applies two difference vectors to mutate it. The first difference vector is calculated from the best and parent solutions, whereas the second is computed from two randomly selected vectors. The trial vector is produced as follows:

V_i,g = X_i,g + f1 (X_best,g − X_i,g) + f2 (X_i1,g − X_i2,g) (8)

where f1 and f2 ∈ (0, 1) are the scaling factors that control the difference vectors, and i1 and i2 ∈ [1, NP] are distinct randomly generated indexes.
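The five variant strategies of eqs. (4)-(8) can be collected into one illustrative dispatcher (a sketch; the function names, the string labels, and passing the best solution by index are assumptions, not from the paper):

```python
import numpy as np

def distinct(rng, NP, n, exclude):
    """n distinct indices from [0, NP), all different from `exclude`."""
    pool = [j for j in range(NP) if j != exclude]
    return rng.choice(pool, size=n, replace=False)

def mutate(pop, i, best, strategy, f1, f2, rng):
    """Trial-vector generation for the strategies of eqs. (4)-(8).
    pop: (NP, d) array; i: parent index; best: index of the fittest solution."""
    NP = pop.shape[0]
    if strategy == "rand/2":                       # eq. (4)
        i1, i2, i3, i4, i5 = distinct(rng, NP, 5, i)
        return pop[i1] + f1 * (pop[i2] - pop[i3]) + f2 * (pop[i4] - pop[i5])
    if strategy == "best/1":                       # eq. (5)
        i1, i2 = distinct(rng, NP, 2, i)
        return pop[best] + f1 * (pop[i1] - pop[i2])
    if strategy == "best/2":                       # eq. (6)
        i1, i2, i3, i4 = distinct(rng, NP, 4, i)
        return pop[best] + f1 * (pop[i1] - pop[i2]) + f2 * (pop[i3] - pop[i4])
    if strategy == "current-to-rand/1":            # eq. (7)
        i1, i2, i3 = distinct(rng, NP, 3, i)
        return pop[i] + f1 * (pop[i1] - pop[i]) + f2 * (pop[i2] - pop[i3])
    if strategy == "current-to-best/1":            # eq. (8)
        i1, i2 = distinct(rng, NP, 2, i)
        return pop[i] + f1 * (pop[best] - pop[i]) + f2 * (pop[i1] - pop[i2])
    raise ValueError(f"unknown strategy: {strategy}")

rng = np.random.default_rng(2)
pop = rng.random((10, 3))
trials = {s: mutate(pop, 0, 4, s, 0.3, 0.3, rng)
          for s in ("rand/2", "best/1", "best/2",
                    "current-to-rand/1", "current-to-best/1")}
```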

DIFFERENTIAL EVOLUTION BASED CLUSTERING ALGORITHM
DE maintains a number of possible solutions to the problem as a population. Each possible solution is encoded as a chromosome (individual). To apply DE to clustering problems, a cluster solution for the given dataset is encoded as an individual. Then, cluster validity measures are used as objective functions to evaluate the fitness of the solution [11].
In this paper, a centroid-based representation is used, in which a chromosome is encoded by real numbers that represent the coordinates of the centroids of a cluster solution. If a chromosome encodes k clusters of a d-dimensional dataset, the length of the chromosome is k*d. Each chromosome of the initial population is constructed as follows:

X_i = {c_1,1, c_1,2, …, c_1,d, c_2,1, c_2,2, …, c_2,d, …, c_k,1, c_k,2, …, c_k,d}

where the first d-dimensional vector stands for the first cluster centroid, the second d-dimensional vector denotes the coordinates of the second cluster center, and the last d-dimensional vector represents the k-th cluster centroid for the given dataset. The total intra-cluster distance [7] is used as the objective function to compute the fitness of each chromosome.
Intra = Σ_{j=1..k} Σ_{p ∈ C_j} Dist(p, c_j) (9)

where k is the number of clusters, C_j is the j-th cluster, p is a data point in C_j, c_j is the center of C_j, and Dist is the Euclidean distance [12] between the data point p and the center c_j of cluster C_j.
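Decoding a chromosome into a (k, d) array of centers and evaluating eq. (9) can be sketched as follows (illustrative NumPy code; the function name and toy data are assumptions):

```python
import numpy as np

def total_intra_cluster_distance(data, centers):
    """Objective of eq. (9): assign each point to its closest center and
    sum the Euclidean distances within each cluster.
    data: (n, d) array; centers: (k, d) array (a flattened chromosome reshaped)."""
    # (n, k) matrix of distances between every point and every center
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                    # closest-cluster assignment
    return dists[np.arange(len(data)), labels].sum(), labels

data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])       # k = 2, d = 2
fitness, labels = total_intra_cluster_distance(data, centers)
```

Each point lies at distance 0.5 from its center, so the objective is 2.0 for this toy example.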
In the DE-based clustering algorithm, each chromosome is initialized with k cluster centers randomly selected from the given dataset to construct the initial population. To compute the fitness of each chromosome, the Euclidean distance between each data point and all cluster centers of the chromosome is first calculated; then each data point is assigned to the closest cluster; and finally, the sum of the intra-cluster distances of the clusters is computed. The population for the next generation is produced by mutation, crossover, and selection. The best solution of the final population is the optimal cluster solution for the given dataset. The process of the DE-based clustering algorithm is given in Algorithm 1.

Algorithm 1. DE-based clustering algorithm
1: Construct the initial population, initializing each chromosome with k cluster centers randomly selected from the dataset
2: For each chromosome do
3: For each data point p do
4: Compute the Euclidean distance between the data point p and all of the cluster centers
5: Assign the data point to the closest cluster
6: End
7: Compute the fitness of the chromosome according to eq. (9)
8: End
9: While the number of iterations is not equal to Itr do
10: Create a trial vector by applying the mutation operation
11: Create an offspring by applying the binomial crossover operator
12: Compute the fitness of the offspring
13: Update the population by evaluating the fitness of the parent and offspring vectors based on the selection operation
14: End
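The whole procedure can be sketched end to end as below, using DE/rand/1 with binomial crossover and greedy (minimizing) selection. This is an illustrative sketch under assumed parameter values and a synthetic two-cluster dataset, not the paper's Java implementation:

```python
import numpy as np

def fitness(chrom, data, k):
    """Total intra-cluster distance (eq. 9) of a flattened chromosome."""
    centers = chrom.reshape(k, -1)
    d = np.linalg.norm(data[:, None] - centers[None], axis=2)
    return d.min(axis=1).sum()

def de_clustering(data, k, NP=20, f=0.5, cr=0.9, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Each chromosome: k distinct randomly chosen data points, flattened to k*d
    pop = np.stack([data[rng.choice(n, k, replace=False)].ravel()
                    for _ in range(NP)])
    fit = np.array([fitness(c, data, k) for c in pop])
    for _ in range(iters):
        for i in range(NP):
            # DE/rand/1 mutation
            i1, i2, i3 = rng.choice([j for j in range(NP) if j != i],
                                    3, replace=False)
            trial = pop[i1] + f * (pop[i2] - pop[i3])
            # Binomial crossover with a forced component
            mask = rng.random(k * d) <= cr
            mask[rng.integers(k * d)] = True
            offspring = np.where(mask, trial, pop[i])
            # Greedy selection (minimization)
            fo = fitness(offspring, data, k)
            if fo <= fit[i]:
                pop[i], fit[i] = offspring, fo
    best = fit.argmin()
    return pop[best].reshape(k, d), fit[best]

# Synthetic dataset: two tight 2-D clusters around (0, 0) and (5, 5)
data = np.vstack([np.random.default_rng(1).normal(m, 0.1, (15, 2))
                  for m in (0.0, 5.0)])
centers, best_fit = de_clustering(data, k=2)
```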

EXPERIMENTAL STUDY
The main aim of this work is to provide useful information for developing a simple, efficient, and robust DE-based clustering algorithm. The most well-known, simple, and efficient mutation strategies were taken into account. The clustering performance of DE algorithms with six different mutation strategies is tested on real datasets from the UCI machine learning repository [13]. Seven UCI standard datasets that are frequently used for metaheuristics-based clustering [14] are utilized. A summary of these datasets is shown in Table 1.

Experimental Setup
For all DE-based clustering algorithms with different mutation strategies, the crossover rate and the population size were set to 0.9 and 100, respectively [7,15], and the scaling factors were set as follows: f = 0.5, f1 = 0.3, and f2 = 0.3. The maximum number of iterations was set to 100. The initial population was constructed in the same fashion for all algorithms, such that each chromosome was composed of k distinct data points randomly selected from the dataset. The algorithms were implemented in the Java programming language on an Intel Core i7 processor with 8 GB of memory and a 64-bit operating system. Each algorithm was executed 30 times independently for each dataset. The quality of the obtained cluster solutions and the convergence speed of the different DE variants were compared. The quality of the clustering solutions was compared according to the following criteria:
• The objective function value (the total intra-cluster distance defined in eq. (9)).
• Sum of squared error (SSE) [12]: the sum of the squared distances from each data point in a cluster to the center of that cluster:

SSE = Σ_{j=1..k} Σ_{p ∈ C_j} Dist(p, c_j)^2 (10)

where k is the number of clusters, C_j is the j-th cluster, p is a data point in C_j, c_j is the center of C_j, and Dist is the Euclidean distance between p and c_j. A lower SSE indicates a better cluster solution.
• Quantization error [16]: the average distance between data points and their cluster centers:

Q_e = ( Σ_{j=1..k} [ Σ_{p ∈ C_j} Dist(p, c_j) / |C_j| ] ) / k (11)

where k is the number of clusters, C_j is the j-th cluster, p is a data point in C_j, c_j is the center of C_j, and |C_j| is the number of data points in C_j. A lower quantization error indicates a better cluster result.
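The two evaluation criteria of eqs. (10) and (11) can be sketched as follows (illustrative NumPy code under a nearest-center assignment; skipping empty clusters in the quantization error is an assumption):

```python
import numpy as np

def sse(data, centers):
    """Sum of squared errors (eq. 10) under a nearest-center assignment."""
    d = np.linalg.norm(data[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    return (d[np.arange(len(data)), labels] ** 2).sum()

def quantization_error(data, centers):
    """Quantization error (eq. 11): average over clusters of the mean
    point-to-center distance within each cluster."""
    d = np.linalg.norm(data[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    per_cluster = [d[labels == j, j].mean()
                   for j in range(len(centers)) if np.any(labels == j)]
    return float(np.mean(per_cluster))

data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
centers = np.array([[0.0, 1.0], [10.0, 11.0]])
```

For this toy example every point is at distance 1 from its center, so the SSE is 4.0 and the quantization error is 1.0.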

Experimental Results
The experimental results obtained by the DE-based clustering algorithms with different mutation strategies are summarized in Tables 2-4. The quality of the solutions obtained by each algorithm is tabulated in terms of the worst, best, mean, and standard deviation (Std.).
The values of the objective function obtained for all datasets are presented in Table 2. According to the mean values of the results given in Table 2, DE/best/2 achieved better results for the Glass and Ecoli datasets. For the Wine, Breast Cancer, and Pima datasets, the standard deviation values achieved by DE/current-to-best/1 are smaller than those of the other strategies. It can be said that DE/best/2 and DE/current-to-best/1 are more robust than the others, with DE/current-to-best/1 achieving more stable results than DE/best/2 for large datasets (in terms of the number of features and the number of data instances). Table 3 and Table 4 summarize the quality of the cluster solutions acquired with the different mutation strategies in terms of the sum of squared error (SSE) and the quantization error, respectively. According to the mean values given in both Table 3 and Table 4, the solutions acquired by the random mutation strategies (DE/rand/1 and DE/rand/2) are better than those of the other strategies for almost all of the test datasets. However, the mutation strategies that involve the best vector (DE/best/1, DE/best/2, and DE/current-to-best/1) obtained more stable results according to the standard deviation values given in both tables.
The convergence behavior of the different mutation strategies on all test datasets is shown in Figure 1. The figure plots the averages of the same 30 independent runs mentioned above. As observed in Figure 1, the convergence speed of DE/best/2 is the fastest on all datasets, whereas DE/rand/2 is the slowest and worst mutation strategy on all test datasets. Although DE/best/1 is faster than all variants except DE/best/2, it is unable to find better solutions in the later stages and is easily trapped in local optima. DE/rand/1 finds better solutions than the other strategies for some datasets in the late iterations, even though its convergence rate is slow in the early stages. DE/current-to-rand/1 can be regarded as the second-worst mutation strategy because it is slower and does not reach a better solution for any dataset except Iris. The exploration ability of DE/current-to-best/1 is not sufficient, and it does not find a better solution for some datasets, although its convergence speed is fairly fast. According to the overall experimental results, the following can be observed: the mutation strategies based on the best solution are more robust and faster than the others, because they use the guidance information of the best solution to increase the exploitation ability and convergence speed of DE. Among them, DE/best/2 is more effective and robust for datasets with a high number of clusters (Glass and Ecoli) because of the guidance information from the best solution and the use of two difference vectors. DE/rand/1 is able to find better solutions not only for high-dimensional datasets (Breast Cancer and Pima) but also for moderately sized datasets because it maintains good diversity.

CONCLUSION
This paper presents an experimental investigation of different mutation strategies of the DE algorithm for clustering problems. The performance of six mutation strategies was tested on several UCI standard datasets commonly used in EA-based clustering. The quality of solutions and the convergence speed of the different DE variants were compared to analyze the outcomes of the experiments. The experimental analysis pointed out that DE/rand/1 manages to find better solutions for moderately sized datasets and shows good exploration behavior in the later stages, while DE/best/2 shows good exploitation behavior in the early stages. The tests also showed that the mutation strategies that use the best solution achieve more stable results. Future work is to propose an effective mutation strategy for large-scale clustering problems by applying the insights from this experimental study.