A modified genetic algorithm with a new crossover mating scheme

Received Feb 20, 2019 Revised Apr 11, 2019 Accepted Apr 24, 2019

This study introduced the Inversed Bi-segmented Average Crossover (IBAX), a novel crossover operator that enhances the offspring generation of the genetic algorithm (GA) for variable minimization and numerical optimization problems. The new mating scheme for generating offspring under the crossover function through the IBAX operator paved the way to a more efficient and optimized solution for variable minimization, particularly with respect to the premature convergence problem in GA. A total of 597 records of student-respondents in the evaluation of faculty instructional performance, represented by 30 variables, from the four State Universities and Colleges (SUC) in Caraga Region, Philippines were used as the dataset. The simulation results showed that the proposed modification of the Average Crossover (AX) of the genetic algorithm outperformed the genetic algorithm with the original AX operator. The GA with the IBAX operator combined with the rank-based selection function removed 20 or 66.66% of the variables, while 13 or 43.33% of the variables were removed when the GA with the AX operator and roulette wheel selection function was used.


INTRODUCTION
Data preprocessing [1][2][3], an imperative step and one of the prime methods in data mining (DM), has led to enhanced data quality, which in turn improves the precision and accuracy as well as the mining efficiency of a prediction model [4,5].
Data reduction, an important data preprocessing technique in DM, is achieved through the selection and removal of unnecessary attributes or variables in the dataset [6]. It is well known that in some cases it is advisable to reduce the original training set or variables by selecting the most representative information while still obtaining nearly the same result or data-driven output [7][8][9]. Minimizing the size of the dataset helps improve the generalization ability of the model. It also helps lessen the space and computational time and reduces the size of the formulas used by the algorithm during execution [10]. Maximized accuracy through a reduced number of attributes [11,6] and better understandability and interpretability of results are among the many benefits of data reduction [12].
One of the competent data reduction, feature selection, and global optimization algorithms widely used in related studies is the Genetic Algorithm [13][14][15]. The Genetic Algorithm (GA), which was introduced by J.H. Holland in the 1970s, is a wide-ranging search method based on evolution and population genetics whose major executory mechanism relies on the crossover operator [16]. The unique integration of selection, crossover, and mutation operators serves as the driving force behind the successful implementation of GA. According to [17], the most widely known problem in GAs is premature convergence. It occurs when the search converges at an early stage, after only a few generations, and stagnates at a local optimum. Premature convergence arises when the genetic operators cannot produce offspring that are better than their parents, and it is associated with the loss of diversity in the population. According to [18], one technique to prevent premature convergence is to design an efficient crossover operator; thus, this study.
The activity behind crossover is the creation of offspring by combining information from the two parent chromosomes [19,20]. For real encoding problems using the arithmetic function, the average crossover (AX) [21] is modified in this study. The simplicity of the average crossover has opened an avenue for improvements toward better genetic algorithm performance. The modification is intended to address this weakness of the GA by introducing a new method of pairing genes from the chromosomes, and other researchers may use it in their experimental parameter settings.
There is an appeal in the literature that encourages the enhancement of crossover operators for more effective optimization schemes of evolutionary algorithms. The influence of crossover operators is vital to the whole genetic algorithm process in the quest for the optimal search space [22,23].
Therefore, this study proposed a novel crossover operator as an enhancement to the average crossover of the genetic algorithm. The novel crossover is called Inversed Bi-segmented Average Crossover (IBAX), and it alters how parents generate the offspring that are instrumental for the next generation. The rest of the paper is arranged as follows. Section 2 reviews the literature on the Genetic Algorithm. Section 3 presents the design and methodology used in the study. Section 4 presents the results and discussion, while Section 5 highlights the conclusion and recommendation.

LITERATURE REVIEW

Genetic Algorithm
The genetic algorithm, as defined by [24], is one of the many evolutionary algorithms based on the rules of biological evolution for global optimization.
GA is known as one of the most competent and widely held techniques for searching for the best or ideal solution to problems with a huge search space, especially combinatorial problems where the search space is of factorial order. GA produces and controls a set of individuals through the integration of various suitable genetic operators to look for optimal solutions. The bottleneck for an optimal genetic algorithm implementation lies in its three fundamental operations after creating the initial population, viz., the selection, crossover, and mutation functions. Figure 1 shows the flowchart of the genetic algorithm.

A modified genetic algorithm with a new crossover mating scheme (Allemar Jhone P. Delima)

Initialization / Evaluation of Fitness Function
The fitness function serves as the backbone of the evaluation process of fitted values; hence, it is a vital step in GA execution. It serves as a performance determinant for relevant judgment [25].

Selection
This stage of the genetic algorithm is where members of the population are selected to enter the mating pool for the next function, the crossover stage. The selection of an optimal operator for this stage is vital to ensure that members of the population with higher fitness values have a bigger chance of being selected for mating, although members with lower fitness values still have a slim chance of being selected for reproduction. It is important to select the best members of the population to ensure that the search process is global and does not simply settle on the nearest local optimum [26]. Selection is one of the important aspects of the GA process, and there are several selection schemes, to wit: Binary Tournament Selection, Stochastic Universal Sampling (SUS), Roulette Wheel Selection (RWS), Elitism Selection, and Rank-based Selection. For a detailed explanation of these selection schemes, the study of [27] is recommended. The following selection functions are used in this study:

-Roulette Wheel Selection (RWS) Function
According to [28], roulette wheel selection is one of the simplest traditional GA selection techniques. To execute it, all the chromosomes in the population are placed on the roulette wheel according to their fitness values. Each individual is assigned a segment commensurate with its fitness value; hence, the bigger the fitness value, the larger the segment. Then, the virtual roulette wheel is spun. The individual corresponding to the segment on which the roulette wheel stops is selected. The process is repeated until the desired number of individuals is selected. Individuals with higher fitness have a higher probability of selection.
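The spinning procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `roulette_wheel_select`, its parameters, and the optional `rng` argument are assumptions introduced here, and non-negative fitness values are assumed.

```python
import random


def roulette_wheel_select(population, fitnesses, k, rng=None):
    """Pick k individuals with probability proportional to fitness.

    Hypothetical helper for illustration; assumes non-negative fitness.
    """
    rng = rng or random.Random()
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")

    selected = []
    for _ in range(k):
        # Spin the wheel: a random point along the summed fitness.
        spin = rng.uniform(0, total)
        cumulative = 0.0
        for individual, fitness in zip(population, fitnesses):
            cumulative += fitness  # each segment is as wide as its fitness
            if spin < cumulative:
                selected.append(individual)
                break
        else:
            # Guard against floating-point round-off at the wheel's end.
            selected.append(population[-1])
    return selected
```

Because each segment is as wide as the individual's fitness, an individual with zero fitness occupies no part of the wheel and is never drawn in this sketch.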

-Rank-based Selection Function
According to [29], the rank-based selection function assigns fitness depending on how the chromosomes are ordered by their fitness values. It is executed by positioning the chromosomes in decreasing order of fitness value. Next, every chromosome is allocated a rank value that corresponds to its position in the ordered set, and the new fitness value of every chromosome is then calculated using (1), where $1 < \max \le 2$ and $\min = 2 - \max$.
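A minimal sketch of the ranking step follows. It assumes the commonly used linear ranking form, $f'_i = \max - (\max - \min)(r_i - 1)/(N - 1)$ with $r_i = 1$ for the fittest chromosome; this form is an assumption and may differ from the authors' Equation (1). The function name `linear_rank_fitness` and the default `max_value` are likewise hypothetical.

```python
def linear_rank_fitness(fitnesses, max_value=1.5):
    """Return rank-based fitness values (assumed linear ranking form).

    Rank 1 is the fittest chromosome; min_value = 2 - max_value.
    """
    if not 1 < max_value <= 2:
        raise ValueError("max_value must satisfy 1 < max_value <= 2")
    n = len(fitnesses)
    if n < 2:
        raise ValueError("need at least two chromosomes to rank")

    min_value = 2 - max_value
    # Indices of the chromosomes in decreasing order of raw fitness.
    order = sorted(range(n), key=lambda i: fitnesses[i], reverse=True)

    ranked = [0.0] * n
    for rank, index in enumerate(order, start=1):
        ranked[index] = max_value - (max_value - min_value) * (rank - 1) / (n - 1)
    return ranked
```

Under this assumed form, choosing `max_value=2` gives the lowest-ranked chromosome a new fitness of zero, so it cannot be drawn by a fitness-proportional selection step.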

Crossover
Crossover is identified as the most important operator in genetic algorithms. It is responsible for generating new offspring for the next generation by combining features of two parent chromosomes [21].
A recent study by [22] enhanced the Average Crossover (AX) operator of the genetic algorithm. The proposed operator is called Cross Average Crossover (CAX). The modified genetic algorithm with the CAX operator and rank-based selection function removed more variables than the traditional genetic algorithm, but a degradation phenomenon [30] was observed. The CAX operator with the rank-based selection function eliminated individuals with higher fitness values due to the structure of its mating scheme.
According to [19], there are two categories of crossover development, called parent-centric and mean-centric operators. The parent-centric approach generates offspring within the vicinity of each of the parent chromosomes, while the mean-centric approach generates offspring solutions by identifying the central tendencies of the parents involved. The Average Crossover, a well-known crossover operator for real encoding problems found in the study of [21] and the operator modified in this study, is outlined below along with the CAX operator:

-Average Crossover (AX)
Part or all of the offspring's genes are averages of the corresponding alleles of both parents. Two parallel parents are selected, and the averages of their genes are computed to create the offspring.
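As a minimal sketch of this operator, assuming real-valued chromosomes stored as equal-length Python lists (the function name `average_crossover` is introduced here for illustration only):

```python
def average_crossover(x, y):
    """Average Crossover (AX): gene i of the child is the mean of x[i] and y[i]."""
    if len(x) != len(y):
        raise ValueError("parents must have the same number of genes")
    return [(gene_x + gene_y) / 2 for gene_x, gene_y in zip(x, y)]


# Parallel genes are paired position by position.
child = average_crossover([1, 2, 3], [3, 4, 5])
print(child)  # [2.0, 3.0, 4.0]
```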

-Cross Average Crossover (CAX)
CAX is a modified version of the Average Crossover (AX) in which the first gene of the first chromosome and the last gene of the second chromosome are averaged. The resulting average values are considered the offspring. The step is repeated, moving forward through the first chromosome and backward through the second, until all genes from the chromosomes have been crossed to create offspring.
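The cross pairing can be sketched as follows, under the reading that gene i of the first parent is averaged with gene n - 1 - i of the second parent (zero-based indexing). The function name `cross_average_crossover` is an assumption made for this illustration.

```python
def cross_average_crossover(x, y):
    """Cross Average Crossover (CAX) sketch: pair x forward with y backward."""
    if len(x) != len(y):
        raise ValueError("parents must have the same number of genes")
    # reversed(y) walks the second parent from its last gene to its first.
    return [(gene_x + gene_y) / 2 for gene_x, gene_y in zip(x, reversed(y))]


child = cross_average_crossover([1, 2, 3, 4], [10, 20, 30, 40])
print(child)  # [20.5, 16.0, 11.5, 7.0]
```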

Mutation
Over the years, studies have been carried out on a variety of mutation techniques to improve GA performance. The purpose of the mutation operation is to change the genes of the offspring and to increase the diversity of the population. This process enables GAs to jump out of local or suboptimal solutions and so avoid premature convergence [19].

METHODOLOGY
In this study, the average crossover, which is one of the crossover operators in the genetic algorithm, is modified. Both the roulette wheel and rank-based selection functions were used. The variables that obtained the lowest fitness value in each generation, for ten generations, were removed. Instead of pairing the parallel genes from chromosomes x and y and computing their average to produce offspring z as shown in Figure 2, it is suggested to segment the chromosomes (x and y) into two and inversely compute the average of the genes within each segment, as depicted in Figure 3. The modified crossover is called Inversed Bi-segmented Average Crossover (IBAX).

Existing Traditional Average Crossover
The average crossover is simple and can be implemented through the following steps:
Step 1: Take two parents from the selection pool.
Step 2: Create offspring Z from two parallel parents X and Y.
Step 3: Use the formula $Z_i = (X_i + Y_i)/2$ (2)
Step 4: For i = 1 to n do formula (2)
Step 5: End do

Figure 2. Average crossover with roulette wheel selection function

Modified Average Crossover
For the IBAX operator to be realized, the following steps must be executed:
Step 1: Take the parents from the selection pool.
Step 2: Count the number of genes found in the chromosomes. Identify whether the number of genes is odd or even.
Step 3: Segment the chromosomes (X and Y) by dividing the total number of genes in the chromosomes into two, so that, for an even count, the first and second segments contain an equal number of genes.
Step 4: On the first segment, create offspring Z for each gene by inversely pairing the first gene of chromosome X with the last gene of chromosome Y in that segment. Repeat until the last gene of chromosome X and the first gene of chromosome Y in the segment have inversely mated and produced an offspring using formula (2).
Step 5: Execute the same process on the second segment until the genes from all segments have produced offspring.
In the case of an odd number of genes, the last genes of the chromosomes are not included in the second segment; they are instead mated with each other to produce an offspring.
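The steps above can be sketched in Python as shown below. This is one possible reading rather than the authors' implementation: it assumes zero-based lists, assumes that the inverse pairing is applied independently inside each half, and assumes that a gene sitting in the middle of an odd-length segment ends up paired with its parallel counterpart. The function name `ibax` is introduced here for illustration.

```python
def ibax(x, y):
    """Inversed Bi-segmented Average Crossover (IBAX), illustrative sketch."""
    if len(x) != len(y):
        raise ValueError("parents must have the same number of genes")

    n = len(x)
    half = n // 2  # genes per segment; a trailing odd gene is handled last
    child = []

    # Segment 1 covers [0, half); segment 2 covers [half, 2 * half).
    for start, stop in ((0, half), (half, 2 * half)):
        for i in range(start, stop):
            # Mirror position of i inside the same segment of parent y.
            mirror = start + stop - 1 - i
            child.append((x[i] + y[mirror]) / 2)

    if n % 2 == 1:
        # Odd gene count: the last genes are mated with each other.
        child.append((x[-1] + y[-1]) / 2)
    return child


print(ibax([1, 2, 3, 4], [10, 20, 30, 40]))
# [10.5, 6.0, 21.5, 17.0]
print(ibax([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
# [10.5, 6.0, 21.5, 17.0, 27.5]
```

In contrast to a pairing that runs across the whole chromosome, this reading keeps every pairing inside its own half, so genes from the first half of X never mate with genes from the second half of Y.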

Datasets
In this study, a total of 597 records of student-respondents in the evaluation of faculty instructional performance from the four State Universities and Colleges (SUC) in Caraga Region, Philippines were used as the dataset. Thirty (30) variables represent the faculty instructional performance, divided into six (6) parts, viz., methodology, classroom management, student discipline, assessment of learning, student-teacher relationship, and peer relationship. Each category has five items as shown in Table 1.

Simulation result for GA with AX operator and RWS Function
The simulation of the genetic algorithm was run for ten generations using the existing traditional average crossover and the roulette wheel selection function. The 597 records of random student-respondents in the evaluation of faculty instructional performance (IP) from the four State Universities and Colleges (SUC) in Caraga Region, Philippines were instrumental in this study.
First Generation: Variable C2 is removed from the chromosome since it obtained the lowest fitness value of 171396 as evident in Table 2.
Second Generation: Variables M5 and A2 were removed from the chromosome since both obtained the lowest fitness value of 263169 as evident in Table 3.
Third Generation: Variable C3 is removed from the chromosome since it obtained the lowest fitness value of 265225 as evident in Table 4.
Fourth Generation: Variables C4 and A5 were removed from the chromosome since both obtained the lowest fitness value of 266256 as evident in Table 5.
Fifth Generation: Variable C1 is removed from the chromosome since it obtained the lowest fitness value of 268324 as evident in Table 6.
Sixth Generation: Variable P3 is removed from the chromosome since it obtained the lowest fitness value of 272484 as evident in Table 7.
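The removal rule applied in each generation (drop every variable tied for the lowest fitness value) can be sketched as follows. The function name `drop_lowest` and the variable names and fitness values in the usage example are hypothetical and are not taken from the study's tables.

```python
def drop_lowest(fitness_by_variable):
    """Remove every variable that shares the minimum fitness value.

    Returns (kept, removed): a dict of surviving variables and a list
    of the names that were dropped in this generation.
    """
    if not fitness_by_variable:
        raise ValueError("no variables left to evaluate")
    lowest = min(fitness_by_variable.values())
    removed = [name for name, f in fitness_by_variable.items() if f == lowest]
    kept = {name: f for name, f in fitness_by_variable.items() if f != lowest}
    return kept, removed


# Hypothetical fitness values: V1 and V2 tie for the minimum.
kept, removed = drop_lowest({"V1": 10.0, "V2": 10.0, "V3": 12.5})
print(removed)  # ['V1', 'V2']
print(kept)     # {'V3': 12.5}
```

A tie at the minimum is why some generations remove two variables while others remove only one.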

Simulation result for GA with IBAX operator and rank-based selection function
The simulation on the genetic algorithm was done utilizing the novel Inversed Bi-segmented Average Crossover (IBAX) operator and rank-based selection function on the same datasets and number of generations.
First Generation: Variable C2 was removed from the list of variables after applying the rank-based selection. Variable C2 obtained the lowest fitness value in the rank-based selection; hence, it had no chance of being selected. Moreover, after applying the inversed bi-segmented average crossover (IBAX) operator and obtaining the fitness values of the offspring, variable C3 was removed from the chromosomes since it obtained the lowest fitness value of 224676, which does not warrant inclusion in the next generation. Thus, in the first generation, two variables were removed from the list as shown in Table 12.
Second Generation: After applying the IBAX operator, variables ST2 and C1 were removed from the chromosomes since they obtained the lowest fitness value of 262144. In the second generation, two variables were removed from the list as shown in Table 13.
Third Generation: After applying the IBAX operator and obtaining the fitness values of the offspring, variables ST4 and A2 were removed from the chromosomes since both obtained the lowest fitness value of 270920.3. In the third generation, two variables were removed from the list as shown in Table 14.
Fourth Generation: After applying the IBAX operator, variables P2 and A1 were removed from the chromosomes since both obtained the lowest fitness value of 273529. In the fourth generation, two variables were removed from the list as shown in Table 15.
Fifth Generation: After applying the IBAX operator, variables ST3 and A4 were removed from the chromosomes since both obtained the lowest fitness value of 275100.3. In the fifth generation, two variables were removed from the list as shown in Table 16.
Sixth Generation: After applying the IBAX operator, variables C5 and ST5 were removed from the chromosomes since both obtained the lowest fitness value of 278256.3. In the sixth generation, two variables were removed from the list as shown in Table 17.
Seventh Generation: After applying the IBAX operator, variables SD5 and SD2 were removed from the chromosomes since both obtained the lowest fitness value of 279841. In the seventh generation, two variables were removed from the list as shown in Table 18.
Eighth Generation: After applying the IBAX operator, variables M1 and A5 were removed from the chromosomes since both obtained the lowest fitness value of 281961. In the eighth generation, two variables were removed from the list as shown in Table 19.
Ninth Generation: After applying the IBAX operator, variables A3 and P3 were removed from the chromosomes since both obtained the lowest fitness value of 287296. In the ninth generation, two variables were removed from the list as shown in Table 20.
Tenth Generation: After applying the IBAX operator, variables M5 and P4 were removed from the chromosomes since both obtained the lowest fitness value of 290521. In the tenth generation, two variables were removed from the list as shown in Table 21.

Evaluation of the efficacy and reduction rate using GA with AX and IBAX operators
The variable minimization process using the genetic algorithm with the average crossover operator and roulette wheel selection function showed a decrease after the ten generations. The 30 variables were reduced to 17; that is, 13 variables (43.33%) were removed, as depicted in Table 22.
Meanwhile, the variable minimization process using the genetic algorithm with the proposed mating scheme, the inversed bi-segmented average crossover operator, and the rank-based selection function showed a noticeable decrease after the ten generations. The 30 variables were reduced to 10 after the ten generations. A total of 66.66% of the variables were removed, as depicted in Table 23. Although the amount of reduction varies with the genetic algorithm used, removing 66.66% of the variables in the dataset is considered sufficient, since dropping variables reduces dimensionality. A feature reduction ratio of more than 60% is acceptable, as in the work of [31]. To further evaluate the efficacy and reduction rate of the proposed crossover, the GA with the IBAX operator was compared, aside from the AX operator, with other real encoding-based crossover mechanisms of the GA, namely the geometrical crossover [32] and the cross average crossover (CAX) [22]. The simulation results showed that the genetic algorithm with the new crossover mating scheme outperformed the other existing real encoding-based crossover operators of the genetic algorithm in reducing variables, as depicted in Table 24.
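The reduction rates quoted above follow from simple arithmetic on the counts of removed variables. The short sketch below, with the hypothetical helper `reduction_rate`, reproduces them; note that 20 of 30 is 66.67% when rounded to two decimals, which the text reports truncated as 66.66%.

```python
def reduction_rate(total_variables, removed):
    """Percentage of variables removed from the original set."""
    if total_variables <= 0:
        raise ValueError("total_variables must be positive")
    return 100.0 * removed / total_variables


# AX with roulette wheel selection: 13 of 30 variables removed.
print(f"{reduction_rate(30, 13):.2f}%")  # 43.33%
# IBAX with rank-based selection: 20 of 30 variables removed.
print(f"{reduction_rate(30, 20):.2f}%")  # 66.67%
```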

CONCLUSION AND RECOMMENDATION
Through this study, a novel approach to the optimization process using another crossover operator of the genetic algorithm was introduced and added to the body of knowledge. The proposed modification of the genetic algorithm with the inversed bi-segmented average crossover (IBAX) has paved the way to the enhancement of the GA's average crossover mating scheme, which affects the GA's optimization performance in general. The IBAX operator performed the minimization process considerably better than the average crossover, since 10 and 17 variables were left, respectively, after the tenth generation.
For future work, it is suggested to use the novel IBAX operator on different types of datasets and to incorporate the modified genetic algorithm into various data mining techniques and approaches that need a variable minimization or feature reduction process, such as prediction.

Figure 3. Inversed Bi-segmented Average Crossover with the rank-based selection function

Table 1. Variables used in the study

Table 2. G1 using an average crossover with RWS function

Table 3. G2 using an average crossover with RWS function

Table 4. G3 using an average crossover with RWS function

Table 5. G4 using an average crossover with RWS function

Table 6. G5 using an average crossover with RWS function

Table 7. G6 using an average crossover with RWS function

Seventh Generation: Variable A1 is removed from the chromosome since it obtained the lowest fitness value of 274052.3 as evident in Table 8.

Table 8. G7 using an average crossover with RWS function

Table 9. G8 using an average crossover with RWS function

Ninth Generation: Variable SD2 is removed from the chromosome since it obtained the lowest fitness value of 277729 as evident in Table 10.

Table 10. G9 using an average crossover with RWS function

Tenth Generation: Variables C5 and A3 were removed from the chromosome since both obtained the lowest fitness value of 280370.3 as evident in Table 11.

Table 11. G10 using an average crossover with RWS function

Table 12. G1 using IBAX with the rank-based selection function

Table 13. G2 using IBAX with the rank-based selection function

Table 14. G3 using IBAX with the rank-based selection function

Table 15. G4 using IBAX with the rank-based selection function

Table 16. G5 using IBAX with the rank-based selection function

Table 17. G6 using IBAX with the rank-based selection function

Table 18. G7 using IBAX with the rank-based selection function

Table 19. G8 using IBAX with the rank-based selection function

Table 20. G9 using IBAX with the rank-based selection function

Table 21. G10 using IBAX with the rank-based selection function

Table 22. Simulation result for GA with AX operator and RWS function

Table 24. Comparative result for variable minimization using genetic algorithms