Adaptive Parameter Control Strategy for Ant-Miner Classification Algorithm

Received Jun 27, 2019; Revised Feb 14, 2020; Accepted Mar 11, 2020

Pruning is a popular framework for preventing overfitting to noisy data. This paper presents ACS-AntMiner, a new hybrid of the Ant-Miner classification algorithm and the ant colony system (ACS). A key aspect of this algorithm is the selection of an appropriate number of terms to be included in the classification rule. ACS-AntMiner introduces a new parameter called the importance rate (IR), a pre-pruning criterion based on the probability (heuristic and pheromone) amount. This criterion is responsible for adding only the important terms to each rule, thus discarding noisy data. The ACS algorithm is designed to optimize the IR parameter during the learning process of the Ant-Miner algorithm. The performance of the proposed classifier is compared with related ant-mining classifiers, namely, Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner, across several datasets. Experimental results show that the proposed classifier significantly outperforms the other ant-mining classifiers.


INTRODUCTION
Machine learning (ML) is a data analysis technique that is intensively used to automate analytical model construction. ML performs a particular task by utilizing discovered patterns and inference rather than explicit instructions, and it is considered a branch of artificial intelligence. ML uses algorithms and statistical models to create a mathematical model on the basis of a given dataset. This model is known as a training model and is used to make decisions or predictions for real-world problems [1]. In ML terms, learning is divided into two main types, namely, unsupervised and supervised learning [2]. Unsupervised learning, also known as a clustering task, aims to organize the given data into groups according to the behavior and similarities of the data [3]-[5]. In contrast, supervised learning, also known as classification, is a learning task where the target is the class of a specific data instance [6]. Classification is one of the research areas in ML with a wide variety of industrial and commercial applications. These applications include medical diagnosis, detection of spam emails, determination of bankruptcy, detection of intrusion, recognition of handwritten digits, and face detection [7]-[10].
The algorithm that performs ML classification is known as a classifier algorithm. Numerous classifiers, such as decision trees, rule-based models, linear models, lazy evaluation methods, probabilistic models, and nonlinear models, have been developed in the literature [11]. In terms of understandability, certain classifier algorithms produce classification models with high accuracy; however, the structures of these models are highly complex and difficult to understand. Other classification algorithms, including rule-based classification, produce a model that users can understand. A number of methods and techniques have been proposed; among which, the most successful is ant colony optimization (ACO), also known as the ant-mining

RELATED WORKS
ACO is a swarm-intelligence metaheuristic algorithm used to solve combinatorial optimization problems. This algorithm was first applied to the traveling salesman problem and has since been widely applied to a variety of NP-hard optimization problems, such as quadratic assignment problems, vehicle routing with time windows, grid computing, and data mining [21], [22]. The first ACO algorithm is the ant system, proposed by Dorigo et al. in the 1990s [23]. Real ants use stigmergy to find good paths when searching for a food source and deposit pheromones to mark these paths. The amount of pheromone laid depends on the food quality and distance. As each ant moves randomly, other ants can detect previously visited paths and decide whether to follow them. Pheromones are then deposited along these paths as well. The process is characterized by a positive feedback loop and stochastic behavior. Therefore, indirect communication among ants searching for food provides them with the ability to find the optimal path between the food source and the ants' nest [24].
Artificial ants simulate their natural counterparts to seek an optimal solution to the problem at hand. Each ant employs a stochastic constructive behavior and uses two factors, namely, the pheromone amount and heuristic information, to formulate its solution. Artificial ants also exhibit features that cannot be found in real ants. Each ant has an internal memory to trace its previous actions, and a local search procedure may be used to improve the quality of the solution produced by each ant [25]. The algorithmic outline of the ACO metaheuristic consists of three main procedures. The first procedure constructs the solution: each ant builds a full solution to the problem at hand by using the probabilistic state transition rule. The second procedure is the daemon action (found in many ACO variants), which is essential to obtain competitive results. An example of the daemon action is the local search procedure, which starts with an individual solution and iteratively applies changes to improve it. The third procedure updates the pheromone amount in two phases. The first phase evaporates pheromone from every path, whereas the second increases the pheromone amount on the paths that lead to high-quality solutions [26].

The concept of ACO classification has been used to extract useful information from data. The information is presented as a rule-based model in the form of IF <conditions> THEN <class>. The rule antecedent (IF) consists of a set of conditions connected by a logical conjunction operator (AND), that is, IF term1 AND term2 AND… Each term is a triple <attribute, operator, value> (e.g., <Gender = Male>).
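The rule form described above can be illustrated with a minimal sketch. It assumes categorical attributes and the equality operator only; the names `Term`, `covers`, and `format_rule`, as well as the example attributes, are hypothetical and do not come from any Ant-Miner implementation.

```python
from typing import Dict, List, Tuple

# A term is a triple (attribute, operator, value); only "=" is used here.
Term = Tuple[str, str, str]


def covers(antecedent: List[Term], instance: Dict[str, str]) -> bool:
    """Return True when every term of the antecedent holds for the instance."""
    return all(instance.get(attr) == value for attr, _op, value in antecedent)


def format_rule(antecedent: List[Term], predicted_class: str) -> str:
    """Render the rule in the IF <conditions> THEN <class> form."""
    conditions = " AND ".join(f"{a} {op} {v}" for a, op, v in antecedent)
    return f"IF {conditions} THEN {predicted_class}"


# Hypothetical rule and instance, for illustration only.
rule = [("Gender", "=", "Male"), ("Smoker", "=", "yes")]
print(format_rule(rule, "high_risk"))
print(covers(rule, {"Gender": "Male", "Smoker": "yes", "Age": "old"}))
```

Because the antecedent is a conjunction, a single failing term is enough for the rule not to cover an instance.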
The first classifier of this kind, called Ant-Miner, was introduced by Parpinelli, Lopes, and Freitas (2002) to discover classification rules by using the concepts and principles of the ACO algorithm, the divide-and-conquer strategy, and ML [27]-[29]. Ant-Miner uses certain ACO procedures to generate a number of rules and determine the best one for a given set of training instances. The process consists of creating one rule for each ant and selecting the best rule to include in the discovered rule list. The instances covered by the best rule are removed from the set of training instances. The process is repeated until all training instances are covered by the discovered rule list. The algorithmic outline of the Ant-Miner algorithm contains three important components: rule construction, the pruning procedure, and the pheromone update procedure.
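The covering loop described above can be sketched as follows. This is only an outline under stated assumptions: `discover_best_rule` stands in for the whole ant-colony rule search and is a hypothetical callable, and `max_uncovered` is an illustrative stopping parameter (a value of 0 corresponds to covering all training instances).

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, str]
# A rule is (list of (attribute, value) terms, predicted class).
Rule = Tuple[List[Tuple[str, str]], str]


def rule_covers(rule: Rule, instance: Instance) -> bool:
    """True when the instance satisfies every term of the rule antecedent."""
    return all(instance.get(attr) == value for attr, value in rule[0])


def sequential_covering(
    training: List[Instance],
    discover_best_rule: Callable[[List[Instance]], Rule],
    max_uncovered: int = 0,
) -> List[Rule]:
    """Discover one rule at a time and remove the instances it covers."""
    remaining = list(training)
    rule_list: List[Rule] = []
    while len(remaining) > max_uncovered:
        best = discover_best_rule(remaining)
        still_uncovered = [x for x in remaining if not rule_covers(best, x)]
        if len(still_uncovered) == len(remaining):
            break  # safety guard: the rule covers nothing, so stop
        rule_list.append(best)
        remaining = still_uncovered
    return rule_list


def toy_discover(instances: List[Instance]) -> Rule:
    """Toy stand-in for the ant search: one term taken from the first instance."""
    first = instances[0]
    return ([("outlook", first["outlook"])], first["play"])


data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
print(sequential_covering(data, toy_discover, max_uncovered=1))
```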
Rule construction refers to the routine that ants use to incrementally construct a classification rule. After the starting term is selected, one term at a time is added to the ant's path. Term selection depends on the available pheromone amount and heuristic information: the higher these values, the higher the chance that a term is selected for inclusion in the classification rule. The construction graph in this algorithm is fully connected; each term is represented as a node, and the possible paths connect the nodes. Therefore, each ant creates its own path, which represents a classification rule. Figure 1 shows an example of two solutions constructed by ants, which represent the following classification rules:
1. IF attribute1 = v1,1 AND attribute2 = v2,1 AND attributeN = vn,1 THEN class = Class1;
2. IF attribute1 = v1,2 AND attribute2 = v2,2 AND attributeN = vn,n THEN class = Class3.

Rule pruning, a popular procedure in the rule-discovery process, aims to reduce the complexity of the discovered rules by removing irrelevant terms, thereby boosting the algorithm's performance. The pruning method removes irrelevant terms after the rules have grown to their maximum sizes. An example of a rule discovered from the United States Congressional Voting dataset of the University of California Irvine (UCI) ML repository is shown in Figure 2. The rule is produced using the Ant-Miner classifier without rule pruning. The dataset includes 16 attributes, all of which occur in the discovered rule. The pruning procedure deletes one term at a time from the rule to improve its quality. This procedure is repeated as long as more than one term is left in the rule and removing a term still improves the rule quality. Figure 3 shows the rule of Figure 2 after pruning. The pruned rule is simpler and has fewer terms, yet it covers more instances and remains accurate.

Pheromone updating is the indirect mechanism used to achieve communication among ants in the colony. This procedure is responsible for awarding pheromone to good-quality rules in two steps. First, the pheromone values are decreased to prevent unlimited accumulation, which allows the ant colony to forget previously constructed low-quality rules. Pheromone is then deposited on the good rules constructed by the ants. During the learning process, the ants converge to high-quality rules.
A number of modifications have been proposed for ACO rule-based classification. Liu et al. (2002) introduced a less accurate heuristic function to reduce the algorithm's computational time [30]. An extension was proposed to increase the exploration of the search space by adapting the state transition strategy [31]. Jiang et al. (2005) proposed three major modifications to the pheromone update strategy, heuristic function, and transition rule [32]. Another study proposed two strategies, namely, punishing and mutation operators [33]. A modification of the rule-pruning procedure was proposed for use with big data [20]. Another modification was made by Chan and Freitas (2006b) for multilabel data classification [34]. Smaldon and Freitas (2006) proposed a strategy that determines the class consequent in advance and then generates classification rules for each particular class [35]. Another work introduced multiple ant colonies rather than the single colony of the original Ant-Miner [36]. Antminer+ differs in several aspects: the pheromone is initialized with the upper trail limit, and upper and lower bounds are imposed on the amount of deposited pheromone. The pheromones are updated using a strong elitist strategy, which can use the iteration-best ant or the global-best ant. Antminer+ also uses a different heuristic function, a new pheromone updating strategy, and a new self-adaptive mechanism to weigh the α and β parameters. Lastly, Antminer+ defines the environment as a directed acyclic graph, whereas all other algorithms use a fully connected graph [37], [38]. A threshold criterion was proposed to determine the importance of each term and thus whether to include or reject it [18], [19]. Another study handled continuous attributes during the run of the algorithm rather than by discretization in a pre-processing step [39], [40].
Another study modified the rule construction process by considering the correlation between terms [41]. Further research in medical diagnosis combined the Ant-Miner with a fuzzy system [42]. Agravat et al. (2010) proposed an Ant-Miner for intrusion detection with three modifications, namely, to the pheromone update procedure, transition rule, and fitness function [43]. Salama and Abdelbar (2010) introduced a multi-pheromone concept [44]. Liang et al. (2011) introduced a multistage rule-choosing method to select the rule set [45]. Another study proposed a hybrid of the Ant-Miner and the simulated annealing algorithm by incorporating the latter in the rule construction process [46]. A further study used an ant population rather than a single ant to discover the classification rule in each iteration [47]. Baig and Shahzad (2012) introduced a different heuristic function based on the correlation between the terms, together with a random selection method to determine the values of the α and β parameters [48]. Other studies have determined the class consequent in advance and then constructed the rule for this class, allocating a different type of pheromone to each class [49]. Rajpiplawala & Singh (2014) proposed the use of different quality functions, where each ant in the colony must select its quality function before the rule construction process [50]. The Laplace correction heuristic function was proposed to replace the original heuristic function in the Ant-Miner algorithm [51]. Another study proposed a heuristic function that considers the correlation between attributes and instance coverage [52]. A related study proposed a different pheromone update procedure with an evaporation factor that is updated dynamically during the search process [17].
Building on the existing literature on ACO algorithms for rule-based classification, this study aims to provide an adaptive strategy that controls online which important terms are included during rule construction. Through the proposed strategy, this study also aims to improve the stability and classification performance of the algorithm and to simplify the discovered rules.

THE PROPOSED ACS-ANTMINER CLASSIFICATION ALGORITHM
Our proposed technique introduces a new parameter called the importance rate (IR), responsible for adding only the important terms to each rule. The IR is a pre-pruning criterion based on the probability (heuristic and pheromone) amount. If the probability amount of the selected term is below the IR criterion, then the term is rejected from inclusion; otherwise, it is added. This technique handles irrelevant terms during learning instead of finding a complete rule and then pruning it. Each ant is assigned its own IR value during the model learning process. The IR value is modified over time to ensure diversity and consider the different search stages. The other terms that can be added to the rules are then explored, and the learning process is prevented from being stuck in local optima. The ant colony system (ACS) algorithm is designed to optimize the IR parameter. An ant solution represents the IR parameter of the Ant-Miner classifier. The classification performance of the Ant-Miner classifier on the training dataset is used to design the feedback collection strategy of the ACS. Figure 4 shows the main stages in the implementation of the proposed ACS-based parameter selection for the Ant-Miner classifier.
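The IR pre-pruning test described above can be sketched as a simple acceptance check. The sketch assumes that each selected term arrives together with its selection probability and that a rejected term still marks its attribute as visited; that second point is an assumption of this illustration, not something the text states. The function name `build_antecedent` is hypothetical.

```python
from typing import List, Tuple

Term = Tuple[str, str]  # (attribute, value)


def build_antecedent(
    selected: List[Tuple[Term, float]], importance_rate: float
) -> List[Term]:
    """Keep a selected term only when its probability is not below the IR."""
    antecedent: List[Term] = []
    visited = set()
    for (attribute, value), probability in selected:
        if attribute in visited:
            continue  # each attribute may appear at most once in a rule
        visited.add(attribute)  # assumption: visited even when rejected
        if probability >= importance_rate:
            antecedent.append((attribute, value))
    return antecedent


# Hypothetical selection sequence with made-up probabilities.
sequence = [(("a", "1"), 0.40), (("b", "2"), 0.05), (("c", "3"), 0.20)]
print(build_antecedent(sequence, importance_rate=0.10))
```

A higher IR yields shorter rules, whereas an IR of 0 accepts every selected term, which matches the behaviour of construction without pre-pruning.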
1. Pheromone is initialized on the feedback collection table of the ACS-based parameter selection algorithm.

The adaptive parameter selection strategy applies the ACO concept: it collects feedback from the classification process to adjust the pre-pruning threshold in the Ant-Miner classifier. This dynamically adjusted threshold is called the importance rate (ζ). This parameter filters out irrelevant terms during the rule construction phase of the classifier according to their probability (i.e., the heuristic and the pheromone information). The selection of ζ relies on three vectors. The first is a vector of threshold values, denoted as ParametersValue(vn) = {υ1, υ2, υ3, ..., υn}, which represents the candidate ζ values. The second is a vector of probabilities, denoted as ParametersProbabilities(vn) = {p(υ1), p(υ2), p(υ3), ..., p(υn)}, which gives the selection probability of each parameter value υn. The third is a vector that collects the local and global quality of the threshold values, denoted as feedbackCollection(vn) = {q(υ1), q(υ2), ..., q(υn)}. Each time a threshold value υn is used, its current quality q(υn) is updated.

Parameter selection: The new ACS selection transition rule is introduced as shown in (2). The rule is based on the pheromone value that is considered for parameter value selection in our proposed ACS-based parameter selection algorithm.
where q0 is the parameter controlling the state transition rule, which gradually increases from 0.1 to 0.9. The maximum (max) and minimum (min) values of this range were identified experimentally in this work, and q0 is computed by using the following equation.
where a is the number of ants and t is the index of the current ant in the ACS. The equation starts q0 at the min value to increase the exploration of the ACS algorithm at the beginning of the construction process. During the construction process, each ant is assigned its own q0 value. Over time, the value of q0 gradually increases and stops at the max value. q is a random value uniformly distributed in [0, 1]. R is a randomly chosen threshold value, which amounts to pure exploration instead of the biased exploration calculated by the original formula of the state transition rule [53]. Figure 5 shows the process of controlling the state transition rule by the q0 parameter. p(υn) is the probability of selecting a specific value from the available values; it is illustrated in Figure 6 and calculated by using Equation 4.
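The selection step can be illustrated with a hedged sketch. The exact forms of the transition rule and of the q0 schedule are not reproduced here, so the code makes three explicit assumptions: q0 grows linearly with the ant index from min to max, the selection probability of a value is its quality divided by the sum of all qualities, and exploitation picks the value with the highest probability while exploration picks a uniformly random value R. All function names are hypothetical.

```python
import random
from typing import List


def q0_for_ant(t: int, n_ants: int, q_min: float = 0.1, q_max: float = 0.9) -> float:
    """Assumed linear schedule: ant 1 gets q_min, the last ant gets q_max."""
    if n_ants <= 1:
        return q_max
    fraction = (t - 1) / (n_ants - 1)
    return min(q_max, q_min + (q_max - q_min) * fraction)


def selection_probabilities(quality: List[float]) -> List[float]:
    """Assumed normalisation: p(v_i) = q(v_i) / sum_j q(v_j)."""
    total = sum(quality)
    return [q / total for q in quality]


def select_threshold(
    values: List[float], quality: List[float], q0: float, rng: random.Random
) -> float:
    """Exploit the best-rated value when q <= q0, otherwise explore at random."""
    probabilities = selection_probabilities(quality)
    if rng.random() <= q0:
        best = max(range(len(values)), key=lambda i: probabilities[i])
        return values[best]
    return rng.choice(values)  # pure exploration (the random value R)


rng = random.Random(42)
thresholds = [0.1, 0.2, 0.3]   # hypothetical candidate IR values
feedback = [1.0, 3.0, 2.0]     # hypothetical collected qualities
for ant in (1, 5, 10):
    q0 = q0_for_ant(ant, n_ants=10)
    print(ant, round(q0, 2), select_threshold(thresholds, feedback, q0, rng))
```

With a small q0 the early ants mostly sample thresholds at random, and with a large q0 the later ants mostly reuse the threshold that has collected the best feedback.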
where q(υn) is the quality associated with each threshold value, which is used to probabilistically select one of them. In Table 1, the first column lists the available threshold values, the second column gives the quality of each threshold value based on the feedback collected locally and globally, and the last column gives the probability of selecting each threshold value. These probabilities help in drawing conclusions about the causes of successful or unsuccessful performance for different threshold values. For example, the quality of threshold value υ7 is higher than that of the others, which means that υ7 has a higher probability of being selected.

Ant-Miner classifier: The Ant-Miner constructs one classification rule from the training dataset and adds it to the list of discovered rules. The instances covered by the constructed rule are then removed from the training set. This process is repeated while the number of uncovered instances is larger than the maximum number of uncovered cases. The rule discovery process includes three main stages, namely, rule construction, the pruning procedure, and the pheromone update procedure.
Rule construction: Each ant creates one rule by adding one term at a time to the rule. The term selection uses (5), which is based on the heuristic information and pheromone amount associated with each term. The addition of terms stops when all the attributes have been visited, or when any additional term would make the rule cover fewer cases than the allowed minimum number of cases per rule. A term is selected using the Ant-Miner probability [50].

$$ P_{ij} = \frac{\eta_{ij}\,\tau_{ij}(t)}{\sum_{i=1}^{a} x_i \sum_{j=1}^{b_i} \eta_{ij}\,\tau_{ij}(t)} \qquad (5) $$

where τij(t) is the pheromone amount associated with termij at iteration t; ηij is the heuristic information for each term; a is the number of attributes in the dataset; bi is the number of different values of attribute i; and xi is set to 1 if attribute ai has not yet been visited by the ant, or 0 otherwise.
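The term-selection probability can be illustrated with a short sketch that multiplies heuristic and pheromone values for the terms of unvisited attributes, normalises them, and draws one term by roulette wheel. The dictionaries and function names are hypothetical, and the heuristic values are made up for illustration.

```python
import random
from typing import Dict, Optional, Set, Tuple

Term = Tuple[str, str]  # (attribute, value)


def term_probabilities(
    tau: Dict[Term, float], eta: Dict[Term, float], visited: Set[str]
) -> Dict[Term, float]:
    """Normalised eta * tau over terms whose attribute is not yet visited."""
    weights = {
        term: eta[term] * tau[term] for term in tau if term[0] not in visited
    }
    total = sum(weights.values())
    return {term: weight / total for term, weight in weights.items()}


def roulette_select(probs: Dict[Term, float], rng: random.Random) -> Optional[Term]:
    """Draw one term with probability proportional to its entry in probs."""
    r = rng.random()
    cumulative = 0.0
    last: Optional[Term] = None
    for term, p in probs.items():
        cumulative += p
        last = term
        if r < cumulative:
            return term
    return last  # guards against floating-point round-off


terms = [("sex", "M"), ("sex", "F"), ("age", "old"), ("age", "young")]
tau = {term: 1.0 / len(terms) for term in terms}   # uniform initial pheromone
eta = dict(zip(terms, [0.3, 0.1, 0.4, 0.2]))       # made-up heuristic values
probs = term_probabilities(tau, eta, visited={"age"})
print(probs)
print(roulette_select(probs, random.Random(1)))
```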
The heuristic function and the pheromone amount are used to select terms in the rule construction process. The heuristic function in the Ant-Miner classifier is inspired by information theory. This function computes the amount of information contained in each term (entropy). The heuristic function is defined by (6)

$$ \eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i=1}^{a} x_i \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)} \qquad (6) $$

where a, bi, and xi have the same meaning as above; k is the number of classes; W is the class attribute, that is, the attribute whose domain consists of the classes to be predicted; and H(W | Ai = Vij) is the entropy of the class attribute among the instances in which attribute Ai takes the value Vij.

Rule pruning: The rule constructed by each ant is pruned to delete irrelevant terms. This procedure removes one term at a time from the current rule. The predicted class of the pruned rule can change if the instances covered by the pruned rule differ from those covered by the original rule. Thus, the classifier determines the predicted class by assigning the majority class among the instances covered by the rule. The quality of the new rule is then calculated. This process is repeated until no term removal improves the rule quality or only one term is left in the rule.
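The one-term-at-a-time pruning loop described above can be sketched as follows. The sketch assumes that a caller-supplied `quality` function re-evaluates a candidate antecedent, including any reassignment of the majority class; `prune_rule` and `toy_quality` are hypothetical names, and the toy quality function exists only to make the example runnable.

```python
from typing import Callable, List, Tuple

Term = Tuple[str, str]  # (attribute, value)


def prune_rule(
    antecedent: List[Term], quality: Callable[[List[Term]], float]
) -> List[Term]:
    """Repeatedly drop the term whose removal improves rule quality the most."""
    current = list(antecedent)
    current_quality = quality(current)
    while len(current) > 1:
        # Every candidate is the current rule with exactly one term removed.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        best = max(candidates, key=quality)
        best_quality = quality(best)
        if best_quality <= current_quality:
            break  # no removal improves the rule any further
        current, current_quality = best, best_quality
    return current


def toy_quality(terms: List[Term]) -> float:
    """Toy stand-in: term ("a", "1") is informative, every other term is noise."""
    informative = 1.0 if ("a", "1") in terms else 0.0
    noise_terms = len(terms) - (1 if ("a", "1") in terms else 0)
    return informative - 0.1 * noise_terms


print(prune_rule([("a", "1"), ("b", "2"), ("c", "3")], toy_quality))
```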
The ACS-based parameter selection collects feedback using the Ant-Miner construction rule by applying our proposed local feedback collection as in (7).
where the pheromone evaporation rate prevents unlimited accumulation on a specific parameter value, and Q(t) is the quality of the discovered rule.

Ant-Miner pheromone updating: The pheromone amount of each term is updated by depositing pheromone on the terms used in the ant's rule, in proportion to the rule quality. Pheromone evaporation for unused terms is simulated by dividing the value of each τij by the summation of all values of τij. The pheromone update is defined by the following formulas:

$$ \tau_{ij}(t+1) = \tau_{ij}(t) + \tau_{ij}(t)\,Q, \quad \forall\, term_{ij} \in \text{rule} \qquad (8) $$

$$ \tau_{ij}(t+1) \leftarrow \frac{\tau_{ij}(t+1)}{\sum_{i=1}^{a} \sum_{j=1}^{b_i} \tau_{ij}(t+1)} $$

$$ Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN} $$

where a and bi have the same meaning as above; Q is the quality of the discovered rule; TP is the number of cases covered by the rule that have the class predicted by the rule; FP is the number of cases covered by the rule that have a different class; FN is the number of cases not covered by the rule that have the class predicted by the rule; and TN is the number of cases not covered by the rule that have a different class.
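The pheromone update described above can be sketched in a few lines, assuming the usual Ant-Miner rule quality (sensitivity multiplied by specificity). The function names and the toy pheromone table are hypothetical.

```python
from typing import Dict, Set, Tuple

Term = Tuple[str, str]  # (attribute, value)


def rule_quality(tp: int, fp: int, fn: int, tn: int) -> float:
    """Q = sensitivity * specificity, guarding against empty denominators."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (fp + tn) if (fp + tn) else 0.0
    return sensitivity * specificity


def update_pheromone(
    tau: Dict[Term, float], rule_terms: Set[Term], q: float
) -> Dict[Term, float]:
    """Reinforce the terms used in the rule, then normalise all values."""
    reinforced = {
        term: value + value * q if term in rule_terms else value
        for term, value in tau.items()
    }
    total = sum(reinforced.values())
    # Normalisation lowers the share of unused terms, simulating evaporation.
    return {term: value / total for term, value in reinforced.items()}


tau0 = {("sex", "M"): 0.25, ("sex", "F"): 0.25, ("age", "old"): 0.25, ("age", "young"): 0.25}
q_rule = rule_quality(tp=8, fp=2, fn=2, tn=8)
print(q_rule)
print(update_pheromone(tau0, {("sex", "M")}, q_rule))
```

After the update, the pheromone values still sum to 1, so a gain for the terms in the rule is always paid for by the terms that were not used.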
Once the first ant finishes this process, the second ant starts with a different IR (ζ) value. This process is repeated until all ants in the colony have constructed their rules, or until the convergence counter, which is incremented each time the current ant builds a rule identical to the one built by the previous ant, exceeds the predefined number of converged rules.
The best rule among all rules discovered by the ants in the current iteration is added to the list of discovered rules. The ACS-based parameter selection collects global feedback according to the best rule discovered in the current iteration using the proposed (11).
Here, the update is governed by the pheromone decay parameter and by the quality of the best discovered rule. A new iteration then starts with the same activities.
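The two feedback steps can be illustrated with a hedged sketch. The exact update formulas are not reproduced here, so the code assumes the standard ACS-style form in which the stored quality of the used threshold is blended with the new rule quality: new = (1 - rate) * old + rate * Q. The local step is applied after each ant, and the global step is applied once per iteration for the threshold behind the best rule. The names and the rate values are assumptions of this illustration.

```python
from typing import List


def reinforce(quality: List[float], index: int, rule_quality: float, rate: float) -> List[float]:
    """Assumed ACS-style update of one entry: (1 - rate) * old + rate * Q."""
    updated = list(quality)
    updated[index] = (1.0 - rate) * quality[index] + rate * rule_quality
    return updated


def local_feedback(quality: List[float], used_index: int, rule_quality: float,
                   evaporation: float = 0.1) -> List[float]:
    """Applied after each ant, for the threshold value that ant used."""
    return reinforce(quality, used_index, rule_quality, evaporation)


def global_feedback(quality: List[float], best_index: int, best_rule_quality: float,
                    decay: float = 0.1) -> List[float]:
    """Applied once per iteration, for the threshold behind the best rule."""
    return reinforce(quality, best_index, best_rule_quality, decay)


table = [0.5, 0.5, 0.5]                    # hypothetical feedback collection vector
table = local_feedback(table, 0, 1.0)      # ant using value 0 found a rule with Q = 1.0
table = local_feedback(table, 2, 0.2)      # ant using value 2 found a rule with Q = 0.2
table = global_feedback(table, 0, 1.0)     # best rule of the iteration came from value 0
print(table)
```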

RESEARCH METHOD
A ten-fold cross-validation method is used in our experiments. In this method, the dataset is split into ten (10) equally sized subsets, where nine (9) are used for the training process and the remaining subset is used in the testing stage. This process is repeated ten times, each time with a different subset held out for testing, to ensure that all subsets are used in both training and testing. Subsequently, the performance over all folds is averaged, and the standard deviation is computed. The ten-fold cross-validation method has also been adopted in other ant-mining classifier studies [14], [54].
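The splitting procedure above can be sketched with the standard library only. The sketch shuffles instance indices with a fixed seed and deals them into ten near-equal folds; the `evaluate` callable is a hypothetical placeholder for training and testing a classifier on one fold.

```python
import random
from statistics import mean, stdev
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(
    n: int, evaluate: Callable[[List[int], List[int]], float], k: int = 10, seed: int = 0
) -> Tuple[float, float]:
    """Return the mean and standard deviation of the per-fold scores."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k, seed)]
    return mean(scores), stdev(scores)


# Placeholder evaluation: reports the share of instances in the test fold.
print(cross_validate(148, lambda train, test: len(test) / 148))
```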
The evaluation in this study is performed on the basis of three criteria. The first is the classification accuracy of the discovered rule list, called the correct classification rate. This criterion is based on the correctly classified instances in the test data. In each fold, the classifier is built on the training subsets and evaluated on the held-out test subset, and the number of correctly classified instances determines the performance of the proposed classifier. The second is the size of the discovered rule list, which is measured by the number of terms per rule. Third, the average model size rank of all classifiers is used in our experiments. A lower average rank implies better algorithm performance.

We perform several experiments using 16 UCI benchmark datasets to test the performance of our hybrid classification algorithm. The benchmark datasets are selected in accordance with the ant-mining literature. These benchmarks are secondary datasets that have been chosen from UCI [55]. The datasets have different numbers of attributes, ranging between 4 and 60, and the attributes are of continuous and categorical types. The datasets also differ in the number of instances, which ranges from 148 to 8124, and in the number of class labels. The main descriptions of the experimental datasets are listed in Table 2.

Furthermore, we use different ant-mining classifiers to determine the effect of the proposed strategy on the classification accuracy and the classification model size against related and state-of-the-art classifiers in the literature. The applied classifiers are Ant-Miner and other related ant-mining algorithms, namely, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner. The CAnt-Miner classification algorithm is an extended version of the Ant-Miner classifier that integrates an entropy discretization method to handle continuous features during construction [39].
The TACO-Miner provides a threshold value on the basis of the information gain of each term. If the information associated with the selected term is below the threshold value, then the term is rejected. The threshold value is considered a pre-pruning criterion that determines the acceptance or rejection of terms [18], [19]. The Ant-Miner with a hybrid pruner presents a new rule pruner that combines the original Ant-Miner's rule pruner with a rule pruner based on two aspects, namely, information gain and the number of terms allowed in a rule, called the r value. The new method is applied to each rule that exceeds the number of terms allowed in a rule. The number of terms in the selected rule is then reduced until it is equal to the r value. This method is implemented on the basis of information gain and roulette wheel selection. Thereafter, the procedure of the Ant-Miner's original rule pruner is applied [20].  [56], [57]. The parameters used for the Ant-Miner classifiers are listed in Table 3.
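The average-rank criterion used in the evaluation can be illustrated with a small sketch. It ranks the classifiers on each dataset (rank 1 is best, ties share the mean of their positions) and averages the ranks per classifier. The classifier labels and scores below are made up for illustration and are not results from this study.

```python
from typing import Dict, List


def rank_row(values: List[float], higher_is_better: bool) -> List[float]:
    """Rank one dataset's scores; tied scores share the mean of their ranks."""
    ordered = sorted(values, reverse=higher_is_better)
    ranks = []
    for value in values:
        first_position = ordered.index(value) + 1
        ties = ordered.count(value)
        ranks.append(first_position + (ties - 1) / 2)
    return ranks


def average_ranks(scores: Dict[str, List[float]], higher_is_better: bool = True) -> Dict[str, float]:
    """Average the per-dataset ranks of every classifier."""
    names = list(scores)
    n_datasets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        row = [scores[name][d] for name in names]
        for name, rank in zip(names, rank_row(row, higher_is_better)):
            totals[name] += rank
    return {name: total / n_datasets for name, total in totals.items()}


# Made-up accuracies of three hypothetical classifiers on three datasets.
toy_accuracy = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.8, 0.9], "C": [0.7, 0.6, 0.8]}
print(average_ranks(toy_accuracy, higher_is_better=True))
```

For model size, where fewer terms are better, the same function can be called with `higher_is_better=False`.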

RESULTS AND DISCUSSION
The results of classification accuracy are depicted in Figure 7, which shows that the proposed ACS-AntMiner obtained the best result in 12 out of 16 datasets. The proposed algorithm achieves the second-best result in two datasets, namely, Lymphography and Segment. Our classifier is also competitive in two further datasets, namely, Credit-g and Mushroom, where the second-best results have been obtained. The CAnt-Miner classifier obtained the best result in three (3) datasets. Figure 8 summarizes the best classification accuracies for all algorithms.
With regard to the size of the classification model, rule-based classification complexity refers to the number of terms included in a given classification model. As shown in Figure 9 and Figure 10, the TACO-Miner obtains the smallest classification model in ten (10) out of 16 datasets. Our proposed algorithm achieves the smallest size in six (6) out of 16 datasets. In addition, our proposed algorithm obtains the second-smallest size in five (5) datasets, namely, Balance Scale, Heart (Cleveland), Heart (Statlog), Sonar, and Tic-tac-toe. Such variation across datasets is expected, because the no-free-lunch theorem states that no algorithm performs best on all datasets [12], [58].
In summary, the results achieved by the ACS-AntMiner are good because its classification accuracy is the best in the majority of the datasets. In terms of classification model size, the ACS-AntMiner performs only slightly worse than the TACO-Miner and significantly better than the remaining classifiers. Table 4 shows the results averaged over all the datasets, with the best performances shown in bold. This table includes the average classification accuracy and the number of rule terms. The proposed classifier exhibits better classification accuracy than the other classifiers. The TACO-Miner obtains the best result for the number of terms, and the ACS-AntMiner achieves the second-best average number of terms over all datasets. We plot the average accuracy rank against the average model size rank over the different datasets (Figure 11). A lower average rank indicates a better performance of the algorithms. Figure 11 shows that the results obtained by the ACS-AntMiner dominate those of the other classifiers when both the classification accuracy and model size ranks are considered. The ACS-AntMiner performs only slightly worse than TACO-Miner in terms of model size, but it is significantly better than TACO-Miner and the other classifiers in terms of classification accuracy. Therefore, the ACS-AntMiner is the classifier that best balances classification accuracy and model size. This balance aims to overcome the problems of overfitting and underfitting, considering that a less accurate rule that covers numerous training instances is better than an accurate rule that covers only one instance. Preferring such rules helps to avoid the overfitting problem and increases the generalization of the classification model.
Our classifier aims to find the appropriate pruning criterion for each dataset by considering the feedback (local and global) collected in the learning process on the basis of the classification accuracy, while preventing the learning process from constructing a classification model that is too simple to describe a given set of data. Understanding these two phenomena allows the development of a classification algorithm that balances the two extremes. Therefore, the ACS-AntMiner outperforms other classifiers when the classification accuracy and model size are balanced.

Figure 11. Average rank test

CONCLUSION
In this study, a novel ACO-based rule classification algorithm, called ACS-AntMiner, is presented, which balances high classification accuracy against low model size. A new parameter, called IR (ζ), is introduced to control the relative importance of the terms included during rule construction. The ACS optimization algorithm is then designed to control this parameter online using three main features. The first feature is the parameter selection probability, which considers the balance between exploration and exploitation. This method can effectively guide our classifier in selecting a better parameter value, which in turn determines the terms included in the constructed rule. The second feature is the local feedback collection procedure, which monitors the effectiveness of the selected parameter value and thereby reinforces the performance of the classification algorithm. The last one is the global feedback collection, which keeps track of the best parameter value used in the current iteration. These new features are implemented in our algorithm, and 16 datasets are used to evaluate the performance of the ACS-AntMiner classifier. The comparison with the four most closely related ant-mining classifiers, namely, Ant-Miner, CAnt-Miner, TACO-Miner, and Ant-Miner with a hybrid pruner, shows that our proposed classifier performs best in classification accuracy and achieves favorable simplicity. This is due to the use of the ACS algorithm to optimize the IR (ζ) value. Several potential directions for future research on our classifier have been identified. One direction could draw inspiration from other swarm-intelligence optimization algorithms, namely, PSO, ABC, BA, and FA, to optimize the IR (ζ) parameter online, which may guide ants to find enhanced solutions. Another research direction is to adapt techniques that automatically adjust the parameters of search methods on-the-fly in the field of stochastic local search algorithms.