A Cost Sensitive SVM and Neural Network Ensemble Model for Breast Cancer Classification

ABSTRACT


INTRODUCTION
Cancer is considered among the deadliest diseases affecting humankind. Cancers, which are abnormal growths, can occur in any part of the body. Certain cancers, such as breast cancer and cervical cancer, are prevalent specifically in women, although breast cancer can also occur, with low likelihood, in men. Various studies show that they are the leading diseases in most countries, such as India, with breast cancer showing an upward trend in incidence [1]. Breast cancer manifests chiefly as lumps in the breast, which arise mainly from the uncontrolled growth of cells in the lining, lobules or ducts of the breast tissue. According to the statistics of the World Health Organization, it is the most incident and prevalent cancer in women worldwide [2]. Incidence and mortality rates vary by country, region and ethnicity, and the risk of contracting the disease increases with age, so older women are the most vulnerable. In 2020, 2.3 million women were diagnosed with breast cancer and 685,000 deaths were reported globally [3]. The survival rate for at least 5 years after diagnosis also varies between highly developed and less developed countries: it ranges from more than 90% in high-income countries to 66% in India and 40% in South Africa. Breast cancer has surpassed all other cancers prevailing in women in India [4]. The incidence of the disease in India is reported as 25.8 per 100,000 women and the mortality as 12.7 per 100,000 women [4], with the highest incidence recorded in the state of Kerala. India is also home to the most aggressive form of breast cancer, Triple Negative Breast Cancer [5]. The Indian Council of Medical Research also reports that the death rate due to various cancers is relatively higher in Indian women than in their male counterparts [6].
Global studies show that one third of the global breast cancer burden is contributed collectively by the US, India and China [7]. Morbidity and mortality are higher once metastasis of the disease sets in, and hence rapid and early diagnosis is the key strategy for better prognosis and survival [4]. With the advent of state-of-the-art medical facilities the mortality rate has been much reduced, yet more techniques need to be utilized to aid medical professionals in the rapid identification and treatment of the disease. Besides the medical modalities, additional techniques can be used to assist medical practitioners in the diagnosis and classification of the disease [8]. Machine learning techniques have been applied for the past two decades to classification tasks in several domains [9], specifically the medical domain, and to the classification of several diseases such as breast cancer [10], [11]. Many methods, such as decision trees [8], k-NN [12] and logistic regression [13], have been applied to classification problems. Based on the mode of learning, machine learning techniques are broadly divided into supervised and unsupervised techniques. Classification falls under supervised learning and comprises a plethora of techniques. Two other categories of supervised classifiers usable in the classification process are Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs). The literature shows that applying these techniques to disease classification provides promising results [14]. Neural networks need large amounts of data for training while SVMs do not. In terms of accuracy, studies show that training with SVMs yields smaller standard errors than training with neural networks. Hence, the two can be considered complementary methods, and this idea is used for the breast cancer classification problem in this study.
The task of the study is to employ an ensemble model comprising an SVM and a neural network (NN) for breast cancer classification. The two classifiers, SVM and ANN, are combined into a voting ensemble, and the resultant model is seen to produce better performance with fewer misclassifications than the individual models. Ensembling or the use of hybrid models can be a way to improve on the performance of individual classifiers. Even though studies in the literature show good performance in disease classification by SVMs and ANNs, better methods are needed to address the issues seen. ANNs need a lot of training time, so methods that help to reduce training time as well as boost performance are necessary.
This section provides an overview of some related works from recent scientific articles. SVM is seen to be one of the most popular and robust classifiers [15], and the artificial neural network is one of the most frequently used classifiers, besides also being very robust. SVMs were used by [10] to identify regions of interest in mammograms, with 80% accuracy; the authors concluded that SVMs were very accurate in classifying breast cancer. The authors of [16] used various classifiers such as k-NN, random forest, ANN and SVM for breast cancer detection from histopathological images, and SVM was seen to provide an accuracy of 90%. Besides simple SVMs, ensembles of SVMs can be used. In a comparative study of SVMs and SVM ensembles, [17] concluded that linear-kernel SVM ensembles with bagging and RBF-kernel SVM ensembles with boosting are better choices for small-scale datasets, with feature selection in the data pre-processing stage, whereas for large-scale datasets RBF-kernel SVM ensembles based on boosting were seen to perform better. [18] proposed a cost-sensitive SVM ensemble with feature selection, which provided promising results with an accuracy of 98%. [19] used an SVM with RBF kernel and random forests to evaluate breast cancer classification performance with the Boruta feature selection technique, and SVMs were seen to outperform random forests with an accuracy of 95%. [20] compared NEAT and backpropagation ANN and obtained an accuracy of 95.8% for breast cancer classification. [21] used scaled conjugate backpropagation ANN and obtained an accuracy of 97.47% for breast cancer classification. Similarly, [22] used conjugate gradient backpropagation for breast cancer classification and obtained an accuracy of 97.6%. In all these studies data imbalance was not considered, although SVMs and NNs do not handle class imbalance well.
In the proposed work, data imbalance is also considered, in addition to parameter optimization and feature selection.
The paper is organized as follows. Section 1 presents an introduction to the topic along with the available literature, while Section 2 deals with the methods applied and materials used. The results and discussion are given in Section 3, and Section 4 gives the conclusion, followed by the references.

RESEARCH METHOD
The aim of this study is to produce a model with better performance and fewer misclassified instances. The classifiers used to build the proposed ensemble model are Support Vector Machines with a polynomial kernel and Neural Networks trained with the Gradient Descent with Momentum backpropagation method.

Dataset Used
The Wisconsin breast cancer dataset is used in this study. The dataset has 699 instances and 11 attributes. The first attribute, the id number, holds no relevance to the work and is therefore discarded. The last attribute is the class or target variable, which categorizes the instances into two classes, benign and malignant tumours. The remaining 9 attributes are taken for the study. 16 instances were seen to have missing values, and hence they were also discarded.
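The preprocessing described above can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: it assumes a comma-separated layout of id, nine integer features and class, assumes that missing values are marked with "?", and assumes that class codes 4 and 2 denote malignant and benign. The function name load_wbc and the three sample rows are only illustrative.

```python
import csv
import io

def load_wbc(text):
    """Parse rows of `id, 9 features, class`; drop the id and incomplete rows."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        values = [v.strip() for v in row[1:]]   # discard the id column
        if "?" in values:                       # assumed missing-value marker
            continue
        features.append([int(v) for v in values[:-1]])
        # Assumed class coding: 4 = malignant (positive), 2 = benign (negative).
        labels.append(1 if values[-1] == "4" else 0)
    return features, labels

# Illustrative rows in the assumed layout; the second row has a missing value.
sample = """1000025,5,1,1,1,2,1,3,1,1,2
1057013,8,4,5,1,2,?,7,3,1,4
1017122,8,10,10,8,7,10,9,7,1,4
"""
X, y = load_wbc(sample)
```

Applied to the full file under these assumptions, the same filter would leave 683 of the 699 instances, each with 9 features.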

SVM
Support Vector Machines are supervised machine learning methods proposed by Vapnik, which now play a major role in applications, primarily in classification and regression. An advantage of SVMs is that they are memory efficient, as they use only a subset of the training data, denoted as support vectors. The literature suggests that they are good classifiers for binary classification problems. To function well on non-linear problems they make use of kernel functions. There are many categories of kernel functions, such as RBF, linear, sigmoid and polynomial. In this study the polynomial kernel is used, since an SVM with this kernel was seen to outperform various other kernels used for breast cancer classification [23].
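As a brief illustration of the polynomial kernel, the sketch below evaluates the commonly used form K(x, z) = (gamma * <x, z> + coef0)^degree. The degree and the offset coef0 are not stated in this paper, so the defaults here are assumptions chosen only for the example.

```python
def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree.

    degree and coef0 are illustrative assumptions, not values from the study.
    """
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

# (1*3 + 2*4)^2 = 11^2 = 121
k = polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=2)
```

The kernel lets the SVM separate classes with a polynomial decision surface while still solving a linear problem in the implicit feature space.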

ANN
An artificial neural network functions like the human brain. Applications implementing ANNs have increased, and ANNs have become a most active research area with extensions into deep learning [24]. ANNs have achieved good performance in the classification and diagnosis of breast cancer at an early stage. The basic structure of an ANN model consists of 3 layers: the input, hidden and output layers. The layers are interconnected through neurons, and each neuron contains an activation function that helps improve the ability to implement and solve nonlinear problems. The working of the model commences at the input layer, passes through the hidden layers and ends at the output layer, where the final classification result is produced. The number of iterations involved varies with the structure and nature of the problem.
Backpropagation is a family of methods that efficiently train artificial neural networks by following a gradient descent approach, and it is used in various domains. These methods are considered fundamental building blocks of ANNs. There are various categories of backpropagation algorithms. Backpropagation neural networks (BPNNs) require a large training time, which slows the working of ANNs, and researchers have developed different methods that produce better outcomes. In this study the Gradient Descent with Momentum method is chosen, which helps the algorithm and the ANN to work faster. This method adds a momentum factor to the gradient descent method. The momentum factor allows the network to respond both to the local gradient and to recent trends in the error surface. The advantage of this method is that it provides faster convergence and helps the network to disregard small features in the error surface, which also prevents the network from getting trapped in a shallow local minimum. The method has a few parameters, and identifying their optimal values is of great significance to the performance of the artificial neural network [25].
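The momentum update can be illustrated with a small sketch. It uses the common formulation v <- momentum * v - lr * gradient followed by w <- w + v; the learning rate, the momentum factor and the toy quadratic objective are assumptions for the example and are not the settings used in the study.

```python
def momentum_step(weights, velocity, grads, lr=0.1, momentum=0.9):
    """One gradient-descent-with-momentum update (lr and momentum are assumed values)."""
    # The velocity accumulates past gradients, smoothing out small bumps in the error surface.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy objective f(w) = w0^2 + w1^2, whose gradient is 2*w and whose minimum is at the origin.
w, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(200):
    grads = [2 * wi for wi in w]
    w, v = momentum_step(w, v, grads)
```

In a neural network the gradients would come from backpropagation rather than from a closed-form expression, but the weight update itself has the same shape.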

Genetic Search
The Genetic Search algorithm is a metaheuristic search algorithm that belongs to the larger class of evolutionary algorithms. This optimization technique is based on Darwin's theory of natural evolution. It mimics three major biological processes: natural selection, gene crossover and mutation. It maintains a set of chromosomes, which are considered equivalent to the potential solutions of the investigated problem. The concept of the GA lies in arriving at the optimal solution through a few steps of evolution over a few generations. Genetic algorithms are seen to be good optimizers for ANNs [26], and [27] suggested that implementing feature selection with ANNs helps to enhance their performance.
The Genetic Search process begins with initialization, in which an initial population consisting of randomly generated bit strings representing candidate solutions is created. The fitness function, or objective function, is a significant element that has to be defined. The objective function used here is the difference between the predicted and actual values. At each step, a pool of parents is chosen from the current population based on the calculated fitness value of each individual. This is done by the selection mechanism; in this work the roulette wheel selection method is applied. The selected parents have a greater probability of passing on genetic material to subsequent generations by way of crossover and mutation. From the selected parents a child population is created, and this constitutes the next generation.
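One generation of the process described above can be sketched as follows. This is an illustrative sketch only: each individual is a 9-bit feature mask, the fitness is a toy stand-in (roulette wheel selection needs a non-negative score where higher is better, so an error-based objective would first have to be converted, for example to 1 / (1 + error)), and single-point crossover and the mutation rate are assumptions.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point edge cases

def crossover(a, b, rng):
    """Single-point crossover (an assumed operator)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def next_generation(population, fitness_fn, rng, mutation_rate=0.05):
    fitnesses = [fitness_fn(ind) for ind in population]
    children = []
    for _ in range(len(population)):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        children.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
    return children

def toy_fitness(mask):
    # Hypothetical stand-in; a real run would score a classifier trained on the masked features.
    return 1 + sum(mask)

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(9)] for _ in range(6)]
children = next_generation(population, toy_fitness, rng)
```

Repeating next_generation for a fixed number of generations and keeping the best mask seen so far gives the selected feature subset.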

Data Imbalance
A major challenge with machine learning datasets in several domains is the imbalance of data. Data imbalance implies that the classes in the dataset do not have a uniform distribution of instances. The number of instances in one class, usually the positive class (here the malignant class, which represents the disease), is much smaller than in the other class, the negative class (here the class without breast cancer). Data imbalance can be problematic [28] and has to be resolved. Many techniques can be applied to solve it, and they fall broadly into data-based methods and algorithm-based methods. In this study a cost-sensitive learning approach, which comes under the category of data sampling, is implemented. It uses a cost matrix. Each instance is given a misclassification cost, and for each incorrect classification the classifier is penalized 'n' times the misclassification cost. The objective lies in minimizing the total misclassification cost. SVMs are seen to work well with various datasets, but their performance deteriorates when the data is imbalanced [29]. Hence data balancing is necessary for SVMs. Here a cost-sensitive SVM using a cost matrix that penalizes twice for a misclassification is proposed to deal with data imbalance.
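The effect of the cost matrix can be illustrated with a short sketch. It assumes that a missed malignant case (false negative) costs 2, a false alarm (false positive) costs 1 and correct predictions cost nothing; the exact entries of the paper's cost matrix are not reproduced here, so these values are an assumption consistent with the "penalized twice" description.

```python
# Assumed cost matrix, indexed by (actual, predicted); 1 = malignant, 0 = benign.
COST = {
    (0, 0): 0.0,  # true negative
    (0, 1): 1.0,  # false positive
    (1, 0): 2.0,  # false negative, penalized twice
    (1, 1): 0.0,  # true positive
}

def total_cost(actual, predicted, cost=COST):
    """Sum the misclassification cost over all instances."""
    return sum(cost[(a, p)] for a, p in zip(actual, predicted))

# One false negative (cost 2) and one false positive (cost 1) give a total of 3.
example_cost = total_cost([1, 1, 0, 0], [0, 1, 1, 0])
```

A cost-sensitive learner prefers the classifier that minimizes this total rather than the plain error count, which shifts the decision boundary in favour of the minority class.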

Working of Proposed Model
The proposed model works in three parts. Initially, data preprocessing techniques are applied and the dataset is partitioned into training and testing datasets. In the second part, parameter optimization, feature selection and data sampling are carried out. Resampling techniques are applied to balance the training data: a combination of oversampling of the minority class and undersampling of the majority class is applied to the dataset, and the balanced data obtained is used with the NN. Feature selection and extraction of the most relevant features are also done by applying the genetic search algorithm, which selects the best and most relevant set of features. The reduced feature set is provided to the artificial neural network, which uses the gradient descent with momentum backpropagation algorithm. For the SVM, cost-sensitive learning is applied and a cost matrix as in Table 1 is used. While training the SVM, the classifier is penalized twice for every misclassification of the minority class, which in this case is the positive class. This is implemented using the cost matrix. The working of the model is depicted in Figure 1. The final part is the voting ensemble of classifiers, which produces the majority label of the predictions as the output of the model. Better results are obtained when the classifiers of a voting ensemble are independent and use different methods for training [30]. The literature suggests that parameter optimization helps classification performance, and hence it is incorporated. The parameters for the SVM polynomial kernel are selected using the grid search technique. The grid values were varied from 0.001 to 100, and the best SVM parameters C and gamma were each found to be 1. Metaheuristic algorithms are seen to help achieve better performance during the classification process [31]; hence, GA is used for feature selection.
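The combined resampling step can be sketched as below. The paper does not state the target class size, so this sketch assumes that both classes are brought to the mean of the two class sizes: the larger class is undersampled without replacement and the smaller class is oversampled by duplicating randomly chosen instances. The function name rebalance is illustrative.

```python
import random

def rebalance(rows, labels, rng):
    """Undersample larger classes and oversample smaller ones to a common size."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Assumed target: the mean class size (not specified in the study).
    target = sum(len(members) for members in by_class.values()) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)  # undersample without replacement
        else:
            extra = rng.choices(members, k=target - len(members))  # duplicate at random
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels

rows = list(range(10))
labels = [0] * 8 + [1] * 2          # 8 majority and 2 minority instances
balanced_rows, balanced_labels = rebalance(rows, labels, random.Random(1))
```

Resampling would be applied to the training partition only, so that the test partition keeps the original class distribution.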
To avoid overestimation of results, an 80-20 partition of the data is used for training and testing. The parameters and their values used in the proposed model are shown in Table 1.

Figure 1. Working of Model
The studies in [34], [35], [36] also suggest using optimized SVM parameters as well as hybrid models for better SVM performance. Using the best optimized parameters is seen to improve performance. Voting is of two types, soft voting and hard voting. Here hard voting is selected, in which the class label that receives the highest number of votes is chosen as the output of the ensemble. The SVM helps to reduce the overall time, since ANNs take more time to train.
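Hard voting can be sketched in a few lines. The tie-breaking rule is an assumption of this sketch: with an even number of classifiers a tie is possible, and here the tied label predicted by the earliest classifier in the list is kept. The paper does not state how ties are resolved.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority label per instance.

    `predictions` holds one list of labels per classifier, all of equal length.
    Ties go to the earliest classifier among the tied labels (assumed rule).
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        voted.append(next(label for label in labels if counts[label] == best))
    return voted

# Three hypothetical classifiers voting on three instances.
ensemble_labels = hard_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

Soft voting would instead average the predicted class probabilities and pick the class with the highest mean.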

RESULTS AND DISCUSSION
The prime objective of the study is to produce a model with better classification accuracy and fewer misclassified instances. The classification quality of the experiments is measured using performance measures such as ROC AUC, P-R AUC, Kappa statistic, F-measure, recall, precision, MCC and FPR. The confusion matrix is computed from the true negatives, true positives, false positives and false negatives. The time taken to build the model was seen to be comparatively less than that of the individual neural network. The combined model of SVM and NN helps to counterbalance the issues that each individual model has. The performance measures of the proposed model are given in Table 2. The ROC curves lie close to the top-left corner of the axes, from which it can be perceived that the classifier shows superior performance. The area under the ROC curve measures the overall performance of a binary classifier [32]. Hence, the significant findings of the study are that the proposed model has shown high accuracy and that the misclassification of the classes has reduced significantly. The ROC AUC obtained was 1, which reflects this performance. High recall and precision rates were shown by the model. The F-measure reveals the sensitivity of the methods employed. The proposed model is compared with individual SVM and NN models [33]. Here the accuracy of the models varied within the range of 88%-92%. [34] highlighted the fact that SVM ensembles performed better than individual SVM models. The proposed model supports this, as its performance was better than that of individual SVM models.
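Several of the measures listed above follow directly from the four confusion matrix counts, as the sketch below shows using the standard definitions. The counts in the example are hypothetical and are not results from this study; the sketch also omits guards against zero denominators.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion matrix counts."""
    n = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    observed = (tp + tn) / n  # observed agreement (accuracy)
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2  # chance agreement
    kappa = (observed - expected) / (1 - expected)
    return {
        "accuracy": observed,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fpr": fpr,
        "mcc": mcc,
        "kappa": kappa,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

ROC AUC and P-R AUC are not included because they require the ranked classifier scores rather than a single confusion matrix.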

CONCLUSION
The study proposed an ensemble model for breast cancer classification using a voting ensemble of SVM and ANN, along with optimization, feature selection and sampling techniques. The model was seen to give superior performance when compared to a few other models and produced an accuracy of 99.12%. Even though the model produced a high accuracy, the time taken by the ensemble model is seen to be high, and this has to be addressed. Better methods are needed to speed up the training of ANNs. In addition, future work will test the performance and consistency of the model across various datasets.

ACKNOWLEDGMENTS
The author thanks Dr. William H. Wolberg of the University of Wisconsin Hospitals for the dataset used in the study.