Important Features of CICIDS-2017 Dataset For Anomaly Detection in High Dimension and Imbalanced Class Dataset

Kurniabudi Kurniabudi, Deris Stiawan, Darmawijoyo Darmawijoyo, Mohd Yazid Bin Idris, Bedine Kerim, Rahmat Budiarto

Abstract


The growth in internet traffic volume presents new challenges for anomaly detection, one of which is high data dimensionality. Feature selection has been shown to address high dimensionality by retaining only the relevant features. However, severe class imbalance complicates feature selection. This study proposes two feature selection approaches that can identify the most relevant features in a highly class-imbalanced dataset. CICIDS-2017 is used because it is a reliable dataset that exhibits severe class imbalance. The study also experiments with the Information Gain feature selection technique on the class-imbalanced dataset. For validation, the Random Forest classification algorithm is used because of its ability to handle multi-class data. The experimental results show that the proposed approaches perform remarkably well and surpass state-of-the-art methods.
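To make the Information Gain criterion concrete, the following is a minimal sketch of how features can be ranked by IG(Y; X) = H(Y) - H(Y | X) on a small class-imbalanced sample. It is not the authors' implementation: the abstract does not state their tooling, the feature names `flag_a` and `flag_b` and the toy data are hypothetical, features are assumed to be discrete, and the Random Forest validation step is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy sample: 8 benign flows and 2 attack flows (imbalanced).
labels = ["benign"] * 8 + ["attack"] * 2
columns = {
    "flag_a": [0] * 8 + [1] * 2,  # separates the two classes perfectly
    "flag_b": [0, 1] * 5,         # distributed identically in both classes
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this sketch a feature that separates the minority class scores the full label entropy, while a feature that carries no class information scores zero. With a heavily skewed class distribution the label entropy itself is small, which is one reason class imbalance can make Information Gain scores harder to compare.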

Keywords


feature selection, information gain, random forest, high class imbalance, CICIDS-2017


Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272

This work is licensed under a Creative Commons Attribution 4.0 International License.
