Important Features of CICIDS-2017 Dataset For Anomaly Detection in High Dimension and Imbalanced Class Dataset
Abstract
The growth in internet traffic volume presents a new issue in anomaly detection, one of which is the high data dimension. The feature selection technique has been proven to be able to solve the problem of high data dimension by producing relevant features. On the other hand, high-class imbalance is a problem in feature selection. In this study, two feature selection approaches are proposed that are able to produce the most ideal features in the high-class imbalanced dataset. CICIDS-2017 is a reliable dataset that has a problem in high-class imbalance, therefore it is used in this study. Furthermore, this study performs experiments in Information Gain feature selection technique on the imbalance class datasaet. For validation, the Random Forest classification algorithm is used, because of its ability to handle multi-class data. The experimental results show that the proposed approaches have a very surprising performance, and surpass the state-of-the-art methods.
Keywords
feature selection, information gain, random forest, high class imbalance, CICIDS-2017
Full Text:
PDF
Refbacks
- There are currently no refbacks.
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272
This work is licensed under a Creative Commons Attribution 4.0 International License.