Enhanced Deep Learning Intrusion Detection in IoT Heterogeneous Network with Feature Extraction

Network heterogeneity is one of the challenges that an Internet of Things Intrusion Detection System (IoT IDS) must overcome. The difficulty stems largely from the variety of devices, protocols, and services, which makes the network complex and difficult to monitor. Deep learning is a class of algorithms that can classify data with high accuracy. This research work incorporates deep learning into an IDS for heterogeneous IoT networks. There are two concerns with deep-learning-based IDS in heterogeneous IoT networks: limited resources and excessive training time. Thus, this paper uses Principal Component Analysis (PCA) as a feature extraction method to reduce the data dimensions so that resource usage and training time are significantly reduced. The evaluation results show that PCA successfully reduces the resource usage and training time of the proposed deep learning IDS in a heterogeneous network environment. Experimental results show that the proposed IDS achieves an overall accuracy above 99%.


INTRODUCTION
The growth of complex and diverse (heterogeneous) traffic, together with the wide distribution of devices, makes Internet of Things (IoT) security more complex and challenging. In addition, attack detection in an IoT environment differs from detection on conventional networks because of characteristics such as resource limitations, low latency, distribution, scalability, and mobility [1]. It is therefore necessary to design an IoT IDS that can detect attacks on heterogeneous networks more precisely. Deep learning (DL) is a potential candidate solution because it can identify small changes in a complex system. Diro and Chilamkurti [2] state that traditional machine learning cannot detect complex intrusions because its training process fails to identify small changes in attack scenarios; more specifically, traditional machine learning cannot extract the hidden features of a dataset. In fact, attacks are reported to evolve by 99%, with only 1% retaining a similar concept. The success of deep learning in identifying small changes in data, such as pixel changes in image recognition, shows its reliability.
There are two concerns to note for an IoT Intrusion Detection System (IDS) using deep learning: resource usage and excessive training time. Because IoT devices have limited resources, an appropriate and optimized IDS method must be designed that consumes fewer resources without sacrificing detection accuracy. Furthermore, because deep learning takes a relatively long time to train, a mechanism is needed to reduce the time required to train the IoT IDS on heterogeneous networks [5]. Yan and Han [3] propose deep learning as a solution to intrusion detection challenges because of its outstanding performance in handling large-scale complex data. Their work uses a stacked autoencoder model to perform unsupervised dimension reduction of intrusion detection samples. As a result, feature extraction reduces a high-dimensional dataset to a low-dimensional one, which in turn improves deep learning performance.
Sharipuddin et al. [4] and Zyad et al. [5] have discussed hybrid methods that combine Principal Component Analysis (PCA) with deep learning to improve the accuracy and detection time of IDS. The proposed methods reduce the dimensions of the training data, so the training process of the deep learning model becomes faster without requiring high resources.
This research proposes a PCA-based feature extraction method for IoT IDS in heterogeneous networks. The proposed method is then combined with a deep learning technique to improve the performance of the IoT IDS in heterogeneous networks. PCA is used to reduce the dimensions of heterogeneous data without losing the characteristics of the original data. In turn, the feature extraction reduces the resource usage and training time of the deep learning model.
This paper is organized into five sections. Section 2 provides background and related work on heterogeneous networks, IDS for IoT, PCA, and deep learning. Section 3 presents the proposed method. Section 4 discusses the experiments and results. Finally, Section 5 concludes with the findings and suggestions for future work.

RELATED WORKS
IDS with Deep Learning. DL has been implemented in many fields, one of which is network security, specifically IDS. The work in [5] implements four key DL models used in the IDS literature and evaluates them on four datasets: CICIDS 2017, CICIDS 2018, KDD'99, and NSL-KDD. The DL models are chosen from the top three types of the taxonomy and represent different approaches to building DL models. The first is the LSTM network, which classifies sequences of flows. The second is the feed-forward neural network, which classifies flow instances. The third comprises deep belief networks and autoencoders, which are trained in a semi-supervised manner with both unlabeled and labeled data. The comparison aims to address the difficulty of comparing models based on results reported in different research works, which use different datasets and evaluation metrics.
The work in [6] proposes a Deep Neural Network (DNN) for classifying attacks in IoT networks. An IDS method can only be developed if an effective dataset is available for the training process. The performance of the DNN in classifying attacks has been evaluated using several datasets, such as KDD-Cup'99, NSL-KDD, and UNSW-NB15. The results show that the accuracy of the proposed DNN method is 90%. Alrawashdeh and Purdy [7] have proposed deep learning with a DNN to improve the IoT IDS and compare it with other algorithms.
Principal Component Analysis. Feature selection is important for the processes in an IDS, because the accuracy of an IDS changes when it is given different input features. IoT networks carry a large amount of traffic with high-dimensional features, which affects the classification results [8]. In IoT networks, an IDS requires feature extraction (FE) to reduce its computation [9]. FE has been proposed to derive features from the existing features of a dataset and transform them into a smaller dimension in order to reduce training time and improve accuracy [10]. Several previous studies have used PCA. Liu et al. [11] have built a detection system that monitors online computing for misuse detection and anomaly detection. In this work, PCA is applied to reduce the dimension of the dataset and is combined with an Artificial Neural Network (ANN); the combination is known as PCANN. Experiments on the DARPA dataset show an accuracy of up to 98.58%.
Bharti & Singh [12] have applied a hybrid two-stage method to reduce high data dimensions. The first stage selects several important features from the dataset, producing a list of sub-datasets. PCA is then applied to reduce the overall dimensions of the original dataset without losing much information.
In the research works by Hamid et al. [13], Taguchi [14], Taguchi [15], Taguchi & Murakami [16], and Thaseen & Kumar [17], FE using Principal Component Analysis (PCA) is also applied to reduce the dataset dimension. It is applied not only to detection systems but also to other domains. In addition, Kuang et al. [18] propose a Support Vector Machine (SVM) model that combines PCA with a genetic algorithm (GA) for IDS. In the proposed method, a hybrid SVM is used to classify an activity as an attack or normal. KPCA is used as a preprocessor for the SVM with the aim of reducing the feature dimensions and training time. Its function is to reduce the noise caused by differing features and to improve the performance of the SVM and its kernel function (N-RBF). The experimental results show that the proposed method achieves higher accuracy and faster detection.
Figure 1 shows the architecture of the proposed method for reducing the resource usage and training time of an IDS with deep learning in a heterogeneous IoT network. It consists of two phases, namely a training phase and a testing phase. Before these phases, preprocessing is performed, which includes dividing the dataset into two portions, one for training and one for testing. Next, the dimension of the dataset is reduced without losing its characteristics, followed by the design of the IDS with a deep learning model for the IoT environment. Lastly, the IDS-Deep Learning model is evaluated.
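The division of the dataset into training and testing portions can be illustrated with a minimal sketch. It assumes a random shuffle and uses the proportions reported in the experiments (50% training, 10% validation, 40% testing); the helper name `split_dataset` and the shuffling strategy are assumptions for illustration, not the authors' code.

```python
import numpy as np

def split_dataset(X, y, train=0.5, val=0.1, seed=0):
    """Shuffle the samples and split them into train/validation/test parts.

    Whatever is left after the train and validation fractions goes to test.
    The random shuffle is an assumption; the paper does not state how the
    packets were ordered before splitting.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(X)
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Synthetic stand-in for the packet features: 1,000 samples, 96 features.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 96))
y_demo = rng.integers(0, 2, size=1000)  # 1 = attack, 0 = normal
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(X_demo, y_demo)
print(train_X.shape, val_X.shape, test_X.shape)
```

Fixing the seed keeps the split reproducible across the repeated runs of an experiment.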

Dataset and Preprocessing
Two initial preparation steps are carried out: dataset creation and preprocessing of the created dataset. This work creates its own heterogeneous IoT dataset that covers several devices, sensors, transport media (wired and wireless), services, and protocols. The dataset thus represents a heterogeneous IoT network in a real environment. The hardware includes sensors (soil moisture, MQ2, Funduino, DHT22, etc.) and devices acting as nodes (PC, Raspy, and Arduino). The middleware includes XBee, w1d D1, and WIFI, which connect the middleware components to one another and to the server, as shown in Figure 2. The type of attack is Denial of Service.
Next, the dataset is preprocessed as depicted in Figure 3. This stage collects the attributes (which then become features) used to identify patterns in the traffic packets. It is difficult for humans to identify and find important information (features) in the Pcap files of the dataset, because the packets have different structures and hidden layers depending on the protocol. The preprocessing converts the WIFI dataset into 96 features and the XBee dataset into 64 features. The details of the dataset features are shown in pseudocode in the functions def extract_xbee() for Xbee and def extract_wifi() for WIFI.

Feature Extraction
This stage reduces the number of dimensions of the dataset. In this work, PCA is used as the FE method. Figure 4 shows the pseudocode of the PCA designed for the IoT IDS-Deep Learning. The performance of PCA as an extraction method is evaluated using two experimental scenarios. First, the WIFI dataset, which consists of 96 features, is converted into 5 and 8 features. Second, the Xbee dataset, which consists of 64 features, is also converted into 5 and 8 features.
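To show what the reduction from 96 (or 64) features to 5 and 8 components involves, the following is a minimal from-scratch sketch of PCA using NumPy. It is not the pseudocode of Figure 4, and the paper's implementation relies on Scikit-learn; the function names `fit_pca` and `apply_pca` and the synthetic data are assumptions for illustration only. The projection is fitted on the training portion and then reused for the test portion so that no test information leaks into the transform.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn a PCA projection from the training data.

    Returns the feature means, the top principal directions (one per row),
    and the fraction of total variance explained by each kept component.
    """
    mean = X_train.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # already ordered by decreasing singular value (i.e. decreasing variance).
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained_ratio[:n_components]

def apply_pca(X, mean, components):
    """Center X with the training means and project it onto the components."""
    return (X - mean) @ components.T

# Synthetic stand-in for the 96-feature WIFI data (not the paper's dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 96))
X_test = rng.normal(size=(400, 96))

for k in (5, 8):  # the two target dimensions used in the experiments
    mean, components, ratio = fit_pca(X_train, k)
    Z_train = apply_pca(X_train, mean, components)
    Z_test = apply_pca(X_test, mean, components)
    print(k, Z_train.shape, Z_test.shape, round(float(ratio.sum()), 3))
```

The summed explained-variance ratio gives a quick check of how much of the original variance the 5 or 8 retained components preserve on a given dataset.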

Proposed Method
The proposed IDS-Deep Learning model uses an Input Layer that consists of 96 entries for the WIFI dataset and 64 entries for the Xbee dataset, with 12 nodes. The Hidden Layer consists of 8 layers with 8 nodes. The Output Layer acts as a classifier that distinguishes two classes: attack or normal. The computer used in the experiment is a notebook with the following hardware specification: Intel Core i7 and 12 GB RAM, running the Ubuntu 20.04 LTS operating system. The platforms used to develop the IDS-Deep Learning are TensorFlow (Keras) for the model and Scikit-learn for the feature extraction stage. Table 1 lists the deep learning setup variables, while Figure 5 shows the pseudocode of the proposed IoT IDS with the deep learning technique.
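The layer description above can be sketched in Keras as follows. This is one possible reading, not the pseudocode of Figure 5: it assumes that the 12 nodes form the first dense layer after the input, that each of the 8 hidden layers has 8 nodes, and that the two-class output is a single sigmoid unit. The ReLU activations, the Adam optimizer, and the binary cross-entropy loss are also assumptions, since the settings of Table 1 are not reproduced here; `build_ids_model` is a hypothetical helper name.

```python
import tensorflow as tf

def build_ids_model(n_features, first_units=12, hidden_layers=8, hidden_units=8):
    """Build a feed-forward binary classifier (attack vs. normal).

    Layer sizes follow one interpretation of the text; activations,
    optimizer, and loss are illustrative assumptions.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    model.add(tf.keras.layers.Dense(first_units, activation="relu"))
    for _ in range(hidden_layers):
        model.add(tf.keras.layers.Dense(hidden_units, activation="relu"))
    # One sigmoid unit: output near 1 means attack, near 0 means normal.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

wifi_model = build_ids_model(96)  # all WIFI features, without PCA
xbee_model = build_ids_model(64)  # all Xbee features, without PCA
pca_model = build_ids_model(8)    # input after PCA reduction to 8 components
wifi_model.summary()
```

Because the input width is a parameter, the same builder covers the runs with and without PCA; only the first dense layer changes size, which is where the reduction in trainable parameters comes from.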

Performance Metric
This work only considers accuracy as a metric for the performance evaluation of the proposed IDS-Deep Learning model, as shown in (1).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Table 2 shows the number of packets of each protocol obtained from running the experiments, which produce 3,133,071 traffic packets. Three types of protocols, i.e., UDP, TCP, and ARP, are captured in the WIFI dataset. On the other hand, the Xbee dataset has 19,426 traffic packets. The preprocessing stage has obtained 95 attributes for the Wi-Fi protocol and 64 for the Xbee. Table 3 shows the results of PCA in extracting the features of the dataset from the initial attributes into 5 and 8 attributes. The PCA results are converted into numeric form so that they can be fed into the IDS-Deep Learning.
Table 4 displays the confusion matrix of the IDS-DL experiment using 60% of the data for training and 40% for testing. In the WIFI dataset, 569,457 traffic packets are recognized as attacks and 36,707 packets are recognized as normal, with a detection error of 1%. Meanwhile, in the XBee dataset, 678,439 packets are recognized as attacks and 4,593 packets are recognized as normal. The experiment is repeated 5 times, and the 60% training portion is further divided into 50% for the training phase and 10% for validation. Table 5 shows the accuracy results of the testing phase. The proposed IDS with deep learning in a heterogeneous IoT network is able to detect attacks with an accuracy above 99%. These results show that PCA increases the accuracy of IDS-Deep Learning. Figure 6 shows a comparison of the accuracy of IDS-Deep Learning with and without PCA.
Figure 9 shows a significant reduction in training time. The IDS running on the dataset with PCA is faster than the IDS running on the dataset with all features (without PCA).
Table 6 compares the accuracy of IDS-Deep Learning with different FE methods and without FE.
The comparison shows that IDS-Deep Learning with PCA feature extraction achieves higher accuracy than with Factor Analysis, Non-Negative Matrix Factorization, or no feature extraction. Previous research in [19] used only the WIFI dataset for experimentation, whereas this work extends [19] by introducing a more complex dataset, namely the WIFI and Xbee datasets. The results show that the performance of IDS-DL with the WIFI and Xbee datasets does not differ from that of the previous work. Therefore, it can be concluded that IDS-DL with feature extraction is able to enhance the performance of an IoT IDS in heterogeneous networks.
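To make the accuracy metric in (1) concrete, the following minimal sketch computes it from the four confusion-matrix counts, with attack treated as the positive class. The helper names and the example labels are assumptions for illustration; the numbers are made up and are not the values of Table 4 or Table 5.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary task (positive = attack)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred != positive and truth != positive:
            tn += 1
        elif pred == positive:
            fp += 1  # predicted attack, actually normal
        else:
            fn += 1  # predicted normal, actually attack
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy as in (1): correct predictions over all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty confusion matrix")
    return (tp + tn) / total

# Illustrative labels only: 1 = attack, 0 = normal.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(*confusion_counts(y_true, y_pred)))
```

Because accuracy weighs all packets equally, it can look high on traffic dominated by one class, which is worth keeping in mind when reading results on imbalanced attack and normal counts.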

CONCLUSION
Incorporating deep learning into an IDS for heterogeneous IoT networks can increase detection accuracy. The issues that need to be solved in an IoT IDS with deep learning are limited resources and excessive training time. One solution is to implement a feature extraction method in the IoT IDS. This work has proposed Principal Component Analysis (PCA) as the extraction method. The accuracy of the IDS-Deep Learning increases significantly and reaches a level above 99%. In the near future, the authors plan to investigate other feature extraction methods, such as the Autoencoder method for automatic feature extraction.