Implementation of Deep Learning Based Method for Optimizing Spatial Diversity MIMO Communication

ABSTRACT


INTRODUCTION
The implementation of Multiple Input Multiple Output (MIMO) communication has become increasingly popular because of its ability to maintain reliable communication over a wireless channel that is impaired predominantly by fading. This is possible because multiple-antenna technology provides several benefits to a communication system: interference reduction and avoidance, array gain, and spatial diversity or spatial multiplexing gain [1].
The main idea behind MIMO is that signals sampled in the spatial domain at both the transmitter and the receiver are combined in such a way that they add diversity, which improves communication quality in terms of Bit Error Rate (BER), and/or create multiple effective parallel spatial data pipes, which increases the data rate [2].
In a basic MIMO communication process, input bits are first encoded through several stages that eventually produce spatial data streams. These data streams are then transmitted by several antennas and propagate to the receiver through a channel with certain impairments, for example Rayleigh fading. The received signal is then decoded at the receiver until the estimated bits are obtained.
However, the encoding and decoding processes described in the previous paragraph are very challenging. For years, researchers have been developing algorithms for multiple-antenna technology to improve its performance in detection, channel estimation, and other tasks, but the trade-off between performance improvement and computational complexity has always been a main restriction [3][4][5][6][7][8]. Meanwhile, advanced deep learning models achieve very high performance and can handle many tasks in several domains, especially computer vision [9][10][11].
Recently, several publications have applied methods from the deep learning field to MIMO communication systems in order to improve their performance. These methods perform very well and in some cases even outperform the baseline methods.
Some of the most interesting results on machine learning in communication systems are the papers titled "An Introduction to Deep Learning for the Physical Layer" [12] and "Deep Learning-Based Communication Over the Air" [13], which introduce deep learning as an end-to-end system for SISO communication. In this end-to-end model, the transmitter, channel impairments, and receiver are each represented by one or several neural network (dense) layers, and the whole system is interpreted as an autoencoder, a powerful method for unsupervised learning [14]. Because these works show good results, research on autoencoders in MIMO communication has been developing rapidly, for instance in channel and polar code decoding [15,16] and Orthogonal Frequency Division Multiplexing (OFDM) [17][18][19]. However, this topic still needs improvement, especially for end-to-end learning based models, before they become feasible under real-world conditions.
In this paper, we propose a method for optimizing spatial diversity in MIMO communication for both data detection and channel estimation. Our main contributions are twofold. First, the proposed model is the first end-to-end learning based model that is fairly compared to the baseline method. Second, this work is the first to address channel estimation in spatial diversity MIMO communication based on end-to-end learning.

Overview of the Proposed Method
Overall, the proposed models consist of several dense layers and lambda (custom) layers that together represent MIMO communication. All of the proposed models follow the autoencoder scheme, in which the model tries to replicate its input at its output. In this research, each transmit antenna was designed to transmit 2 bits, so each antenna has 4 possible bit pairs. The total number of bit-pair combinations over the two transmit antennas is therefore 16. Instead of one-hot encoding, each combination is expressed as an integer that is later fed into an embedding layer. The embedding layer turns the data indices into vectors, which saves memory. The reshape layer in the transmitter model block creates the parallel transmit streams, denoted by a three-dimensional matrix in $\mathbb{R}^{2 \times 2 \times n}$. The first dimension represents the number of transmit antennas, the second dimension represents a complex number as two real numbers, and the last dimension represents n time samples. A layer with a linear activation function determines the final transmitted symbols, whose power is constrained by a BatchNormalization layer. Finally, the last layer, which has a softmax activation function, decodes the message transmitted by each antenna. All of the models were trained on millions of synthetically generated data with various batch sizes and hyperparameter settings, which are explained in detail in Section 3.
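The integer encoding described above can be illustrated with a minimal sketch. It assumes two transmit antennas with 2 bits each (4 bits per message) and a most-significant-bit-first ordering; the function names `bits_to_index` and `index_to_bits` and the bit ordering are illustrative assumptions, not taken from the original implementation.

```python
import numpy as np

def bits_to_index(bits):
    """Map rows of 4 bits (2 bits per antenna, 2 antennas) to integers 0..15."""
    bits = np.asarray(bits, dtype=np.int64)
    weights = np.array([8, 4, 2, 1])  # assumed ordering: most significant bit first
    return bits @ weights

def index_to_bits(index):
    """Inverse mapping: integers 0..15 back to rows of 4 bits."""
    index = np.asarray(index, dtype=np.int64)
    shifts = np.array([3, 2, 1, 0])
    return (index[..., None] >> shifts) & 1

# Three example messages, each holding the bit pairs of both antennas.
messages = np.array([[0, 0, 0, 0],
                     [0, 1, 1, 0],
                     [1, 1, 1, 1]])
indices = bits_to_index(messages)    # integer labels suitable for an embedding layer
recovered = index_to_bits(indices)   # round trip back to the original bits
```

An integer label of this kind needs one value per message instead of a 16-element one-hot vector, which is consistent with the memory argument given for the embedding layer.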
Previous research, in the paper titled "Deep Learning Based MIMO Communication" [20], also proposes a model for the detection task in spatial diversity MIMO communication. However, that model has some problems. First, the channel response H and the noise Z are expressed by several custom layers. For the noise this may not be a big problem, since Keras [21] already provides a Gaussian noise layer as a regularizer. For the channel response (Rayleigh fading), however, it raises doubt as to whether the channel response generated by a custom layer matches standard Rayleigh fading. Second, no Channel State Information (CSI) is available at the receiver, which is an uncommon situation in a communication system. Moreover, because that model is compared to a baseline method that has perfect CSI at the receiver (CSIR), the reported performance comparison can be considered unfair.
Figure 1 shows the model of deep learning based spatial diversity MIMO communication. Besides the depth of the network, there are several differences between the previous model [20] and the proposed one. In this research, three inputs are fed to the model: the data to be transmitted (S), the channel response H (Rayleigh fading), and the Additive White Gaussian Noise (AWGN) Z. The channel response and the noise were generated using the random normal function "randn()" from the Numpy library. This model also uses perfect CSI at the receiver, so that it can be fairly compared with the baseline model. The non-linear activation function used is PReLU [22] instead of ReLU. One advantage of PReLU is that negative inputs still produce a nonzero output instead of being set to zero. As the data flowing through the model range from -∞ to ∞, this property is very beneficial for improving model accuracy. We also tried ReLU, the activation proposed in the previous work, in this model. Unfortunately, the training and validation losses became very high because of the zero-gradient issue.
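As an illustration of how channel and noise inputs can be drawn with "randn()", the following is a minimal sketch rather than the original data-generation code. It assumes complex-valued samples, unit average channel power, unit signal power, and a single receive antenna; the function names, the array shapes, and the SNR-to-variance convention are all assumptions introduced here.

```python
import numpy as np

def rayleigh_channel(num_samples, n_rx=1, n_tx=2):
    """Complex Gaussian coefficients whose magnitudes are Rayleigh distributed.

    Real and imaginary parts each have variance 1/2, so E[|h|^2] = 1.
    """
    h_re = np.random.randn(num_samples, n_rx, n_tx) / np.sqrt(2)
    h_im = np.random.randn(num_samples, n_rx, n_tx) / np.sqrt(2)
    return h_re + 1j * h_im

def awgn(shape, snr_db):
    """Complex AWGN with total variance 10^(-snr_db/10), assuming unit signal power."""
    noise_var = 10 ** (-snr_db / 10)
    sigma = np.sqrt(noise_var / 2)  # per real dimension
    return sigma * (np.random.randn(*shape) + 1j * np.random.randn(*shape))

np.random.seed(0)                       # fixed seed so the sketch is repeatable
H = rayleigh_channel(100_000)           # shape (100000, 1, 2)
Z = awgn((100_000,), snr_db=10.0)       # shape (100000,)
channel_power = np.mean(np.abs(H) ** 2) # close to 1
noise_power = np.mean(np.abs(Z) ** 2)   # close to 0.1
```

Generating H outside the network in this way makes its statistics easy to check directly, which addresses the doubt raised above about channel responses produced inside a custom layer.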
After the parallel transmitted symbols are formed, the BatchNormalization layer at the end of the transmitter model block acts as a power constraint so that the power of the transmitted signal does not exceed the standard transmission power. To obtain the standard transmission power, the gamma hyperparameter was constrained by setting the maximum value of its maximum-norm constraint to 0.78. This constraint acts only on the network parameters during optimization. A maximum-norm constraint is a form of regularization that enforces an absolute upper bound on the magnitude of the weight vector of every neuron, and projected gradient descent is used to enforce it. The matrix multiplication and noise addition layers were built from several lambda or custom layers.
The previous research also implements the aforementioned symbol transmission scheme, but the constellation diagram shown in [20] indicates that the symbols transmitted by each antenna in every two time slots are not always identical.
This research also extends the idea of the previous research to channel estimation in spatial diversity MIMO communication. The method consists of two models: the first is the channel estimator model, which produces the pilot, and the second is the data transmission model. Figure 3 shows the channel estimator model. ′ is a fixed 16-data stream that is used to generate the parallel pilot transmit stream ′ . The channel response H and the noise Z were also generated using the "randn()" function from the Numpy library. The channel response is identical within every 16 data transmissions, while the noise varies in every data transmission. The most important hyperparameter setting is in the BatchNormalization layer, where the maximum-norm values of the beta constraint and the gamma constraint must be set to 0.05 and 0.9, respectively.
To use more than one pilot, we only need to add more encoder estimator model blocks to the system. However, the gamma constraint must be set differently for each scheme, as explained in detail in the next section. After a good estimator model is obtained, as indicated by low training and validation losses, we insert the encoder and channel estimator model blocks into the data transmission model as non-trainable layers, as depicted in Figure 4. Figure 5 shows the constellation diagram of the pilot and data symbols from the data transmission model. Based on the Figure 5
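The maximum-norm constraint mentioned above can be sketched as a projection step in NumPy. This is an illustrative re-implementation under the assumption that the norm is taken over the whole gamma vector and that a small epsilon guards against division by zero; the function name `max_norm_project` and the example gamma values are hypothetical. In Keras, the corresponding setting is a `max_norm` constraint passed as the gamma or beta constraint of the BatchNormalization layer.

```python
import numpy as np

def max_norm_project(weights, max_value, eps=1e-7):
    """Rescale `weights` so that its L2 norm does not exceed `max_value`.

    Vectors already inside the bound are left (almost exactly) unchanged.
    """
    norm = np.sqrt(np.sum(np.square(weights)))
    desired = np.clip(norm, 0.0, max_value)
    return weights * desired / (eps + norm)

# Hypothetical gamma vector after an unconstrained gradient step.
gamma = np.array([0.9, 0.8, 0.6, 0.7])
constrained = max_norm_project(gamma, max_value=0.78)
constrained_norm = np.linalg.norm(constrained)  # pulled back onto the 0.78 bound

# A vector that already satisfies the bound passes through unchanged.
small = np.array([0.1, 0.2])
small_projected = max_norm_project(small, max_value=0.78)
```

Applying this projection after every optimizer update is what makes the procedure a form of projected gradient descent: the scale of the normalized output, and hence the transmit power, stays bounded regardless of what the unconstrained update would have produced.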

RESULTS AND ANALYSIS
This section discusses the results of the proposed method in terms of Bit Error Rate (BER) over a range of Signal to Noise Ratio (SNR) values, together with the hyperparameter settings used to obtain them. The proposed deep learning based methods were fairly compared with the baseline (conventional) methods. All of the models were trained using the Adam optimizer [23] with a learning rate of 0.01. The loss function was sparse categorical cross-entropy for the data detection task and logcosh for the channel estimation task (pilot model only). The equations for the sparse categorical cross-entropy and logcosh loss functions are given by equations 2 and 3, respectively.
All deep learning results were obtained from simulation using Keras [21] with the TensorFlow backend [24], while the baseline results were obtained from simulation using Matlab. For spatial diversity MIMO communication, the end-to-end learning based model was compared with the standard Alamouti system [25] over 1,000,000 bits. The proposed model was trained on 4,000,000 bits with a batch size of 500 over 50 epochs. The NN based model was trained at a fixed $E_b/N_0$ of 21 dB. We set the gamma constraint hyperparameter of the BatchNormalization layer to max_norm(max_value=0.78). Figure 6 shows the performance of the NN based model compared to the standard Alamouti scheme. Over the whole SNR range in Figure 6, the NN based model shows promising results by outperforming the standard Alamouti scheme. Moreover, as the SNR increases, the performance gap between the proposed model and the baseline model also widens.
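For reference, the two losses named above are commonly defined as follows. This is a sketch of the standard definitions; the symbols $N$ (number of samples), $y_i$ (true integer label or target value), $\hat{p}_{i,k}$ (softmax output for class $k$), and $\hat{y}_i$ (predicted value) are notation introduced here and are assumptions rather than the notation of the original equations.

```latex
% Sparse categorical cross-entropy: the label y_i is an integer class index,
% so only the predicted probability of the true class enters the sum.
L_{\mathrm{SCCE}} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_{i,\,y_i}

% Log-cosh loss: approximately quadratic for small errors and
% approximately linear for large errors.
L_{\mathrm{logcosh}} = \frac{1}{N} \sum_{i=1}^{N} \log\!\left(\cosh\!\left(\hat{y}_i - y_i\right)\right)
```

The sparse form matches the integer message encoding used for data detection, while the log-cosh form suits the real-valued targets of the pilot model because it is less sensitive to occasional large errors than a squared-error loss.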
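For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of standard Alamouti encoding and combining with perfect CSI. It assumes two transmit antennas, one receive antenna, a channel that is constant over the two time slots, and a noiseless link for clarity; it is not the Matlab baseline used for the reported results, and the function names and example symbols are illustrative.

```python
import numpy as np

def alamouti_encode(s1, s2):
    """Return a 2x2 block: rows are time slots, columns are transmit antennas."""
    return np.array([[s1, s2],
                     [-np.conj(s2), np.conj(s1)]])

def alamouti_combine(r, h):
    """Recover both symbols from the two received samples using known h."""
    h1, h2 = h
    r1, r2 = r
    gain = abs(h1) ** 2 + abs(h2) ** 2  # diversity gain from both paths
    s1_hat = (np.conj(h1) * r1 + h2 * np.conj(r2)) / gain
    s2_hat = (np.conj(h2) * r1 - h1 * np.conj(r2)) / gain
    return s1_hat, s2_hat

rng = np.random.default_rng(0)
# Rayleigh fading coefficients for the two transmit antennas (1 receive antenna assumed).
h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)

# Two example QPSK symbols.
s1 = (1 + 1j) / np.sqrt(2)
s2 = (-1 + 1j) / np.sqrt(2)

x = alamouti_encode(s1, s2)
r = x @ h                                # received samples in slot 1 and slot 2
s1_hat, s2_hat = alamouti_combine(r, h)  # equals (s1, s2) when there is no noise
```

With noise added to `r`, each symbol estimate is scaled by the combined gain of both channel paths, which is the source of the diversity benefit that the NN based model is compared against.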

Data Detection with Perfect CSIR
We also compared the proposed model with the previous model under the assumptions that the reshape layer position, batch size, and number of epochs are identical to those of the proposed model and that the previous model also uses perfect CSIR. Moreover, the number of neurons was assumed to be identical to that of the proposed model, except for the last dense layer in the decoder block, because only one dense layer was used in the decoder. Figure 7 shows the performance in this case. The result shows that performance becomes worse, to the point that the deep learning based model cannot outperform the baseline model. This indicates that the depth of the network and the PReLU activation function have a significant impact on model accuracy.

Channel Estimation
For channel estimation, we generated five transmission schemes: systems using 1 pilot, 2 pilots, 3 pilots, and 4 pilots, and a system with perfect CSIR. All models were trained on 4,000,000 bits and tested on 1,000,000 bits. The training process of each data transmission model differs because of the difference in transmission scheme, but the value of $E_b/N_0$ is identical (21 dB). Table 1 shows the differences in hyperparameter values among the models. The gamma constraint value affects the transmission power, while the batch size affects the model accuracy during the training phase. Figure 8 shows the results of the NN based channel estimation model in terms of BER over a range of SNR values. The results show that increasing the number of pilots improves system performance. Moreover, the proposed models, which use imperfect CSI, show outstanding performance: after transmitting at least 3 pilots, they outperform the baseline model, which has perfect CSI at the receiver.
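The effect of the number of pilots can be illustrated with a conventional least-squares (LS) estimator. This sketch is not the NN based estimator proposed here; it assumes two transmit antennas, one receive antenna, orthogonal pilot blocks in which each antenna transmits alone once, and a 10 dB SNR chosen only for illustration. The function names are hypothetical.

```python
import numpy as np

def ls_channel_estimate(pilots, received):
    """Least-squares solution of received = pilots @ h for the channel vector h."""
    h_hat, *_ = np.linalg.lstsq(pilots, received, rcond=None)
    return h_hat

def mean_estimation_error(num_pilot_blocks, snr_db=10.0, trials=2000, seed=0):
    """Average squared channel estimation error for a given number of pilot blocks."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(10 ** (-snr_db / 10) / 2)   # noise std per real dimension
    block = np.eye(2, dtype=complex)            # assumed orthogonal pilot block
    pilots = np.tile(block, (num_pilot_blocks, 1))
    errors = []
    for _ in range(trials):
        # The channel is fixed while the pilots are sent; the noise changes every sample.
        h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
        z = sigma * (rng.standard_normal(len(pilots))
                     + 1j * rng.standard_normal(len(pilots)))
        y = pilots @ h + z
        errors.append(np.mean(np.abs(ls_channel_estimate(pilots, y) - h) ** 2))
    return float(np.mean(errors))

# Estimation error for 1, 2, and 4 pilot blocks; it shrinks as pilots are added.
error_by_pilots = {n: mean_estimation_error(n) for n in (1, 2, 4)}
```

Because the channel is constant across the pilot blocks while the noise is independent, averaging over more pilots reduces the estimation error roughly in proportion to the number of blocks, which is the same qualitative trend reported for the NN based estimator.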

CONCLUSION
The trade-off between system performance and computational complexity has always been the biggest consideration in improving the performance of MIMO communication. To address this problem, this research proposes deep learning based methods for optimizing performance in spatial diversity MIMO communication. Solutions from the deep learning field are proposed because deep learning has been proven to perform very well in several domains, especially image processing. Moreover, the computational complexity is incurred only in the training stage. Once well-trained weights are obtained, we only need to load them and pass the data through the model in the testing stage.
This research presents two different models, one for the data detection task and one for the channel estimation task. Both models are fairly compared to the baseline methods. The hyperparameters of each model were tuned separately to obtain the best result, especially those of the BatchNormalization layer and the batch size used for training. The results show that the NN based methods perform promisingly, outperforming the baseline over a predetermined SNR range (-4 dB to 22.5 dB). In the perfect CSIR (Channel State Information at the Receiver) case, the proposed models achieve a BER of nearly $10^{-5}$ at an SNR of 22.5 dB. In the channel estimation case, the proposed models can exceed the baseline performance by transmitting only 2 or 3 pilots.
These promising results were obtained through appropriate hyperparameter tuning, which led to good model accuracy. We believe that the results can be improved further by additional hyperparameter tuning and/or by building a new model with a different algorithm.