A Low-complexity Complex-valued Activation Function for Fast and Accurate Spectral Domain Convolutional Neural Network

Received Sep 20, 2020 Revised Jan 15, 2021 Accepted Jan 17, 2021 Conventional Convolutional Neural Networks (CNNs), which are realized in spatial domain, exhibit high computational complexity. This results in high resource utilization and memory usage and makes them unsuitable for implementation in resource and energy-constrained embedded systems. A promising approach for low-complexity and high-speed solution is to apply CNN modeled in the spectral domain. One of the main challenges in this approach is the design of activation functions. Some of the proposed solutions perform activation functions in spatial domain, necessitating multiple and computationally expensive spatial-spectral domain switching. On the other hand, recent work on spectral activation functions resulted in very computationally intensive solutions. This paper proposes a complex-valued activation function for spectral domain CNNs that only transmits input values that have positive-valued real or imaginary component. This activation function is computationally inexpensive in both forward and backward propagation and provides sufficient nonlinearity that ensures high classification accuracy. We apply this complex-valued activation function in a LeNet-5 architecture and achieve an accuracy gain of up to 7% for MNIST and 6% for Fashion MNIST dataset, while providing up to 79% and 85% faster inference times, respectively, over state-of-the-art activation functions for spectral domain.


INTRODUCTION
An artificial neural network (ANN) is based on a collection of connected nodes called neurons. ANNs must have an input layer, one or more hidden layers and an output layer. The output of each neuron is computed (activated) by some non-linear function of the sum of its inputs. It is highly desirable that the activation functions are non-linear and differentiable. The hidden layers should employ nonlinear activation functions so as to enable the network to learn complex relationships from the input data [1]. In fact, through nonlinear activation functions an ANN can learn any nonlinear behavior, provided the network has enough neurons and layers [2].
A convolutional neural network (CNN) is a specific variant of ANNs, built as a series of convolutional layers that are interleaved with pooling (or subsampling) layers, a fully-connected multilayer perceptron, and ends with a softmax layer [3]. CNN is a fundamental example of deep learning that is commonly applied to analyzing visual imagery. CNNs have shown tremendous success in computer vision applications such as IJEEI ISSN: 2089-3272  sigmoid/tanh type activation function for the spectral domain CNN. However, the authors did not explain how nonlinearity is achieved with a linear function nor provide any details on the implementation of the function.
Recently, Uijens in [13] proposed that a spectral activation function can be developed in the spectral domain by computing Fourier equivalent of a point-wise spatial activation function, approximated with the Taylor series. The square function is proposed for the spatial function, which is equivalent to auto-convolution in Fourier space (Fourier-space input convolved with itself) and hence very computationally intensive. Ayat et al. in [14] proposed a spectral version of ReLU, the most commonly used activation function in CNNs. This activation function, called SReLU, approximates ReLU with a quadratic function using the Taylor series and computes its Fourier equivalent. This activation function also has to compute auto-convolution. This work achieved 99% classification accuracy on MNIST dataset with the LeNet-5 CNN. However, just like [13], its main drawback is that it is computationally complex.
There is a class of activation functions, called complex-valued activation functions, which are applied in complex-valued neural networks (CVNNs) including complex-valued CNNs (CVCNNs) [16]. CVCNNs are applied to classify complex-valued data, that is, inputs that are already in complex form such as complexvalued images generated by Synthetic Aperture Radar (SAR) and functional Magnetic Resonance Imaging (fMRI). This is in contrast to spectral domain CNNs which classify real-valued data that are transformed into complex-valued Fourier space. In a CVCNN, instead of realizing convolution as point-wise products (as is done in spectral domain CNNs), complex-valued convolutions are performed [16].
Some complex-valued activation functions are split-type activation functions, where a spatial activation function such as tanh (or sigmoid) is applied to real and imaginary parts of the complex inputs separately. Recently, a split-type activation function, called ℂReLU, has been proposed for CVCNN, where ReLU is applied separately to the real and imaginary parts [17]. This work has attained excellent accuracy for CIFAR-10 and MusicNet datasets using ResNet and VGG CNNs respectively. A split-type tanh function is used in a spectral domain CNN as shown in the work of [18]. However, tanh functions possess singular points, which requires scaling of inputs and initial weights [16].
Another approach in complex-valued activation function involves compressing the complex input space for activation by passing input values in certain quadrants, while blocking input values in other regions. One such example is zReLU, which is proposed in [19]. However, this activation function has been tested only on specialized datasets but not on standardized datasets [19]. A similar activation function is modReLU that preserves pre-activation phase but modifies magnitude by the amount of a trainable parameter [20].
One of the earliest spectral domain CNN models was proposed by Mathieu et al. in [9]. In this work, only the convolution operation (realized as point-wise product) was performed in the spectral domain, while pooling and activation functions were performed in spatial domain. Spectral pooling was first proposed by Rippel et al. in [11], In this work, convolution and pooling operations were realized in spectral domain. Recently, Guan et al. in [18] proposed a spectral domain CNN model with computations of convolution and activation function (which is complex-valued version of tanh) performed in the spectral domain. In these models either pooling or activation function or both were performed in spatial domain and hence, required multiple spatial-spectral domain switching (using FFTs and IFFTs).
The first spectral domain CNN model without multiple domain switching was developed by Ko et al. in [12], who implemented convolution, pooling and activation function (realized as linear approximation of sigmoid/tanh) in spectral domain. Pratt et al. in [21] has also developed another spectral domain CNN model without multiple domain transformations, but do not specify the type of activation function they use or whether they have used any activation function at all. More recently, Ayat et al. in [14] proposed another spectral model for CNN without multiple domain transformations. They apply their SReLU spectral activation function and implement a fused version of convolution in spectral domain where pooling operation is merged with convolution.
This work applies a complex-valued activation function for spectral domain CNNs, where preactivation values that have positive-valued real or imaginary part are transmitted through the activation function. Here, all feature extraction layers-convolution, pooling and activation-are realized in spectral domain. This activation function is beneficial in two ways. First, proposed activation function does not require computationally expensive domain switchings, as the activation function works in complex domain. Second, it is computationally much lighter than state-of-the-art spectral activation functions such as SReLU [14].
The main contributions of this research work are the following: (a) we propose a new complex-valued activation function for spectral domain CNN. The function has low computational complexity in both forward and backward propagation and exhibits sufficient nonlinearity to ensure high classification accuracy can be achieved, and (b) we show that a spectral domain CNN model applying this activation function offers more accurate and faster inference over state-of-the-art spectral and complex-valued activation functions. In addition to having a computationally light activation function, this CNN model removes the need for multiple spatialspectral domain switching, which are computationally expensive. This makes such spectral domain CNN The remainder of this paper is organized as follows: Section 2 presents proposed activation function and CNN model for this work. Experimental work, analysis and results are described in Section 3. Finally, Section 4 presents conclusions.

RESEARCH METHOD 2.1. Proposed activation function
The general nature of a complex-valued activation function is that they pass input values in certain segment of input space and blocks, from propagating through, input values in other areas of the input space. The area of input that a complex-valued activation function propagates to its output is called activation area [16,17,22]. This activation area is a complex valued space.
We propose to apply the following complex-valued activation function in our spectral domain CNN model. This activation function passes pre-activation values that have either positive-valued real part or positive-valued imaginary part. This idea is partly inspired by the operation of Rectified Linear Unit (ReLU) that passes only positive valued inputs (and rectifies the negative ones) and are widely employed for realvalued CNNs. The proposed activation function g(x), where x is the input to the activation function, is defined as where, ℜ(x) and ℑ(x) represent real and imaginary part of the complex variable x. Figure 2 (shaded region) illustrates the activation area of the proposed activation function.
When compared to similar activation functions such as zReLU in [19] and ℂReLU in [17], the proposed activation function preserves more of the input magnitude and phase but still provides nonlinearity by rectifying inputs that have negative-valued real or imaginary segments. This is supported by the results of training and test accuracies of our spectral domain CNN model that utilizes this activation function (which is presented in Section 3 of this paper).
Differentiability is considered an important property of activation functions [23]. The derivatives of different layers of neural networks including activation functions are needed to compute error gradients, which allows the network to learn and improve accuracy. In backpropagation, error gradient of a layer is computed from partial derivative of the layer's output with respect to input as well as error gradient of the following layer with respect to current layer's output [1]. So, if input and output of layer i are xi and zi and that of the following layer (layer i+1) are xi+1 and zi+1, the error gradient of layer i with respect to its input (∂E/∂xi) is where ∂zi/∂xi is the derivative of the current layer's output, while ∂E/∂zi is the error derivative of the next layer with respect to that layer input (which happens to be current layer's output). For this work, we have kept the backpropagation architecture largely in spatial domain. So, the error gradient with respect to input of the activation function ∂E/∂xi needs to be in spatial domain. Here, the error gradient coming from the following layer (∂E/∂zi) is in spatial domain. However, the output derivative of the activation function block (∂zi/∂xi) cannot be computed in spatial domain directly as there is no spatial equivalent of the proposed complex-valued activation function. Therefore, the output derivative of the activation function block (∂zi/∂xi) must be computed in Fourier space first. So, if x and z are input and output of a layer in spatial domain and their equivalent in Fourier space are X and Z, one needs to compute ∂Zi/∂Xi, the output derivative in Fourier space, where Z = ℱ(z) and X = ℱ(x). Computing output derivatives of the activation function is very simple since it either passes input values intact if input values lie within the activation area (ℜ(Z)>0 \/ ℑ(Z)>0) and derivative becomes 1, or passes 0 when the input values lie beyond this area, where derivative would be 0. This is expressed by the following Equations 3, 4 and 5. Finally, after the error gradient is computed in Fourier space, it is transformed to spatial domain to be consistent with the spatial domain nature of backpropagation architecture.
As shown above, the proposed activation function and its derivative are computationally inexpensive. This is an important requirement for activation functions in CNNs. Part of the reason why ReLU has become the dominant activation function for spatial domain CNNs is that ReLU and its derivative are inexpensive to compute. The proposed activation function is also differentiable everywhere in the activation area except at the boundary of the activation area. ReLU is similarly not differentiable at origin, which serves as the boundary between values that are passed and values that are zeroed out. One should note that even though previously activation functions were expected to be non-constant, continuous and differentiable everywhere, many modern activation functions including ReLU do not fulfill all these criteria [4,24].

A spectral domain CNN model for evaluating different activation functions
As described in Section 1, there have been a couple of models for realizing CNNs in spectral domain. Some realize only certain feature-extraction layers in spectral domain such as convolution [9], convolution and pooling [11] and convolution and activation function [18]. These models require multiple spatial-spectral domain switching as some feature-extraction layers are realized in spatial domain. On the other hand, some works realize all feature-extraction layers in spectral domain such as convolution, pooling and activation function [12] and convolution and activation function [14] (the latter work does not use a separate pooling layer as pooling operation is merged with convolution). We have developed a spectral CNN model for evaluating different activation functions, where all feature-extraction layers including convolution, pooling and activation function are realized in spectral domain.
In this spectral domain CNN model, we realize convolution with point-wise products in spectral domain. To perform convolution equivalent in spectral domain, first, real-valued input data are transformed to complex-valued Fourier space. The well-known convolution theorem establishes the duality between convolution in spatial domain and point-wise product in spectral (Fourier) domain. It states that convolution of two spatial inputs, say x and w, is equivalent to point-wise product of Fourier-transformed inputs, X and W, as defined in Equation 6, where X = ℱ(x) and W = ℱ(w). To compute an output pixel, a point-wise product requires just one complex multiplication between an input pixel and a kernel element in Fourier space. If convolution were performed on the equivalent spatial domain inputs, computing one output pixel would have required k 2 real-valued multiplications and k-1 real-valued additions. However, as equation 7 shows, each complex multiplication requires four real-valued multiplications and two real-valued additions. Here, A and B are real and imaginary components of X, while C and D are real and imaginary components of W. For typicalsized inputs and kernels in image processing, the number of multiplications and additions needed for pointwise products is much less than spatial convolution [8,9,10,11].
We apply spectral pooling to perform dimensionality reduction in our model. Spectral pooling does this by eliminating higher frequencies in the spectral domain feature maps, while retaining lower frequencies.
Spectral pooling causes minimal loss of information to the input feature maps as information in natural images is largely concentrated on lower frequencies. With spectral pooling one can crop any complex-valued input feature map to any arbitrary size determined by the designer [11]. This is illustrated in Equation 8, where X and Y represent input and output of the spectral pooling layer with dimensions of PxQ and RxS, respectively. 178 Y = Crop (X, R, S), X ∈ ℂ PxQ , Y ∈ ℂ RxS , R < P, S < Q (8) We have moved pooling layer before activation layer as was done in [14]. This allows the activation function to operate on a smaller feature map, which can enhance the speed of the activation layer. Following pooling operation, we apply our proposed complex-valued activation function to provide nonlinearity. This topology serves as the initial model of a feature-learning block of our spectral domain CNN model, as illustrated in Figure 3. Transformation between spatial to spectral domain is only needed in the first and last feature-extraction blocks (before the first and after the last convolution layers). The proposed activation function removes the need for multiple domain switching between spatial and spectral domains in the feature-learning segment of the network. Since input data and feature maps in spectral domain are complex-valued, a complex-valued activation function is compatible with spectral domain CNN networks. One should note that this CNN model performs feature-extraction in Fourier space, but unlike CVCNNs, it is meant to classify real-valued data.
We have developed a spectral domain CNN model for LeNet-5, where spatial convolution and pooling are replaced with point-wise product and spectral pooling and proposed complex-valued activation function is employed in place of sigmoid or ReLU activation functions (for first two feature extraction blocks) in the original model. The classification layers are kept spatial domain, as full connection or softmax layers are not that computationally intensive, when compared to convolution. In the last feature extraction block, an IFFT layer is utilized to transform spectral data to real-valued data for full-connection block. A spatial ReLU activation function follows this IFFT layer. A dropout layer has been added at the end of this last feature learning block to enhance generalization capability of the model. Figure 4 illustrates the general architecture of this spectral domain CNN model. For our model, we have kept the number of output feature maps same as the number used in MatConvNet's default LeNet-5 implementation (information about our CNN implementation environment including MatConvNet is provided in Section 3.1). Table 1 shows output feature map dimensions and their numbers as well as kernel dimensions for different layers in this spectral domain LeNet-5 model for digit and object recognition with MNIST and Fashion MNIST datasets.  As described above, the feedforward model is mostly in spectral domain, with only last activation layer and classification layers computed in spatial domain. For ease of computing error gradients, the backpropagation model is kept largely in spatial domain. The convolution in backpropagation architecture is computed in spatial domain as cyclic convolution, as in [14], to make it compatible with point-wise products in feedforward architecture. Error gradients for pooling and activation functions are initially computed in spectral domain and then converted to spatial domain.

Computational complexity analysis of proposed activation function
Computational complexity refers to the amount of resources (i.e. time, memory) required to execute a computational problem or algorithm. The type of computational complexity that would be useful to analyze running time of an algorithm such as inference time of a CNN model is time complexity. Time complexity describes the amount of time required to run an algorithm. Time complexity of an algorithm can be computed based on the number of elementary operations that need to be executed by the algorithm, assuming each elementary operation takes constant time to be executed by a given hardware platform. Sometimes time complexity can be difficult to compute exactly in this manner. In such cases, asymptotic behavior of the complexity is measured, that is running time of the algorithm for large inputs. Here, coefficients and lowerorder terms of the complexity expression are ignored [25]. From here on when we refer to computational complexity, it would mean time complexity.
Computational complexity of some activation functions can be estimated based on the number of elementary operations. For example, ReLU is realized by a simple max operation, as shown in Equation 9, where X and Y represent input and output of the ReLU layer with both feature maps having a dimension of PxQ. For a ReLU layer, each pixel of the output feature map requires one max operation to be performed on the input, which checks whether input is larger than 0. For a 1-dimensional input of size N, each element (or pixel) of the output of this layer would require just 1 max operation on an input element and therefore, N output elements (or pixel) would require N number of operations. Thus the computational complexity of a ReLU activation layer with such 1-dimensional input (of size N) is N. In computer vision, inputs are generally 2dimensional and with equal height and width. So, when a 2-dimensional image data with height P = N and width Q = N is provided to a ReLU activation layer, total number of elementary operations (in this case, max operations) that it needs to execute is N 2 . So, naturally the computational complexity of ReLU activation function can be estimated to be O(N 2 ). Y = max(0, X), X ∈ ℝ PxQ , Y ∈ ℝ PxQ (9) Complex-valued activation functions such as ℂReLU [17], zReLU [19] and proposed activation function perform two max operations on an input element and hence perform 2N 2 operations on a NxN input. These activation functions perform an additional N 2 operations, as compared to ReLU since they operate on both real and imaginary components of the input. However, their complexity is still O(N 2 ) since constants are ignored in complexity estimation. Complexity of the spectral activation function proposed by [12] cannot be reliably estimated as they do not provide adequate implementation details for it. Another spectral activation function is tanh as proposed by [18]. Tanh implementations can vary in different implementation platforms. One of the typical real-valued tanh implementations involve multiple exponentiations with a complexity of log 3 N 2 and some simpler operations (multiplication, division, addition and subtraction) of N 2 complexity [16,26]. When tanh is applied on a spectral network, it is applied separately on both real and imaginary components of the input. Since computational complexity describes asymptotic behavior and ignores constants [25], the computational complexity would be O(log 3 N 2 ).
Spectral activation functions developed by [13] and [14] have to employ highly computationally intensive spectral convolution or auto-convolution. Spectral convolution for each output pixel would entail a complexity of N 2 k 2 , where N is the input feature map dimension and k is the kernel dimension. One should note that unlike spatial convolution, where k would be smaller in size than N, in case of auto-convolution, k is larger than N as it is a padded version of the input feature map [14]. Here, every output pixel is computed with k 2 complex multiplications, that entails 4k 2 real-valued multiplications and 2k 2 real-valued additions, that is 6k 2 real-valued operations in total. This can be clearly seen in Equation 7. For an N 2 dimensional feature map, the number of real-valued operations become 6N 2 k 2 and hence complexity here is O(N 2 k 2 ). Since [14] utilizes a quadratic equation, it requires an extra 2N 2 real-valued additions, but complexity remains O(N 2 k 2 ). Among all the related spectral or complex-valued activation functions, the proposed activation function as well as zReLU [19] and ℂReLU [17] require least number of operations (2N 2 ), possess the smallest complexity (O(N 2 )) and do not require any computationally intensive auto-convolution. Table 2 lists computational complexity of these spectral and complex-valued activation functions that were developed for CNNs.  Table 2. Computational complexity of spectral and complex-valued activation functions Activation function (AF) Computational complexity AF in [13] O(N 2 k 2 ) tanh [18] O(log 3 N 2 ) SReLU [14] O(N 2 k 2 ) zReLU [19] O(N 2 ) ℂReLU [17] O

Experimental Environment
To compare our work with previous research in this area we have tested our CNN model on the classic recognition dataset MNIST and its modern variant Fashion MNIST. MNIST is a widely used dataset in deep learning community with well-established benchmark accuracy. Introduced in 1998, MNIST is a dataset for digit classification. It consists of 28x28 greyscale images of handwritten digits of 10 classes (0 to 9). There are 60,000 training images and 10,000 test images [27]. Despite being developed 3 decades ago, MNIST continues to serve as the go to dataset for deep learning researchers for establishing proof-of-concept when exploring new algorithms. Fashion MNIST has the same structure (number of training and testing images) and images of same size as MNIST but consists of images of fashion articles such as shirts, bags and sandals. It is considered a more challenging dataset than MNIST [28].
For the experimental work for this research, we have described our CNN architectures in Matlab (v. 2018b) simulation environment utilizing MatConvNet (v.2015) toolbox that is tailor made for training CNNs. MATLAB is a development platform for scientists and engineers that has usage in diverse areas ranging from machine learning and computer vision to communications and computational finance. MatConvNet provides lot of built-in functions for CNNs, supports computation with GPUs and training large models (i.e. AlexNet, VGG) and complex datasets (i.e. ImageNet) [29]. The CNN model for this work was trained and tested on a desktop PC powered by 4-core Intel Core i7 4790K CPU with 8 GB DDR3 RAM operating at 4.00GHz and taking advantage of a single NVIDIA GeForce GTX 1070 GPU with 4GB RAM.

Experimental Work, Results and Discussion
We have analyzed test accuracy of our LeNet-5 CNN model employing proposed activation function for recognizing handwritten digits in the MNIST dataset and fashion articles in the Fashion MNIST dataset. Previous works have applied their activation functions in different CNN models and some were tested on different datasets. For instance, the complex-valued activation function ℂReLU [17] and auto-convolutionbased spectral activation function [13] have been tested on CIFAR-10 dataset. Another complex-valued activation function zReLU [19] was tested on a specialized dataset. A feature-extraction block in [14] consists of a convolution layer (fused type) followed by an activation function, while the same block in the SpecNet model by [18] has convolution layer followed by activation function followed by a spatial pooling layer. We have applied all these activation functions in our CNN model and tested them on the same MNIST and Fashion MNIST datasets so that a fair comparison can be made between this work and related previous works.
The original work that introduced the world to LeNet-5 and modern CNN networks, LeCun et al. in [27], attained 99.05% accuracy on MNIST dataset. This was a spatial domain CNN network. Highlander et al. in [8] developed a spectral domain CNN architecture for LeNet-5 that uses spatial activation function and attains 92.46% test accuracy. So, this network has multiple transformations between spatial and spectral domains to perform activation function in spatial domain. The work by Ko et al. in [12] presents a spectral domain CNN solution without domain transformations that employs a linear spectral activation function but provides only training accuracy. This work does not describe basic properties of their activation function and how it was implemented. Recently Pratt et al. in [21] developed another spectral domain CNN approach without domain transformations, attains 97% test accuracy but does not mention the type of activation functions, or whether any such has been used at all. Due to lack of implementation details for these two activation functions, we have not applied them in our model to compare our work with theirs. Very recently, Ayat et al. in [14] has achieved an accuracy similar to spatial domain CNN solutions, attaining 99.20% accuracy. Spectral domain CNN model with split type tanh proposed by Guan et al. in [18] attains 95% accuracy. This is the average accuracy of six models they propose, where the models differ based on compression rates for input feature maps. These spectral domain CNN solutions as well as the spatial domain CNN of LeCun et al. in [27] evaluated their models on MNIST dataset using LeNet-5 CNN. For this same MNIST dataset, our spectral domain LeNet-5 CNN model with proposed activation function attains a test accuracy of 97.65%, an accuracy very close to all these previous works-in some cases outperforming them-as can be seen in Table 3 as number of output feature maps)), all these activation functions have been evaluated in the same model discussed in Section 2.2. Accuracy and inference times attained by this model with many of these activation functions (as well as proposed activation function) are listed in Tables 4 and 5 for MNIST and Fashion MNIST datasets, respectively.  [27] 1998 Spatial Sigmoid 99.05% Highlander et al [8] 2016 Spectral -*1 92.46% Ko et al. [12] 2017 Linear approximation of tanh/sigmoid 96% *2 Pratt et al. [21] 2017 -*1 97% O. Ayat et al. [14] 2019 SReLU 99.20% B. Guan et al. [18] 2019 tanh 95% This work 2020 Proposed AF 97.65% *1 Information regarding activation function applied in these models are not provided in these works. *2 Only training accuracy is provided by Ko et al. in [12] Our spectral domain LeNet-5 model was trained using Adagrad optimizer. For MNIST and Fashion MNIST datasets, the model was trained for 60 epochs with a logarithmically spaced learning rate varying from 0.005 to 0.0001. The network was trained using all the 60,000 training samples of MNIST and Fashion MNIST datasets. At the end of the aforementioned epochs, the model attained a training accuracy of 98.52% (error of 1.48%) and 88.95% (error of 11.05%) for MNIST and Fashion MNIST datasets, respectively. Figure 5 shows the training and test errors for these datasets. For inference, all test samples-10,000 in total-were tested, for MNIST and Fashion MNIST datasets. During each epoch of training, MatConvNet applies inference on test data and records inference times. We took average of inference times (for all 10,000 test samples) recorded for first 30 epochs and in multiple runs to compare inference times of our model with different activation functions. We recorded peak accuracy attained on test data after training was completed. Tables 4 and 5 documents the peak test accuracy and inference time attained with different activation functions for MNIST and Fashion MNIST datasets, respectively. The difference in accuracy, denoted as DiffAcc in tables 4 and 5, is the change in accuracy for an activation function (say A) as compared to the accuracy attained with proposed activation function (say B). So, a negative DiffAcc (when B-A is negative) for an activation function would mean its accuracy is lower than accuracy attained with proposed activation function, while a positive DiffAcc (when B-A is positive) would mean the opposite (proposed activation function attaining a lower test accuracy). The percentage difference in inference time is denoted as DiffInf in tables 4 and 5. This reflects the percentage change in inference time when compared to the inference time attained with the proposed activation function. So, if the inference time of our model with proposed activation function is denoted as Q and inference time of the same model with another activation function is denoted as P, DiffInf is calculated as 100*(P-Q)/Q. Here also a negative DiffInf would mean inference time with an activation function is longer than inference time with proposed activation function, while a positive DiffInf would mean inference time attained with proposed activation function is longer.
For MNIST dataset, our spectral model with proposed activation function attained 97.65% peak accuracy (error of 2.35%) in inference, outperforming four state-of-the-art spectral or complex-valued  As table 4 illustrates, our activation function outperforms tanh [18] by 7% and SReLU [14] and zReLU [19] by about 1%. When inference time is compared, this work offers about 79% and 8.5% shorter inference times than SReLU and zReLU, respectively. The inference times with tanh and proposed activation function are virtually same differing by a mere 0.15%. Our test accuracy is almost same as ℂReLU [17] (difference of 0.24%) but we offer about 9.6% shorter inference time with our activation function. Even though zReLU, ℂReLU and the proposed activation function have the same computational complexity, the run time of their elementary functions may not be the same. For example, proposed activation function has to modify only a small number of input values (values in the third quadrant), when compared to ℂReLU. That is why our model provides faster inference with proposed activation function than with ℂReLU despite both activation functions having the same computational complexity. Overall, our approach outperforms three state-of-the-art spectral or complex-valued spatial activation functions in terms of both test accuracy and inference time, in varying degrees. In case of tanh, proposed activation function outperforms it significantly in terms of accuracy while offering almost same inference time. The work by [14] proposing SReLU is the only work that applies a spectral CNN model for Fashion MNIST dataset, achieving a test accuracy of 80.54%. We evaluate our model for Fashion MNIST using the same activation functions that were employed to test our model for MNIST dataset, including SReLU. As table 5 shows, our model employing proposed activation function achieves 87.95% test accuracy (error of 12.05%) for Fashion MNIST dataset, outperforming state-of-the-art spectral or complex-valued activation functions. Our activation function outperforms SReLU [14] and tanh [18] in test accuracy by about 6% and 2.8%, respectively. It also offers about 85% and 7.6% faster inference time over the aforementioned activation functions. Our activation function achieves almost same test accuracy as ℂReLU [17] (a difference of about 0.5%) but offers about 14% faster inference time. In case of comparison with zReLU [19], proposed activation function achieves about 4% higher test accuracy but inference time with our activation function is slightly longer, by about 5%. In the end, we offer better accuracy and inference time over three state-of-the-art spectral/complex-valued activation functions. In case of one activation function (zReLU), we offer better accuracy.

CONCLUSION
In this work, it was demonstrated that a spectral domain CNN model employing proposed complexvalued activation function can provide a low-complexity CNN model that offers very accurate and fast inference and can do so without multiple and expensive spectral-spatial domain switchings. This work has demonstrated the effectiveness of this model by applying it for two types of recognition tasks-recognotion of hand-written digits and objects. Its performance was evaluated in terms of accuracy and inference time on standard recognition benchmark datasets (MNIST and Fashion MNIST). The significant acceleration of inference speed achieved in our spectral domain CNN model while providing high classification accuracy can potentially lead to effective CNN solutions for implementation in resource and energy-constrained embedded systems. As future work, one can explore effectiveness of our activation function in terms of classification accuracy, reduction in computational complexity and speedup of inference time when applied in larger