Blur Classification Using Segmentation Based Fractal Texture Analysis

Received May 8, 2018; Revised Sep 7, 2018; Accepted Dec 2, 2018

The objective of vision-based gesture recognition is to design a system that can understand human actions and convey the acquired information with the help of captured images. An image restoration approach is required whenever an image becomes blurred during acquisition, since blurred images can severely degrade the performance of such systems. Image restoration recovers a true image from a degraded version; it is referred to as blind restoration if the blur information is unknown. Blur identification is therefore essential before any blind restoration algorithm is applied. This paper presents a blur identification approach that categorizes the degradation present in a hand gesture image into one of four categories: sharp, motion blurred, defocus blurred and combined blurred. Features extracted with the segmentation-based fractal texture analysis (SFTA) algorithm are used as input to a neural network based classification system. Simulation results show that the proposed method is more accurate than other methods.


INTRODUCTION
Gesture recognition is an area in which meaningful physical movements of the fingers, hands, arms, face or body are used to convey information for human computer interaction [1]. These gestures can be classified as either static or dynamic. Static gestures require less computation than dynamic gestures, which are more complex but better suited to real-time environments. Dynamic gesture recognition requires an accurate interpretation of the movement of the body or its parts in order to obtain effective, expressive commands.
Hand gestures constitute a common and natural communication modality for Human Computer Interaction (HCI) [2]. Efficient human computer interfaces (HCIs) are needed that allow systems to recognize hand gestures visually in real time. However, vision-based hand gesture recognition is a challenging problem because of the complexity of hand gestures and other image degradation factors during image acquisition.

Use of Mobile Camera
Nowadays, a wide range of general-purpose handheld devices, such as mobile phones, come with an optical imaging system [3]. Enabling these devices to recognize gestures is a cost-effective alternative to a conventional static camera attached to the system. A gesture image captured by a camera phone can be transmitted to the system for decoding. With such a mechanism, the end user not only benefits from the communication provided by the phone but also gains full mobility. The hardware is readily available to billions of people and, combined with new services, could revolutionize the HCI environment.

Challenges with Mobile Camera
The availability of camera phones provides a mobile platform for gesture decoding, in contrast to conventional cameras, which lack mobility. Unfortunately, the poor quality of the images taken by these cameras sometimes makes it difficult to decode gestures correctly. Identifying gestures from images taken by general-purpose handheld devices is particularly challenging because of the limitations of the integrated imaging system and the processing capabilities of the device. These devices often have lower quality lens systems and lower resolution imaging circuitry than dedicated digital cameras [4], [5].
Image blurring is often a major factor affecting the performance of a gesture recognition system: it is very difficult to recognize the correct gesture from a blurred image, as shown in Figure 1. Image blurring is usually unavoidable in a camera-based imaging system, especially when the camera has no auto-focus or macro mode [6], [7]. Although some high-end camera phones with high resolution and auto-focus/macro mode are available, the low-end camera phone user segment is huge. Motion and defocus blur generally degrade the performance of mobile camera based HCI systems. Image restoration is highly desirable, since blurring attenuates the high-frequency components of the image, which are important for gesture recognition. Image restoration methods can be categorized as blind or non-blind. For blind restoration, the blur parameters must be estimated before deblurring. However, before any blur parameter estimation method can be applied, the blur must be identified, i.e., it must be determined whether the image is blurred and, if so, what type of blur is present.

Blur Detection and Classification
Most previous work addresses blur detection (i.e., categorizing an image as sharp or blurred) and blur classification separately; less research has been done on a combined framework for blur detection and blur classification, which is more useful in practice.
Tong et al. [8] proposed a scheme to decide whether an image is blurred and, if so, to what extent. They used the Haar wavelet transform to identify different types of edges, and the presence of blur is decided from this analysis. Yang et al. [9] proposed a motion blur detection method using a support vector machine and natural image statistics. In a photographic camera, the optics may be set so as to clearly separate the image into two areas: a blurry one and a sharp one. Runga et al. [10] suggested a method in which an automatic segmentation coupled with specific descriptors first describes each region of the image, after which a supervised learning process labels each unknown region as "blurry" or "sharp".
Crete et al. [11] used an intentional blurring pixel difference (IBD) algorithm, which does not require edge detection. It is motivated by the observation that intentionally blurring a sharp image produces significant gray-level variations, whereas intentionally blurring an already blurred image produces only small gray-level variations. Chi [12] proposed an unsupervised method to isolate focused main subject regions from a defocused background. This procedure first calculates the blurring level using the bivariate kurtosis of all Discrete Cosine Transform (DCT) blocks of a photograph with a low depth of field; these blocks are then clustered into blurry and sharp regions.
Chong and Tanaka [13] proposed a scheme that simultaneously detects and identifies blur, based on the analysis of extrema values in an image. Liu et al. [14] proposed a framework for partial blur detection and classification, i.e., determining whether some portion of the image is blurred as well as the blur type. They used maximum color saturation, gradient histogram span and spectrum details as blur features. Aizenberg et al. [15] presented a neural network based method that identifies the blur type, estimates the blur parameters and performs image restoration. They treated the identification of four kinds of blur, namely rectangular, motion, defocus and Gaussian blur, as a pattern classification problem. Bolan et al. [16] proposed an image blurred-region detection and classification method that automatically detects blurred image regions and the blur type using singular value features. This scheme also analyzes alpha channel information and classifies the blur as defocus or motion blur. Yan and Shao [17] attempted to find a general feature extractor for common blur kernels with various parameters, which is closer to realistic application scenarios, and applied deep belief neural networks for discriminative learning. In this method, the Fourier spectrum of the blurred image is passed to the neural network as input. However, using the spectrum directly as input is inefficient because of the large size of the resulting feature set. Tiwari et al. [18], [19] used blur patterns in the frequency spectrum to classify barcode images into one of three blur categories, namely motion, defocus and combined blur, using statistical and wavelet features.

IMAGE DEGRADATION MODELS
Blur is an artefact that occurs in images because of irregularities in the image acquisition process. The blurring process can be described as a weighted averaging of pixel values within a certain neighborhood. Blurs therefore act as low-pass filters, which smooth out abrupt changes in the gray levels of an image. Different analytical models are used to represent shift-invariant blur degradation. The blur models considered in this work are described below.
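The shift-invariant degradation model behind the blur models below is commonly written as g = h * f + n: the true image f convolved with a point spread function (PSF) h, plus additive noise n. The minimal Python sketch below implements that model directly. The function name `degrade`, the zero padding outside the image and the list-of-lists image representation are illustrative assumptions, not the paper's Matlab implementation.

```python
import random


def degrade(image, psf, noise_sigma=0.0, seed=None):
    """Return g = h * f + n for a grayscale image f and PSF h.

    Uses 'same'-size 2D convolution with zero padding outside the image;
    n is zero-mean Gaussian noise with standard deviation noise_sigma.
    """
    rng = random.Random(seed)
    rows, cols = len(image), len(image[0])
    kr, kc = len(psf), len(psf[0])
    cr, cc = kr // 2, kc // 2  # kernel centre
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for i in range(kr):
                for j in range(kc):
                    # Convolution (not correlation): h(i, j) weights f(y - i, x - j).
                    yy, xx = y - (i - cr), x - (j - cc)
                    if 0 <= yy < rows and 0 <= xx < cols:
                        acc += psf[i][j] * image[yy][xx]
            noise = rng.gauss(0.0, noise_sigma) if noise_sigma > 0 else 0.0
            out[y][x] = acc + noise
    return out


# Usage: blurring a unit impulse reproduces the PSF around the impulse.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
blurred = degrade(impulse, [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 0.0]])
```

Blurring a unit impulse returns the PSF itself, which is a quick check that the kernel is applied as a convolution rather than a correlation.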

Motion Blur
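When the scene to be recorded translates relative to the camera at a constant velocity $v_{relative}$, at an angle of $\theta$ degrees with the horizontal axis, during the exposure interval $[0, t_{exposure}]$, the distortion is one-dimensional. Defining the length of motion as $L = v_{relative} \times t_{exposure}$, the point spread function (PSF) for uniform linear motion blur is described as [13]

$$h_m(x,y) = \begin{cases} \dfrac{1}{L}, & \sqrt{x^2+y^2} \le \dfrac{L}{2} \ \text{and} \ x\sin\theta = y\cos\theta \\ 0, & \text{otherwise} \end{cases}$$

The frequency response of a PSF is called the Optical Transfer Function (OTF). The frequency response of $h_m(x,y)$ is a sinc function given by

$$H_m(u,v) = \frac{\sin\left(\pi L (u\cos\theta + v\sin\theta)\right)}{\pi L (u\cos\theta + v\sin\theta)}$$

When the blur angle is zero degrees, the blur is called horizontal motion blur.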
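A sketch of how the motion-blur PSF can be rasterised for synthetic blurring. The helper `motion_psf` is hypothetical: it marks every pixel crossed by a segment of L pixels at angle θ and gives each marked pixel the same weight so that the kernel sums to 1. This nearest-pixel rasterisation is an assumption and may differ slightly from toolbox implementations such as Matlab's `fspecial('motion', len, theta)`.

```python
import math


def motion_psf(length, angle_deg, oversample=4):
    """Discrete approximation of the uniform linear motion-blur PSF.

    Marks the pixels crossed by a segment of `length` pixels centred in the
    kernel at `angle_deg` degrees from the horizontal, then normalises the
    weights to sum to 1 (the discrete counterpart of the constant 1/L).
    """
    half = math.ceil((length - 1) / 2)
    size = 2 * half + 1
    kernel = [[0.0] * size for _ in range(size)]
    theta = math.radians(angle_deg)
    n = max(2, oversample * length)  # sample points along the segment
    for i in range(n):
        t = -(length - 1) / 2 + i * (length - 1) / (n - 1)
        col = half + int(round(t * math.cos(theta)))
        row = half - int(round(t * math.sin(theta)))  # image rows grow downwards
        kernel[row][col] = 1.0
    total = sum(map(sum, kernel))
    return [[v / total for v in r] for r in kernel]


h = motion_psf(9, 30)  # e.g. L = 9 pixels, theta = 30 degrees
```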

Defocus Blur
Defocus (out-of-focus) blur results from poor convergence of the light from an object onto the sensor plane. The effect of defocus blur on an image is the spreading of each pixel's intensity over its neighbors within a circle, called the circle of confusion (COC). The out-of-focus blur caused by a system with a circular aperture is modeled as a uniform disk of radius $R$ [14]:

$$h_d(x,y) = \begin{cases} \dfrac{1}{\pi R^2}, & \sqrt{x^2+y^2} \le R \\ 0, & \text{otherwise} \end{cases}$$

The frequency response of this PSF is given by

$$H_d(u,v) = \frac{J_1\left(2\pi R\sqrt{u^2+v^2}\right)}{\pi R\sqrt{u^2+v^2}}$$

where $J_1$ is the first-order Bessel function of the first kind and $R$ is the radius of the uniform disk.
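Similarly, a discrete uniform-disk (defocus) PSF can be sketched by giving equal weight to every pixel whose centre lies within radius R of the kernel centre. The helper `defocus_psf` is hypothetical and ignores partial pixel coverage at the disk boundary.

```python
import math


def defocus_psf(radius):
    """Discrete uniform-disk PSF of the given radius R (in pixels)."""
    r = math.ceil(radius)
    kernel = [[1.0 if x * x + y * y <= radius * radius else 0.0
               for x in range(-r, r + 1)]
              for y in range(-r, r + 1)]
    total = sum(map(sum, kernel))  # number of pixels inside the disk
    return [[v / total for v in row] for row in kernel]


h = defocus_psf(2)  # 5 x 5 kernel with 13 equal non-zero weights
```

Because the first zero of the first-order Bessel function J_1 lies at about 3.8317, the first dark ring of the defocus spectrum appears at a radial frequency of about 3.8317/(2πR), roughly 0.61/R cycles per pixel, so larger radii produce more tightly spaced circular zero patterns.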

Combined Blur
In general, image degradation is very complex in nature, but in many situations it can be modeled as a linear shift-invariant process. The combined blur (also referred to as joint or simultaneous blur) considered in this work is assumed to be a mixture of motion blur and defocus blur, and the two blur effects are assumed to be independent linear shift-invariant processes. The combined blur is therefore treated as a two-step process: the original image is first degraded by motion blur and then further degraded by defocus blur [14]. When both out-of-focus blur and motion blur are present in the same image, the blur model is

$$g(x,y) = \left[f(x,y) * h_m(x,y)\right] * h_d(x,y) + n(x,y)$$

Since convolution is associative, the PSF of the combined blur can be obtained as the convolution of the two blur functions:

$$h_c(x,y) = h_m(x,y) * h_d(x,y)$$

where $f(x,y)$ and $g(x,y)$ are the original and degraded images, $n(x,y)$ is additive noise, $h_m(x,y)$ and $h_d(x,y)$ are the point spread functions for motion and defocus blur respectively, and $*$ is the convolution operator. Estimating the PSF of the combined blur therefore corresponds to estimating three parameters: the angle $\theta$, the length $L$ and the radius $R$.
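The associativity argument can be checked numerically. The sketch below uses a hypothetical full 2D convolution helper and two small illustrative kernels (a horizontal 3-pixel motion blur and a 3 x 3 cross-shaped approximation of a radius-1 disk): blurring an image with the motion PSF and then the defocus PSF gives the same result as a single blur with the combined PSF.

```python
def conv2d_full(a, b):
    """Full 2D linear convolution of two arrays given as lists of lists."""
    ra, ca, rb, cb = len(a), len(a[0]), len(b), len(b[0])
    out = [[0.0] * (ca + cb - 1) for _ in range(ra + rb - 1)]
    for i in range(ra):
        for j in range(ca):
            for k in range(rb):
                for l in range(cb):
                    out[i + k][j + l] += a[i][j] * b[k][l]
    return out


# Illustrative kernels (assumptions): horizontal motion blur of length 3 and a
# cross-shaped disk of radius 1 (centre pixel and its four neighbours).
h_m = [[1 / 3, 1 / 3, 1 / 3]]
h_d = [[0.0, 0.2, 0.0],
       [0.2, 0.2, 0.2],
       [0.0, 0.2, 0.0]]
h_c = conv2d_full(h_m, h_d)  # combined PSF, 3 rows x 5 columns

f = [[1.0, 2.0], [3.0, 4.0]]
two_step = conv2d_full(conv2d_full(f, h_m), h_d)  # motion, then defocus
one_step = conv2d_full(f, h_c)                    # combined PSF
```

Since both kernels sum to 1, the combined PSF also sums to 1, so full convolution with it preserves the total intensity of the image.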

Noise Model
The image degradation model in (1) includes an additive noise term $n(x,y)$ that represents the effects of the various types of noise introduced during image capture. This additive term adequately models the noise in Charge-Coupled Device (CCD) cameras. Several sources of noise are inherent in CCD cameras; only one of them, Gaussian noise, which can arise during image capture or transmission, is modeled in this work. This noise is modeled as a zero-mean random variable with the Gaussian probability density [14]

$$p(n) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left(-\frac{n^2}{2\sigma_n^2}\right)$$

where $\sigma_n$ is the standard deviation of the Gaussian distribution. In this work, the variance of the noise added to the blurred images is set using the Blurred Signal-to-Noise Ratio (BSNR) metric. For a blurred image with variance $\sigma_b^2$, it is given by

$$\text{BSNR} = 10 \log_{10}\left(\frac{\sigma_b^2}{\sigma_n^2}\right)$$

where $\sigma_n^2$ is the variance of the noise.
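Assuming the usual definition BSNR = 10 log10(σ_b² / σ_n²) in dB, where σ_b² is the variance of the blurred image, the noise standard deviation needed for a target BSNR is σ_n = sqrt(σ_b² / 10^(BSNR/10)). For a BSNR of 40 dB this gives σ_n = σ_b / 100. The helpers below are a hypothetical sketch operating on a flattened list of pixel values; using the population variance is an assumption.

```python
import math
import random
import statistics


def noise_sigma_for_bsnr(blurred_pixels, bsnr_db):
    """Noise standard deviation that yields the requested BSNR (in dB).

    BSNR = 10 * log10(var_b / sigma_n**2)  =>  sigma_n = sqrt(var_b / 10**(BSNR / 10))
    """
    var_blurred = statistics.pvariance(blurred_pixels)
    return math.sqrt(var_blurred / 10 ** (bsnr_db / 10))


def add_gaussian_noise(blurred_pixels, bsnr_db, seed=None):
    """Add zero-mean Gaussian noise at the given BSNR to a list of pixels."""
    sigma = noise_sigma_for_bsnr(blurred_pixels, bsnr_db)
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in blurred_pixels]
```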

SEGMENTATION BASED FRACTAL TEXTURE ANALYSIS (SFTA)
The fractal dimension is a widely applied texture measure. A fractal is a pattern that is self-similar at multiple scales. For example, a bounded set $S$ is self-similar when it is the union of $N_r$ distinct (non-overlapping) copies of itself, each scaled down by a ratio $r$. The fractal dimension $D$ of $S$ is then defined by

$$D = \frac{\log N_r}{\log (1/r)}$$
The fractal dimension can be used to estimate the irregularity or roughness of the surface of an object: a higher fractal dimension indicates a rougher texture.
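As a worked check of the definition D = log N_r / log(1/r): the Sierpinski triangle consists of N_r = 3 copies of itself scaled by r = 1/2, so D = log 3 / log 2 ≈ 1.585. For binary images, SFTA estimates D by box counting, i.e. counting the boxes of side s that contain foreground pixels and taking the slope of log N(s) against log(1/s). The Python sketch below shows both; the set of box sizes is an illustrative assumption.

```python
import math


def similarity_dimension(n_copies, ratio):
    """D = log(N_r) / log(1/r) for a set made of N_r copies scaled by r."""
    return math.log(n_copies) / math.log(1 / ratio)


def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Estimate D of a non-empty binary mask by box counting.

    D is the least-squares slope of log(N(s)) against log(1/s), where N(s)
    is the number of s x s boxes containing at least one foreground pixel.
    """
    rows, cols = len(mask), len(mask[0])
    xs, ys = [], []
    for s in sizes:
        occupied = 0
        for i in range(0, rows, s):
            for j in range(0, cols, s):
                if any(mask[a][b]
                       for a in range(i, min(i + s, rows))
                       for b in range(j, min(j + s, cols))):
                    occupied += 1
        xs.append(math.log(1 / s))
        ys.append(math.log(occupied))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A filled square gives D ≈ 2 and a straight line gives D ≈ 1; irregular borders fall in between, which is what makes D useful as a roughness measure.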
The Segmentation-based Fractal Texture Analysis (SFTA) algorithm is an efficient texture feature extraction method proposed by Costa et al. [20]. It has two main steps. In the first step, a gray-level image is decomposed into a set of binary images using the Two-Threshold Binary Decomposition (TTBD) algorithm, which produces 2n_t binary images, where n_t is the desired number of thresholds. Figure 2 shows the binary images obtained by decomposing an input image with n_t = 4. In the second step, the area, mean gray level and fractal dimension are calculated from these binary images. SFTA has been used successfully in applications such as Content-Based Image Retrieval (CBIR) and image classification, where SFTA-based features have been shown to outperform widely used feature extraction methods such as Haralick features and Gabor filter banks, achieving higher accuracy. Moreover, SFTA is faster than both the Gabor and the Haralick feature extraction methods.
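The two SFTA steps can be sketched as follows. One reading of the TTBD description in [20] is that each binary image keeps the pixels with t_l < I(x, y) <= t_u, using all pairs of contiguous thresholds from T plus the highest gray level n_l, and all pairs (t, n_l) for t in T, which yields 2n_t images. For each binary image, the sketch computes the box-counting dimension of its border (1-pixels with an 8-neighbour equal to 0), the mean gray level and the area, giving 6n_t features. The thresholds are passed in by the caller, whereas the original algorithm computes them by multi-level Otsu thresholding; the edge handling, the box sizes and all helper names are assumptions.

```python
import math


def ttbd(image, thresholds, max_level=255):
    """Two-Threshold Binary Decomposition into 2 * n_t binary masks."""
    t = sorted(thresholds)
    bounds = t + [max_level]
    pairs = [(bounds[i], bounds[i + 1]) for i in range(len(t))]  # contiguous pairs
    pairs += [(ti, max_level) for ti in t]                       # (t, n_l) pairs
    return [[[1 if lo < v <= hi else 0 for v in row] for row in image]
            for lo, hi in pairs]


def border(mask):
    """1-pixels with at least one 8-neighbour equal to 0 (off-image counts as 0)."""
    rows, cols = len(mask), len(mask[0])

    def on(y, x):
        return 0 <= y < rows and 0 <= x < cols and mask[y][x] == 1

    return [[1 if mask[y][x] and not all(on(y + dy, x + dx)
                                          for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             else 0 for x in range(cols)] for y in range(rows)]


def box_dimension(mask, sizes=(1, 2, 4, 8)):
    """Box-counting fractal dimension; 0.0 for an empty mask."""
    rows, cols = len(mask), len(mask[0])
    if not any(any(r) for r in mask):
        return 0.0
    xs, ys = [], []
    for s in sizes:
        n = sum(1 for i in range(0, rows, s) for j in range(0, cols, s)
                if any(mask[a][b] for a in range(i, min(i + s, rows))
                       for b in range(j, min(j + s, cols))))
        xs.append(math.log(1 / s))
        ys.append(math.log(n))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def sfta_features(image, thresholds, max_level=255):
    """[fractal dimension, mean gray level, area] for each binary image."""
    features = []
    for mask in ttbd(image, thresholds, max_level):
        values = [v for row, mrow in zip(image, mask) for v, m in zip(row, mrow) if m]
        area = len(values)
        mean_gray = sum(values) / area if area else 0.0
        features += [box_dimension(border(mask)), mean_gray, area]
    return features
```

With n_t = 4 this yields 24 features and with n_t = 5 it yields 30, matching the feature counts used in the simulations.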

ARTIFICIAL NEURAL NETWORK
In this work, a feed-forward artificial neural network with the back-propagation learning algorithm is preferred over other classifiers as the pattern classification tool. Conventional statistical classifiers such as discriminant analysis are based on Bayesian decision theory: to make a classification decision, they require an underlying probability model in order to calculate the posterior probability. A major limitation of statistical models is that they perform well only when their underlying assumptions are satisfied; their efficiency depends largely on the assumptions or conditions under which they are developed, so users must be familiar with both the data properties and the model capabilities before applying them. To overcome these limitations, ANNs have been widely adopted as a classification tool. They are preferred because of the following properties [21]:
a. An ANN is a data-driven, self-adaptive method: it can adjust itself to the data without any explicit specification of the functional or distributional form of the underlying model.
b. An ANN is a universal function approximator: it can approximate any continuous function to arbitrary accuracy.
c. An ANN is a nonlinear model, which makes it flexible in modeling complex real-world relationships.

PROPOSED METHOD
Blurring suppresses significant image features such as boundaries, shapes, regions and objects, which creates problems for image analysis. It is difficult to identify the blur type in the spatial domain, since all blur types have a similar smoothing effect there. However, motion and defocus blur look distinctly different in the frequency domain, so the blur can be categorized using these characteristic patterns. In the spectrum of a motion-blurred image, dominant parallel lines of near-zero values appear orthogonal to the motion direction [22]. A defocus-blurred image shows circular zero-crossing patterns [23], and when both blurs coexist, their combined effect is visible. The power spectrum of a sharp image shows no such patterns. Figure 3 shows the patterns produced by the different blurs in the power spectrum of an image. These frequency-domain patterns are themselves treated as a textured image for blur classification. The steps of the blur classification algorithm are detailed in Figure 4: preprocessing of the images, computation of the logarithmic power spectrum, two-threshold binary decomposition, feature extraction, training of the neural network classifier and analysis of the results.
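To see why motion blur produces parallel dark lines in the spectrum, consider one row of a horizontal motion PSF of length L = 5 inside an N = 20 sample signal. Its discrete Fourier transform vanishes at every frequency index u for which uL/N is a non-zero integer (here u = 4, 8, 12, 16). In 2D, each such zero becomes a whole line of frequencies orthogonal to the motion direction. The direct O(N²) DFT below is only for illustration.

```python
import cmath


def dft_magnitude(signal):
    """|X(u)| of the 1D discrete Fourier transform, computed directly."""
    n = len(signal)
    return [abs(sum(signal[k] * cmath.exp(-2j * cmath.pi * u * k / n)
                    for k in range(n)))
            for u in range(n)]


N, L = 20, 5
psf_row = [1.0 / L] * L + [0.0] * (N - L)  # one row of a horizontal motion PSF
mag = dft_magnitude(psf_row)
zeros = [u for u in range(N) if mag[u] < 1e-9]  # spectral nulls, spaced N / L apart
```

The spacing of the nulls, N/L frequency samples, shrinks as the blur length grows, which is why longer motion blur produces more closely spaced dark lines.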
In the preprocessing step, the input image of size M × N is multiplied by a two-dimensional Hann window w(x, y), defined for 0 ≤ x < M and 0 ≤ y < N, which reduces the discontinuities at the image borders. After this step, the windowed image is transformed into the frequency domain with a Fast Fourier Transform (FFT). The power spectrum is computed to facilitate the identification of particular features of the Fourier spectrum. However, because the Fourier coefficients decrease rapidly from the centre of the spectrum towards its borders, local differences can be hard to identify; taking the logarithm of the power spectrum compensates for this rapid drop-off. To obtain a centred version of the spectrum, its quadrants are swapped diagonally. Figure 5 shows the effect of Hann windowing.
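The preprocessing pipeline (Hann window, Fourier transform, logarithm of the power spectrum, diagonal quadrant swap) can be sketched in plain Python as below. The separable window w(x) = 0.5(1 - cos(2πx/(M - 1))), the use of log(1 + P) to avoid log(0), and the direct 2D DFT (practical only for tiny images; a real implementation would use an FFT routine) are assumptions for illustration.

```python
import cmath
import math


def hann_2d(rows, cols):
    """Separable 2D Hann window w(y, x) = w(y) * w(x)."""
    def w(n, size):
        return 0.5 * (1 - math.cos(2 * math.pi * n / (size - 1))) if size > 1 else 1.0
    return [[w(y, rows) * w(x, cols) for x in range(cols)] for y in range(rows)]


def dft2(img):
    """Direct 2D discrete Fourier transform (O(M^2 N^2), illustration only)."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / rows + v * x / cols))
                 for y in range(rows) for x in range(cols))
             for v in range(cols)] for u in range(rows)]


def fftshift(a):
    """Swap quadrants diagonally so the zero frequency moves to the centre."""
    rows, cols = len(a), len(a[0])
    return [[a[(y - rows // 2) % rows][(x - cols // 2) % cols] for x in range(cols)]
            for y in range(rows)]


def log_power_spectrum(img):
    """Centred log(1 + |F|^2) of the Hann-windowed image."""
    win = hann_2d(len(img), len(img[0]))
    windowed = [[p * w for p, w in zip(r, wr)] for r, wr in zip(img, win)]
    power = [[abs(c) ** 2 for c in row] for row in dft2(windowed)]
    return fftshift([[math.log1p(p) for p in row] for row in power])
```

`fftshift` here follows the same convention as NumPy's `numpy.fft.fftshift`, moving the zero-frequency term to index (M // 2, N // 2).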

Simulation
The proposed method is evaluated using the Triesch gesture database [25]. This database consists of 720 images of 10 hand gestures performed by 24 different persons against three different backgrounds. All the images are synthetically blurred to create the database. The complete work is implemented using the Neural Network and Image Processing Toolboxes of Matlab 6.5.
The SFTA algorithm requires the number of thresholds n_t used in the decomposition of the input image to be set. The classification performance is analyzed for different numbers of thresholds and, based on the best accuracy, n_t is set to 4 and 5 for the blurred and the blurred-and-noisy hand gesture databases, respectively. Since TTBD yields 2n_t binary images and three features (fractal dimension, mean gray level and area) are computed for each of them, 6 × 4 = 24 and 6 × 5 = 30 features are extracted in the two experiments.
The classification is performed by a feed-forward neural network trained with the back-propagation algorithm, where the training dataset is formed from the features extracted from the blur patterns in the frequency spectrum. The whole training and testing feature set is normalized to the range [0, 1], and each output class is encoded as zero for the lowest probability and one for the highest probability. In this work, the transfer functions of the hidden layers are sigmoid functions. The number of neurons in the input layer equals the number of features extracted from the image dataset: 24 and 30 for the two experiments, respectively. To select the optimum number of hidden layers, the neural network is trained and tested with different numbers of hidden layers.
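The [0, 1] feature scaling and the 0/1 target coding described above can be sketched as follows. Scaling each feature column independently with its minimum and maximum over the whole feature set is an assumption (the text does not say whether scaling is per feature), as is the class order, taken from the four output classes of the network.

```python
def min_max_normalize(rows):
    """Scale each feature column independently to [0, 1] (column-wise min-max).

    A constant column carries no information and is mapped to 0.0.
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


CLASSES = ("no-blur", "motion", "defocus", "combined")


def one_hot(label):
    """Target vector: 1 for the true class, 0 for every other class."""
    return [1.0 if c == label else 0.0 for c in CLASSES]
```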
The final architecture, which gives the best performance, has a single hidden layer of ten neurons. The output layer consists of four neurons corresponding to the four classes: no blur, motion blur, defocus blur and combined blur.
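A minimal sketch of the selected 24-10-4 network and of one back-propagation update follows. Assumptions not stated in the text: the output layer also uses sigmoid units, the error is the squared error E = 0.5 Σ (y_k - t_k)², weights (with a trailing bias term per neuron) are initialised uniformly in [-0.5, 0.5], and updates are made per sample. All helper names are hypothetical; this is not the Matlab toolbox implementation used in the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in=24, n_hidden=10, n_out=4, seed=0):
    """Random weights; each neuron's row ends with its bias term."""
    rng = random.Random(seed)

    def layer(n_neurons, n_inputs):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_neurons)]

    return layer(n_hidden, n_in), layer(n_out, n_hidden)


def forward(net, x):
    """Hidden and output activations (sigmoid in both layers)."""
    w1, w2 = net
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    y = [sigmoid(sum(w * v for w, v in zip(row, h)) + row[-1]) for row in w2]
    return h, y


def loss(net, x, t):
    _, y = forward(net, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop(net, x, t):
    """Gradients of the squared error with respect to both weight layers."""
    _, w2 = net
    h, y = forward(net, x)
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w2[k][j] for k in range(len(w2)))
             for j in range(len(h))]
    g2 = [[d * hj for hj in h] + [d] for d in d_out]
    g1 = [[d * xi for xi in x] + [d] for d in d_hid]
    return g1, g2


def sgd_step(net, x, t, lr=0.1):
    """One gradient-descent weight update on a single training sample."""
    for W, G in zip(net, backprop(net, x, t)):
        for row, grow in zip(W, G):
            for j, g in enumerate(grow):
                row[j] -= lr * g


def predict(net, x):
    """Index of the most active output neuron (0 = no blur, ..., 3 = combined)."""
    _, y = forward(net, x)
    return max(range(len(y)), key=lambda k: y[k])
```

Comparing `backprop` against central finite differences of `loss` is a quick way to validate such an implementation before training.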
In the back-propagation learning algorithm, the term epoch denotes one training iteration, i.e., one pass through the training set (one weight update in batch training). After each epoch, the learning algorithm produces a different model with a different set of weights; if an ANN is trained for 100 epochs, the learning algorithm explores 100 different models. ANN learning can therefore be viewed as a search for the best model among a large number of candidate models. However, running too many epochs may overtrain the network and result in overfitting. Early stopping is used to avoid overfitting of the neural network: the network is evaluated on a validation data set after each epoch of training to ensure that overfitting is avoided.
To apply early stopping, the available data is divided into three subsets, namely the training, validation and testing sets, in the ratio 50:20:30. The validation error is monitored after each epoch. When the network starts to overfit, the error on the validation set usually starts to increase. When the validation error grows for a specified number of epochs, training is terminated, and the model from the epoch with the minimum validation error is used.
Although the ideal initial values of the weights (i.e., those that maximize the efficiency and speed with which a neural network trains) cannot be determined theoretically [26], it is common practice to initialize them randomly. Random initialization may lead to different training times or results, but it can reduce the possibility of the network becoming stuck in local minima [27]. In this work, the weights are randomly re-initialized several times, and the model with the best generalization ability is selected.
To assess the blur classification model, a confusion matrix and the five statistical metrics described under Performance Evaluation are used, and the overall classification results are used to evaluate the model. A receiver operating characteristic (ROC) curve is also plotted to validate the results. It plots the true positive rate (the fraction of true positives among all positives) against the false positive rate (the fraction of false positives among all negatives). A large area under this curve indicates that a classifier has a high true positive rate and a low false positive rate, which is desired.
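The 50:20:30 split and the early-stopping rule can be sketched as below. `split_50_20_30` and `train_with_early_stopping` are hypothetical helpers: the latter stops after `patience` consecutive epochs without improvement in validation error (6, as in the experiments) and returns the snapshot from the best epoch. The validation curve in the usage lines is made up for illustration.

```python
import random


def split_50_20_30(samples, seed=0):
    """Shuffle and split into training, validation and testing subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = n * 50 // 100, n * 20 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=6):
    """Return (best_epoch, best_error, best_snapshot).

    train_one_epoch() updates the model and returns a snapshot of its weights;
    validation_error() evaluates the current model on the validation set.
    """
    best_err, best_epoch, best_snapshot, worse = float("inf"), 0, None, 0
    for epoch in range(1, max_epochs + 1):
        snapshot = train_one_epoch()
        err = validation_error()
        if err < best_err:
            best_err, best_epoch, best_snapshot, worse = err, epoch, snapshot, 0
        else:
            worse += 1
            if worse >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch, best_err, best_snapshot


# Usage with a synthetic validation curve (made-up numbers, not the paper's).
curve = [0.5, 0.4, 0.3, 0.2, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.1]
state = {"epoch": 0}


def fake_epoch():
    state["epoch"] += 1
    return state["epoch"]  # the "snapshot" is simply the epoch number here


def fake_validation_error():
    return curve[state["epoch"] - 1]


result = train_with_early_stopping(fake_epoch, fake_validation_error,
                                   max_epochs=len(curve))
```

Here training stops at epoch 10 and epoch 4 is returned, even though a lower error would have appeared at epoch 11; this is the usual trade-off of a patience-based rule. Splitting 2880 samples this way gives 1440, 576 and 864 samples, the subset sizes used in the first experiment.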
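Repeated random initialization amounts to a best-of-k search over training runs. In the sketch below, the hypothetical helper `best_of_random_restarts` keeps the run with the lowest validation error. To keep the sketch self-contained, the "training" is a stand-in: gradient descent on the non-convex one-dimensional error surface E(w) = sin(3w) + 0.1w², where different random starting weights end in different local minima.

```python
import math
import random


def best_of_random_restarts(train_and_validate, seeds):
    """Train once per seed and keep the run with the lowest validation error.

    train_and_validate(seed) -> (validation_error, model)
    """
    results = [train_and_validate(seed) for seed in seeds]
    return min(results, key=lambda r: r[0])


def toy_train_and_validate(seed, steps=500, lr=0.01):
    """Hypothetical stand-in for network training on a toy error surface."""
    w = random.Random(seed).uniform(-3.0, 3.0)  # random initial weight
    for _ in range(steps):
        w -= lr * (3 * math.cos(3 * w) + 0.2 * w)  # dE/dw
    return math.sin(3 * w) + 0.1 * w * w, w


best_error, best_w = best_of_random_restarts(toy_train_and_validate, range(10))
```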
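A one-vs-rest ROC curve for one blur class can be computed from the network's output scores by sweeping a decision threshold, and its area estimated with the trapezoidal rule. The sketch below assumes that the score for a class is the corresponding output neuron's activation and that label 1 marks samples of that class; tied scores are processed together.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A perfectly separating score gives an area of 1.0, and an uninformative (constant) score gives 0.5.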

Performance Evaluation
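To assess the blur classification model, a confusion matrix is formed and five statistical metrics, namely precision, recall, F-score, error rate and accuracy, are computed [19]. For each class, with TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Error rate} = 1 - \text{Accuracy}$$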
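The five metrics can be computed per class (one-vs-rest) from a confusion matrix whose rows are actual classes and whose columns are predicted classes; this orientation is an assumption. The matrix in the usage lines contains made-up counts for illustration, not the values reported in Table 1 or Table 3.

```python
def class_metrics(cm, k):
    """One-vs-rest metrics for class k; cm[actual][predicted] holds counts."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp  # predicted k, actually another class
    fn = sum(cm[k]) - tp                 # actually k, predicted another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return {"precision": precision, "recall": recall, "f_score": f_score,
            "error_rate": 1 - accuracy, "accuracy": accuracy}


def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


# Made-up counts (no blur, motion, defocus, combined), not the paper's results.
cm = [[8, 1, 1, 0],
      [0, 9, 0, 1],
      [0, 0, 10, 0],
      [1, 0, 0, 9]]
no_blur = class_metrics(cm, 0)
```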

Experiment 1: Performance Analysis with Blurred Hand Gesture Images
In the first experiment, all 720 images of the hand gesture database are synthetically blurred with each of the three blur classes (motion, defocus and combined blur) using varying blur parameters, giving a database of 2160 blurred images (720 images for each blur class). In total, 2880 images (2160 blurred and 720 original sharp images) are used in the experiment. Of these 2880 images, 1440, 576 and 864 are used as the training, validation and testing sets, respectively, to design the neural network model. Training continued until the best validation performance was reached and was stopped when the validation error had increased for 6 consecutive iterations. The best validation performance is 0.0065, at epoch 75, as shown in Figure 6. The trained network is then used to classify the blur type of the degraded images. The overall blur classification results are presented as a confusion matrix in Table 1. From this confusion matrix, the precision, recall, F-score, error rate and accuracy are calculated. The high values of these metrics in Table 2 show the effectiveness of the classification model. The overall classification accuracy of the model is 99.2%. The ROC curves plotted in Figure 7 also confirm the high performance.
Figure 7. ROC curves for each class of blur identification, where classes 1, 2, 3 and 4 represent no blur, motion, defocus and combined blur, respectively

Experiment 2: Performance Analysis with Blurred and Noisy Hand Gesture Images
To test the efficiency of the proposed scheme in the presence of noise, another neural network model is created using all the hand gesture images. All the images are degraded by the different kinds of blur as in Experiment 1. Subsequently, zero-mean Gaussian noise at a Blurred Signal-to-Noise Ratio (BSNR) of 40 dB is added to the blurred and sharp images to create a database of blurred and noisy samples. Training continued until the best validation performance was reached and was stopped when the validation error had increased for 6 consecutive iterations. The best validation performance is 0.024, at epoch 33, as shown in Figure 8. The trained network is then used to classify the blur type of the degraded images. Table 3 shows the confusion matrix of the proposed blur classification system, and Table 4 shows its performance metrics. The high values of these metrics show the efficiency of the model; the overall classification accuracy is 95.6%. The ROC curves plotted in Figure 9 also confirm the high performance.
Figure 9. ROC curves for each class of blur identification in the presence of noise, where classes 1, 2, 3 and 4 represent no blur, motion, defocus and combined blur, respectively

Comparative Analysis
The results of the proposed method are compared with two previously proposed methods based on statistical features [18] and Curvelet transform based features [19]. Table 5 and Table 6 present the comparative results, which show that the proposed method outperforms the other methods.

Conclusion
This paper presents a blur identification approach that categorizes a degraded hand gesture image into one of four categories: sharp, motion blur, defocus blur and combined (motion plus defocus) blur. Such a framework is needed before blind restoration of degraded hand gesture images. The framework achieves accuracies of 99.2% without noise and 95.6% in the presence of noise, and a comparison with other methods shows the superiority of the proposed method. The main limitation of the method is its sensitivity to noise: in blurred images with a high noise level, the blur patterns in the frequency spectrum disappear, so the method is not suitable for such images.