Fusion of Iris and Periocular Recognition in a Non-Cooperative Environment (A. F. M. Raffei et al.)

Anis Farihan Mat Raffei¹, Tole Sutikno², Hishammuddin Asmuni³, Rohayanti Hassan⁴, Razib M. Othman⁵, Shahreen Kasim⁶, Munawar A. Riyadi⁷

¹Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Malaysia
²Department of Electrical and Computer Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
³⁻⁵Faculty of Computing, Universiti Teknologi Malaysia, Malaysia
⁶Faculty of Computer Science & Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
⁷Department of Electrical Engineering, Universitas Diponegoro, Semarang, Indonesia


INTRODUCTION
Increasing applications of security systems such as visual surveillance have motivated current iris recognition systems to identify a person in a non-cooperative environment (at different distances, in motion, under lighting variation and using visible wavelength illumination) [1,2]. In addition, in the image acquisition stage, the eye images are expected to be taken using a high-resolution sensor [3,4]. The performance of an iris recognition system depends on the distance between the sensor and the subject. This condition makes verifying a person difficult due to the low quality of the iris data [5], caused by several noise factors such as reflections [6][7][8][9][10], motion blur [11], off-angle gaze [12], low lighting [13,14] and occlusion by the eyelids [15][16][17]. Moreover, the performance of iris segmentation in this environment is negatively impacted when the image resolution is low [3,4]. The resolution of the sensor and the distance of the subject from the sensor are the two factors that determine the resolution of the captured eye images [18,19]. Furthermore, the resolution of the sensor depends on the zoom factor, resolution and view angle [18]. Although the first factor can be resolved using a high-resolution sensor, it is still challenging to manage the distance of the subject from the sensor. An increased distance of the subject from the sensor causes a decrease in the size, or pixel resolution, of the captured eye images when using a fixed zoom factor. This reduces the quality of the iris texture and increases errors in the iris recognition system [3,4].
To improve the performance of iris recognition on low quality eye images, a combination of two recognition modalities should be carried out. Several researchers have implemented the combination at the sensor level to resolve the problem of low resolution eye images [20][21][22]. At this level, super-resolution approaches are usually used to generate an image with a higher resolution, either in the segmentation or the normalization stage. The shortcoming of this approach is that it requires a magnification factor whose value depends on the number of non-redundant low resolution images that are available. Besides, selecting the periocular region as features for recognition provides better accuracy in the recognition system, and fusion at the feature level provides better performance than fusion at the other levels [23,24]. Rather than extracting only a single feature, the global and local features of low resolution eye images are used. Bharadwaj et al. [22] proposed a combination of a global matcher (GIST) and a circular local binary pattern (CLBP) with 1,536 elements in the GIST descriptor and 64 sub-regions. The scores from both methods were fused using weighted min-max normalization, and the combination resulted in higher accuracy than the single methods. Some researchers, such as Miller et al. [24], used a uniform local binary pattern (ULBP) to select the local features. A size of 100 × 160 pixels was used for the periocular region and an oval-shaped neutral mask was provided for the iris and sclera regions. The ULBP was determined using 8-pixel neighbourhoods that produced 59 different possible results, and the city-block distance was used to match the scores. Park et al. [25] implemented a fusion of the gradient orientation histogram and the local binary pattern (LBP) to select the global features of the periocular region, while scale invariant feature transformation was applied to select the local features. The Euclidean distance (EUD) and the chi-square distance (CSD) were used to measure the matching of the global and local features, respectively.
Woodard et al. [26] examined the usability of periocular recognition on near infrared and visible data by selecting texture and colour features. The LBP was used to select the texture features, and a colour histogram (CH) over the red and green channels was implemented to select the colour features. The Manhattan distance and the Bhattacharyya distance were used to determine the matching scores for the LBP and CH methods, respectively. To fuse the scores between the texture and colour features, min-max normalization (MMN) was used. For the iris recognition, an integrodifferential operator was used to localize the iris and pupil boundaries, a rubber sheet model to normalize the segmented iris, a two-dimensional Gabor filter to select the iris texture and a Hamming distance to measure template matching. The results showed that periocular recognition provides higher accuracy than iris recognition, while the fusion of both recognitions outperforms the single-trait recognitions. Tan and Kumar [27] proposed Leung-Malik filters to select the periocular features. The filters are constructed from Gaussian derivatives at different orientations and scales, and the filter responses from the training images are clustered using 100-means clustering. Following this, the CSD was employed as the metric with which to compute the matching scores. For the iris recognition, Tan and Kumar [27] used the following procedures: a Canny edge detection method to localize the iris and pupil, a rubber sheet model to normalize the segmented iris, a log-Gabor filter to select the iris features and a Hamming distance to compute the matching scores. The MMN was used to combine the matching scores of the periocular and iris features. The results again showed that periocular recognition provides better accuracy than iris recognition, while the fusion of both recognitions outperforms the single-trait recognitions. With regard to content-based image retrieval, the features that can be extracted for image similarity comprise texture, colour, shape, intensities and spatial information [28][29][30][31]. The colour feature is commonly used in image retrieval and has several advantages compared to the other features: ease of extraction, robustness to background complications and independence of image size and orientation [29]. In periocular recognition, most of the existing studies only use texture methods for the periocular features, with the exception of [26], which used a fusion of texture and colour features. According to Woodard et al. [26], the fusion of texture and colour features provides higher accuracy than single features, while texture features give higher accuracy than colour features.
The challenge is that texture and colour feature extraction in the periocular region is easily affected by background complications, is dependent on image size and orientation, is limited in spatial information and suffers from quantization effects. These limitations hinder the selection of discriminative periocular information. Hence, alternative texture and colour feature methods are proposed here, as they provide robust, discriminative structural features for extracting spots, line ends, edges and corners of the texture, improve the discriminating power of colour indexing and provide a better colour distribution for colour similarity. Figure 1 presents the complete process of the proposed method, RIULBPCM. This paper consists of several sections: (i) Section 2 presents the datasets used in this study, UBIRIS.v2 and UBIPr; (ii) Section 3 describes the proposed texture and colour feature extraction methods used in the periocular recognition; (iii) Section 4 explains the iris recognition system; (iv) Section 5 discusses the experimental results; and lastly, Section 6 concludes the study and recommends future work. Figure 1. The complete process of the proposed method, RIULBPCM, for periocular recognition.

DATASET
Two datasets are used to address the objectives of this research, namely version two of the University of Beira Interior iris dataset (UBIRIS.v2) [1] and the University of Beira Interior Periocular Recognition dataset (UBIPr) [32]. Each dataset is described as follows:

Eye images of UBIRIS.v2
This dataset was created by the University of Beira Interior and was designed to investigate the performance of iris and periocular recognition in non-cooperative environment settings [33,34,46,47]. The eye images were acquired at distances ranging from four to eight meters, with a resolution of 400 × 300 pixels in standard RGB colour representation (see Figure 2a), in motion, under lighting variation and using visible wavelength illumination. A total of 500 eye images (100 images per distance) with gaze angles of less than 30° were selected; images with angles above 30° were excluded to avoid incorrect iris segmentation caused by the reduced visible iris area.

Eye images of UBIPr
This dataset was also created by the University of Beira Interior, with the purpose of overcoming the limitations of several existing datasets, such as UBIRIS.v2, the Multiple Biometrics Grand Challenge (MBGC [35]) and the Face Recognition Grand Challenge (FRGC [35][36][37]), in investigating the performance of periocular recognition in non-cooperative environment settings. According to Padole and Proenca [38], the MBGC and FRGC datasets do not provide enough data variability: both have only low degrees of variation in pose, minor illumination changes and an absence of scale changes, while some of the eye images in the UBIRIS.v2 dataset do not include the eyebrows and skin regions, especially at short distances. The UBIPr images were acquired at different distances (ranging from four to eight meters) with varying resolutions: 1001 × 801 pixels at four meters, 801 × 651 pixels at five meters, 651 × 501 pixels at six meters, 561 × 541 pixels at seven meters and 501 × 401 pixels at eight meters (see Figure 2b). A total of 500 eye images (100 images per distance) with gaze angles of less than 30° were also selected from this dataset.

METHODOLOGY
This section describes the fused iris and periocular recognition system.

Iris Preprocessing
This section comprises three subsections: Section 3.1.1 describes the removal of reflections from the eye images, Section 3.1.2 explains contrast enhancement for low-contrast eye images and Section 3.1.3 details frame detection for bespectacled eye images.

Removing Reflections in the Eye Images
To remove the reflections, a combination of a line intensity profile (LIP), a support vector machine (SVM) and adjacent intensity interpolation, also known as LIPSVM [10], is implemented to identify, classify and fill in the reflection areas in the eye images. The LIP determines the reflections using Eq. (1):

G(x, y) < I_max − R(x, y) and B(x, y) < I_max − R(x, y)  (1)

where the green and blue intensities of a pixel must be less than the red intensity subtracted from the maximum intensity value, I_max. Then, an SVM is trained on a set of reflection areas and a set of non-reflection areas represented in a one-dimensional space, with each sample labelled 1 for reflections or −1 for non-reflections; a radial basis function is selected as the non-linear kernel. To expand the detected reflection areas, the dilation and closure morphological processing proposed by Sankowski et al. [8] is implemented. Lastly, adjacent intensity interpolation is used to fill the reflection areas.
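To make the pipeline concrete, the following is a minimal Python sketch of the identification and filling steps, assuming an 8-bit BGR image as loaded by OpenCV. The SVM classification stage is omitted for brevity, the structuring-element size is an assumed value, and OpenCV inpainting stands in for the adjacent intensity interpolation described above.

```python
import cv2
import numpy as np

def remove_reflections(bgr, max_intensity=255):
    b = bgr[:, :, 0].astype(int)
    g = bgr[:, :, 1].astype(int)
    r = bgr[:, :, 2].astype(int)
    # LIP rule (Eq. 1): green and blue intensities must fall below the red
    # intensity subtracted from the maximum intensity value.
    candidate = ((g < max_intensity - r) & (b < max_intensity - r)).astype(np.uint8)
    # Dilation and closing expand each candidate area so it fully covers the
    # specular highlight, as in [8]; the 5 x 5 ellipse is an assumed size.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(cv2.dilate(candidate, kernel), cv2.MORPH_CLOSE, kernel)
    # Fill the masked pixels from their neighbours; inpainting approximates
    # the adjacent intensity interpolation step.
    return cv2.inpaint(bgr, mask * 255, 3, cv2.INPAINT_NS)
```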

Iris Contrast Enhancement
An iris contrast enhancement method suggested by Raffei et al. [14] is performed to improve the low contrast level of the eye images. To initiate the contrast enhancement process, the root mean square (RMS) contrast is determined; its value must be below 0.4 [14] for enhancement to proceed. Next, the eye image is partitioned into 8 × 8 sub-areas. An entropy-based method [39] is performed to obtain the clip limit parameter of amplification for each sub-area and, to determine the iso-luminance from a uniform distribution, a cumulative distribution function is computed.
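A minimal sketch of this contrast-enhancement gate, assuming an 8-bit grayscale eye image. The entropy-based clip-limit selection of [39] is simplified here to a heuristic mapping from the global histogram entropy, and OpenCV's CLAHE stands in for the per-sub-area amplification with cumulative-distribution equalization.

```python
import cv2
import numpy as np

def enhance_if_low_contrast(gray, rms_threshold=0.4):
    norm = gray.astype(np.float64) / 255.0
    rms_contrast = norm.std()            # RMS contrast of the eye image
    if rms_contrast >= rms_threshold:    # only low-contrast images are enhanced
        return gray
    # Global histogram entropy used as a proxy for the per-sub-area entropy
    # criterion of [39] (assumed simplification).
    hist = np.bincount(gray.ravel(), minlength=256) / gray.size
    entropy = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))
    clip_limit = max(1.0, entropy / 2.0)  # heuristic mapping, not from [39]
    # 8 x 8 tiles mirror the sub-area partitioning; CLAHE equalizes each tile
    # via its cumulative distribution function.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(8, 8))
    return clahe.apply(gray)
```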

Frame Detection
To distinguish the edges of the eyeglass frame from the edges of the iris and eyelids during iris segmentation, a method proposed by Raffei et al. [40] is implemented. A conversion into a different color space such as HSV is necessary because the image intensities in the V channel of the HSV space are more stable than those in the grayscale space. Next, fuzzy sets were formed to define the intensities of each variable, with "Sobel" and "high pass" as inputs and "edge" and "non-edge" as outputs. Gaussian and triangular membership functions were implemented for the input and output fuzzy sets, respectively.
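A rough sketch of a fuzzy edge map on the V channel of HSV. The membership parameters and the single fuzzy rule below are illustrative assumptions; the actual fuzzy sets of [40] are not specified in this section.

```python
import cv2
import numpy as np

def fuzzy_edge_map(bgr):
    v = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[:, :, 2].astype(np.float64) / 255.0
    # Two edge cues as fuzzy inputs: Sobel magnitude and a high-pass response.
    sx = cv2.Sobel(v, cv2.CV_64F, 1, 0)
    sy = cv2.Sobel(v, cv2.CV_64F, 0, 1)
    sobel = np.hypot(sx, sy)
    high_pass = v - cv2.GaussianBlur(v, (9, 9), 2)
    # Gaussian membership degree for the "edge" value of each input
    # (spreads 0.3 and 0.1 are assumed parameters).
    mu_sobel = 1.0 - np.exp(-(sobel ** 2) / (2 * 0.3 ** 2))
    mu_hp = 1.0 - np.exp(-(high_pass ** 2) / (2 * 0.1 ** 2))
    # Rule: IF Sobel is edge AND high-pass is edge THEN pixel is edge
    # (min as the AND operator, defuzzified by thresholding at 0.5).
    edge_degree = np.minimum(mu_sobel, mu_hp)
    return (edge_degree > 0.5).astype(np.uint8) * 255
```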

Iris Segmentation
A circular Hough transform method [15] is performed to extract the limbic and pupillary boundaries of the iris; rather than extracting edges from the entire eye region, the limbic boundary is extracted first, because the pupillary boundary always lies within the iris area. Then, a linear Hough transform and thresholding are applied to remove the occlusion by eyelids and eyelashes.
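A condensed sketch of the two-stage circular Hough search using OpenCV, assuming an 8-bit grayscale eye image; the radius ranges and Hough parameters are assumptions for roughly 400 × 300 images.

```python
import cv2
import numpy as np

def segment_iris(gray):
    blurred = cv2.medianBlur(gray, 5)
    # Stage 1: limbic (outer) boundary over the whole eye region.
    limbic = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                              param1=100, param2=30, minRadius=40, maxRadius=120)
    if limbic is None:
        return None
    lx, ly, lr = np.round(limbic[0, 0]).astype(int)
    # Stage 2: pupillary boundary searched only inside the limbic circle.
    x0, y0 = max(lx - lr, 0), max(ly - lr, 0)
    roi = blurred[y0:ly + lr, x0:lx + lr]
    pupil = cv2.HoughCircles(roi, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                             param1=100, param2=20,
                             minRadius=10, maxRadius=int(0.8 * lr))
    if pupil is None:
        return (lx, ly, lr), None
    px, py, pr = np.round(pupil[0, 0]).astype(int)
    return (lx, ly, lr), (px + x0, py + y0, pr)  # circles as (x, y, radius)
```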

Iris Normalization
The segmented iris must be converted from circular into rectangular form to provide templates of the same shape for comparison; a homogenous rubber sheet model [14] is implemented for this purpose. Each point in the iris is remapped by the model to a pair of polar coordinates (r, θ), where r lies in the interval [0, 1] and θ is an angle in [0, 2π].
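A minimal sketch of the homogenous rubber sheet remapping: each output pixel is sampled by linearly interpolating between the pupillary and limbic boundary points for every angle θ. The 64 × 512 output size is an assumed template resolution.

```python
import numpy as np

def rubber_sheet(gray, pupil, limbic, radial=64, angular=512):
    (px, py, pr), (lx, ly, lr) = pupil, limbic
    theta = np.linspace(0, 2 * np.pi, angular, endpoint=False)
    r = np.linspace(0, 1, radial)
    # Boundary points on the pupillary and limbic circles for every theta.
    xp, yp = px + pr * np.cos(theta), py + pr * np.sin(theta)
    xl, yl = lx + lr * np.cos(theta), ly + lr * np.sin(theta)
    # Linear interpolation between the two boundaries (homogenous model):
    # r = 0 lands on the pupil boundary, r = 1 on the limbic boundary.
    xs = (1 - r)[:, None] * xp[None, :] + r[:, None] * xl[None, :]
    ys = (1 - r)[:, None] * yp[None, :] + r[:, None] * yl[None, :]
    xs = np.clip(np.round(xs).astype(int), 0, gray.shape[1] - 1)
    ys = np.clip(np.round(ys).astype(int), 0, gray.shape[0] - 1)
    return gray[ys, xs]  # radial x angular rectangular iris template
```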

Iris Feature Extraction
A one-dimensional log-Gabor filter [15] is applied to extract a unique feature from each image for use in the next process, iris template matching.
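A sketch of 1-D log-Gabor encoding under assumed parameters (centre wavelength 18 pixels, σ/f₀ = 0.5): each row of the normalized iris is filtered in the frequency domain and the filter's phase response is quantized to two bits per pixel.

```python
import numpy as np

def log_gabor_encode(norm_iris, wavelength=18.0, sigma_ratio=0.5):
    rows, cols = norm_iris.shape
    freqs = np.fft.fftfreq(cols)
    f0 = 1.0 / wavelength
    # Log-Gabor transfer function, defined for positive frequencies only:
    # G(f) = exp(-(log(f / f0))^2 / (2 * log(sigma / f0)^2)).
    lg = np.zeros(cols)
    nz = freqs > 0
    lg[nz] = np.exp(-(np.log(freqs[nz] / f0) ** 2) /
                    (2 * np.log(sigma_ratio) ** 2))
    code = np.zeros((rows, cols, 2), dtype=np.uint8)
    for i in range(rows):
        # Filter one angular row in the frequency domain.
        response = np.fft.ifft(np.fft.fft(norm_iris[i].astype(float)) * lg)
        code[i, :, 0] = response.real > 0   # phase quadrant, bit 1
        code[i, :, 1] = response.imag > 0   # phase quadrant, bit 2
    return code.reshape(rows, -1)           # binary iris template
```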

Iris Template Matching
To match two extracted iris templates, the Hamming distance is utilized: if the distance is more than 0.5, the two templates are considered to come from different irises, and vice versa.
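A matching sketch: the Hamming distance here is the fraction of disagreeing template bits. The optional noise masks, which exclude eyelid and eyelash bits from the comparison, are an assumed refinement not detailed in this section.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    if mask_a is None:
        mask_a = np.ones_like(code_a, dtype=bool)
    if mask_b is None:
        mask_b = np.ones_like(code_b, dtype=bool)
    valid = mask_a & mask_b               # compare only noise-free bits
    if valid.sum() == 0:                  # no usable bits: treat as non-match
        return 1.0
    return np.count_nonzero(code_a[valid] != code_b[valid]) / valid.sum()

# Decision rule from the text: distances above 0.5 mean different irises.
# same_iris = hamming_distance(template_1, template_2) <= 0.5
```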

Periocular Feature Extraction
Direct implementation of feature extraction in the RGB color space is bad practice, because this color space easily loses feature information and is not consistent with the corresponding perception of color similarity [41,42]. Hence, a different color space such as HSV (see Figure 3), a nonlinear transformation of RGB, is required to increase the discriminative properties of LBP. The hue channel of HSV is utilized to extract the texture features of the eye images, while all channels of HSV are used to extract the color features. Several attributes make the LBP method prominent in practice: it provides a discriminative texture structure, it is useful for extracting spots, line ends, edges and corners of the periocular texture, and it adapts to monotonic lighting changes [24,36,44,45]. To remove the rotation effect and reduce dimensionality, an extended, rotation-invariant and uniform version of LBP is developed, known as RIULBP, as follows (see the sketch after these steps):
Step 1: The hue image is used to extract the texture features. The hue eye image is partitioned into 25 × 25 sub-regions and each pixel in a sub-region is compared with its eight neighbours at a radius of 1. Figure 4 presents example outputs for this step.
Step 2: The RIULBP code is computed according to the formula below:

LBP^{riu2}_{P,R} = Σ_{p=0}^{P−1} s(g_p − g_c) if U(LBP_{P,R}) ≤ 2, and P + 1 otherwise

where g_c is the hue level of the central pixel, g_p (p = 0, …, P − 1) are the hue levels of the P surrounding pixels in a circular neighbourhood of radius r, s(x) = 1 if x ≥ 0 and 0 otherwise, and U counts the bitwise 0/1 transitions in the circular pattern. The base operator LBP_{P,R} produces output values corresponding to the 2^P different binary patterns formed by the surrounding pixels.
Step 3: A binary number is obtained by concatenating all these binary values in a clockwise direction for each pixel, starting from its top-left neighbour. The decimal value of the generated binary number is then used to label the given pixel. The derived binary numbers are referred to as the RIULBP codes.
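A sketch of the riu2 operator above with P = 8 and R = 1, applied to a hue image; the partitioning into 25 × 25 sub-regions and the histogramming of the resulting labels (P + 2 = 10 of them) are assumed to happen around this function.

```python
import numpy as np

def riulbp_image(hue, P=8, R=1):
    h, w = hue.shape
    # Offsets of the P circular neighbours (R = 1 gives the 8-neighbourhood).
    angles = 2 * np.pi * np.arange(P) / P
    dy = np.round(R * np.sin(angles)).astype(int)
    dx = np.round(R * np.cos(angles)).astype(int)
    gc = hue[R:h - R, R:w - R].astype(int)
    # s(g_p - g_c) for every neighbour, stacked into a (P, H, W) array.
    s = np.stack([(hue[R + oy:h - R + oy, R + ox:w - R + ox].astype(int) >= gc)
                  for oy, ox in zip(dy, dx)]).astype(int)
    # U: number of 0/1 transitions around the circular pattern (with wraparound).
    U = np.abs(np.diff(np.concatenate([s, s[:1]], axis=0), axis=0)).sum(axis=0)
    # Uniform patterns keep their bit count; non-uniform ones map to P + 1.
    return np.where(U <= 2, s.sum(axis=0), P + 1)  # riu2 labels 0..P+1
```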
To overcome the quantization-effect limitation of the colour histogram method, the colour moment method is chosen, as it is very effective for colour image-based analysis. The colour moment method is performed as follows (see the sketch after these steps):
Step 1: To increase the discriminating power of colour indexing, the HSV eye image is divided into three equal non-overlapping horizontal regions, each of size 100 × 400 pixels. Figure 5 presents example outputs for this step.
Step 2: For each non-overlapping horizontal region, the colour moment feature vectors are extracted from each colour channel.
Step 3: The colour moment feature vectors store 27 floating-point numbers (three regions × three channels × three moments) in the index of the image.
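A sketch of the colour moment descriptor: three non-overlapping horizontal regions × three HSV channels × three moments (mean, standard deviation and skewness, as listed in the conclusion) yield the 27 floating-point numbers. Splitting the rows with np.array_split is an implementation choice.

```python
import cv2
import numpy as np

def colour_moments(bgr, regions=3):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
    bands = np.array_split(hsv, regions, axis=0)   # equal horizontal regions
    feats = []
    for band in bands:
        for c in range(3):                         # H, S and V channels
            vals = band[:, :, c].ravel()
            mean, std = vals.mean(), vals.std()
            # Third moment (skewness); epsilon guards flat regions.
            skewness = np.mean((vals - mean) ** 3) / (std ** 3 + 1e-12)
            feats += [mean, std, skewness]
    return np.array(feats)                         # 3 x 3 x 3 = 27 values
```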
Different periocular template matching methods are required to compute matching scores for the different periocular features. To measure the texture template scores, the chi-square distance, χ² [33], is used:

χ²(B, D) = Σ_{m,n} (B_{m,n} − D_{m,n})² / (B_{m,n} + D_{m,n})

where B and D are the two texture features to be matched, and m and n index the m-th histogram bin belonging to the n-th local region. To measure the similarity of the colour features, the EUD is applied:

EUD(F, F′) = √( Σ_{k=1}^{27} (F_k − F′_k)² )

where k indexes the 27 floating-point numbers and F and F′ represent the feature vectors of the query and database images, respectively. The matching scores of the texture and colour features are then normalized using MMN [43].
The MMN is carried out as follows:

s′ = (s − min(S)) / (max(S) − min(S))

where s is a raw matching score and S is the set of all matching scores, so that every normalized score s′ lies in [0, 1]. A combined sketch of this matching and fusion stage is given below.
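This sketch follows the definitions above; the equal-weight sum used to fuse the normalized texture and colour scores is an assumption, since this section does not state the fusion weights.

```python
import numpy as np

def chi_square(B, D, eps=1e-10):
    # Sum over all histogram bins m of all local regions n.
    return np.sum((B - D) ** 2 / (B + D + eps))

def euclidean(F, F_prime):
    # Over the k = 1..27 colour moment values.
    return np.sqrt(np.sum((F - F_prime) ** 2))

def min_max(scores):
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())   # maps scores into [0, 1]

# Usage sketch against a gallery of enrolled templates:
# texture = min_max([chi_square(query_tex, g) for g in gallery_textures])
# colour  = min_max([euclidean(query_col, g) for g in gallery_colours])
# fused   = 0.5 * texture + 0.5 * colour        # assumed equal weights
```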

RESULTS AND ANALYSIS
To evaluate the proposed and existing methods for the fused iris and periocular recognition, three measurements are used: accuracy, decidability index and cumulative match characteristics (CMC).

Analysis of accuracy for the fused iris and periocular recognition
Table 1 presents the accuracy results of the fused iris and periocular recognition on the UBIRIS.v2 and UBIPr datasets for the different methods and distances. For the UBIRIS.v2 dataset, across all distances, the Tan and Kumar [27] method achieved the lowest accuracy of the compared methods, at 71.2%. The Woodard et al. [26] method obtained slightly higher accuracy than the Tan and Kumar [27] method across all distances because it utilizes more features to increase the performance of the periocular recognition. Although these methods introduced several improvements to iris segmentation and periocular feature extraction, the presence of eyeglass frames in the eye images prevented them from obtaining the highest accuracy. On the other hand, the proposed method outperformed the other methods with 94.2% accuracy. This is because the method improves the performance of iris segmentation through frame detection, contrast enhancement and reflection removal, which reduce the level of noise in the eye images, while the extraction of both texture and colour features in the periocular region improves the performance of the periocular recognition.
For the UBIPr dataset, although the Woodard et al. [26] and Tan and Kumar [27] methods introduced several improvements to iris segmentation and periocular feature extraction, the large skin area and the eyeglass frames in the eye images prevented them from obtaining the highest accuracy; they obtained only 68.8% and 66.5% accuracy across all distances, respectively. The proposed method once again outperformed the other methods with 95.9% accuracy, for the same reasons given above.

Analysis of decidability index for the fused iris and periocular recognition
Table 2 presents the decidability index results of the fused iris and periocular recognition on the UBIRIS.v2 and UBIPr datasets for the different methods and distances. For the UBIRIS.v2 dataset, the Tan and Kumar [27] method obtained the lowest decidability index of the compared methods, with an index of 1.795 across all distances. Hence, this method provides the smallest separation between the intra-class and inter-class distributions, and its ability to identify the same person is very low. On the other hand, the proposed method achieved the highest decidability, with an index of 2.431 across all distances. Therefore, this method provides the largest separation between the intra-class and inter-class distributions, and its ability to identify the same person is very high.
For the UBIPr dataset, the proposed method once again outperformed the other methods. Across all distances, the proposed method obtained an index of 2.253, while the Woodard et al. [26] method achieved an index of 1.665. In addition, the Tan and Kumar [27] method once again obtained the lowest decidability index of the compared methods, with an index of 1.654 across all distances, again providing the smallest separation between the intra-class and inter-class distributions and a very low ability to identify the same person.

Analysis of CMC for the fused iris and periocular recognition
Figures 6 to 11 present the CMC accuracy results of the fused iris and periocular recognition on the UBIRIS.v2 and UBIPr datasets for the different distances and methods. This measurement evaluates the one-to-many identification performance of the fused recognition. For the UBIRIS.v2 dataset, the proposed method achieved the most accurate results, with an average rank-one score of 80.0% at a distance of four metres, 75.5% at five and six metres, 83.2% at seven metres and 77.0% at eight metres. The Tan and Kumar [27] method performed poorly in terms of one-to-many identification, with 43.7% at four metres, 39.1% at five metres, 33.2% at six metres, 51.3% at seven metres and 34.0% at eight metres. The scores achieved by the Woodard et al. [26] method ranged from 47.0% to 60.0%. Across all distances, the proposed method achieved more than 70.0% for the average rank-one cumulative match score, so it can be concluded that the proposed method performed well at the one-to-many identification level compared to the other methods.
For the UBIPr dataset, the Tan and Kumar [27] method once again performed poorly in terms of one-to-many identification, with 47.0% at four metres, 35.5% at five metres, 31.5% at six metres, 50.0% at seven metres and 32.0% at eight metres. The Woodard et al. [26] method obtained higher accuracy than the Tan and Kumar [27] method, with 60.0% at four metres, 47.6% at five metres, 43.9% at six metres, 60.0% at seven metres and 44.0% at eight metres. The proposed method once again achieved the highest accuracy, with 87.0% at four metres, 79.3% at five metres, 73.3% at six metres, 89.4% at seven metres and 61.0% at eight metres. Across all distances, the proposed method achieved more than 77.0% for the average rank-one cumulative match score, so it can be concluded that the proposed method performed well at the one-to-many identification level compared to the other methods.

CONCLUSION
A combination of texture and colour periocular feature extraction methods has been proposed to resolve the limitations of current periocular feature extraction methods, which are easily affected by background complications, dependent on image size and orientation, limited in spatial information and subject to quantization effects. A rotation invariant uniform local binary pattern method was proposed to extract the texture features and a colour moment method was proposed to extract the colour features. The rotation invariant uniform local binary pattern method reduces the effects of rotation and dimensionality while remaining very robust in extracting texture features such as spots, line ends, edges and corners. Furthermore, the use of the hue-saturation-value channels provides better stability of the colour distribution for colour similarity. In addition, partitioning the eye images into equal non-overlapping horizontal regions during colour feature extraction retains sufficient spatial information, which increases the discriminating power. Besides, the colour moment method, which extracts local information such as the mean, standard deviation and skewness of the eye images, provides a better colour distribution for the colour similarity between the query and database images. The proposed method has increased the accuracy of periocular recognition. Finally, the combination of the iris and periocular recognition systems has increased the accuracy of identifying a person compared to the single recognitions. In future work, approaches such as [46][47][48][49][50] will be incorporated to enhance our method.