Malayalam Handwritten Character Recognition Using AlexNet Based Architecture

ABSTRACT


INTRODUCTION
Optical Character Recognition (OCR) is the process of recognising handwritten and printed text. The recognised text is converted into an encoded format. The four essential steps in character recognition are: a) data pre-processing, b) segmentation, c) feature extraction, and d) classification [1]. The Malayalam script consists of 15 vowels, 36 consonants, and 5 pure consonants, as shown in Figure 1. There are 12 dependent vowels and 144 compound characters, as shown in Figure 2 and Figure 3, respectively. Characters are compounded both vertically and horizontally in the Malayalam script.
A Convolutional Neural Network (CNN) is used for character recognition as well as for other purposes such as gender prediction [2], determining content fidelity in document digitalisation [3], image style recognition [4], plant classification [5], rail surface defect detection [6], and several other applications. CNNs have an advantage in image classification and feature extraction due to characteristics such as local connectivity and weight sharing [7], which drastically reduce the number of parameters.
Handwritten character recognition is a difficult task because people's writing styles differ on a large scale. Apart from writing style, other factors such as noise and skew also increase the difficulty. The unavailability of a standard dataset for Malayalam characters adds to the complexity. Researchers have used several models to improve the recognition accuracy of characters by designing new features, integrating different known features, and using multiple classifiers for classification. A survey of different feature extraction and classification methods is conducted and presented here. G. Pirlo et al. [8] presented a fuzzy-zoning based classification method for handwritten characters. The recognition and reliability rates obtained for this fuzzy zoning method are 93% and 95%, respectively. Dileep Kumar Patel et al. [9] used the Discrete Wavelet Transform (DWT) and the Euclidean distance transform to find similar feature patterns. Classification was based on the minimum distance between characters. The accuracy obtained is 90%. A neural network based off-line English handwritten character and digit recognition method was proposed by Ashok Kumar et al. [10]. Shilpy Bansal et al. [11] presented a technique called Neighborhood Foreground Pixels Density for handwritten Gurumukhi character feature extraction. A Support Vector Machine (SVM) was used for classification. The accuracy achieved was 91.95%. A novel offline Malayalam handwritten text segmentation method was proposed by Shanjana C et al. [12]. SVM was used for character classification and achieved an accuracy of 82%.
Durjoy Sen Maitra et al. [13] reported an accuracy of 99.10% for English and multi-script characters using a 5-layer CNN model similar to LeNet-5. A CNN based Bangla digit recognition method was proposed by M.A.H Akhand et al. [14]; the proposed model obtained 98.80% accuracy. A study on different applications of deep learning was conducted by S.M. Sofiqul Islam et al. [15]. The review evaluates two CNN models, AlexNet and Visual Geometry Group (VGG-S), on nine different benchmark datasets. Gurumukhi character recognition using particle swarm optimization and a neural network has been presented by Jaspreet Kaur et al. [16]. Their results showed that PSONN outperforms ANN in recognizing Gurumukhi characters, with 100% accuracy for basic characters. A novel approach that integrates two different classifiers for the recognition of off-line Arabic handwritten characters was proposed by Mohamed Elleuch et al. [17]; 94.9% accuracy was reported for this method. Shenzhen Gu et al. [18] used the AlexNet architecture to detect a tennis ball in pictures. Ahmed El-Sawy et al. [19] proposed an off-line Arabic handwritten character recognition system using a CNN model. A nine-layer CNN was used, and a 5.1% misclassification error was reported for the testing data. Darmatasia et al. [20] proposed a CNN based feature extraction model. The overall accuracy, when tested on ten form documents, was 83.37%. Pranav P et al. [21] worked on Malayalam handwritten character recognition using CNN. The model was compared with the LeNet-5 architecture and obtained higher accuracy; 95% accuracy was obtained for this method. SVM based character recognition has been proposed by Gauri Katiyar et al. [22]. The method emphasizes the advantage of using an SVM classifier for character classification. The results show an average accuracy of 95.74% for uppercase characters and 92.19% for lowercase characters. Aarthna Maheshwari et al. [23] worked on handwritten English alphabets using the PSO algorithm. An ANN was used for recognition and obtained an accuracy of 83.8462%.
From the above survey of recognition methods, it is observed that character recognition has scope for further research and development. CNN is an emerging trend that has improved recognition rates and reduced time consumption. Here, an attempt is made to improve the recognition rate and reduce the time consumption for Malayalam handwritten simple and compound characters using an AlexNet based CNN model. The primary objective of this research work is to develop a suitable model that efficiently extracts features of Malayalam characters and classifies them. The proposed model has a higher recognition rate with minimal training time. Further, a new dataset of Malayalam characters is developed for testing the proposed model and is shared on the Internet.

RESEARCH METHOD
From the survey conducted, the drawbacks of handcrafted feature learning in the works of G. Pirlo [8] and Dileep Kumar Patel [9] are identified. In these methods, different features of the characters need to be extracted manually at each iteration, which increases the cost of computation. To avoid this computational cost and to reduce manual feature computation, a convolutional feature extraction technique is introduced, in which a neural network model extracts different features of the same data and is trained on them. This improves the efficiency of character recognition and reduces the computational cost incurred by manual feature extraction.
One of the features of CNN is that its recognition rate improves as the training set grows. This requires collecting a large dataset from people of different ages and different sections. Since Malayalam does not have a large dataset, collecting handwritten character data is the first requirement for this research work. The collected data is then lightly pre-processed and augmented to enlarge the dataset. This data is further divided into training and test sets in the ratio 80:20, respectively. The primary aim of this research work is to automate feature extraction by using a CNN architecture. The CNN model proposed here consists of a 24-layer architecture, similar to AlexNet, which is used for extracting the features of characters, and SVM is used for classifying the output characters.
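The 80:20 division described above can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed random seed, and the placeholder file names are assumptions made for the example and are not taken from the original experiment (which was carried out in MATLAB).

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test lists."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for (image file, class label) pairs with 80 classes.
samples = [(f"img_{i:04d}.png", i % 80) for i in range(1000)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

Shuffling before cutting avoids a split in which some classes appear only in the training set or only in the test set because of the order in which the images were collected.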

Collecting Character data set
In the proposed method, 36 compound characters and 44 basic characters of the Malayalam language are considered. Since Malayalam characters do not have a standardised dataset, the 44 basic characters are collected from a new Malayalam character dataset called the P-ARTS KAYYEZHUTU [24] dataset. The 36 compound characters are collected manually from people of different age groups. Grids of size 14 x 8 are printed on A4 paper, and the character set is written in the cells of the grid. Figure 4 shows the architecture of the proposed system.

Augmenting and Pre-processing data
The collected data are cropped, and the characters are separated into 36 different labels. Each character image is of size 86 x 86 pixels. This dataset is augmented to develop a larger dataset. The augmentation is performed by applying affine transformations, which include translation, rotation, and scaling. Affine transformations preserve edge points, shapes, curves, etc., so the basic shapes of the characters remain the same. Rotation is used for augmenting the dataset; eight distinct rotation angles are used, including -2, 2, 4, -4, 8, and -8 degrees. Figure 5 shows rotated versions of a character and sample negatives of characters.
Step 1: The input data is represented in the form of pixels of size 227x227x3.
Step 2: A filter of size k x k is selected and convolved over the entire image.
Step 3: The first layer calculates the match of a feature to a patch of the image by multiplying each pixel in the feature by the corresponding pixel value in the image.
Step 4: The products are added up and divided by the total number of pixels.
Step 5: In the first layer, if both pixels are black, (-1) x (-1) = 1, and if both pixels are white, 1 x 1 = 1; every matching pixel results in a 1.
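Steps 2 to 5 can be illustrated with a small sketch. It assumes a single-channel image whose pixels are coded as +1 (white) and -1 (black), a stride of 1, and no padding; these choices, the function names, and the toy 4 x 4 image are assumptions made for illustration and not the settings of the actual network.

```python
def match_score(feature, patch):
    """Multiply corresponding pixels, add the products, divide by the pixel count."""
    total = 0
    count = 0
    for feature_row, patch_row in zip(feature, patch):
        for f, p in zip(feature_row, patch_row):
            total += f * p  # a matching pair (both +1 or both -1) contributes 1
            count += 1
    return total / count


def slide_feature(image, feature):
    """Slide a k x k feature over the image (stride 1, no padding)."""
    k = len(feature)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    scores = []
    for i in range(rows):
        row_scores = []
        for j in range(cols):
            patch = [row[j:j + k] for row in image[i:i + k]]
            row_scores.append(match_score(feature, patch))
        scores.append(row_scores)
    return scores


# Toy example: a diagonal stroke and a 3 x 3 diagonal feature.
image = [
    [1, -1, -1, -1],
    [-1, 1, -1, -1],
    [-1, -1, 1, -1],
    [-1, -1, -1, 1],
]
feature = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
scores = slide_feature(image, feature)
print(scores)
```

A score of 1.0 means every pixel of the patch matches the feature, which happens at the two positions where the feature lies exactly on the diagonal stroke; the other positions give scores close to zero.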
The third dimension, 3, represents the RGB colour channels of the image. In our experiment, the input size is 227x227x3 and the kernel size is k = 11, with a stride of 4 and zero-padding. The first convolution operation, Conv1, yields a feature map with 96 features. The output size of a convolution layer is calculated as follows:
$$C = \frac{W - K + 2P}{S} + 1$$
where C is the output layer size, W is the input height / length, K is the kernel size, P represents the padding, and S represents the stride value.
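As a worked check of the output-size calculation, the sketch below assumes the standard relation C = (W - K + 2P) / S + 1 between the symbols defined above, and reads "zero-padding" as P = 0 for Conv1. Both are assumptions of this example; the function name is hypothetical.

```python
def conv_output_size(w, k, p, s):
    """Spatial output size of a convolution: (W - K + 2P) / S + 1."""
    span = w - k + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("kernel, padding and stride do not tile the input evenly")
    return span // s + 1


# Conv1 settings from the text: W = 227, K = 11, S = 4, with P assumed to be 0.
conv1_size = conv_output_size(227, 11, 0, 4)
print(conv1_size)  # 55
```

Under these assumptions Conv1 produces a 55 x 55 spatial output for each of its 96 features. The same function also shows that a 5 x 5 kernel with stride 1 and padding 2 leaves the spatial size unchanged.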
A general convolution operation is represented as follows. Let h and g be two functions whose convolution is written as h*g. This is the integral of the product of the two functions after one is reversed and shifted. The convolution is a type of integral transform, as shown below:
$$(h * g)(t) = \int_{-\infty}^{\infty} h(\tau)\, g(t - \tau)\, d\tau$$
Here, t need not represent the time domain, and the operation can be described as a weighted average of the function h(τ) at moment t, where g(t − τ) is the weighting shifted by amount t. Figure 6 shows the image of a compound character before and after the convolution operation.
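For sampled signals the integral becomes a sum over the shift variable. The sketch below is a discrete, one-dimensional analogue of the reverse-and-shift definition, written for finite lists; the function name and the sample values are illustrative assumptions.

```python
def convolve(h, g):
    """Discrete convolution: out[t] = sum over tau of h[tau] * g[t - tau]."""
    n = len(h) + len(g) - 1
    out = [0] * n
    for t in range(n):
        for tau in range(len(h)):
            # g is indexed at t - tau, i.e. reversed and shifted by t.
            if 0 <= t - tau < len(g):
                out[t] += h[tau] * g[t - tau]
    return out


h = [1, 2, 3]
g = [1, 0, -1]  # a simple difference (edge-like) kernel
result = convolve(h, g)
print(result)  # [1, 2, 2, -2, -3]
```

Swapping the two arguments gives the same result, which reflects the commutativity of convolution.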
The next necessary layer is the activation function layer. This layer introduces non-linearity into the network. It also converts the outputs of the neurons in the previous layer into the inputs of the neurons in the next layer, which helps prevent a purely linear mapping. Max-pooling finds the maximum value in a given region. A max-pool layer, MP1, with a kernel size of 3 x 3 and a stride of 2 is used here. This reduces the input size to 112 x 112 x 48. Figure 6 shows the sample image before and after the convolution operation. The second convolution layer, Conv2, with a kernel of 5 x 5 x 48, stride 1, and padding 2, produces a feature map with 256 features.
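The activation and max-pooling steps can be sketched on a toy feature map. The text does not name the activation function; ReLU is assumed here because it is the activation used in the original AlexNet. The 3 x 3 window and stride of 2 follow MP1, while the 5 x 5 input values are made up for illustration.

```python
def relu(feature_map):
    """Replace negative activations with zero (assumed activation function)."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=3, stride=2):
    """Keep the maximum value of each size x size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(rows):
        pooled_row = []
        for j in range(cols):
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [-1, 2, 0, -3, 1],
    [4, -2, 1, 0, -1],
    [0, 3, -5, 2, 2],
    [-1, 0, 1, -2, 6],
    [2, -4, 0, 1, -1],
]
pooled = max_pool(relu(feature_map))
print(pooled)  # [[4, 2], [3, 6]]
```

Because the window (3) is larger than the stride (2), neighbouring windows overlap, as in the pooling layers of AlexNet.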

Classification and Model training
A Support Vector Machine (SVM) is used for classification instead of the Softmax classifier of the CNN for two main reasons. First, SVM reduces the overfitting problem that occurs in CNN. A dropout layer [25] is used for handling misclassifications. By appropriately tuning the margin control parameter λ, the overfitting problem is reduced. The value of λ is selected using cross-validation. Second, the Softmax classifier is designed for a 1000-class problem, whereas this research work deals with a smaller number of classes (80), for which SVM is more suitable. The goal of SVM is to find the optimum separating hyperplane that maximizes the margin of the training data. A multi-class SVM is used here for training the data. It works as follows: in general, a two-class classifier is built over a training pair (x, y), which consists of the input features and the class of the datum. At test time, the classifier chooses the class.
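One common way to build a multi-class SVM from two-class classifiers is the one-vs-rest scheme, sketched below with linear classifiers. The text does not say which multi-class scheme or kernel was used, so the scheme, the regularised hinge-loss form in which λ appears, the function names, and all weights and data points are assumptions for illustration only.

```python
def score(w, b, x):
    """Signed distance-like score of a linear two-class classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(classifiers, x):
    """One-vs-rest decision: choose the class whose classifier scores highest."""
    return max(classifiers, key=lambda label: score(*classifiers[label], x))


def regularised_hinge_loss(w, b, data, lam):
    """Average hinge loss plus lam * ||w||^2, for labels y in {-1, +1}."""
    hinge = sum(max(0.0, 1.0 - y * score(w, b, x)) for x, y in data) / len(data)
    return hinge + lam * sum(wi * wi for wi in w)


# Hypothetical weights (w, b) for three classes over two features.
classifiers = {
    "A": ([1.0, 0.0], 0.0),
    "B": ([0.0, 1.0], 0.0),
    "C": ([-1.0, -1.0], 0.5),
}
print(predict(classifiers, [2.0, 0.5]))
print(predict(classifiers, [-1.0, -1.0]))

# Hypothetical two-class training pairs (x, y) for the "A versus rest" classifier.
data = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([-1.0, 0.0], -1)]
loss = regularised_hinge_loss([1.0, 0.0], 0.0, data, lam=0.1)
```

In this formulation a larger λ penalises large weights more strongly, which widens the margin at the cost of more training errors; this is the trade-off that cross-validation is used to tune.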
Finally, the classifier assigns these data to one of 80 classes. As mentioned, the FC7 layer consists of 4096 neuron connections, and about 122,880 different features of each character are used as training features. That is, about 4096 x 122,880 features are fed as input to the SVM classifier. Gradient features of each character are considered when calculating the feature vector values. SVM uses the feature vector values for classification.

RESULTS AND ANALYSIS
The compound character dataset consists of 36 character classes. In addition to the AlexNet-24 architecture, this dataset is tested on two other architectures, namely LeNet-5, which has seven layers, and ResNet, which has 152 layers.
Table 1 shows the comparison between the three CNN models on two different datasets: raw data and pre-processed data. The LeNet-5 model produces an accuracy of 85.30% on the raw dataset and 87.54% on the pre-processed dataset. The LeNet-5 architecture consists of seven layers, and it takes a long time to process the roughly 180,000 character images. The ResNet network consists of a 152-layer architecture, and the problems of vanishing gradients and degradation occur. It produces better accuracy than LeNet-5 but lower accuracy than the proposed AlexNet-24.
The first parameter is the initial learning rate. The learning rate controls how quickly the neural network updates its weights while learning different features. If it is low, the learning time increases, whereas if it is high, the learning time is reduced but the prediction accuracy decreases. The learning rate is set to 0.001. The other parameters are the mini-batch size and the number of epochs. Table 2 shows the different mini-batch sizes used and the corresponding accuracy obtained. The mini-batch size is the amount of data used in a single iteration. After comparison, the value is set to 128, which produces the highest accuracy in a short time. The number of epochs determines how many full forward and backward passes are made over the training data. Here, different epoch values such as 5, 10, 20, and 30 are tried, and the value is set to 20.
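The relation between mini-batch size, epochs, and iterations can be made concrete with a short sketch. The learning rate, mini-batch size, and epoch count are the values reported above; the dictionary keys, the function name, the training-set size of 144,000 images, and the choice to count a final partial batch as one iteration are all assumptions made for this example.

```python
import math

# Values reported in the text; the key names are hypothetical.
TRAINING_OPTIONS = {
    "initial_learning_rate": 0.001,
    "mini_batch_size": 128,
    "max_epochs": 20,
}


def total_iterations(num_train_images, mini_batch_size, max_epochs):
    """Number of weight updates, counting a final partial batch as one iteration."""
    per_epoch = math.ceil(num_train_images / mini_batch_size)
    return per_epoch * max_epochs


# Hypothetical training-set size, used only to illustrate the arithmetic.
n_train = 144_000
iterations = total_iterations(
    n_train,
    TRAINING_OPTIONS["mini_batch_size"],
    TRAINING_OPTIONS["max_epochs"],
)
print(iterations)  # 22500
```

Halving the mini-batch size doubles the number of iterations per epoch, which is one reason the batch size affects training time as well as accuracy.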
The two datasets collected are tested separately using the three models, i.e., LeNet-5, ResNet, and AlexNet-24. The input images of the 36 compound characters are tested, and the number of classifications is recorded. The compound characters are classified together with the basic character data. Table 3 shows the accuracy obtained for the basic characters collected from the P-ARTS Kayyezhuthu dataset and for the compound characters; AlexNet-24 shows a higher average accuracy level.
The Mean Square Error (MSE) rate is 1.58%. Table 4 shows the characters, their corresponding misclassified classes, and their percentage of misclassification. A confusion matrix is used for evaluation; it gives the percentage of correctly recognised and misclassified characters. The average accuracy level obtained is above 98.4%. The model is tested several times with different numbers of iterations and different numbers of test images. A consistent accuracy level of 98.42% is obtained in most cases. The experiment is carried out in MATLAB 2017a.
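The way accuracy and per-class misclassification percentages are read from a confusion matrix can be sketched as follows. The counts are invented for illustration and are not the experimental results; the class names are borrowed from the Figure 5 caption only as labels, and the function names are hypothetical.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal (rows: true class, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def misclassification_rates(confusion, labels):
    """List (true class, predicted class, percentage of the true class's samples)."""
    rates = []
    for i, row in enumerate(confusion):
        class_total = sum(row)
        for j, count in enumerate(row):
            if i != j and count > 0:
                rates.append((labels[i], labels[j], 100.0 * count / class_total))
    return rates


# Made-up counts for three classes with 100 test images each.
labels = ["NKA", "GMA", "NNA"]
confusion = [
    [98, 2, 0],
    [1, 99, 0],
    [0, 3, 97],
]
print(accuracy(confusion))  # 0.98
print(misclassification_rates(confusion, labels))
```

The error rate is one minus the accuracy, so in this toy example it is 2%.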


ISSN: 2089-3272 IJEEI, Vol. 6, No. 4, December 2018: 393-400
Various handcrafted feature extraction methods have been employed to extract features of Malayalam characters, but all failed to achieve 100% accuracy. Moreover, these conventional methods are time consuming, and most of them failed to achieve a higher recognition rate.

Figure 4. Architecture of the proposed system

Figure 5. (a) Eight different rotation angles of the character NNA (b) Sample negatives of the characters NKA and GMA

Figure 6. Image before and after convolution operation
CONCLUSION
Recognising Malayalam handwritten characters is a challenging research area. In general, handwritten characters vary in writing style from person to person; thus, an automated feature extraction process makes Malayalam character recognition easier. Compound characters of Malayalam are the most prone to misclassification with their basic characters. A dataset consisting of 90,000 basic characters is collected from the P-ARTS Kayyezhuthu dataset, and 100,000 compound characters are newly developed in this research work, collected manually and enlarged using different augmentation methods. In this research experiment, an overall training accuracy of 99.9% and a testing accuracy of 98.42% are achieved. Several combinations of functions and methods are tried out in this research work. These methods are selected based on accuracy and on the processing time for executing the data. The proposed AlexNet-24 architecture is compared with two other existing recent models, LeNet-5 and ResNet. Also, a complete Malayalam character dataset is newly developed from these basic and compound characters and is made available through the P-ARTS Kayyezhuthu drive. The experimental results show that the proposed AlexNet-24 model outperforms the other recent existing models, LeNet-5 and ResNet.

Table 1. Comparison of raw dataset with pre-processed dataset
Malayalam Handwritten Character Recognition Using AlexNet based Architecture (Ajay James)

Table 2. Accuracy vs. training time with different batch sizes

Table 3. Comparison of three methods using both datasets
CNN Models | Accuracy of dataset from P-ARTS Kayyezhuthu