Recognition of Degraded and Non Degraded Roman Characters Using Different Classifiers



Introduction
Optical character recognition (OCR) is an automated process for reading printed or handwritten text. OCR electronically converts printed material such as books, administrative records, office files, marriage records, security records and many other important documents into machine-encoded text. This machine-encoded text occupies less memory than the source image and can be formatted, edited and displayed properly [1,2]. Feature extraction is one of the most important steps for matching or classification. Feature extraction techniques are divided into two groups: linear and nonlinear. Linear extraction techniques include principal component analysis (PCA) [3], independent component analysis (ICA) [4] and linear discriminant analysis (LDA) [5]. Nonlinear extraction methods include kernel PCA and contour-based features. The extracted features of a character, along with its class label, are fed to a classifier to classify the character image. The three main approaches to classification are the similarity-based, probabilistic and decision-boundary approaches: the similarity-based (simplest) approach matches patterns by similarity, the probabilistic approach is based on Bayes' rule, and the decision-boundary approach is based on minimizing the classification error on character images.
The important problems of OCR include alphabetical character recognition, degraded-document recognition, character recognition in license plates, logos and seals, and word spotting. Alphabetical character recognition is the most basic and most important of these. This paper presents a novel approach for recognition of offline English alphabetical characters. The main strategy is to extract statistical features using a combination of the complementary similarity measure (CSM) and the grey-level co-occurrence matrix (GLCM). The CSM is generally used as a classification method in the area of character recognition; in this work, it is used as a feature extractor and combined with GLCM features to boost the accuracy of character recognition.
The work presented in this paper focuses on similarity measures and statistical features of the dataset. Features are extracted from the offline alphabetical characters using the complementary similarity measure (CSM) and the grey-level co-occurrence matrix (GLCM). Recognition has been performed using four different classifiers: artificial neural network (ANN), naive Bayes, random forest (RF) and support vector machine (SVM). A standard dataset has been used for the experimental work. The experimental results show that the overall average recognition accuracy ranges from 63.9% to 84.9% in a noiseless environment, and 75.82% is recorded in a noisy environment. The remainder of this section reviews the literature related to alphabetic character recognition in various fields. Uttal et al. [6] used conflicted dot patterns to recognize characters via the dynamic visual noise (DVN) technique; increasing the number of dots forming geometric structures increases recognizability.
Kahan et al. [7] presented an algorithm for recognizing Roman alphabetic characters of different font sizes. They used a shape-clustering approach and the line adjacency graph (LAG) for binary images. 'Blobs' were used as connected components of the LAG, and thinning was chosen as the basis for feature extraction. The features extracted using LAG-based thinning included the number of holes in the structure, crossings of strokes, concavity of the skeleton structure, vertical-direction end points, exact positions of holes, and the rectangular boxes containing the characters. LAG-traversal-based thinning took less time than the pixel-based thinning approach. Kahan et al. [7] found confusion groups among alphabets such as a, e, s, g, h, b, t, f, d, o, i, j. They used a Bayesian classifier over six font sizes, but the confusable features reduced the recognition rates and compromised accuracy: it was 97% for unlike fonts and 99% for similar fonts.
Comelli et al. [8] worked with degraded pictures of vehicles captured by a TV camera. The captured pictures had imperfections such as geometric distortion, noise and blurring, so it was very difficult to recognize the license plate number properly. The authors used template matching and cross-correlation to overcome this problem. The work was verified using the RITA software as a license plate recognizer, and the recognition rate of alphabetical characters was 97.1 percent.
Fukushima et al. [9] presented work on handwriting recognition based on the neocognitron, a deformation-invariant technique.
Hirwani et al. [10] presented character recognition of handwritten alphabets using the local binary pattern (LBP) as a feature and a nearest-neighbor classifier for recognition.
Bag et al. [2] presented recognition of Bangla compound characters using skeleton segmentation for feature extraction and stroke segments to extract shape components, which improves recognition and the efficacy of the features.
Heutte et al. [11] presented combined structural and statistical feature based vectors (SSFBV) for character recognition, divided into two categories: local features and global features. Global features combine invariant moments, projections and profiles. Local features consist of four kinds, namely intersections with straight lines, holes and concave arcs, extrema, and end points and junctions. A statistical classifier based on a linear discriminant function was used for classification. The recognition rate reported on proprietary datasets was 97.4 percent.
Li et al. [12] used a partition-combination method for character recognition, in which the partitioned parts are called bases, roots, etc. Basic features such as the mean and standard deviation were extracted and recognition was carried out. Finally, they integrated the results of the different patterns and achieved an accuracy of 98%. It was difficult to find new techniques that integrate all of the results.
Conell et al. [13] presented unconstrained online Devanagari script recognition. They used two types of classifiers, hidden Markov models (HMMs) and neural networks (NNs), in five different ways. The features classified by classifier 1 (HMM) capture directional change along the x and y axes, including the center of the character and the inclination angle θ in sine and cosine form. Classifier 2 (HMM) classified the orientation of critical points, classifier 3 (NN) classified a 5 × 5 zone stroke-direction histogram, classifier 4 (NN) classified each stroke from beginning to end, and classifier 5 (NN) used global features for each zone, such as the centroid. In this way, different features were classified by different classifiers, and a recognition accuracy of 86.5% was reported.
Likforman et al. [14] presented an enhancement scheme using total variation and non-local means filtering, which helps remove background noise from images and improves the character recognition rate.
Bawane et al. [15] presented recognition of handwritten documents and objects using a spiking neural network (SNN) with the leaky integrate-and-fire (LIF) model and a two-level network, with SVM used for comparison.
Vasudeva et al. [16] presented a technique that recognizes characters with an artificial neural network (ANN), taking pixel directions as features and using a back-propagation neural network for recognition.
Zhang et al. [17] presented a survey on classification. They highlighted issues of posterior probability estimation and the connection between neural and conventional classifiers, presented some feature selection algorithms, and examined misclassification.
Albregtsen et al. [18] presented the method behind GLCM matrices and discussed the texture features extracted from them [19].
Chawaki et al. [20] discussed texture features extracted from grey-level run length (GLRL) and grey-level co-occurrence matrices, reporting that the GLRL matrices contain more discriminatory information.
Patel et al. [21] presented handwritten character recognition using a neural network.
Dojvcinovic et al. [22] presented neural-network-based classification of characters, considering the whole image as a feature. Characters were extracted, segmented and recognized in this work: they applied MSER (maximally stable extremal regions) as a feature detector and recognized the result with a neural network.
Chherawala et al. [23] presented offline character recognition using a bidirectional long short-term memory (LSTM) neural network.


Applications of Roman Character Recognition in Degraded Document
In the literature, several methods are reported to address the issue of degraded documents. Sawaki and Hagita [24] presented a robust technique based on the complementary similarity measure (CSM) for recognition of characters in graphical designs and degraded documents. They used newspaper headlines with graphical designs in their experimental work and reported a recognition rate of 97.7%.
Hobby and Ho [25] presented a technique to enhance degraded document images for better display quality and recognition accuracy. They used fax images in their experiments and obtained outline descriptions of the printed symbols that could be rendered at an arbitrary resolution. However, missing parts of characters, broken characters and stains were not corrected by the enhancement method.
Tonazzini et al. [26] presented a technique to recognize text characters in highly degraded printed documents. They used wavelet-based decomposition and filtering to preprocess the ancient printed text, applied Markov random field (MRF) segmentation with blind restoration to small portions called blobs, and trained a multilayer perceptron (MLP) with the back-propagation algorithm. The method achieved highly effective recognition of strongly degraded text with precise segmentation; in their work, degraded touching characters were segmented, and the resultant text was tolerable.
The method introduced by Likforman et al. [27] has the advantage of using an extension of one-dimensional hidden Markov models (HMMs), called dynamic Bayesian networks (DBNs), for recognition of degraded old printed characters from historical printed books. Two HMMs are paired to capture the 2D behavior of character images: the columns of the image are modeled by a vertical HMM and the rows by a horizontal HMM. The two streams interact through two coupled DBN architectures. These coupled architectures coped better with highly broken characters than both basic HMMs and discriminative methods such as SVMs: they predicted missing parts and stains, and provided at least one unstained stream within a shift. Both the image columns and rows are jointly updated, which avoids leaving insubstantial data for the classification decision.
Namane et al. [28] presented a technique for degraded characters in typewritten documents produced by a typesetting machine. They used the complementary similarity measure (CSM) as a feature extractor and a multilayer perceptron (MLP) for classification; the CSM features assist the MLP and proved very useful for rejection. They reported a recognition rate of 97.95 percent, a rejection rate of 0.09 percent and an error rate of 1.96 percent on poor-quality typewritten characters on A4 page documents. In light of the reported accuracy, it is conceivable that in the combined architecture (CSM-MLP) the decision is made only by the MLP, whereas rejection is made by either the CSM or the MLP, which decreases the overall usefulness of the system.
Ramesh Babu et al. [29] presented a novel technique for recognizing degraded printed characters based on the gradient patterns of a character. Experiments were conducted on character images that were either digitally written or taken from degraded old historical documents, and the results were reported to be tolerable. The method was found to be sensitive to added stains, which modify the gradient pattern and degrade performance.
Namane et al. [30] presented a method for degraded character recognition in which a first classifier, a similarity measure neural network (SMNN), uses only model similarities to accept or reject an incoming character. A second classifier, winner takes all (WTA), uses all the similarities produced by the first classifier in cases of rejection. The SMNN uses a relative distance, such as the Hamming distance, as a quality measurement, and the authors proposed a cascaded combination of SMNN and WTA called CSM-net. Experiments were conducted on isolated printed characters collected from postal checks. All recognition methods based on feature extraction should use appropriate classification methods.
Virmani et al. [31] presented a technique using singular value decomposition of the GLCM matrix and SVM-based characterization of liver cirrhosis.

Applications of Roman Character Recognition in License Plate
Kim et al. [32] presented a license plate recognizer. In this work, segmented characters were recognized by a support vector machine (SVM); the reported character recognition accuracy was 97.2 percent.
Wang et al. [33] presented automatic license plate recognition using rotation-invariant characters, with orientation taken about the major axis. Using this rotation-free character recognition on 102 different car images, they reported an accuracy of 98.5 percent.
Pan et al. [34] presented a hybrid of structural and statistical methods for character recognition. A Bayes method and a BP neural network were used to combine the results, achieving accuracies of 98 percent for the hybrid and 97 percent for BP, respectively.
Gatos et al. [35] used a binarization and enhancement based technique for degraded documents with an adaptive thresholding scheme. Otsu's thresholding was not tolerable for degraded documents because of local variance, and Niblack's approach produced a very large amount of noise in non-text areas.
Some algorithms suffer from background noise, thinning, and broken characters; in the approach of Kim et al. [32], a great amount of noise and broken characters were present.
Roy et al. [36] presented work in which text and symbols were separated from color graphics using connected-component and geometrical features [37]. They also worked on multi-oriented touching characters in graphical documents, using circular and convex hull rings along with angular information of the contour pixels to make the features rotation invariant, and reported a recognition accuracy of 97.6 percent with an SVM classifier.
Chacko et al. [38] used wavelet energy features and an extreme learning machine (ELM) classifier for handwritten character recognition.
Soora et al. [39] presented a robust technique for license plate character recognition. An angular width feature vector (AWFV) and a geometrical shape feature vector (GSFV) were used and recognized by an edit-distance metric, with a reported accuracy of 98.6 percent.

Classifiers and Their Algorithms
This section describes the algorithms of the various classifiers such as support vector machine (SVM), naive Bayes, random forest (RF), and artificial neural network (ANN) for character recognition. In general, any classifier consists of two phases: training phase and testing phase. During training phase, a model is constructed and during testing phase, results are predicted by using the model.

Support Vector Machine
Support vector machine (SVM) is a supervised learning algorithm. It constructs a model that depends on the attributes and labels of the training dataset, and the constructed model predicts the labels of the testing dataset. Boser et al. [40] presented a training algorithm that maximizes the margin of the decision boundary. Cortes et al. [41] proposed a method that maps input vectors to a high-dimensional space.

Mathematical Formulation
The two-class classification problem has the following form:

y(x) = w^T φ(x) + b,

where φ(x) is a feature-space transformation of the feature vector x of a data point, w is the weight vector, and b is the bias. A feature vector x of unknown class is classified by the sign of y(x).
The main aim of model creation is to compute w and b, using the features and target values of the training dataset. The trained model takes the dual form

y(x) = Σ_{m ∈ S} a_m t_m k(x, x_m) + b,   predicted class(x) = sign(y(x)),

where S = {m : a_m > 0, x_m ∈ training dataset} is the set of support vectors, t_m ∈ {−1, +1} are the target labels, a_m are the Lagrange multipliers corresponding to the inequality constraints of the support vectors, and k(x, x_m) is the radial basis function (RBF) kernel

k(x, x_m) = exp(−γ ||x − x_m||^2),

where C > 0 and γ are the kernel parameters.
Multiclass SVM: the "one-against-one" approach is used for multiclass classification [42]. If k is the number of classes, then k(k − 1)/2 different two-class SVM classifiers are constructed, one for each possible pair of classes, and test points are classified according to the highest number of votes. In our problem, k = 26, as 26 lower-case characters are recognized. The block diagram of the algorithm is presented in figure 2.
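The classifier count of the one-against-one scheme can be checked directly. A minimal illustrative sketch in Python (the paper's experiments themselves use MATLAB/LIBSVM, so this is only a stand-in):

```python
from itertools import combinations

# For k classes, the one-against-one scheme trains one binary SVM
# per unordered pair of classes: k*(k-1)/2 classifiers in total.
k = 26  # lower-case Roman characters a-z
pairs = list(combinations(range(k), 2))
print(len(pairs))  # → 325
```

For k = 26 this gives 26 × 25 / 2 = 325 pairwise classifiers, each voting on a test point.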

Algorithm 1: Prediction of unknown labels by using the SVM classifier
Procedure: SVM(X, L_tr, Y, kernel, γ)
Input: X: feature set of the training dataset; L_tr = (L_1, L_2, L_3, ..., L_m): labels of the training dataset; Y: features of the dataset of character images whose labels are to be predicted; kernel: kernel function (we have used the RBF kernel as defined in (6)); γ: kernel parameter; C = 1.
Output: L_pr = (L_1, L_2, L_3, ..., L_n): predicted labels corresponding to Y.
2. Predict the labels of Y by using svmpredict [43].
3. Return the predicted labels L_pr.
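As a hedged illustration of the same train/predict flow, here is a sketch using scikit-learn's SVC in place of LIBSVM's svmpredict; the toy data produced by make_classification is a stand-in for the character feature matrices X and Y, and the γ value is borrowed from the experiments below:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy stand-in for the character feature matrices X (train) and Y (test).
X, L_tr = make_classification(n_samples=200, n_features=10,
                              n_informative=8, n_classes=4, random_state=0)
Y, _ = make_classification(n_samples=50, n_features=10,
                           n_informative=8, n_classes=4, random_state=1)

# RBF kernel with C = 1 and a chosen gamma, mirroring the paper's setup;
# SVC applies the one-against-one scheme internally for multiclass data.
model = SVC(kernel="rbf", C=1.0, gamma=0.019)
model.fit(X, L_tr)
L_pr = model.predict(Y)
print(len(L_pr))  # one predicted label per test sample
```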

Artificial Neural Network
Rumelhart et al. [44] proposed the widely used backpropagation algorithm, which is the core of training the multilayer perceptron neural network. The fundamental computing unit of the network is called a node, neuron or perceptron. A multilayer feed-forward neural network (MFFNN) consists of an input layer (first layer), hidden layers (second layer, third layer and so on) and an output layer (last layer). Each layer is made of structurally identical neurons. The inputs to the neurons of the input layer are the weighted features of the feature vector x of a data point. For the other layers, the (scalar) output of every neuron in one layer is an input to every neuron in the next layer, and neurons are connected by scalar weights. In the output layer, each neuron represents a unique class, so the number of nodes in the output layer equals the number of classes to be identified. An unknown data point belongs to the class corresponding to the node with the highest output value.
The output O_{i,1} of the ith neuron of the input layer (first layer) is given as

O_{i,1} = h_{i,1}( Σ_{k=1}^{|x|} w_{(i,1),(k,0)} x_k ),

where h_{i,1}(·) is an activation function, w_{(i,1),(k,0)} is the connection weight between the ith neuron of the first layer and the kth element x_k of the feature vector x, and |x| is the dimension of x.
The output O_{i,j} of the ith neuron of the jth layer (other than the first layer) is given as

O_{i,j} = h_{i,j}( Σ_{k=1}^{N_{j−1}} w_{(i,j),(k,j−1)} O_{k,j−1} ),

where h_{i,j}(·) is an activation function, w_{(i,j),(k,j−1)} is the connection weight between the ith neuron of the jth layer and the kth neuron of the (j − 1)th layer, and N_{j−1} is the number of nodes in the (j − 1)th layer.
Calculating the connection weights is the model-creation step. Weights are calculated based on the error function E_L, defined as

E_L = (1/2) Σ_{k=1}^{N_L} (O_{k,L} − r_{k,L})^2,   (9)

where r_{k,L} is the desired response at the kth neuron of the output layer. Equation (9) gives the error for one data point; the total error for the training dataset is the sum of E_L over all data points. The gradient descent algorithm is used to minimize the error with respect to the connection weights [45], which are updated as

w_{(i,j),(k,j−1)} ← w_{(i,j),(k,j−1)} − η δ_{i,j} O_{k,j−1},

where η is the learning rate. The δ's for the output layer are computed as

δ_{i,L} = (O_{i,L} − r_{i,L}) h′_{i,L}(·),

and the δ's for the other layers as

δ_{i,j} = h′_{i,j}(·) Σ_{m=1}^{N_{j+1}} δ_{m,j+1} w_{(m,j+1),(i,j)}.

The class of an unknown data point x is defined as

class(x) = arg max_i O_{i,L}.

Algorithm 2: Prediction of unknown labels by using the ANN classifier
4. Get the response matrix RL_pr for Y by using the trained model ANNmodel. Note that the values of RL_pr belong to [0, 1].
5. Get the predicted labels from RL_pr by using the function vec2ind of MATLAB2016.
6. Return the predicted labels L_pr.
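As an illustrative counterpart to the MATLAB workflow, the sketch below uses scikit-learn's MLPClassifier with one hidden layer of 31 nodes (the setting used in the experiments); the digits dataset is only a stand-in for the character images:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, train_size=0.8, random_state=0)

# One hidden layer with 31 nodes; backpropagation (gradient descent on
# the loss) adjusts the connection weights during fit().
mlp = MLPClassifier(hidden_layer_sizes=(31,), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print(round(mlp.score(X_te, y_te), 2))  # testing accuracy on the stand-in data
```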

Naive Bayes Classifiers
The Bayesian classifier is a supervised learning algorithm, initially proposed in [46] and later improved by Langley et al. [47] and Zhang et al. [48].
The naive Bayes classifier is based on minimization of the total average loss. Suppose the dataset has W classes ω_1, ..., ω_W. The loss r_j(x) incurred when feature vector x is assigned to class ω_j is defined as

r_j(x) = Σ_{k=1}^{W} L_{kj} p(ω_k | x),

where L_{kj} is the loss for assigning a point of class ω_k to class ω_j. The class of an unknown feature vector x is given as

class(x) = arg min_j r_j(x).

For the 0−1 loss condition [45] (L_{kj} = 0 if k = j, and 1 otherwise), the class of an unknown feature vector x is given as

class(x) = arg max_j d_j(x),   where   d_j(x) = p(x | ω_j) p(ω_j).

Computing the d_j's is the model-creation step, and the most challenging task is the sub-computation of p(x | ω_j). If p(x | ω_j) is assumed to be Gaussian, then the d_j's are computed from the per-class mean vectors and covariances estimated from the training data.

Algorithm 3: Prediction of unknown labels by using the naive Bayes classifier
Procedure: NBC(X, L_tr, Y)
Input: X: feature set of the training dataset; L_tr = (L_1, L_2, L_3, ..., L_m): labels of the training dataset; Y: features of the dataset of character images whose labels are to be predicted.
Output: L_pr = (L_1, L_2, L_3, ..., L_n): predicted labels corresponding to Y.
1. Create a model NBCmodel by using the function fitcnb of MATLAB2016 as follows: NBCmodel = fitcnb(X, L_tr).
Note that we have considered 'normal' distribution.
2. Predict the labels of Y by using the NBCmodel as follows L pr = NBCmodel.predict(Y ).
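For illustration, the same Gaussian ('normal') assumption can be sketched with scikit-learn's GaussianNB, which estimates a per-class mean and variance for each feature; the two synthetic Gaussian clusters below are a stand-in for the character classes:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two well-separated Gaussian classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
L_tr = np.array([0] * 50 + [1] * 50)

# GaussianNB fits p(x | class) as a product of per-feature normals.
nbc = GaussianNB().fit(X, L_tr)
print(nbc.predict([[0, 0], [4, 4]]))  # → [0 1]
```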

Random Forest (RF)
Breiman's [49] random forest classifier creates a set of decision trees from randomly selected subsets of the training set and aggregates the votes from the different decision trees to decide the final class of a test object. Basic parameters of a random forest classifier are the total number of trees to be generated and decision-tree parameters such as the minimum split and the split criterion. In a single decision tree, the best split among all variables is chosen to split a node, whereas in RF the best variable from a random subset of predictors is chosen to split a node. Unknown samples are classified using weighted or unweighted voting over the set of classifiers in the forest. Here, the bagging technique has been used to create the training datasets by randomly choosing samples. A test sample is classified by assigning the maximum-voted class label from all the classifiers, and along with the predicted class label, the confidence score corresponding to that label is also predicted. Given a training set X = x_1, x_2, ..., x_n with outputs L = l_1, l_2, ..., l_n, bagging repeatedly (B times) selects a random sample with replacement from the training set and fits trees to these samples: for b = 1, ..., B, sample with replacement n training examples from (X, L), say (X_b, L_b), and train a classification or regression tree f_b on (X_b, L_b). After training, predictions for an unseen sample x are made by combining the predictions of the individual trees, by majority vote for classification or, for regression, by averaging

f̂(x) = (1/B) Σ_{b=1}^{B} f_b(x).

Algorithm 4: Prediction of unknown labels by using the random forest classifier
Procedure: RFC(X, L_tr, Y, method, nlearn)
Input: X: feature set of the training dataset; L_tr = (L_1, L_2, L_3, ..., L_m): labels of the training dataset; Y: features of the dataset of character images whose labels are to be predicted; method: ensemble-aggregation method; nlearn: number of ensemble learning cycles.
Output: L_pr = (L_1, L_2, L_3, ..., L_n): predicted labels corresponding to Y.
2. Predict the labels of Y by using the function predict.
3. Return the predicted labels L_pr.
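An illustrative sketch of the bagging-and-voting scheme with scikit-learn's RandomForestClassifier; the digits dataset again stands in for the character images, and 100 trees is an assumed, not reported, setting:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, train_size=0.9, random_state=0)

# Each tree is fit on a bootstrap sample of (X_tr, y_tr); the forest
# predicts by aggregating the votes of the individual trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)
print(round(rf.score(X_te, y_te), 2))  # testing accuracy on the stand-in data
```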

CSM Features
The CSM is used as a discriminating function in the recognition phase applied to binary character images [30]. It is based primarily on the similarity between two binary images: a model y stored in the training set and a sample image x, both of size n. A sample character image is attributed to a single class among the others (each class being represented by one or more model images) using the highest similarity score between x and y. The two character images are expressed as n-dimensional binary feature vectors x = (x_1, x_2, ..., x_n)^T and y = (y_1, y_2, ..., y_n)^T. With the counts

a = Σ_i x_i y_i,  b = Σ_i x_i (1 − y_i),  c = Σ_i (1 − x_i) y_i,  d = Σ_i (1 − x_i)(1 − y_i),

and T = a + b, the complementary similarity measure of the model and the sample [28] is calculated by using the following formula:

CSM(x, y) = (ad − bc) / √(T(n − T)).
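A small NumPy sketch of this measure, assuming the (ad − bc)/√(T(n − T)) form used in the CSM literature [28,30], with a, b, c, d the pairwise agreement counts of the two binary vectors and T = a + b:

```python
import numpy as np

def csm(x, y):
    """Complementary similarity measure between two binary vectors,
    assuming CSM = (a*d - b*c) / sqrt(T * (n - T)) with T = a + b."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = x.size
    a = np.sum(x * y)              # 1 in both vectors
    b = np.sum(x * (1 - y))        # 1 in x only
    c = np.sum((1 - x) * y)        # 1 in y only
    d = np.sum((1 - x) * (1 - y))  # 0 in both vectors
    T = a + b
    return (a * d - b * c) / np.sqrt(T * (n - T))

x = [1, 1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1]
print(round(csm(x, y), 3))  # → 1.0
```

Here a = 2, b = 1, c = 1, d = 2, T = 3 and n = 6, so CSM = (4 − 1)/√9 = 1.0.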

GLCM Features
The grey-level co-occurrence matrix (GLCM) is a statistical method of examining texture that considers the spatial relationship of pixels. After creating the GLCMs using graycomatrix [18,19], several statistics can be derived from them using graycoprops. Given an M × N input image containing G grey levels from 0 to G − 1, the grey-level co-occurrence matrix P is defined by the entries P(i, j | d, θ), counting pixel pairs with grey levels i and j at distance d and orientation θ; here d = 1 and θ = 0°. µ is the mean value of P, and µ_x, µ_y, σ_x, σ_y are the means and standard deviations of P_x and P_y.
where P_x(i) is the ith entry in the marginal probability matrix obtained by summing the rows of P(i, j) [18]. The following four statistics are used:
1. Contrast: measures the local variations in the grey-level co-occurrence matrix.
2. Correlation: measures the joint probability of occurrence of the specified pixel pairs.
3. Energy: provides the sum of squared elements in the GLCM; also known as uniformity or the angular second moment.
4. Homogeneity: measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

Experiments, Results and Analysis
This section discusses the details of the dataset, procedure and five experiments.

Procedure
A general procedure of the experiments is depicted in figure 7. The details of each block are as follows. Data set: the dataset of offline optical characters [50] is used.

Accuracy
The general formula for calculating the accuracy of a classifier is as follows:

accuracy (%) = (number of correctly classified samples / total number of samples) × 100.
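This measure is a one-liner in code; a minimal sketch with a hypothetical helper:

```python
def accuracy(predicted, actual):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Three of four labels match, so the accuracy is 75%.
print(accuracy(['a', 'b', 'c', 'd'], ['a', 'b', 'x', 'd']))  # → 75.0
```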

Experiment 1: Analysis of results of support vector machine classifier
We studied the impact of (C, γ) on the recognition accuracy of the testing dataset. C is fixed as 1, the ratio r is fixed as 0.9, and γ is varied in the range (0, 1). We observed that the change in testing accuracy with respect to γ is very small for low values of γ. The best testing accuracy of 78.9% is achieved at γ = 0.032, and the testing accuracy falls significantly for high values of γ. The graph of accuracy versus γ is shown in figure 9; we recorded the least testing accuracy of ≈ 35% at γ = 0.998.
The effect of the ratio (r) on the testing accuracy is provided in table 3. The accuracy values correspond to (C, γ) = (1, 0.019), and the accuracy of each character for the different r values is provided in the table. The maximum average accuracy is 78.2% for r = 0.9 (90 : 10) and the least average accuracy is 38.4% for r = 0.1 (10 : 90). The best accuracy, 100%, is achieved by thirteen characters for r = 0.9, and the least accuracy, 0%, by two characters for r = 0.1.

Experiment 2: Analysis of results of artificial neural network classifier
We studied the impact of the number of hidden nodes on the recognition accuracy of the testing dataset. The number of hidden layers is fixed as one, and the ratio r is fixed as 0.8. We observed that the change in testing accuracy with respect to the number of hidden nodes (N_2) in the first hidden layer is zigzag. However, the best testing accuracy of ≈ 84% is achieved at N_2 = 70. The graph of accuracy versus N_2 is shown in figure 11; we recorded the least testing accuracy of ≈ 50% at N_2 = 10. The effect of the ratio (r) on the testing accuracy is provided in table 4. The accuracy values correspond to one hidden layer with N_2 = 31, and the accuracy of each character for the different r values is provided in the table. The maximum average accuracy is 78.57% for r = 0.8 (80 : 20) and the least average accuracy is 37.6% for r = 0.2 (20 : 80). The best accuracy of an individual character is 100% for r = 0.9, 0.8, 0.7, 0.6, 0.5 and 0.3; the least accuracy of an individual character is 0% for r = 0.1.
The effect of the ratio (r) on the training and testing accuracies is provided in figure 12, corresponding to one hidden layer with N_2 = 31. Training accuracy is almost constant with respect to r, while testing accuracy improves with r up to r = 0.8, where the difference between testing and training accuracy is minimum. Testing accuracy is less than training accuracy. The best training accuracy of 93.42% is achieved at r = 0.9, and the least training accuracy of 83.33% at r = 0.1.

Experiment 3: Analysis of results of naive Bayes classifier
The effect of ratio (r) on the testing accuracy is provided in the table 5. The maximum average accuracy is 65.36% for r = 0.7 and the least average accuracy is 26.4% for r = 0.1. The best accuracy of seven characters is 100% for r = 0.9 and the least accuracy of individual character is 0% for r = 0.1, 0.8, 0.9.
The effect of ratio (r) on the training and testing accuracies is provided in the figure 13. Training accuracy decreases with r, while testing accuracy improves with respect to r. Testing accuracy is less than training accuracy. Testing accuracy converges to training accuracy with r. The best training accuracy of ≈ 100% is achieved at r = 0.1 and the least training accuracy of ≈ 74% is obtained at r = 0.9.

Experiment 4: Analysis of results of random forest classifier
The effect of ratio (r) on the testing accuracy is provided in the table 6. The maximum average accuracy is 82.05% at r = 0.9 and the least average accuracy is 52.1% at r =

The effect of the ratio (r) on the training and testing accuracies is provided in figure 14. Training accuracy is almost constant with respect to r. Testing accuracy improves with r, remains below training accuracy, and converges to it as r increases.

Experiment 5: Analysis of results of noisy dataset
In this experiment, we studied the effect of impulsive noise on the testing accuracy, using the parameters corresponding to the best observed testing accuracy for each classifier. The impulsive noise levels applied to the testing dataset are 15%, 25%, 50% and 75%. The SVM is studied at (C, γ) = (1, 0.019) and r = 0.9, the ANN at N_2 = 31 and r = 0.8, the naive Bayes at r = 0.7, and the random forest at r = 0.9. The observed accuracies for each noise level are presented in figure 15 and table 2. As expected, the accuracy of each classifier falls as the noise level rises. A few important observations are as follows:
• The random forest and the artificial neural network have almost the same accuracy; at low levels of impulsive noise, they outperform the other classifiers.
• The SVM outperforms the other classifiers at high levels of impulsive noise.
• The accuracy of the naive Bayes classifier is the least, except at mid-range levels of impulsive noise.
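The paper does not specify the impulsive-noise generator; a common stand-in is salt-and-pepper corruption, sketched here with a hypothetical NumPy helper add_impulsive_noise:

```python
import numpy as np

def add_impulsive_noise(img, level, seed=None):
    """Corrupt roughly a fraction `level` of pixels to salt (255) or
    pepper (0). A hypothetical stand-in for the Experiment 5 noise model."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape) < level  # pixels selected for corruption
    salt = rng.random(img.shape) < 0.5    # half become salt, half pepper
    noisy[mask & salt] = 255
    noisy[mask & ~salt] = 0
    return noisy

# Mid-grey test image; the corrupted fraction tracks the noise level.
clean = np.full((32, 32), 128, dtype=np.uint8)
noisy = add_impulsive_noise(clean, 0.25, seed=0)
print(round(float((noisy != clean).mean()), 2))  # roughly the noise level
```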

Conclusions
In this paper, four different classifiers, viz. SVM, naive Bayes, random forest and artificial neural network, are studied for classification of offline alphabetic characters. Experiments have been performed on a clean dataset and on a dataset with impulsive noise, with CSM and GLCM features extracted from the dataset. The experimental results show that, on the clean dataset, the random forest classifier provides the highest accuracy of ≈ 84% and the naive Bayes classifier the least, ≈ 65%. The SVM and artificial neural network classifiers have comparable accuracies of about 78% on the clean dataset, so the random forest outperforms on the clean dataset. On the noisy dataset, the random forest and the artificial neural network have almost the same accuracy and outperform the other classifiers at low levels of impulsive noise, while the SVM outperforms at high levels of impulsive noise.