Identification of Motor Imagery Movements from EEG Signals Using Automatically Selected Features in the Dual Tree Complex Wavelet Transform Domain

Decoding the electrical activity of the human brain from electroencephalogram (EEG) signals is a key step in brain computer interface (BCI) based systems. This paper proposes an automatic feature selection method to classify imagery left and right hand movements from EEG signals in the dual tree complex wavelet transform (DTCWT) domain. First, the EEG signals are decomposed into several bands of real and imaginary coefficients, and statistical features such as Shannon entropy and variance are calculated. These features are combined into a single feature space, from which optimal features are selected automatically by imposing a feature selection criterion. Statistical hypothesis testing (one way ANOVA) and graphical analysis (scatter plots and box plots) show that the selected features are promising for distinguishing the different kinds of EEG signals. Finally, k-nearest neighbor based classifiers are developed using the selected features to identify left and right hand imagery movements. A mean accuracy of 90.00% is achieved on the publicly available BCI competition II Graz motor imagery data set, which is higher than that of several existing techniques.


Introduction
Brain computer interfacing (BCI) allows a user to control and operate computer aided systems by intent alone. The major objective of BCI is to assist disabled people in their rehabilitation. BCI involves detection, analysis and classification of different types of motor imagery movements to implement real time control and communication. Electroencephalogram (EEG) signals are often used for BCI because they can be recorded non-invasively [1]. Analysis of EEG signals to decode human brain activity is essential for the implementation of BCI systems.
There are several categories of EEG-based BCI, such as limb motor imagery classification [2], continuous arm movement direction detection [3], individual finger movement decoding [4] and P300 evoked potential based character recognition [5]. One major category of BCI is the detection of motor imagery movements such as left and right hand movements. Various methods have been developed in the literature for classifying different types of arm movements. A wavelet-based common spatial pattern algorithm using low frequency features and a Fisher linear discriminant classifier is developed in [6] to classify fast and slow hand movements. In [7], a filter bank common spatial pattern is implemented with mutual information based feature selection, and a Naive Bayesian Parzen Window classifier is used for the identification task. Wrist movement classification is done in [8] by extracting gamma band features from the wavelet packet transform and employing a radial basis function classifier. Separability of EEG signals using adaptive auto regressive parameters is proposed in [9]. Time-frequency optimization is performed in [10] to classify left and right hand movements with a reduced number of electrodes. To utilize cross-channel dependency in BCI, a multivariate empirical mode decomposition based classification method is presented in [11].
Since EEG data for research purposes can be acquired with varying experimental setups and conditions, the BCI competition was held to provide standard data sets for evaluating and comparing different algorithms. These standard data sets have proved to be representative of motor imagery and suitable for BCI research. Different approaches have been studied to classify motor imagery movements using the BCI competition II Graz motor imagery data set. Linear discriminant analysis (LDA) based on band passed EEG signals and power spectral density is described in [12]. In [13], the raw EEG signal combined with a Hidden Markov Model (HMM) was presented by the same author. Adaptive Auto Regressive (AAR) model based features with a Bayesian Graphical Network (BGN) and a Multi Layer Perceptron are reported in [14]. The Morlet wavelet is used in [15] to extract features from mu rhythms with Bayes quadratic classifiers. Wavelet coefficient based statistical features and a fuzzy support vector machine (FSVM) classifier are described in [16]. Most recently, a discriminative area selection method is implemented with a fuzzy Hopfield neural network (FHNN) classifier in [17].
Universal Journal of Biomedical Engineering 3(4): 30-37, 2015
The aim of this study is to assist the development of BCI systems by identifying imagery hand movements using features extracted automatically from EEG signals in the dual tree complex wavelet transform (DTCWT) domain. The DTCWT has been widely used for image and video processing ([18], [19]) and, more recently, in the area of bio-medical signals ([20], [21]). To the best of our knowledge, however, the DTCWT is employed here for the first time in the classification of imagery hand movements in conjunction with automatic feature selection. Since motor imagery movements occur in the low frequency EEG bands [22], the DTCWT is used to perform detailed analysis of specific bands, as it is a richer analysis tool than the traditional discrete wavelet transform (DWT). The product of Shannon entropy and signal variance obtained from the various DTCWT bands is used as the feature. The ability of this value to discriminate imagery hand movements is demonstrated using scatter plots, box plots and one way ANOVA analysis. An automatic selection method based on the $J_3$ criterion is proposed for selecting the optimum feature vector. Left and right hand imagery movements are classified from these features with a number of classifiers. The performance of these classifiers in detecting hand movements is reported and compared with that of several existing techniques.

Description of the EEG database
The BCI competition II data set (motor imagery III) provided by the Technical University of Graz is used in this paper. The data were acquired from a normal subject who sat in a chair with armrests and tried to control a feedback bar by making imagery movements of the left or right hand. Left and right cues were presented in random order [23]. The experiment consists of 7 runs with 40 trials each. In each trial, an acoustic stimulus at t=2s indicated the beginning of the trial and a cross '+' was displayed for 1s. An arrow (left or right) was then displayed at t=3s as the cue. At the same time, the subject was asked to move a bar in the direction of the cue; the bar was controlled by adaptive auto-regressive parameters of channels C3 and C4. The EEG signal was filtered between 0.5 and 30 Hz and sampled at 128 Hz. A detailed description of the experimental setup can be found in [24]. Fig. 1 shows the timing scheme of the experimental setup while Fig. 2 presents the electrode/channel positions.

Dual Tree Complex Wavelet Transform
The dual tree complex wavelet transform is a recent enhancement to the discrete wavelet transform with additional properties, including near shift invariance and directional selectivity in two and higher dimensions [25]. The DTCWT is $2^d$ times redundant for any $d$ dimensional signal compared to the DWT and, for two dimensional signals, offers directional information in six directions. Thus it is more efficient in time frequency localization of EEG signals.
Similar to positive/negative post-filtering of real subband signals, the idea behind the dual tree approach is quite simple. The DTCWT employs two real DWTs: the first gives the real part of the transform and the second gives the imaginary part. The analysis filter bank structure used to implement the DTCWT is given in Fig. 3. The two real wavelet transforms use two different sets of filters, each of which satisfies the perfect reconstruction conditions. If the square matrices $H_h$ and $H_g$ denote the two real DWTs, then the DTCWT can be represented as follows:

$$H = \begin{bmatrix} H_h \\ H_g \end{bmatrix} \tag{1}$$

The inverse transform of $H$ is given as

$$H^{-1} = \frac{1}{2} \begin{bmatrix} H_h^{-1} & H_g^{-1} \end{bmatrix} \tag{2}$$

If the vector $x$ represents a real signal, then $w_h = H_h x$ represents the real part and $w_g = H_g x$ represents the imaginary part of the DTCWT. In other words, when the DTCWT is applied to a real signal, the outputs of the upper and lower filter banks give the real and imaginary parts of the complex coefficients, respectively.
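The matrix view of the dual tree can be checked numerically. The sketch below assumes the standard stacked form (forward transform formed by stacking $H_h$ on $H_g$, inverse equal to one half of $[H_h^{-1} \; H_g^{-1}]$). The two "trees" are toy stand-ins: a single-level Haar matrix and the same matrix applied to a circularly shifted input. They are not the Farras filters used in this work and do not form an approximately analytic pair; the helper name `haar_dwt_matrix` is hypothetical. The example only illustrates the algebra of the forward and inverse transforms.

```python
import numpy as np

def haar_dwt_matrix(n):
    """Single-level orthonormal Haar analysis matrix (toy stand-in for a real DWT)."""
    assert n % 2 == 0, "signal length must be even"
    h = np.zeros((n, n))
    for k in range(n // 2):
        # Lowpass rows: scaled sum of neighbouring samples
        h[k, 2 * k] = h[k, 2 * k + 1] = 1 / np.sqrt(2)
        # Highpass rows: scaled difference of neighbouring samples
        h[n // 2 + k, 2 * k] = 1 / np.sqrt(2)
        h[n // 2 + k, 2 * k + 1] = -1 / np.sqrt(2)
    return h

n = 8
H_h = haar_dwt_matrix(n)                    # first tree (real part)
shift = np.roll(np.eye(n), 1, axis=1)       # circular shift as a permutation matrix
H_g = H_h @ shift                           # second tree (imaginary part), assumption

H = np.vstack([H_h, H_g])                   # forward transform, shape (2n, n)
H_inv = 0.5 * np.hstack([np.linalg.inv(H_h), np.linalg.inv(H_g)])  # shape (n, 2n)

x = np.arange(n, dtype=float)               # any real signal
w = H_h @ x + 1j * (H_g @ x)                # complex coefficients w_h + j*w_g
x_rec = H_inv @ np.concatenate([w.real, w.imag])
```

Because each tree is invertible on its own, averaging the two individual reconstructions returns the original signal, which is what the factor of one half expresses.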

Among several available wavelets, Farras wavelets [26] are used to perform the DTCWT. Fig. 4 shows the EEG signals from the C3 channel for left and right hand imagery movements and the corresponding second level DTCWT real and imaginary coefficients.

Analysis in Dual Tree Complex Wavelet Transform Domain
The proposed method first decomposes the EEG signal of each trial into three levels with the dual tree complex wavelet transform and then extracts suitable features from the different bands. Optimal features are then selected automatically using the $J_3$ criterion, and finally a kNN classifier is deployed to classify hand movements.

Feature Extraction in DTCWT Domain
Since the BCI competition II Graz data set was recorded with low pass filtering at 30 Hz, no preprocessing was needed to discard unnecessary high frequency components. As discussed in [22], motor imagery activity occurs in the low frequency EEG bands. To obtain an in-depth view of these bands, a wavelet transform is applied to the acquired EEG signal. Since the forward DTCWT gives two branches containing real and imaginary coefficients, it offers a richer way of analyzing EEG signals than the DWT. The experiment was carried out by taking feedback from the C3 and C4 channels with Cz as the reference. Accordingly, the EEG signals acquired from the C3 and C4 channels for each trial are decomposed into three levels by applying the DTCWT. If the original low passed signal, which has frequency components from 0.5 to 30 Hz, is denoted by $X$, the first level decomposition provides $Y_1$ (16-30 Hz) and $Z_1$ (0.5-15 Hz). After the second level decomposition, $Z_1$ leads to $Y_2$ (7.5-15 Hz) and $Z_2$ (0.5-7.5 Hz). So after three levels of DTCWT, the four frequency parts are $Y_1$ (16-30 Hz), $Y_2$ (7.5-15 Hz), $Y_3$ (3.75-7.5 Hz) and $Z_3$ (0.5-3.75 Hz). Reconstructions of these components using the inverse DTCWT approximately correspond to the physiological EEG sub-bands beta, alpha, theta and delta, respectively [27]. Since each frequency band gives both real and imaginary coefficients, there are in total 4 real and 4 imaginary coefficient bands for each channel. From now on, the real and imaginary bands will be denoted as $RB_x$ and $IB_x$, where $x$ is the band index.
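The band bookkeeping above can be reproduced with a few lines of arithmetic. The sketch below assumes an idealized dyadic split in which each level halves the upper edge of the remaining low band, starting from the nominal 0.5-30 Hz passband; the function name `dyadic_bands` is hypothetical. Under this idealization the lower edge of $Y_1$ comes out as 15 Hz, whereas the text lists 16 Hz, and the true band edges of a filter bank depend on the sampling rate and the filters used.

```python
def dyadic_bands(f_low, f_high, levels):
    """Return [(name, low_edge, high_edge), ...] for an idealized dyadic split."""
    bands = []
    hi = f_high
    for level in range(1, levels + 1):
        mid = hi / 2
        bands.append((f"Y{level}", mid, hi))   # detail band produced at this level
        hi = mid                               # the low band is split again
    bands.append((f"Z{levels}", f_low, hi))    # final approximation band
    return bands

bands = dyadic_bands(0.5, 30.0, levels=3)
for name, lo, hi in bands:
    print(f"{name}: {lo:g}-{hi:g} Hz")

# Four sub-bands x (real, imaginary) x (C3, C4) coefficient bands
n_coefficient_bands = len(bands) * 2 * 2
```

With one feature per coefficient band, this count of 16 matches the number of candidate features considered in the automatic feature selection step.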
The underlying dynamics of EEG signals are spread over various sub-bands in the frequency domain. To classify motor imagery movements, more information must be extracted from the lower frequency bands of EEG signals and from mu rhythms [15]. For this purpose, the variance and Shannon entropy of the different bands are extracted as features.

Variance
The variance of a distribution is a measure of how widely its values are dispersed from the average or mean value. It is the average squared distance between the mean and each item in the distribution. Variance is denoted by $\sigma^2$, where $\sigma$ is defined as

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{3}$$

where $N$ is the number of data points in the distribution, $x_i$ is the $i$th sample of the distribution and $\bar{x}$ is the mean of the $N$ samples.

Shannon Entropy
Entropy is a measure of randomness and is a common concept in signal processing. Wavelet entropy can provide useful information about the underlying dynamical process associated with the signal [28]. The entropy $E$ must be an additive cost function such that $E(0) = 0$ and $E(S) = \sum_i E(S_i)$, where $S$ is the signal. In our study, the most common form, the Shannon entropy, has been used.
The Shannon entropy is defined as [29]

$$E(S) = -\sum_i S_i^2 \log(S_i^2) \tag{4}$$

where $S_i$ are the coefficients of signal $S$ in an orthonormal basis, with the convention $0 \log(0) = 0$ so that $E(0) = 0$. However, since neither variance nor Shannon entropy has good p-values individually in one way analysis of variance (ANOVA), neither possesses the feature quality needed to discriminate left and right hand imagery movements on its own. Moreover, using two features from each band would require a total of 4 × 2 × 2 × 2 = 32 features (four sub-bands, real and imaginary parts, two features and two channels), which gives a very large search space for optimal feature selection. To overcome these problems, we define a combined feature space as the product of Shannon entropy and signal variance. This new feature space has two advantages. First, the separability of the combined feature between the two classes is better than that of the individual features, as confirmed by the very small p-values of one way ANOVA. Second, the length of the feature vector is halved compared to using entropy and variance separately, which keeps the search space smaller. So, the proposed method extracts features from the Shannon entropy×variance plane for all four sub-bands (i.e., eight real and imaginary coefficient bands) of each of the two channels.
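As an illustration of the combined feature, the following sketch computes the entropy×variance product for one band of coefficients and assembles a 16 element vector for a trial. It assumes the non-normalized Shannon entropy $-\sum_i S_i^2 \log(S_i^2)$ with the natural logarithm and the population variance (division by $N$); both choices, and the function names, are assumptions rather than details confirmed by the text.

```python
import numpy as np

def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy -sum(s^2 * log(s^2)), taking 0*log(0) as 0."""
    s2 = np.asarray(coeffs, dtype=float) ** 2
    nonzero = s2 > 0                      # skip zeros so that log is defined
    return -np.sum(s2[nonzero] * np.log(s2[nonzero]))

def band_feature(coeffs):
    """Product of Shannon entropy and population variance of one coefficient band."""
    return shannon_entropy(coeffs) * np.var(coeffs)

def trial_features(bands_c3, bands_c4):
    """One feature per coefficient band: 8 bands from C3 followed by 8 from C4."""
    return np.array([band_feature(b) for b in list(bands_c3) + list(bands_c4)])

# Placeholder coefficient bands (random numbers, not EEG data)
rng = np.random.default_rng(0)
fake_c3 = [rng.normal(size=64) for _ in range(8)]
fake_c4 = [rng.normal(size=64) for _ in range(8)]
features = trial_features(fake_c3, fake_c4)   # shape (16,)
```

Note that the non-normalized entropy can be negative when coefficient magnitudes exceed one, so the product feature is not restricted to positive values.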

Automatic Feature Selection
Since the three level DTCWT of the EEG signal yields four sub-bands, each with real and imaginary coefficients, eight coefficient bands are obtained for each channel and a total of 16 features are available from the two channels. The question is then how to select the best feature combination. For this purpose, we use the $J_3$ criterion [30] to select the optimum feature combination. $J_3$ is a cost function for class separability measurement defined as

$$J_3 = \operatorname{trace}\{S_w^{-1} S_m\} \tag{5}$$

Here $S_w$ is the within class scatter matrix and $S_m$ is the mixture scatter matrix, defined as

$$S_w = \sum_{i=1}^{M} P_i S_i \tag{6}$$

$$S_m = E\left[(\mathbf{x} - \boldsymbol{\mu}_0)(\mathbf{x} - \boldsymbol{\mu}_0)^T\right] \tag{7}$$

where $P_i$ is the a priori probability of the $i$th class, $M$ denotes the total number of classes and $S_i$ is the corresponding covariance matrix given by

$$S_i = E\left[(\mathbf{x} - \boldsymbol{\mu}_i)(\mathbf{x} - \boldsymbol{\mu}_i)^T\right] \tag{8}$$

where $\boldsymbol{\mu}_i$ is the mean of each class while $\boldsymbol{\mu}_0$ is the global mean. These two means are related by

$$\boldsymbol{\mu}_0 = \sum_{i=1}^{M} P_i \boldsymbol{\mu}_i \tag{9}$$

Sometimes the between class scatter matrix $S_b$ is used instead of $S_m$ in (5). For an $M$ class problem, the between class scatter matrix can be calculated as

$$S_b = \sum_{i=1}^{M} P_i (\boldsymbol{\mu}_i - \boldsymbol{\mu}_0)(\boldsymbol{\mu}_i - \boldsymbol{\mu}_0)^T \tag{10}$$

The $J_3$ criterion has the advantage that it is invariant under linear transformations [31]. For a feature vector with a given number of features, the feature elements yielding the maximum value of $J_3$ are selected as the elements of the optimal feature vector. Here, 4 optimal features have been chosen automatically from the 16 available features. The four features found using the $J_3$ criterion are $RB_3$ (C3), $RB_4$ (C3), $RB_2$ (C4) and $IB_3$ (C4). Table 1 shows the best features for a 4 element feature vector selected via the $J_3$ criterion and the corresponding p-values of one way ANOVA. A p-value below 0.05 indicates that at least one sample mean is significantly different from the other sample means [32]. From Table 1, it is clear that these four features have very small p-values, which indicates that they can be used as good features.
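A minimal sketch of this selection step is given below, assuming $J_3 = \operatorname{trace}\{S_w^{-1} S_m\}$ with class priors estimated from class frequencies and an exhaustive search over all feature subsets of a given size. The search strategy and the function names `j3` and `select_features` are assumptions; the text does not state how the subsets were enumerated. For 16 features and subsets of size 4 the search covers 1820 combinations, which is small enough to evaluate directly.

```python
import itertools
import numpy as np

def j3(X, y):
    """J3 = trace(S_w^{-1} S_m) for samples X (n x d) with class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / n                          # P_i estimated from class frequency
        S_w += prior * np.atleast_2d(np.cov(Xc, rowvar=False, bias=True))
    S_m = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # scatter about global mean
    return float(np.trace(np.linalg.solve(S_w, S_m)))

def select_features(X, y, k):
    """Return the k feature indices (as a tuple) that maximize J3."""
    X = np.asarray(X, dtype=float)
    subsets = itertools.combinations(range(X.shape[1]), k)
    return max(subsets, key=lambda idx: j3(X[:, list(idx)], y))

# Synthetic check: only features 0 and 2 carry class information
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
X_demo = rng.normal(size=(200, 4))
X_demo[y_demo == 1, 0] += 5.0
X_demo[y_demo == 1, 2] += 5.0
best = select_features(X_demo, y_demo, k=2)
```

With these estimates $S_m = S_w + S_b$ holds exactly, so $J_3$ is never smaller than the number of features in the subset, and it grows with the between class separation.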
Apart from one way ANOVA analysis, scatter plots and box plots are provided to further illustrate the classification quality of the features. Fig. 5 shows the scatter plots of $RB_3$ and $RB_4$ for the C3 channel, whereas Fig. 6 shows the scatter plots of $RB_2$ and $IB_3$ for the C4 channel. In Fig. 5 and Fig. 6, green hexagon and red square markers indicate feature values during left and right hand imagery movements, respectively. The green and red markers take clearly different values, which shows that the features vary between the two imagery hand movements and can therefore be used as good features. The scatter plots, box plots and p-values of one way ANOVA indicate that $RB_3$ (C3), $RB_4$ (C3), $RB_2$ (C4) and $IB_3$ (C4), extracted using the $J_3$ feature selection criterion from the combined feature space of variance×Shannon entropy, have distinguishable values for left and right hand motor imagery movements. In other words, the features have a good between-class distance and a small within-class variance in the feature vector space [31], and as a result they can be used as very good features to identify left and right hand motor imagery movements from EEG signals.
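For reference, the statistic behind a one way ANOVA comparison of a single feature across classes can be sketched as follows. This is the textbook F ratio (between-group mean square divided by within-group mean square), written from scratch for illustration; the function name is hypothetical and the analysis in this work was carried out in MATLAB.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic of a one way ANOVA for two or more groups of feature values."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_value = one_way_anova_f([1, 2, 3], [2, 3, 4])
```

The p-value is the upper tail probability of the F distribution with (df_between, df_within) degrees of freedom at this statistic; in Python, `scipy.stats.f_oneway` returns both the statistic and the p-value.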

Classification Using kNN Classifier
The k-Nearest Neighbors (kNN) algorithm is a non-parametric learning method used for classification. Among the various methods of supervised statistical pattern recognition, the Nearest Neighbor rule achieves consistently high performance, without a priori assumptions about the distributions from which the training examples are drawn [33].
In order to classify a sample trial vector $X$ of unknown class, the kNN classifier ranks the sample trial's neighbors among the training trial vectors and uses the class labels of the $k$ most similar neighbors to predict the class of the new test trial [34]. The classes of these neighbors are then weighted according to the similarity of each neighbor to $X$, where similarity can be measured by the Euclidean distance or by the cosine value between two vectors. The cosine similarity index is defined as

$$\mathrm{sim}(X, D_j) = \frac{\sum_{t_i \in (X \cap D_j)} x_i \, d_{ij}}{\lVert X \rVert_2 \, \lVert D_j \rVert_2} \tag{11}$$

where $X$ is the test or unknown trial; $D_j$ is the $j$-th training trial; $t_i$ is a feature shared by both $X$ and $D_j$; $d_{ij}$ is the weight for $t_i$ in training sample $D_j$, whereas $x_i$ is the weight for $t_i$ in $X$. The $l_2$ norm of $X$ is defined as

$$\lVert X \rVert_2 = \sqrt{x_1^2 + x_2^2 + x_3^2 + \cdots} \tag{12}$$

The number $k$ decides how many neighbors influence the classification. If $k = 1$, the algorithm is simply called the nearest neighbor algorithm.
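The similarity-weighted vote described above can be sketched as follows, assuming dense feature vectors in which every feature is shared by the test and training trials, so the sum in the cosine similarity runs over all components. The function names and the toy vectors are hypothetical, and the sketch assumes no vector is all zeros. Note that the experiments in this work report the Euclidean distance as the kNN distance parameter.

```python
import numpy as np

def cosine_similarity(x, D):
    """Cosine similarity between test vector x and every row of training matrix D."""
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    return (D @ x) / (np.linalg.norm(D, axis=1) * np.linalg.norm(x))

def knn_predict(x, D, labels, k):
    """Predict the class of x by a similarity-weighted vote of its k most similar rows."""
    sims = cosine_similarity(x, D)
    nearest = np.argsort(sims)[::-1][:k]          # indices of the k largest similarities
    votes = {}
    for j in nearest:
        votes[labels[j]] = votes.get(labels[j], 0.0) + sims[j]
    return max(votes, key=votes.get)

# Toy training set (not EEG data)
D_demo = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels_demo = ["left", "left", "right", "right"]
prediction = knn_predict([1.0, 0.2], D_demo, labels_demo, k=3)
```

Weighting each vote by its similarity lets close neighbors count more than distant ones, which matters when $k$ is large relative to the class sizes.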

Experimental Results
The EEG data set has 140 trials each for the left and right hand (280 trials in total), each 9 seconds long. Since the cue was given at t=3s, the data segments after 3 seconds from the C3 and C4 channels are used for classification. Only four features are used to form the final feature vector. The training and test feature matrices each have dimension 140×4 and are fed to the kNN classifier. The experiment was carried out using MATLAB 2013b [32] on a 32 bit Windows 7 platform with 1 GB of RAM and a 2.93 GHz Intel Core 2 Duo processor. Both the training and testing data sets have been used with the leave one out cross validation method. The accuracy has been calculated using the following equation:

$$\text{Accuracy} = \frac{\text{Correctly classified EEG epochs}}{\text{Total EEG epochs}} \times 100\% \tag{13}$$

First, the mean accuracy of the classification employing the kNN classifier with different values of $k$ is shown in Fig. 11. Here, the elements of the feature vectors are obtained using the $J_3$ criterion. Note that the best accuracies are obtained for $k$ = 13, 9, 16 and 16 for feature vectors with 4, 5, 6 and 7 elements, respectively (if several values of $k$ give the same accuracy, only the lowest is selected). The distance parameter of the kNN classifier is selected as the Euclidean distance [32].
Table 2 presents the feature elements and the corresponding best mean accuracy values of the different feature combinations found by the $J_3$ feature selection approach. If only two features are selected using the $J_3$ criterion, the accuracy is already 87.86%, which indicates that the combined feature space clearly distinguishes left and right hand imagery movements. With 3, 4 or 5 selected features, the mean accuracy reaches its highest value of 90%. We have used four features for the final comparisons (3 or 5 features can be used similarly).
Table 3 provides the performance evaluation of different classifiers with varying parameters, which indicates that the kNN classifier with Euclidean distance gives better accuracy than the others. Finally, Table 4 compares the mean accuracy of the proposed method with several other methods. Using a simple kNN classifier, the proposed method provides a higher mean accuracy in classifying left and right hand motor imagery movements than all the methods listed in Table 4. One may ask about the gain in performance achieved due to the use of the $J_3$ criterion. Table 5 provides the mean accuracy values obtained by using 4 element feature vectors with various combinations of the elements.
It also shows the mean accuracy obtained by using the feature vector whose elements are automatically selected by the $J_3$ criterion. From Table 5, it is clear that the mean accuracy obtained with the $J_3$ selection criterion is higher than that of the other combinations listed in the table, which justifies the use of the $J_3$ criterion.
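The evaluation protocol used in this section, leave one out cross validation of a Euclidean kNN classifier scored by the percentage of correctly classified epochs, can be sketched as follows. The implementation is a from-scratch illustration with hypothetical function names, a majority vote and toy data; it is not the MATLAB code used for the reported results.

```python
import numpy as np

def knn_euclidean(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x in Euclidean distance."""
    dist = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    values, counts = np.unique(train_y[nearest], return_counts=True)
    return values[np.argmax(counts)]          # ties resolve to the smallest label

def loo_accuracy(X, y, k):
    """Leave one out accuracy in percent: each epoch is tested against all others."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i         # hold out epoch i
        correct += knn_euclidean(X[keep], y[keep], X[i], k) == y[i]
    return 100.0 * correct / len(X)

# Toy data: two well separated clusters (not EEG features)
X_toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_toy = [0, 0, 0, 1, 1, 1]
acc_k1 = loo_accuracy(X_toy, y_toy, k=1)
acc_k3 = loo_accuracy(X_toy, y_toy, k=3)
```

Sweeping `k` over a range of values and keeping the smallest `k` that attains the best accuracy mirrors the tie-breaking rule stated above.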

Conclusions
In this paper, a comprehensive method has been proposed to distinguish left and right hand motor imagery movements, which offers promising support for an important application in BCI. EEG signals have been successfully classified by extracting features from a combined space formed by the variance and Shannon entropy of EEG signals in the dual tree complex wavelet transform domain. The optimal combination of features has been selected automatically using the $J_3$ criterion, and the use of these features has been justified with a number of scatter plots, box plots and one-way ANOVA analysis. Among various types of classifiers such as SVM, kNN, Bayes classifier and LDA, kNN provides the highest accuracy of 90.00%. Finally, the performance has been compared with several other recent methods available in the EEG based BCI literature and shown to be superior to them.