Comparison of Artificial Intelligence Methods on the Example of Tea Classification Based on Signals from E-nose Sensors

The data collected from electronic nose systems are multidimensional and usually contain a lot of redundant information. Different computational techniques have been developed to extract only the relevant data. This article presents and compares selected pattern recognition algorithms applied to the qualitative determination of different brands of tea. The measured responses of an array of 18 semiconductor gas sensors formed the input vectors used for further analysis. The initial data processing consisted of standardization, principal component analysis, data normalization and reduction. Soft computing systems can be divided into single-method systems using neural networks or fuzzy systems, and hybrid systems such as evolutionary-neural, neuro-fuzzy and evolutionary-fuzzy systems. All the presented systems were evaluated on the basis of accuracy (generated error) and complexity (number of parameters and training time) criteria. A novel method of forming the input data vector by aggregation of the first three principal components is also presented.


Introduction
Semiconductor gas sensors based on metal oxides are very popular due to their advantages: high sensitivity, low cost, small dimensions, low power consumption and easy integration with the measurement circuit. Their main drawbacks are low selectivity, drift, lack of long-term stability and sensitivity to other parameters, e.g. temperature changes or the presence of water vapour in the gas atmosphere [8]. The reactions between the sensitive layer and the gas atmosphere are thermally activated, and the sensitivity to a specific gas depends on the temperature. In the simplest case, the only signal obtained from a resistive sensor is its resistance measured at constant temperature. Because of this, and because of the cross-sensitivity and low selectivity of the sensors, a strict determination of the gas type and its concentration in an unknown mixture is usually impossible.
In order to solve this problem one can use different approaches. One of them is based on the gathering and processing of signals obtained from arrays of partially selective sensors [4,11]. The sensitivity and selectivity of a sensor array can be highly enhanced by developing various pattern recognition algorithms [3,14].
A promising and quite novel approach is the analysis of the dynamic response of a single sensor working at modulated temperature. In such a case, one sensor is equivalent to an array of sensors working at different temperatures. For sensors powered with a pulse voltage, the average power consumption decreases and the long-term stability of the sensor is improved [9,13,20]. An intentional temperature change according to a programmed profile can provide additional information contained in the time-dependent, non-linear sensor response. Such a response is related to the adsorption and desorption processes at the semiconductor surface and is influenced by the concentration of the gas species and their chemical structure. These properties are used in gas sensor arrays working in electronic noses, where multidimensional and non-linear responses are measured and analysed.
The collected information is usually too complex to analyse without a data processing system. The problem is similar to image analysis [25]. Feature extraction is typically performed by the standard pattern recognition methods used in chemometrics, e.g. Principal Component Analysis (PCA), Cluster Analysis (CLA), Template Matching (TM), Discriminant Function Analysis (DFA) and Transformed Multiple Linear Regression (TMLR) [12], or in signal processing, e.g. Fourier or wavelet analyses, Artificial Neural Networks (ANN), Genetic Algorithms (GA) and Fuzzy Logic (FL) [6,27].
An electronic nose (e-nose) is a technical system consisting of both hardware and software [7]. The hardware part is an array of sensors responsible for detecting odours present in the measured gas atmosphere. The software consists of advanced algorithms that process information about volatile organic compounds (VOCs) and make the final analysis. Like its biological analogue, the e-nose is used for identification and classification of gas mixtures and only rarely for determining their concentrations [16]. E-nose systems based on sensor arrays are frequently used in qualitative analyses of different products, e.g. milk [2,29], alcohols [1,30] and tea [5,10,21,28]. Figure 1 shows the three stages of signal processing, from the measurement of odours to the final analysis using AI methods.
Figure 1. Step I - Measurements; Step II - Data manipulation; Step III - Analysis.
An essential part of commercial e-nose devices is the data processing unit (data analysis system), which performs data preprocessing, reduction and analysis. The algorithms employed should be relatively simple and not computationally demanding. Many different approaches have been developed in order to extract only the relevant data, minimize the computation time and prepare algorithms that are best suited to being embedded in hardware.
In this paper, the authors focus on a comparison of selected intelligent systems [26] based on computational intelligence, used to analyse the signals from the sensor array in order to classify brands of tea. The authors have developed several data analysis systems. A commercial e-nose was used to collect the data. The software was written in MATLAB, as it contains very extensive libraries of functions implementing AI methods.

Measurements
The measurements were performed on a commercially available e-nose, the FOX 4000 from Alpha MOS [15]. Its array of 18 semiconductor gas sensors contains three types of metal oxide semiconductor sensors [17-19]:
• type T - SnO2 active layer;
• type P - also based on SnO2; higher sensitivity and faster response time than type T;
• type LY - based on Cr2−xTixO3+y (p-type) and WO3 (n-type); LY are low-power sensors.
A tea specimen is a vial filled with tea taken directly from the package. All tea specimens were heated for 5 min at 60 °C before measurement. The volume of the gas mixture injected into the measuring chamber was 0.5 ml. The measurement of each tea specimen lasted 121 seconds. Sampling was performed every second, and a single sample contains the resistance values of the 18 sensors. A set of 121 samples was recorded, so for a single specimen of tea we obtain a matrix of 121 rows, corresponding to subsequent samples, and 18 columns, corresponding to the responses of the sensors. Table 1 shows the matrices in which the data were collected.
The subject of this study was the data obtained for different brands of tea. For every kind of tea a number of data sets (from 5 to 25, see Table 3) was recorded, which gave 85 input data sets. After the data selection process described in the following paragraphs, 10 brands of tea were taken into consideration: 1 - melissa; 2 - melon-apple; 3, 4, 5, 6 - mint of 4 kinds; 7 - chamomile; 8 - vanilla; 9, 10 - ginger of 2 kinds. As a result of the selection, some data were rejected and two brands of tea (mint and ginger) were divided into subtypes, because their specimens formed separate groups of similar responses.

Data manipulation
The first stage of data manipulation was preprocessing of measurement data: standardization, principal component analysis (PCA), normalization and data reduction.

Preprocessing
All of the preprocessing stages performed on a single specimen of tea are described in Table 2 and shown in Figure 2. As a result, 85 vectors of 36 elements each were obtained. Some of them, chosen in the data selection step, were used as input data for all the developed and evaluated AI systems.
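The preprocessing chain listed in the Figure 2 caption (standardization, PCA, first three PCs, decimation, aggregation, normalization, reduction) can be illustrated with the minimal Python sketch below. It is not the authors' MATLAB code. The function name `preprocess_specimen`, the treatment of aggregation as concatenation of PC1 to PC3, and min-max scaling as the normalization are all assumptions made for illustration. The exact decimation offsets are given in Table 2, which is not reproduced here, so for a 121 x 18 input this sketch returns 32 elements rather than the 36 reported in the text.

```python
import numpy as np


def preprocess_specimen(raw, pc_step=6, final_step=2):
    """Turn one specimen matrix (samples x sensors) into a 1-D feature vector.

    Hypothetical helper: the step sizes follow the Figure 2 caption, the
    remaining choices are assumptions, not the authors' implementation.
    """
    raw = np.asarray(raw, dtype=float)

    # Standardization: zero mean and unit variance for every sensor column.
    std = raw.std(axis=0)
    std[std == 0] = 1.0  # guard against constant sensor signals
    z = (raw - raw.mean(axis=0)) / std

    # PCA through the SVD of the standardized data; scores = U * S.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :3] * s[:3]  # keep the first three principal components

    # Keep every sixth sample of each principal component.
    scores = scores[::pc_step]

    # Aggregation (assumed here to be concatenation PC1 | PC2 | PC3).
    vector = scores.T.reshape(-1)

    # Normalization (assumed here to be min-max scaling to [0, 1]).
    span = vector.max() - vector.min()
    vector = (vector - vector.min()) / span if span > 0 else np.zeros_like(vector)

    # Reduction: keep every second sample.
    return vector[::final_step]
```

Calling `preprocess_specimen` on each of the 85 recorded matrices and stacking the results would give the input matrix for the classifiers. Note that the sign of each principal component is arbitrary, so a consistent sign convention would be needed before comparing specimens.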

Data selection
Due to some errors and inaccuracies observed in the sensor responses, caused e.g. by improper dosing of the tea specimen, data selection is necessary. The incorrect measurements were rejected and two kinds of tea were divided into smaller subgroups. The results after selection are shown in Figure 3, where visual grouping of different kinds of tea is observed. Figure 3 presents the separation of the preprocessed data but, as can be seen, it is rather difficult to interpret. Therefore, other methods showing the grouping of data were used. One of them is PCA, which converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (PC). On the graph of PC1 vs. PC2 one can observe the grouping of similar variables. Figure 4, which is equivalent to Figure 3, shows the grouping for all kinds of tea before and after selection. Table 3 provides a detailed description of the data, covering raw data, preprocessed data, reference matrices (matrices with model answers) and data selection. For the reference matrices, the expected response should be understood as the correct brand of tea.
Figure 2. The visualization of preprocessing stages for a single specimen of tea. A - measured sensor response vectors, each color represents a separate response; B - signals after standardization; C - after PCA, each PC is presented with a different color; D - first three PCs; E - first three PCs, every sixth sample kept; F - after aggregation of PC1 to PC3; G - signal after normalization; H - signal after reduction of samples, every second sample kept; I - data for all specimens of tea after preprocessing, the specimens corresponding to the same species of tea have the same color in the graph (6 colors, 6 species of tea).

Training and test data sets
Intelligent systems learn from examples how to analyse input signals. Therefore, all of the elements have to be divided into a training data set used for learning and a test data set used for evaluation of the system. To make this assignment properly, we plot the data and judge visually whether the elements are grouped or separated into classes (tea brands).

Figure 5 shows signals obtained for specific brands of tea in comparison to one chosen kind of tea, mint (T3). It is an example of good separation of T3 tea from the others. Data of this kind are relatively easy for data analysis systems to classify.
Similarly, Figure 6 shows signals obtained for specific brands of tea in comparison to mint tea (T5). It is an example of almost no separation of T5 tea from the others. The signal for T5 tea is very similar to the signals from T2, T8 and T9. Data of this kind are very difficult for classification systems.
After selection, the data were carefully divided into the training set and the test set. Figure 7 shows the data for both sets. Note that the elements (data vectors related to specimens of tea) of the same kind of tea are grouped close together and are marked with the same color. Some groups of elements are clearly separated, while others overlap. The elements used to test the systems lie within the set of elements used for training. The shapes of the signals in the training and test data sets (for a given species of tea) are similar.

Data analysis systems
The data contained in the preprocessed response vectors are fed into the next stage of odour recognition: data analysis. It involves the assignment of the input data to the appropriate classes. Such classification is performed using data analysis systems based on AI methods.
The advantage of techniques based on artificial intelligence (AI) methods lies in the properties inherited from their biological equivalents, such as learning and generalization of knowledge (ANN [23]), global optimization (evolutionary algorithms) and the use of imprecise concepts (FL) [26]. The very high popularity of ANN in recent years has led to a large number of different structures, data flows, neuron transfer functions and training methods. To obtain maximum efficiency from an ANN, the designer needs relevant experience in selecting these parameters.
The authors designed data analysis systems that use single AI methods and/or their combinations [22,24]. They are briefly characterized below. Several types of ANN were evaluated and the results of their calculations were compared.
• Neuro-fuzzy system (ANFIS) - Sugeno-type fuzzy system transformed into an equivalent neural network, whose parameters are optimized using the backpropagation method.
• Evolutionary-fuzzy system (FUZZY + GA) - Mamdani-type fuzzy system with parameters optimized using a genetic algorithm.
The most important parameters of these systems are presented in Table 4:
• for the ANN: the topology, the transfer function of neurons in successive layers and the training algorithm;
• for the fuzzy system: the type, with the number of inputs and outputs in brackets, and the number of inference rules;
• for the GA: the number of individuals, the probability of crossover and mutation, the number of generations and the method of scaling and selection;
• overall parameters: the number of elements of the training and test sets, the number of variables (parameters determined during training, such as weights and biases or parameters of membership functions in the premises and conclusions), and the number of correctly classified brands of tea.

Probabilistic neural network
Among all the tested systems, the probabilistic neural network (PNN) received the best score in the evaluation. Its structure is shown in Figure 8. The results of calculations of the PNN system are presented in Figures 9 and 10. Table 5 presents its parameters. The PNN consists of input, hidden and output layers. Each neuron of the hidden layer (with a radial transfer function) corresponds to one element of the training set. In the output layer, each recognized class corresponds to one neuron (with a competitive transfer function). The only parameter influencing the learning process of the PNN is the smoothing coefficient. It represents the radial deviation of the corresponding Gaussian functions. Modifying it changes the range of influence of the "knowledge" contained in the training set on the space of input signals. The PNN system learns quickly; its disadvantage is its size. This type of network contains one neuron for every element of the training set, so the entire training set is mapped into the PNN structure.
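The structure described above (one radial neuron per training element, one competitive output neuron per class, a single smoothing coefficient) can be sketched in a few lines of Python. This is an illustrative sketch only and not the authors' MATLAB network: the function name `pnn_predict`, the parameter name `sigma` for the smoothing coefficient, and the choice to sum (rather than average) the hidden activations per class are assumptions.

```python
import numpy as np


def pnn_predict(train_x, train_y, test_x, sigma=0.1):
    """Classify test vectors with a minimal probabilistic neural network.

    sigma plays the role of the smoothing coefficient (an assumed name).
    """
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)

    # Squared Euclidean distance between every test and training vector.
    diff = test_x[:, None, :] - train_x[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)

    # Hidden layer: one Gaussian (radial) neuron per training element.
    activation = np.exp(-dist2 / (2.0 * sigma ** 2))

    # Output layer: sum the activations belonging to each class
    # (some formulations average instead of summing).
    class_scores = np.stack(
        [activation[:, train_y == c].sum(axis=1) for c in classes], axis=1
    )

    # Competitive output: the class with the largest score wins.
    return classes[class_scores.argmax(axis=1)]
```

The sketch makes the trade-off mentioned in the text visible: there is no iterative training, but `train_x` must be stored in full, so the network grows with the training set. A larger `sigma` widens the region of input space influenced by each training element.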
An example of matching the PNN response to the model response obtained for the test set is presented in Figure 9. The accuracy of the system is related to the slope of the fitting line and to the shift of the response points. The ideal line has a slope of 45 degrees, and the points of the test and model responses coincide. An example of the classification results obtained with the PNN for the test set is shown in Figure 10. The correct classifications are marked in green and are placed on the diagonal; the incorrect ones are in red. The test data set consisted of 18 elements (1 test specimen each of T1, T5, T6, T9 and T10, 2 specimens each of T4 and T8, and 3 specimens each of T2, T3 and T7). The data inside the boxes are the results obtained for the system. The upper value is the number of classified test specimens and the lower value is the corresponding percentage, e.g. the test set contained 3 specimens of T3 (16.7% of all test specimens). All tea specimens were correctly classified (100% in the blue box in the bottom right corner; all tea specimens are in the green boxes on the diagonal).
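The counts and percentages shown in a classification chart such as Figure 10 can be computed as in the hedged Python sketch below. The function name `confusion_matrix`, the integer labels 1 to 10 for T1 to T10, and the orientation (rows for the predicted class, columns for the true class) are assumptions made for illustration. The test-set composition is taken from the text.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count specimens per (predicted, true) class pair; labels are 1..n_classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[predicted - 1, true - 1] += 1
    return matrix


# Test set composition quoted in the text: 18 specimens in total.
specimens_per_class = {1: 1, 2: 3, 3: 3, 4: 2, 5: 1, 6: 1, 7: 3, 8: 2, 9: 1, 10: 1}
true_labels = [c for c, n in specimens_per_class.items() for _ in range(n)]

# A perfect classifier, as reported for the PNN, predicts every label correctly.
predicted_labels = list(true_labels)

counts = confusion_matrix(true_labels, predicted_labels, n_classes=10)
percent_of_all = 100.0 * counts / counts.sum()   # share of all test specimens
overall_accuracy = 100.0 * np.trace(counts) / counts.sum()
```

With a perfect classifier all counts lie on the diagonal, and the entry for T3 corresponds to 3 of 18 specimens, i.e. the 16.7% quoted above.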

Evaluation criteria
A comparison of the data analysis systems was performed on the basis of accuracy criteria (AC) and complexity criteria (CC). AC accounts for 60% of the total evaluation and CC for 40%. The results were collected from 30 completed analyses of each system, except for the systems trained with GA, which require a long calculation time, and the ANFIS system, for which subsequent results were identical. All results were normalized in such a way that the worst score in a group of systems corresponds to 0% of the grade and the best to 100%. Assessment values were normalized using formulas 1 and 2:
$$y^{+} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \cdot 100\% \quad (1)$$
$$y^{-} = \frac{x_{\max} - x}{x_{\max} - x_{\min}} \cdot 100\% \quad (2)$$
where:
$y^{+}$ - normalized rating of the system for a given parameter $x$, used if the rating increases with the parameter $x$,
$y^{-}$ - normalized rating of the system for a given parameter $x$, used if the rating decreases with the parameter $x$,
$x$ - the value of the parameter obtained by the currently evaluated system,
$x_{\max}$ - the maximum value of the parameter obtained from all the analyses,
$x_{\min}$ - the minimum value of the parameter obtained from all the analyses.
During the design of the systems the emphasis was placed on the accuracy criterion, so the main goal was to achieve the lowest error. Once the minimal error was reached, the overall number of system parameters was decreased.
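The normalization rule stated above (worst score maps to 0%, best score maps to 100%) corresponds to min-max scaling, which the following Python sketch illustrates. The function names are hypothetical. The example uses the CC1 values (number of variables) quoted later in the text for nine of the ten systems; because the tenth value is not quoted, the results need not match Table 6.

```python
import numpy as np


def normalize_increasing(values):
    """Rating grows with the parameter: smallest value -> 0 %, largest -> 100 %."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.full_like(values, 100.0)  # assumption: identical scores all rate 100 %
    return 100.0 * (values - values.min()) / span


def normalize_decreasing(values):
    """Rating falls with the parameter: largest value -> 0 %, smallest -> 100 %."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.full_like(values, 100.0)  # assumption, as above
    return 100.0 * (values.max() - values) / span


# CC1 values quoted in the text (fewer variables is better, so use the
# decreasing form); the order is PNN, RBF, two FL-based systems, LVQ,
# FF, RNN, RNN + GA, FF + GA.
cc1 = [2256, 2266, 5232, 6188, 294, 762, 1018, 536, 436]
cc1_rating = normalize_decreasing(cc1)
```

In this illustration the system with the fewest variables (294) receives 100% and the one with the most (6188) receives 0%, with all other systems in between.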
The analyses concerned 10 systems evaluated on the basis of three parameters for the accuracy criterion and three parameters for the complexity criterion. The evaluation was based on system efficiency, which tells how well the different brands of tea were classified. It was calculated in MATLAB as the number of correct classifications in relation to all classifications made for the entire test set, according to formula 3:
$$\delta = \frac{\text{number of correctly classified tea specimens}}{N} \cdot 100\% \quad (3)$$
where:
$\delta$ - percentage of correctly classified patterns for the entire test data set,
$N$ - number of tea specimens in the test data set.
The winning pattern (kind of tea) was chosen from the intermediate results according to the simple "winner takes all" rule (e.g. out of 0.7, 0.2 and 0.1 it is 0.7 that wins and is transformed to 1.0). After normalization, the values of all the above-mentioned parameters were multiplied by their respective weights. The individual components of the evaluation criteria, together with their corresponding weights, are described below.
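The "winner takes all" rule and the efficiency measure can be illustrated with the short Python sketch below. It is not the authors' MATLAB code: the function names are hypothetical, and it assumes that the system outputs and the reference (model) answers are stored as matrices with one row per test specimen and one column per kind of tea.

```python
import numpy as np


def winner_takes_all(outputs):
    """Set the largest output in every row to 1.0 and all others to 0.0."""
    outputs = np.asarray(outputs, dtype=float)
    one_hot = np.zeros_like(outputs)
    one_hot[np.arange(outputs.shape[0]), outputs.argmax(axis=1)] = 1.0
    return one_hot


def efficiency(outputs, reference):
    """Percentage of test specimens whose winning class matches the reference."""
    predicted = winner_takes_all(outputs).argmax(axis=1)
    expected = np.asarray(reference).argmax(axis=1)
    return 100.0 * np.mean(predicted == expected)


# Three hypothetical test specimens and three kinds of tea.
outputs = [[0.7, 0.2, 0.1],
           [0.1, 0.3, 0.6],
           [0.5, 0.4, 0.1]]
reference = [[1, 0, 0],
             [0, 0, 1],
             [0, 1, 0]]

winners = winner_takes_all(outputs)
score = efficiency(outputs, reference)
```

In this made-up example the first row reproduces the case from the text (0.7 wins and becomes 1.0), and two of the three specimens are classified correctly.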
• Accuracy criterion (60%) - evaluated system error -
Figure 11. Results of qualitative analysis of systems from Table 6.
Table 6. The comparison of results obtained for the developed systems. Normalized values (y+ and y−) are given in parentheses. The parameters WA1, WA2 and WA3 are calculated using normalized values of y+ and y−. The best results are marked green and the worst red.
Figure 11 shows the results of the evaluation of the developed systems.
As can be seen from Table 6, the systems based on radial neural networks give the best classification of tea specimen patterns: PNN (WA3 = 91.68%) and RBF (WA3 = 91.64%). Although both systems have similar WA3 values, the PNN-based system is marginally better than the RBF-based one because of its smaller number of variables (biases occurring in the last layer of the network; CC1: 2256 vs. 2266). The winning systems are highly efficient in classifying the kinds of tea (a score of over 90%). The systems based on radial neural networks (PNN and RBF) and fuzzy logic (ANFIS and FUZZY) obtained the best results in the accuracy criteria, as they always classified all kinds of tea correctly; the corresponding parameters AC1, AC2, AC3 and WA1 are equal to 100%. All of these systems also have the shortest training times (CC2 and CC3 values). The FL-based systems received lower overall marks because, with priority given to the accuracy criterion, they became enormously complex (CC1: 5232 and 6188).
The best system according to the complexity criteria was the neural-network-based LVQ. Its high WA2 value of 98.54% results from its small number of variables (CC1 = 294). However, LVQ correctly classified only 6 of the 10 kinds of tea. The FF network exhibits greater stability during learning and greater accuracy than RNN (AC2: 72.22% vs. 67.21% and AC3: 33.33% vs. 23.33%), with lower complexity due to a smaller number of variables and a shorter computation time (CC1: 762 vs. 1018 and CC2: 20.93 vs. 59.73).
The worst results were obtained for the ANN-based systems trained only with GA, both FF (WA3 = 46.69%) and RNN (WA3 = 48.97%), probably because the training set was too small. Both systems (FF + GA and RNN + GA) also have the longest training times (CC2: 508.5 and 587; CC3: 292 and 300). The system using RNN + GA exhibits greater stability during learning and greater accuracy than FF + GA (AC2: 100% vs. 96.36% and AC3: 100% vs. 70%), at the expense of increased complexity due to a larger number of variables and a longer computation time (CC1: 536 vs. 436 and CC2: 587 vs. 508.5). The FUZZY system based on fuzzy logic trained with GA was somewhat better, but it correctly classified only 8 kinds of tea (AC1 = 8). It has the highest number of training epochs (CC3 = 300) and a long training time (CC2 = 380).
As we can see, the systems based on the radial neural networks worked the best at solving problems requiring high accuracy, even if we have a small training set. Fuzzy systems are best suited to problems where the complexity does not play a significant role, but we require high accuracy and have access to collected knowledge.

Conclusions
The authors proposed the stages of data preprocessing needed to prepare relevant information for the developed AI systems. A novel method of forming the input data vector by aggregation of the first three principal components was proposed. A dozen data analysis systems based on ANN, fuzzy systems and hybrid systems (evolutionary-neural, neuro-fuzzy and evolutionary-fuzzy) were developed and evaluated. The systems were used to classify ten kinds of tea. All the methods were optimized according to the assumed criteria. The best results in the accuracy criterion were obtained for the systems based on radial neural networks (both PNN and RBF) and fuzzy logic (FUZZY and ANFIS). These systems accurately classified all kinds of tea.
In future, the authors plan to study systems based on ANN (FF and RNN) trained with GA and then optimized by the LM algorithm. This solution should eliminate the main disadvantage of ANN, i.e. the random choice of the initial values of network weights and biases, which frequently leads to a local rather than the global minimum. Such a combined system should have improved efficiency.