A Comparison Between Two Approaches to Optimize Weights of Connections in Artificial Neural Networks

Artificial neural networks (ANNs) have been used for estimation in numerous areas. Improving the accuracy of an ANN is a persistent challenge, and it is generally formulated as a non-linear optimization problem. The aim of this optimization is to find better values for the connection weights and biases of the ANN, since these strongly affect its performance. This study applies two approaches to such an optimization. For this purpose, we create a feed-forward backpropagation ANN using the functions of MATLAB's deep learning toolbox. In the first approach, the network is trained with the Levenberg-Marquardt algorithm (LMA), which is available in MATLAB's deep learning toolbox. In the second approach, the weights and biases of the ANN are optimized with the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), both available in MATLAB's global optimization toolbox. We then assess the estimation accuracy of the trained ANNs. In this way, for the first time in the literature, we compare these methods for the optimization of an ANN. The data sets used are also available in MATLAB. According to the results, training with LMA gives the best accuracy on some data sets and training with PSO on others; however, training with LMA is significantly faster. Although the presented approaches and conclusions are useful for researchers in this field, they have limitations: since only MATLAB functions and data sets are used, the study serves mainly as an illustrative example.

1 Introduction

An ANN consists of components such as neurons, connections, and a propagation function. In this study, we generate a feed-forward backpropagation ANN using MATLAB's deep learning toolbox to perform estimations on data sets available in the same toolbox. In an ANN, a weight is assigned to each connection, and the values of the weights and biases strongly affect the performance of the network. These values can be determined by solving an optimization model. We improve the performance of the ANN in two ways. In the first approach, we use the Levenberg-Marquardt algorithm (LMA) available in MATLAB's deep learning toolbox. In the second approach, we determine the values of the weights and biases using the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), which are available in MATLAB's global optimization toolbox.
In the literature, evolutionary algorithms (EAs) have been used to find the weights and biases of ANNs [8][9][10][11][12]. In contrast to those studies, we compare the performance of LMA, GA, and PSO, as available in MATLAB's toolboxes, for the training of ANNs. The main goal of this study can therefore be summarized as follows: we use LMA, GA, and PSO to improve the accuracy of a feed-forward backpropagation ANN and compare their performance.
The rest of the paper is organized as follows. Section 2 summarizes the structure of the ANN and its training and briefly explains the functions used. The experimental results are then presented and compared, and the conclusion closes the article.
2 General Structure of ANN, ANN+GA, and ANN+PSO

Each of the used data sets is divided into a training and a test set, whose inputs and outputs are normalized. The ANN is generated using the newff function [13]; different versions of MATLAB provide similar functions, which can also be used. The syntax of newff includes a matrix of size I×2, denoted PR, where I is the number of inputs; PR contains the minimum and maximum values of the inputs. In all rows, the first and second columns of the PR matrix are -1 and 1, respectively. tansig is used as the transfer function, which is defined as

$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1.$$

A created ANN is shown in Figure 1. Subsequently, the formed ANN is trained with the training data set; during this process, the weights and biases are determined. The objective function of the training process is defined as in Equation 1:

$$\mathrm{MSE} = \frac{1}{d} \sum_{i=1}^{d} \left( Y_i^{TrT} - Y_i^{TrP} \right)^2 \qquad (1)$$

where d, Y^{TrT}, and Y^{TrP} are the number of data points, the target outputs, and the predicted outputs in the training set, respectively; MSE stands for mean squared error.
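As an illustration, the network creation can be written as the following minimal MATLAB sketch. The variable names PTr and TTr (normalized training inputs and targets) are ours, and the output-layer transfer function 'purelin' is an assumption; the paper only specifies tansig, 10 hidden neurons, and LMA ('trainlm') training:

% Minimal sketch of network creation; PTr (I x d) and TTr (O x d) are the
% normalized training inputs and targets (illustrative names, not from the paper).
I = size(PTr, 1);                % number of inputs
O = size(TTr, 1);                % number of outputs
PR = [-ones(I, 1), ones(I, 1)];  % I x 2 matrix of input ranges, all [-1, 1]
net = newff(PR, [10 O], {'tansig', 'purelin'}, 'trainlm');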
As mentioned before, two approaches are used for training. In the first one, the train function of MATLAB's deep learning toolbox is applied. Figure 2 shows this process for the vinyl data set, which has 16 inputs and 1 output [15]. As seen in Figure 2, there are 10 neurons in the layer, LMA is used for training, and performance is measured by the MSE. In the second approach, after the ANN is created, the values of its weights and biases are improved with GA and PSO. For this purpose, the following properties are used: Network.IW to access the input weight matrix, Network.LW to access the layer weight matrix, and Network.b to access the bias vectors [16]. If there are I inputs and the layer contains N neurons, the input weight matrix has I×N elements. We use only one hidden layer, and its layer weight matrix has N×O elements, where O is the number of outputs. In addition, there is one bias per neuron of the hidden layer. Hence, the optimization problem has nvars = I×N + N×O + N decision variables, which are arranged as a vector.
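Since the paper only names the accessed properties, one way to write a candidate vector X into the network is the following hypothetical helper; the name setNetWeights, the segment ordering, and leaving the output bias net.b{2} unchanged (which matches the paper's variable count) are our assumptions. Note that MATLAB stores net.IW{1,1} as N-by-I and net.LW{2,1} as O-by-N:

function net = setNetWeights(net, X, I, N, O)
% Hypothetical helper: unpack the decision vector X into the network,
% following the paper's count nvars = I*N + N*O + N.
net.IW{1,1} = reshape(X(1:I*N), N, I);          % input weights
net.LW{2,1} = reshape(X(I*N+1:I*N+N*O), O, N);  % layer weights
net.b{1}    = reshape(X(I*N+N*O+1:end), N, 1);  % hidden-layer biases
end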
In GA, the vector of decision variables is initialized as follows:

lb = -1 * ones(1, nvars);
ub = +1 * ones(1, nvars);
X = rand(1, nvars) .* (ub - lb) + lb;

where lb and ub represent the lower and upper bounds of the decision variables, respectively. The objective function of the training process with GA is defined as fun = @(X) mse(YTrT - YTrP) [17]. The sim function is used to obtain YTrP [18].
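Since YTrP must depend on X inside fun, one way to wire the objective, using the hypothetical setNetWeights helper from above (the paper does not show this step explicitly), is:

% Sketch: rebuild the network from X, then simulate it on the training
% inputs PTr and compare against the targets YTrT (names assumed).
fun = @(X) mse(YTrT - sim(setNetWeights(net, X, I, N, O), PTr));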
Optimization with GA is done with the ga function of the global optimization toolbox. In a similar way, in PSO, the optimization is done with the following commands [19]:

options = optimoptions('particleswarm', 'PlotFcn', 'pswplotbestf');
rng default
[x, fval, exitflag, output] = particleswarm(fun, nvars, lb, ub, options);

As mentioned before, each data set is divided into two parts, training and test data. The accuracy of the ANN is assessed on the test data by calculating the MSE between the target (actual) and predicted (output) values. The corresponding objective function is defined as in Equation 2:

$$\mathrm{MSE} = \frac{1}{t} \sum_{i=1}^{t} \left( Y_i^{TsT} - Y_i^{TsP} \right)^2 \qquad (2)$$

where t, Y^{TsT}, and Y^{TsP} are the number of data points, the target outputs, and the predicted outputs in the test set, respectively. Similar to the training process, the sim function is used to obtain the outputs of the ANN [18].
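Under the standard ga syntax of the global optimization toolbox, a call analogous to the PSO one above would be as follows (a sketch, not necessarily the authors' exact listing):

% Sketch of the GA call; the empty arguments are placeholders for the
% linear and nonlinear constraints, which are not used here.
options = optimoptions('ga', 'PlotFcn', 'gaplotbestf');
rng default
[x, fval, exitflag, output] = ga(fun, nvars, [], [], [], [], lb, ub, [], options);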

3 Experimental Results
The used data sets are simple fitting, abalone shell rings, body fat percentage, building energy, chemical sensor, cholesterol, engine behavior, and vinyl bromide, all of which are available in MATLAB. The dimensions of the inputs and outputs of the data sets are given in Table 1.

Table 1. The dimensions of the inputs and outputs of the used data sets [15]. [Columns: data set, input size, output size; the table body is not recoverable from the source.]

The percentage of the data used for training is 75%. As an example, the input and output sizes of the abalone data set are 4177×8 and 4177×1, respectively [15]; that is, there are 4177 data points, each with eight inputs and one output. Accordingly, the sizes of the input and output data used for training are 3132×8 and 3132×1, respectively. Assessment is done with the remaining 25% of the data: inputs of size 1045×8 are fed to the trained networks, and outputs of size 1045×1 are obtained. The assessment is based on Equation 2.
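A minimal sketch of this split for the abalone data, assuming the toolbox's abalone_dataset loader and a mapminmax normalization to [-1, 1] (the paper implies the [-1, 1] range via the PR matrix but does not name the normalization function); note that the toolbox returns the data transposed relative to the sizes quoted above:

% Load the abalone data shipped with MATLAB (8 inputs, 1 output).
[X, T] = abalone_dataset;           % X: 8 x 4177, T: 1 x 4177
Xn = mapminmax(X, -1, 1);           % normalize inputs to [-1, 1] (assumed)
Tn = mapminmax(T, -1, 1);           % normalize targets (assumed)
nTr = floor(0.75 * size(Xn, 2));    % 75% for training -> 3132 samples
PTr = Xn(:, 1:nTr);     TTr = Tn(:, 1:nTr);      % training set
PTs = Xn(:, nTr+1:end); TTs = Tn(:, nTr+1:end);  % test set, 1045 samples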
Default options are used for selection, crossover, and mutation in GA [20]. We set MaxStallGenerations and MaxGenerations in the options. MaxStallGenerations, which limits the number of consecutive generations without improvement, is set to 500. The maximum number of generations is controlled by MaxGenerations, which is set to 1000. Similar values are used for the options of PSO. All other options of both algorithms are left at the defaults of the global optimization toolbox. We use a system with an Intel Core i7 processor at 1.8 GHz and 16 GB of RAM. The convergence of GA and PSO for the vinyl data set is shown in Figure 5; the behavior is similar for the other data sets. The results are presented in Table 2, in which the best results are shown in bold. As seen, the best results are obtained by ANN for some data sets and by ANN+PSO for others; however, the differences between the results are small. The computation time of ANN is lower than that of ANN+GA and ANN+PSO. ANN+GA does not achieve the best result on any data set and has the highest computation times.
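In code, these settings correspond to something like the following; particleswarm uses MaxStallIterations and MaxIterations rather than the generation-based names, so "similar values" is interpreted here as those counterparts (our assumption):

% GA: stop after 1000 generations or 500 stall generations.
gaOpts = optimoptions('ga', 'MaxStallGenerations', 500, 'MaxGenerations', 1000);
% PSO: the analogous iteration-based options (assumed correspondence).
psoOpts = optimoptions('particleswarm', 'MaxStallIterations', 500, 'MaxIterations', 1000);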
Considering the computation times for the used data sets, it may seem more reasonable to apply LMA for ANN optimization. It should be noted, however, that the data sets used here are not very large, and the other approaches may provide better results on different ones.

4 Conclusion
In this study, the optimization of an ANN was carried out with two different approaches in order to obtain better estimations. In the first approach, the optimization was done with LMA, while in the second, the values of the weights and biases of the ANN were optimized with EAs, namely GA and PSO. These methods were denoted ANN, ANN+GA, and ANN+PSO. The deep learning and global optimization toolboxes of MATLAB were used, and the implementations were briefly described. Data sets of different sizes from MATLAB were used for training and testing. According to the results, ANN achieved the best results for some data sets and ANN+PSO for others; however, there were no substantial differences between the results. ANN was the fastest approach, while ANN+GA had the highest computation times.