Prediction of Water Quality for Free Water Surface Constructed Wetland Using ANN and MLRA

Constructed wetlands are commonly used to reduce non-point source pollutants and as stormwater treatment systems. For many years, water quality in constructed wetlands has been assessed through conventional sampling and laboratory work. However, with advances in technology, modelling approaches for predicting water quality have been developed. This study focuses on predicting water quality parameters for constructed wetlands under a tropical climate using Artificial Neural Networks (ANN) and Multiple Linear Regression Analysis (MLRA). The five input parameters are water quality at the inlet point, detention time, depth of water, length-to-width ratio, and rainfall. The output parameters are the water quality at the outlet point, namely Biochemical Oxygen Demand (BOD₅), Chemical Oxygen Demand (COD), Total Phosphorus (TP), Total Nitrogen (TN), and Total Suspended Solids (TSS). The squared correlation coefficient (R²) and root mean square error (RMSE) were applied to assess model performance, and the results indicated that the ANN model performs excellently compared with MLRA. The R² value for each output parameter is higher than 0.90 and the RMSE values are close to zero. However, TN showed very good pollutant removal in the constructed wetland compared with the other water quality parameters tested. Findings from this study will contribute towards improving the design performance of, and guidelines for, constructed wetlands under a tropical climate.


Introduction
Rapid development has changed the pattern of land use, increasing impervious surfaces and decreasing infiltration [28]. Hence, urbanization has altered the natural water cycle, changing both the quantity and quality of water [30]. Urbanization expands impervious areas and diversifies land use as naturally vegetated land is converted into impervious surfaces. In the long term, urbanization has resulted in frequent flash floods as well as deteriorating water quality in urban areas due to non-point source pollutants [28]. Non-point source pollutants contain various contaminants such as nutrients, pesticides, pathogens, and biological oxygen demand [22]. Therefore, concerted efforts should be made to reduce non-point source pollutants, meet water quality compliance, and implement management practices.
A constructed wetland is an environmental practice that reduces non-point source pollutants and works as a stormwater treatment system. It can be defined as a man-made wetland developed and managed to treat contaminants in stormwater and wastewater [12]. Constructed wetlands are an economical and accepted wastewater treatment alternative worldwide [18]. Following the successful use of wetlands for wastewater, their application to other sources such as stormwater, industrial, and agricultural wastewater has been investigated [25,12]. Constructed wetlands are also increasingly used in combined systems to treat various industrial wastewaters [20,6,26,13]. The application of free water surface constructed wetlands in Malaysia is described in the Manual Saliran Mesra Alam, or Stormwater Management Manual (SWMM). This manual is a technical guideline based on proven models for regulators, engineers, planners, developers, and the public, intended to improve stormwater and water management in the country [32]. The main objective of the manual is to make use of retention/detention, infiltration, and purification processes [7,34]. Although the SWMM provides guidelines for designing constructed wetlands, these still need to be refined according to actual site conditions so that a comprehensive design can be achieved.
Various models have been developed to assist in the design and implementation of stormwater management strategies, such as MUSIC, MOUSE, the Storm Water Management Model (SWMM), and the Water Balance Model (WBM). However, each model has its strengths and weaknesses [8]. ANN is widely used as an alternative for forecasting [10] and can therefore be considered as a prediction tool for designing constructed wetlands.

Artificial Neural Networks (ANN) for Constructed Wetland
An artificial neural network (ANN) is a machine learning method. It is a neural network system capable of learning from processed data and solving problems that are not suited to conventional statistical methods [1]. Nonlinear relationships between input and output variables can be modelled without having to understand the nature of the underlying physical processes. Leondes [15] indicated that ANN is valuable when it is complicated or impossible to formulate a relationship explicitly. The method is also useful even if the data are noisy or inadequate. Ghorbani et al. [10] stated that ANN is a powerful tool for prediction. It can solve forecasting problems when the output is continuous or act as a classifier when the output is binary [19]. The performance of an ANN depends on its network architecture, which directly affects its computing and generalization capabilities. According to Baykal and Yildirim [4], factors such as magnitude, training data, ANN construction, and learning algorithms influence the effectiveness of ANN in problem-solving.
Over the last few years, ANN has been successfully applied to modelling various water engineering problems, including constructed wetlands. Tomenko et al. [27] developed ANN models to predict the performance of a constructed wetland wastewater treatment plant (CWWTP). The ANN models were found to be competent and useful in predicting CWWTP performance. Yalcuk [31] proposed the use of ANN to model phenol removal in vertical and horizontal constructed wetlands. The study found that the ANN was efficient and easy to apply for improving the performance of horizontal and vertical constructed wetlands. Ozengin et al. [21] applied ANN to forecast the nutrients released from constructed wetlands, and the results showed that ANN was able to forecast them successfully. According to Wei et al. [29], ANN networks trained with back-propagation algorithms showed good performance in forecasting nutrient uptake in tidal flow constructed wetlands, with only minor errors between predicted and actual values. Hamada et al. [11] found that the ANN model can be used effectively to estimate BOD₅, COD, and TSS at the outlets of the Gaza wastewater treatment plant. The results showed that the ANN model performed better than the MLR model.

Multiple Linear Regression Analysis (MLRA) for Constructed Wetland
Multiple Linear Regression Analysis (MLRA) is one of the most widely used of all statistical methods for estimating an output. The Statistical Package for the Social Sciences (SPSS) version 16.0 was used to perform the MLRA. Multiple regression is an extension of linear regression in which the value of a variable is estimated from two or more other variables. The variable being predicted is called the dependent variable, also known as the target, whereas the independent variables are those used to predict the value of the dependent variable. The result of this analysis is a linear regression between two or more variables, and the model was built using the backward stepwise method. A general linear regression model with variables X is shown in Equation (1).
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_{p-1} X_{i,p-1} \qquad (1)$$
where $\beta_0, \beta_1, \dots, \beta_{p-1}$ are parameters and $X_{i1}, \dots, X_{i,p-1}$ are known constants.
A regression between the observed and simulated data was then performed to determine the relationship between them. This relationship was evaluated using Pearson's correlation coefficient (PCC). The PCC ranges from -1 through 0 to +1. The model shows perfect correlation as the PCC approaches +1, and the determination of the PCC value is shown in Equation (2).
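As an illustration of Equation (1) and of the correlation check described above, the following is a minimal Python sketch; it is not the SPSS procedure used in this study. It fits the coefficients by ordinary least squares with NumPy and computes Pearson's correlation coefficient from its standard definition. The function names, the synthetic data, and the coefficient values are hypothetical, and backward stepwise selection is not reproduced.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + b_{p-1}*x_{p-1}."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_mlr(beta, X):
    """Evaluate the fitted linear model for each row of X."""
    return beta[0] + X @ beta[1:]

def pearson_r(a, b):
    """Pearson's correlation coefficient (standard definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

# Synthetic, noise-free data (hypothetical): 30 samples, 4 predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 4))
true_beta = np.array([2.0, 0.5, -1.0, 0.25, 3.0])
y = true_beta[0] + X @ true_beta[1:]

beta = fit_mlr(X, y)
r = pearson_r(y, predict_mlr(beta, X))
```

Because the synthetic data contain no noise, the fit recovers the chosen coefficients and the correlation between observed and simulated values is essentially +1; real field data would give a lower value.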
In addition, a significant relationship between independent and dependent variables can be identified through the multiple regression correlation (R). This assessment can be used to increase the accuracy of prediction for the dependent variable compared with using one independent variable alone [5]. Babatunde et al. [2] developed equations using multiple regression analysis for predicting BOD₅ and COD at the constructed wetland outlet, as stated in Table 1. Sarmadian et al. [23] applied the ANN model to soil properties, and the results showed that the ANN technique was effective and produced better results than multivariate regression analysis. Similarly, Santos et al. [24] conducted a modelling study on soil penetration using statistical analyses and artificial networks, which indicated that the ANN model gave better results than the statistical model obtained from regression analysis. Kiiza et al. [14] studied the prediction of pollutant removal in a vertical flow subsurface constructed wetland using ANN, and the results showed good agreement between predicted and experimental data for TP and TN removal. Li et al. [16] compared ANN and Multiple Regression Analysis (MRA) in modelling TP treatment in an HSSF constructed wetland system and found ANN to be more efficient and to have stronger potential than MRA. Therefore, this study compares the predictive ability of ANN and MLRA in predicting the water quality of free water surface constructed wetlands.

Location of Study
The selected free water surface constructed wetlands are classified as small, medium, and large. The three selected sites are located in Kuala Lumpur, Putrajaya and Pulau Pinang. The largest site is located at Putrajaya, the Federal Administrative Centre of Malaysia, at coordinates 2° 57' 43" N and 101° 41' 47" E, south of the Klang Valley area. The Putrajaya Lake catchment, also known as the Sg. Chuau catchment, is situated around 25 km south of Kuala Lumpur. The lake covers an area of 400 hectares and has a volume of 23.5 million cubic metres. The water depth ranges from 3 to 14 metres with an average depth of 6.6 metres. The lake has a 20 m wide promenade that acts as a support element along its shores and stretches for a total length of 34.0 km. The Putrajaya catchment covers an area of 52. 4

Rainfall Data
Local rainfall data, as shown in Table 2, are required to determine the hydrological performance of the constructed wetlands. The data were obtained from automatic rain gauges operating on the tipping bucket principle, installed by the Department of Irrigation and Drainage Malaysia (DID). The bucket tips once for each 0.1 mm of precipitation collected, and tips are recorded at 5-minute intervals.
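The tipping bucket record described above can be converted to rainfall depth and intensity with simple arithmetic. The sketch below is illustrative only and assumes that the gauge logs the number of tips in each 5-minute interval; the function name and the example tip counts are hypothetical.

```python
TIP_DEPTH_MM = 0.1   # rainfall depth per bucket tip, as stated in the text
INTERVAL_MIN = 5     # logging interval in minutes, as stated in the text

def rainfall_from_tips(tip_counts):
    """Convert tips per interval into depth (mm), intensity (mm/h), and total (mm)."""
    depths = [round(n * TIP_DEPTH_MM, 1) for n in tip_counts]
    # Scale each 5-minute depth to an hourly rate.
    intensities = [round(d * 60 / INTERVAL_MIN, 1) for d in depths]
    total = round(sum(depths), 1)
    return depths, intensities, total

# Hypothetical storm: tips counted in four consecutive 5-minute intervals.
depths, intensities, total = rainfall_from_tips([0, 3, 12, 7])
```

For these made-up counts, 12 tips in one interval correspond to 1.2 mm of rain, or an intensity of 14.4 mm/h over that interval.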

Water Quality
A series of samples to determine water quality was collected at the inlet and outlet channels during storm events using the grab sampling method. Pollutant removal is the dependent variable, while the average depth, the length-to-width ratio, the detention time of the constructed wetland, and the rainfall are the independent variables of the equation. The parameters assessed in this study are Biochemical Oxygen Demand (BOD₅), Chemical Oxygen Demand (COD), Total Phosphorus (TP), Total Nitrogen (TN), and Total Suspended Solids (TSS). Sampling and testing were carried out according to the Standard Methods for the Examination of Water and Wastewater (20th Edition).
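The text does not state how pollutant removal was calculated from the inlet and outlet samples. A minimal sketch, assuming the conventional concentration-based definition of percentage removal, is given below; the function name and the example concentrations are hypothetical.

```python
def removal_efficiency(c_in, c_out):
    """Percentage removal from inlet and outlet concentrations (same units).

    Assumes the conventional definition (c_in - c_out) / c_in * 100,
    which the source text does not spell out.
    """
    if c_in <= 0:
        raise ValueError("inlet concentration must be positive")
    return (c_in - c_out) / c_in * 100.0

# Hypothetical example: 20 mg/L at the inlet, 5 mg/L at the outlet.
efficiency = removal_efficiency(20.0, 5.0)
```

A negative result would indicate that the outlet concentration exceeded the inlet concentration for that storm event.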

Model Development for ANN
In this study, the ANN prediction used 65 different designs of constructed wetland containing input and output parameters. The input data consist of (i) hydraulic characteristics at the inlet point of the constructed wetland, (ii) detention time, (iii) depth of water, (iv) length-to-width ratio, and (v) rainfall. The output data consist of the water quality parameters at the constructed wetland outlet, namely (i) BOD₅, (ii) COD, (iii) TP, (iv) TN, and (v) TSS. The network structure used is five inputs and one output, as shown in Figure 2. A feed-forward network was selected. The tan-sigmoid transfer function was applied on the hidden layer, whereas the linear transfer function was used on the output layer. The input layer conveys data to the network, the hidden layer acts as a group of feature detectors, and the output layer generates the appropriate response to a given input. A hidden layer of 10 neurons was selected for the ANN for all output parameters. This number was determined by trial, selecting the best-performing network structure.
According to Gencel [9], no theory is available to reveal how many hidden units are required to approximate a particular function. The trial-and-error method is used to determine a suitable number of units in the hidden layer [33]. Liu [17] stated that the network is affected by the number of hidden units. If the number is too high, the training time increases and the network may over-fit or fail to converge to the target error. If the number is too small, the relationship between inputs and outputs cannot be fully represented and model training is incomplete. Training in this study was carried out using the Levenberg-Marquardt (LM) algorithm with a variable learning rate. In the neural network analysis, the data set is automatically and randomly divided by the software into three sets: 70% (45 data) for training, 15% (10 data) for validation, and 15% (10 data) for testing.
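The network structure described above can be sketched as a forward pass in NumPy. This is a minimal illustration, not the software used in the study: it assumes that the 10 hidden units form a single hidden layer, uses `tanh` for the tan-sigmoid transfer function (the two are mathematically equivalent), and uses random, untrained weights. All names are hypothetical, and Levenberg-Marquardt training is not shown.

```python
import numpy as np

N_INPUTS, N_HIDDEN, N_OUTPUTS = 5, 10, 1  # five inputs, 10 hidden units, one output

def init_params(seed=0):
    """Random (untrained) weights and zero biases for a 5-10-1 network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (N_INPUTS, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.5, (N_HIDDEN, N_OUTPUTS)),
        "b2": np.zeros(N_OUTPUTS),
    }

def forward(params, x):
    """Feed-forward pass: tan-sigmoid hidden layer, linear output layer."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

params = init_params()
x = np.zeros((3, N_INPUTS))   # three placeholder samples with all-zero inputs
y_hat = forward(params, x)    # one predicted value per sample
```

One such network would be built per output parameter, since each network has a single output. In practice the inputs would also be scaled before training, a step the text does not describe.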
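The random division of the data can be sketched as follows. Since 70% and 15% of 65 are not whole numbers (45.5 and 9.75), the sketch takes the reported subset sizes of 45, 10, and 10 directly instead of computing them from percentages. The function name and the seed are hypothetical; the software used in the study performs this step automatically.

```python
import numpy as np

def split_indices(n_samples, n_val, n_test, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # shuffled indices 0 .. n_samples-1
    n_train = n_samples - n_val - n_test
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test

# Sizes reported in the text: 65 designs split into 45 / 10 / 10.
train_idx, val_idx, test_idx = split_indices(65, n_val=10, n_test=10)
```

Each index appears in exactly one subset, so no design is used for both training and testing.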

Model Development for MLRA
MLRA is a method for estimating the behavior of a dependent variable based on several independent variables. Through this method, a linear relationship between these variables can be generated. In the regression, the simulated percentage removal efficiency for each water quality parameter is obtained by entering the values of hydraulic detention time, depth, length-to-width ratio, and rainfall. A regression between the observed and simulated data was then performed to determine the relationship between them. Regression analysis is one way to understand how the typical value of the dependent variable changes when any of the independent variables changes. A validation process was applied to the linear model, and the results of the equation were compared with the actual measured data. The purpose of validation is to ensure that the results of the analysis hold not only for the samples studied but also for the entire population.
Figure 3 shows a comparison of the measured and predicted values generated by the ANN model for outlet BOD₅, COD, TP, TN, and TSS, respectively. The results showed that ANN could predict the hydraulic characteristics of the constructed wetland with high accuracy, close to the measured values. Visually, there is no significant difference between the measured and predicted values for any of the hydraulic characteristics, since the values overlap one another. Visual comparison is therefore difficult because the predicted values are almost the same as the measured values. However, the accuracy can be determined by comparing the values of the squared correlation coefficient (R²) and root mean square error (RMSE). The R² and RMSE values of the ANN model for all the water quality parameters are shown in Table 3.
The results revealed that the ANN model shows excellent performance in predicting the water quality of the constructed wetland because the R² value for each output parameter is higher than 0.90. The RMSE values ranged from 0.0140 to 3.9440. Figure 4 shows the performance of the ANN model in terms of R² value for outlet BOD₅, COD, TP, TN, and TSS. Overall, the ANN model for outlet TN performed better than the ANN models for the other parameters because its R² value was closer to one and its RMSE value was closer to zero. Kiiza et al. [14] conducted a study on the prediction of nitrogen and phosphorus removal using ANN and presented satisfactory results, with R² higher than 0.65. Meanwhile, the regression results for MLRA are presented in Table 4, where the simulated percentage removal efficiency for each water quality parameter is obtained by entering the values of hydraulic detention time, depth, length-to-width ratio, and rainfall. Table 5 gives the linear regression model for the percentage removal efficiency, with BOD₅, COD, TP, TN, and TSS as the dependent variables and hydraulic detention time, depth, length-to-width ratio, and rainfall as the independent variables.
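The two accuracy measures used above can be computed as in the following minimal sketch. It takes R² as the square of Pearson's correlation coefficient between measured and predicted values, following the term "squared correlation coefficient" used in this paper, and RMSE from its standard definition. The function names and the example values are hypothetical and are not data from this study.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((m - p) ** 2)))

def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    r = np.corrcoef(measured, predicted)[0, 1]
    return float(r ** 2)

# Hypothetical values for illustration only.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

error = rmse(measured, predicted)      # every error is +/-1, so the RMSE is 1.0
fit = r_squared(measured, predicted)   # 16**2 / (20 * 16) = 0.8
```

RMSE carries the units of the parameter being predicted, so values for parameters with different concentration ranges are not directly comparable, whereas R² is dimensionless.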

Results and Discussions
The findings indicate that the most influential factor for BOD₅ removal efficiency is rainfall, with an R² value of 0.2025. This is followed by length-to-width ratio with an R² value of 0.0853, hydraulic detention time with an R² value of 0.0066, and depth with an R² value of 0.0006. COD removal efficiency is influenced most by rainfall with an R² value of 0.1024, followed by length-to-width ratio with an R² value of 0.0072, hydraulic detention time with an R² value of 0.0026, and depth with an R² value of 0.0017. The highest R² value for TP removal is 0.0454, produced by depth, followed by 0.0259 for hydraulic detention time, 0.0040 for rainfall, and 0.0015 for length-to-width ratio. For TN removal, the most influential parameter is length-to-width ratio with an R² value of 0.1149, followed by 0.0595 for hydraulic detention time, 0.0216 for depth, and 0.0177 for rainfall. TSS removal is most affected by rainfall with an R² value of 0.0650, then depth with an R² value of 0.0357, hydraulic detention time with an R² value of 0.0259, and length-to-width ratio with an R² value of 0.0040.
These findings indicate that rainfall is the most influential parameter for the removal of COD, BOD₅, and TSS. For TP and TN, the most influential parameters are depth and length-to-width ratio, respectively. The study conducted by Babatunde et al. [3] on constructed wetlands situated at a research farm in Newcastle, Dublin, Ireland recorded R² values for Multiple Regression Analysis on BOD₅ and COD of around 0.665 and 0.588, respectively. The performance of MLRA, obtained from the measured and simulated removal percentages for water quality, is shown in Figure 5. The R² and RMSE values of the MLRA model for all the water quality parameters are shown in Table 6.
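The ranking of factors can be reproduced directly from the R² values quoted above. The short sketch below only sorts those reported values; the dictionary layout and names are hypothetical conveniences.

```python
# Single-factor R2 values as reported in the text for each removal efficiency.
R2_BY_FACTOR = {
    "BOD5": {"rainfall": 0.2025, "length_to_width": 0.0853,
             "detention_time": 0.0066, "depth": 0.0006},
    "COD": {"rainfall": 0.1024, "length_to_width": 0.0072,
            "detention_time": 0.0026, "depth": 0.0017},
    "TP": {"depth": 0.0454, "detention_time": 0.0259,
           "rainfall": 0.0040, "length_to_width": 0.0015},
    "TN": {"length_to_width": 0.1149, "detention_time": 0.0595,
           "depth": 0.0216, "rainfall": 0.0177},
    "TSS": {"rainfall": 0.0650, "depth": 0.0357,
            "detention_time": 0.0259, "length_to_width": 0.0040},
}

def rank_factors(r2_values):
    """Return factor names ordered from highest to lowest R2."""
    return sorted(r2_values, key=r2_values.get, reverse=True)

most_influential = {param: rank_factors(values)[0]
                    for param, values in R2_BY_FACTOR.items()}
```

Even the largest of these values (0.2025 for rainfall and BOD₅) means that a single factor accounts for only about a fifth of the variation in removal efficiency.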

Conclusions
This study indicates that ANN is a powerful and reliable tool for predicting the hydraulic characteristics of constructed wetlands located in tropical climates. The prediction accuracy of the ANN model for all the water quality parameters is considered very good because its R² values are greater than 0.90, whereas MLRA presented low R² values (between 0.029 and 0.6). This shows that ANN is a suitable technique for modelling a complex system and helps to produce an accurate model. However, to produce a good prediction, a large amount of data is needed when training a network because neural networks need to learn patterns thoroughly. The amount of data used during the testing and application phases also affects network performance.
Findings from this study can be used as a guideline in the design of constructed wetlands located in a tropical climate. The model obtained can also be used for preliminary assessment during preliminary design and feasibility studies for constructed wetlands located in a tropical climate. For future studies, another artificial intelligence method such as the Adaptive Neuro-Fuzzy Inference System (ANFIS) model can be investigated to compare the prediction accuracy of hydraulic characteristics for constructed wetland located