Regionalization Approach for Modeling of Monthly Evaporation Based on Cluster Analysis

Among the Climatic Parameters (CPs) that characterize different climates, evaporation has received particular attention. Using data from 14 stations in the arid central and southern parts of Iran, the present study determines the most important factors affecting evaporation by means of Factor Analysis (FA). First, Cluster Analysis (CA) was conducted in Minitab to separate the study area into Homogeneous Regions (HRs). Then, the relation between climatic factors and evaporation was obtained for each region on a monthly basis. Finally, evaporation was calculated from the resulting regression equations, and the MAE and RMSE statistics were used to assess the precision of each relation. The results show that, firstly, factor analysis is an appropriate method for determining the climatic parameters affecting evaporation in the regions. Secondly, once the relation between the independent parameters and evaporation and the priority of the main factors have been established, multiple regression may be used with confidence to calculate evaporation and, consequently, evapotranspiration. The results also indicate that the difference between maximum and minimum temperatures (Tmax – Tmin), Tmax, and the difference between maximum and minimum relative humidity (Hmax – Hmin) are, in that order, the main factors influencing evaporation in Homogeneous Region 1 (HR1), which is spread along Iran's water borders and has a warm and humid climate. Generally, the Temperature Related Parameters (TPR), including Tmax – Tmin, Tmax, Tmin and Tmean, are found to be the main factors affecting evaporation in HR1. In HR2 and HR3, wind speed, Tmax and Hmax – Hmin are identified as the main parameters and cloudiness as the second one. In these two regions, wind speed and cloudiness play a greater role in the amount of evaporation than the temperature-related factors.


Introduction
Evaporation is one of the complicated yet influential factors in the climate of any region, and it is strongly affected by other climatic parameters. It is a key quantity in calculating the hydrologic balance or drought severity. Because it depends on many different variables, evaporation is considered a multidimensional hydro-climatic phenomenon. The variables affecting evaporation include precipitation, wind speed, solar radiation, humidity, sunshine hours, cloudiness, plant cover, soil characteristics, available water quality, and so on. However, since not all of these data may be available for a given region, determining the relative importance of the variables and factors that affect evaporation plays a prominent role in estimating crop water requirement and irrigation water need in specific regions. Moreover, it is worth noting that the data governing the quantity of evaporation are made up of several parts and factors. The share and effect of each of these factors is very important, and each one is closely related to the other physical parameters controlling evaporation. Investigating this parameter is especially necessary for arid and semi-arid regions facing water deficiency.
Pan evaporation measurements have been used worldwide as a means of estimating evapotranspiration and free water surface evaporation. The availability of pan evaporation measurements is critical to many applications, including irrigation scheduling, hydrologic modeling, and irrigation system design. The ability to predict daily pan evaporation from limited meteorological observations is highly desirable. In many situations it is advantageous to calculate, rather than measure, pan evaporation. This is often the case in developing countries or remote locations where costs are prohibitive. Moreover, evaporation is rather difficult to measure directly (e.g. Brutsaert, 1982), so indirect approaches are used to estimate it from variables directly related to evaporation (Huntington, 2006; Dewi et al., 2009).
Many relationships that predict pan evaporation or potential evapotranspiration as a function of limited meteorological observations have been developed (Cahoon et al., 1991). Evaporation is an element of the hydrologic cycle that can generally be estimated by indirect methods such as the mass transfer, energy budget, and water budget methods. One of the direct methods of measuring evaporation is pan evaporation, which is used as an indicator of evapotranspiration. Many researchers have tried to estimate evaporation through indirect methods using climatic variables, but some of these methods require data that cannot be easily obtained (Rosenberry et al., 2007; Kim et al., 2008). Mohan and Arumugam (1996) investigated the relative importance of the different variables involved in evapotranspiration through a multivariate technique, namely factor analysis. Monthly data from eight meteorological stations located in two tropical states of India were used in the analysis. Factor analysis was applied to determine the relative effect of rainfall, temperature (maximum and minimum), wind speed, relative humidity, and sunshine duration on evapotranspiration. The principal components were obtained and a set of factors was derived using varimax factor rotation. The results of that study revealed that relative humidity, temperature, and wind speed are the variables with the most influence on the evapotranspiration process.
In a large country like Iran, with a mostly arid and semi-arid climate and high annual evapotranspiration, conducting such an investigation is necessary. Since potential evaporation is not easy to calculate, this article analyzes evaporation and the factors affecting it. Factor analysis, the method used in this study, is introduced first, and then the goal and the steps of the present study are briefly covered. Because simple univariate statistical methods are not able to represent these relations, multivariate statistical methods should be considered. Principal Component Analysis (PCA) and Factor Analysis (FA) are among the best known and most widely applied multivariate data analysis methods. These techniques are capable of describing the observed relationships between several variables in the form of a few relatively simple relations, as well as indicating the relative importance of the different factors affecting the phenomenon under study (Matalas and Reiher, 1967). In addition, the method is applied for making judgments on the basis of eigenvectors and empirical orthogonal functions (Rao, 1990). Past studies have shown that factor analysis can be used to investigate hydrologic phenomena. Iyengar (1991) and Gadgil and Iyengar (1980) used PCA to determine the pattern of temporal variations of precipitation within an area in India. Moreover, Gadgil and Joshi (1983) used monthly precipitation, temperature, and humidity index data to classify Indian climates with the help of PCA. In addition, researchers such as Bedi and Binderam (1980) and Goosen (1985) used this technique to analyze rainfall statistics. Molina et al. (2006) developed and validated a simulation model of the evaporation rate of a Class A evaporimeter pan.
Raziei and Azizi (2007) classified the precipitation regime of the western parts of Iran with the help of PCA and clustering methods. In that investigation, the 10 parameters used in the principal component analysis were reduced to 4 components and Varimax rotation was then applied. Next, using Ward's hierarchical clustering method on the standard scores of the acquired components, the stations were grouped and the western parts of Iran were divided into 5 homogeneous sub-regions.
This investigation studies the relations between evaporation and other climatic parameters within the framework of different factors. By reviewing these relations, the most important climatic parameters affecting evaporation in the central and southern parts of the country were determined. Then, through cluster analysis, the study area was divided into smaller homogeneous regions, and the most important climatic parameters affecting evaporation and the related regression equations were determined for each month in each homogeneous region. Figure 1 shows the steps followed in the present study. Climatic data from the synoptic weather stations of Bam, Bandar-Abass, Bushehr, Chabahar, Esfahan, Fasa, Jiroft, Kashan, Kerman, Shiraz, Tabas, Yazd, Zabol, and Zahedan were used. These stations are located on the main agricultural plains of central and southern Iran. The longest record among these stations spans 1953 to 2003 and the shortest spans 1966 to 2003. All of the monthly evaporation data and the other climatic parameters influencing evaporation that were available over the record of each station were used in the analysis.

Materials and Methods
As shown in Figure 1, the long-term records of the selected stations were assembled first. These data cover the climatic parameters (CPs) affecting evaporation (E): cloudiness; maximum, minimum and mean temperature; the difference between maximum and minimum temperature; mean relative humidity; maximum and minimum humidity; the difference between maximum and minimum humidity; and wind speed. Then, the most important factors affecting evaporation were identified using factor analysis techniques.
The first step in FA is to standardize the data. Hence, the data were first standardized in Minitab by dividing each parameter by its mean value. FA was then conducted on the CPs to determine the items affecting evaporation. In this step, Varimax rotation was used to identify the main factors affecting evaporation. This approach has been used in many studies, e.g., Mohan and Arumugam (1996) and Masoudian (2004). In addition to choosing Varimax rotation, the number of expected factors was set to 4. The FA thus defined 4 factors, in each of which one of the CPs plays the dominant role. After the parameters affecting evaporation had been defined on a monthly basis, homogeneous regions were identified from these parameters by cluster analysis (CA). Applying CA to the long-term data of the 14 meteorological stations investigated yielded three homogeneous regions (HRs) defined according to climatic parameters. With the data sorted into the 3 HRs, the most important factors affecting E were evaluated again through FA. Once the 3 leading factors in each region had been identified, the relation between evaporation (as the dependent variable) and the detected factors (as independent variables) was determined for each month through multiple regression. With the mathematical relation between evaporation and the climatic factors established, evaporation was calculated from the monthly regression equations. The monthly evaporation data for each station investigated were provided by the Iran Weather Organization.
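The factor-extraction step described above can be sketched in a few lines. This is only an illustrative sketch, not the Minitab procedure used in the study: it assumes principal-component extraction from the correlation matrix (equivalent to z-score standardization, which differs from the division by the mean reported above), it uses randomly generated placeholder data rather than station records, and the function names `pca_loadings` and `varimax` are hypothetical.

```python
import numpy as np

def pca_loadings(data, n_factors):
    """Principal-component loadings from the correlation matrix.

    Working from the correlation matrix is equivalent to z-score
    standardizing every column first (an assumption of this sketch).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest eigenvalues first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / eigvals.sum()  # share of total variance
    return loadings, explained

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ column_ss / p)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion / criterion < 1 + tol:
            break  # the criterion has stopped improving
        criterion = new_criterion
    return loadings @ rotation

# Placeholder data: 14 stations x 9 climatic parameters (random, not real records).
rng = np.random.default_rng(0)
data = rng.normal(size=(14, 9))

loadings, explained = pca_loadings(data, n_factors=4)
rotated = varimax(loadings)
print(f"variance explained by 4 factors: {explained:.1%}")
```

Because the rotation is orthogonal, it redistributes the loadings among the four factors without changing how much of each variable's variance the factors explain jointly.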

Results
Since the nine climatic parameters have different units, all data sets were standardized. This rescaling effectively gives equal weight to each site characteristic in determining the main variables and clusters. The 14×9 matrix of standardized selected variables was subjected to factor analysis. The first four factors, accounting for 85.8% of the total variance, were selected and subjected to Varimax normalized rotation. This method is widely accepted as the most appropriate type of orthogonal rotation for climate data. Loadings greater than 0.7 (Dinpashoh, 2004) were considered important. Factor scores for each of the 14 stations were calculated from the standardized variables and the associated factor loadings. Factor analysis was applied to the variables of the 14 stations for each month, and the variables were found to be summarized by 4 factors. As an example, the result of FA for April is shown in Table 1. In this table, the variable with the largest value (weight) in each factor is the main variable affecting evaporation in that factor. The four most influential variables on E account for 84.8 percent of the variation. Principal component extraction with Varimax rotation was used to obtain the factor loading matrix.
Here, Tmax, Tmin and Tmean are the maximum, minimum and mean monthly temperature, and Hmax and Hmin are the maximum and minimum monthly relative humidity, respectively. C represents cloudiness and WS is the mean monthly wind speed.
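To make the reading of a rotated loading table concrete, the following sketch flags loadings above the 0.7 threshold and picks the variable with the largest absolute weight in each factor. The loading values are hypothetical numbers invented for illustration and are not taken from Table 1; only the variable symbols and the 0.7 threshold come from the text.

```python
import numpy as np

variables = ["Tmax", "Tmin", "Tmean", "Tmax-Tmin", "Hmax-Hmin", "C", "WS"]
factors = ["F1", "F2", "F3", "F4"]

# Hypothetical rotated loadings (rows: variables, columns: factors).
loadings = np.array([
    [0.91, 0.10, 0.05, 0.12],    # Tmax
    [0.85, 0.20, -0.15, 0.08],   # Tmin
    [0.89, 0.15, -0.05, 0.10],   # Tmean
    [0.30, 0.05, 0.25, 0.88],    # Tmax-Tmin
    [0.12, 0.82, 0.20, 0.15],    # Hmax-Hmin
    [-0.10, 0.25, -0.79, 0.05],  # C
    [0.08, 0.74, 0.30, -0.20],   # WS
])

THRESHOLD = 0.7  # loadings above this magnitude are treated as important

# Variable with the largest absolute loading in each factor.
dominant = [variables[i] for i in np.abs(loadings).argmax(axis=0)]

# All variables whose absolute loading exceeds the threshold, per factor.
important = {
    f: [v for v, w in zip(variables, loadings[:, j]) if abs(w) > THRESHOLD]
    for j, f in enumerate(factors)
}

for f, d in zip(factors, dominant):
    print(f, "dominant:", d, "| important:", important[f])
```

The sign of a loading is ignored when ranking, since a strongly negative loading (as for cloudiness in this made-up example) marks a variable as just as influential as a strongly positive one.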
As shown in Table 1, temperature plays a prominent role in evaporation in all months of the year and is the most important parameter in the first factor. Temperature also acts through the difference between maximum and minimum temperature (Tmax - Tmin), and in some months maximum temperature (Tmax) has the strongest effect. Then, based on the climatic factors affecting evaporation, the stations were grouped into homogeneous regions and the results were expressed as a dendrogram, shown in Figure 2. In Ward's clustering approach, the number of homogeneous groups into which the stations fall depends on the chosen similarity value. Three homogeneous regions can be defined at a similarity value of about 25. Choosing a larger similarity value places fewer stations in each homogeneous region.
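The grouping of stations by Ward's method can be illustrated with SciPy. This is a sketch under stated assumptions rather than a reproduction of the Minitab analysis: the factor scores are synthetic points drawn around three made-up centers, and the tree is cut by requesting three clusters, because SciPy reports merge distances rather than the similarity level shown on the dendrogram in Figure 2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic factor scores for 14 "stations" scattered around three
# hypothetical centers (5 + 5 + 4 stations); these are not real data.
rng = np.random.default_rng(1)
centers = np.array([[-3.0, 0.0], [0.0, 3.0], [3.0, 0.0]])
sizes = [5, 5, 4]
scores = np.vstack(
    [c + 0.3 * rng.normal(size=(n, 2)) for c, n in zip(centers, sizes)]
)

# Ward's minimum-variance linkage on Euclidean distances.
tree = linkage(scores, method="ward")

# Cut the tree so that at most three clusters remain.
labels = fcluster(tree, t=3, criterion="maxclust")
print(labels)
```

Cutting the tree lower (more clusters) corresponds to demanding a higher similarity within each group, which leaves fewer stations per region, as noted above.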
Next, the most important parameters affecting evaporation in every month were determined through factor analysis, and regression equations for calculating evaporation were then built on these determinant climatic factors (Tables 2 to 4). Monthly evaporation at all stations was calculated from these equations. The results showed that neither the three main factors affecting E nor their order of priority are the same across the months of the year in each homogeneous region. Tables 5 to 7 show the factor analysis results that determine the three main climatic parameters affecting evaporation in the 3 HRs. The correlation between evaporation, as the dependent variable, and the first (F1) and second (F2) most influential independent variables, $R^2_{II}$, and the correlation with the first, second and third (F3) most influential variables, $R^2_{III}$, were calculated for each HR (Tables 5 to 7). The high correlation between E and the three most influential variables indicates that the derived equations can be applied to sites with few or no recorded evaporation data located in the three identified HRs.
Fig. 3 shows that the difference between maximum and minimum temperatures (Tmax - Tmin), Tmax, and the difference between maximum and minimum relative humidity (Hmax - Hmin) are, in that order, the main factors influencing evaporation in HR1. This homogeneous region is spread along Iran's water borders and has a warm and humid climate. Generally, the temperature related parameters (TPR), including Tmax - Tmin, Tmax, Tmin and Tmean, are found to be the main factors affecting evaporation in HR1. In HR2 and HR3, wind speed, Tmax and Hmax - Hmin are identified as the main parameters and cloudiness as the second one (Figs. 4 and 5). These two regions are spread over central and southeastern Iran, most of which has a desert and arid climate. There, wind speed and cloudiness play a greater role in the amount of evaporation than the temperature related factors.
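The comparison of a two-variable and a three-variable regression, summarized by the two coefficients of determination, can be sketched as follows. The data are synthetic, the coefficients used to generate them are arbitrary, and the helper `fit_r2` is a hypothetical name; the sketch only shows how the two fits and their R² values relate, not the equations of Tables 2 to 4.

```python
import numpy as np

def fit_r2(predictors, response):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    residual = response - design @ coef
    centered = response - response.mean()
    r2 = 1.0 - (residual @ residual) / (centered @ centered)
    return coef, r2

# Synthetic example: 40 "monthly" values of three standardized variables
# F1, F2, F3 and an evaporation series generated from them plus noise.
rng = np.random.default_rng(2)
n = 40
f1, f2, f3 = rng.normal(size=(3, n))
evap = 200 + 30 * f1 + 15 * f2 + 8 * f3 + rng.normal(scale=5, size=n)

coef2, r2_two = fit_r2(np.column_stack([f1, f2]), evap)        # F1, F2
coef3, r2_three = fit_r2(np.column_stack([f1, f2, f3]), evap)  # F1, F2, F3

print(f"R2 with F1, F2:     {r2_two:.3f}")
print(f"R2 with F1, F2, F3: {r2_three:.3f}")
```

Adding a third predictor can never lower R² on the fitting data, so the gain from the two-variable to the three-variable fit is best read as a measure of how much the third variable contributes rather than as proof that the larger model predicts better.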
Evaporation was highest in homogeneous region 3, followed by regions 2 and 1. This is related not only to the parameters discussed above but possibly also to the topographical conditions and altitude of each region.
Proximity to the sea is the main reason why homogeneous region 1 has lower potential evaporation than the others. Among the factors involved, higher relative humidity is an important cause of the lower evaporation in this region. As shown in Figure 3, wind speed in region 1 cannot increase evaporation as much as it does in the two dry regions. The results of the factor analysis also point to this. In homogeneous region 3, in addition to the positive and prominent role of the climatic parameters, the topography of the region, which is low-lying with no highlands, may be another factor contributing to its high evaporation. Table 8 shows the RMSE and MAE statistics comparing observed evaporation with the evaporation predicted by the regression equations for each month in each homogeneous region. As the table shows, the error statistics are somewhat larger in some months than in others. This stems from the climatic parameters used to predict evaporation. In these months, the parameters selected for the regression model do not appear to be sufficient on their own to predict the actual amount of evaporation, and more important parameters should therefore be sought.
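The two error statistics reported in Table 8 follow their usual definitions, the mean absolute error and the root mean square error between observed and predicted values. The sketch below computes both for a short made-up series; the numbers are illustrative only and are not taken from Table 8.

```python
import math

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

# Illustrative monthly evaporation values in mm (made up for this example).
observed = [210.0, 250.0, 300.0, 280.0]
predicted = [200.0, 260.0, 290.0, 300.0]

print("MAE :", mae(observed, predicted))
print("RMSE:", round(rmse(observed, predicted), 2))
```

RMSE weights large errors more heavily than MAE, so a month in which RMSE is well above MAE is one where a few large misses dominate the error.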

Conclusions
The statistical results of the present investigation show that, firstly, the loadings of the climatic parameters on all factors are high and appropriate, and secondly, the set of items determined through factor analysis is able to explain a high percentage of the variance in the data. In addition, the regression correlation coefficient between the selected factors and the amount of evaporation in each month is very high. All of this evidence indicates the prominent role of factor analysis in investigating the relations between evaporation and climatic parameters. The study area in central and southern Iran covers the main agricultural plains, and the selected weather stations are located on these plains. The results of this research are suitable for water resources and agricultural management on the plains of the study area. There are some isolated mountainous areas within the study area, and these clearly do not belong to the three homogeneous regions. The present investigation found that, in each month, one of the climatic parameters has a more prominent role in the amount of evaporation, and among them the roles of the temperature related parameters (Tmax - Tmin, Tmax, and Tmean), wind speed and cloudiness are more pronounced than the others. Knowledge of the most important factors affecting monthly evaporation in each region, especially in arid and semi-arid regions, can be used for better management of water resources and agriculture.