Rainfall Modelling using Generalized Extreme Value Distribution with Cyclic Covariate

Increased ﬂood risk is recognized as one of the most signiﬁcant threats in most parts of the world, resulting in severe ﬂooding events which have caused signiﬁcant property and human life losses. As there is an increase in the number of extreme ﬂash ﬂood events observed in Klang Valley, Malaysia recently, this paper focuses on modelling extreme daily rainfall within 30 years from year 1975 toyear 2005 in Klang Valley using generalized extreme value (GEV) distribution. Cyclic covariate is introduced in the distribution because of the seasonal rainfall variation in the series. One stationary (GEV) and three nonstationary models (NSGEV1, NSGEV2, and NSGEV3) are constructed to assess the impact of cyclic covariates on the extreme daily rainfall events. The better GEV model is selected using Akaike’s information criterion (AIC), bayesian information criterion (BIC) and likelihood ratio test (LRT). The return level is then computed using the selected ﬁtted GEV model. Results indicate that the NSGEV3 model with cyclic covariate trend presented in location and scale parameters provides better ﬁts the extreme rainfall data. The results showed the capability of the nonstationary GEV with cyclic covariates in capturing the extreme rainfall events. The ﬁndings would be useful for engineering design and ﬂood risk management purposes.


Introduction
Extreme value analysis of hydrological data allows interpreting historical data and making inference on future probabilities of occurrence of extreme events, such as floods due to extreme rainfalls. Extreme values are often represented by the maximum value of a given variable over a time period such a year [1]. As climate change became an increasingly important issue over the past two decades, technical terms like "stationarity" and "nonstationarity" also became more obvious. Nonstationarity can simply be defined as processes that are are constant in time and that have statistical properties that are deterministic functions of time. Statistical inference for hydrological time series such as extreme rainfall events normally relied on the assumption of stationarity [2]. Nevertheless, under the combined influences of climate change and variability or human intervention [3], the series often exhibits nonstationary features and will not likely satisfy stationary assumptions [4,5]. Nonstationarity may affect both the severity and frequency of these extreme events [6]; therefore, the reality of nonstationary hydrometeorological extremes needs to be properly addressed. Accurate distribution models are required to accurately capture the nonstationary climate. Nonstationary extreme value models with climatic covariates could be the valuable tools to assess future changes in extreme rainfall distribution and quantiles for engineering design and flood risk management purposes [7].
One of the well-known distributions that were used in hydrological analysis is known as generalized extreme value (GEV) distribution. GEV distribution is one of the most commonly used for the analysis of extreme rainfall events [8]. Estimation of return levels is usually based on three extreme value distributions: Gumbel, Fréchet and Weibull which were suggested by [9] for stochastic behaviour of large samples. Quite a number of studies have been performed in Malaysia for determining the most suitable probability distribution of extreme rainfall events. Gamma, Generalized Normal, Generalized Pareto, GEV, Gumbel, Log Pearson Type III, Pearson Type III and Wakeby have been compared to determine the most accurate and appropriate distribution for the maximum rainfall estimates. It has been reported that GEV shows better descriptive and predictive abilities for the annual extreme rainfall as compared to other distributions [10]. From the analysis, it was found that 94% of rainfall stations in Peninsular Malaysia was best fitted by GEV. The findings are consistent with Nguyen [11] where several distributions namely Beta-K, Beta-P, GEV, Generalized Logistic distribution, Generalized Normal distribution, Generalized Pareto, Gumbel, Log Pearson Type III, Pearson Type III and Wakeby have been compared. It was found that GEV, Generalized Normal distribution and Pearson Type III were the best distribution for the annual maximum rainfall series. However, GEV was preferred as it gives a better description of annual maxima rainfall series where it has more solid theoretical basis which makes it preferable in determining the daily annual maximum rainfall. GEV was also capable in deriving the Intensity-Duration-Frequency (IDF) curve for rainfall series in [12] over other distributions. Recently, extreme precipitation data were fitted with four distributions namely, Generalized Pareto, Gumbel, GEV, and Exponential in Peninsular Malaysia. Results have shown that GEV distribution was found to best fit the time series at most of the stations in Peninsular Malaysia based on Kolmogorov Smirnov test [13].
It is very common to allow trends in the GEV parameters to consider nonstationarity in the frequency analysis of hydrological variables [14] as constant parameters may no longer be valid under nonstationary conditions. Several studies have developed nonstationary GEV models in hydrological area. Adlouni et al. [1] developed a nonstationary GEV with the linear and quadratic dependence of the location parameter on covariates, and the linear dependence in both location and scale parameters. Maraun et al. [15] proposed a study that investigated the annual cycle of extreme events across the United Kingdom by developing a statistical model and fitting the nonstationary GEV to data from 689 rain gauges. GEV distribution was used to fit to the time series of monthly maxima, across all months of the year simultaneously. The annual cycles of the location and scale parameters are approximated by harmonic functions, while keeping the shape parameter constant throughout the year. The study revealed that the approximation of the monthly maxima distribution by the GEV and the annual cycle by a sine wave appeared to be reasonable. The sinusoidal model proved to be a good compromise between a bias due to a stationarity assumption and the uncertainty owing to too many parameters.
Agilan and Umamahesh [16] observed that urbanization and local temperature changes were the best covariate factors for short duration rainfall in India. However, the covariate time was not concluded as the best covariates. Golroudbary et al. [17] found that the nonstationary GEV model gives a better estimate by considering the impact of the North Atlantic Oscillation and the annual seasonal cycle on the parameters. Meanwhile, this study uses a harmonic function model for all monthly maxima during the year with seasonal variations instead of individual models for every month. Sharma and Mujamdar [18] concluded that local mean temperature was found to be significant covariate of nonstationary GEV for extreme rainfall of the studied region. A nonstationary GEV model with cyclic covariate structure is proposed in this work. All extreme events were calculated assuming that maximum annual daily precipitations follow the GEV distribution. The parameters are estimated by using maximum likelihood estimation (MLE). To account for nonstationarity, the parameters of the extreme value distribution are modified with a set of predictors or covariates [19,20]. Accordingly, the sine and cosine functions are developed by considering the impact of cyclic covariate on the location and scale parameters. Therefore, the objectives of the study are to fit the stationary (GEV) and nonstationary (NSGEV) model with cyclic covariate on extreme daily rainfall series using the MLE, to identify the best fitted GEV model for extreme daily rainfall series using the Akaike's information criterion (AIC), bayesian information criterion (BIC) and likelihood ratio test (LRT) and to estimate the return level of extreme rainfall events using the best fitted GEV model.

Materials and Methods
Klang Valley covers the area of Kuala Lumpur and all the adjoining cities and towns in the state of Selangor. Over the years, the cities of Klang Valley have seen significant development and growth, leading to a good economy and a strong real estate market. The Klang Valley river basin is located on the west coast of Peninsular Malaysia and surrounds the federal territory of Kuala Lumpur and some parts of the state of Selangor. Since the Klang Valley river basin is surrounded by a heavily populated area with more than 4 million people, therefore it is considerably polluted. Huge development has narrowed certain stretches of the river and eventually resembles a large storm drain in some places. Hence, this contributes to the occurrences of flash floods in Kuala Lumpur, especially after heavy rain [21]. JPS Ampang rainfall station is selected in this study to represent Klang Valley. This is because, every year, higher rainfall amount is being recorded at this station.

Data
Daily rainfall data between the years 1975 and 2005 were sourced from the Drainage and Irrigation Department. JPS Ampang station as in Fig. 1 will be selected to represent daily rainfall in Klang Valley. Daily rainfall data consists of missing data out of 39 of 11324 total daily rainfalls. A missing rate of 5% or less in the datasets is inconsequential [22].

Stationary test
In order to fulfil the stationarity assumption of the GEV distributions, the Mann-Kendall (MK) test is performed on the data to check for trends over different selection periods. The MK trend test is a widely used non-parametric test for evaluating the presence of monotonic trends in time series data [23]. The MK test is a widely used technique for detecting monotonic trends in hydrological and meteorological time series [28]. The MK test which does not require normally distributed data and is well suited for analysing datasets that have 764 Rainfall Modelling using Generalized Extreme Value Distribution with Cyclic Covariate missing or tied data [25]. The annual maximum daily rainfall series are concerned. The variable m, with t = 1, . . . , T , to denote these annual maxima, and with T being the total number of years of data in the record. The differences in the data values between different time steps as with t 2 > t 1 , then the test statistic becomes ]. The test statistic N represents the number of m t2 is greater than m t1 , minus the number of times m t1 is greater than m t2 , for all possible combinations of m t2 and m t1 with t 2 > t 1 . A positive value of N implies that the time series increased more frequently than it decreased (and vice versa for a negative value of N ), and the value of N is bounded by ±T (T − 1)/2. Kendall's τ is a normalized version of this statistic, which is obtained by dividing by this upper bound So that τ is bounded by [−1, 1]. Assuming the data are serially independent, the null hypothesis can be approximated by a normal distribution, with more details in [23]. Using a 5% two-sided significance level, H 0 is rejected when the test statistics is less than the critical value which is 0.05. The Mann-Kendall analysis is conducted using the R package.
H 0 : There exists no trend in the extreme rainfall series H 1 : There exists trend in the extreme rainfall series

Stationary generalized extreme value distribution (GEV)
The GEV distribution is a family of continuous probability distributions developed within extreme value theory (EVT) which comprises of Gumbel, Fréchet and Weibull families. The GEV distribution is parameterized with a shape parameter (ξ), location parameter (µ) and scale parameter (σ). Each type corresponds to the limiting distribution of block maxima from a different class of underlying distributions. If the tails of distribution decrease exponentially, it leads to the type I (Gumbel) ; if the tails of distribution decrease as a polynomial, it leads to type II (Fréchet) and if the distribution has finite tails, it leads to the type III (Weibull). GEV combines three distributions into a single form, allowing a continuous range of possible shapes that include all three of the distributions. One of the distributions is used to model a particular dataset of block maxima. Based on the EVT, the GEV distribution is the limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. Data are blocked into sequences of observations of length n, for some large value of n, generating a series of block maxima, M n,1 , ..., M n,m , to which the GEV distribution can be fitted. Often the blocks are chosen corresponding to a time period of length one year, in which case n is the number of observations in a year and the block maxima are annual maxima. In this case, x is defined as the extreme rainfall (mm/day) in monthly maxima. Like the extreme value distribution, the GEV is often used to model the smallest or largest value among a large set of independent, identically distributed random values representing measurements or observations. The cumulative distribution function of the GEV distribution is The parameters of the GEV distribution are estimated by the method of maximum likelihood. A standard approach to determining the better fitting model is the maximum likelihood ratio test. Let the values X = x 1 , x 2 , ..., x n be the n years of annual maximum series. The log likelihood is given as The maximum likelihood estimate with respect to the entire GEV family can be estimated through maximizing (5) and (6) with respect to the parameter vector (µ, σ, ξ) [19].

Nonstationary generalized extreme value distribution (NSGEV)
Nonstationary processes have characteristics that change systematically through time. Nonstationarity is often apparent because of seasonal effects, perhaps due to different climate patterns in different months, or in the form of trends, more probably due to long-term climate changes. In this study, nonstationary GEV (NSGEV) distribution models will be developed to investigate whether GEV parameters exhibit any dependency with representative indices of cyclic covariate factors that have significant effects on daily rainfall in Klang Valley. Nonstationarity is introduced by revealing more than one parameters of the GEV distribution as function of cyclic covariate indices. The cumulative distribution function of the NSGEV distribution is, Consequently, the constant GEV parameters µ (or σ or ξ) are replaced by the new parameters, µ n and µ 1 (or the corresponding parameters for σ and ξ) [15]. On the other hand, it is quite difficult to estimate the shape parameter of the extreme value distribution with precision when it is time dependent, thereby; it is not realistic to attempt to estimate the scale parameter as a smooth function of time [19]. To obtain a model which is linear in the parameters,µ and σ are modeled as a combination of sine and a cosine with a 1-year period, evaluated at the centre-day of each month, c i .
For the location parameter, where,µ 1 denotes the amplitude of the sinusoidal component (estimated free parameter) φ denotes the phase ω = (2π)

352.25
which denotes the angular frequency c i denotes t. However, it is not linear in the parameters due to phase π. Hence, 8 is reformulated to obtain an equivalent linear model as In this study, the location and scale parameters with the seasonal variation described by sinsoidal functions and the shape parameter is assumed to be a constant in 10 and 11.
where, the terms above are the same as described in 8, 12 means 12 months in a year and ξ is treated as constant.

Selection of models
The significance of the models will be tested using AIC [26] and BIC [27] and LRT. Both the AIC and BIC are used to select the best nonstationary fitting models. The best fitted GEV model is selected following these three steps: (1) the model with the lowest AIC value is identified; then, (2) the model with the lowest BIC value, (3) the greatest deviance statistics value which indicates that NSGEV model explains more variability of data than GEV model. But how large is large for deviance statistics value should be determined from asymptotic distribution of deviance function. These steps imply that if one of the NSGEV model is the best fitted model, then the cyclic covariate trend is said to be significantly influenced the extreme daily rainfall at the 5% significance level based on LRT [28].
The AIC is The BIC is where,θ denotes the vector of model parameters, L(θ) denotes the likelihood of the candidate model given the data when evaluated at the maximum likelihood estimate of θ k denotes the number of estimated parameters in the candidate model n denotes the number of observations. Statistical significance of nonstationary parameters in the model M 1 against GEV model is commonly evaluated with LRT. If NSGEV models outperform the stationary GEV models; then the NSGEV models should be used instead of the GEV models for frequency analysis of extreme daily rainfall. Let M 0 and M 1 be stationary and nonstationary models respectively. LRT is employed to compare the superiority of 766 Rainfall Modelling using Generalized Extreme Value Distribution with Cyclic Covariate M 1 over M 0 using the log-likelihood difference (D) [1,19]. where

Return level
Once the best fitted GEV model for the data has been determined, the return level of extreme daily rainfall is derived. The probability P of the occurrence of the extreme events is defined as the chance of the event occurring at least once on average in T years; hence, P = 1 T [17]. More precisely, it is the level exceeded y the monthly maxima in any month with probability, p. Estimation of the return level, z p is obtained from the stationary models by inverting the cumulative function of GEV distribution.
For NSGEV model, where, µ(t) and σ(t) are described in (7) and (8) 20-year, 50-year and 100-year return levels of extreme rainfall for the year 1975 and 2005 are predicted using (12) and (13) derived from the corresponding GEV distribution.

Results
A summary of daily rainfall amount for JPS Ampang station is summarized in Table 1 where the mean, standard deviation, median, first quartile, third quartile, minimum value and maximum value are given. Total daily rainfall in January showed the least standard deviation whereas the highest standard deviation is recorded in April which indicates the largest variation in their daily rainfall amount series. This could possibly be affected by the extreme values (maximum value). Based on Fig.  2, the station recorded the highest amount rainfall in April and November due to heavier rainfall that occurred and least rainfall in January. Due to the northeast monsoon and southwest monsoon season, there are some infrequent outliers lies in the daily rainfall data series. These outliers are treated as maximum values where the GEV will be applied to fit these values.

Discussion
The presence of trend in the extreme rainfall series is tested using the MK trend test in which the null hypothesis indicates no trend. The daily rainfall is blocked into monthly as in Table  2 suggests that the annual maximum rainfall shows that there is no trend while the monthly maximum rainfall shows the existence of a trend as time increases. The seasonal monthly trend is more pronounce than annual trend as Malaysia's surface climate is influenced by two monsoon seasons namely northeast (November until February) and southwest (May until August) monsoon seasons. In between these two monsoon seasons are the inter-monsoon seasons occurring in March -April and September -October, which brings convective rainfall. The annual average rainfall is 2,420 mm for Peninsular Malaysia, 2,630 mm for Sabah and 3,830 mm for Sarawak. Malaysia has consistently hot and humid weather throughout the year as it is situated near the equator. This explains the significance of the test statistics of MK where the monthly trend is more significant than annual trend.

GEV and NSGEV models
The total daily rainfall amount per month for JPS Ampang station during the period January-December for all available year is considered. The time blocks of fixed lengths (months) is considered and the maximum value is taken for each block. There are four GEV or NSGEV distribution models presented in this study: GEV0 is a stationary model where the parameters are treated as constant; NSGEV1 is a nonstationary model where the location parameter is treated as time dependent parameter; NSGEV2 is a nonstationary model where the scale parameter is assumed as time dependent parameter; NSGEV3 is a nonstationary model where both scale and location parameters are treated as time dependent parameters. The covariate functions in each model are presented in Table 3.

Parameter estimations and model selection
GEV distribution is parameterized with a shape parameter ξ, location parameter µ and scale parameter σ. Nonstationarity was introduced by revealing more than one parameters of the GEV distribution as function of cyclic covariate. µ and σ are described by sine and cosine functions while ξ is assumed to be a constant. In Table 4, the nonstationary models used in this study incorporated unfixed location and scale parameters and the shape parameter is fixed. The "ismev" package in    Table 3. Developed GEV and NSGEV moels for monthly maxima rainfall data.
Model ID Parameters of model Remarks GEV0 µ (constant), σ (constant), ξ (constant) Stationary model NSGEV1 σ (constant), ξ (constant), Nonstationary model with µ(t) = µ 0 + µ 1 sin(2πt/12) + µ 2 cos(2πt/12) time dependent location parameter NSGEV2 µ (constant), ξ (constant), Nonstationary model with σ(t) = σ 0 + σ 1 sin(2πt/12) + σ 2 cos(2πt/12) time dependent scale parameter NSGEV3 ξ (constant), σ(t) = σ 0 + σ 1 sin(2πt/12) + σ 2 cos(2πt/12) Nonstationary model with σ(t) = σ 0 + σ 1 sin(2πt/12) + σ 2 cos(2πt/12) time dependent location and scale parameter hood under the different models is listed to enable the LRT test statistics to be computed. The result of model analysis with monthly maxima rainfall series of Klang Valley is given in Table 5. NSGEV3 is found to be the best NSGEV model for monthly maximum rainfall in which the trend is introduced in location and scale parameter with cyclic covariate. The AIC and BIC values indicate that NSGEV3 is the best or adequate model among the four models. Furthermore, the likelihood ratio test is used to evaluate between the choices of model which is D(θ) = 2[l(θ n ) − l(θ)] whereby l is the log likelihood function, based on Chi-Square distribution with critical value of a degree of freedom of 1 is 3.84 at the 0.05 significance level. The deviation value between NSGEV3 and GEV0 is 4.416 which is greater than the critical value (3.84). This indicates that model NSGEV3 is the most superior to the stationary model. Figure 4 show the model diagnostics from fitting the extreme monthly rainfall (mm/day) for one GEV model and three NS-GEV models. The probability and quantile plots in Fig. 4(A) shows points are close to the linear line which means the fitted model is valid. The negative estimates of the ξ cause the return level curve to asymptote to a finite level. Thus, the estimated curve for the return level plot is approximately linear since the values of ξ is close to zero. For NSGEV1, NSGEV2, NSGEV3, only the residual probability and quantile plots are displayed with the quantile plot on the Gumbel scale in Fig. 4(B), 4(C) and 4(D). Adequacy of best selected model is checked with two graphical diagnostic plots. Inspection on NSGEV 1 and NSGEV2 resulted in similar results. These plots reveal that all models are well-fitted.

Return level estimate
Return values contain two quantities; return period 1/p and return level (recurrence level) z p . Finally, the best fitting GEV model is used to estimate the 20-year, 50-year and 100-year return levels of extreme rainfalls at JPS Ampang Station between the years 1975 and 2005. The highest daily rainfall for the 30year observation period is 135mm/day. Return levels are used to predict the probability that a daily maximum rainfall exceeding 135mm/day will occur in a longer period. The estimation of the T -year return levels for T = 20, 50 and 100 return periods are estimated as shown in Table 6. Equation (16) is used to estimate the return levels for return period = 20, 50 and 100 years as confidence of shape parameter ξ is not including 0. The return levels for monthly selection periods with their 95% confidence intervals which are obtained by profile likelihood are displayed in brackets in Table 6. From this table, it can be observed that return level estimates increase as the return periods increase. The results also prove return level estimation of the best fitted GEV model indicates the rainfall does not exceed extreme rainfall value for in the next 20, 50 and 100 years in terms of monthly extreme rainfall which is 135mm/day of the observation rainfall data.

Conclusions
The purpose of this study is to provide statistical knowledge for the dependence of nonstationary effects of trend to be conveniently accounted for risk management. GEV distribution is used to model extreme rainfall using data obtained from JPS Ampang station for the period from 1975 to 2005. Due to the presence of the seasonal monsoon season in Malaysia, cyclic covariate trend is applied in the GEV distribution in this study. In this study, NSGEV distribution model is presented. The focus is on nonstationarity in the location and scale parameters of the GEV distribution in order to counter the seasonality factor in rainfall series. In the case of extreme value variables, the NSGEV model is better in capturing the mean ad variance than GEV0. In particular, the models presented correspond to the stationary GEV model, the case in which there is a cyclic covariate trend in the location parameter, the case where the scale parameter is a cyclic function of the covariate and the case with cyclic covariate trend in both location and scale parameters. Parameter estimation for these models is generally carried out using the MLE method. The Block Maxima with GEV is applied on 30-year daily rainfall data in Klang Valley during the year of 1975 until the year of 2005. The data stationarity is also a concerning issue for this study, thus MK test is used to assess the stationarity of the data. For this study, the data is blocked into monthly and annually then the MK test is applied on the data. Results have shown that there is an existence of significance trend in the monthly extreme rainfall whereas there is an absence of significance trend in the annual extreme rainfall series. Overall, by considering the model selection criteria, NSGEV3 produces the smallest AIC value and BIC value. The deviation value between NSGEV3 and GEV0 is greater than the critical value therefore null hypothesis is rejected at 95% significance level which indicates that model NSGEV3 is the most superior to the GEV0. It can be thus concluded that NSGEV3 better fits the extreme rainfall data in JPS Ampang station. Based on the best fitted GEV model, the return levels are then computed in the period of 20 years, 50 years    and 100 years later. The return level estimation of best fitted GEV model indicates the rainfall does not exceed extreme rainfall value for T = 20, 50 and 100 in terms of monthly extreme rainfall which is 135mm/day of the observation rainfall data.
This study has proved the capability of EVT serves as a useful analysis tool in describing the extreme events. Hence, nonstationary extreme events models with cyclic covariates could become the valuable tools to assess the future changes in extreme rainfall distribution and quantiles for engineering design and flood risk management purposes. The limitation is that other covariate trend apart from cyclic trend should be considered for this research purpose such as linear trend or quadratic trend therefore it is easier to evaluate the performance of GEV model distribution in extreme rainfall. For recommendation, this paper suggests the use of GEV approach in modelling rainfall in Malaysia for future study as it is a widely used distribution tools for extreme events which is commonly found in the past literatures. This study also recommends that the most recent rainfall data of extreme rainfall should be accounted to carry out the future research instead of taking the past 14 years for research purpose. On the other hand, to design the infrastructure in a changing climate, it is vital to develop a nonstationary model to fit the extreme daily rainfall by modelling present trend in the observed extreme daily rainfall. There is an urgent need for the public authorities in the country to improve monitoring, modelling, and forecasting techniques in extreme daily rainfall in order to develop more effective flood protection measures and management. Findings of this study are expected to make significant contribution to the study area by increasing the understanding of nonstationarity of extreme rainfall in Klang Valley, Malaysia.