Parameter Estimations of the Generalized Extreme Value Distributions for Small Sample Size

The standard maximum likelihood method performs poorly when estimating the parameters of the Generalized Extreme Value (GEV) distribution from small samples. This study explores GEV parameter estimation using several methods, focusing on small samples of extreme events. We conducted a simulation study to compare the performance of maximum likelihood estimation (MLE), probability weighted moments (PWM) and penalized maximum likelihood estimation (PMLE) in estimating the GEV parameters. Based on the simulation results, we then applied the superior method to model the annual maximum stream flow in Sabah. The simulation results show that PMLE gives better estimates than MLE and PWM, as it has small bias and root mean square error (RMSE). As an application, we then compute the estimated return levels of river flow in Sabah.


Introduction
Extreme Value Theory (EVT) is a branch of statistics that concentrates on events that are more extreme than those normally observed. EVT is commonly used to assess risk from catastrophic events: if the risk of an event is ignored simply because it occurs rarely, the event can cause huge losses when it does occur. EVT can therefore be used at a specific location to estimate the frequency and cost of such events over a period of time. EVT has been widely used in various fields such as geophysics, insurance, risk management and hydrology [19]. There are two approaches to analyzing extreme values: block maxima (BM) and peaks over threshold (POT). In BM, the observation period is divided into blocks of equal length and the maximum of each block is selected. This approach is usually paired with the generalized extreme value (GEV) distribution. POT, in contrast, selects every value that exceeds a certain threshold, and this approach leads to the generalized Pareto distribution (GPD) [1].
The GEV distribution was introduced by Jenkinson [3] and has been used in many research areas such as civil engineering design [4], hydrology [2], air quality estimation [15] and finance [14]. The GEV distribution has three parameters: shape (ξ), scale (σ) and location (µ). These parameters can be estimated using several statistical methods such as the Maximum Likelihood Estimator (MLE), Probability Weighted Moments (PWM), the Penalized Maximum Likelihood Estimator (PMLE) and L-moments. The aim of this study is to model the annual maximum stream flow using the GEV distribution, focusing on small samples. We apply several methods to estimate the GEV parameters.
Each method of parameter estimation has its advantages and disadvantages. An ideal estimator can be characterized in terms of unbiasedness, efficiency and consistency. An estimator is unbiased when the estimated parameter is, on average, close to the true parameter. The method with the smallest root mean square error (RMSE) is regarded as the most efficient estimator. In addition, the estimator must be consistent, meaning that the estimate converges as the sample size increases [12].
MLE is the method most often used to estimate the GEV parameters because it has good asymptotic properties such as consistency and efficiency. MLE is also easy to adapt to changes in the model [12]. In addition, MLE can be used in complex models involving non-stationarity, temporal dependence and covariate effects. However, MLE is reliable only for large samples, and the results become uncertain if the sample contains fewer than 50 values [7]. This is confirmed by Hosking et al. (1985), who showed that MLE performs poorly for small samples. Because MLE does not perform well for small samples, Coles & Dixon [18] investigated how to improve it and proposed an alternative method called PMLE. Their study stated that PMLE not only maintains the model flexibility and large-sample optimality of MLE but also improves its small-sample properties. It may be concluded that PMLE gives a smoother estimate with better accuracy than the direct estimate without penalties [20].
PWM is advantageous for small data sets because it has smaller uncertainty than ordinary moments [19] and lower variance than other methods [12]. However, when the shape parameter is large, PWM performs poorly [18], and its estimates of upper quantiles are biased. Nevertheless, PWM is still preferable to MLE for small samples. On the other hand, MLE is more flexible than PWM because covariates can easily be added to the parameterization [11].
PWM is equivalent to the L-moment method, and it performs better than MLE in terms of bias and RMSE [17,10]. L-moments are summary statistics that provide measures of location, kurtosis, skewness and other aspects of the shape of a probability distribution or data sample. Although the L-moment method produces biased estimates, it is still preferable because it has a smaller variance than MLE [13], which produces very large variance and estimation error [9]. However, L-moment estimation can only be used for stationary processes [8]. L-moments and MLE can therefore be "mixed" to produce a better GEV parameter estimator; this combination helps to reduce variance and bias [13]. In this study, we illustrate GEV parameter estimation using a simulation study. The superior method is then applied to model the annual maximum stream flow in Sabah.

Methodology
Previous studies have shown that PMLE is superior to other methods. In this study, we illustrate GEV parameter estimation using three methods: MLE, PWM and PMLE. We conducted a simulation study to compare the methods using our own code written in R. Based on the results, we then apply the superior method to model the annual maximum river flow in Sabah.

GEV Distribution
The GEV distribution is a family of distributions comprising three types: Gumbel, Fréchet and Weibull. These distributions can fit extreme data sets with high accuracy. Choosing only one member of the GEV family in advance may introduce bias, and the uncertainty in the choice of distribution type is then ignored [12].
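The GEV distribution arises as the non-degenerate limiting distribution function $G$ that fulfills
$$\Pr\left\{\frac{M_n - b_n}{a_n} \le x\right\} \to G(x) \quad \text{as } n \to \infty, \tag{1}$$
where $M_n$ is the maximum of $n$ observations and $a_n$ and $b_n$ are constants with $a_n > 0$. The cumulative distribution function (CDF) of the GEV distribution is given by [13]:
$$G(x) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \qquad 1 + \xi\left(\frac{x - \mu}{\sigma}\right) > 0, \tag{2}$$
where ξ, σ and µ are the shape, scale and location parameters. By equation (2), the GEV distribution is of Fréchet type for ξ>0 and of Weibull type for ξ<0, while the Gumbel type corresponds to ξ=0, taken as the limit ξ→0.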
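As an illustration only, the CDF in equation (2) can be sketched in a few lines of Python (the study's own R code is not reproduced here). The function name `gev_cdf` and the numerical tolerance used to switch to the Gumbel limit are assumptions made for this sketch.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV CDF with location mu, scale sigma > 0 and shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel type, taken as the limit xi -> 0
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

At x = µ the value is exp(-1) for every shape, which gives a quick sanity check on any implementation.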

Maximum Likelihood Estimation (MLE)
Generally, MLE is the most popular estimation method in EVT because it has good asymptotic properties such as consistency, efficiency and normality. MLE can be applied to complex modeling situations such as temporal dependence, non-stationarity and covariate effects [12]. For observations $x_1, \ldots, x_n$, the likelihood function can be written as
$$L(\mu, \sigma, \xi) = \prod_{i=1}^{n} g(x_i; \mu, \sigma, \xi), \tag{4}$$
where $g$ is the probability density function of the GEV distribution. As the sample size tends to infinity, the MLE is a consistent estimator and its variance goes to zero. Asymptotic theory also implies that the MLE is approximately normally distributed as the sample size increases. MLE was chosen because of its stable performance for large samples (n>50) [5]. The GEV parameter estimates are obtained by maximizing the log-likelihood function with respect to the parameters.
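To make the maximization step concrete, the following Python sketch evaluates the negative log-likelihood of the GEV distribution, assuming the usual density obtained by differentiating the GEV CDF. It is an illustration, not the R code used in the study; the name `gev_neg_log_likelihood` is hypothetical.

```python
import math


def gev_neg_log_likelihood(params, data):
    """Negative GEV log-likelihood; returns inf outside the parameter space."""
    mu, sigma, xi = params
    if sigma <= 0.0:
        return math.inf
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        if abs(xi) < 1e-8:
            # Gumbel limit of the log-density
            total += math.log(sigma) + z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0.0:
                # Observation outside the support for these parameters
                return math.inf
            total += (math.log(sigma)
                      + (1.0 + 1.0 / xi) * math.log(t)
                      + t ** (-1.0 / xi))
    return total
```

Minimizing this function over (µ, σ, ξ) with a general-purpose numerical optimizer, for example `scipy.optimize.minimize`, gives the maximum likelihood estimates.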

Penalized Maximum Likelihood Estimator (PMLE)
The penalized likelihood function is defined as
$$L_{pen}(\mu, \sigma, \xi) = L(\mu, \sigma, \xi) \times P(\xi), \tag{5}$$
where $L(\mu, \sigma, \xi)$ is the standard likelihood function of MLE from equation (4) and the penalty function $P(\xi)$ is shown in equation (6):
$$P(\xi) = \begin{cases} 1, & \xi \le 0, \\ \exp\left\{-\lambda\left(\dfrac{1}{1 - \xi} - 1\right)^{\alpha}\right\}, & 0 < \xi < 1, \\ 0, & \xi \ge 1, \end{cases} \tag{6}$$
where α and λ are nonnegative constants. PMLE helps to overcome the poor performance of MLE for small samples. This is supported by Coles & Dixon [18], who conducted a study of the behavior of the penalized likelihood. PMLE is almost identical to MLE when ξ is negative. When ξ is positive, PMLE behaves almost the same as PWM, and hence inherits a smaller variance at the expense of negative bias [12]. Overall, PMLE has properties that suit all sample sizes and helps to improve on MLE and PWM.
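The effect of the penalty can be sketched in Python as follows, assuming the penalty form of Coles & Dixon [18]. The defaults α = λ = 1 are illustrative assumptions, and both function names are hypothetical; the second function simply adds the negative log-penalty to a negative log-likelihood value computed elsewhere.

```python
import math


def coles_dixon_penalty(xi, alpha=1.0, lam=1.0):
    """Penalty P(xi): 1 for xi <= 0, decreasing to 0 as xi approaches 1."""
    if xi <= 0.0:
        return 1.0
    if xi >= 1.0:
        return 0.0
    return math.exp(-lam * (1.0 / (1.0 - xi) - 1.0) ** alpha)


def penalized_neg_log_likelihood(neg_log_lik, xi, alpha=1.0, lam=1.0):
    """Negative log of L * P(xi), given the negative log-likelihood value."""
    p = coles_dixon_penalty(xi, alpha, lam)
    if p == 0.0:
        # Shape values with zero penalty weight are excluded
        return math.inf
    return neg_log_lik - math.log(p)
```

For ξ ≤ 0 the penalized and unpenalized objectives coincide, which matches the statement that PMLE is almost identical to MLE for negative shape values.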

The L-moment method is based on linear combinations of PWMs [7]; hence PWM is equivalent to the L-moment method for the GEV distribution [6]. The method was introduced by Hosking [10].
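For a random variable X with distribution function F, the PWMs can be defined as
$$\beta_r = E\left\{X\,[F(X)]^{r}\right\}, \qquad r = 0, 1, 2, \ldots \tag{7}$$
They are estimated from the sample by replacing F with its empirical estimate $\hat{F}$. The shape parameter of the GEV distribution can then be estimated using
$$\hat{\xi} = 7.8590c + 2.9554c^{2}. \tag{9}$$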
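A minimal Python sketch of PWM estimation is given below as an illustration. It assumes the estimator of Hosking et al. (1985), in which the shape is written as k = -ξ, c = (2b₁ - b₀)/(3b₂ - b₀) - log 2/log 3, and b₀, b₁, b₂ are the unbiased sample PWMs. This sign convention and the function names are assumptions of the sketch, not details taken from the study.

```python
import math


def sample_pwms(data):
    """Unbiased sample PWMs b0, b1, b2 computed from the ordered sample."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(1, n + 1)) / n
    b2 = sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x[j - 1]
             for j in range(1, n + 1)) / n
    return b0, b1, b2


def gev_pwm_estimate(data):
    """Return (mu, sigma, xi) estimated by PWM; requires at least 3 values."""
    b0, b1, b2 = sample_pwms(data)
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c ** 2      # Hosking's shape, k = -xi
    g = math.gamma(1.0 + k)
    sigma = (2.0 * b1 - b0) * k / (g * (1.0 - 2.0 ** (-k)))
    mu = b0 + sigma * (g - 1.0) / k
    return mu, sigma, -k                  # convert back to xi = -k
```

The sketch does not handle the degenerate case k = 0 exactly, where the scale and location formulas must be replaced by their Gumbel limits.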

Return Level
Return levels are frequently used to convey information about the likelihood of extreme events such as earthquakes, floods and hurricanes [19]. With the methods above, we can estimate the return level using equation (13).
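As a sketch of this computation, the Python function below assumes that the return level is the (1 - p) quantile of the fitted GEV distribution, z_p = µ - (σ/ξ)[1 - {-log(1 - p)}^(-ξ)], which is the value exceeded in any one year with probability p. The function name is hypothetical.

```python
import math


def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p in one block (1/p-block return level)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Gumbel limit xi -> 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))
```

With annual maxima, p = 0.01 corresponds to the 100-year return level.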

Simulation Study
We illustrate the comparison of GEV parameter estimation methods using a simulation study. For this purpose, we simulate extreme events from the GEV distribution, X ~ GEV(0, 1, 0.15), with a sample size of n=30. We repeat this simulation 1000 times. For each replicate, we estimate the parameters using MLE, PWM and PMLE. We then compute the bias and RMSE to compare the methods. Table 1 shows the GEV parameter estimates obtained by MLE, PWM and PMLE; the estimates are close to the actual values. Table 2 shows that the bias is close to zero for all parameter estimation methods. As we can see from Table 3, PMLE produces a smaller RMSE than the other methods. Hence we conclude that PMLE is superior to MLE and PWM, as shown in the previous studies of Coles & Dixon [18] and Musakkal [2].
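The structure of such a simulation can be sketched in Python as follows. This is an illustration rather than the study's R code: the function names are hypothetical, the true parameters are assumed to be ordered as (location, scale, shape) with a nonzero shape, and `estimator` stands for any function that maps a sample to a tuple of three estimates.

```python
import math
import random


def rgev(n, mu, sigma, xi, rng):
    """Draw n GEV variates by inverting the CDF (assumes xi != 0)."""
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # skip u == 0 so that log(u) is defined
            sample.append(mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi)
    return sample


def bias_and_rmse(estimator, true_params, n=30, reps=1000, seed=1):
    """Monte Carlo bias and RMSE of each parameter estimate."""
    rng = random.Random(seed)
    k = len(true_params)
    err_sum = [0.0] * k
    sq_sum = [0.0] * k
    for _ in range(reps):
        estimate = estimator(rgev(n, *true_params, rng))
        for i in range(k):
            err = estimate[i] - true_params[i]
            err_sum[i] += err
            sq_sum[i] += err * err
    bias = [s / reps for s in err_sum]
    rmse = [math.sqrt(s / reps) for s in sq_sum]
    return bias, rmse
```

Calling `bias_and_rmse` once per estimation method with the same seed compares the methods on identical simulated samples.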

Application of Data Stream Flow in Sabah
This study uses annual maximum streamflow data (m³ s⁻¹) from several stations in Sabah, obtained from the Hydrology Department of Sabah. Table 4 shows the number of observations for each station. We applied the result of the simulation study to model the annual maximum river flow in Sabah. Table 5 shows the GEV parameter estimates obtained using the PMLE method. We evaluate the goodness of fit of the GEV distribution using Q-Q plots with a 95% tolerance interval. The Q-Q plot is a useful tool for checking whether the empirical distribution is close to the theoretical distribution. The GEV distribution fits the annual maximum river flow well at all stations. Figure 1 shows an example of the Q-Q plot with a 95% tolerance interval for the GEV fit of annual maximum river flow at station Segama. All points lie close to a straight line with slope equal to 1 and within the 95% tolerance interval. We then calculated the return level of the annual maximum for each site with p=0.01. The corresponding return level estimates for all stations are shown in Table 6.
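The points of such a Q-Q plot can be computed as in the Python sketch below. It assumes plotting positions i/(n+1) and a nonzero shape parameter, neither of which is specified in the study, and it omits the tolerance interval. The function name is hypothetical.

```python
import math


def gev_qq_points(data, mu, sigma, xi):
    """Pairs (model quantile, ordered observation) for a GEV Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, obs in enumerate(ordered, start=1):
        prob = i / (n + 1.0)  # assumed plotting position
        y = -math.log(prob)
        model_q = mu + sigma * (y ** (-xi) - 1.0) / xi  # assumes xi != 0
        points.append((model_q, obs))
    return points
```

If the fitted GEV distribution describes the data well, the returned pairs lie close to the line with slope 1 through the origin.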

Conclusions
The simulation study shows that PMLE gives better estimates than MLE and PWM because it has small bias and RMSE. We then used this result to model the annual maximum river flow in Sabah. The GEV distribution is an appropriate model for these extreme data, since the theoretical distribution is similar to the empirical distribution. For the application, we computed the 100-year return level for each station. For future study, we will consider the effect of covariates in the model [14].

[Figure 1: Q-Q plot with a 95% tolerance interval for the GEV fit of annual maximum river flow at station Segama; axes: Estimated Data and Observation Data.]