Estimating Weibull Parameters Using Maximum Likelihood Estimation and Ordinary Least Squares: Simulation Study and Application on Meteorological Data

Inefficient estimation of distribution parameters for the current climate will lead to misleading results for the future climate. Maximum likelihood estimation (MLE) is widely used to estimate the parameters. However, MLE does not perform well for small sample sizes. Hence, the objective of this study is to compare the efficiency of MLE with that of ordinary least squares (OLS) through a simulation study and a real data application on wind speed data, based on two model selection criteria: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The Anderson-Darling (AD) test is also performed to validate the proposed distribution. In summary, OLS is better than MLE when dealing with small sample sizes and when estimating the shape parameter, while MLE is better at estimating the scale parameter. However, both methods perform well at large sample sizes.


Introduction
The Weibull distribution is one of the extreme value distributions (EVD) used to deal with outliers that occur in datasets. Outliers are values that are either extremely high or extremely low, and they can distort the information drawn from a dataset if the wrong type of distribution is used. The Weibull distribution is considered Type III as it obeys three characteristics of EVD. This continuous probability distribution has an inverse distribution and is made up of independent, identically distributed random variables. The Weibull distribution is widely used in industrial fields for assessing product reliability, modelling failure times and life data analysis. The parameters associated with the Weibull distribution are the scale, shape and location parameters. The three-parameter probability density function (pdf) reduces to two parameters when the location parameter is equal to zero [1]. The two-parameter Weibull distribution is commonly used in failure analysis, as no failure can happen before or at time zero [2]. Both the scale and shape parameters affect characteristics of the distribution such as the shape of the pdf, the reliability and the failure rate. The scale parameter, α, determines the scale of the distribution along the time axis. Hence, a change in α changes the distribution function in the scale of time but not its shape: it stretches the existing shape of the distribution function. The shape parameter, β, which is also known as the Weibull slope, controls the shape of the Weibull distribution function. It also determines the failure rate of the data. EVD is commonly used for frequency analysis as well as risk and reliability analysis of the lifetimes of systems and their components. Its applications have been reported frequently in hydrology and meteorology. In hydrology in particular, it has been widely used to estimate the magnitude of extreme events.
However, before selecting an appropriate model, one should employ an efficient parameter estimation method to obtain reliable parameters. The problem of estimation is to use the sample observations to construct a good estimator of the parameters. The efficiency of the estimates depends on the method of estimation, the type of data and other factors. To make sure that EVD performs well, all the parameters need to be estimated properly.
The term "parameter estimation" refers to the process of using sample data to estimate the parameters of a distribution. It is a useful technique for determining the best values of certain model parameters through data assimilation or similar techniques. Its purpose is to ensure that the estimated parameters are optimized and provide accurate information about the data. Failure to do so may result in wrong decisions. Hence, estimation methods need to be evaluated to find the most suitable one. Many studies have compared various parameter estimation methods to determine which is the most appropriate in all cases. Unfortunately, there is no specific procedure for determining the best parameter estimation method for EVD. It is therefore likely that an inappropriate method will be chosen, which results in inaccurate findings. Thus, the estimation procedure must be carried out carefully to ensure that the distribution function or the probability plot of the dataset gives correct information to the researchers. For example, the Weibull distributional form was first derived by [3] using an extreme value approach. Weibull et al. [4] demonstrated that the Weibull distribution fits many different datasets and gives good results, even for small samples. Zhang et al. [5] stated that the concave and convex properties of the distribution function are due to the different values of the shape and scale parameters in the Weibull distribution, and used the function characteristics to determine the range of the failure probability. Since it is an important model for reliability and maintainability analysis, Bartkute and Sakalauskas [6] stated that the parameters associated with the Weibull distribution need to be evaluated precisely, accurately and efficiently. According to [7], the parameters of the Weibull distribution can be estimated by both graphical and analytical methods. Evans et al. [8] concluded that the methods for estimating the values of the parameters involve a wide variety of procedures and adaptations of procedures. The study grouped the estimation procedures into four major categories: method of moments estimators, linear estimators, estimators based on few order statistics and maximum likelihood estimation (MLE). These four categories, which are analytical methods, are considered more accurate and reliable than the graphical method [7].
MLE is commonly used to estimate the parameters of a statistical model: given a dataset and a statistical model, it provides estimates of the model's parameters. MLE is a way of inferring parameter values from sample data. The parameters are chosen so that they maximize the probability (likelihood) of drawing the sample that was actually observed. MLE was first proposed by [9]. Lei [10] mentioned that MLE is considered the best method as it is asymptotically the most efficient, and thus it is the most frequently used method for estimating the parameters of distributions. Olteanu and Freeman [11] mentioned that MLE provides better estimates of the Weibull scale parameter, recommended that practitioners use MLE to estimate the parameters, and showed that MLE can easily accommodate right censoring as well as left censoring and interval censoring. Evans et al. [8] stated that the main reason MLE is commonly used is its large-sample efficiency. Schneider [12] proved the following properties of MLE: asymptotic unbiasedness, strong consistency, efficiency and good finite sample properties. However, MLE is not always used to estimate the parameters of a Weibull distribution because of three concerns: potential problems in calculating the Weibull parameter estimates, the bias of the estimates for small samples, and the possible existence of simpler and more efficient estimators for small sample sizes. Nevertheless, [13] explained that under appropriate circumstances the bias in MLE is relatively low and disappears as the sample size increases. In several cases where the distribution is more complicated and the likelihood equations have no closed-form solution, MLE may not perform well and an iterative method such as the Newton-Raphson method is needed to obtain the optimized parameters numerically. Dibal et al. [14] found that the combination of MLE and a numerical method (Newton-Raphson) provides an efficient means of estimating the parameters of the two-parameter Weibull distribution. Gupta et al. [15] proved that the Newton-Raphson iteration used to solve the MLE equations for the two-parameter Weibull distribution always converges quadratically and monotonically. Manurung et al. [16] stated that a Weibull distribution estimated using MLE and the Newton-Raphson iterative procedure gives a good description and prediction when analysing and interpreting data. Nonetheless, MLE is only efficient for large sample sizes [13], while the estimators are very unstable for small sample sizes [14,15].
Alternatively, a simpler yet efficient method known as ordinary least squares (OLS) has also been used to estimate the parameters. In OLS, the sum of squares of the deviations is minimized. Safi and White [17] explained that the OLS estimator is efficient when the disturbances have mean zero, constant variance, and are uncorrelated. Ramírez and Carta [18] also stated that OLS provides a robust and computationally efficient alternative to other techniques. The idea of OLS is to find the parameter values that minimize the squared errors, that is, the sum of squared differences between the data values and their corresponding modelled values. This requires the dataset to be fitted to a straight line so that the sum of squared errors is reduced. Thus, to obtain a general result, this study compares the performance of MLE and OLS using Monte Carlo simulation for various sample sizes. Consequently, the best parameter estimation method for EVD will be selected based on several model selection criteria.

Methods
A Monte Carlo simulation study is carried out to test the efficiency of the estimation methods. Random samples of sizes n = 5, 30, 50, 100, 300, 500, 1000 and 2000 are generated from the Weibull distribution with parameters α and β. The scale and shape parameters are chosen as α = 0.5 and β = 1.0, respectively. For each combination of α and β, 5000 random samples are generated by Monte Carlo simulation [19]. Then, the parameters are estimated using both MLE and OLS. The values of AIC and BIC are obtained, and the results are used to compare the efficiency of the estimation methods in estimating the parameters of the Weibull distribution. The estimates with smaller standard error (SE), AIC and BIC are preferred. Figure 1 shows the flow chart of the research methodology.
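The sampling step of the simulation can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function name `simulate_weibull_samples`, the seed, and the reduced number of replications (20 instead of 5000, to keep the run short) are assumptions made here. Only the standard library is used, and `random.weibullvariate` takes the scale first and the shape second.

```python
import random

def simulate_weibull_samples(n, alpha, beta, replications, seed=0):
    """Return `replications` independent samples of size n from a
    Weibull distribution with scale alpha and shape beta."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # random.weibullvariate(scale, shape) draws one Weibull variate
    return [[rng.weibullvariate(alpha, beta) for _ in range(n)]
            for _ in range(replications)]

# Sample sizes and true parameters as stated in the text.
sample_sizes = [5, 30, 50, 100, 300, 500, 1000, 2000]

# Hypothetical reduced replication count (the study uses 5000).
samples_by_n = {
    n: simulate_weibull_samples(n, alpha=0.5, beta=1.0, replications=20)
    for n in sample_sizes
}
```

Each entry of `samples_by_n` can then be passed, sample by sample, to whichever estimator is being assessed.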

Weibull Distribution
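The probability density function of a two-parameter Weibull distribution with scale parameter α > 0 and shape parameter β > 0 is given by,

$$f(x; \alpha, \beta) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right], \quad x \ge 0. \tag{1}$$

The cumulative distribution function is,

$$F(x; \alpha, \beta) = 1 - \exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right], \tag{2}$$

where x is the random variable.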
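As a quick numerical illustration of the two-parameter Weibull density and distribution function, the sketch below evaluates both at x = α. The function names `weibull_pdf` and `weibull_cdf` are chosen here for illustration, and the formulas used are the standard scale-shape forms, (β/α)(x/α)^(β-1) exp[-(x/α)^β] for the density and 1 - exp[-(x/α)^β] for the distribution function.

```python
import math

def weibull_pdf(x, alpha, beta):
    """Density of the two-parameter Weibull with scale alpha and shape beta."""
    z = (x / alpha) ** beta
    return (beta / alpha) * (x / alpha) ** (beta - 1) * math.exp(-z)

def weibull_cdf(x, alpha, beta):
    """Cumulative distribution function: 1 - exp(-(x / alpha) ** beta)."""
    return 1.0 - math.exp(-((x / alpha) ** beta))

# With alpha = 0.5 and beta = 1.0 (the simulation setting), the Weibull
# reduces to an exponential distribution with mean 0.5.
pdf_at_scale = weibull_pdf(0.5, alpha=0.5, beta=1.0)  # equals 2 * exp(-1)
cdf_at_scale = weibull_cdf(0.5, alpha=0.5, beta=1.0)  # equals 1 - exp(-1)
```

Whatever the shape, the distribution function at x = α equals 1 - exp(-1), which is why α is often read as the characteristic scale of the data.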

Maximum Likelihood Estimation (MLE)
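The method of MLE is a common procedure for estimating the parameters of a model from observations that are assumed to be independent and identically distributed. The parameters are estimated by maximizing the likelihood function. Let $x_1, x_2, \ldots, x_n$ be a sample of size $n$ obtained from a probability density function $f(x; \theta)$, where $\theta$ is an unknown parameter. The likelihood function is given as,

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta). \tag{3}$$

The MLE of $\theta$ is the value $\hat{\theta}$ that maximizes the likelihood function or the log-likelihood function where,

$$\hat{\theta} = \arg\max_{\theta} \ln L(\theta). \tag{4}$$

By applying Eqn. (3) to the Weibull probability density function in Eqn. (1), the likelihood function will be,

$$L(\alpha, \beta) = \prod_{i=1}^{n} \frac{\beta}{\alpha}\left(\frac{x_i}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{x_i}{\alpha}\right)^{\beta}\right]. \tag{5}$$

Taking the logarithm of Eqn. (5) gives,

$$\ln L(\alpha, \beta) = n\ln\beta - n\beta\ln\alpha + (\beta-1)\sum_{i=1}^{n}\ln x_i - \sum_{i=1}^{n}\left(\frac{x_i}{\alpha}\right)^{\beta}. \tag{6}$$

Differentiating Eqn. (6) with respect to α and β and equating to zero, the equations become,

$$\frac{\partial \ln L}{\partial \alpha} = -\frac{n\beta}{\alpha} + \frac{\beta}{\alpha}\sum_{i=1}^{n}\left(\frac{x_i}{\alpha}\right)^{\beta} = 0, \tag{7}$$

$$\frac{\partial \ln L}{\partial \beta} = \frac{n}{\beta} - n\ln\alpha + \sum_{i=1}^{n}\ln x_i - \sum_{i=1}^{n}\left(\frac{x_i}{\alpha}\right)^{\beta}\ln\left(\frac{x_i}{\alpha}\right) = 0. \tag{8}$$

By eliminating α from both Eqns. (7) and (8) and simplifying the equations,

$$\hat{\alpha} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat{\beta}}\right)^{1/\hat{\beta}}, \tag{9}$$

$$\frac{1}{\hat{\beta}} + \frac{1}{n}\sum_{i=1}^{n}\ln x_i - \frac{\sum_{i=1}^{n} x_i^{\hat{\beta}}\ln x_i}{\sum_{i=1}^{n} x_i^{\hat{\beta}}} = 0. \tag{10}$$

The estimate $\hat{\alpha}$ can be obtained using Eqn. (9). However, the estimate $\hat{\beta}$ must be solved numerically as Eqn. (10) has no analytical solution. This can be accomplished by applying an optimization method. One of the most widely used optimization methods is the Newton-Raphson method, which requires finding the inverse of the Hessian at each iteration. The Newton-Raphson method is iterated until a convergent estimator is achieved. It can be written as,

$$\beta_{k+1} = \beta_k - \frac{\ell_p'(\beta_k)}{\ell_p''(\beta_k)}, \tag{11}$$

where $k$ is the iteration index and $\ell_p'$ and $\ell_p''$ are the first and second derivatives of the profile log-likelihood derived below. The iteration is applied to Eqn. (10), starting from an initial point $\beta_0$. Next, Eqn. (9) is substituted into the log-likelihood function in Eqn. (6). Since Eqn. (9) gives,

$$\hat{\alpha}^{\beta} = \frac{1}{n}\sum_{i=1}^{n} x_i^{\beta}, \tag{12}$$

the last term of Eqn. (6) reduces to,

$$\sum_{i=1}^{n}\left(\frac{x_i}{\hat{\alpha}}\right)^{\beta} = n, \tag{13}$$

and the log-likelihood becomes,

$$\ell_p(\beta) = n\ln\beta - n\ln\left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\beta}\right) + (\beta-1)\sum_{i=1}^{n}\ln x_i - n. \tag{14}$$

This partially maximized log-likelihood function is called the profile log-likelihood. Then, Eqn. (14) is differentiated twice with respect to β to form,

$$\ell_p'(\beta) = \frac{n}{\beta} - n\,\frac{\sum_{i=1}^{n} x_i^{\beta}\ln x_i}{\sum_{i=1}^{n} x_i^{\beta}} + \sum_{i=1}^{n}\ln x_i, \tag{15}$$

and,

$$\ell_p''(\beta) = -\frac{n}{\beta^2} - n\,\frac{\left(\sum_{i=1}^{n} x_i^{\beta}(\ln x_i)^2\right)\left(\sum_{i=1}^{n} x_i^{\beta}\right) - \left(\sum_{i=1}^{n} x_i^{\beta}\ln x_i\right)^2}{\left(\sum_{i=1}^{n} x_i^{\beta}\right)^2}. \tag{16}$$

The convergence criterion is given as,

$$|\beta_{k+1} - \beta_k| < \varepsilon, \tag{17}$$

where $\varepsilon$ is a small positive tolerance.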
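The Newton-Raphson procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `weibull_mle`, the starting value `beta0 = 1.0`, the tolerance, the iteration cap and the halving safeguard for a non-positive update are all choices made here. The score and second derivative are those of the profile log-likelihood in β, and the scale is recovered in closed form as the (1/β)-th power of the mean of x^β.

```python
import math
import random

def weibull_mle(x, beta0=1.0, tol=1e-8, max_iter=100):
    """Return (alpha_hat, beta_hat) for a two-parameter Weibull sample x > 0."""
    n = len(x)
    logs = [math.log(v) for v in x]
    sum_log = sum(logs)
    beta = beta0
    for _ in range(max_iter):
        powers = [v ** beta for v in x]
        s0 = sum(powers)                                   # sum of x^beta
        s1 = sum(p * l for p, l in zip(powers, logs))      # sum of x^beta * ln x
        s2 = sum(p * l * l for p, l in zip(powers, logs))  # sum of x^beta * (ln x)^2
        grad = n / beta + sum_log - n * s1 / s0            # first derivative of profile log-likelihood
        hess = -n / beta ** 2 - n * (s2 * s0 - s1 ** 2) / s0 ** 2  # second derivative (always negative)
        new_beta = beta - grad / hess                      # Newton-Raphson update
        if new_beta <= 0:                                  # safeguard, an assumption of this sketch
            new_beta = beta / 2
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    alpha = (sum(v ** beta for v in x) / n) ** (1 / beta)  # closed-form scale estimate
    return alpha, beta

# Usage on a simulated sample with hypothetical true values alpha = 2.0, beta = 1.5.
rng = random.Random(1)
sample = [rng.weibullvariate(2.0, 1.5) for _ in range(2000)]
alpha_hat, beta_hat = weibull_mle(sample)
```

Working on the profile log-likelihood keeps the iteration one-dimensional, so the "inverse of the Hessian" is simply the reciprocal of a scalar.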

Ordinary Least Squares (OLS)
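OLS estimates the parameters by finding the regression line for which the sum of squared errors is minimized. The fitted value for $y$ when $x = x_i$ is defined as,

$$\hat{y}_i = \hat{a} + \hat{b}x_i, \tag{18}$$

where $\hat{a}$ and $\hat{b}$ are the estimated intercept and gradient. The error or residual is defined as the difference between the observed value, $y_i$, and the predicted value, $\hat{y}_i$. Hence, the residual for the $i$th observation, $e_i$, is given as,

$$e_i = y_i - \hat{y}_i. \tag{19}$$

Hence, the OLS estimates minimize the sum of squared residuals,

$$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat{a} - \hat{b}x_i\right)^2. \tag{20}$$

To apply the Weibull distribution to Eqn. (18), a double logarithmic transformation of the cumulative distribution function in Eqn. (2) is performed,

$$\ln\left[-\ln\left(1 - F(x)\right)\right] = \beta\ln x - \beta\ln\alpha. \tag{21}$$

From Eqn. (21), let,

$$Y = \ln\left[-\ln\left(1 - F(x)\right)\right], \quad X = \ln x, \quad c = -\beta\ln\alpha, \tag{22}$$

which can be written as,

$$Y = \beta X + c. \tag{23}$$

This is a straight line with gradient β and intercept c. The data are sorted into ascending order to approximate $F(x_{(i)})$. Then, the mean rank is used as follows,

$$\hat{F}(x_{(i)}) = \frac{i}{n+1}, \tag{24}$$

where $i$ is the rank order of the data. The estimate of β is obtained by minimizing Eqn. (20), where,

$$\hat{\beta} = \frac{n\sum_{i=1}^{n} X_iY_i - \sum_{i=1}^{n} X_i\sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}. \tag{25}$$

The estimate of α is derived from the intercept, c. Therefore,

$$\hat{\alpha} = \exp\left(-\frac{\hat{c}}{\hat{\beta}}\right), \tag{26}$$

where,

$$\hat{c} = \bar{Y} - \hat{\beta}\bar{X}, \tag{27}$$

and $\bar{X}$ and $\bar{Y}$ are the sample means of $X_i$ and $Y_i$.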
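The least squares procedure can be sketched in a few lines of Python. The function name `weibull_ols` and the intercept symbol `c` are assumptions of this sketch. It regresses Y = ln[-ln(1 - F)] on X = ln x, with F approximated by the mean rank i/(n + 1), and recovers the scale from the intercept through c = -β ln α.

```python
import math

def weibull_ols(x):
    """Return (alpha_hat, beta_hat) from the linearized Weibull CDF with mean ranks."""
    n = len(x)
    xs = sorted(x)                                    # ascending order statistics
    X = [math.log(v) for v in xs]                     # X = ln x
    # Y = ln(-ln(1 - F)), with F approximated by the mean rank i / (n + 1)
    Y = [math.log(-math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    beta = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # gradient of the fitted line
    c = sy / n - beta * sx / n                        # intercept of the fitted line
    alpha = math.exp(-c / beta)                       # because c = -beta * ln(alpha)
    return alpha, beta

# Sanity check on exact quantiles of a hypothetical Weibull(alpha=2.0, beta=1.5)
# taken at the mean-rank probabilities: the points lie exactly on the line,
# so the fit should return the true parameters up to rounding error.
n = 50
quantiles = [2.0 * (-math.log(1.0 - i / (n + 1))) ** (1 / 1.5) for i in range(1, n + 1)]
alpha_ols, beta_ols = weibull_ols(quantiles)
```

On real or simulated samples the points scatter around the line, and the fit is then only approximate, particularly in the tails where the plotting positions are most variable.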

Model Selection Criterion
To assess the performance of MLE and OLS, the AIC and BIC are used. The AIC is given as,

$$\mathrm{AIC} = -2\ln L(\hat{\theta}) + 2k,$$

where $L(\hat{\theta})$ is the likelihood function of the data evaluated at the maximum likelihood estimate of θ and $k$ is the number of estimated parameters. For small sample sizes, the second-order AIC (AICc) should be used instead. The AICc is,

$$\mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1},$$

where $n$ is the number of observations. A sample size is considered small when $n$ is small relative to $k$. Next, another model selection criterion used is the BIC. The difference between BIC and AIC is that the former imposes a greater penalty for the number of parameters than the latter. The formula for BIC is,

$$\mathrm{BIC} = -2\ln L(\hat{\theta}) + k\ln n.$$

The estimates with smaller values of AIC and BIC are preferred.
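The three criteria are simple functions of the maximized log-likelihood, as the sketch below shows. The helper names `weibull_loglik` and `information_criteria` are hypothetical. The sketch also assumes that, for the OLS fit, the criteria are obtained by plugging the OLS estimates into the same Weibull log-likelihood, which the text does not state explicitly. The standard forms are used: AIC = -2 ln L + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1) and BIC = -2 ln L + k ln n.

```python
import math

def weibull_loglik(x, alpha, beta):
    """Two-parameter Weibull log-likelihood evaluated at (alpha, beta)."""
    n = len(x)
    return (n * math.log(beta) - n * beta * math.log(alpha)
            + (beta - 1) * sum(math.log(v) for v in x)
            - sum((v / alpha) ** beta for v in x))

def information_criteria(loglik, n, k=2):
    """Return (AIC, AICc, BIC) for a model with k estimated parameters."""
    aic = -2.0 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2.0 * loglik + k * math.log(n)
    return aic, aicc, bic

# Toy data, invented for illustration only. With alpha = beta = 1 the
# log-likelihood is minus the sum of the observations, here -5.1.
toy = [0.2, 0.5, 0.9, 1.4, 2.1]
ll = weibull_loglik(toy, alpha=1.0, beta=1.0)
aic, aicc, bic = information_criteria(ll, n=len(toy))
```

With n = 5 and k = 2 the correction term equals 6, which illustrates how strongly AICc departs from AIC at the smallest sample size considered in the simulation.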

Application on Wind Speed Data
The real data used in this study were obtained from Environment Canada. The wind speed data from a weather station located in Alberta, Canada, cover 21 years (1991-2011) and were recorded daily for every month at hourly frequency at a hub height of 10 meters. The quantile-quantile (Q-Q) plot is used as a graphical check, while the Anderson-Darling (AD) test is performed as a numerical check. The Weibull distribution is fitted to the data, with α and β estimated using both the MLE and OLS methods. The performance of both methods is assessed based on AIC and BIC, and the results are compared with the simulation results.

Anderson-Darling (AD) test
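The Anderson-Darling (AD) test was developed by [20]. It tests whether a sample was taken from a population with a specified density function; since the cumulative distribution function is the integral of the density function, the test is based on the cumulative distribution function [20]. In this study, the AD test is used to test whether the data fit the Weibull distribution. The hypotheses are,

H0: the data follow the Weibull distribution,
H1: the data do not follow the Weibull distribution.

The test statistic is,

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right],$$

where $x_{(i)}$ is the data sorted in increasing order. At the α = 0.05 level of significance, the critical value is 0.757. The null hypothesis H0 is rejected if the p-value is less than the level of significance or the AD test statistic is greater than the critical value.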
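The AD statistic for a fitted Weibull distribution can be computed directly from the sorted data, as in the sketch below. The function name `anderson_darling_weibull` is hypothetical, and the statistic is the standard form A² = -n - (1/n) Σ (2i - 1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]. The logarithms are evaluated through z = (x/α)^β to avoid rounding problems when F is close to 1. The data here are exact quantiles, invented for illustration, so the first fit is nearly perfect by construction.

```python
import math

def anderson_darling_weibull(x, alpha, beta):
    """Anderson-Darling statistic A^2 of x against Weibull(scale=alpha, shape=beta)."""
    xs = sorted(x)
    n = len(xs)
    z = [(v / alpha) ** beta for v in xs]
    log_cdf = [math.log(-math.expm1(-t)) for t in z]  # ln F(x) = ln(1 - exp(-z))
    log_sf = [-t for t in z]                          # ln(1 - F(x)) = -z
    total = sum((2 * i - 1) * (log_cdf[i - 1] + log_sf[n - i])
                for i in range(1, n + 1))
    return -n - total / n

# Exact quantiles of a hypothetical Weibull(alpha=2.0, beta=1.5).
n = 50
probs = [(i - 0.5) / n for i in range(1, n + 1)]
data = [2.0 * (-math.log(1.0 - p)) ** (1 / 1.5) for p in probs]

a2_good = anderson_darling_weibull(data, alpha=2.0, beta=1.5)  # correct parameters
a2_bad = anderson_darling_weibull(data, alpha=0.5, beta=1.0)   # wrong parameters

CRITICAL_VALUE = 0.757  # 5% critical value quoted in the text
reject_good = a2_good > CRITICAL_VALUE
reject_bad = a2_bad > CRITICAL_VALUE
```

The statistic weights the tails more heavily than the center, which is why a badly chosen scale or shape inflates it quickly.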

Quantile-quantile (Q-Q) plot
The Q-Q plot is a graphical tool used to check whether a dataset comes from a theoretical distribution. It is a scatterplot created by plotting two sets of quantiles against one another. The dataset is sorted in ascending order and plotted along the y-axis as the sample quantiles, against quantiles calculated from a standard normal distribution plotted along the x-axis, known as the theoretical quantiles. Wilk and Gnanadesikan [21] stated that an elementary property of Q-Q plots is that the plot is linear if y is a linear function of x, but possibly altered in position and slope. From the Q-Q plot, the skewness and the kurtosis of the distribution can be observed. If the Q-Q plot displays a curve with slope increasing from left to right, the distribution of the data is skewed to the right, whereas a curve with slope decreasing from left to right reveals that the distribution is skewed to the left [22].

Monte Carlo Simulation Study Results
Table 1 presents the results of the comparison for the selected sample sizes, n, and the considered values of the Weibull parameters, α and β. The following conclusions are drawn from the results of the Monte Carlo simulation study. When n is small (i.e., n = 5), OLS performs better than MLE based on the smaller AIC and BIC values, although the shape parameter is not significant at the 0.05 significance level. Both methods tend to overestimate both parameters, possibly because of the bias that exists at small sample sizes. However, as n increases, MLE is slightly better than OLS, with lower values of AIC and BIC. Schoonbroodt [23] stated that MLE is consistent when the sample size is large, whereas a very large bias is detected when relatively small sample sizes are used. Gavilanes [24] claimed that the OLS estimator is less reliable as the coefficients for each variable tend to be unstable in the presence of multicollinearity.
The 95% confidence intervals (CI) are also quite wide for both parameters, which indicates greater uncertainty; information may need to be collected from larger samples to increase the confidence. Nevertheless, as n increases, the AIC and BIC values of MLE and OLS become closer to each other, which suggests that both methods perform well in estimating the parameters.
The estimated parameters approach the true values as n becomes larger, with lower SE, and hence the precision of the parameter estimates increases. The parameters are also highly significant at the 0.05 significance level based on the 95% CI. From the results, it can also be noticed that the value of the log-likelihood is negative. This does not affect the comparison, since maximizing the likelihood function is equivalent to minimizing the negative log-likelihood. The method with smaller values of SE, AIC and BIC is taken as the most efficient method for estimating the parameters of the Weibull distribution. The methods' performance can also be compared through the value of the log-likelihood: the higher the log-likelihood, the better the fitted model describes the data. Figure 2 shows the density plots of the Weibull distribution for n = 5, 30, 50, 100, 300, 500, 1000 and 2000 with the considered values of the Weibull parameters α and β. From Figure 2, it can be observed that all the graphs are heavy-tailed and positively skewed, except for n = 5, where the density plot becomes bimodal. It is also noticed that the slope of the curve becomes steeper as the value of the shape parameter, β, decreases, whereas the scale along the x-axis gradually increases as the value of the scale parameter, α, increases.

Monthly Wind Speed Data
Q-Q plots and boxplots are drawn for the real dataset in each month, and the results are shown in Figure 3. From the Q-Q plots, it can be observed that the data are not normally distributed, as the points deviate from the straight line and curve upwards at both ends for all months. This indicates that the distributions are positively skewed. Pleil [25] stated that if both the high and low ends of the dataset deviate from the straight line, this suggests an impact of outlier points that are likely not part of the overall distribution, or a form of heavy-tailed behavior. The Q-Q plots also illustrate that more data are concentrated at the ends of the distribution than at the center, which shows that there are outliers in the data. These observations from the Q-Q plots are supported by the boxplots. In the boxplots, data located outside the outer fence indicate that there are outliers in the wind speed dataset in each month. The medians are closer to the lower quartile, with a shorter whisker on the lower end of the boxes, which means that the data are positively skewed. In conclusion, the data are suitable for analysis using an extreme value distribution, since the Q-Q plots and the boxplots show the presence of outliers in the wind speed dataset. Furthermore, the positive skew indicates that the data are not normally distributed.
The results for the monthly wind speed data are shown in Table 2. Firstly, the estimation methods are applied to all months to estimate the parameters α and β. It can be observed that both methods perform well in estimating the values of α and β. The parameters are also highly significant at the 0.05 significance level based on the 95% CI. For α, MLE estimates better than OLS, as the SE for MLE is smaller than that for OLS in all months, whereas OLS performs better than MLE in estimating β, judging from the smaller SE. Then, all the monthly data are tested using the AD test, where the null hypothesis is rejected if the test statistic is greater than the critical value. The test statistics and p-values for each month are presented in Table 3. At the 5% level of significance, with critical value 0.757, all the test statistics are less than the critical value. The p-values are also larger than 0.05, which indicates that the Weibull distribution is a good fit for the wind speed dataset. Next, to compare which method performs better in estimating the Weibull parameters, AIC and BIC are calculated. The results in Table 2 show that, for each month, MLE has smaller AIC and BIC values than OLS, which indicates that MLE performs better in estimating the parameters. Moreover, the p-values of the AD test for MLE are larger than those for OLS, which means the MLE fit is further from the rejection threshold and supports MLE as the better method. Nonetheless, the AIC and BIC of the two methods do not differ much.

Annual Wind Speed Data
The wind speed dataset is also analyzed by plotting Q-Q plots and boxplots by year, and the results are illustrated in Figure 4. The annual Q-Q plots and boxplots have distributions similar to the monthly graphs shown in Figure 3. The points that deviate from the straight line and curve upwards at both ends show that the annual data are not normally distributed. The upward trend of the Q-Q plots shows that the data for the 21 years have positively skewed distributions. Outliers also exist in the dataset, as more data are plotted at the extremes of the distribution than at the center. From the boxplots, it can also be observed that there are many points located beyond the outer fence above the upper quartile, which indicates the existence of outliers in the dataset. In conclusion, the wind speed dataset is also appropriate for analysis on an annual basis using an extreme value distribution, given that the Q-Q plots and the boxplots show the presence of outliers. As seen in Table 4, the estimated values of α are nearly the same for MLE and OLS. However, the SE of MLE is lower than that of OLS. Similar to the monthly data, the SE of OLS for β is lower than that of MLE. All estimated parameters also appear to be significant based on the 95% CI. In addition, the p-values of the AD test for MLE are larger than those for OLS, which means the MLE fit is further from the rejection threshold and supports MLE as the better method, as shown in Table 5. However, the AIC and BIC of the two methods do not differ much.

Conclusions
The purpose of this study is to evaluate the performance of two estimation methods, MLE and OLS, in estimating the scale, α, and shape, β, parameters of the Weibull distribution. To achieve this objective, a Monte Carlo simulation study was carried out with sample sizes n = 5, 30, 50, 100, 300, 500, 1000 and 2000 and three sets of parameters. Based on the simulation study, several conclusions can be highlighted. Firstly, OLS performs well while MLE performs poorly in estimating the parameters when the sample size is small. However, as the sample size increases, the performance of both MLE and OLS improves significantly, and the estimated parameters are close to the true parameter values, which demonstrates the accuracy of both estimators. Next, to further compare the performance of each method in estimating the Weibull distribution parameters, both MLE and OLS are applied to a real dataset. The dataset chosen is hourly wind speed data from a weather station in Alberta, Canada, for 21 years (1991-2011). To provide a more complete analysis, the tests were carried out on both a monthly and an annual basis. Both the monthly and annual data are suitable for fitting with the Weibull distribution. Secondly, the results show that both methods display excellent performance in estimating the parameters. However, MLE appears to estimate α better, whereas OLS is better at estimating β. This can be seen from the smaller SE of each method for the respective parameter.
Subsequently, to test the efficiency of MLE and OLS in estimating the parameters of the Weibull distribution, two model selection criteria, AIC and BIC, are used, where the method with the smaller AIC and BIC values is chosen as the more efficient method. The p-value from the AD test on the real dataset can also be used to compare the efficiency of the methods, where a greater p-value is better as it is further from the significance level. Based on the AIC and BIC in the simulation studies, it can be observed that the values of AIC and BIC of both MLE and OLS are consistently smaller for all sample sizes. When the methods were applied to the real dataset, the results also showed that MLE has smaller AIC and BIC values than OLS. However, the differences in AIC and BIC are not large. In addition, this is also supported by the results of the AD test, where the Weibull distribution appears to be a good fit when the parameters are estimated using either MLE or OLS. In conclusion, both MLE and OLS are efficient methods and are comparable to each other in estimating the parameters of the Weibull distribution, especially for large sample sizes, given that both techniques produce small SE as well as small AIC and BIC values. However, researchers may need to consider using OLS when estimating the shape parameter of EVD, as well as when dealing with data with a small sample size (n less than 30).