Stationary and Non-Stationary Models of Extreme Ground-Level Ozone in Peninsular Malaysia

High ground-level ozone (GLO) concentrations will adversely affect human health, vegetations as well as the ecosystem. Therefore, continuous monitoring for GLO trends is a good practice to address issues related to air quality based on high concentrations of GLO. The purpose of this study is to introduce stationary and non-stationary model of extreme GLO. The method is applied to 25 selected stations in Peninsular Malaysia. The maximum daily GLO concentration data over 8 hours from year 2000 to 2016 are used. The factors of this distribution are anticipated using maximum likelihood estimation. A comparison between stationary (constant model) and non-stationary (linear and cyclic model) is performed using the likelihood ratio test (LRT). The LRT is based on the larger value of deviance statistics compared to a chi-square distribution providing the significance evidence to non-stationary model either there is linear trend or cyclic trend. The best fit model between selected models is tested by Akaike’s Information Criterion. The results show that 25 stations conform to the non-stationary model either linear or cyclic model, with 14 stations showing significant improvement over the linear model in location parameter while 11 stations follow the cyclic model. This study is important to identify the trends of ozone phenomenon for better quality risk management.


Introduction
Extreme ground-level ozone (GLO) in this study refers to the maximum value of GLO concentration in a specific period of time such as yearly, monthly, biweekly or weekly. Higher concentrations will adversely affect human health, indirectly affect the efficiency of photosynthesis in plants as well as other activities in the ecosystem as mentioned in several literatures [1]- [3]. Therefore, continuous monitoring for GLO trends is important to ensure that the level of ozone concentration does not exceed the standard recommendation stated by the National Ambient Air Quality Standard (NAAQS) and Malaysia Ambient Air Quality Standard (MAAQS). MAAQS has recommended that ozone concentration values should not exceed 0.06 ppm for an average of 8 hours.
Several studies have explored the trends of this GLO specifically in Peninsular Malaysia such as in [4]- [7]. Most of the studies found that there are seasonal variations in GLO according to locations which are influenced by the interchange of the two monsoons, namely the Northeast Monsoon and the Southeast Monsoon. However, most 358 Stationary and Non-Stationary Models of Extreme Ground-Level Ozone in Peninsular Malaysia literatures such as in [8]- [11] are focused on the centre of distribution rather than the tail distribution which may contain extreme values. Hence, we continue the analysis on GLO trends by focusing on extreme parts using extreme theory to classify the seasonal pattern based on stationary and non-stationary model. Stationary model will describe the characteristics of GLO data based on the constant value of all parameters. As for non-stationary model, linear and cyclic trends for the location parameter that is dependent on time were considered.
By taking into account the underlying processes driving the GLO generation process, the stationary and non-stationary characteristics of GLO data can be established. GLO is a secondary pollutant, which means that it is produced in the atmosphere, and the chemical reactions involved in ozone synthesis are depending on environmental factors or time. Seasonal trends that occur in the production of GLO are influenced by the increment rate of the reactions in the presence of the sun and human activities. A couple of good examples are, combustion of fossil fuels from the increasing number of vehicles in urban areas as well as emissions from major activities in industrial zones.
This study proposed the use of Generalized Extreme Value (GEV) model for both stationary and non-stationary models. Maximum likelihood estimation is used to evaluate the parameters. For stationary model, all parameters are assumed to be constant. Meanwhile, for non-stationary model, variations through time in the observed process are modelled as linear trends and yearly cyclic trends in the location parameter. The non-stationarity of the model is often apparent because of seasonal effects, or in the form of trends that occurs in environmental processes. This might help the analysis of non-stationary model to obtain the best-fitted model in selected GLO data as the novelty of this study.
The aims for this statistical study of GLO data are to predict the high levels of GLO concentration and give public health awareness, possibly in response to legislations regulating emissions of pollutants, and more importantly to identify trends in high ozone levels. These findings can hopefully increase the current scientific interest in ascertaining effects of human activities on the environment and to assess changes in ozone levels due to activities, either directly or through changing emission patterns. Therefore, the two major aims in this study are to introduce a stationary and non-stationary model of weekly maxima GLO, and to identify the best fitted model to describe extreme GLO data sets in the selected air monitoring stations.

Data Description
The method is applied to GLO data sets in Peninsular Malaysia, which consist of weekly maxima of GLO concentrations from year 2000 to 2016. In 25 stations operated by a private firm known as Alam Sekitar Malaysia Sdn Bhd (ASMA) employed by the Department of Environment (DOE) Malaysia, this datum was measured at the monitoring site. In compliance with Mokhtar et al. [5], Teledyne Ozone Analyzer Model 400E UV Absorption is used by Beer-Lambert Law to measure low-range ozone concentrations. This instrument was used to record ozone levels at all of the monitoring stations shown in Table 1.

Extreme Value Theory (EVT)
Extreme value theory (EVT) explores rare events by determining their statistical model, estimation, and prediction. Currently, this method is quite famous and has been used by researchers to analyse data in many other disciplines. As pointed by Coles[12], EVT is a statistical discipline that produces techniques and models for explaining the rarely high or low levels of amounts in catastrophic data (such as heatwaves or widespread flooding). There are two methods in the application of EVT which are blocks maxima and threshold. In this study, block maxima method was used as the point of concern is daily exceedance of 0.06 ppm, which is a rare event that can cause severe health effects. The block maxima method has a limiting distribution that corresponds to generalized extreme value (GEV) distribution. GEV distribution is a single family of model with distribution functions based on three different distributions namely Gumbel, Fré chet, and Weibull. These three classes of distributions are recognised as type I, II, and III respectively. The GEV for stationary case that assumes all parameters are constant has the following cumulative density function (cdf): where x is the extreme value data obtained using block maxima method, σ is a scale parameter, ξ is a shape parameter, and μ is a location parameter. The shape parameter, ξ is important in describing the tail behaviour of extreme data to classify the particular type of distribution. More specifically, when ξ < 0 , the GEV distribution is Weibull distribution which supports  σ x μ-ξ ; when ξ > 0 , it corresponds to Fré chet distribution which supports x      ; and when ξ = 0 , the GEV distribution is Gumbel distribution which supports ℝ. The model for stationary case is considered as independent series in all parameters. The stationary GEV model to describe the distribution of xis: In the case of non-stationary model, yearly seasonality of maximum observed values in time increments is incorporated in the model. Referring to Coles [12], non-stationary cases are dependent on seasonal effects, or there are trends in the data due to climate change or any other causes. In this study, the trends in several of the GLO data increases doubts about the suitability of model that is assumed as constant distribution. So, two models are suggested; where location parameter is considered to change linearly over the observation period and can be cyclic function over a period of one year. The suitable model for t x , the weekly maximum value of extreme This model is considered as linear trend in the location parameter, μ . The parameter 0  corresponds to mean value while 1  is the weekly rate of change in weekly maxima of GLO data. The second model is more complex with yearly cyclic function. The equation of yearly cyclic function is expressed as 2πt u(t) = β + β sin + β cos y y (4) where 0  is mean value, 1  and 2  represent the harmonic amplitudes, and y is one year (in days). The idea to propose a new model as yearly cyclic function come mainly from the trend in extreme GLO data that shows strong seasonal effects each year.

Parameter Estimation
In this study, the maximum likelihood estimation (MLE) is used as parameter estimation. This method is widely used by researchers because MLE has high likelihood corresponding to models which give high probability to observed data. Furthermore, this method also has high adaptability to change model structure based on Phalitnonkiat et al. [13]. Hence, for stationary GEV model, Meanwhile the log-likelihood for non-stationary GEV model, where the distribution of t x for t = 1,.....,m is ∼ ( ( ), , ) given that μ(t) is expressed as in equation (3) and (4), is:

Model Selection
Akaike's information criterion (AIC) and Likelihood ratio test (LRT) are usedto select the best model for all extreme GLOfor all selected stations. According to Coles [12], LRT is a simple test to identify the best model compared to other models especially for nested models determined by maximum likelihood estimation. It said that, this test will be used to select the best model between stationary model as mentioned in (2) and two non-stationary models as mentioned in (3) and (4).
After selecting the best model between stationary and two non-stationary models, the next step involved accuracy test for the best non-stationary model as in equation (3) and (4) using AIC. This method is used to choose the non-stationary model which best describes the extreme GLO data in all stations. The AIC formula is AIC = -2l + 2p (10) where and p is the parameter of the model and l is the maximized log-likelihood for the non-stationary model. Thus, between these two non-stationary models, the model with the smallest value of AIC is chosen as the best fitted model for extreme GLO data.

Model Diagnostics
Model diagnostics are important in order to make certain that the final model adopted is a excellent representation of extreme data. Based on Coles [12], quantile and probability plot are used in the diagnostics for the best fitted model. The dotted data that show near to linear line indicate that the model fit well to the stationary or non-stationary model. The probability plots and quantile plots points for diagnostics of stationary model are shown in point (11) and (12), respectively while for non-stationary models the diagnostics points are as in point (13) and (14), respectively.

Results and Discussion
Table 2 reviews the descriptive statistics of all selected air monitoring stations in Peninsular Malaysia. The results indicate that the highest reading in weekly maximum GLO concentration is recorded at station 19 (WK19) which is in Larkin, Johor with 0.20 ppm. This station is located in an industrial area with higher sources of GLO concentration. The second highest reading is recorded at station 34 (WK34) which is in Kuala Terengganu with 0.18 ppm. Although 0.18 ppm is indicated as the second highest reading, the average value for this station is quite stable around 0.05 ppm which means that this area is still considered safe from exposure to air pollution caused by ozone. The third highest reading of weekly maximum GLO concentration is 0.17 ppm but it is accompanied by a high average reading of 0.09 ppm. This reading is recorded at Shah Alam station (WK25). The increase or decrease in ozone concentrations are influenced by surrounding activities. Overall, the three highest readings for maximum values of weekly GLO concentrations have exceeded the standard recommendation stated by MAAQS and NAAQS.
Based on Abdullah et al.
[10], the highest readings of ozone concentrations are located in industrial areas. The finding is in line with the results of this study. Meanwhile, urban areas also have higher GLO concentrations because the activities that might affect ozone concentrations are magnified by the larger numbers of populations and vehicles that indirectly cause higher petrol emissions. Furthermore, the study by Abdullah et al. [10], is focused on the average value of GLO concentration whereas this study, is focused on weekly maximum GLO concentration. The lowest reading of weekly maximum GLO concentration is recorded in station 22 (WK22) which is located at Kota Bharu, Kelantan where the total amount of GLO is 0.07 ppm. Figure 1 shows the time series for three air monitoring stations, which can give an indication of whether the data have a pattern such as linear, constant or cyclic trend. The results show that further investigations to indicate the best fitted model are important in this analysis. The models are stated as stationary models (constant model) or non-stationary models of either linear or cyclic model.

Model Selection
LRT is used to compare the two models to determine the conformity of GLO models to either stationary or non-stationary models. Table 3 indicates that each station conforms to either stationary or non-stationary model or both non-stationary models. For example, station WK01conforms to stationary (constant model) and non-stationary (cyclic model). Station WK02 conforms to both non-stationary either linear or cyclic model. For stations WK01 and WK19, the values of likelihood ratio test or Dare "NA". This means that they are "not applicable" for this testing because there is no result in linear model.
The model that fits best is chosen depending on the value of D which must be greater than chi-square ( 2 k X ) distribution where k is degree of freedom at level of significance, α = 0.05 . This test is used to compare constant and linear models. The second comparison is between constant and cyclic models. Result from the comparison shows that nine stations conform to stationary model (constant model) while sixteen stations conform to non-stationary model (linear model). From the second comparison, only two stations fit the stationary model (constant model) while the rest of the stations conform to non-stationary model (cyclic model). In order to determine which model is the best fit for GLO data, further testing using AIC method is necessary.
Based on Table 4, the smallest AIC value indicates the best fitted model. There are fourteen (14) air monitoring stations that fit to non-stationary model (linear model) and the remaining stations fit to non-stationary model (cyclic model). After the most excellent model is selected, diagnostic tests for each model are carried out using graphical method which are quantile and probability plot according to the selection of the model. The result is shown in Figure 2 Table 5 divides air monitoring stations in Peninsular Malaysia according to the four regions. Three monitoring stations in the North area conform to non-stationary model with cyclic trend. They are located in Sungai Petani, Kedah (WK17), Langkawi, Kedah (WK32), and USM, Penang (WK38). These findings are in line with literature [7], which states that GLO concentrations in Sungai Petani exhibits a seasonal pattern. This pattern refers to the annual cycle that is consistent with the main GLO concentration occurring in July, and the lowest GLO concentration in December. Another study, as found in [14], also states that several stations show seasonal patterns that contribute to their conformity to cyclic model. One of this station is in Kota Bharu, Kelantan (WK22). The remaining models based on the four regions can be found in Table 5.   (3) and (4) for prediction purposes. The parameter β 1 in the linear model as in Table  6 is the weekly rate of change in weekly maximum GLO concentration. By referring to the value of shape (ξ) parameter in the GEV model in Table 6,negative values indicates that the tail behaviour is Weibull distributions, except in one station WK34 with ξ~0 which is Gumbel distribution. This result is also similar to findings in Table   7 which show the values of ξ < 0 corresponding to Weibull distribution for all air monitoring stations except one station WK45 which shows Gumbel distribution with the value ξ~0 . Hence, an example of non-stationary model with linear trends on the location parameter for station WK02 is shown in equation (16). While, non-stationary model with yearly cyclic function on location parameter for station WK01 is mentioned in equation (17). u(t) = 0.045712 +0.000007t (16) ( ) = 0.0447 − 0.0005 ( 2 ) + 0.0006 ( 2 ) (17)

Model Diagnostics
Based on Coles [12], probability plot and quantile plot are important to evaluate the accuracy of the GEV model to ensure that the selected model is best fitted to the data. Each set of plotted points for both plotting's must be near to the line, also called near-linear. The results for probability plots and quantile plots for non-stationary models (linear and cyclic model) are shown in Figure 2 and 3. The plots are constructed by non-stationary parameter values to the corresponding generalized extreme value distribution. In the case of the GEV model, this research is performed, so the probability plot is invariant to the choice of Gumbel as a reference distribution and the quantile plot is unique to the Gumbel scale choice. Figure 2 displays the residual diagnostic plots for 14 air monitoring stations in linear pattern GEV models, pointing to the validity of the equipped model with perfect plotting points close to the linear axis. Furthermore, the same goes for cyclic trend with 11 air monitoring stations exhibiting perfect plotting points near to the linear line to prove the validity of the fitted model in Figure 3. Finally, this means that all of the model conforms to GEV model.

Conclusions
The weekly maxima ground-level ozone concentrations in all of the 25 selected air monitoring stations in Peninsular Malaysia conform to both stationary and non-stationary GEV model. However, based on the results of goodness of fit test, non-stationary GEV models are the best fitted model. These non-stationary GEV models allow the location parameters to be expressed in linear trend with time and yearly cyclic trend. Hence, this study shows some improvement from stationary to non-stationary model by allowing the linear trend and inclusion of cyclic trend in one year cycle time. Results also show that Weibull is identified as the significant distribution for weekly maxima GLO concentrations except for two stations as mentioned in results and discussion.
This study is focused on GLO series to determine the best fitted model using non-stationary and stationary models. In view of the important up and down pattern of extreme GLO concentrations specifically in Peninsular Malaysia, it is important to consider the non-stationary nature of GLO data to achieve better estimations of GLO distribution so that it can contribute towards in-depth understanding of the trends of ozone phenomenon. Lastly, for further extension of the current work, other models selection methods and different classifications of data should be considered.