Evaluating the Performance of Unit Root Tests in Single Time Series Processes

Unit root tests for stationarity are relevant to almost every practical time series analysis, and deciding which unit root test to use is a topic of active interest. In this study, we compare the performance of the three most commonly used unit root tests for time series: the Augmented Dickey-Fuller (ADF), Phillips-Perron (PP), and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. The literature shows that these tests sometimes disagree on the appropriate order of integration for a given series, so the choice of test relies essentially on the researcher's judgment. Removing this subjectivity requires an objective basis that clearly identifies the most appropriate test for a particular type of time series. This study addresses the problem by providing a guide to which unit root test to use when the tests disagree. A simulation study was performed with eight (8) univariate time series models, eight (8) sample sizes, three (3) differencing orders, and nine (9) parameter values. The results show that the performance of all three tests improved as the sample size increased. Based on overall performance, the KPSS test was the "best" unit root test to use when the tests disagree.


Introduction
A time series is a sequence of ordered data from a family of random variables, say $\ldots, x_{t-1}, x_t, x_{t+1}, \ldots$ Tebbs [1]. Time series modeling studies past observations of a series to develop an appropriate model that explains the series Adhikari [2]. A time series process is said to be weakly stationary if its mean and variance do not change over time and its autocovariance depends only on the lag between observations Test [3]. A time series is said to be integrated of order d, written I(d), if it becomes stationary after being differenced d times Fedorová [4]. The importance of stationarity cannot be overstated, since many statistical tests and forecasts in time series depend on it; it is therefore necessary to ensure that time series data are stationary before building a model. Non-stationary time series exhibit trends and seasonal variations, which make the forecasts of these models unreliable Twumasi-Ankrah [5].
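To make the definition of the order of integration concrete, here is a minimal Python sketch (standard library only) in which `difference` applies the operator $(1-B)^d$ and `integrate` undoes it by cumulative summation. Both helper names and the five-value input are illustrative assumptions, not part of the study.

```python
def difference(x, d=1):
    """Apply the first-difference operator d times: (1 - B)^d x_t."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def integrate(x, d=1):
    """Cumulatively sum x d times (the inverse of differencing, starting from 0)."""
    for _ in range(d):
        total, out = 0.0, []
        for v in x:
            total += v
            out.append(total)
        x = out
    return x


noise = [0.5, -1.0, 2.0, 0.25, -0.75]  # a short stationary sequence (illustrative)
i2 = integrate(noise, d=2)             # an I(2) series built from it
print(i2)                              # [0.5, 0.0, 1.5, 3.25, 4.25]
print(difference(i2, d=2))             # [2.0, 0.25, -0.75], i.e. noise[2:]
```

Differencing twice recovers the stationary input, except for the first d values that are lost at the start of the series.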
Therefore, it is vital to make non-stationary time series data stationary by removing time trends and/or seasonal variations before carrying out any time series analysis. According to Brownlee [6], stationarity can be assessed in the following ways: time plots (i.e., plotting the data and visually checking for obvious trends or seasonality); summary statistics (i.e., reviewing the mean and variance of the data across seasons or random partitions and checking for significant differences); and statistical tests (i.e., using statistical tests to check whether the assumptions of stationarity are met or violated).
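The summary-statistics check can be sketched as follows, assuming a simple split of the series into equal consecutive partitions. `partition_summary` is a hypothetical helper, and the simulated white noise and random walk are illustrative only. For white noise the partitions should have similar means and variances, while for a random walk they typically drift apart.

```python
import random
import statistics


def partition_summary(x, parts=2):
    """Split x into equal consecutive chunks and return (mean, sample variance) per chunk."""
    size = len(x) // parts
    chunks = [x[i * size:(i + 1) * size] for i in range(parts)]
    return [(statistics.fmean(c), statistics.variance(c)) for c in chunks]


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stationary white noise

walk, level = [], 0.0
for e in noise:  # a random walk: cumulative sum of the same noise
    level += e
    walk.append(level)

for name, series in [("white noise", noise), ("random walk", walk)]:
    print(name, partition_summary(series))
```

A formal comparison would follow with a test for equal means or variances; the sketch only produces the statistics to be compared.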
There are two general approaches for testing stationarity, namely parametric and non-parametric Masliah [7]. This study focuses on the parametric approach, that is, unit root tests. A unit root is a characteristic of a stochastic process that can create problems in statistical analysis. A stochastic process is said to have a unit root if 1 is a root of the process's characteristic equation; in this case, the process is non-stationary. Several statistical tests have been employed to determine whether a series has a stochastic trend and should be differenced, or has a deterministic trend and should be detrended Cochrane [8]. Non-stationarity in the mean can arise from either a deterministic or a stochastic trend. With a deterministic trend, the series is a linear function of time, that is, $x_t = \alpha + \beta t + \varepsilon_t$, where the parameter $\beta$ measures the average change in $x_t$ from one period to the next. According to Mushtaq [9], macroeconomic data have stochastic trends, so such series have a unit root, and using these variables in modeling may lead to spurious regression. That is, regressing one trending series on another is meaningless and yields misleading results.
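The unit-root condition can be checked directly for an AR(p) process $x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t$, whose characteristic equation is $1 - \phi_1 z - \dots - \phi_p z^p = 0$. Substituting $z = 1$ shows that a unit root exists exactly when the coefficients sum to one. The sketch below is a minimal illustration with hypothetical helper names; it checks only for a root at $z = 1$ and does not search for other roots on or inside the unit circle.

```python
def char_poly_at(phis, z):
    """Evaluate the AR characteristic polynomial 1 - phi_1 z - ... - phi_p z^p."""
    return 1.0 - sum(phi * z ** (i + 1) for i, phi in enumerate(phis))


def has_unit_root(phis, tol=1e-9):
    """z = 1 solves the characteristic equation exactly when the phis sum to one."""
    return abs(char_poly_at(phis, 1.0)) < tol


print(has_unit_root([1.0]))        # random walk: True
print(has_unit_root([0.5]))        # stationary AR(1): False
print(has_unit_root([1.5, -0.5]))  # (1 - z)(1 - 0.5z): True, an I(1) process
```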
Determining and verifying the order of integration is a broad area that encompasses a long list of tests known as unit root tests, including the Dickey-Fuller (DF) test, the Augmented Dickey-Fuller (ADF) test, the Phillips-Perron (PP) test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, the ADF-GLS test, and the Ng-Perron test [4]. In the literature, the three most commonly used tests are the ADF, PP, and KPSS tests [4].
Davidson [10] reports that the PP test performs worse than the ADF test in finite samples. Moreover, according to Schwert [11], unit root tests can have low power against some particular alternatives. In our preliminary analysis, these tests also disagreed on the appropriate order of integration for a given series.
In our simulation study, we investigate three main aspects: (1) the performance of the unit root tests as the order of integration varies, (2) the performance of the unit root tests as the sample size varies, and (3) the overall performance of the three tests.

Materials and Methods
This section outlines the procedures used in this work, including how the simulation was carried out to examine the performance of the three commonly used unit root tests: ADF, PP, and KPSS.

Review on the Unit Root Tests
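The Augmented Dickey-Fuller (ADF) Test
The ADF test includes additional lagged differences of the dependent variable in the test regression to remove autocorrelation in the errors. One of its weaknesses is that the power of the test is low. The ADF test is based on an autoregressive process of order one, given by
$$x_t = \phi_1 x_{t-1} + \varepsilon_t,$$
where $\phi_1$ is the parameter of the autoregressive model. Subtracting $x_{t-1}$ from both sides and adding lagged differences gives the test regression $\Delta x_t = \gamma x_{t-1} + \sum_{i=1}^{p} \delta_i \Delta x_{t-i} + \varepsilon_t$ with $\gamma = \phi_1 - 1$, so a unit root corresponds to $\gamma = 0$.
Test statistic: $\tau = \hat{\gamma} / \mathrm{SE}(\hat{\gamma})$, which is compared with Dickey-Fuller critical values rather than those of the t distribution.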
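A minimal, dependency-free Python sketch of the ADF statistic follows. It estimates the augmented regression $\Delta x_t = c + \gamma x_{t-1} + \sum_{i=1}^{p} \delta_i \Delta x_{t-i} + \varepsilon_t$ by ordinary least squares and returns the t-ratio on $\gamma = \phi_1 - 1$. The constant $c$, the default of one lag, and all helper names are assumptions for illustration. The statistic is compared with Dickey-Fuller critical values (about -2.86 at the 5% level asymptotically for the constant-only case). In practice, a library routine such as `adfuller` in Python's statsmodels would normally be used instead.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adf_tau(x, lags=1, constant=True):
    """t-ratio of gamma in dx_t = [c] + gamma*x_{t-1} + sum_i delta_i*dx_{t-i} + e_t."""
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]  # dx[t] = x[t+1] - x[t]
    rows, y = [], []
    for t in range(lags, len(dx)):
        row = [x[t]] + [dx[t - i] for i in range(1, lags + 1)]  # lagged level, lagged diffs
        if constant:
            row.append(1.0)  # intercept column (an assumption of this sketch)
        rows.append(row)
        y.append(dx[t])
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    var_gamma = s2 * solve(xtx, [1.0] + [0.0] * (k - 1))[0]  # s^2 * [(X'X)^-1]_00
    return beta[0] / math.sqrt(var_gamma)


def simulate_ar1(phi, n, rng):
    """x_t = phi * x_{t-1} + e_t with standard normal e_t, starting from 0."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


rng = random.Random(1)
tau_walk = adf_tau(simulate_ar1(1.0, 500, rng))  # phi_1 = 1: unit root
tau_stat = adf_tau(simulate_ar1(0.5, 500, rng))  # phi_1 = 0.5: stationary
print(tau_walk, tau_stat)
```

The stationary series should give a strongly negative statistic (rejecting the unit root), while the random walk usually does not.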

The Phillips-Perron (PP) Test
The PP test builds on the DF test by making a nonparametric correction to the t-test statistic. One advantage of the PP test over the ADF test is that it is robust to general forms of heteroscedasticity in the error term. Also, the user does not have to specify a lag length for the test regression.

Hypotheses:
$H_0$: The process has a unit root.
$H_1$: The process does not have a unit root.
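The PP correction can be sketched as follows, assuming the test regression $x_t = c + \rho x_{t-1} + e_t$ and a Newey-West (Bartlett-kernel) long-run variance estimate $\hat\lambda^2$ of the residuals. The function returns the $Z_t$ statistic $Z_t = \sqrt{\hat\gamma_0/\hat\lambda^2}\, t_{\hat\rho} - \tfrac{1}{2}\,\frac{\hat\lambda^2 - \hat\gamma_0}{\hat\lambda}\cdot\frac{T\,\mathrm{SE}(\hat\rho)}{s}$, where $\hat\gamma_0$ is the residual variance, $t_{\hat\rho}$ is the ordinary t-ratio for $\rho = 1$, and $s$ is the regression standard error. The bandwidth rule $\lfloor 4 (T/100)^{1/4} \rfloor$ is one common choice and an assumption here; $Z_t$ is compared with the same Dickey-Fuller critical values as the ADF statistic.

```python
import math
import random


def pp_zt(x, bandwidth=None):
    """Phillips-Perron Z_t for H0: rho = 1 in x_t = c + rho*x_{t-1} + e_t."""
    y, z = x[1:], x[:-1]
    T = len(y)
    ybar, zbar = sum(y) / T, sum(z) / T
    szz = sum((zi - zbar) ** 2 for zi in z)
    rho = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
    c = ybar - rho * zbar
    e = [yi - c - rho * zi for zi, yi in zip(z, y)]
    s2 = sum(ei * ei for ei in e) / (T - 2)  # OLS residual variance
    se_rho = math.sqrt(s2 / szz)
    t_rho = (rho - 1.0) / se_rho             # ordinary Dickey-Fuller t-ratio
    if bandwidth is None:
        bandwidth = int(4 * (T / 100) ** 0.25)  # rule-of-thumb bandwidth (assumption)
    gamma0 = sum(ei * ei for ei in e) / T
    lam2 = gamma0                            # Newey-West long-run variance
    for j in range(1, bandwidth + 1):
        gj = sum(e[t] * e[t - j] for t in range(j, T)) / T
        lam2 += 2.0 * (1.0 - j / (bandwidth + 1)) * gj  # Bartlett weights
    # nonparametric correction of the t-ratio for serially correlated errors
    return (math.sqrt(gamma0 / lam2) * t_rho
            - 0.5 * (lam2 - gamma0) / math.sqrt(lam2) * (T * se_rho / math.sqrt(s2)))


def simulate_ar1(phi, n, rng):
    """x_t = phi * x_{t-1} + e_t with standard normal e_t, starting from 0."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


rng = random.Random(2)
z_walk = pp_zt(simulate_ar1(1.0, 500, rng))
z_stat = pp_zt(simulate_ar1(0.5, 500, rng))
print(z_walk, z_stat)
```

No lag length enters the regression itself; serial correlation is handled through $\hat\lambda^2$ instead.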

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
The KPSS test is widely used in empirical work as a complement to standard unit root tests Hadri [12], and it is not affected by seasonal dummies Phillips [13].

Hypotheses (note that the roles are reversed relative to the ADF and PP tests):
$H_0$: The series is stationary.
$H_1$: The series is not stationary.

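The test statistic is
$$\mathrm{KPSS} = \frac{1}{T^2}\sum_{t=1}^{T}\frac{S_t^2}{\hat{\sigma}^2},$$
where $S_t = \sum_{i=1}^{t}\hat{e}_i$ is the partial sum of the residuals $\hat{e}_i$ from regressing the series on a constant (or a constant and a linear trend), and $\hat{\sigma}^2$ is the estimate of the (long-run) variance $\sigma^2$ of the process.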
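A minimal sketch of the level-stationarity version of the KPSS statistic follows, assuming residuals from a regression on a constant only and a Bartlett-kernel estimate of $\hat\sigma^2$ with a rule-of-thumb bandwidth (an assumption). Because the null hypothesis is stationarity, large values lead to rejection; the asymptotic 5% critical value for this version is 0.463.

```python
import random


def kpss_level(x, bandwidth=None):
    """KPSS statistic for level stationarity: (1/T^2) * sum(S_t^2) / sigma2_hat."""
    T = len(x)
    mean = sum(x) / T
    e = [v - mean for v in x]  # residuals from regressing x on a constant
    s, partial = 0.0, []
    for v in e:
        s += v
        partial.append(s)      # S_t = e_1 + ... + e_t
    if bandwidth is None:
        bandwidth = int(4 * (T / 100) ** 0.25)  # rule-of-thumb bandwidth (assumption)
    sigma2 = sum(v * v for v in e) / T
    for j in range(1, bandwidth + 1):
        gj = sum(e[t] * e[t - j] for t in range(j, T)) / T
        sigma2 += 2.0 * (1.0 - j / (bandwidth + 1)) * gj  # Bartlett (Newey-West) weights
    return sum(p * p for p in partial) / (T * T * sigma2)


rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk, level = [], 0.0
for v in noise:
    level += v
    walk.append(level)

eta_noise, eta_walk = kpss_level(noise), kpss_level(walk)
print(eta_noise, eta_walk)  # compare each with 0.463 (5% level)
```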

Simulation
The simulation was done using existing code in R version 3.1.1.
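As a rough, hypothetical analogue of the simulation step (the study itself used R), the sketch below generates ARI(p, d) and IMA(d, q) series by simulating a Gaussian ARMA core, discarding a burn-in, and cumulatively summing the result d times. The coefficient values in the usage lines are placeholders, not the nine parameter values used in the study, and the MA sign convention ($x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots$) is an assumption. In R, `stats::arima.sim` provides comparable functionality.

```python
import random


def simulate_arima(n, ar=(), ma=(), d=0, burn_in=100, seed=None):
    """Simulate ARIMA(p, d, q): an ARMA(p, q) core with N(0, 1) noise,
    integrated d times by cumulative summation (sketch, assumed conventions)."""
    rng = random.Random(seed)
    p, q = len(ar), len(ma)
    total = n + burn_in
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    w = []
    for t in range(total):
        value = eps[t]
        value += sum(ar[i] * w[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        value += sum(ma[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        w.append(value)
    x = w[burn_in:]    # drop burn-in so start-up values do not matter
    for _ in range(d):  # integrate d times
        s, out = 0.0, []
        for v in x:
            s += v
            out.append(s)
        x = out
    return x


# Placeholder coefficients for illustration only
ari_2_1 = simulate_arima(100, ar=(0.5, 0.2), d=1, seed=1)  # ARI(2, 1)
ima_2_1 = simulate_arima(100, ma=(0.4,), d=2, seed=2)      # IMA(2, 1)
print(len(ari_2_1), len(ima_2_1))
```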

Evaluation Procedures
Here, we show the procedures for evaluating the performance of the unit root tests.

Performance of the individual tests in selecting the appropriate order of integration.
The algorithm used is summarized below:
• Count the number of times each model, at each sample size and parameter value, is declared stationary.
• Convert the counts to percentages using
$$P = \frac{s}{N} \times 100,$$
where $s$ is the number of times the data was declared stationary and $N$ is the number of replications.
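The percentage formula can be sketched as below. `percent_stationary` and `performance` are hypothetical helpers: `simulate` draws one replication and `is_stationary` stands in for the accept/reject decision of whichever unit root test is being evaluated.

```python
import random


def percent_stationary(decisions):
    """P = s / N * 100, where s counts replications declared stationary
    and N is the number of replications."""
    s = sum(1 for stationary in decisions if stationary)
    return 100.0 * s / len(decisions)


def performance(simulate, is_stationary, replications, seed=0):
    """Run the test on `replications` simulated series and return P (hypothetical API)."""
    rng = random.Random(seed)
    decisions = [is_stationary(simulate(rng)) for _ in range(replications)]
    return percent_stationary(decisions)


print(percent_stationary([True, True, False, True]))  # 75.0
```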

Performance of the unit root tests as model order increases
The algorithm used is summarized below:
• All the AR processes of orders 1 to 4 and MA processes of orders 1 to 4 with the same order of integration were grouped.
• The eight sample sizes were divided into two categories: small (10, 15, 20, 25, i.e., 10-25) and large (45, 100, 1000, 10000, i.e., 45-10000).
• For each model order, the average percentage for ARI(p, d) and IMA(d, q), where p, q = 1, 2, 3, 4 and d = 0, 1, 2, was computed over the sample sizes in the small category.
• The same averages were computed over the sample sizes in the large category.
• The average percentage indicating stationarity for each model order is
$$\bar{P} = \frac{1}{m}\sum_{i=1}^{m} P_i,$$
where $P_i$ are the percentage values and $m = 4$ is the number of elements in each model-order category.
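The grouping and averaging by model order can be sketched as follows. The dictionary layout `(family, order, d, n) -> percentage` is a hypothetical data structure assumed for illustration, and the toy percentages in the usage lines are not results from the study.

```python
from statistics import fmean

SMALL = (10, 15, 20, 25)
LARGE = (45, 100, 1000, 10000)


def order_averages(pct, d):
    """For each family and model order k = 1..4, average P over the m = 4 sample
    sizes of each category: P_bar = (1/m) * sum_i P_i."""
    out = {}
    for family in ("AR", "MA"):
        for k in range(1, 5):
            for label, sizes in (("small", SMALL), ("large", LARGE)):
                out[(family, k, label)] = fmean(pct[(family, k, d, n)] for n in sizes)
    return out


# Toy percentages (illustrative only): 40 for small samples, 90 + k for large ones
toy = {(f, k, 0, n): (40.0 if n in SMALL else 90.0 + k)
       for f in ("AR", "MA") for k in range(1, 5) for n in SMALL + LARGE}
avg = order_averages(toy, d=0)
print(avg[("AR", 1, "small")], avg[("MA", 4, "large")])  # 40.0 94.0
```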

Performance of the unit root tests as the sample size increases
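In this subsection, the following algorithm was used:
• All the autoregressive processes with the same order of integration were grouped, i.e., ARI(1, d), ARI(2, d), ARI(3, d), and ARI(4, d) for each d = 0, 1, 2.
• The average percentage for each sample size is
$$\bar{P}_n = \frac{1}{t}\sum_{i=1}^{t} P_{i,n},$$
where $n$ is the sample size, $P_{i,n}$ is the percentage for the $i$-th model in the group at sample size $n$, and $t = 4$ is the number of elements in each category.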
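The per-sample-size average can be sketched in the same hypothetical `(family, order, d, n) -> percentage` layout; the toy values below are illustrative and are not the study's results.

```python
from statistics import fmean

SAMPLE_SIZES = (10, 15, 20, 25, 45, 100, 1000, 10000)


def size_averages(pct, family, d, t=4):
    """P_bar_n = (1/t) * sum_{i=1..t} P_{i,n}: average over model orders 1..t
    for each sample size n (hypothetical key layout)."""
    return {n: fmean(pct[(family, i, d, n)] for i in range(1, t + 1))
            for n in SAMPLE_SIZES}


# Toy percentages (illustrative only): 10*i, plus 50 for the larger samples
toy = {("AR", i, 0, n): 10.0 * i + (50.0 if n >= 45 else 0.0)
       for i in range(1, 5) for n in SAMPLE_SIZES}
by_size = size_averages(toy, "AR", 0)
print(by_size[10], by_size[10000])  # 25.0 75.0
```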

The overall performance of the unit root tests
To determine the overall performance of the three commonly used unit root tests, the following algorithm was used:
• The percentage performances of the respective unit root tests were grouped into performance classes (PG).
• The percentage performance class intervals used were 0-20, 21-41, 42-62, 63-83, and 84-100.
• Weights 0, 1, 2, 3, and 4 were assigned to the performance classes, respectively.
• The score of each test is the sum of the products of the weights and the frequencies of the performance classes:
$$\text{Score} = \sum_{k=0}^{4} w_k \Gamma_k,$$
where $w_k$ are the weights and $\Gamma_k$ are the frequencies of the performance classes.
• The scores were ranked, with the highest score indicating the best unit root test.
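The scoring step can be sketched as follows. Values falling between two stated class intervals (for example 20.5) are assigned to the lower class here, which is an assumption since the class bounds are integers. The labels and percentages in the example are placeholders, not the study's results.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of classes 1..4; class 0 covers everything below 21 (0-20)
LOWER_BOUNDS = [21, 42, 63, 84]


def weight_of(p):
    """Weight w_k = class index k for a percentage p (0-20 -> 0, ..., 84-100 -> 4)."""
    return bisect_right(LOWER_BOUNDS, p)


def score(percentages):
    """Score = sum_k w_k * Gamma_k, where Gamma_k is the frequency of class k."""
    freq = Counter(weight_of(p) for p in percentages)
    return sum(w * gamma for w, gamma in freq.items())


def ranking(results):
    """Rank tests by score, highest (best) first."""
    return sorted(((name, score(p)) for name, p in results.items()),
                  key=lambda item: item[1], reverse=True)


print(ranking({"test A": [90, 95], "test B": [10, 50]}))  # [('test A', 8), ('test B', 2)]
```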

Results
In this section, the results from the simulation study are presented in terms of the performance of the individual tests in selecting the appropriate order of integration, the effect of the order of the model on the three conventional unit root tests, the impact of sample size on the three unit root tests, and the overall performance of the unit root tests.

Performance of the Individual Unit Root Test in Selecting Appropriate Order of Integration
Here, we consider the performance of the individual unit root tests. Table 1 shows the percentage performance of the ADF test. For autoregressive models at all levels of integration, the ADF test's performance at sample sizes 10, 15, 20, 25, 45, and 100 is weak, except for AR(1) and AR(2) at sample size 100. However, at sample sizes 1000 and 10000, the ADF test performs excellently. For the moving average models at all levels, the ADF test's performance at sample sizes 10, 15, 20, 25, and 45 was weak, except for MA(1) at sample sizes 20, 25, and 45; it performed excellently at sample sizes 100, 1000, and 10000. Overall, the ADF test's performance was similar across orders of integration. Table 2 shows the performance of the PP test. For all orders of integration, the PP test's performance on AR models at small sample sizes (i.e., 10 and 15) was low, but it improved from sample size 20 onward. For MA processes at all orders of integration, the performance was excellent from sample size 100 upward, whereas for the other models it was better from sample size 45 upward.

Performance of the PP Test
Generally, in both AR and MA models, the PP test performs poorly at small sample sizes and excellently at large sample sizes. Hence, the performance of the PP test improved as the sample size increased.

Table 3 shows the performance of the KPSS test. The KPSS test's performance on the MA processes was excellent at all sample sizes and all orders of integration. However, for AR(2) and AR(3) at all orders of integration, performance decreased as the sample size increased from 20 to 100; in this case, performance was excellent at sample sizes 10, 1000, and 10000. Overall, the order of integration (d = 0, 1, 2) does not affect the KPSS test's performance on the MA processes.

Effect of increase in the order of the model on the three conventional tests of stationarity
Table 4 shows the average percentage performance as the model order increases. As the order of both the autoregressive and moving average models at level increases from 1 to 4, the ADF test's performance at small sample sizes was unordered and low, whereas at large sample sizes it decreased as the model order increased, indicating a negative relationship. As the AR order increases from 1 to 4, the PP test's performance was unordered for both small and large sample sizes; however, its performance increases as the MA order increases from 1 to 4 for both small and large sample sizes. The KPSS test performed excellently in both autoregressive and moving average models for small and large sample sizes. Moreover, the order of integration does not affect any of the three tests as the model order increases.

Effect of sample size on the three unit root tests
Table 5 shows the average percentage values of the unit root tests as the sample size increases. For the ADF test with AR and MA models at level, performance decreased as the sample size increased from 10 to 20 and then increased from 25 to 10000, becoming excellent at sample sizes 1000 and 10000. Unlike the ADF test, the PP test's performance increased as the sample size increased from 10 to 10000 in both AR and MA models at level. For the KPSS test, performance on the autoregressive models at level decreased from sample size 10 to 100 and became excellent at sample sizes 1000 and 10000, while its performance on the moving average models at level was perfect at all sample sizes. Moreover, for differencing orders d = 1 and d = 2, there was no change in the performance of the three unit root tests.

The overall performance of the three conventional unit root tests
Table 6 shows the overall performance of the three most commonly used unit root tests. The percentage performances of the unit root tests were grouped into classes, PG, with a weight assigned to each class: 0-20 (weight 0), 21-41 (weight 1), 42-62 (weight 2), 63-83 (weight 3), and 84-100 (weight 4). The frequency $\Gamma$ in each performance group was multiplied by its weight, and the products were summed to obtain the score for each unit root test.

Conclusions
In summary, unit root testing has grown in importance in recent years, and many studies have evaluated various unit root tests. The most commonly used unit root tests are the ADF, PP, and KPSS tests. However, preliminary analyses by several authors have shown that these tests can disagree. This study compared the performance of these three tests through a simulation study.
The simulation study used 24 univariate time series models (eight models, each at three differencing orders) and eight sample sizes. Regarding the tests' performance under different conditions, our results showed that the KPSS test outperformed the ADF and PP tests in overall performance.
The ADF and PP tests performed poorly for small sample sizes but excellently for very large sample sizes at all differencing orders. The KPSS test's performance was excellent for all sample sizes at all differencing orders.
Increasing the order of both AR and MA models decreases the ADF test's performance for large sample sizes, while the effect is unordered for small sample sizes. The PP test's performance in AR models is unordered for both small and large sample sizes, but in MA models it increases as the order increases for both small and large sample sizes. The KPSS test's performance is unordered as the model order increases in AR models and excellent in MA models as the order increases.
As the sample size increases, the performance of all three tests improves in both AR and MA models.
Compared with the other two unit root tests, the KPSS test performed remarkably well.