Prediction of Electricity Consumption in Ghana: Long or Short Memory

In this study, we explore both univariate and multivariate time series approaches. In the univariate analysis, we evaluated the predictive performance of three widely used univariate time series methods in forecasting electricity consumption in Ghana over the period 1980–2011: the autoregressive integrated moving average (ARIMA), the autoregressive fractionally integrated moving average (ARFIMA) and exponential smoothing. For each method, we examined competing models and selected the "best" model on the basis of the minimum information criterion and diagnostic checking. The forecast accuracy measure, the mean absolute forecast error (MAFE), was computed for the "best" model of each method. The MAFE values for ARIMA, ARFIMA and exponential smoothing were 31.3%, 9.4% and 41.6%, respectively, so the ARFIMA method gives the best forecasts of electricity consumption in Ghana among the three. In the multivariate analysis, we examined whether GDP, export, import and population influence electricity consumption. The results reveal feedback (bidirectional) causality between electricity consumption and economic growth, and unidirectional causality running from import, export and population to electricity consumption. The "best" univariate model is ARFIMA(2, 0.31, 1) with an MAFE of 9.4%, while the "best" multivariate model is the vector error correction model VECM(3) with an MAFE of 1.5%. The multivariate approach therefore has better predictive performance than the univariate approaches in forecasting electricity consumption in Ghana.


Introduction
Ghana generates electricity from hydro (63.9%), thermal (36%) and renewable (0.1%) sources [1]. Electricity consumption is a major concern in the country, and the daily challenge for electricity providers is to keep generation in balance with load.
In 2013, total grid electricity generation in Ghana was 12,874 GWh, about 6% more than in 2012. According to the Ghana Energy Commission (GEC), the total electricity requirement for 2014 was estimated at 15,725 to 16,500 GWh. Meeting this requirement would have required generation to rise by roughly 22% to 28% above the 2013 level. This gap between consumption and generation has burdened Ghanaians with power rationing (load shedding). Its negative impact is evident in the performance of many industries and of the economy at large: some industries have collapsed, and some Ghanaians have lost their jobs because of the shortage of electricity supply.
Accurate forecasting of electricity consumption in Ghana is therefore of national importance: it provides a basis for national policy development and for improving the economy.
The purpose of this paper is to (1) assess the predictive performance of three univariate time series approaches in forecasting electricity consumption in Ghana, (2) examine the causal relationship, if any, between economic growth (GDP), import, export, population and electricity consumption in a vector error correction framework, and (3) compare the predictive performance of the "best" univariate and multivariate time series models.

Literature Review
Electricity consumption has been studied extensively because of its economic influence. Recent studies have examined the relationship between electricity consumption and key macroeconomic indicators, with an emphasis on the causal relationship between the variables. Their mixed conclusions have left the impact of electricity consumption on these macroeconomic indicators open to serious debate.

Current studies report two main patterns of causality between energy consumption and economic growth: unidirectional and bidirectional. Further country-specific investigations are therefore necessary.
Empirical studies such as Yu and Choi [13], Erol and Yu [7], Cheng [6], Masih and Masih [10] and Glasure and Lee [8] found unidirectional causality running from economic growth to energy consumption. In contrast, Hwang and Gum [9], Yang [12] and Paul and Bhattacharya [11] found a bidirectional causal relationship between energy consumption and economic growth.
In summary, these empirical studies reach mixed conclusions. Some find causality running from economic growth to energy consumption, others find causality running from energy consumption to economic growth, and some find no causal relationship at all. According to Binh [14], the differences among these studies lie in country-specific characteristics, sample periods, research methodologies and proxy variables.
In Ghana, studies such as Kwakwa [3] and Adom [2] have used multivariate time series approaches to study the impact of electricity consumption on economic growth. However, these studies omitted some potentially useful variables from their analysis.

Data Source
We obtained data on electric power consumption (kWh per capita), GDP (per capita), population, import and export for 1980 to 2011 from the World Bank's World Development Indicators website. We used the 1980–2009 observations to fit the models and held out the 2010 and 2011 observations to measure out-of-sample forecast accuracy.

ARIMA Models
To find an appropriate model for a series, we must first check whether the series is stationary. A non-stationary time series has a time-varying mean, a time-varying variance, or both, so it is standard practice to transform a non-stationary series into a stationary one before any meaningful analysis. The unit root test is a formal test of stationarity.

Unit Root Test
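Among the various unit root tests, the augmented Dickey-Fuller (ADF) test, developed by Dickey and Fuller, is the most commonly used. The hypotheses of the test are:
$H_0$: the series has a unit root (is not stationary)
$H_1$: the series does not have a unit root (is stationary)
The ADF test consists of estimating the regression
$$\Delta y_t = \beta_1 + \beta_2 t + \delta y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta y_{t-i} + \varepsilon_t,$$
where $\varepsilon_t$ is a pure white noise error term and the lagged differences $\Delta y_{t-i}$ absorb serial correlation. The null hypothesis corresponds to $\delta = 0$, and the test statistic $\tau = \hat{\delta}/\mathrm{se}(\hat{\delta})$ follows the non-standard asymptotic Dickey-Fuller distribution rather than the t distribution.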
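The mechanics of the test can be illustrated with a minimal Python sketch. It is deliberately simpler than the regression above: it fits only the constant-only Dickey-Fuller regression $\Delta y_t = \beta_1 + \delta y_{t-1} + \varepsilon_t$ (no trend, no lagged differences) by ordinary least squares and returns $\tau = \hat{\delta}/\mathrm{se}(\hat{\delta})$. The function names are hypothetical, the data are simulated rather than the Ghana series, and the critical value of about −2.86 is the approximate asymptotic 5% Dickey-Fuller value for the constant-only case, not a value taken from this study.

```python
import math
import random


def df_tau(y):
    """Dickey-Fuller tau statistic for  dy_t = b1 + delta * y_{t-1} + e_t.

    Simplified sketch: constant only, no trend term, no lagged differences.
    """
    x = y[:-1]                                         # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # first difference dy_t
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxd = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    delta = sxd / sxx                                  # OLS estimate of delta
    b1 = md - delta * mx                               # OLS intercept
    rss = sum((di - b1 - delta * xi) ** 2 for xi, di in zip(x, dy))
    se_delta = math.sqrt(rss / (n - 2) / sxx)          # standard error of delta
    return delta / se_delta


def simulate_ar1(phi, n, seed):
    """y_t = phi * y_{t-1} + e_t with N(0, 1) shocks; phi = 1 gives a random walk."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


CRIT_5PCT = -2.86  # approximate asymptotic 5% critical value, constant-only case

for label, phi in [("random walk", 1.0), ("stationary AR(1)", 0.5)]:
    tau = df_tau(simulate_ar1(phi, 500, seed=1))
    decision = "reject" if tau < CRIT_5PCT else "do not reject"
    print(f"{label}: tau = {tau:.2f}, {decision} H0 (unit root)")
```

In practice a packaged routine, for example `adf.test` in R's tseries package (which includes a linear trend and lagged differences), would be used instead of hand-rolled code.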

Moving Average (MA) model
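A moving average process of order q, MA(q), has the form
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\mu$ is a constant, $\theta_1, \ldots, \theta_q$ are parameters and $\varepsilon_t$ is a white noise sequence. MA processes are useful for describing phenomena in which events produce an immediate effect that lasts only for a short period of time. The ACF and PACF are used to identify the process that a series follows. In MA processes, (1) the ACF cuts off after lag q, so the number of significant spikes in the ACF determines the order of the MA, and (2) the PACF decays exponentially, with a sign pattern that depends on the signs of the MA coefficients.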
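The MA(1) signature described above can be checked by simulation. The minimal sketch below, with hypothetical helper names and an arbitrary θ = 0.8, simulates an MA(1) process and computes the mean-corrected sample ACF, the quantity plotted by R's `acf()`. The lag-1 autocorrelation should be close to the theoretical value θ/(1 + θ²) ≈ 0.49, and the higher lags should be close to zero.

```python
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 = 1), mean-corrected."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - h] for t in range(h, n)) / c0
            for h in range(max_lag + 1)]


def simulate_ma1(theta, n, seed):
    """y_t = e_t + theta * e_{t-1} with N(0, 1) white noise e_t."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]


theta, n = 0.8, 2000
r = sample_acf(simulate_ma1(theta, n, seed=42), max_lag=5)
band = 1.96 / n ** 0.5  # approximate 95% band drawn for white noise
print(f"theoretical rho_1 = {theta / (1 + theta ** 2):.3f}")
for h in range(1, 6):
    print(f"lag {h}: r = {r[h]:+.3f} (band +/-{band:.3f})")
```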

Autoregressive (AR) Model
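An autoregressive process of order p, AR(p), has the form
$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t$ is white noise. AR processes are useful for describing situations in which the present value of a time series depends on its preceding values plus a random shock. In AR processes, (1) the ACF decays exponentially, with a sign pattern that depends on the signs of the coefficients, and (2) the PACF cuts off after lag p, so the number of significant spikes in the PACF determines the order of the AR.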
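To see why the PACF identifies the AR order, the sketch below computes partial autocorrelations from a theoretical ACF with the Durbin-Levinson recursion. The AR(2) coefficients φ1 = 0.5 and φ2 = 0.3 and the function names are hypothetical illustrative choices. The printed ACF decays geometrically, while the PACF equals φ2 at lag 2 and vanishes beyond lag 2.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF of a stationary AR(2) from the Yule-Walker equations."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for h in range(2, max_lag + 1):
        rho.append(phi1 * rho[h - 1] + phi2 * rho[h - 2])
    return rho


def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag, via Durbin-Levinson.

    rho[h] is the autocorrelation at lag h, with rho[0] = 1.
    """
    pacf, prev = [], []  # prev holds phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


rho = ar2_acf(0.5, 0.3, max_lag=6)
pacf = pacf_from_acf(rho, max_lag=6)
for k in range(1, 7):
    print(f"lag {k}: acf = {rho[k]:.3f}  pacf = {pacf[k - 1]:+.3f}")
```

With sample rather than theoretical autocorrelations the PACF beyond lag p is only approximately zero, which is why significance bands are used in practice.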

Autoregressive Integrated Moving Average model (ARIMA)
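Suppose $y_t$ is a non-stationary ARMA(p + d, q) process of the form
$$\phi(B)(1 - B)^d y_t = \theta(B)\varepsilon_t,$$
where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$. Then $y_t$ is an ARIMA(p, d, q) process, and its $d$th difference $\omega_t = (1 - B)^d y_t$ is a stationary ARMA(p, q) process, $\phi(B)\omega_t = \theta(B)\varepsilon_t$. If both the ACF and the PACF tail off (decay exponentially), a mixed ARMA model is indicated.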
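Fitting an ARIMA(p, d, q) model amounts to fitting an ARMA(p, q) model to $\omega_t$ and then undoing the differencing to return forecasts to the original scale. The hypothetical helpers below sketch both steps for any integer d. The toy series of squares has a constant second difference, which makes the results easy to check by hand.

```python
def difference(y, d=1):
    """Apply (1 - B)^d: the d-th difference of y, which has len(y) - d values."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return list(y)


def integrate_forecasts(history, w_forecasts, d=1):
    """Turn forecasts of w_t = (1 - B)^d y_t into forecasts of the level y_t.

    history must contain at least the last d observed values of y.
    """
    if d == 0:
        return list(w_forecasts)
    # Forecasts of the (d-1)-th difference are running sums of the d-th
    # difference, anchored at the last observed (d-1)-th difference.
    level = difference(history, d - 1)[-1]
    lower = []
    for w in w_forecasts:
        level += w
        lower.append(level)
    return integrate_forecasts(history, lower, d - 1)


y = [1, 4, 9, 16, 25]                        # toy series: squares
print(difference(y, 1))                      # [3, 5, 7, 9]
print(difference(y, 2))                      # [2, 2, 2]
print(integrate_forecasts(y, [2, 2], d=2))   # [36, 49]
```

With d = 1 the second function simply accumulates the forecast changes onto the last observed value, which is how a forecast of $\omega_t$ from an ARIMA(p, 1, q) model becomes a forecast of the consumption level.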

ARFIMA Model
The conventional ARMA(p, q) process is often referred to as a short memory process because its coefficients are dominated by exponential decay. According to [4], long memory (or persistent) time series, whose autocorrelations decay much more slowly, were considered intermediate compromises between the short memory ARMA-type models and the fully integrated non-stationary processes in the Box-Jenkins class. The easiest way to generate a long memory series [4] is to use the difference operator $(1 - B)^d$ with fractional values of $d$, say $0 < d < 0.5$, so that a basic long memory series is generated as
$$(1 - B)^d y_t = w_t,$$
where $w_t$ still denotes white noise with variance $\sigma_w^2$. Now $d$ becomes a parameter to be estimated along with $\sigma_w^2$. This idea of differencing a series, usually with $d = 1$ or $2$, to make it stationary extends to the class of fractionally integrated ARMA (ARFIMA) models, in which we allow $-0.5 < d < 0.5$; when $d$ is negative, the series is called antipersistent. To estimate $d$, it is convenient to use the expansion
$$(1 - B)^d = \sum_{j=0}^{\infty} \pi_j B^j, \qquad \pi_j = \frac{\Gamma(j - d)}{\Gamma(j + 1)\,\Gamma(-d)},$$
whose coefficients satisfy the recursion $\pi_0 = 1$, $\pi_{j+1} = (j - d)\pi_j/(j + 1)$; this recursion makes the estimation of $d$ straightforward (see [4], pages 271–280, for details).
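The recursion for $\pi_j$ is easy to implement. The sketch below, with hypothetical function names, computes the weights and applies a truncated fractional difference to a finite sample. The truncation at the start of the sample is an assumption of this sketch, not necessarily what the R package "arfima" does. Using d = 0.31 mirrors the estimate reported later in this study; with d = 1 the weights reduce to the ordinary first difference.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients pi_0..pi_{n-1} of (1 - B)^d, using pi_{j+1} = (j - d) pi_j / (j + 1)."""
    w = [1.0]
    for j in range(n_weights - 1):
        w.append(w[-1] * (j - d) / (j + 1))
    return w


def frac_diff(y, d, n_weights=None):
    """Truncated fractional difference x_t = sum_j pi_j y_{t-j}, using only observed y.

    Near the start of the sample fewer terms are available (an assumption of
    this sketch; exact likelihood methods treat the start-up differently).
    """
    k = len(y) if n_weights is None else n_weights
    w = frac_diff_weights(d, k)
    return [sum(w[j] * y[t - j] for j in range(min(t + 1, k))) for t in range(len(y))]


w = frac_diff_weights(0.31, 8)
print([round(v, 4) for v in w])          # slowly (hyperbolically) decaying weights
print(frac_diff([1.0, 4.0, 9.0], d=1))   # [1.0, 3.0, 5.0]; the first value is y_0 itself
```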

Exponential Smoothing Technique
A time series can usually be viewed as a combination of components such as the trend (T), cycle (C), seasonal (S) and irregular or error (E) components, with the cycle usually absorbed into the trend. These components can be combined in a number of different ways. A purely additive model can be expressed as
$$y = T + S + E,$$
where the three components are added together to form the observed series. In exponential smoothing, we always start with the trend component, which is itself a combination of a level term ($\ell$) and a growth term ($b$). If the error component is ignored, there are fifteen exponential smoothing methods, given in Table 1 (see [5] for details).

Error Correction Model
The error correction model is used when the time series are not stationary and are cointegrated.

Tests for Cointegration
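Two I(1) time series $y_{1,t}$ and $y_{2,t}$ are said to be cointegrated if there exists a linear combination
$$y_{1,t} - \beta_2 y_{2,t} = z_t$$
that is stationary, I(0). More generally, if the I(1) variables are collected in the vector $y_t$ and the cointegrating relationship is written $z_t = \beta' y_t$ with $z_t \sim I(0)$, then $\beta$ is called the cointegrating vector. The cointegrating vector is not unique, since any nonzero multiple of $\beta$ is also a cointegrating vector. The cointegrating relationship is often interpreted as a long-run or equilibrium relationship between the variables, and in many applications such statistical relationships are equated with economic equilibrium. When there are several cointegrating vectors, they are linearly independent, meaning that none is a linear combination of the others; the number of linearly independent cointegrating vectors is called the cointegrating rank. The two common tests used to determine the cointegrating rank are the trace test and the maximum eigenvalue test. For the maximum eigenvalue test, the hypotheses are $H_0$: the number of cointegrating vectors is $r$, against $H_1$: the number of cointegrating vectors is $r + 1$; for the trace test, the alternative is that there are more than $r$ cointegrating vectors. With $\hat{\lambda}_1 > \cdots > \hat{\lambda}_n$ denoting the estimated eigenvalues from the Johansen procedure and $T$ the sample size, the two statistics are
$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i), \qquad \lambda_{\max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}).$$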
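Given the estimated eigenvalues, both statistics are simple to compute. In the minimal sketch below, the eigenvalues and the sample size are hypothetical placeholders, not the values behind Table 8. The statistics would still have to be compared with tabulated Johansen critical values, which this sketch does not provide.

```python
import math


def johansen_stats(eigenvalues, T):
    """Trace and maximum eigenvalue statistics for H0: rank = r, r = 0..n-1.

    eigenvalues: estimated lambda_i from the Johansen procedure (0 < lambda_i < 1).
    T: number of observations used in the estimation.
    """
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    trace = [-T * sum(math.log(1.0 - lam[i]) for i in range(r, n)) for r in range(n)]
    max_eig = [-T * math.log(1.0 - lam[r]) for r in range(n)]
    return trace, max_eig


# Hypothetical eigenvalues and sample size, for illustration only.
lam_hat = [0.72, 0.41, 0.25, 0.12, 0.03]
trace, max_eig = johansen_stats(lam_hat, T=26)
for r, (tr, mx) in enumerate(zip(trace, max_eig)):
    print(f"H0: r = {r}  trace = {tr:7.2f}  max-eigen = {mx:7.2f}")
```

The output makes the relationship between the two tests visible: each trace statistic is the sum of the maximum eigenvalue statistics from rank r upward.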

Vector Error Correction Model
The appropriate model for cointegrated time series is the vector error correction model (VECM), a rearranged, restricted form of a VAR. An error correction model is parameterized so that the variables tend to revert to the equilibrium relationship specified by the cointegrating vector. In general, a VAR(p) model
$$y_t = c + A_1 y_{t-1} + \cdots + A_p y_{t-p} + \varepsilon_t$$
can be rearranged to give a VECM of the form
$$\Delta y_t = c + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t, \qquad \Pi = A_1 + \cdots + A_p - I, \quad \Gamma_i = -(A_{i+1} + \cdots + A_p).$$
When the variables are cointegrated with rank $r$, $\Pi$ has rank $r$ and can be written $\Pi = \alpha\beta'$, where the columns of $\beta$ are the cointegrating vectors and $\alpha$ contains the adjustment (loading) coefficients. Note that a VAR of order $p$ translates into a VECM with $p-1$ lagged differences of $y_t$, so a VECM consists of a mixture of variables in levels and in first differences. If we applied the univariate modeling strategy of taking first differences of every I(1) series and fitted a VAR in first differences, the resulting model would be misspecified because of the omitted error correction term $\Pi y_{t-1}$. Conversely, a VAR in levels is unsuitable for cointegrated time series because standard inference in the presence of nonstationarity would not be valid. In the presence of cointegration, a VECM is required.
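The mapping from VAR coefficients to VECM coefficients can be checked numerically. The sketch below uses hypothetical 2 × 2 matrices chosen so that $\Pi = \alpha\beta'$ with $\alpha = (-0.2, 0.1)'$ and $\beta = (1, -1)'$, which has rank one, and verifies that the VAR(2) and the VECM with one lagged difference give the same one-step prediction. The constant $c$ is omitted because the rearrangement leaves it unchanged.

```python
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, s):
    return [[s * x for x in row] for row in a]


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def vec_add(u, v):
    return [x + y for x, y in zip(u, v)]


def var_to_vecm(A):
    """Map VAR(p) matrices A_1..A_p to VECM matrices Pi and Gamma_1..Gamma_{p-1}.

    Pi = A_1 + ... + A_p - I,   Gamma_i = -(A_{i+1} + ... + A_p).
    """
    k = len(A[0])
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    zero = [[0.0] * k for _ in range(k)]
    total = zero
    for Ai in A:
        total = mat_add(total, Ai)
    Pi = mat_add(total, mat_scale(identity, -1.0))
    Gammas = []
    for i in range(1, len(A)):
        tail = zero
        for Aj in A[i:]:
            tail = mat_add(tail, Aj)
        Gammas.append(mat_scale(tail, -1.0))
    return Pi, Gammas


# Hypothetical VAR(2) built so that Pi = alpha beta' has rank one.
A1 = [[0.9, 0.2], [0.1, 1.0]]
A2 = [[-0.1, 0.0], [0.0, -0.1]]
Pi, (Gamma1,) = var_to_vecm([A1, A2])
det_Pi = Pi[0][0] * Pi[1][1] - Pi[0][1] * Pi[1][0]
print("Pi =", Pi, " det(Pi) =", round(det_Pi, 12))

# One-step prediction from the VAR and from the VECM must coincide.
y_lag1, y_lag2 = [1.0, 2.0], [0.5, 1.5]
var_pred = vec_add(mat_vec(A1, y_lag1), mat_vec(A2, y_lag2))
dy_lag1 = [a - b for a, b in zip(y_lag1, y_lag2)]
vecm_change = vec_add(mat_vec(Pi, y_lag1), mat_vec(Gamma1, dy_lag1))
vecm_pred = vec_add(y_lag1, vecm_change)
print("VAR:", var_pred, " VECM:", vecm_pred)
```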

Model Selection Criteria
The information criteria used in this study are the Akaike information criterion (AIC),
$$\text{AIC} = -2\ln L(\hat{\theta}_k) + 2k,$$
and the corrected Akaike information criterion (AICc),
$$\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n - k - 1},$$
where $L(\hat{\theta}_k)$ is the likelihood of the fitted model, $k = p + 1$ is the model size, $p$ is the number of parameters and $n$ is the number of observations.
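Both criteria can be computed from the maximized log-likelihood. In the sketch below, the candidate log-likelihoods and model sizes are invented for illustration only (they are not the values in Table 2), and n = 29 assumes that one of the 30 fitting observations (1980–2009) is lost to first differencing. With about 30 observations the AICc penalty per extra parameter is noticeably larger than the AIC penalty, which is why AICc is preferred for short annual series.

```python
def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


n = 29  # assumption: 30 fitting years minus one lost to differencing
# Hypothetical maximized log-likelihoods and model sizes k, for illustration only.
candidates = {"model A": (-150.2, 2), "model B": (-148.9, 3), "model C": (-148.1, 4)}
for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.2f}, AICc = {aicc(ll, k, n):.2f}")
best = min(candidates, key=lambda m: aicc(*candidates[m], n))
print("lowest AICc:", best)
```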

Measure of Forecast Accuracy
The forecast error measure used in this study is the mean absolute forecast error (MAFE), computed over the held-out years and reported as a percentage.
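The sketch below shows two common ways to compute a mean absolute forecast error: in the units of the data, and relative to the actual values in percent. Treating the reported MAFE values as the percentage version is an assumption based on their being quoted in percent. The function names and the two-year numbers are hypothetical.

```python
def mafe(actual, forecast):
    """Mean absolute forecast error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mafe_percent(actual, forecast):
    """Mean absolute forecast error relative to the actual values, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecast values for two hold-out years.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mafe(actual, forecast))          # 10.0
print(mafe_percent(actual, forecast))  # about 7.5
```

The two versions can rank models differently when the series changes level, so the choice should be stated alongside the results.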

Diagnostic Checking
Once a model has been identified and fitted to an observed series, the next stage is to check it for possible discrepancies. Residuals from the fitted model are important both for detecting such discrepancies and for suggesting modifications to the model, so they are checked against the model assumptions. Any significant departure from these assumptions indicates that the fitted model is inadequate. Tests of autocorrelation provide an important diagnostic tool: any significant autocorrelation (or partial autocorrelation) indicates non-randomness in the residuals. Instead of testing the significance of individual autocorrelations, the Ljung-Box Q test is applied jointly to the first m autocorrelations. The hypotheses are:
$H_0$: the residuals are not autocorrelated
$H_1$: the residuals are autocorrelated
The Ljung-Box Q statistic is defined as
$$Q = n(n+2) \sum_{h=1}^{m} \frac{\hat{\rho}_h^2}{n - h},$$
where $n$ is the number of residuals and $\hat{\rho}_h$ is the estimated autocorrelation of the residuals at lag h. Under $H_0$, Q is approximately chi-squared distributed with m degrees of freedom, reduced by the number of estimated ARMA parameters when the test is applied to model residuals.
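A self-contained sketch of the statistic follows. The chi-squared tail probability uses the closed-form recursion for integer degrees of freedom, and the hypothetical argument `fitted_params` is subtracted from m to account for estimated ARMA parameters (for example 3 for an ARIMA(2,1,1)). The two demonstration series are simulated or artificial, not residuals from this study.

```python
import math
import random


def sample_acf(y, max_lag):
    """Mean-corrected sample autocorrelations r_0..r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - h] for t in range(h, n)) / c0 for h in range(max_lag + 1)]


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with integer df >= 1 (closed-form recursion)."""
    z = x / 2.0
    if df % 2 == 0:
        sf, a = math.exp(-z), 1.0                    # df = 2 case
    else:
        sf, a = math.erfc(math.sqrt(z)), 0.5         # df = 1 case
    while a < df / 2.0:
        sf += z ** a * math.exp(-z) / math.gamma(a + 1.0)  # raise df by 2
        a += 1.0
    return sf


def ljung_box(residuals, m, fitted_params=0):
    """Ljung-Box Q over lags 1..m and its p-value with m - fitted_params df."""
    n = len(residuals)
    r = sample_acf(residuals, m)
    q = n * (n + 2) * sum(r[h] ** 2 / (n - h) for h in range(1, m + 1))
    return q, chi2_sf(q, m - fitted_params)


rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(300)]
alternating = [(-1.0) ** t for t in range(300)]
print("white noise: Q = %.2f, p = %.3f" % ljung_box(white, m=10))
print("alternating: Q = %.2f, p = %.3g" % ljung_box(alternating, m=10))
```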

Statistical Software
We used R, with the package "forecast" to fit the ARIMA and exponential smoothing models and the package "arfima" to fit the ARFIMA models. We used EViews 8 (student version) to fit the models in the multivariate framework.

Results and Discussion
In this section, we first use three univariate time series approaches, ARIMA, ARFIMA and exponential smoothing, to model and forecast electricity consumption in Ghana, and we compare their predictive performance to select the "best" method. Second, we use a vector error correction model to examine the causal relationship between GDP, import, export, population and electricity consumption. Finally, we compare the predictive performance of the "best" univariate model and the "best" multivariate model.

ARIMA Model
A time series plot of the electricity consumption series (Appendix A1) indicates that the series is not stationary. We also conducted the augmented Dickey-Fuller (ADF) test on the series: the null hypothesis that the electricity consumption series has a unit root (is not stationary) was not rejected (D = -2.13, p-value = 0.52). The series became stationary after differencing once (Appendix A2).
The sample ACF and PACF (Appendix A3) suggest that the electricity consumption series follows an MA(1) process. Standard practice is to fit several candidate models and select the "best" one, so we used the lowest AICc value and the statistical significance of the parameter estimates to choose among competing models. The results are presented in Table 2, where the AICc identifies ARIMA(2,1,1) as the "best" model; we therefore focus on ARIMA(2,1,1). Figure 1 shows the standardized residuals and the ACF of the residuals used for the diagnostic checking of this model (note that R's ACF plot includes lag zero, whose autocorrelation is always one). The time plot of the standardized residuals shows no obvious pattern, and the residual ACF shows no clear departure from the model assumptions: none of the lags is significant, which indicates that the residuals are uncorrelated.
The residuals also appear normally distributed, so the "best" model passes the residual diagnostic tests and ARIMA(2,1,1) fits the electricity consumption series well. Its parameter estimates, reported in Table 3, are all statistically significant. Figure 2 presents the forecasts of ARIMA(2,1,1) with accompanying 80% (dark grey) and 95% (light grey) prediction intervals. The forecast values increase slightly from 2010 to 2012.

ARFIMA Model
The electricity consumption series is not stationary, but the fractional ARIMA approach suggests that it needs to be differenced by less than one to become stationary. Our analysis identified three competing fractional ARIMA models; their information criteria and fractional differencing parameters d are given in Table 4. The lowest AIC value and the statistical significance of the parameter estimates were used to select the "best" model, an ARMA(2,1) with fractional differencing parameter d = 0.31, i.e. ARFIMA(2, 0.31, 1). Its parameter estimates, given in Table 5, are all statistically significant. Figure 3 presents the forecasts of ARFIMA(2, 0.31, 1) with accompanying 80% (dark grey) and 95% (light grey) prediction intervals; here, the forecast values increase steeply from 2010 to 2012.

Exponential Smoothing Method
The appropriate exponential smoothing technique for the electricity consumption series is simple exponential smoothing, because the series did not exhibit any obvious trend. Technically, the "best" approach is ETS(A,N,N), that is, additive errors, no trend and no seasonality; in other words, simple exponential smoothing with additive errors. Three competing simple exponential smoothing models with different values of the smoothing parameter alpha were fitted to the electricity consumption data, as illustrated in Figure 4.
Figure 4 shows the one-step-ahead within-sample forecasts alongside the data over the period 1980 to 2009. The influence of alpha on the smoothing process is clearly visible: simple exponential smoothing with alpha = 0.99 fits the electricity consumption data best. Thus, the "best" smoothing technique is simple exponential smoothing with additive errors and alpha = 0.99. With alpha this close to one, the smoothed level almost equals the latest observation, so the method behaves almost like a naive (random walk) forecast.
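Simple exponential smoothing updates a level $\ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}$ and uses $\ell_t$ as the forecast for the next period. The minimal sketch below computes the one-step-ahead within-sample forecasts for several alpha values and their sum of squared errors. The series is hypothetical, and initializing the level at the first observation is a simplifying assumption; the R "forecast" package by default estimates the initial level instead.

```python
def ses_fit(y, alpha, level0=None):
    """Simple exponential smoothing: one-step-ahead in-sample forecasts.

    Level update: l_t = alpha * y_t + (1 - alpha) * l_{t-1}; the forecast of
    y_{t+1} is l_t. The initial level defaults to y[0] (a simplifying assumption).
    Returns the in-sample forecasts and the forecast for the next period.
    """
    level = y[0] if level0 is None else level0
    fitted = []
    for obs in y:
        fitted.append(level)                           # forecast made before seeing obs
        level = alpha * obs + (1.0 - alpha) * level    # update with the new observation
    return fitted, level


def sse(y, fitted):
    """Sum of squared one-step-ahead errors."""
    return sum((a - f) ** 2 for a, f in zip(y, fitted))


# Hypothetical series with a persistent upward drift, for illustration only.
y = [100.0, 103.0, 101.0, 106.0, 110.0, 108.0, 114.0, 119.0, 117.0, 123.0]
for alpha in (0.2, 0.6, 0.99):
    fitted, next_forecast = ses_fit(y, alpha)
    print(f"alpha = {alpha}: SSE = {sse(y, fitted):8.2f}, next forecast = {next_forecast:.2f}")
```

Because the level lags behind a drifting series, small alpha values accumulate large errors on such data, which is consistent with an alpha close to one being selected for a series that moves persistently.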

Comparative Summary of Results
The mean absolute forecast errors (MAFE) of the three methods for forecasting annual electricity consumption in 2010 and 2011 are given in Table 6.

Multivariate Time Series
Stationarity Test
Table 7 reports the results of the ADF and PP (Phillips-Perron) unit root tests, with the Newey-West bandwidth selected automatically. We performed the tests on both the levels and the first differences of the log variables. According to both tests, log electricity consumption, log GDP, log import, log export and log population are I(1) processes: they are non-stationary in levels but become stationary after first differencing.
Table 8 gives the results of the Johansen cointegration test. Since Table 7 shows that all the variables have the same order of integration, I(1), the Johansen test was used to determine the cointegrating rank, i.e. the number of cointegrating vectors. For the maximum eigenvalue statistic, the null hypothesis r = 0 (no cointegration) is rejected against the alternative r = 1 at the 5% level of significance. Similarly, for the trace statistic, the null hypothesis r = 0 is rejected against the alternative r ≥ 1. Both statistics in Table 8 indicate one cointegrating equation at the 0.05 level of significance. The two conditions for using a vector error correction model, non-stationary I(1) variables that are cointegrated, are therefore met, and we fitted a VEC model.

Lag Length
To capture the influence of past values of the variables on their future behaviour, an appropriate lag length p must be chosen. According to Table 9, a vector error correction model with lag 3, VECM(3), is selected by all the selection criteria. Nevertheless, for a careful assessment we also fitted the competing models VECM(1) and VECM(2), but their residuals failed the white noise test. VECM(3), which passed the white noise test and is preferred by the criteria in Table 9, was therefore considered the "best" model.
We also used the three VEC models to forecast 2010 and 2011 and regarded the model with the smallest forecast error as the "best" in terms of predictive performance. The forecast accuracy measure in this section is again the mean absolute forecast error (MAFE): VECM(1) has an MAFE of 4.2%, VECM(2) 3.0% and VECM(3) 1.5%. Thus, VECM(3) is the "best" model in terms of predictive performance.
Table 10 gives the cointegrating equation together with the equations for changes in electricity consumption (first column, D(CON)), changes in export (second column), changes in GDP (third column), changes in import (fourth column) and changes in population (fifth column). Our interest is in the first column, where electricity consumption is the endogenous variable. The coefficients of the cointegrating equation indicate whether past values of the variables affect the current value of the variable under study in the long run. In the cointegrating equation in Table 10, the coefficient of previous-year export is negative and statistically significant, so in the long run previous-year export is negatively related to electricity consumption. The coefficients of previous-year GDP, import and population are statistically significant and positive, so these variables are positively related to electricity consumption in the long run.
In the VECM, significant coefficients on the lagged differences indicate short-run effects on the current outcome, while the error correction term captures the role of past equilibrium errors. The lagged coefficients of the change in electricity consumption are negative and statistically significant at 0.05, indicating that higher previous electricity consumption has a negative effect on current electricity consumption in the short run. The lagged coefficients of the change in export are positive and statistically significant at 0.05, suggesting that higher exports have a positive effect on electricity consumption in the short run.
The lagged coefficients of the change in GDP are negative and statistically significant at 0.05, meaning that higher GDP has a negative effect on electricity consumption in the short run. The lagged coefficients of the change in import are negative but statistically significant only at lag 1, indicating that higher imports have a negative short-run effect on electricity consumption. The lagged coefficients of the change in population are positive at lags 1 and 3 and negative at lag 2, and statistically significant at 0.05, suggesting that population has a mixed effect on electricity consumption in the short run.

Causality Test with VECM
The causality test based on VECM(3) is presented in Table 11. The null hypothesis that log export, log GDP, log import and log population do not Granger cause log electricity consumption is tested using the changes in electricity consumption (D(log electricity consumption)), export (DlogExport), GDP (DlogGDP), import (DlogImport) and population (DlogPopulation), all of which are stationary in first differences, as required in a standard Granger causality regression. The null hypothesis is rejected or not rejected on the basis of a chi-squared test based on the Wald criterion, which determines the joint significance of the restrictions under the null hypothesis.
Our interest is to establish the direction of influence between electricity consumption and GDP, export, import and population, and whether there is feedback.
In Table 11, all the p-values are less than 0.05, indicating that the coefficients of log export, log GDP, log import and log population are not jointly zero in the equation for log electricity consumption. The null hypothesis that export, GDP, import and population individually do not Granger cause electricity consumption is therefore rejected. Causality runs in one direction from export, import and population to electricity consumption, whereas GDP and electricity consumption show bidirectional (feedback) causality, which is consistent with the long-run relationship between the two variables identified by the cointegration analysis.
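The logic of a Wald-type Granger causality test can be sketched for two variables. The code below is a simplified stand-in, not the VECM-based test behind Table 11: it regresses y on a constant and its own lags, with and without lags of x, and computes $W = n(\mathrm{RSS}_r - \mathrm{RSS}_u)/\mathrm{RSS}_u$, which for linear least squares equals the Wald statistic for the exclusion restrictions and is asymptotically $\chi^2$ with as many degrees of freedom as excluded lags. All names and the simulated data are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def ols_rss(X, y):
    """Residual sum of squares of an OLS regression of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with integer df >= 1."""
    z = x / 2.0
    sf, a = (math.exp(-z), 1.0) if df % 2 == 0 else (math.erfc(math.sqrt(z)), 0.5)
    while a < df / 2.0:
        sf += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def granger_wald(y, x, lags):
    """Wald test of H0: lags of x do not help predict y; returns (W, p-value)."""
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - i] for i in range(1, lags + 1)]
        other = [x[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)
        unrestricted.append([1.0] + own + other)
        target.append(y[t])
    n = len(target)
    rss_r, rss_u = ols_rss(restricted, target), ols_rss(unrestricted, target)
    w = max(n * (rss_r - rss_u) / rss_u, 0.0)  # guard against rounding below zero
    return w, chi2_sf(w, lags)


# Simulated data (hypothetical): x drives y with a one-period delay, not vice versa.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y = [0.0] + [0.5 * x[t - 1] + rng.gauss(0.0, 1.0) for t in range(1, 300)]
print("x -> y: W = %.2f, p = %.4f" % granger_wald(y, x, lags=2))
print("y -> x: W = %.2f, p = %.4f" % granger_wald(x, y, lags=2))
```

Running the test in both directions, as in the two print statements, is what distinguishes unidirectional from bidirectional (feedback) causality.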

Conclusions
In this study, we considered both univariate and multivariate time series approaches. In the univariate approach, we evaluated three methods and selected the "best" method with respect to predictive performance. In the multivariate approach, we selected a "best" model among competing models and established the causal influence of the explanatory variables (import, GDP, export and population) on the dependent variable (electricity consumption).
For the univariate approach, competing models were examined within each method, and the "best" model was selected on the basis of the minimum information criterion and diagnostic checking. The forecast accuracy measure, the mean absolute forecast error (MAFE), was computed for the "best" model of each method. The MAFE values for ARIMA, ARFIMA and exponential smoothing were 31.3%, 9.4% and 41.6%, respectively, so the ARFIMA method outperformed both the ARIMA and the exponential smoothing methods.
Thus, the ARFIMA method gives the most accurate forecasts of electricity consumption in Ghana among the univariate methods considered; that is, the long memory approach predicts electricity consumption in Ghana more accurately than the short memory approaches.
The "best" univariate model is ARFIMA(2, 0.31, 1) with an MAFE of 9.4%, while the "best" multivariate model is VECM(3) with an MAFE of 1.5%. Thus, the multivariate approach has better predictive performance than the univariate approach in forecasting electricity consumption in Ghana.
We observed feedback causality between electricity consumption and economic growth. This result suggests that electricity consumption is growth-enhancing and that policy makers should be concerned with increasing electricity generation. We also established a unidirectional influence of import, export and population on electricity consumption, and we observed that import, population and GDP are positively related to electricity consumption. Thus, as these variables increase, policy