Statistical Model for the Forecast of Electricity Power Generation in Ghana

Abstract

Adequate power supply is vital to the economic growth of every nation. However, due to changing hydrological conditions, inadequate fuel supplies and dilapidated infrastructure, developing countries face challenges in planning the power grid infrastructure needed to support rapidly growing urban populations. This research models monthly electricity power generation for prediction purposes by fitting stochastic process models to a historical series of monthly electricity power generation in Ghana. A detailed explanation of model selection and forecasting accuracy is presented. The SARIMA (1, 0, 0) × (0, 1, 1)12 model, with an AIC score of 439.6995, a BIC score of 446.3537 and an AICc score of 440.8759, was identified as an appropriate model for predicting monthly electricity power generation in Ghana. The data span 2015 to 2019, and the model was validated with data from April to December 2019. The predicted values for 2019 are relatively close to the observed values, so the experimental results show good prediction performance. With the developed SARIMA model, a forecast is made for the year 2020, indicating an increase in monthly power generation. The performance of the SARIMA model was evaluated using several statistical measures; the test data produced RMSE (55.8606), MAE (45.454) and MAPE (3.0621%). The lagged effect can also help in accurate forecasting and assist policy and decision-makers in establishing strategies and priorities for electric power generation.

Share and Cite:

Wiah, E. , Buabeng, A. and Agyarko, K. (2022) Statistical Model for the Forecast of Electricity Power Generation in Ghana. Open Journal of Statistics, 12, 373-384. doi: 10.4236/ojs.2022.123024.

1. Introduction

Electricity generation is one of the key components in achieving sound economic development. Although Ghana committed itself to universal electricity access by 2020, the real challenge is the capacity to meet this goal and, most importantly, to ensure that supply is reliable and adequate [1]. The electricity supply in Ghana is frequently interrupted, and a series of load shedding incidents has been experienced, such as those in 2014 and 2015, popularly referred to as “Dum-Sor” [2].

Ghana’s main sources of electricity generation are hydroelectric, thermal and renewable. Ghana currently has over 4000 MW of installed generation capacity, though actual availability rarely exceeds 2400 MW. The total installed capacity for existing plants in Ghana is 4132 MW, consisting of Hydro 38%, Thermal 61% and Solar less than 1% [3] [4]. However, as already mentioned, this generation mix has its limitations due to hydrological conditions, inadequate fuel supplies and dilapidated infrastructure.

Since the Sustainable Development Goals (SDGs) were adopted by all United Nations Member States in 2015, the goal to “ensure access to affordable, reliable, sustainable and modern energy for all” has centered on electricity access, often ignoring the poor rates of grid resilience and reliability [5] [6].

In 2014, the Ghanaian government announced two generation expansion goals: 5000 MW installed capacity by 2015 and 10% renewable energy by 2020. Yet, as of 2018, installed capacity stood at 3800 MW and renewable energy formed only 2% of the generation mix, and the government pushed back its goal of 10% renewable energy to 2030 [4]. These goals were announced on the heels of rising electricity tariffs and the 2012-2016 power crisis, the worst in the country’s history.

From 2015 to 2017, the average number of outages experienced by consumers in Ghana increased from 18 to 48 per year, despite the regulatory maximum of 6 power outages per customer per year [4]. During the peak of the crisis in 2014, consumers averaged 8 outages per month, each lasting at least 8 hours, and backup diesel generators accounted for 12 percent of grid capacity [7].

Although electricity generation capacity has increased, understanding the trend of future electricity demand is vital to solving challenges such as frequent power outages, load shedding and inadequate electricity supply [8].

These events have compelled energy planners in Ghana to diagnose the electricity issue as a problem of reliability, with policy goals focused largely on fixing power supply shortages through generation expansion.

Several studies, such as [9] [10] and [11], have examined the prediction of electricity consumption and electricity demand in Ghana, but none of them considered the amount of power that needs to be generated to help solve the consumption and demand problems.

Against this backdrop, the objective of this research is to model the historical data of the monthly gross production of electric energy in Ghana between 2015 and 2019, performing a detailed analysis of the data to reduce the forecast error using the Seasonal Autoregressive Integrated Moving Average (SARIMA) technique. This yields a stochastic model for forecasting the monthly production of electric energy, which can be used for energy planning across different sources of electricity production.

2. Materials and Methods

2.1. Data Collection

The monthly electricity power generation series was obtained from the Ghana Energy Commission for the years 2015 to 2019. The series has 60 monthly observations from January 2015 to December 2019 [12]. The first 51 monthly observations, from January 2015 to March 2019, were used for model calibration. The remaining 9 observations, from April 2019 to December 2019, were held out for model validation by out-of-sample forecasting.
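The calibration and validation split described above can be sketched as follows. This is a minimal illustration that works only with month labels; the names `months`, `train`, `test` and `N_TRAIN` are hypothetical, and no generation values from the paper are reproduced.

```python
# Build the 60 monthly labels from January 2015 to December 2019.
months = [f"{year}-{month:02d}" for year in range(2015, 2020) for month in range(1, 13)]

# First 51 months (Jan 2015 to Mar 2019) are used for calibration;
# the remaining 9 months (Apr 2019 to Dec 2019) are held out for validation.
N_TRAIN = 51
train, test = months[:N_TRAIN], months[N_TRAIN:]

print(len(months), len(train), len(test))  # 60 51 9
print(train[0], train[-1])                 # 2015-01 2019-03
print(test[0], test[-1])                   # 2019-04 2019-12
```

Splitting by position rather than at random keeps the time order intact, which is what a forecasting validation needs.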

2.2. Series Analysis

The analytical approach of this study is based on the Box-Jenkins SARIMA model [13]. The data are modelled using a combination of non-seasonal and seasonal components, specified as SARIMA (p, d, q) × (P, D, Q)s. These models are regression models with lags in the dependent variable Xt and lags in the error term. In ARIMA (p, d, q) models, the parameters p, d and q must be identified, where p is the autoregressive order of the dependent variable, d is the order of differencing, and q is the order of the lagged error term, or moving average order, of these stochastic models [14]. To find these values, the stationarity of the time series data was analyzed in detail. The simple Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) are correlogram functions that help to determine the degree of correlation between values of the series at successive lags and give an idea of the possible parameters of the ARIMA models [15] [16].

Step 1: Series transformation

Non-seasonal and seasonal differencing were applied to achieve stationarity of the time series by eliminating the trend and seasonality. From the differenced data, the non-seasonal and seasonal components of the model were formulated by examining the autocorrelation function (ACF) and partial autocorrelation function (PACF). The ACF and PACF were used to determine the degree of differencing and the appropriate autoregressive and moving average terms.
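As a small illustration of the differencing step, the sketch below applies a lag-12 (seasonal) difference to a made-up series. The helper `difference` and the toy `series` are hypothetical and are not the Ghana generation data.

```python
def difference(series, lag=1):
    """Return the lag-`lag` differences: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: a linear trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]

# The seasonal difference removes the 12-month pattern and turns the
# linear trend into a constant (12 months x 2 units per month).
seasonal_diff = difference(series, lag=12)
print(len(seasonal_diff))  # 24
print(set(seasonal_diff))  # {24}
```

Each seasonal difference at lag 12 costs 12 observations, so a 51-month training series leaves 39 seasonally differenced values.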

Step 2: Stationarity evaluation

The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) and Augmented Dickey-Fuller (ADF) unit root tests were used to ascertain the stationarity of the differenced series. These tests are considered under three variants: a random walk with zero mean, a random walk with drift, and a random walk with drift and linear trend [17].

Step 3: Model identification

In ARIMA (p, d, q) models [17], the series is differenced d times to obtain a stationary series. These stationary models take the form of Equation (1):

$$X_t^d = \alpha + \underbrace{\phi_1 X_{t-1}^d + \cdots + \phi_p X_{t-p}^d}_{\text{AR}(p)} + \underbrace{\theta_1 \varepsilon_{t-1}^d + \cdots + \theta_q \varepsilon_{t-q}^d + \varepsilon_t}_{\text{MA}(q)} \tag{1}$$

where $X_t^d$ is the series differenced d times, $\varepsilon_t$ is a white noise process, independent and identically distributed (i.i.d.) as $N(0, \sigma^2)$, and $\alpha, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q$ are the model parameters.

As the monthly electricity power generation data present strong seasonality [18], ARIMA models combined with seasonal terms were used. This combined model has two components: a regular ARIMA (p, d, q) component that models the non-independence in the data, and a seasonal component with ARIMA structure (P, D, Q) that models the seasonality, where P is the seasonal autoregressive order, D is the order of seasonal differencing and Q is the seasonal moving average order.

The general form of this combined model, also called SARIMA [19], is given in Equation (2):

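$$X_t = \alpha + \underbrace{\phi_1 X_{t-1} + \cdots + \phi_p X_{t-p}}_{\text{AR}(p)} + \underbrace{\varphi_1 X_{t-s} + \cdots + \varphi_P X_{t-Ps}}_{\text{SAR}(P)} + \varepsilon_t - \underbrace{\theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}}_{\text{MA}(q)} - \underbrace{\vartheta_1 \varepsilon_{t-s} - \cdots - \vartheta_Q \varepsilon_{t-Qs}}_{\text{SMA}(Q)} \tag{2}$$

where $\alpha, \phi_1, \ldots, \phi_p, \varphi_1, \ldots, \varphi_P, \theta_1, \ldots, \theta_q, \vartheta_1, \ldots, \vartheta_Q$ are the model parameters to be estimated.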
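As an illustration, the model selected later in this paper, SARIMA (1, 0, 0) × (0, 1, 1)12, can be written out explicitly. The sketch below is not taken from the source: it assumes the standard multiplicative Box-Jenkins form in backshift notation (with $B X_t = X_{t-1}$), the moving average sign convention of Equation (2), and no constant term.

```latex
(1 - \phi_1 B)(1 - B^{12}) X_t = (1 - \vartheta_1 B^{12}) \varepsilon_t
\quad\Longrightarrow\quad
X_t = X_{t-12} + \phi_1 \left( X_{t-1} - X_{t-13} \right) + \varepsilon_t - \vartheta_1 \varepsilon_{t-12}
```

Under these assumptions, each month is predicted from the same month of the previous year, adjusted by last month's year-on-year change and by the forecast error made twelve months earlier.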

In identifying a suitable model, information loss metrics such as the Akaike Information Criterion (AIC) [20], the corrected AIC (AICc) and Bayesian Information Criterion (BIC) [21] were employed.

$$\text{AIC} = -2 \log [L(\tilde{\Psi})] + 2k \tag{3}$$

$$\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n - k - 1} \tag{4}$$

$$\text{BIC} = -2 \log [L(\tilde{\Psi})] + k \log(n) \tag{5}$$

where n is the number of observations, k = p + q + P + Q + 1 is the number of parameters in the model, and $L(\tilde{\Psi})$ is the likelihood function evaluated at $\tilde{\Psi}$, the maximum likelihood estimates of the SARIMA parameters.
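The three criteria can be computed directly from a fitted model's log-likelihood. The sketch below uses the standard small-sample correction with denominator n − k − 1. The inputs are assumptions rather than values reported in the paper: a log-likelihood of −215.84975, k = 4 estimated parameters and n = 39 effective observations (51 training months minus 12 lost to seasonal differencing). They were chosen because they closely reproduce the reported AIC, AICc and BIC of the selected model.

```python
import math

def information_criteria(log_lik, k, n):
    """Return (AIC, AICc, BIC) following Equations (3) to (5)."""
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * log_lik + k * math.log(n)
    return aic, aicc, bic

# Assumed inputs (not reported in the paper); see the paragraph above.
aic, aicc, bic = information_criteria(log_lik=-215.84975, k=4, n=39)
print(round(aic, 4), round(aicc, 4), round(bic, 4))
```

With these assumed inputs the results agree with the reported scores (439.6995, 440.8759 and 446.3537) to about three decimal places. A count of k = 4 would correspond to the autoregressive term, the seasonal moving average term, a constant or drift term and the error variance, although the paper does not state this.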

Step 4: Estimation of model parameters

After the successful identification of a suitable model, the parameters discussed in Step 3 are estimated. At this stage, the significance as well as the relative contribution of each parameter to the overall predictive ability of the model is assessed.

Step 5: Model diagnostics and validation

At this stage, diagnostics are performed on the residuals of the selected model to ensure that the assumptions governing the use of the model are satisfied. To achieve this, the standardised residual plot, the ACF plot and the plot of the Box-Ljung test (a modified version of the Box-Pierce test) are employed as diagnostic tools to test for “lack-of-fit”. The Box-Ljung [22] test statistic is given in Equation (6):

$$Q = n(n+2) \sum_{k=1}^{m} \frac{\hat{r}_k^2}{n-k} \tag{6}$$

Under the null hypothesis, the statistic Q is asymptotically chi-squared distributed, where m is the number of lags being tested, the degrees of freedom are $df = m - p - q - P - Q$, n is the sample size and $\hat{r}_k$ is the estimated autocorrelation of the residual series at lag k.

The Box-Ljung test is defined on the null hypothesis H0: the model does not exhibit lack-of-fit, against the alternative hypothesis Ha: the model exhibits lack-of-fit. The null hypothesis is rejected if $Q > \chi^2_{1-\alpha,\, df}$.
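The statistic in Equation (6) can be computed in a few lines. The sketch below uses only the Python standard library; the function names and the toy residuals are hypothetical, and the chi-squared comparison is left to a statistical table or library.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return ck / c0

def ljung_box_q(residuals, m):
    """Ljung-Box statistic Q over lags 1..m, as in Equation (6)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, m + 1)
    )

# Toy residuals (hypothetical values, for illustration only).
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ljung_box_q(toy, m=1))  # approximately 1.4
```

For the toy series the lag-1 autocorrelation is 0.4, so Q = 5 × 7 × 0.16 / 4 = 1.4. A large Q relative to the chi-squared critical value would indicate lack of fit.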

In validating the selected model, performance indicators such as the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) are utilised as shown in Equations (7) to (9) respectively [22].

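$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( \hat{X}_t - X_t \right)^2} \tag{7}$$

$$\text{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| \hat{X}_t - X_t \right| \tag{8}$$

$$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{X_t - \hat{X}_t}{X_t} \right| \times 100 \tag{9}$$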
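The three accuracy measures in Equations (7) to (9) can be sketched as plain Python functions. The observed and predicted values below are made up for illustration and are not the paper's data.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, Equation (7)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error, Equation (8)."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

def mape(observed, predicted):
    """Mean absolute percentage error in percent, Equation (9)."""
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n * 100

# Hypothetical values: two months of observed and predicted generation.
observed = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

Here both errors have magnitude 10, so RMSE and MAE are 10, while MAPE averages 10% and 5% to give 7.5%. MAPE is undefined when an observed value is zero, which is not a concern for generation totals.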

Step 6: Model forecast

Once the SARIMA model has been validated, the selected SARIMA model in step 5 is then used to forecast electricity power generated from January 2020 to December 2020. The predicted values are estimated with a 95% confidence level.

3. Results and Discussion

Figure 1 shows the monthly electricity power generated data, which covers 60 months, from January 2015 to December 2019. Although electricity is generated every month, the series shows obvious periodicity and seasonality, which manifests mainly from the last quarter of the year to the first quarter of the subsequent year. This could be attributed to the large amount of power needed for the celebration of seasonal festivities during these periods.

Figure 2 shows the plot of the training dataset together with its ACF and PACF. Both functions decay exponentially with increasing lag and show significant spikes at the seasonal lags, which suggests a SARIMA model.

Figure 1. Electricity power generated in megawatt (MW).

Figure 2. Correlogram functions of the electricity power generated series.

In order to address the unit root problem and develop a reliable model for predicting electricity power generation, seasonal differencing is applied to the training dataset. The 12th differences (seasonal difference at lag 12) of the training dataset are computed, and the ACF and PACF plots of the seasonally differenced series are shown in Figure 3. To check the stationarity of the seasonally differenced data, Table 1 reports the KPSS and ADF tests. As observed, the statistics from both tests suggest that the seasonally differenced data are now stationary.

For the analysis of the stationarity of the series, the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) unit root test results are given in Table 1. From Table 1, the statistic from the KPSS test (1.1768) is less than the critical values at the various significance levels (0.347, 0.463, 0.739), even at the 10% significance level. On the other hand, the statistic from the ADF test (1.4221) is less than the critical values (1.61, 1.95, 2.60). Thus, the hypothesis of the existence of a unit root is not rejected.

Figure 3. Time series, ACF and PACF plots for the seasonally differenced series.

Table 1. Test of stationarity.

From Figure 3, the spike at lag 12 in the ACF is significant, but no other spike is significant at lags that are multiples of 12, while the PACF shows an exponential decay at the seasonal lags, that is 12, 24, 36, etc. Thus, the seasonal part of the model has a moving average term of order 1 and an autoregressive term of order 0. For the non-seasonal part, the ACF tails off after lag 2 and the PACF cuts off after lag 1. Therefore, the non-seasonal part has an autoregressive term of order 1 and a moving average term of order 0. Based on the features portrayed by the plots, an initial SARIMA (1, 0, 0) × (0, 1, 1)12 model is proposed. Neighbouring models were also investigated: the tentative model and its variants were estimated and compared in terms of the AIC, BIC and AICc information loss metrics, as reported in Table 2. As observed, SARIMA (1, 0, 0) × (0, 1, 1)12 was indeed the most suitable among the competing models, with the lowest values of AIC (439.6995), BIC (446.3537) and AICc (440.8759).

Table 2. AIC, BIC and AICc scores for neighbouring models.

Figure 4. Graphical Diagnostics of SARIMA (1, 0, 0) × (0, 1, 1)12 Model: (a) Standardized Residuals, (b) ACF, (c) P-values for the Ljung-Box statistic.

To validate and verify the SARIMA model, Figure 4 shows the standardized residuals, their ACF and the p-values for the Ljung-Box statistic. Panel (a) of Figure 4 suggests that the standardized residuals estimated from this model behave as an independent and identically distributed sequence with a mean of zero and a constant variance. The ACF of the residuals shown in Panel (b) suggests that the autocorrelations are close to zero. This result means that the residuals did not deviate significantly from a zero mean white noise process. Panel (c) shows the p-values for the Ljung-Box statistic. Given the high p-values associated with the statistics, we cannot reject the null hypothesis of independence in this residual series. Thus, we can say that the SARIMA (1, 0, 0) × (0, 1, 1)12 model fits the data well. The residual plots in Figure 4 suggest that the distribution of the residuals of our proposed model is Gaussian (white noise). Hence, our proposed model is justified.

Table 3 shows the parameters of the selected SARIMA (1, 0, 0) × (0, 1, 1)12 model. As observed, all the parameters are statistically significant (p-values < 0.05). This indicates the contribution of the individual parameters to the overall predictive ability of the model. Also, the p-values from the Box-Pierce and Box-Ljung test statistics (0.657 and 0.6475, both > 0.05) further confirm the adequacy of the model, as shown in Figure 4.

Figure 5 shows the plot of the observed monthly electricity power generated and the training predictions from 2015 to 2019. The figure shows that the predicted monthly values tend to follow the observed values quite closely, with RMSE (56.09), MAE (38.785) and MAPE (3.2217%).

Table 3. Parameter estimate of SARIMA (1, 0, 0) × (0, 1, 1)12 model.

Box-Pierce = 0.1972 (0.657); Box-Ljung = 0.2091 (0.6475).

Figure 5. Observed and training predictions.

The selected SARIMA (1, 0, 0) × (0, 1, 1)12 model was then used to forecast monthly electricity power generated from April 2019 to December 2020, as shown in Table 4 and Figure 6. The first part of the forecast values (i.e. from April to December 2019) was used to validate the model by comparison with the reserved test data. The validation produced RMSE (55.8606), MAE (45.454) and MAPE (3.0621%). The forecasting model generated good empirical results, as the forecast values are very close to the observed data.

Table 4. Validation and forecast values of SARIMA (1, 0, 0) × (0, 1, 1)12.

Figure 6. Observed power generated (January 2015-December 2019) and forecast of power generated (January 2020 to December 2020).

4. Conclusion

This study employed the SARIMA modelling technique to forecast the monthly electricity power generated in Ghana. Monthly data (from January 2015 to December 2019) on electricity power generated (MW) in Ghana were first obtained from the Ghana Energy Commission. The data were then partitioned into training (January 2015 to March 2019) and test (April 2019 to December 2019) sets, from which various SARIMA models were developed. The study found that the SARIMA (1, 0, 0) × (0, 1, 1)12 model was the most suitable when compared with the other models and can therefore adequately represent the dynamics of electric power generated in Ghana. All the tests conducted suggest that the model is reasonable for short-term forecasting, with high forecast accuracy based on performance indicators such as the RMSE, MAE and MAPE. Because reliability is fundamental in forecasting, the model was further validated by comparing the forecast values with the observed test values. This validation on the test data produced RMSE (55.8606), MAE (45.454) and MAPE (3.0621%). The model was then used to forecast the monthly electricity generated (MW) in Ghana for the year 2020. The SARIMA forecast model can serve as a useful tool that provides information to support policy makers and the energy sector.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Adusei, L. (2012) Energy Security and the Future of Ghana.
http://newsghana.com.gh/energy-security-and-the-future-of-ghana/
[2] Ghana Growth and Development Platform (2015) Of “Dumsor” and Ghana’s Energy Sector Challenges: Part 3.
http://ghanagdp.org/
[3] Energy Commission of Ghana (2016) 2016 Energy (Supply and Demand) Outlook for Ghana—FINAL.
http://www.energycom.gov.gh/files/Energy%20Commission%20-%202016Energy%20Outlook%20for%20Ghana_final.pdf
[4] Ghana Energy Commission (2018) Ghana Wholesale Electricity Market Bulletin— December 2018. Technical Report 36.
[5] United Nations Statistical Commission (2017) Revised List of Global Sustainable Development Goal Indicators. Report of the Inter-Agency and Expert Group on Sustainable Development Goal Indicators (E/CN. 3/2017/2).
[6] Owusu, P. and Asumadu-Sarkodie, S. (2016) A Review of Renewable Energy Sources, Sustainability Issues and Climate Change Mitigation. Cogent Engineering, 3, Article ID: 1167990.
https://doi.org/10.1080/23311916.2016.1167990
[7] Farquharson, D., Jaramillo, P. and Samaras, C. (2018) Sustainability Implications of Electricity Outages in Sub-Saharan Africa. Nature Sustainability, 1, 589-597.
https://doi.org/10.1038/s41893-018-0151-8
[8] United Nations in Ghana (n.d.) 2030 Agenda for Sustainable Development.
http://gh.one.un.org/content/unct/ghana/en/home/global-agenda-in-ghana/sustainable-developmentgoals.html
[9] Awoponea, A.K., Zobaa, A.F. and Banuenumah, W. (2017) Assessment of Optimal Pathways for Power Generation System in Ghana. Cogent Engineering, 4, Article ID: 1314065.
https://doi.org/10.1080/23311916.2017.1314065
[10] Katara, S., Faisal, A. and Engman, G.M. (2014) A Time Series Analysis of Electricity Demands in Tamale, Ghana. International Journal of Statistics and Applications, 4, 267-275.
[11] Twumasi-Ankrah, S. and Ankrah, I. (2015) Prediction of Electricity Consumption in Ghana: Long or Short Memory. Advances in Economics and Business, 3, 107-117.
https://doi.org/10.13189/aeb.2015.030304
[12] World Bank (2020) Consumption of Electrical Energy (kWh per capita).
https://www.worlddata.info/africa/ghana/energy-consumption.php
[13] Shumway, R.H. and Stoffer, D.S. (2011) Time Series Analysis and Its Applications with R Examples. 3rd Edition, Springer, New York.
[14] McCleary, R., Hay, R.A., Meidinger, E.E. and McDowall, D. (1980) Applied Time Series Analysis for the Social Sciences. Sage Publications, Beverly Hills.
[15] Box, G.E.P. and Pierce, D.A. (1970) Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. Journal of the American Statistical Association, 65, 1509-1526.
https://doi.org/10.1080/01621459.1970.10481180
[16] Chan, K. (2008) Time Series Analysis with Applications in R. Springer Science, New York.
[17] Canova, F. and Hansen, B.E. (1995) Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability. Journal of Business & Economic Statistics, 13, 237-252.
https://doi.org/10.1080/07350015.1995.10524598
[18] Sugiura, N. (1978) Further Analysts of the Data by Akaike’s Information Criterion and the Finite Corrections. Communications in Statistics—Theory and Methods, 7, 13-26.
https://doi.org/10.1080/03610927808827599
[19] Diebold, F.X. (1998) Elements of Forecasting. South-Western College Publishing, Cincinnati.
[20] Abraham, B.J. and Ledolter, R.A. (2000) Introduction to Time Series and Forecasting. John Wiley and Sons, New York.
[21] Ljung, G. and Box, G. (1978) On a Measure of Lack of Fit in Time Series Models. Biometrika, 65, 297-303.
https://doi.org/10.1093/biomet/65.2.297
[22] Wiah, E.N. and Twumasi-Ankrah, S. (2017) Impact of Climate Change on Cocoa Yield in Ghana Using Vector Autoregressive Model. Ghana Journal of Technology, 1, 32-39.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.