ANN-Time Varying GARCH Model for Processes with Fixed and Random Periodicity

Abstract

Financial time series forecasting is an important tool to support both individual and organizational decisions. Periodic phenomena are very common in econometrics, and many models have been built to capture periodic trends as a way of enhancing the forecasting of future events and guiding business and social activities. Real-world systems are characterized by many uncertain fluctuations, which makes prediction difficult; when randomness is mixed with periodicity, prediction is even harder. We therefore constructed an ANN-Time Varying GARCH model with both linear and non-linear attributes, specific to processes with fixed and random periodicity. To eliminate the need for filtering the linear component of the time series, we incorporated an Artificial Neural Network (ANN) and constructed a time varying GARCH model on its disturbances. We developed the estimation procedure for the ANN-Time Varying GARCH model parameters using nonparametric techniques.

Share and Cite:

Karuiru, E., Kihoro, J., Mageto, T. and Waititu, A. (2021) ANN-Time Varying GARCH Model for Processes with Fixed and Random Periodicity. Open Journal of Statistics, 11, 673-689. doi: 10.4236/ojs.2021.115040.

1. Introduction

Periodic phenomena such as business and temperature cycles appear very often in our daily life. Many models have been built to capture these periodic trends as a way of enhancing the forecasting of future events and guiding business and social activities. Real-world systems are characterized by many uncertain fluctuations, which makes prediction difficult. Prediction is harder still when periodicity and randomness are mixed, especially when the noise is so large that the periodicity is obscured. Currently, time series analysis involves describing the observed values as a path of a stationary process by fitting the data with different non-structural models. One of the most used models is the Autoregressive Moving Average (ARMA) model. The ARMA model was introduced by Whittle in 1951 and popularised by Box and Jenkins in 1970. The model describes a time series as a stationary stochastic process that linearly combines an autoregressive polynomial relating the current data to past data and a moving average polynomial of past noise terms up to lag q. When the time series is not stationary, de-trending and de-seasoning are needed. Consider the classical time series decomposition

$Y(t) = T(t) + S(t) + C(t) + X(t)$ (1)

where Y(t) is the observed data, T(t) is the trend component, S(t) is the seasonal component, C(t) is the cyclic component and X(t) is the noise term. The ARMA model is applied to fit the noise term [1].

This method is quite limited due to the inflexible assumptions placed on the seasonal component. Franses in [2] recommends that much attention should be paid to the seasonal fluctuations when dealing with econometric data, because randomness in the seasonal variation can more precisely explain the behaviour of economic agents. Franses in [2] also observed that the non-seasonal fluctuations in many periodically observed macroeconomic time series are non-stationary over time, suggesting possible dependence between seasonal and non-seasonal fluctuations in some time series. The classical decomposition model is hence seen to be limited. In order to solve this problem, many more sophisticated models have been built based on the ARMA model.

The Autoregressive Conditional Heteroskedasticity (ARCH) model is another commonly used model in financial time series. ARCH is a stationary non-linear model. An ARCH(q) model first fits the stationary process with an AR(q) model and then expresses the variance of the residual term as a q-th order autoregressive polynomial in the past squared residuals up to lag q. The model was first proposed by Engle in 1982 and later generalised by Bollerslev and Taylor in 1986 into the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model.

These models have widened the range of real-world time series problems that can be addressed. In periodic phenomena, however, randomness and periodicity are still estimated separately. The GARCH model, for example, fits a periodic function to the seasonal trend and adds it linearly to the ARMA-type process of the residuals. This is a limitation of the model.

The concept of random periodic processes describes randomness and periodicity in the evolution of a stochastic process simultaneously. It therefore inspires us to apply random periodic processes to classical time series analysis as a way of describing random periodic phenomena.

2. Statement of the Problem

Financial time series forecasting is an important tool to support both individual and organizational decisions. The existing forecasting models, which include the ARMA and ARCH models and their various extensions, are built under linearity and non-linearity assumptions respectively. These assumptions limit their ability to forecast complex real-world processes that exhibit both linear and non-linear properties. Further, the existing models concentrate only on random processes without considering that some processes exhibit random periodic properties; some processes also exhibit a mixture of fixed and random periodicity. This study therefore constructed a model with both linear and non-linear attributes, specific to processes with fixed and random periodicity. To eliminate the need for time series linear component filters, we incorporated an Artificial Neural Network (ANN), which requires less background knowledge about the data.

The ANN-Time Varying GARCH model for processes with fixed and random periodicity will be applicable in financial markets whose products fluctuate periodically and are also affected by other periodic but random factors. For retailers, for instance, the model may reveal that consumer demand for winter clothes spikes at a distinct time period each year, information that would be important in forecasting production and delivery requirements.

3. Literature Review

Financial time series analysis involves developing a model that describes the relationship between a variable and its past observations. This model is then used to predict the future when the time series is extrapolated. This approach is applied when limited knowledge about the data-generating process is available or there is no satisfactory explanatory model that relates the prediction variable to the explanatory variables [3]. Time series forecasting has attracted many researchers, leading to the development and improvement of many models. In time series forecasting, model selection is critical since it determines the accuracy of the forecasts. The choice between a linear and a non-linear model depends on the relationship between the current value and the past observations [1].

The concept of random periodicity was introduced by Feng and Zhao in 2015. They observed that many periodic real-world systems are subject to the influence of randomness and proposed that the mix of periodicity and randomness may be better described by a random periodic motion than by a periodic motion. Liu in [4] regarded a stationary process as a special kind of random periodic process. Random periodic processes describe a mixed structure of seasonal and random patterns [5]. Liu in [4] used random periodic processes to replace stationary processes in the analysis of time series as a way of enlarging the range of application and enhancing the accuracy of ARMA model estimation. Zhao in [6] established an ergodic theory in the random periodic regime with periodic measures and random periodic processes.

In his dissertation, [4] gave a counterexample in which the classical time series additive structure is not sufficient because there is a seasonal pattern in the irregular component. In the example, the de-seasoned data were stationary and satisfied an autoregressive equation. The autocorrelations and partial autocorrelations of the residuals were near zero, indicating independence of the noise. However, the autocorrelations and partial autocorrelations of the squared residuals showed an obvious periodic pattern, indicating that the volatility of the noise depends on time. He also observed that the Shapiro-Wilk test rejected the normality hypothesis with a very small p-value, but when applied to the periodic-point sequence of residuals the test accepted normality. He therefore concluded that the standard procedure of analysing a time series by the classical decomposition fails to obtain the correct result when seasonality and randomness are mixed, and he developed an ARMA model for random periodic processes to model the seasonal and irregular components simultaneously.

We extend the work of [4] by considering that some processes also exhibit a mixture of fixed and random periodicity. We therefore constructed a model for processes with fixed and random periodicity to model financial time series which exhibit the heteroscedastic property. To eliminate the need for time series linear component filters, we incorporated an Artificial Neural Network (ANN), which requires less background knowledge about the data.

4. Methods

1) Random Periodic Process

A random periodic process of period τ of the random dynamical system $\Phi: \mathbb{R}^{+} \times \Omega \times M \to M$ is an $\mathcal{F}$-measurable map $R: \mathbb{R}^{+} \times \Omega \to M$ such that for almost all $\omega \in \Omega$,

$\Phi(t, \theta(s)\omega) R(s, \omega) = R(t + s, \omega), \qquad R(s + \tau, \omega) = R(s, \theta(\tau)\omega)$ (2)

for any $s, t$.

For a statistical description, we usually do not know the exact expression of the dynamical system driving the time series. We therefore take only the second equation in (2) as the definition of a random periodic process, while the first part is hidden in the evolution of the time series [4]. For a detailed definition of a random periodic path, its properties and the periodic measure, refer to [5].

2) Decomposition of Time Series with Fixed and Random Periodicity

A time series is normally decomposed into four components, namely trend, seasonal, cyclical and random. When these four components are believed to be independent, an additive model is employed; when the components are not necessarily independent, a multiplicative model is assumed [1]. Consider the classical decomposition model of a time series

$Y(t) = T(t) + S(t) + C(t) + X(t)$ (3)

where Y(t) is the observed data, T(t) is the trend, S(t) is the seasonal component, C(t) is the cyclical component and X(t) is the random component. In some situations there is a seasonal pattern in the irregular component, for which the classical additive structure is not sufficient [4]. The inability of the classical decomposition to capture the mixed structure of the seasonal and noise components inspires us to modify it. We use a random periodic process to describe the seasonal and noise components simultaneously. The modified model is as follows:

$Y(t) = T(t) + S_f(t) + C(t) + R(t)$ (4)

where

$R(t) = S(t) X(t)$ (5)

and S_f(t) is the fixed periodic component. Therefore R(t) is a random periodic process with the following properties:

1) $E[R(t)^2] < \infty$.

2) $E[R(t)]$ is a deterministic periodic function of time t.

3) The autocovariance function of a random periodic process is a periodic function of time in both of its arguments.

Y(t) is therefore a process with fixed and random periodicity.

4.1. Periodic Time Series Models

Introduction

The introduction of periodic models into economics dates back to the late 1980s. At that time, the focus was on describing trending consumption and income data and on the use of periodic models for out-of-sample forecasting.

Fixed periodic models assume that the coefficients of the underlying model are purely repetitive, that is, they vary with the fixed period s. The number of coefficients is therefore s times the number of coefficients in the standard stochastic model, and a fixed periodic model permits a different standard stochastic model for each point of the fixed period s. Random periodic models, on the other hand, have all coefficients varying with time t; all the coefficients are different from each other even if they lie within the same random period τ. The major difference between fixed and random periodicity is therefore that the same time points in different periods behave identically under fixed periodicity, while under random periodicity there are small fluctuations between different periods.

Figures 1-3 give graphical representations of fixed periodicity, random periodicity, and a combination of fixed and random periodicity.

Figure 1. Periodic time series with fixed periodicity s = 10.

Figure 2. Periodic time series with random periodicity τ = 10.

Figure 3. Periodic time series with fixed periodicity s = 10 and random periodicity τ = 10.
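To make the distinction concrete, the following sketch simulates series of the kind shown in Figures 1-3. It is illustrative only: the sinusoidal shape, the amplitudes and the per-period perturbations are assumptions, not the data-generating processes actually used for the figures.

```python
import numpy as np

rng = np.random.default_rng(42)


def fixed_periodic(n, s=10, amplitude=2.0):
    """Deterministic periodic component with fixed period s (as in Figure 1)."""
    t = np.arange(n)
    return amplitude * np.sin(2 * np.pi * t / s)


def random_periodic(n, tau=10, amplitude=2.0, noise_sd=0.3):
    """Periodic component of period tau whose amplitude and phase fluctuate
    slightly from one period to the next (as in Figure 2)."""
    t = np.arange(n)
    out = np.empty(n)
    for start in range(0, n, tau):
        a = amplitude + rng.normal(0, noise_sd)   # per-period amplitude
        phase = rng.normal(0, 0.2)                # per-period phase shift
        block = t[start:start + tau]
        out[start:start + tau] = a * np.sin(2 * np.pi * block / tau + phase)
    return out


n = 200
y_fixed = fixed_periodic(n, s=10)                           # Figure 1
y_random = random_periodic(n, tau=10)                       # Figure 2
y_both = fixed_periodic(n, s=10) + random_periodic(n, 10)   # Figure 3
```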

4.2. Model for Processes with Fixed and Random Periodicity

4.2.1. ANN-Time Varying GARCH Model

Next we construct an ANN-Time Varying GARCH model for processes with fixed and random periodicity. Define the process Y(t) with fixed periodicity s and random periodicity τ to be an ANN-Time Varying GARCH process if, for each t,

$Y_t = \mu_t + R_t,$ (6)

where

$\mu_t = \alpha_0 + \sum_{j=1}^{m} \alpha_j f\left( V_j + \sum_{i=1}^{d} V_{ij} Y_{t-is} \right),$ (7)

$R_t = \sigma_t Z_t,$ (8)

such that

$\sigma_t^2 = w(t) + \sum_{i=1}^{p} \phi_i(t) R_{t-i}^2 + \sum_{i=1}^{q} \theta_i(t) \sigma_{t-i}^2$ (9)

and Z_t is i.i.d. white noise. Therefore, the model is an ANN model with time varying GARCH disturbances, where:

1) α_j (j = 1, 2, ..., m) and V_ij (j = 1, 2, ..., m and i = 1, 2, ..., d) are the hidden and input connection weights respectively.

2) α_0 and V_j (j = 1, 2, ..., m) are the output and hidden layer connection biases respectively.

3) d is the number of nodes in the input layer.

4) m is the number of nodes in the hidden layer.

5) f is the hidden layer transfer function.

6) R_t is a random periodic process.

7) w(·), ϕ(·) and θ(·) are non-negative functions of time t.

Hence, the ANN model of (7) performs a nonlinear functional mapping from the past observations Y_{t-s}, Y_{t-2s}, Y_{t-3s}, ... to the future value Y_t, so the connection weights of the model vary with the period s.

Equations (8) and (9) define a time varying GARCH(p, q) process with time varying parameters w(t), ϕ_i(t) and θ_i(t).

The estimation of the model parameters will be done using nonparametric techniques.
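As an illustration of Equations (6)-(9), the following sketch simulates an ANN-Time Varying GARCH process with one hidden layer, tanh activation and p = q = 1. All dimensions, weights and parameter functions are assumed for demonstration and are not estimates from any data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and fixed period (all values are assumptions).
n, s, d, m = 500, 10, 3, 4

# ANN weights for the conditional mean (Equation 7): hypothetical values.
alpha0 = 0.1
alpha = rng.normal(0, 0.5, size=m)       # hidden-to-output weights
V_bias = rng.normal(0, 0.5, size=m)      # hidden-layer biases
V = rng.normal(0, 0.5, size=(m, d))      # input-to-hidden weights

# Smooth time varying GARCH(1,1) parameter functions (Equation 9), assumed.
w = lambda u: 0.05 + 0.02 * np.sin(2 * np.pi * u)
phi1 = lambda u: 0.10 + 0.05 * u
theta1 = lambda u: 0.80 - 0.10 * u

Y = np.zeros(n)
sigma2 = np.full(n, 0.1)
R = np.zeros(n)

for t in range(d * s, n):
    u = t / n
    # Conditional mean from the lagged observations Y_{t-s}, ..., Y_{t-ds} (Eq. 7).
    lags = np.array([Y[t - i * s] for i in range(1, d + 1)])
    mu_t = alpha0 + alpha @ np.tanh(V_bias + V @ lags)
    # Time varying GARCH(1,1) disturbance (Equations 8 and 9).
    sigma2[t] = w(u) + phi1(u) * R[t - 1] ** 2 + theta1(u) * sigma2[t - 1]
    R[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    Y[t] = mu_t + R[t]
```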

4.2.2. Assumptions of the Model

1) Y(t) is a non-stationary process which can be decomposed as follows:

$Y(t) = T(t) + S_f(t) + C(t) + R(t)$ (10)

where T(t) is the trend, S_f(t) is the fixed periodic component with period s, C(t) is the cyclical component and R(t) is the random periodic component with period τ.

2) R(t) is a random periodic process with the following properties:

a) $E[R(t) \mid \mathcal{F}_{t-1}] = 0$.

b) $\mathrm{Var}[R(t) \mid \mathcal{F}_{t-1}] = \sigma_t^2$.

c) The autocovariance function of the random periodic process R(t) is a periodic function of time in both of its arguments:

$\gamma_R(t, r) = \gamma_R(t + \tau, r + \tau)$ (11)

for any t, r.

Proof:

$\gamma_R(t + \tau, r + \tau) = E[(R(t + \tau, \omega) - m(t + \tau))(R(r + \tau, \omega) - m(r + \tau))]$
$= E[(R(t, \theta(\tau)\omega) - m(t))(R(r, \theta(\tau)\omega) - m(r))]$
$= E[(R(t, \omega) - m(t))(R(r, \omega) - m(r))]$
$= \gamma_R(t, r).$ (12)

3) The conditional variance σ_t^2 of R_t, given the information available up to time t-1, has an autoregressive structure and is positively correlated with its own recent past and with recent values of the squared returns R_t^2. This captures the idea of volatility (conditional variance) being "persistent": large (small) values of R_t^2 are likely to be followed by large (small) values.

4.3. Determination of Parameters

4.3.1. Determination of ANN Parameters

The Artificial Neural Network model has a number of parameters to be estimated prior to its application. The topology of the ANN is shown in Figure 4.

Figure 4. Topology of artificial neural network.

Further, an appropriate activation function needs to be selected before the parameters are estimated. The parameters include the biases, the weights connecting the inputs with the hidden nodes V_ij (j = 1, 2, ..., m and i = 1, 2, ..., d), the weights linking the hidden nodes to the output node α_j (j = 1, 2, ..., m), the number of input nodes d and the number of hidden nodes m.

Determining the number of input nodes in the input layer and the number of nodes in the hidden layer are the major problems in artificial neural network specification. Problems that can occur due to poor selection of these parameters include increased input dimensionality, increased computational complexity, increased memory requirements, increased learning difficulty, mis-convergence, poor model accuracy and difficulty in interpreting results from complex models [7].

Activation function

The activation function is a mathematical "gate" between the input feeding the current neuron and its output going to the next layer [8] [9]. It can be as simple as a step function that turns the neuron output on and off depending on a rule or threshold, or it can be a transformation that maps the input signals into the output signals needed for the neural network to function. Common nonlinear activation functions include the sigmoid/logistic function, defined as:

$f(x) = \frac{1}{1 + \exp(-x)}$ (13)

The logistic function has the advantages of a smooth gradient, bounded outputs and clear predictions. However, it is computationally expensive, not zero centred and suffers from a vanishing gradient. The hyperbolic tangent function is defined as

$\tanh(x) = 2\left( \frac{1}{1 + \exp(-2x)} \right) - 1,$ (14)

and unlike the sigmoid function it is zero centred. Other activation functions include the Rectified Linear Unit (ReLU), which is computationally efficient despite suffering from the dying ReLU problem. The Leaky ReLU function, despite solving the dying ReLU problem, does not provide consistent predictions for negative input values. The parametric ReLU, unlike the Leaky ReLU, provides the slope of the negative part of the function as an argument, so backpropagation can be performed. We adopt the hyperbolic tangent activation function (14) since it is zero centred and bounded.
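For reference, the two activation functions of Equations (13) and (14) can be written directly; the snippet below is a minimal sketch and the function names are ours.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation, Equation (13): bounded in (0, 1), not zero centred."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh_act(x):
    """Hyperbolic tangent, Equation (14): bounded in (-1, 1) and zero centred."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # identical to np.tanh(x)

x = np.linspace(-3, 3, 7)
print(sigmoid(x))
print(np.allclose(tanh_act(x), np.tanh(x)))  # True
```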

Bias and Connection Weights

Once the activation function is defined, we select an appropriate training/learning algorithm. The parameter space includes the matrix W_1 = ((V_ij)) of weights connecting the inputs with the hidden neurons, the matrix W_2 = ((α_j)) of weights linking the hidden neurons to the output neuron, and a vector of biases b. To estimate Ψ = (W_1, W_2, b), we minimize the following error function:

$E(\Psi) = \sum_{t=1}^{N} (Y_t - \hat{Y}_t)^2$ (15)

The optimization techniques for minimizing the error function (15) are referred to as learning rules [7]. The generalised delta rule, also referred to as error back propagation, is the best-known method; it continuously modifies the connection weights to reduce the difference between the required and the actual output. For the learning process, the data must be divided into two sets: the training data set, which is used to calculate the error gradients and to update the weights, and the validation data set, which allows selecting the optimum number of iterations to avoid overlearning.

In this method the weights are adjusted as follows:

$V^{r+1} = V^{r} + \Delta V, \qquad \alpha^{r+1} = \alpha^{r} + \Delta \alpha$ (16)

The weights at iteration r + 1 are then given by:

$\alpha_j^{r+1} = \alpha_j^{r} - \lambda_1 \frac{\partial E(\Psi)}{\partial \alpha_j},$ (17)

for j = 1, ..., m, where λ_1 is the step gain.

Similarly,

$V_{ij}^{r+1} = V_{ij}^{r} - \lambda_2 \frac{\partial E(\Psi)}{\partial V_{ij}},$ (18)

for i = 1, ..., d; j = 1, ..., m, where λ_2 is the step gain. The weights are adjusted until a stopping criterion is met. This method is, however, slow and unstable.
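A minimal sketch of one pass of the generalised delta rule for the single-hidden-layer network of Equation (7), using the squared error (15) and the updates (17)-(18), is given below. The tanh activation, the synthetic data and the step gains are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(X, V, v_bias, alpha, alpha0):
    """One-hidden-layer network of Equation (7): tanh hidden layer, linear output."""
    H = np.tanh(v_bias + X @ V.T)       # hidden activations, shape (N, m)
    return H, alpha0 + H @ alpha        # predictions, shape (N,)

def backprop_epoch(X, y, V, v_bias, alpha, alpha0, lam1=0.001, lam2=0.001):
    """One pass of the generalised delta rule: move the weights against the
    gradient of the squared error E(Psi) of Equation (15)."""
    H, y_hat = forward(X, V, v_bias, alpha, alpha0)
    err = y_hat - y                                   # (N,)
    # Gradients with respect to the output-side parameters (Equation 17).
    g_alpha = 2 * H.T @ err
    g_alpha0 = 2 * err.sum()
    # Gradients with respect to the input-to-hidden weights (Equation 18).
    delta_h = (2 * err[:, None] * alpha[None, :]) * (1 - H ** 2)  # (N, m)
    g_V = delta_h.T @ X
    g_vbias = delta_h.sum(axis=0)
    return (V - lam2 * g_V, v_bias - lam2 * g_vbias,
            alpha - lam1 * g_alpha, alpha0 - lam1 * g_alpha0)

# Tiny illustrative run on synthetic data.
N, d, m = 100, 3, 4
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)
V, v_bias = rng.normal(0, 0.3, (m, d)), np.zeros(m)
alpha, alpha0 = rng.normal(0, 0.3, m), 0.0
for _ in range(200):
    V, v_bias, alpha, alpha0 = backprop_epoch(X, y, V, v_bias, alpha, alpha0)
```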

This rule has various variations such as gradient descent with momentum, gradient descent with adaptive learning rate, quasi-Newton, conjugate gradient, scaled conjugate gradient and Levenberg-Marquardt. In spite of the various modified back propagation training algorithms, some crucial limitations of the standard back propagation technique, such as a slow convergence rate, remain unresolved [3]. Standard batch backpropagation is the most popular training method of all, but it is slow, unreliable, and requires tuning of the learning rate, which can be a tedious process [7]. We hence discuss and apply the quasi-Newton method, which is fast and reliable.

The quasi-Newton method starts with an initial set of weights Ψ_0 from which S^2(Y_i, Ψ_0) is determined. Using second order Taylor expansion principles, S^2(Y_i, Ψ_1) can then be found, where

$\Psi_1 = \Psi_0 - B_0^{-1} A_0$ (19)

where $B_0^{-1} A_0$ represents the change direction: B_0 determines the direction angle while A_0 determines the direction size. The off-diagonal elements of the matrix B_0 are evaluated as

$B_0 = \frac{\partial^2 S^2(Y_i, \Psi_0)}{\partial \Psi_{0,d} \, \partial \Psi_{0,m}},$ (20)

while the diagonal elements of the matrix B_0 are evaluated as

$B_0 = \frac{\partial^2 S^2(Y_i, \Psi_0)}{(\partial \Psi_{0,d})^2}.$ (21)

The minimization then continues from iteration 1 to 2 and onwards until the stopping criterion is met. Generally, the iterations are given by:

$\Psi_{r+1} = \Psi_r - B_r^{-1} A_r.$ (22)

However, the Hessian matrix B may become singular, in which case $B_r^{-1}$ would be undefined. The method solves this problem by numerically approximating B_r and redefines the general equation as

$\Psi_{r+1} = \Psi_r - M_r B_r^{-1} A_r,$ (23)

where M_r is the step length. By letting

$a_r = \Psi_{r+1} - \Psi_r = -M_r B_r^{-1} A_r,$ (24)

represent the change in parameters in the rth iteration and

$b_r = A_{r+1} - A_r,$ (25)

the change in the gradient, the method requires that

$B_{r+1} a_r = b_r.$ (26)

Therefore, B r + 1 is the ratio of the change in the gradient to the change in the parameters.

The stopping rules commonly used are:

1) $\|\Psi_{r+1} - \Psi_r\| < \varepsilon$, for ε > 0 but small.

2) $|S(Y, \Psi_{r+1}) - S(Y, \Psi_r)| < \varepsilon$, for ε > 0 but small.

3) $S(Y, \Psi_{r+1})$ is less than a specified lower bound.

4) r is greater than a specified number of iterations.

It is worth noting that rule (4) can be applied together with rule (1), rule (2) and rule (3).
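In practice the quasi-Newton iteration is rarely hand-coded. The sketch below minimizes the error function with scipy.optimize.minimize using the BFGS update, which builds the approximation of B_r from gradient changes in the spirit of Equations (24)-(26); the network size, the synthetic data and the stopping tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, d, m = 100, 3, 4
X = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

def unpack(psi):
    """Split the flat parameter vector Psi = (W1, b, W2, alpha0)."""
    V = psi[:m * d].reshape(m, d)
    v_bias = psi[m * d:m * d + m]
    alpha = psi[m * d + m:m * d + 2 * m]
    alpha0 = psi[-1]
    return V, v_bias, alpha, alpha0

def error(psi):
    """Squared error E(Psi) of Equation (15) for the one-hidden-layer ANN."""
    V, v_bias, alpha, alpha0 = unpack(psi)
    y_hat = alpha0 + np.tanh(v_bias + X @ V.T) @ alpha
    return np.sum((y - y_hat) ** 2)

psi0 = rng.normal(0, 0.3, size=m * d + 2 * m + 1)   # initial weights Psi_0
# gtol and maxiter are BFGS stopping controls, broadly analogous to the
# stopping rules listed above (small change / iteration cap).
res = minimize(error, psi0, method="BFGS",
               options={"gtol": 1e-6, "maxiter": 500})
V_hat, v_bias_hat, alpha_hat, alpha0_hat = unpack(res.x)
```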

Number of nodes in Input Layer

The number of input nodes is the number of lags that will be considered as inputs in the neural network input layer. To determine which lags to include in the input set of variables, autocorrelation and partial autocorrelation analysis together with the AIC and its variations have been used, but they have not been very helpful [7]. Other methods such as network pruning have been developed, but they suffer from the cost of computing the elements of the Hessian matrix, which are the second derivatives of the error function with respect to the training weights.

The automatic relevance determination (ARD) method developed by Mackay (1992) and Neal (1996) defines a prior over the regression parameters that embodies the concept of uncertain relevance, so that the model is effectively able to infer which variables are relevant and switch the others off, thus preventing those inputs from causing significant over-fitting [7]. This method may be applied to any univariate time series problem if an autoregressive neural network (ARNN) is to be fitted to the data. An improvement of Mackay's ARD method uses a subset of the weights, with a clear-cut method of selecting the set to ensure that the effects of a particular input lag are well represented [10]. This method is computationally fast, as it requires no evaluation of the eigenvalues of the Hessian matrix, which is not easy [7].

Number of nodes in Hidden Layer

A multi-layer feedforward network with at least one hidden layer and a sufficient number of hidden nodes is powerful enough to represent any form of time series [7]. Deciding the number of neurons in the hidden layers is a very important part of deciding the overall neural network architecture. Using too few neurons in the hidden layers results in underfitting, which occurs when there are too few neurons to adequately detect the signals in a complicated data set. Using too many neurons in the hidden layers can result in several problems. First, it may result in overfitting, which occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers [11]. Secondly, a large number of neurons in the hidden layers increases the time it takes to train the network.

Several researchers have proposed rules of thumb for determining an optimal number of hidden units. One rule of thumb is that the size of the hidden layer should be somewhere between the input layer size and the output layer size [12]. Blum in [12] also suggested that the size of the hidden layer should never be more than twice as large as the input layer. Another rule of thumb is that there should be as many hidden nodes as the number of dimensions needed to capture 70 - 90 percent of the variance of the input data set [6]. However, most of these rules are not applicable in most circumstances as they do not consider the training set size and the complexity of the data set to be learnt [11].

In another selection algorithm, a critical value is first chosen arbitrarily. The final structure is built up through an iteration in which a new node is created in the hidden layer when the training error is below the critical value [13]. On the other hand, [14] proposed an approach similar to [13] but which removes nodes when small error values are reached. Rivals in [15] provided a selection procedure for neural networks based on least squares estimation and statistical tests. Xu and Chen in [11] developed a mathematical method for determining the number of nodes in the hidden layer:

$m = C \left( \frac{N}{d \log N} \right)^{1/2}$ (27)

where m is the number of hidden nodes, d is the input dimension of the target function, N is the number of training pairs, and C is a constant that does not depend on any function. Another method is the geometric pyramid rule, which roughly approximates the number of hidden neurons using the following function:

$m = \sqrt{d \cdot O},$ (28)

where m is the number of hidden neurons, d is the number of input nodes and O is the number of outputs.

The most commonly used rule for determining the size of the hidden layer is to train a network successfully with one hidden node, then two, and so on, while monitoring the error on the validation data set. This error decreases with every increment in the number of hidden nodes m until overfitting begins. At this point training is stopped and this m is taken as the best choice.
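The rules of thumb above can be summarised in code. The sketch below implements Equations (27) and (28) and the grow-and-validate rule; the callable train_and_validate is a hypothetical placeholder for whatever training routine is used.

```python
import numpy as np

def hidden_nodes_xu_chen(N, d, C=1.0):
    """Rule of Equation (27): m = C * (N / (d * log N)) ** 0.5."""
    return int(round(C * np.sqrt(N / (d * np.log(N)))))

def hidden_nodes_pyramid(d, O=1):
    """Geometric pyramid rule of Equation (28): m = sqrt(d * O)."""
    return int(round(np.sqrt(d * O)))

def grow_hidden_layer(train_and_validate, m_max=20):
    """Most common rule: add hidden nodes one at a time and stop as soon as
    the validation error stops improving (onset of overfitting).

    `train_and_validate(m)` is a user-supplied callable returning the
    validation error of a network with m hidden nodes (hypothetical here)."""
    best_m, best_err = 1, train_and_validate(1)
    for m in range(2, m_max + 1):
        err = train_and_validate(m)
        if err >= best_err:          # validation error no longer improving
            break
        best_m, best_err = m, err
    return best_m

print(hidden_nodes_xu_chen(N=500, d=3))   # about 5 for 500 training pairs
print(hidden_nodes_pyramid(d=9, O=1))     # 3
```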

4.3.2. Time Varying GARCH Parameters

Next we consider the following model

$R_t = \sigma_t Z_t,$ (29)

such that

$\sigma_t^2 = w(t) + \sum_{i=1}^{p} \phi_i(t) R_{t-i}^2 + \sum_{i=1}^{q} \theta_i(t) \sigma_{t-i}^2$ (30)

and Z_t is i.i.d. white noise.

Equations (29) and (30) define a time varying GARCH(p, q) model. To obtain a meaningful asymptotic theory, the domain of the parameter functions is rescaled to the unit interval. Therefore, we study the following process:

$\sigma_t^2 = w\left(\frac{t}{n}\right) + \sum_{i=1}^{p} \phi_i\left(\frac{t}{n}\right) R_{t-i}^2 + \sum_{i=1}^{q} \theta_i\left(\frac{t}{n}\right) \sigma_{t-i}^2, \quad t = 1, 2, \ldots, n$ (31)

The stochastic process R_t is said to be a time varying GARCH process if it satisfies (31). The conditions $w(u), \phi_i(u), \theta_i(u) \geq 0$ for $u \in (0, 1]$ ensure non-negativity of σ_t^2. We define $w(u), \phi_i(u), \theta_i(u) = 0$ for u < 0. Rescaling of this kind does not affect the estimation procedure and is common in nonparametric regression [10].

Rohan and Ramanathan in [16] developed a two-step local polynomial nonparametric estimation procedure for the time varying GARCH(1, 1) process. We generalise this local polynomial nonparametric procedure to estimate the parameters of the time varying GARCH(p, q) process.

We first state the following technical assumptions:

1) There exists δ > 0 such that

$0 < \sum_{i=1}^{p} \phi_i(u) + \sum_{i=1}^{q} \theta_i(u) \leq 1 - \delta, \quad 0 < u \leq 1$ (32)

2) There exist finite constants $M_1, M_2, \ldots, M_{1+p+q}$ such that for all $u_1, u_2 \in (0, 1]$,

$|w(u_1) - w(u_2)| \leq M_1 |u_1 - u_2|$

$|\phi_1(u_1) - \phi_1(u_2)| \leq M_2 |u_1 - u_2|$

$\vdots$

$|\phi_p(u_1) - \phi_p(u_2)| \leq M_{1+p} |u_1 - u_2|$

$|\theta_1(u_1) - \theta_1(u_2)| \leq M_{2+p} |u_1 - u_2|$

$\vdots$

$|\theta_q(u_1) - \theta_q(u_2)| \leq M_{1+p+q} |u_1 - u_2|$

Assumption (1) is similar to the stationarity condition for the GARCH(p, q) model; a well defined and unique solution for the time varying GARCH(p, q) process requires this condition to hold. Local stationarity of the time varying GARCH(p, q) process requires the Lipschitz continuity condition on the parameters defined in assumption (2). We do not make any assumptions on the density function of R_t.

We now define a stationary GARCH(p, q) process which locally approximates the original time varying GARCH(p, q) process (31) in the neighbourhood of a fixed point u_0. Let R_t(u_0), $u_0 \in (0, 1]$, be a process with the following properties:

1) $E[R_t(u_0) \mid \mathcal{F}_{t-1}] = 0$.

2) $\mathrm{Var}[R_t(u_0) \mid \mathcal{F}_{t-1}] = \sigma_t^2(u_0)$.

Then R_t(u_0) is said to follow a stationary GARCH(p, q) process associated with (31) at the time point u_0 if it satisfies

$R_t(u_0) = \sigma_t(u_0) Z_t,$ (33)

such that,

$\sigma_t^2(u_0) = w(u_0) + \sum_{i=1}^{p} \phi_i(u_0) R_{t-i}^2 + \sum_{i=1}^{q} \theta_i(u_0) \sigma_{t-i}^2, \quad t = 1, 2, \ldots, n,$ (34)

which is a stationary ergodic process if assumption (1) holds. Using recursive substitution and the strong law of large numbers, it can be shown that the process (34) has a well defined solution given by

$\hat{\sigma}_t^2(u_0) = w(u_0) + \sum_{a=1}^{\infty} \prod_{b=1}^{a} \left( \sum_{i=1}^{p} \phi_i(u_0) Z_{t-b}^2 + \sum_{j=1}^{q} \theta_j(u_0) \right) w(u_0),$ (35)

such that $|\sigma_t^2 - \hat{\sigma}_t^2| \to 0$ almost surely, if σ_0^2 is finite with probability one.

The local polynomial estimation of the time varying GARCH(p, q) model is done in two steps. The first step obtains a preliminary estimate of σ_t^2 using a time varying ARCH(p) model. The second step obtains the estimators of the time varying GARCH(p, q) parameter functions.

Step 1. We obtain a preliminary estimate of σ_t^2 using a time varying ARCH(p) model:

$\sigma_t^2 = \sum_{i=1}^{p} \phi_i\left(\frac{t}{n}\right) R_{t-i}^2,$ (36)

which can also be defined as

$R_t^2 = \phi_0\left(\frac{t}{n}\right) + \sum_{i=1}^{p} \phi_i\left(\frac{t}{n}\right) R_{t-i}^2 + \sigma_t^2 (Z_t^2 - 1).$ (37)

Treating $\sigma_t^2(Z_t^2 - 1)$ as the error term, we use the local polynomial technique to estimate the functions $\phi_i(u)$, $i = 0, 1, \ldots, p$, as described in [16]. Given a kernel function $K(\cdot)$, we obtain the estimator at a point u_0 by minimizing

$L = \sum_{j=p+1}^{n} \left( R_j^2 - \sum_{k=0}^{g} \left( \phi_{0k} + \sum_{i=1}^{p} \phi_{ik} R_{j-i}^2 \right) (u_j - u_0)^k \right)^2 K_{h_1}(u_j - u_0),$ (38)

where g is the polynomial degree, $u_t = \frac{t}{n}$, $K_{h_1}(\cdot) = \frac{1}{h_1} K\left(\frac{\cdot}{h_1}\right)$ and $h_1$ is the bandwidth.

The preliminary estimate of σ_t^2 is then given by

$\hat{\sigma}_t^2 = \hat{\phi}_0\left(\frac{t}{n}\right) + \sum_{i=1}^{p} \hat{\phi}_i\left(\frac{t}{n}\right) R_{t-i}^2.$ (39)
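A minimal sketch of Step 1 is given below, using a local constant fit (polynomial degree g = 0) and the Gaussian kernel of Equation (45) for simplicity; the function and variable names are ours, and a full implementation would follow [16] more closely (higher polynomial degree, efficient bandwidth handling).

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel of Equation (45)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def tvarch_local_fit(R, p, u0, h1):
    """Step 1: kernel-weighted least squares estimate of phi_0(u0),...,phi_p(u0)
    in the time varying ARCH(p) regression (37), using a local constant fit
    (polynomial degree g = 0) for simplicity."""
    n = len(R)
    u = np.arange(1, n + 1) / n                          # rescaled time t/n
    rows, targets, weights = [], [], []
    for j in range(p, n):
        rows.append(np.r_[1.0, R[j - p:j][::-1] ** 2])   # (1, R_{j-1}^2, ..., R_{j-p}^2)
        targets.append(R[j] ** 2)
        weights.append(gaussian_kernel((u[j] - u0) / h1) / h1)
    Xd = np.asarray(rows)
    yd = np.asarray(targets)
    wts = np.sqrt(np.asarray(weights))
    # Weighted least squares: minimise the kernel-weighted sum of squares (38).
    coef, *_ = np.linalg.lstsq(Xd * wts[:, None], yd * wts, rcond=None)
    return coef                                          # (phi_0(u0), ..., phi_p(u0))

def preliminary_sigma2(R, p, h1):
    """Preliminary conditional variance of Equation (39), evaluated at each t."""
    n = len(R)
    sigma2 = np.full(n, np.nan)
    for t in range(p, n):
        phi = tvarch_local_fit(R, p, (t + 1) / n, h1)
        sigma2[t] = phi[0] + phi[1:] @ (R[t - p:t][::-1] ** 2)
    return sigma2
```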

Step 2. We use the conditional variance estimated in Step 1 to obtain the estimates of the time varying GARCH(p, q) parameter functions. We can write (31) as

$R_t^2 = w\left(\frac{t}{n}\right) + \sum_{i=1}^{p} \phi_i\left(\frac{t}{n}\right) R_{t-i}^2 + \sum_{i=1}^{q} \theta_i\left(\frac{t}{n}\right) \hat{\sigma}_{t-i}^2 - \Theta\left(\frac{t}{n}\right) \left( \sum_{i=1}^{q} (\hat{\sigma}_{t-i}^2 - \sigma_{t-i}^2) \right) + \sigma_t^2 (Z_t^2 - 1).$ (40)

Here, for a suitable choice of bandwidth, $E(\hat{\sigma}_t^2 - \sigma_t^2)$ is asymptotically negligible. Treating $\sigma_t^2(Z_t^2 - 1)$ as the error term, we use the local polynomial technique to estimate the functions $w(u)$, $\phi_i(u)$, $i = 1, \ldots, p$, and $\theta_i(u)$, $i = 1, \ldots, q$, as described in [16]. The estimates at a point u_0 are obtained by minimizing

$L = \sum_{t=2}^{n} \left( R_t^2 - \sum_{k=0}^{g} \left( w_k + \sum_{i=1}^{p} \phi_{ik} R_{t-i}^2 + \sum_{j=1}^{q} \theta_{jk} \hat{\sigma}_{t-j}^2 \right) (u_t - u_0)^k \right)^2 K_{h_2}(u_t - u_0),$ (41)

where g is the polynomial degree, $u_t = \frac{t}{n}$, $K_{h_2}(\cdot) = \frac{1}{h_2} K\left(\frac{\cdot}{h_2}\right)$ and $h_2$ is the bandwidth.

The final estimate of σ_t^2 in the time varying GARCH(p, q) model is obtained from these estimates as follows:

$\hat{\sigma}_t^2 = \hat{w}\left(\frac{t}{n}\right) + \sum_{i=1}^{p} \hat{\phi}_i\left(\frac{t}{n}\right) R_{t-i}^2 + \sum_{j=1}^{q} \hat{\theta}_j\left(\frac{t}{n}\right) \hat{\sigma}_{t-j}^2.$ (42)
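Step 2 can be sketched analogously: the design matrix is extended with the preliminary variances from Step 1. Again this is a local constant illustration with our own naming, not the exact procedure of [16].

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel of Equation (45)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def tvgarch_local_fit(R, sigma2_prelim, p, q, u0, h2):
    """Step 2: local constant (g = 0) kernel-weighted least squares estimate of
    (w(u0), phi_1(u0),...,phi_p(u0), theta_1(u0),...,theta_q(u0)) in (41),
    using the preliminary variances sigma2_prelim from Step 1."""
    n = len(R)
    u = np.arange(1, n + 1) / n
    rows, targets, weights = [], [], []
    for t in range(max(p, q), n):
        lagged_r2 = R[t - p:t][::-1] ** 2            # R_{t-1}^2, ..., R_{t-p}^2
        lagged_s2 = sigma2_prelim[t - q:t][::-1]     # sigma_{t-1}^2, ..., sigma_{t-q}^2
        if np.any(np.isnan(lagged_s2)):
            continue                                 # skip points without Step 1 output
        rows.append(np.r_[1.0, lagged_r2, lagged_s2])
        targets.append(R[t] ** 2)
        weights.append(gaussian_kernel((u[t] - u0) / h2) / h2)
    Xd, yd = np.asarray(rows), np.asarray(targets)
    wts = np.sqrt(np.asarray(weights))
    coef, *_ = np.linalg.lstsq(Xd * wts[:, None], yd * wts, rcond=None)
    return coef[0], coef[1:1 + p], coef[1 + p:]      # w_hat, phi_hat, theta_hat

def final_sigma2(R, sigma2_prelim, p, q, h2):
    """Final conditional variance of Equation (42)."""
    n = len(R)
    sigma2 = np.full(n, np.nan)
    for t in range(max(p, q), n):
        lagged_s2 = sigma2_prelim[t - q:t][::-1]
        if np.any(np.isnan(lagged_s2)):
            continue
        w_hat, phi_hat, theta_hat = tvgarch_local_fit(R, sigma2_prelim, p, q,
                                                      (t + 1) / n, h2)
        sigma2[t] = w_hat + phi_hat @ (R[t - p:t][::-1] ** 2) + theta_hat @ lagged_s2
    return sigma2
```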

Bandwidth Selection

The bandwidth governs the complexity of the model, and the choice of the smoothing parameter h is therefore of crucial importance for every kernel regression. Generally, the bandwidth has to be chosen carefully to balance bias and variance: a large bandwidth over-smooths the data, resulting in underfitting, whereas a small bandwidth restricts the neighbourhood size, resulting in overfitting [17]. The data-driven bandwidth selection procedures most commonly used are the direct plug-in method, the cross-validated bandwidth method, the least-squares cross-validation method, the smoothed cross-validation method, and the contrast method. Using the cross-validation method based on the best linear predictor of R_t^2 given the past, the bandwidth h is chosen for which

$CV(h) = \frac{1}{n-1} \sum_{t=2}^{n} \left( R_t^2 - \hat{w}\left(\frac{t}{n}\right) - \sum_{i=1}^{p} \hat{\phi}_i\left(\frac{t}{n}\right) R_{t-i}^2 - \sum_{j=1}^{q} \hat{\theta}_j\left(\frac{t}{n}\right) \hat{\sigma}_{t-j}^2 \right)^2$ (43)

is minimum, where the estimators are obtained through the local polynomial technique with the tth observation left out. This bandwidth selection procedure is computationally cumbersome, especially when n is large [16]. For a simplified version of the cross-validation method that reduces the computational complexity and makes bandwidth selection easy and feasible, refer to [16].

In the random periodic case we apply a fixed bandwidth h which is proportional to the random periodicity τ. That is,

$h = \frac{\tau}{n}.$ (44)

Therefore, only the τ nearest neighbours of u_0 are considered for each target point u_0.
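A small sketch of the two bandwidth choices discussed above, the fixed rule of Equation (44) and a grid search over the cross-validation score (43), is given below; cv_criterion is a hypothetical callable wrapping the leave-one-out estimator.

```python
import numpy as np

def fixed_bandwidth(tau, n):
    """Fixed bandwidth of Equation (44): h = tau / n, so roughly the tau nearest
    neighbours of each target point u0 receive non-negligible weight."""
    return tau / n

def cv_bandwidth(h_grid, cv_criterion):
    """Select the bandwidth minimising the cross-validation score of Equation (43).
    `cv_criterion(h)` is a user-supplied callable (e.g. wrapping the two-step
    estimator with the t-th observation left out); it is hypothetical here."""
    scores = [cv_criterion(h) for h in h_grid]
    return h_grid[int(np.argmin(scores))]

h = fixed_bandwidth(tau=10, n=500)   # 0.02
```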

Kernel Function

The choice of kernel function K(u) determines the shape of the local neighbourhood over which the smoothing is performed. The kernel function has the following properties:

1) $K(u) \geq 0$ for all u;

2) $\int K(u) \, du = 1$;

3) $K(-u) = K(u)$ for all u.

The first two properties are those of a probability density function, and the third property implies that the kernel is symmetric. The particular form of the function has only a relatively small effect on the estimation accuracy. Hence, the Gaussian kernel

$K(u) = \frac{1}{(2\pi)^{0.5}} e^{-\frac{1}{2} u^2},$ (45)

which is differentiable and has low computational complexity, is the most commonly used.

5. Conclusions and Suggestions

Modelling complex periodic time series is quite tedious, particularly when periodicity is mixed with randomness. The ANN-Time Varying GARCH model combines the modelling power of artificial neural networks and time varying GARCH models in order to model processes with fixed and random periodicity. The proposed model, whose theoretical background in parameter estimation is developed in this paper through nonparametric methods, forms one of the candidate models for modelling complex financial time series. More research on the model properties and application to real data is still needed in order to develop a comprehensive theoretical framework for the proposed model.

Acknowledgements

We wish to thank the reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Adhikari, R. and Agrawal, R. (2013) An Introductory Study on Time Series Modeling and Forecasting.
[2] Franses, P.H. (1996) Periodicity and Stochastic Trends in Economic Time Series. OUP Catalogue, Oxford University Press, Oxford.
[3] Zhang, G. (2003) Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model. Neurocomputing, 50, 159-175.
https://doi.org/10.1016/S0925-2312(01)00702-0
[4] Liu (2019) ARMA Model for Random Periodic Processes. Loughborough University, Loughborough.
[5] Feng and Zhao (2015) Random Periodic Processes, Periodic Measures and Ergodicity.
[6] Boger, Z. and Guterman, H. (1997) Knowledge Extraction from Artificial Neural Network Models. 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Orlando, 12-15 October 1997, Vol. 4, 3030-3035.
[7] Kihoro, J., Otieno, R. and Wafula, C. (2004) Seasonal Time Series Forecasting: A Comparative Study of ARIMA and ANN Models. African Journal of Science and Technology, 5, 41-49.
[8] Bishop Christopher, M., et al. (1995) Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
[9] Bouwmans, T., Javed, S., et al. (2019) Deep Neural Network Concepts for Background Subtraction: A Systematic Review and Comparative Evaluation. Neural Networks, 117, 8-66.
https://doi.org/10.1016/j.neunet.2019.04.024
[10] Dahlhaus, R., Rao, S., et al. (2006) Statistical Inference for Time-Varying ARCH Processes. Annals of Statistics, 34, 1075-1114.
https://doi.org/10.1214/009053606000000227
[11] Xu, S. and Chen, L. (2008) A Novel Approach for Determining the Optimal Number of Hidden Layer Neurons for FNN’s and Its Application in Data Mining.
[12] Blum, A. (1992) Neural Networks in C++ an Object-Oriented Framework for Building Connectionist Systems. John Wiley and Sons, Inc., Hoboken.
[13] Ash, T. (1989) Dynamic Node Creation in Back-Propagation Networks. Connection Science, 1, 365-375.
https://doi.org/10.1080/09540098908915647
[14] Hirose, Y., Yamashita, K. and Hijiya, S. (1991) Back-Propagation Algorithm Which Varies the Number of Hidden Units. Neural Networks, 1, 61-66.
https://doi.org/10.1016/0893-6080(91)90032-Z
[15] Rivals, I. and Personnaz, L. (2000) A Statistical Procedure for Determining the Optimal Number of Hidden Neurons of a Neural Model. Second International Symposium on Neural Computation (NC'2000), Berlin, 23-26 May 2000, 14-17.
[16] Rohan, N. and Ramanathan, T.V. (2013) Nonparametric Estimation of a Time-Varying GARCH Model. Journal of Nonparametric Statistics, 25, 33-52.
https://doi.org/10.1080/10485252.2012.728600
[17] Gajewicz, S., Kar, S. and Piotrowska, M. (2021) The Kernel-Weighted Local Polynomial Regression (KwLPR) Approach: An Efficient, Novel Tool for Development of QSAR/QSAAR Toxicity Extrapolation Models. Journal of Cheminformatics, 13, 1-20.
https://doi.org/10.1186/s13321-021-00484-5
