A Normal Weighted Inverse Gaussian Distribution for Skewed and Heavy-Tailed Data
1. Introduction
It is well known that mixture distributions produce flexible models with good statistical and probabilistic properties. Our first objective, therefore, is to construct a finite mixture of two special cases of the Generalized Inverse Gaussian (GIG) distribution and to obtain its properties. These two special cases are related to the Inverse Gaussian (IG) distribution, which is itself a special case of the GIG distribution.
The Generalized Hyperbolic Distribution (GHD), introduced by Barndorff-Nielsen [1] as a Normal Variance-Mean Mixture, is obtained when the Generalized Inverse Gaussian (GIG) distribution is the mixing distribution. Barndorff-Nielsen [2] introduced the Normal Inverse Gaussian (NIG) distribution, obtained when the mixing distribution is Inverse Gaussian (IG). The IG is obtained as a special case of GIG when the index parameter $\lambda = -1/2$.
The two special cases and their finite mixture are Weighted Inverse Gaussian (WIG) distributions. Using this finite mixture as the mixing distribution in the Normal Variance-Mean Mixture, we get a Normal Weighted Inverse Gaussian (NWIG) distribution. The second objective, therefore, is to construct the NWIG distribution and obtain its properties.
The maximum likelihood parameter estimates of the proposed model are obtained via the EM algorithm, and three data sets are used for application.
In the literature, the Normal Inverse Gaussian (NIG) distribution has been used repeatedly for financial data that are skewed, leptokurtic and heavy-tailed because they are collected over short time intervals, such as daily or weekly. Our third objective is to compare the log-likelihood functions of the NWIG and NIG distributions.
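The Generalized Inverse Gaussian distribution has three parameters $\lambda$, $\delta$ and $\gamma$. The distribution is denoted by $GIG(\lambda, \delta, \gamma)$. When $\lambda = -1/2$, we have $GIG(-1/2, \delta, \gamma)$, which is an Inverse Gaussian (IG) distribution. If $\lambda = 1/2$, we have $GIG(1/2, \delta, \gamma)$, which is the Reciprocal Inverse Gaussian (RIG) distribution. The third special case is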
.
and
are expressed in terms of
and are weighted IG distributions. Their finite mixture; i.e.,
is also WIG.
The concept of weighted distributions was introduced by Fisher [3] and elaborated by Patil and Rao [4]. Gupta and Kundu [5] considered the finite mixture of the IG and the length-biased IG distributions. The Generalized Hyperbolic Distribution (GHD) is a normal variance-mean mixture with GIG mixing distribution. It is a five-parameter distribution denoted by $GHD(\lambda, \alpha, \beta, \delta, \mu)$. For $\lambda = -1/2$ we have a Normal Inverse Gaussian (NIG) distribution. For
and
we have normal weighted Inverse Gaussian (NWIG) distributions.
The rest of the paper is organised as follows. Section 2 deals with the proposed mixing distributions. Section 3 covers the proposed mixed model, its posterior distribution and posterior expectations. Section 4 describes the EM algorithm estimation procedure. The application and conclusions are in Sections 5 and 6 respectively.
2. Proposed Mixing Distribution
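We show that two special cases of the Generalized Inverse Gaussian (GIG) distribution can be expressed as Weighted Inverse Gaussian (WIG) distributions. A finite mixture of these cases can also be expressed as a WIG distribution. The GIG distribution is given by
$$g(z) = \frac{(\gamma/\delta)^{\lambda}}{2K_{\lambda}(\delta\gamma)}\, z^{\lambda-1} \exp\left\{-\frac{1}{2}\left(\frac{\delta^{2}}{z} + \gamma^{2} z\right)\right\}, \quad z > 0 \qquad (1)$$
where $-\infty < \lambda < \infty$, $\delta > 0$, $\gamma > 0$ and $K_{\lambda}(\omega)$ is the Modified Bessel function of the third kind of order $\lambda$ evaluated at point $\omega$.
In short form, it is stated as $Z \sim GIG(\lambda, \delta, \gamma)$.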
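The moments around the origin of the $GIG(\lambda, \delta, \gamma)$ distribution are given by
$$E(Z^{r}) = \left(\frac{\delta}{\gamma}\right)^{r} \frac{K_{\lambda+r}(\delta\gamma)}{K_{\lambda}(\delta\gamma)} \qquad (2)$$
Remark: This formula also holds when r is a negative integer.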
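The moment formula can be checked numerically. The following is a minimal sketch, assuming the $(\lambda, \delta, \gamma)$ parameterization in which $E(Z^{r}) = (\delta/\gamma)^{r} K_{\lambda+r}(\delta\gamma)/K_{\lambda}(\delta\gamma)$. The helper names `bessel_k` and `gig_moment` are hypothetical, and the Bessel function is evaluated from its standard integral representation with a plain trapezoidal rule rather than a library routine.

```python
import math


def bessel_k(nu, x, steps=4000):
    """Modified Bessel function K_nu(x) for x > 0.

    Uses K_nu(x) = integral over t in [0, inf) of exp(-x cosh t) cosh(nu t),
    truncated where the integrand is negligible in double precision.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    upper = math.acosh(max(2.0, 700.0 / x))  # x*cosh(upper) >= 700 here
    h = upper / steps
    end_value = math.exp(-x * math.cosh(upper)) * math.cosh(nu * upper)
    total = 0.5 * (math.exp(-x) + end_value)  # trapezoid end points
    for i in range(1, steps):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def gig_moment(r, lam, delta, gamma):
    """r-th moment about the origin of GIG(lam, delta, gamma) (assumed form)."""
    omega = delta * gamma
    ratio = bessel_k(lam + r, omega) / bessel_k(lam, omega)
    return (delta / gamma) ** r * ratio


# For the IG case (lam = -1/2) the first moment is delta / gamma,
# and the moment of order -1 is gamma / delta + 1 / delta**2.
mean_ig = gig_moment(1, -0.5, 2.0, 3.0)
inv_mean_ig = gig_moment(-1, -0.5, 2.0, 3.0)
```

The negative-order case illustrates the remark above: the same expression is used with $r = -1$.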
Special Cases
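When $\lambda = -1/2$, the $GIG(\lambda, \delta, \gamma)$ density reduces to
$$g(z) = \frac{\delta}{\sqrt{2\pi}}\, e^{\delta\gamma}\, z^{-3/2} \exp\left\{-\frac{1}{2}\left(\frac{\delta^{2}}{z} + \gamma^{2} z\right)\right\} \qquad (3)$$
This is an Inverse Gaussian (IG) distribution.
When $\lambda = 1/2$, it reduces to
$$g(z) = \frac{\gamma}{\sqrt{2\pi}}\, e^{\delta\gamma}\, z^{-1/2} \exp\left\{-\frac{1}{2}\left(\frac{\delta^{2}}{z} + \gamma^{2} z\right)\right\} \qquad (4)$$
This is a Reciprocal Inverse Gaussian (RIG) distribution.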
When
(5)
which is the
.
Using the concept of weighted distributions introduced by Fisher [3], it can be shown that the two special cases are weighted inverse Gaussian distributions. More specifically, we express
and
in terms of
as follows:
(6)
and
(7)
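The weighted-distribution idea can be illustrated for the RIG case. The sketch below assumes the $(\delta, \gamma)$ parameterization, in which the IG density is proportional to $z^{-3/2}\exp\{-(\delta^{2}/z + \gamma^{2}z)/2\}$ and has mean $\delta/\gamma$. It checks that weighting the IG density by $w(z) = z$ and dividing by $E[w(Z)]$ gives the RIG density. The function names are hypothetical.

```python
import math


def ig_pdf(z, delta, gamma):
    """Inverse Gaussian density in the (delta, gamma) parameterization (assumed)."""
    expo = delta * gamma - 0.5 * (delta ** 2 / z + gamma ** 2 * z)
    return delta / math.sqrt(2.0 * math.pi) * z ** -1.5 * math.exp(expo)


def rig_pdf(z, delta, gamma):
    """Reciprocal Inverse Gaussian density, i.e. GIG with index 1/2 (assumed)."""
    expo = delta * gamma - 0.5 * (delta ** 2 / z + gamma ** 2 * z)
    return gamma / math.sqrt(2.0 * math.pi) * z ** -0.5 * math.exp(expo)


def weighted_pdf(z, weight, base_pdf, expected_weight):
    """Weighted density w(z) f(z) / E[w(Z)] in the sense of Fisher's weighting."""
    return weight(z) * base_pdf(z) / expected_weight


delta, gamma = 1.5, 2.0
ig_mean = delta / gamma  # E[Z] under the IG density, used as E[w(Z)] for w(z) = z

# Length-biased IG evaluated at a few points, next to the RIG density.
pairs = [
    (
        rig_pdf(z, delta, gamma),
        weighted_pdf(z, lambda t: t, lambda t: ig_pdf(t, delta, gamma), ig_mean),
    )
    for z in (0.3, 1.0, 2.5)
]
```

Each pair in `pairs` should agree up to floating-point rounding, which is the sense in which the RIG is a length-biased IG distribution.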
A finite mixture of the two cases is given by
Put
(8)
(9)
3. Proposed Model
Construction of the Mixed Model
Suppose the conditional distribution of X given Z = z is $N(\mu + \beta z, z)$. If Z itself follows the distribution defined by formula (9), the mixed model is constructed as follows
(10)
(11)
where
and
The log-likelihood function
(12)
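The mixing construction can be simulated directly: draw z from the mixing distribution, then draw x from a normal distribution with mean $\mu + \beta z$ and variance z. The sketch below is illustrative only and makes two assumptions. It uses the plain IG mixing distribution in the $(\delta, \gamma)$ parameterization, which yields the NIG baseline rather than the WIG mixture of formula (9), and it samples the IG variate with the Michael-Schucany-Haas transformation method. The function names are hypothetical.

```python
import math
import random


def sample_ig(delta, gamma, rng):
    """One IG draw with mean delta/gamma and shape delta**2 (assumed parameterization)."""
    mean = delta / gamma
    shape = delta ** 2
    y = rng.gauss(0.0, 1.0) ** 2
    root = math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2)
    x = mean + mean ** 2 * y / (2.0 * shape) - mean / (2.0 * shape) * root
    # Choose between the two roots of the transformation.
    if rng.random() <= mean / (mean + x):
        return x
    return mean ** 2 / x


def sample_normal_variance_mean_mixture(n, mu, beta, delta, gamma, seed=0):
    """Draw n values of X, where X | Z = z ~ N(mu + beta z, z) and Z ~ IG."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = sample_ig(delta, gamma, rng)
        draws.append(mu + beta * z + math.sqrt(z) * rng.gauss(0.0, 1.0))
    return draws


xs = sample_normal_variance_mean_mixture(20000, mu=0.0, beta=0.5, delta=1.0, gamma=2.0, seed=1)
sample_mean = sum(xs) / len(xs)  # theoretical value: mu + beta * delta / gamma = 0.25
```

Replacing `sample_ig` with a sampler for another mixing distribution changes the mixed model without touching the second step.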
Posterior Expectation
(13)
Similarly,
(14)
(15)
4. EM Algorithm
4.1. Introduction
The EM algorithm is a powerful technique for maximum likelihood estimation for data containing missing values, or data that can be considered as containing missing values. It was introduced by Dempster et al. [6].
Karlis [7] considers the mixing operation to be responsible for producing missing data.
Assume that the true data are made of an observed part X and an unobserved part Z. Kosta [8] observes that the log-likelihood of the complete data
for
factorizes into two parts. This implies that the joint density of X and Z is given by
.
The likelihood function is
where
and
.
4.2. M-Step for the Conditional Probability
Since
then
i.e.,
(16)
(17)
4.3. M-Step for the Mixing Distribution
From formula (9)
(18)
Therefore
(19)
Differentiating w.r.t
we obtain
implies that
(20)
Similarly
implies that
(21)
4.4. E-Step
Values of random variables
,
and
are not known. So we estimate them by considering posterior expectations
,
and
as given in formulae (13), (14) and (15) respectively. Let
,
and
.
The k-th iterations are as follows
(22)
(23)
(24)
For the log-likelihood, the k-th iteration is given as
(25)
4.5. Iterative Scheme
From Equations (20) and (21), we obtain the following iterative scheme
(26)
(27)
From Equations (16) and (17) we also obtain
(28)
(29)
(30)
5. Application
Let $(P_t)$ denote the price process of a security at time t, in particular of a stock. In order to allow comparison of investments in different securities, we shall investigate the rates of return defined by $X_t = \log P_t - \log P_{t-1}$.
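As a small illustration of the rate of return used in this section, the sketch below computes log-returns as differences of consecutive log prices. The function name and the price values are hypothetical and are not taken from the data sets analysed here.

```python
import math


def log_returns(prices):
    """Log-returns log(P_t) - log(P_{t-1}) for a list of positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical weekly closing prices.
example_prices = [100.0, 110.0, 99.0]
example_returns = log_returns(example_prices)
```

Log-returns are additive over time, so the returns in `example_returns` sum to the log of the ratio of the last price to the first.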
In this section, we consider three data sets: Range Resources Corporation (RRC), shares of Chevron Corporation (CVX) and the S&P 500 index. The histogram of the weekly log-returns for RRC in Figure 1 shows that the data are negatively skewed and exhibit heavy tails. The Q-Q plot shows that the normal distribution is not a good fit for the data, especially in the tails. The other data sets behave similarly.
Table 1 provides descriptive statistics for the return series under consideration. We observe that the data sets exhibit excess kurtosis, which indicates the leptokurtic behaviour of the returns. The log-returns have distributions with heavier tails than the normal distribution. The skewness indicates that the two tails of the returns behave differently.
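The skewness and excess kurtosis referred to above can be computed from central moments. The sketch below is a minimal version that assumes the plain (biased) moment estimators, which may differ from the estimators used to produce Table 1. The function name is hypothetical.

```python
def sample_moments(xs):
    """Return (mean, variance, skewness, excess kurtosis) from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return mean, m2, skewness, excess_kurtosis


# A symmetric toy sample has zero skewness.
symmetric = sample_moments([1.0, 2.0, 3.0, 4.0, 5.0])
# A toy sample with a long right tail has positive skewness.
right_tailed = sample_moments([0.0, 0.0, 0.0, 1.0])
```

A positive excess kurtosis signals tails heavier than those of the normal distribution, and a nonzero skewness signals asymmetric tails.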
Table 2 below gives the method of moments estimates of the NIG distribution for the three data sets. These estimates are used as initial values for the EM algorithm.
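One way to obtain such starting values is to invert the first four moments of the NIG distribution. The sketch below assumes the usual $(\alpha, \beta, \delta, \mu)$ parameterization with $\gamma = \sqrt{\alpha^{2} - \beta^{2}}$ and the standard NIG expressions for the mean, variance, skewness and excess kurtosis. The formulas actually used for Table 2 are not shown in the text, so this is an illustration rather than the authors' procedure, and the function names are hypothetical.

```python
import math


def nig_theoretical_moments(alpha, beta, delta, mu):
    """Mean, variance, skewness and excess kurtosis of NIG(alpha, beta, delta, mu)."""
    g = math.sqrt(alpha ** 2 - beta ** 2)
    mean = mu + delta * beta / g
    variance = delta * alpha ** 2 / g ** 3
    skewness = 3.0 * beta / (alpha * math.sqrt(delta * g))
    excess_kurtosis = 3.0 * (1.0 + 4.0 * beta ** 2 / alpha ** 2) / (delta * g)
    return mean, variance, skewness, excess_kurtosis


def nig_method_of_moments(mean, variance, skewness, excess_kurtosis):
    """Invert the four moment equations; needs 3*kurtosis > 5*skewness**2."""
    disc = 3.0 * excess_kurtosis - 5.0 * skewness ** 2
    if disc <= 0:
        raise ValueError("moment estimates do not exist for these inputs")
    s = math.sqrt(variance)
    g = 3.0 / (s * math.sqrt(disc))
    beta = skewness * s * g ** 2 / 3.0
    delta = variance * g ** 3 / (beta ** 2 + g ** 2)
    mu = mean - beta * delta / g
    alpha = math.sqrt(g ** 2 + beta ** 2)
    return alpha, beta, delta, mu


# Round trip: moments of a known NIG law should map back to its parameters.
true_params = (2.0, 0.5, 1.0, 0.1)
recovered = nig_method_of_moments(*nig_theoretical_moments(*true_params))
```

In practice the inputs would be the sample statistics of a return series, and the existence condition on the skewness and kurtosis should be checked before the estimates are passed to the EM algorithm.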
The stopping criterion is when
(31)
where tol is the chosen tolerance level, e.g. $10^{-6}$, and the log-likelihood is as given in Equation (12). We now wish to obtain the maximum likelihood parameter estimates for the data sets under the proposed model via the EM algorithm. Tables 3-5 illustrate monotonic convergence at different tolerance levels. The log-likelihood and AIC for each data set are also provided.
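The stopping rule and the AIC can be wrapped around any EM update. The sketch below assumes that the criterion is the relative change in the log-likelihood between successive iterations, which is a common choice, and that AIC is computed as $2k - 2l$ with k free parameters. The update function `toy_step` is a hypothetical stand-in and is not the E-step and M-step of the proposed model.

```python
def has_converged(loglik_new, loglik_old, tol=1e-6):
    """Relative change in log-likelihood below tol (assumed form of the criterion)."""
    if loglik_new == 0.0:
        return abs(loglik_old) < tol
    return abs((loglik_new - loglik_old) / loglik_new) < tol


def run_em(step, params, tol=1e-6, max_iter=1000):
    """Iterate step(params) -> (new_params, loglik) until the criterion is met."""
    loglik_old = None
    for k in range(1, max_iter + 1):
        params, loglik = step(params)
        if loglik_old is not None and has_converged(loglik, loglik_old, tol):
            return params, loglik, k
        loglik_old = loglik
    return params, loglik_old, max_iter


def aic(loglik, n_params):
    """Akaike information criterion, 2k - 2*loglik."""
    return 2 * n_params - 2.0 * loglik


def toy_step(x):
    """Hypothetical update whose log-likelihood increases monotonically to -5."""
    return 0.5 * x, -5.0 - x * x


final_x, final_loglik, n_iter = run_em(toy_step, 1.0)
```

Lowering `tol` makes the loop run longer before stopping, which is one way to inspect convergence at different tolerance levels.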
Figure 1. Histogram and Q-Q plot for RRC weekly log-returns.
Table 1. Summary statistics for the data sets.
Table 2. NIG method of moments estimates for the data sets.
Table 3. Maximum likelihood estimates of the proposed model for RRC.
Table 4. Maximum likelihood estimates of the proposed model for CVX.
Table 5. Maximum likelihood estimates of the proposed model for the S&P 500 index.
Figures 2-4 show that the proposed model is a good fit to the data sets.
Remark:
Expressing the proposed model in terms of its components we have
(32)
Using the estimates we obtain the estimates of p for the data sets as shown in Table 6 below:
The finite mixture for these data sets is more weighted to the NRIG than the other special case of the GHD when
.
Figure 2. Fitting the proposed model to RRC log weekly returns.
Figure 3. Fitting the proposed model to CVX log weekly returns.
Figure 4. Fitting the proposed model to S&P 500 index log weekly returns.
Table 6. Estimates of p for the data sets.
6. Conclusions
Two special cases of the Generalized Inverse Gaussian distribution have been shown to be Weighted Inverse Gaussian distributions. Their mixture has been used as the mixing distribution in a Normal Variance-Mean mixture to obtain a Normal Weighted Inverse Gaussian model. The mean and variance of the proposed model have been obtained.
Three data sets, Range Resources Corporation (RRC), shares of Chevron Corporation (CVX) and the S&P 500 index, for the period 3/01/2000 to 1/07/2013 with 702 observations, have been used for data analysis. An iterative scheme has been presented for parameter estimation by the EM algorithm. The iterative scheme demonstrates monotonic convergence. The method of moments estimates for NIG worked well as initial values for the three data sets. The model fits the data sets well.