Value at Risk and Expected Shortfall for Normal Variance Mean Mixtures of Finite Weighted Inverse Gaussian Distributions

Abstract

The Normal Inverse Gaussian (NIG) distribution, a special case of the Generalized Hyperbolic Distribution (GHD), has been used frequently in financial modelling and risk measurement. In this work, we consider other normal variance-mean mixtures based on finite mixtures of special cases of the Generalized Inverse Gaussian distribution as mixing distributions. The Expectation-Maximization (EM) algorithm is used to obtain the Maximum Likelihood (ML) estimates of the proposed models for some financial data. We estimate Value at Risk (VaR) and Expected Shortfall (ES) for the fitted models. The Kupiec likelihood ratio (LR) test is applied for backtesting of VaR. The Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and log-likelihood are used for model selection. The results clearly show that the proposed models are good alternatives to the NIG distribution for determining VaR and ES.


1. Introduction

The most popular measures of financial risk are Value at Risk (VaR) and Expected Shortfall (ES). These risk measures are based on return distributions. VaR is generally defined as the maximum possible loss over a given holding period at a fixed confidence level. An attractive feature of VaR is its backtestability. Backtesting is a method that uses historical data to gauge accuracy and effectiveness (Zhang and Nadarajah [1]). However, the main shortcoming of VaR is that it ignores any loss beyond the value-at-risk level; that is, it fails to capture tail risk. It also lacks the mathematical property of subadditivity, as noted by Wimmerstedt [2]: the VaR of a combined portfolio can be larger than the sum of the VaRs of its component portfolios. This implies that diversification could increase risk, contradicting standard beliefs in finance.

Artzner et al. [3] [4] have proposed the use of Expected Shortfall (ES) also called conditional Value at Risk (CVaR) to circumvent the problems inherent in VaR. Expected Shortfall is the conditional expectation of loss given that the loss is beyond the VaR level. Nadarajah et al. [5] have given a detailed review of VaR and ES for various distributions. One of the distributions reviewed is the Generalized Hyperbolic Distribution (GHD) introduced by Barndorff-Nielsen [6] as a Normal Variance-Mean Mixture with the Generalized Inverse Gaussian (GIG) distribution as the mixing distribution. The most common special case is Normal Inverse Gaussian (NIG) distribution introduced by Barndorff-Nielsen [7] with the Inverse Gaussian (IG) as the mixing distribution.

The objective of this paper is to determine VaR and ES for some financial data using Normal Weighted Inverse Gaussian (NWIG) distributions. In particular, we consider normal mixtures whose mixing distributions are finite mixtures of $GIG(\lambda,\delta,\gamma)$ distributions with indices $\lambda \in \left\{-\frac{3}{2},-\frac{1}{2},\frac{1}{2},\frac{3}{2}\right\}$. We study their properties and estimate parameters using the Expectation-Maximization (EM) algorithm introduced by Dempster et al. [8]. The Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and log-likelihood are used for model selection.

The concept of a weighted distribution was introduced by Fisher [9] and elaborated by Patil and Rao [10]. The Reciprocal Inverse Gaussian distribution and the finite mixture of the Inverse Gaussian and Reciprocal Inverse Gaussian distributions were shown to be Weighted Inverse Gaussian (WIG) distributions by Akman and Gupta [11], Gupta and Akman [12] and Gupta and Kundu [13]. For backtesting the Value at Risk of the proposed models we use the Kupiec likelihood ratio (LR) test introduced by Kupiec [14].

2. Value at Risk and Expected Shortfall: Mathematical Background

Despite their drawbacks, the most important risk measures are Value at Risk (VaR) and Expected Shortfall (ES). VaR was proposed in the late 1980s by Till Guldimann, then head of global research at J.P. Morgan.

Value at Risk is generally defined as the maximum possible loss over a given holding period at a fixed confidence level. Mathematically, VaR at the $100(1-\alpha)\%$ confidence level is defined as the lower $100\alpha$ percentile of the profit-and-loss distribution.

In statistical terms, VaR is a quantile of the distribution of financial asset returns. More formally, VaR is defined through

$P\left\{X \le \mathrm{VaR}_{\alpha}(X)\right\} = \alpha$ (2.1)

where X represents the asset's returns. In integral form it can be expressed as

$\int_{-\infty}^{\mathrm{VaR}_{\alpha}(X)} f(x)\,dx = \alpha$ (2.2)

where f(x) is the profit-and-loss density.

The concept of Expected Shortfall (ES) was first introduced in Rappoport [15]. Artzner et al. [3] [4] formally developed the concept. ES is the conditional expectation of loss given that the loss is beyond the VaR level and measures how much one can lose on average in the states beyond the VaR level.

From Equation (2.2)

$\frac{1}{\alpha}\int_{-\infty}^{\mathrm{VaR}_{\alpha}} f(x)\,dx = 1$ (2.3)

Therefore $\frac{f(x)}{\alpha}$ is a pdf on $-\infty < x < \mathrm{VaR}_{\alpha}$ and we refer to it as the "tail loss distribution".

The conditional expectation

$E\left[X \mid X < \mathrm{VaR}_{\alpha}\right] = \int_{-\infty}^{\mathrm{VaR}_{\alpha}} x\,\frac{f(x)}{\alpha}\,dx$ (2.4)

is the Expected Shortfall, denoted $ES_{\alpha}$. This version was used by Yamai and Yoshiba [16] to obtain the ES for a normal distribution. Equation (2.4) can be expressed in a different form as follows. Defining F(x) as the cdf of the random variable X, let

$u = F(x) \Rightarrow x = F^{-1}(u)$

$du = f(x)\,dx$

when

$x = -\infty, \quad u = 0$

$x = \mathrm{VaR}_{\alpha}, \quad u = \alpha$

$ES_{\alpha} = \frac{1}{\alpha}\int_0^{\alpha} F^{-1}(u)\,du = \frac{1}{\alpha}\int_0^{\alpha}\mathrm{VaR}_u\,du$ (2.5)

as presented by Zhang et al. [17].

Remarks: Equation (2.4) is the mean of the tail loss distribution. Equation (2.5) represents the average of the VaR levels between 0 and $\alpha$. The tail loss distribution $\frac{f(x)}{\alpha}$, $-\infty < x < \mathrm{VaR}_{\alpha}$, describes the tail of the return distribution.
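As a simple numerical illustration of Equations (2.4) and (2.5), the following Python sketch computes VaR and ES from a sample of returns by historical simulation (the empirical quantile and the mean of the returns beyond it). This is only a nonparametric check, not the parametric procedure developed in this paper, and the simulated Student-t returns are purely illustrative.

```python
import numpy as np

def var_es(returns, alpha=0.05):
    """Empirical VaR and ES at level alpha.

    VaR_alpha is the lower alpha-quantile of the returns and ES_alpha is the
    mean of the returns falling at or below that quantile, mirroring
    Equations (2.4) and (2.5)."""
    x = np.sort(np.asarray(returns))
    var = np.quantile(x, alpha)            # lower alpha-quantile
    tail = x[x <= var]                     # returns beyond the VaR level
    es = tail.mean() if tail.size else var
    return var, es

# purely illustrative data: heavy-tailed simulated returns
rng = np.random.default_rng(0)
r = rng.standard_t(df=5, size=702) * 0.03
print(var_es(r, alpha=0.01))
```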

For the purpose of VaR and ES analysis, a model for the return distribution is important because it describes the potential behaviour of a financial security in the future (Bams and Wielhouwer [18]). A normal distribution typically underestimates the tails and hence VaR. Recently, alternative distributions have been proposed that focus more on the tail behaviour of returns. One such candidate is the Normal Inverse Gaussian (NIG) distribution. We consider extensions of the NIG distribution in the form of Normal Weighted Inverse Gaussian (NWIG) distributions. In the next few sections we give a detailed account of their construction, properties and parameter estimation via the EM algorithm.

3. Weighted Inverse Gaussian Distribution

Let Z be a random variable with pdf f ( z ) . A function of Z, w ( Z ) is also a random variable with expectation

$E[w(Z)] = \int_{-\infty}^{\infty} w(z)\,f(z)\,dz$

$1 = \int_{-\infty}^{\infty}\frac{w(z)}{E[w(Z)]}\,f(z)\,dz$

Thus

$g(z) = \frac{w(z)}{E[w(Z)]}\,f(z), \quad -\infty < z < \infty$ (3.1)

is a weighted distribution. It was introduced by Fisher [9] and elaborated by Patil and Rao [10].

In this work we consider weighted distribution for the Inverse Gaussian (IG) distribution.

Now, suppose $Z \sim IG(\gamma,\delta)$, the Inverse Gaussian distribution with parameters $\gamma$ and $\delta$ and probability density function given by

$f(z) = \frac{\delta}{\sqrt{2\pi}}\,e^{\delta\gamma}\,z^{-\frac{3}{2}}\exp\left\{-\frac{1}{2}\left(\frac{\delta^2}{z}+\gamma^2 z\right)\right\}, \quad z>0$ (3.2)

IG is a special case of the Generalised Inverse Gaussian (GIG) distribution.

Generalised Inverse Gaussian

The Generalised Inverse Gaussian (GIG) Distribution is based on modified Bessel function of the third kind. Modified Bessel function of the third kind of order λ evaluated at ω denoted by K λ ( ω ) is defined as

$K_{\lambda}(\omega) = \frac{1}{2}\int_0^{\infty} x^{\lambda-1}\,e^{-\frac{\omega}{2}\left(x+\frac{1}{x}\right)}\,dx$ (3.3)

with the following properties

a) $K_{\frac{1}{2}}(\omega) = K_{-\frac{1}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}$ (3.4)

b) $K_{\frac{3}{2}}(\omega) = K_{-\frac{3}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}\left(1+\frac{1}{\omega}\right)$ (3.5)

c) $K_{\frac{5}{2}}(\omega) = K_{-\frac{5}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}\left(1+\frac{3}{\omega}+\frac{3}{\omega^2}\right)$ (3.6)

d) $K_{\frac{7}{2}}(\omega) = K_{-\frac{7}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}\left(1+\frac{6}{\omega}+\frac{15}{\omega^2}+\frac{15}{\omega^3}\right)$ (3.7)

e) $K_{\frac{9}{2}}(\omega) = K_{-\frac{9}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}\left(1+\frac{10}{\omega}+\frac{45}{\omega^2}+\frac{105}{\omega^3}+\frac{105}{\omega^4}\right)$ (3.8)

f) $K_{\frac{11}{2}}(\omega) = K_{-\frac{11}{2}}(\omega) = \sqrt{\frac{\pi}{2\omega}}\,e^{-\omega}\left(1+\frac{15}{\omega}+\frac{105}{\omega^2}+\frac{420}{\omega^3}+\frac{945}{\omega^4}+\frac{945}{\omega^5}\right)$ (3.9)

which are necessary in deriving the properties and estimates of the proposed models. For further definitions and properties see Abramowitz and Stegun [19].
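The half-integer expressions (3.4)-(3.6) can be checked numerically against a standard implementation of $K_{\lambda}(\omega)$; a minimal Python sketch using scipy.special.kv is given below (the value $\omega = 2.7$ is arbitrary).

```python
import numpy as np
from scipy.special import kv   # modified Bessel function of the third kind

def k_half(order, w):
    """Closed forms (3.4)-(3.6) for half-integer orders 1/2, 3/2, 5/2."""
    base = np.sqrt(np.pi / (2.0 * w)) * np.exp(-w)
    if order == 0.5:
        return base
    if order == 1.5:
        return base * (1.0 + 1.0 / w)
    if order == 2.5:
        return base * (1.0 + 3.0 / w + 3.0 / w ** 2)
    raise ValueError("order not implemented")

w = 2.7
for lam in (0.5, 1.5, 2.5):
    # K_{-lam}(w) = K_{lam}(w), so kv(lam, w) checks both signs
    print(lam, kv(lam, w), k_half(lam, w))
```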

Using the parametrization $\omega = \delta\gamma$ and the transformation $x = \frac{\gamma}{\delta}z$, Formula (3.3) becomes

$K_{\lambda}(\delta\gamma) = \frac{1}{2}\int_0^{\infty}\left(\frac{\gamma}{\delta}\right)^{\lambda} z^{\lambda-1}\exp\left\{-\frac{1}{2}\left(\frac{\delta^2}{z}+\gamma^2 z\right)\right\}dz$

Hence

$g(z) = \frac{\left(\frac{\gamma}{\delta}\right)^{\lambda} z^{\lambda-1}}{2K_{\lambda}(\delta\gamma)}\exp\left\{-\frac{1}{2}\left(\frac{\delta^2}{z}+\gamma^2 z\right)\right\}$ (3.10)

$z>0; \quad -\infty<\lambda<\infty, \ \delta>0, \ \gamma>0$

which is a Generalized Inverse Gaussian (GIG) distribution with parameters $\lambda$, $\delta$, $\gamma$.

Thus

$Z \sim GIG(\lambda,\delta,\gamma)$

with

$E(Z^r) = \left(\frac{\delta}{\gamma}\right)^r\frac{K_{\lambda+r}(\delta\gamma)}{K_{\lambda}(\delta\gamma)}$ (3.11)

where r can be a positive or negative integer.
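Formula (3.11) translates directly into code; the sketch below evaluates GIG moments through the Bessel-function ratio and checks the Inverse Gaussian case $\lambda = -\frac{1}{2}$, for which $E(Z) = \delta/\gamma$ and $E(1/Z) = \gamma/\delta + 1/\delta^2$. The parameter values are arbitrary.

```python
from scipy.special import kv

def gig_moment(r, lam, delta, gamma):
    """E(Z^r) for Z ~ GIG(lam, delta, gamma), Formula (3.11); r may be a
    positive or negative integer."""
    omega = delta * gamma
    return (delta / gamma) ** r * kv(lam + r, omega) / kv(lam, omega)

delta, gamma = 1.2, 0.8
print(gig_moment(1, -0.5, delta, gamma), delta / gamma)                     # E(Z)
print(gig_moment(-1, -0.5, delta, gamma), gamma / delta + 1 / delta ** 2)   # E(1/Z)
```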

Consider the following special weights:

Case 1:

Let

$w(Z) = 1 + Z$

$E[w(Z)] = \frac{\gamma+\delta}{\gamma}$

$g(z) = \frac{\gamma}{\gamma+\delta}(1+z)f(z)$ (3.12)

which is also a finite mixture of $GIG(-\frac{1}{2},\delta,\gamma)$ and $GIG(\frac{1}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(-\frac{1}{2},\delta,\gamma\right) + (1-p)\,GIG\left(\frac{1}{2},\delta,\gamma\right)$

with

$p = \frac{\gamma}{\gamma+\delta}$

The mean and variance for the weighted distribution are

$E[Z] = \frac{\delta+\delta\gamma(\gamma+\delta)}{\gamma^2(\gamma+\delta)}$ (3.13)

and

$var(Z) = \frac{3\delta\gamma+2\delta^2\gamma^2+\delta\gamma^3+2\delta^2+\delta^3\gamma}{\gamma^4(\delta+\gamma)^2}$ (3.14)
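The weighted density (3.12) and its mean (3.13) can be verified by numerical integration; the short Python sketch below does this by quadrature (the choice $\delta = 1.5$, $\gamma = 0.9$ is only illustrative).

```python
import numpy as np
from scipy.integrate import quad

delta, gamma = 1.5, 0.9

def ig_pdf(z):
    # Inverse Gaussian density (3.2)
    return (delta / np.sqrt(2 * np.pi) * np.exp(delta * gamma)
            * z ** -1.5 * np.exp(-0.5 * (delta ** 2 / z + gamma ** 2 * z)))

def g_case1(z):
    # Weighted density (3.12): w(z) = 1 + z and E[w(Z)] = (gamma + delta)/gamma
    return gamma / (gamma + delta) * (1 + z) * ig_pdf(z)

total, _ = quad(g_case1, 0, np.inf)
mean, _ = quad(lambda z: z * g_case1(z), 0, np.inf)
mean_formula = (delta + delta * gamma * (gamma + delta)) / (gamma ** 2 * (gamma + delta))
print(total, mean, mean_formula)   # total ~ 1 and mean matches (3.13)
```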

Case 2:

Let

$w(Z) = 1 + \frac{1}{1+\delta\gamma}\cdot\frac{1}{Z}$ (3.15)

$E[w(Z)] = \frac{1+\delta^2}{\delta^2}$

$g(z) = \frac{\delta^2}{1+\delta^2}\left(1+\frac{1}{(1+\delta\gamma)z}\right)f(z)$ (3.16)

which is also a finite mixture of $GIG(-\frac{1}{2},\delta,\gamma)$ and $GIG(-\frac{3}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(-\frac{1}{2},\delta,\gamma\right) + (1-p)\,GIG\left(-\frac{3}{2},\delta,\gamma\right)$

with

$p = \frac{\delta^2}{1+\delta^2}$

The mean and variance for the weighted distribution are

$E[Z] = \frac{\delta^2}{1+\delta^2}\left[\frac{\delta(1+\delta\gamma)+\gamma}{\gamma(1+\delta\gamma)}\right]$ (3.17)

$var(Z) = \frac{\delta^2}{1+\delta^2}\left[\frac{\delta\gamma^2+\delta(1+\delta\gamma)^2}{\gamma^3(1+\delta\gamma)}-\frac{\delta^2}{1+\delta^2}\cdot\frac{\left(\delta(1+\delta\gamma)+\gamma\right)^2}{\gamma^2(1+\delta\gamma)^2}\right] = \frac{\delta^3\left(\gamma^2+(1+\delta\gamma)^2\right)(1+\delta^2)(1+\delta\gamma)-\delta^4\gamma\left(\delta(1+\delta\gamma)+\gamma\right)^2}{\gamma^3(1+\delta\gamma)^2(1+\delta^2)^2}$ (3.18)

Case 3:

Let

$w(Z) = 1 + \frac{Z^2}{1+\delta\gamma}$ (3.19)

$E[w(Z)] = \frac{\gamma^3+\delta}{\gamma^3}$ (3.20)

$g(z) = \frac{\gamma^3}{\gamma^3+\delta}\left(1+\frac{z^2}{1+\delta\gamma}\right)f(z)$ (3.21)

which is also a finite mixture of $GIG(-\frac{1}{2},\delta,\gamma)$ and $GIG(\frac{3}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(-\frac{1}{2},\delta,\gamma\right) + (1-p)\,GIG\left(\frac{3}{2},\delta,\gamma\right)$

with

$p = \frac{\gamma^3}{\gamma^3+\delta}$

The mean and variance for the weighted distribution are

$E[Z] = \frac{\gamma^3}{\gamma^3+\delta}\left[\frac{\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta}{\gamma^5(1+\delta\gamma)}\right]$ (3.22)

$= \frac{\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta}{\gamma^2(1+\delta\gamma)(\gamma^3+\delta)}$ (3.23)

$var(Z) = \frac{\left(\delta\gamma^6(1+\delta\gamma)^2+\delta^5\gamma^4+10\delta^4\gamma^3+45\delta^3\gamma^2+105\delta^2\gamma+105\delta\right)(1+\delta\gamma)(\gamma^3+\delta)}{\gamma^6(1+\delta\gamma)^2(\gamma^3+\delta)^2}$ (3.24)

$\quad - \frac{\left(\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta\right)^2}{\gamma^6(1+\delta\gamma)^2(\gamma^3+\delta)^2}$ (3.25)

Case 4:

Let

$w(Z) = Z + \frac{1}{1+\delta\gamma}\cdot\frac{1}{Z}$ (3.26)

and using Formula (3.11)

$E[w(Z)] = \frac{\delta^3+\gamma}{\gamma\delta^2}$

$g(z) = \frac{\gamma\delta^2}{\delta^3+\gamma}\left(z+\frac{1}{(1+\delta\gamma)z}\right)f(z)$ (3.27)

which is also a finite mixture of $GIG(\frac{1}{2},\delta,\gamma)$ and $GIG(-\frac{3}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(\frac{1}{2},\delta,\gamma\right) + (1-p)\,GIG\left(-\frac{3}{2},\delta,\gamma\right)$

with

$p = \frac{\delta^3}{\delta^3+\gamma}$

The mean and variance for the weighted distribution are

$E[Z] = \frac{\delta^3(1+\delta\gamma)^2+\delta^2\gamma^3}{\gamma^2(1+\delta\gamma)(\delta^3+\gamma)}$ (3.28)

and

$var(Z) = \frac{\delta^2\left((\delta^3\gamma^2+3\delta^2\gamma+3\delta)(1+\delta\gamma)+\delta\gamma^4\right)(1+\delta\gamma)(\delta^3+\gamma)-\delta^4\left(\delta(1+\delta\gamma)^2+\gamma^3\right)^2}{\gamma^4(1+\delta\gamma)^2(\delta^3+\gamma)^2}$ (3.29)

Case 5:

Let

$w(Z) = Z + \frac{Z^2}{1+\delta\gamma}$

$E[w(Z)] = \frac{\delta(\gamma^2+1)}{\gamma^3}$

$g(z) = \frac{\gamma^3}{\delta(\gamma^2+1)}\left[z+\frac{z^2}{1+\delta\gamma}\right]f(z)$ (3.30)

which is also a finite mixture of $GIG(\frac{1}{2},\delta,\gamma)$ and $GIG(\frac{3}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(\frac{1}{2},\delta,\gamma\right) + (1-p)\,GIG\left(\frac{3}{2},\delta,\gamma\right)$

with

$p = \frac{\gamma^2}{\gamma^2+1}$

The mean and variance for the weighted distribution are

$E[Z] = \frac{\gamma^2(1+\delta\gamma)^2+\delta^2\gamma^2+3\delta\gamma+3}{\gamma^2(1+\delta\gamma)(\gamma^2+1)}$ (3.31)

$var(Z) = \frac{\left((\delta^2\gamma^2+3\delta\gamma+3)(1+\delta\gamma)\gamma^2+(\delta^3\gamma^3+6\delta^2\gamma^2+15\delta\gamma+15)\right)(1+\delta\gamma)(\gamma^2+1)-\left(\gamma^2(1+\delta\gamma)^2+\delta^2\gamma^2+3\delta\gamma+3\right)^2}{\gamma^4(1+\delta\gamma)^2(\gamma^2+1)^2}$ (3.32)

Case 6:

Let

$w(Z) = \frac{1}{Z} + Z^2$

$E[w(Z)] = \frac{(\gamma^3+\delta^3)(1+\delta\gamma)}{\gamma^3\delta^2}$

$g(z) = \frac{\gamma^3\delta^2}{(\gamma^3+\delta^3)(1+\delta\gamma)}\left[\frac{1}{z}+z^2\right]f(z)$ (3.33)

which is also a finite mixture of $GIG(-\frac{3}{2},\delta,\gamma)$ and $GIG(\frac{3}{2},\delta,\gamma)$. That is

$g(z) = p\,GIG\left(-\frac{3}{2},\delta,\gamma\right) + (1-p)\,GIG\left(\frac{3}{2},\delta,\gamma\right)$

with

$p = \frac{\gamma^3}{\gamma^3+\delta^3}$

The mean and variance for the weighted distribution are

$E[Z] = \delta^2\left(\frac{1}{1+\delta\gamma}+\frac{3\delta}{\gamma^2(\gamma^3+\delta^3)}\right)$ (3.34)

and

$var(Z) = \frac{3(1+\delta\gamma)^2\delta^3\left[5\gamma^3+2\delta^3\right]+\delta^3\gamma^9+2\delta^6\gamma^6+\delta^9\gamma^3}{\gamma^4(\gamma^3+\delta^3)^2(1+\delta\gamma)^2}$ (3.35)

4. Normal Variance-Mean Mixture

A stochastic representation of a Normal Variance-Mean mixture is obtained by letting

$X = \mu + \beta Z + \sqrt{Z}\,Y$

where

$Y \sim N(0,1)$

and Z, independent of Y, is a positive random variable.

If F(x) is a cdf of X, then

$F(x) = \mathrm{Prob}\{X \le x\} = \mathrm{Prob}\left\{Y \le \frac{x-\mu-\beta z}{\sqrt{z}},\ 0<z<\infty\right\} = \int_0^{\infty}\int_{-\infty}^{\frac{x-\mu-\beta z}{\sqrt{z}}}\phi(y)\,g(z)\,dy\,dz = \int_0^{\infty}\Phi\left(\frac{x-\mu-\beta z}{\sqrt{z}}\right)g(z)\,dz$

where ϕ ( ) and Φ ( ) are pdf and cdf of a standard normal distribution, respectively.

$f(x) = \int_0^{\infty}\frac{1}{\sqrt{z}}\,\phi\left(\frac{x-\mu-\beta z}{\sqrt{z}}\right)g(z)\,dz = \int_0^{\infty}\frac{1}{\sqrt{2\pi z}}\,e^{-\frac{[x-(\mu+\beta z)]^2}{2z}}\,g(z)\,dz$ (4.1)

Thus we have a hierarchical representation as

$X \mid Z = z \sim N(\mu+\beta z,\ z)$ (4.2)

being the conditional pdf and g(z) the mixing distribution.
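The hierarchical representation (4.2) suggests a direct way to simulate from any of these mixtures: draw Z from the mixing distribution and then X from $N(\mu+\beta z, z)$. The sketch below does this for the plain Inverse Gaussian mixing case (the NIG model); it assumes that numpy's Wald(mean, scale) generator corresponds to the $IG(\gamma,\delta)$ density (3.2) via mean $=\delta/\gamma$ and scale $=\delta^2$, and the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnvmm(n, mu, beta, delta, gamma):
    """Simulate X = mu + beta*Z + sqrt(Z)*Y with Z ~ IG(delta, gamma), Y ~ N(0,1).

    Assumption: numpy's Wald(mean, scale) matches the IG density (3.2) with
    mean = delta/gamma and scale = delta**2."""
    z = rng.wald(delta / gamma, delta ** 2, size=n)   # mixing draws
    y = rng.standard_normal(n)
    return mu + beta * z + np.sqrt(z) * y

x = rnvmm(10_000, mu=0.0, beta=-0.1, delta=1.0, gamma=1.0)
print(x.mean(), x.var())   # compare with mu + beta*E(Z) and E(Z) + beta^2*var(Z)
```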

If

$Z \sim GIG(\lambda,\delta,\gamma)$ (4.3)

we obtain the Generalized Hyperbolic Distribution (GHD) introduced by Barndorff-Nielsen [6].

In general, the integral formulation for constructing the Normal Weighted Inverse Gaussian (NWIG) densities is

$f(x) = \frac{\delta e^{\delta\gamma}e^{\beta(x-\mu)}}{2\pi}\int_0^{\infty}\frac{w(z)}{E[w(Z)]}\,z^{-2}\,e^{-\frac{1}{2}\left\{\frac{\delta^2\phi(x)}{z}+\alpha^2 z\right\}}\,dz$ (4.4)

where $\alpha^2 = \beta^2 + \gamma^2$ and $\phi(x) = 1 + \left(\frac{x-\mu}{\delta}\right)^2$.

We now construct six (6) models based on the special cases developed above.

Model 1:

Assuming case 1 presented by Formulation (3.12) the mixed model becomes

$f(x) = \frac{\delta\gamma e^{\delta\gamma}e^{\beta(x-\mu)}}{\pi(\gamma+\delta)}\left\{\frac{\alpha}{\delta\sqrt{\phi(x)}}K_1\left(\alpha\delta\sqrt{\phi(x)}\right)+K_0\left(\alpha\delta\sqrt{\phi(x)}\right)\right\}$ (4.5)

With the following properties

$E(X) = \mu + \frac{\beta\left(\delta+\delta\gamma(\gamma+\delta)\right)}{\gamma^2(\gamma+\delta)}$ (4.6)

and

$var(X) = \frac{\delta^2\gamma^2(1+2\alpha^2)+\delta\gamma^3(1+\alpha^2)+\alpha^2\delta^3\gamma+\beta^2\delta(3\gamma+2\delta)}{\gamma^4(\delta+\gamma)^2}$ (4.7)

Model 2:

Assuming case 2 presented by Formulation (3.16) the mixed model becomes

$f(x) = \frac{\delta e^{\delta\gamma}e^{\beta(x-\mu)}}{\pi(1+\delta^2)\phi(x)}\left\{\alpha\delta\sqrt{\phi(x)}\,K_1\left(\alpha\delta\sqrt{\phi(x)}\right)+\frac{\alpha^2}{1+\delta\gamma}K_2\left(\alpha\delta\sqrt{\phi(x)}\right)\right\}$ (4.8)

With the following properties

$E(X) = \mu + \beta\,\frac{\delta^2}{1+\delta^2}\left[\frac{\delta(1+\delta\gamma)+\gamma}{\gamma(1+\delta\gamma)}\right]$ (4.9)

and

$var(X) = \frac{\delta^2}{1+\delta^2}\left[\frac{\delta(1+\delta\gamma)+\gamma}{\gamma(1+\delta\gamma)}\right] + \beta^2\,\frac{\delta^3\left(\gamma^2+(1+\delta\gamma)^2\right)(1+\delta^2)(1+\delta\gamma)-\delta^4\gamma\left(\delta(1+\delta\gamma)+\gamma\right)^2}{\gamma^3(1+\delta\gamma)^2(1+\delta^2)^2}$ (4.10)

Model 3:

Assuming case 3 presented by Formulation (3.21) the mixed model becomes

$f(x) = \frac{\gamma^3 e^{\delta\gamma}e^{\beta(x-\mu)}\left(\phi(x)\right)^{-\frac{1}{2}}}{\alpha\pi(\gamma^3+\delta)(1+\delta\gamma)}\left\{\alpha^2(1+\delta\gamma)+\delta^2\phi(x)\right\}K_1\left(\alpha\delta\sqrt{\phi(x)}\right)$ (4.11)

With the following properties

$E(X) = \mu + \beta\,\frac{\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta}{\gamma^2(1+\delta\gamma)(\gamma^3+\delta)}$ (4.12)

$var(X) = \frac{\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta}{\gamma^2(1+\delta\gamma)(\gamma^3+\delta)} + \beta^2\left[\frac{\left(\delta\gamma^6(1+\delta\gamma)^2+\delta^5\gamma^4+10\delta^4\gamma^3+45\delta^3\gamma^2+105\delta^2\gamma+105\delta\right)(1+\delta\gamma)(\gamma^3+\delta)-\left(\delta\gamma^4(1+\delta\gamma)+\delta^3\gamma^2+3\delta^2\gamma+3\delta\right)^2}{\gamma^6(1+\delta\gamma)^2(\gamma^3+\delta)^2}\right]$ (4.13)

Model 4:

Assuming case 4 presented by Formulation (3.27) the mixed model becomes

$f(x) = \frac{\gamma\delta e^{\delta\gamma}e^{\beta(x-\mu)}}{\pi\phi(x)(\delta^3+\gamma)(1+\delta\gamma)}\left\{(1+\delta\gamma)\delta^2\phi(x)\,K_0\left(\alpha\delta\sqrt{\phi(x)}\right)+\alpha^2 K_2\left(\alpha\delta\sqrt{\phi(x)}\right)\right\}$ (4.14)

With the following properties

$E(X) = \mu + \beta\,\frac{\delta^3(1+\delta\gamma)^2+\delta^2\gamma^3}{\gamma^2(1+\delta\gamma)(\delta^3+\gamma)}$ (4.15)

$var(X) = \frac{\delta^3(1+\delta\gamma)^2+\delta^2\gamma^3}{\gamma^2(1+\delta\gamma)(\delta^3+\gamma)} + \beta^2\,\frac{\delta^2\left((\delta^3\gamma^2+3\delta^2\gamma+3\delta)(1+\delta\gamma)+\delta\gamma^4\right)(1+\delta\gamma)(\delta^3+\gamma)-\delta^4\left(\delta(1+\delta\gamma)^2+\gamma^3\right)^2}{\gamma^4(1+\delta\gamma)^2(\delta^3+\gamma)^2}$ (4.16)

Model 5:

Assuming case 5 presented by Formulation (3.30) the mixed model becomes

$f(x) = \frac{\gamma^3 e^{\delta\gamma}e^{\beta(x-\mu)}}{\alpha\pi(1+\delta\gamma)(\gamma^2+1)}\left\{\alpha(1+\delta\gamma)K_0\left(\alpha\delta\sqrt{\phi(x)}\right)+\delta\sqrt{\phi(x)}\,K_1\left(\alpha\delta\sqrt{\phi(x)}\right)\right\}$ (4.17)

With the following properties

$E(X) = \mu + \beta\,\frac{\gamma^2(1+\delta\gamma)^2+\delta^2\gamma^2+3\delta\gamma+3}{\gamma^2(1+\delta\gamma)(\gamma^2+1)}$ (4.18)

$var(X) = \frac{\gamma^2(1+\delta\gamma)^2+\delta^2\gamma^2+3\delta\gamma+3}{\gamma^2(1+\delta\gamma)(\gamma^2+1)} + \beta^2\,\frac{\left((\delta^2\gamma^2+3\delta\gamma+3)(1+\delta\gamma)\gamma^2+(\delta^3\gamma^3+6\delta^2\gamma^2+15\delta\gamma+15)\right)(1+\delta\gamma)(\gamma^2+1)-\left(\gamma^2(1+\delta\gamma)^2+\delta^2\gamma^2+3\delta\gamma+3\right)^2}{\gamma^4(1+\delta\gamma)^2(\gamma^2+1)^2}$ (4.19)

Model 6:

Assuming case 6 presented by Formulation (3.33) the mixed model becomes

$f(x) = \frac{\delta\gamma^3 e^{\delta\gamma}e^{\beta(x-\mu)}}{\alpha\pi\phi(x)(\delta^3+\gamma^3)(1+\delta\gamma)}\left\{\alpha^3 K_2\left(\alpha\delta\sqrt{\phi(x)}\right)+\left(\delta\sqrt{\phi(x)}\right)^3 K_1\left(\alpha\delta\sqrt{\phi(x)}\right)\right\}$ (4.20)

With the following properties

$E(X) = \mu + \beta E(Z) = \mu + \beta\delta^2\left(\frac{1}{1+\delta\gamma}+\frac{3\delta}{\gamma^2(\gamma^3+\delta^3)}\right)$ (4.21)

$var(X) = E(Z) + \beta^2 Var(Z)$ (4.22)

$= \frac{\delta^2\gamma^9(\alpha^2\delta+\gamma)+3(1+\delta\gamma)^2\delta^3\left[\alpha^2(\gamma^3+\delta^3)+\beta^2(4\gamma^3+\delta^3)\right]+(2\gamma^3+\delta^3)\delta^5\gamma^2\left[\alpha^2(1+\delta\gamma)-\beta^2\right]}{\gamma^4(\gamma^3+\delta^3)^2(1+\delta\gamma)^2}$ (4.23)

5. Maximum Likelihood Estimation via Expectation-Maximization (EM) Algorithm

The EM algorithm is a powerful technique for maximum likelihood estimation when the data contain missing values or can be treated as containing missing values. It was introduced by Dempster et al. [8].

Assume that the complete data consist of an observed part X and an unobserved part Z. The log-likelihood of the complete data $(x_i, z_i)$, $i = 1, 2, \ldots, n$, then factorizes into two parts (Kostas [20]); i.e.,

$\log L = \log\prod_{i=1}^n f(x_i \mid z_i) + \log\prod_{i=1}^n g(z_i) = \sum_{i=1}^n\log f(x_i \mid z_i) + \sum_{i=1}^n\log g(z_i)$

where

$l_1 = \sum_{i=1}^n\log f(x_i \mid z_i)$

and

$l_2 = \sum_{i=1}^n\log g(z_i)$

Karlis [21] applied the EM algorithm to such mixtures, treating the conditional pdf as governing the observed data and the mixing distribution as governing the unobserved data, i.e., the missing values.

5.1. M-Step for Conditional pdf

Since the conditional distribution for all six models is the normal distribution presented in Formula (4.2), we have

$l_1 = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^n\log z_i - \sum_{i=1}^n\frac{(x_i-\mu-\beta z_i)^2}{2z_i}$

Therefore

$\frac{\partial l_1}{\partial\beta} = 0 \Rightarrow \sum_{i=1}^n\left(x_i-\hat{\mu}-\hat{\beta}z_i\right) = 0$

i.e., $\sum_{i=1}^n x_i - n\hat{\mu} - \hat{\beta}\sum_{i=1}^n z_i = 0$

$\hat{\mu} = \bar{x} - \hat{\beta}\bar{z}$

where $\bar{x} = \frac{\sum_{i=1}^n x_i}{n}$ and $\bar{z} = \frac{\sum_{i=1}^n z_i}{n}$.

Similarly,

$\frac{\partial l_1}{\partial\mu} = 0 \Rightarrow \sum_{i=1}^n\frac{x_i}{z_i} - \hat{\mu}\sum_{i=1}^n\frac{1}{z_i} - n\hat{\beta} = 0$

$\sum_{i=1}^n\frac{x_i}{z_i} - \bar{x}\sum_{i=1}^n\frac{1}{z_i} + \hat{\beta}\bar{z}\sum_{i=1}^n\frac{1}{z_i} - n\hat{\beta} = 0$

$\hat{\beta} = \frac{\sum_{i=1}^n\frac{x_i}{z_i} - \bar{x}\sum_{i=1}^n\frac{1}{z_i}}{n - \bar{z}\sum_{i=1}^n\frac{1}{z_i}}$
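These closed-form updates are easy to code once $z_i$ and $1/z_i$ are available; in the EM iterations below they are replaced by the posterior expectations $s_i$ and $w_i$. The following Python sketch implements the two updates under that substitution.

```python
import numpy as np

def m_step_mu_beta(x, s, w):
    """M-step updates for mu and beta, with s_i = E(Z_i|X_i) and
    w_i = E(1/Z_i|X_i) standing in for z_i and 1/z_i (a sketch)."""
    x, s, w = map(np.asarray, (x, s, w))
    n = x.size
    xbar, sbar = x.mean(), s.mean()
    beta = ((x * w).sum() - xbar * w.sum()) / (n - sbar * w.sum())
    mu = xbar - beta * sbar
    return mu, beta
```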

5.2. For Model 1

Consider Formula (3.12)

5.2.1. M-Step

$l_2 = n\delta\gamma + n\log(\delta\gamma) - \frac{n}{2}\log(2\pi) - n\log(\gamma+\delta) + \sum_{i=1}^n\log(1+z_i) - \frac{3}{2}\sum_{i=1}^n\log z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i} - \frac{\gamma^2}{2}\sum_{i=1}^n z_i$ (5.1)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\hat{\delta} = \frac{\hat{\gamma}\left[1+\frac{1}{\hat{\delta}(\hat{\gamma}+\hat{\delta})}\right]}{\frac{1}{n}\sum_{i=1}^n\frac{1}{z_i}}$ (5.2)

$\hat{\gamma} = \frac{\hat{\delta}\left[1+\frac{1}{\hat{\gamma}(\hat{\gamma}+\hat{\delta})}\right]}{\frac{1}{n}\sum_{i=1}^n z_i}$ (5.3)

5.2.2. E-Step

Since the values of the random variables $Z_i$ and $\frac{1}{Z_i}$ are unknown, we estimate them by the posterior expectations $E(Z_i \mid X_i)$ and $E\left(\frac{1}{Z_i} \mid X_i\right)$.

In general,

$E(Z \mid X) = \frac{\int_0^{\infty} z\,f(x \mid z)\,g(z)\,dz}{\int_0^{\infty} f(x \mid z)\,g(z)\,dz}$

$E\left(\frac{1}{Z} \mid X\right) = \frac{\int_0^{\infty}\frac{1}{z}\,f(x \mid z)\,g(z)\,dz}{\int_0^{\infty} f(x \mid z)\,g(z)\,dz}$

In general let

$s_i = E(Z_i \mid X_i)$

and

$w_i = E\left(\frac{1}{Z_i} \mid X_i\right)$

Then the k-th iterations are as follows:

$s_i^{(k)} = \frac{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

and

$w_i^{(k)} = \frac{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

We therefore have the following iterative scheme

$\hat{\delta}^{(k+1)} = \frac{\hat{\gamma}^{(k)}\left[1+\frac{1}{\hat{\delta}^{(k)}\left(\hat{\gamma}^{(k)}+\hat{\delta}^{(k)}\right)}\right]}{\bar{w}^{(k)}}$ (5.4)

$\hat{\gamma}^{(k+1)} = \frac{\hat{\delta}^{(k+1)}\left[1+\frac{1}{\hat{\gamma}^{(k)}\left(\hat{\gamma}^{(k)}+\hat{\delta}^{(k+1)}\right)}\right]}{\bar{s}^{(k)}}$ (5.5)

$\hat{\beta}^{(k+1)} = \frac{\sum_{i=1}^n(x_i-\bar{x})w_i^{(k)}}{n\left(1-\bar{s}^{(k)}\bar{w}^{(k)}\right)}$ (5.6)

$\hat{\mu}^{(k+1)} = \bar{x} - \hat{\beta}^{(k+1)}\bar{s}^{(k)}$ (5.7)

$\hat{\alpha}^{(k+1)} = \left[\left(\hat{\gamma}^{(k+1)}\right)^2+\left(\hat{\beta}^{(k+1)}\right)^2\right]^{\frac{1}{2}}$ (5.8)

where

$\bar{w} = \frac{\sum_{i=1}^n w_i}{n}$

and

$\bar{s} = \frac{\sum_{i=1}^n s_i}{n}$

and the k-th iteration is given as

$l^{(k)} = n\log\left(\delta^{(k)}\gamma^{(k)}\right) + n\delta^{(k)}\gamma^{(k)} + n\beta^{(k)}\bar{x} - n\beta^{(k)}\mu^{(k)} - n\log\left(\pi\left(\gamma^{(k)}+\delta^{(k)}\right)\right) + \sum_{i=1}^n\log\left[\frac{\alpha^{(k)}}{\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}}K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)\right]$

Note that the iterative scheme for $\beta$ and $\mu$ is the same for all the models considered below; only the posterior expectations differ between models.

5.3. For Model 2

Consider Formula (3.16)

5.3.1. M-Step

$l_2 = 3n\log\delta + n\delta\gamma - n\log(1+\delta^2) - \frac{n}{2}\log(2\pi) - \frac{3}{2}\sum_{i=1}^n\log z_i + \sum_{i=1}^n\log\left(1+\frac{1}{(1+\delta\gamma)z_i}\right) - \frac{\gamma^2}{2}\sum_{i=1}^n z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i}$ (5.9)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\hat{\gamma} = \frac{\delta\left[n-\frac{1}{1+\delta\gamma}\sum_{i=1}^n\frac{1}{1+(1+\delta\gamma)z_i}\right]}{\sum_{i=1}^n z_i}$ (5.10)

$\hat{\delta} = \frac{\frac{3n+n\delta^2}{\delta(1+\delta^2)}+\gamma\left[n-\frac{1}{1+\delta\gamma}\sum_{i=1}^n\frac{1}{1+(1+\delta\gamma)z_i}\right]}{\sum_{i=1}^n\frac{1}{z_i}}$ (5.11)

5.3.2. E-Step

The posterior expectations for the k-th iteration are:

$s_i^{(k)} = \frac{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(1+\delta^{(k)}\gamma^{(k)}\right)\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

$w_i^{(k)} = \frac{\left(1+\delta^{(k)}\gamma^{(k)}\right)\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_3\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

We therefore have the following iterative scheme

$\hat{\gamma}^{(k+1)} = \frac{\delta^{(k)}\left[n-\frac{1}{1+\delta^{(k)}\gamma^{(k)}}\sum_{i=1}^n\frac{1}{1+\left(1+\delta^{(k)}\gamma^{(k)}\right)s_i^{(k)}}\right]}{\sum_{i=1}^n s_i^{(k)}}$ (5.12)

$\hat{\delta}^{(k+1)} = \frac{\frac{3n+n\left(\delta^{(k)}\right)^2}{\delta^{(k)}\left(1+\left(\delta^{(k)}\right)^2\right)}+\gamma^{(k+1)}\left[n-\frac{1}{1+\delta^{(k)}\gamma^{(k+1)}}\sum_{i=1}^n\frac{1}{1+\left(1+\delta^{(k)}\gamma^{(k+1)}\right)s_i^{(k)}}\right]}{\sum_{i=1}^n w_i^{(k)}}$ (5.13)

and the k-th iteration for the log-likelihood is given by

$l^{(k)} = n\log\delta^{(k)} + n\delta^{(k)}\gamma^{(k)} + \beta^{(k)}\sum_{i=1}^n x_i - n\beta^{(k)}\mu^{(k)} - n\log\left(\pi\left(1+\left(\delta^{(k)}\right)^2\right)\right) - \sum_{i=1}^n\log\phi^{(k)}(x_i) + \sum_{i=1}^n\log\left\{\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\frac{\left(\alpha^{(k)}\right)^2}{1+\delta^{(k)}\gamma^{(k)}}K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)\right\}$ (5.14)

5.4. For Model 3

5.4.1. M-Step for the Mixing Distribution

Consider Formula (3.21)

$l_2 = n\log\delta + 3n\log\gamma + n\delta\gamma - \frac{n}{2}\log(2\pi) - n\log(1+\delta\gamma) - n\log(\gamma^3+\delta) + \sum_{i=1}^n\log\left(1+\delta\gamma+z_i^2\right) - \frac{3}{2}\sum_{i=1}^n\log z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i} - \frac{\gamma^2}{2}\sum_{i=1}^n z_i$ (5.15)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\left(\frac{n\delta^2}{1+\delta\gamma}-\sum_{i=1}^n z_i\right)\gamma^2 + \sum_{i=1}^n\frac{\delta\gamma}{1+\delta\gamma+z_i^2} + \frac{3n\delta}{\gamma^3+\delta} = 0$ (5.16)

$\left(\frac{n\gamma^2}{1+\delta\gamma}-\sum_{i=1}^n\frac{1}{z_i}\right)\delta^2 + \sum_{i=1}^n\frac{\delta\gamma}{1+\delta\gamma+z_i^2} + \frac{n\gamma^3}{\gamma^3+\delta} = 0$ (5.17)

Both equations are quadratic in γ and δ respectively.

5.4.2. E-Step

The posterior expectations for the k-th iteration are:

$s_i^{(k)} = \frac{\alpha^{(k)}\delta^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)\sqrt{\phi^{(k)}(x_i)}\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\frac{\left(\delta^{(k)}\right)^3\left(\phi^{(k)}(x_i)\right)^{\frac{3}{2}}}{\alpha^{(k)}}K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left[\left(\alpha^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\right]K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

$w_i^{(k)} = \frac{\left(\alpha^{(k)}\right)^3\left(1+\delta^{(k)}\gamma^{(k)}\right)K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\alpha^{(k)}\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left[\left(\alpha^{(k)}\right)^2\delta^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)\sqrt{\phi^{(k)}(x_i)}+\left(\delta^{(k)}\right)^3\left(\phi^{(k)}(x_i)\right)^{\frac{3}{2}}\right]K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

$v_i^{(k)} = \frac{\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\alpha^{(k)}\right)^2\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\right)^4\left(\phi^{(k)}(x_i)\right)^2 K_3\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left[\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\alpha^{(k)}\right)^4+\left(\alpha^{(k)}\right)^2\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\right]K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

Now, define the iterative scheme as follows:

let

$a_1^{(k+1)} = \frac{n\left(\delta^{(k)}\right)^2}{1+\delta^{(k)}\gamma^{(k)}}-\sum_{i=1}^n s_i^{(k)}$

$b_1^{(k+1)} = \sum_{i=1}^n\frac{\delta^{(k)}}{1+\delta^{(k)}\gamma^{(k)}+v_i^{(k)}}$

$c_1^{(k+1)} = \frac{3n\delta^{(k)}}{\left(\gamma^{(k)}\right)^3+\delta^{(k)}}$

let

$t^{(k+1)} = \frac{-b_1^{(k+1)}-\sqrt{\left(b_1^{(k+1)}\right)^2-4a_1^{(k+1)}c_1^{(k+1)}}}{2a_1^{(k+1)}}$ (5.18)

using the square root transformation, we have

$\gamma^{(k+1)} = \sqrt{t^{(k+1)}}$ (5.19)

Similarly, define

$a_2^{(k+1)} = \frac{n\left(\gamma^{(k+1)}\right)^2}{1+\delta^{(k)}\gamma^{(k+1)}}-\sum_{i=1}^n w_i^{(k)}$ (5.20)

$b_2^{(k+1)} = \sum_{i=1}^n\frac{\delta^{(k)}\gamma^{(k+1)}}{1+\delta^{(k)}\gamma^{(k+1)}+v_i^{(k)}}$ (5.21)

$c_2^{(k+1)} = \frac{n\left(\gamma^{(k+1)}\right)^3}{\left(\gamma^{(k+1)}\right)^3+\delta^{(k)}}$ (5.22)

let

$s^{(k+1)} = \frac{-b_2^{(k+1)}-\sqrt{\left(b_2^{(k+1)}\right)^2-4a_2^{(k+1)}c_2^{(k+1)}}}{2a_2^{(k+1)}}$ (5.23)

using the square root transformation, we have

$\delta^{(k+1)} = \sqrt{s^{(k+1)}}$ (5.24)

and the k-th iteration for the log-likelihood is given by

$l^{(k)} = 3n\log\gamma^{(k)} + n\delta^{(k)}\gamma^{(k)} + \beta^{(k)}\sum_{i=1}^n\left(x_i-\mu^{(k)}\right) - n\log\left(\alpha^{(k)}\pi\right) - n\log\left(1+\delta^{(k)}\gamma^{(k)}\right) - n\log\left(\left(\gamma^{(k)}\right)^3+\delta^{(k)}\right) - \frac{1}{2}\sum_{i=1}^n\log\phi^{(k)}(x_i) + \sum_{i=1}^n\log\left[\left(\alpha^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\right] + \sum_{i=1}^n\log K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)$ (5.25)

5.5. For Model 4

5.5.1. M-Step for the Mixing Distribution

Consider Formula (3.27)

$l_2 = n\log\gamma + 3n\log\delta + n\delta\gamma - \frac{n}{2}\log(2\pi) - n\log(\delta^3+\gamma) - n\log(1+\delta\gamma) - \frac{3}{2}\sum_{i=1}^n\log z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i} - \frac{\gamma^2}{2}\sum_{i=1}^n z_i + \sum_{i=1}^n\log\left((1+\delta\gamma)z_i+\frac{1}{z_i}\right)$ (5.26)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\frac{n\delta^3}{\gamma(\delta^3+\gamma)}+\frac{n\gamma\delta^2}{1+\delta\gamma}-\gamma\sum_{i=1}^n z_i+\sum_{i=1}^n\frac{\delta z_i^2}{1+(1+\delta\gamma)z_i^2} = 0$ (5.27)

$\frac{3n\gamma}{\delta(\delta^3+\gamma)}+\frac{n\delta\gamma^2}{1+\delta\gamma}-\delta\sum_{i=1}^n\frac{1}{z_i}+\sum_{i=1}^n\frac{\gamma z_i^2}{1+(1+\delta\gamma)z_i^2} = 0$ (5.28)

5.5.2. E-Step

The posterior expectations for the k-th iteration are:

$s_i^{(k)} = \frac{\left[\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^3\left(\phi^{(k)}(x_i)\right)^{\frac{3}{2}}+\left(\alpha^{(k)}\right)^2\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right]K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\alpha^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^3 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$ (5.29)

$w_i^{(k)} = \frac{\alpha^{(k)}\left(\delta^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)\phi^{(k)}(x_i)\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^3 K_3\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)^3 K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$ (5.30)

$v_i^{(k)} = \frac{\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(\alpha^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$ (5.31)

From Equations (5.27) and (5.28), we obtain the following iterative scheme

$\gamma^{(k+1)} = \frac{\frac{n\left(\delta^{(k)}\right)^3}{\gamma^{(k)}\left(\left(\delta^{(k)}\right)^3+\gamma^{(k)}\right)}+\frac{n\gamma^{(k)}\left(\delta^{(k)}\right)^2}{1+\delta^{(k)}\gamma^{(k)}}+\sum_{i=1}^n\frac{\delta^{(k)}z_i^2}{1+\left(1+\delta^{(k)}\gamma^{(k)}\right)z_i^2}}{\sum_{i=1}^n s_i^{(k)}}$ (5.32)

$\delta^{(k+1)} = \frac{\frac{3n\gamma^{(k+1)}}{\delta^{(k)}\left(\left(\delta^{(k)}\right)^3+\gamma^{(k+1)}\right)}+\frac{n\delta^{(k)}\left(\gamma^{(k+1)}\right)^2}{1+\delta^{(k)}\gamma^{(k+1)}}+\sum_{i=1}^n\frac{\gamma^{(k+1)}z_i^2}{1+\left(1+\delta^{(k)}\gamma^{(k+1)}\right)z_i^2}}{\sum_{i=1}^n w_i^{(k)}}$ (5.33)

and the k-th iteration for the log-likelihood is given by

$l^{(k)} = n\log\left(\delta^{(k)}\gamma^{(k)}\right) + n\delta^{(k)}\gamma^{(k)} + \beta^{(k)}\sum_{i=1}^n x_i - n\beta^{(k)}\mu^{(k)} - n\log\left(\left(1+\delta^{(k)}\gamma^{(k)}\right)\pi\left(\left(\delta^{(k)}\right)^3+\gamma^{(k)}\right)\right) - \sum_{i=1}^n\log\phi^{(k)}(x_i)$

$+ \sum_{i=1}^n\log\left\{\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)\right\}$ (5.34)

5.6. For Model 5

5.6.1. M-Step for the Mixing Distribution

Consider Formula (3.30)

$l_2 = 3n\log\gamma + n\delta\gamma - \frac{n}{2}\log(2\pi) - n\log(\gamma^2+1) - n\log(1+\delta\gamma) - \frac{1}{2}\sum_{i=1}^n\log z_i + \sum_{i=1}^n\log\left(1+\delta\gamma+z_i\right) - \frac{\gamma^2}{2}\sum_{i=1}^n z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i}$ (5.35)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\frac{3n+2n\delta\gamma}{\gamma(1+\delta\gamma)}-\gamma\left(\frac{2n}{\gamma^2+1}+\sum_{i=1}^n z_i\right)+\delta\left(n+\sum_{i=1}^n\frac{1}{1+\delta\gamma+z_i}\right) = 0$ (5.36)

$\frac{n\delta\gamma^2}{1+\delta\gamma}+\gamma\sum_{i=1}^n\frac{1}{1+\delta\gamma+z_i}-\delta\sum_{i=1}^n\frac{1}{z_i} = 0$ (5.37)

5.6.2. E-Step

The posterior expectations for the k-th iteration are:

$s_i^{(k+1)} = \frac{\alpha^{(k)}\delta^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(\alpha^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

$w_i^{(k+1)} = \frac{\left(\alpha^{(k)}\right)^2\left(1+\delta^{(k)}\gamma^{(k)}\right)K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(1+\delta^{(k)}\gamma^{(k)}\right)\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

$v_i^{(k+1)} = \frac{\alpha^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)^3 K_3\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(\alpha^{(k)}\right)^3\left(1+\delta^{(k)}\gamma^{(k)}\right)K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\alpha^{(k)}\right)^2\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$

These can be used to obtain the (k + 1)-th values as follows

$\hat{\gamma}^{(k+1)} = \frac{\frac{3+2\delta^{(k)}\gamma^{(k)}}{\gamma^{(k)}\left(1+\delta^{(k)}\gamma^{(k)}\right)}+\delta^{(k)}\left[1+\frac{1}{n}\sum_{i=1}^n\frac{1}{1+\delta^{(k)}\gamma^{(k)}+s_i^{(k)}}\right]}{\frac{2}{\left(\gamma^{(k)}\right)^2+1}+\bar{s}^{(k)}}$ (5.38)

$\hat{\delta}^{(k+1)} = \frac{\frac{n\delta^{(k)}\left(\gamma^{(k+1)}\right)^2}{1+\delta^{(k)}\gamma^{(k+1)}}+\gamma^{(k+1)}\sum_{i=1}^n\frac{1}{1+\delta^{(k)}\gamma^{(k+1)}+s_i^{(k)}}}{n\bar{w}^{(k)}}$ (5.39)

The (k + 1)-th iteration of the log-likelihood function becomes

$l^{(k+1)} = 3n\log\gamma^{(k+1)} + n\delta^{(k+1)}\gamma^{(k+1)} + \beta^{(k+1)}\sum_{i=1}^n x_i - n\beta^{(k+1)}\mu^{(k+1)} - n\log\left(\alpha^{(k+1)}\pi\left(\left(\gamma^{(k+1)}\right)^2+1\right)\left(1+\delta^{(k+1)}\gamma^{(k+1)}\right)\right) + \sum_{i=1}^n\log\left[\alpha^{(k+1)}\left(1+\delta^{(k+1)}\gamma^{(k+1)}\right)K_0\left(\alpha^{(k+1)}\delta^{(k+1)}\sqrt{\phi^{(k+1)}(x_i)}\right)+\delta^{(k+1)}\sqrt{\phi^{(k+1)}(x_i)}\,K_1\left(\alpha^{(k+1)}\delta^{(k+1)}\sqrt{\phi^{(k+1)}(x_i)}\right)\right]$ (5.40)

5.7. For Model 6

5.7.1. M-Step for the Mixing Distribution

Consider Formula (3.33)

$l_2 = 3n\log\delta + 3n\log\gamma + n\delta\gamma - \frac{n}{2}\log(2\pi) - n\log\left((\gamma^3+\delta^3)(1+\delta\gamma)\right) + \sum_{i=1}^n\log\left(\frac{1}{z_i}+z_i^2\right) - \frac{3}{2}\sum_{i=1}^n\log z_i - \frac{\gamma^2}{2}\sum_{i=1}^n z_i - \frac{\delta^2}{2}\sum_{i=1}^n\frac{1}{z_i}$ (5.41)

Maximizing with respect to $\delta$ and $\gamma$ we have the following representation

$\hat{\gamma} = \frac{3\delta^3(1+\delta\gamma)}{\gamma(\gamma^3+\delta^3)\left((1+\delta\gamma)\bar{z}-\delta^2\right)}$ (5.42)

$\hat{\delta} = \frac{3\gamma^3(1+\delta\gamma)}{\delta(\gamma^3+\delta^3)\left[(1+\delta\gamma)\bar{w}-\gamma^2\right]}$ (5.43)

5.7.2. E-Step

The posterior expectations for the k-th iteration are:

$s_i^{(k)} = \frac{\left(\alpha^{(k)}\right)^3\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\right)^2 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(\alpha^{(k)}\right)^4 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\alpha^{(k)}\left(\delta^{(k)}\right)^3\left(\phi^{(k)}(x_i)\right)^{\frac{3}{2}}K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$ (5.44)

$w_i^{(k)} = \frac{\left(\alpha^{(k)}\right)^4 K_3\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\alpha^{(k)}\left(\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)^3 K_0\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}{\left(\alpha^{(k)}\right)^3\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\,K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\left(\delta^{(k)}\right)^2\phi^{(k)}(x_i)\right)^2 K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)}$ (5.45)

From Equations (5.42) and (5.43), we obtain the following iterative scheme

$\hat{\gamma}^{(k+1)} = \frac{3\left(\delta^{(k)}\right)^3\left(1+\delta^{(k)}\gamma^{(k)}\right)}{\gamma^{(k)}\left(\left(\gamma^{(k)}\right)^3+\left(\delta^{(k)}\right)^3\right)\left(\left(1+\delta^{(k)}\gamma^{(k)}\right)\bar{s}^{(k)}-\left(\delta^{(k)}\right)^2\right)}$ (5.46)

$\hat{\delta}^{(k+1)} = \frac{3\left(\gamma^{(k)}\right)^3\left(1+\delta^{(k)}\gamma^{(k)}\right)}{\delta^{(k)}\left(\left(\gamma^{(k)}\right)^3+\left(\delta^{(k)}\right)^3\right)\left[\left(1+\delta^{(k)}\gamma^{(k)}\right)\bar{w}^{(k)}-\left(\gamma^{(k)}\right)^2\right]}$ (5.47)

and the k-th iteration for the log-likelihood is given by

$l^{(k)} = 3n\log\gamma^{(k)} + n\log\delta^{(k)} + n\delta^{(k)}\gamma^{(k)} + \beta^{(k)}\sum_{i=1}^n\left(x_i-\mu^{(k)}\right) - n\log\left(\alpha^{(k)}\pi\left(\left(\delta^{(k)}\right)^3+\left(\gamma^{(k)}\right)^3\right)\left(1+\delta^{(k)}\gamma^{(k)}\right)\right) - \sum_{i=1}^n\log\phi^{(k)}(x_i) + \sum_{i=1}^n\log\left\{\left(\alpha^{(k)}\right)^3 K_2\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)+\left(\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)^3 K_1\left(\alpha^{(k)}\delta^{(k)}\sqrt{\phi^{(k)}(x_i)}\right)\right\}$ (5.48)

Remarks:

1) For all the proposed models, the $\beta$, $\mu$ and $\alpha$ parameters of the conditional distribution are updated as follows:

$\hat{\beta}^{(k+1)} = \frac{\sum_{i=1}^n(x_i-\bar{x})w_i^{(k)}}{n-\frac{1}{n}\sum_{i=1}^n s_i^{(k)}\sum_{i=1}^n w_i^{(k)}}$

$\hat{\mu}^{(k+1)} = \bar{x}-\hat{\beta}^{(k+1)}\frac{\sum_{i=1}^n s_i^{(k)}}{n}$

$\hat{\alpha}^{(k+1)} = \left[\left(\gamma^{(k+1)}\right)^2+\left(\beta^{(k+1)}\right)^2\right]^{\frac{1}{2}}$

2) The stopping criterion is

$\frac{l^{(k)}-l^{(k-1)}}{l^{(k)}} < tol$

where tol is the chosen tolerance level, e.g., $10^{-6}$.

3) The initial values used are the method of moments estimates of the NIG distribution, as proposed by Karlis [21]. A sketch of the resulting EM loop is given below.
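A minimal sketch of such an EM loop follows. The model-specific E-step (posterior expectations) and M-step (parameter updates and current log-likelihood) are assumed to be supplied as functions; only the alternation and the stopping rule from Remark 2 are shown.

```python
import numpy as np

def em_fit(x, e_step, m_step, theta0, tol=1e-6, max_iter=500):
    """Generic EM loop: alternate the E-step (posterior expectations s_i, w_i)
    and the M-step (parameter updates returning the current log-likelihood)
    until the relative change in the log-likelihood falls below tol."""
    theta = dict(theta0)
    loglik_old = -np.inf
    for _ in range(max_iter):
        s, w = e_step(x, theta)                  # model-specific posterior expectations
        theta, loglik = m_step(x, s, w, theta)   # model-specific updates
        if np.abs((loglik - loglik_old) / loglik) < tol:
            break
        loglik_old = loglik
    return theta, loglik
```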

6. Application

6.1. Fitting of the Proposed Models

The data used in this research are the weekly returns on shares of Chevron (CVX) for the period 3/01/2000 to 1/07/2013, with 702 observations. The histogram of the weekly log-returns shows that the data are negatively skewed and exhibit heavy tails. The Q-Q plot shows that the normal distribution is not a good fit for the data, especially in the tails.

Table 1 provides descriptive statistics for the return series in consideration. We observe that the excess kurtosis of 2.768252 indicates the leptokurtic behaviour of the returns. The log-returns have a distribution with relatively heavier tails than the normal distribution. We observe skewness of −0.1886714 which indicates that the two tails of the returns behave slightly differently.

Table 1. Summary Statistics for CVX weekly log-returns.

The proposed models are now fitted to the CVX weekly log-returns. Using the sample estimates and the NIG moment estimators applied to the CVX data, we obtain the following estimates as initial values for the EM algorithm (Karlis [21]).

α ^ = 0.4190067 , β ^ = 0.1054991 , δ ^ = 0.8324058 , μ ^ = 0.3036691 .

The initial values were used in all the proposed models to obtain the maximum likelihood estimates as shown in Table 2 below

The parameter estimates from Table 2 are now used to fit the proposed models to the CVX weekly log-returns. Figures 1-6 show the histograms and Q-Q plots of the CVX returns fitted with the proposed models.

Figures 1-6 show that the proposed models fit the data well.

Table 2. Estimates for the proposed models.

Figure 1. Fitting Model 1 to CVX weekly log returns.

Figure 2. Fitting Model 2 to CVX log-weekly returns.

Figure 3. Fitting Model 3 to CVX log-weekly returns.

Figure 4. Fitting Model 4 to CVX log-weekly returns.

Figure 5. Fitting Model 5 to CVX log-weekly returns.

Figure 6. Fitting Model 6 to CVX log-weekly returns.

Table 3 presents the values of the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and log-likelihood. The values illustrate that the models are competitive alternatives to one another.
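For reference, the two criteria are AIC $= 2k - 2l$ and BIC $= k\log n - 2l$, where $l$ is the maximized log-likelihood, $k$ the number of estimated parameters (here $\mu$, $\beta$, $\delta$, $\gamma$) and $n$ the sample size. A small Python helper, with purely illustrative inputs, is shown below.

```python
import numpy as np

def aic_bic(loglik, k, n):
    """AIC = 2k - 2*loglik and BIC = k*log(n) - 2*loglik."""
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# illustrative values only: 4 parameters, n = 702 weekly observations
print(aic_bic(loglik=-1200.0, k=4, n=702))
```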

Remark:

Model 6 has the lowest AIC and BIC and the highest log-likelihood; it is the best fit for the data. Using the formulation of Karlis [21], the Normal Inverse Gaussian (NIG) parameter estimates from the EM algorithm are: $\hat{\alpha} = 0.9265284$, $\hat{\delta} = 0.9265284$, $\hat{\beta} = 0.2429664$, $\hat{\mu} = 0.5578459$. The log-likelihood is 1221.667 at $tol = 10^{-8}$ with 119 iterations. Therefore a finite mixture of $GIG(-\frac{3}{2},\delta,\gamma)$ and $GIG(\frac{3}{2},\delta,\gamma)$ is more versatile than the Inverse Gaussian (IG) distribution.

6.2. Risk Estimation and Backtesting

We use the parameter estimates of our proposed models to determine the VaR and ES at levels $\alpha \in \{0.001, 0.01, 0.05\}$, as given in Table 4 and Table 5, respectively. The three levels are used to measure the risk of a long position. We apply the Kupiec likelihood ratio (LR) test (Kupiec [14]), which tests the hypothesis that the expected proportion of violations is equal to $\alpha$. The method consists of calculating $\tau(\alpha)$, the number of times the observed returns $x_t$ fall below the $\mathrm{VaR}_{\alpha}$ estimate at level $\alpha$ (as given in Table 6), i.e., $x_t < \mathrm{VaR}_{\alpha}$, and comparing the corresponding failure rate to $\alpha$.

The likelihood ratio statistic is given by

$LR = 2\log\left(\left(\frac{\tau(\alpha)}{n}\right)^{\tau(\alpha)}\left(1-\frac{\tau(\alpha)}{n}\right)^{n-\tau(\alpha)}\right) - 2\log\left(\alpha^{\tau(\alpha)}(1-\alpha)^{n-\tau(\alpha)}\right)$ (6.1)

where $\tau(\alpha)$ is the number of violations. Under the null hypothesis this statistic is asymptotically distributed as $\chi^2$ with one degree of freedom.
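A minimal implementation of this backtest is sketched below: it counts the violations $x_t < \mathrm{VaR}_{\alpha}$, forms the statistic (6.1) and returns the corresponding $\chi^2_1$ p-value. The function and argument names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def kupiec_test(returns, var_alpha, alpha):
    """Kupiec LR test (6.1); under H0 the statistic is chi-square with 1 df."""
    x = np.asarray(returns)
    n = x.size
    tau = int((x < var_alpha).sum())      # number of violations
    p_hat = tau / n
    obs = (tau * np.log(p_hat) if tau > 0 else 0.0) \
        + ((n - tau) * np.log(1 - p_hat) if tau < n else 0.0)
    lr = 2 * obs - 2 * (tau * np.log(alpha) + (n - tau) * np.log(1 - alpha))
    p_value = 1 - chi2.cdf(lr, df=1)
    return tau, lr, p_value
```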

Model 3 has the highest VaR and ES values, indicating that it performs better than the other models in the tails. Table 7 gives the p-values of the Kupiec test for each distribution.

Remark: At the 5 percent level of significance, the normal distribution is rejected at the 0.001 level. The Normal Weighted Inverse Gaussian distributions were all effective and well specified at all VaR levels.

Table 3. AIC, BIC and Log-likelihood values.

Table 4. VaR values of CVX log-returns based on normal and proposed models.

Table 5. ES values of CVX log-returns based on normal and proposed models.

Table 6. Number of violations of VaR for each distribution at different levels.

Table 7. P-value for the Kupiec test for each distribution at different levels.

7. Conclusions

In this work we constructed a class of Weighted Inverse Gaussian distributions by considering finite mixtures of two special cases of the Generalized Inverse Gaussian distribution. We considered the cases where the indices are $-\frac{1}{2}$, $\frac{1}{2}$, $\frac{3}{2}$ and $-\frac{3}{2}$. These special cases are also Weighted Inverse Gaussian distributions.

We further used the class as mixing distributions to construct the Normal Variance-Mean Mixtures. The parameter estimates were obtained using the Expectation Maximization (EM) algorithm. We obtained a monotonic convergence for the iterative schemes of the models using the method of moments estimates of NIG as initial values.

We used AIC, BIC and the log-likelihood for model selection. The model whose mixing distribution is a finite mixture of $GIG(-\frac{3}{2},\delta,\gamma)$ and $GIG(\frac{3}{2},\delta,\gamma)$ was found to be the best model. The results show that the six models are suitable for VaR computation.

Further work can be done on normal mixtures whose mixing distributions are finite mixtures of GIG distributions with higher indices.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Zhang, Y. and Nadarajah, S. (2017) A Review of Backtesting for Values at Risk. Communications in Statistics—Theory and Methods, 47, 3616-3639.
https://doi.org/10.1080/03610926.2017.1361984
[2] Wimmerstedt, L. (2015) Backtesting Expected Shortfall: The Design and Implementation of Different Backtest. Master’s Thesis in Mathematical Statistics, Royal Institute of Technology, Stockholm.
[3] Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1997) Thinking Coherently. Risk Journal, 10, 68-71.
[4] Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999) Coherent Measures of Risk. Mathematical Finance, 9, 203-228. https://doi.org/10.1111/1467-9965.00068
[5] Nadarajah, S., Zhang, B. and Chan, S. (2014) Estimation Methods for Expected Shortfall. Quantitative Finance, 14, 271-291.
https://doi.org/10.1080/14697688.2013.816767
[6] Barndorff-Nielsen, O.E. (1977) Exponentially Decreasing Distributions for the Logarithm of Particle Size. Proceedings of the Royal Society of London. Series A, 353, 409-419. https://doi.org/10.1098/rspa.1977.0041
[7] Barndorff-Nielsen, O.E. (1997) Normal Inverse Gaussian Distribution and Stochastic Volatility Modelling. Scandinavian Journal of Statistics, 24, 1-13.
https://doi.org/10.1111/1467-9469.00045
[8] Dempster, A.P., Laird, N.M. and Rubin, D. (1977) Maximum Likelihood from Incomplete Data Using the EM Algorithm. Journal of the Royal Statistical Society: Series B, 39, 1-38. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
[9] Fisher, R.A. (1934) The Effect of Methods of Ascertainment upon the Estimation of Frequencies. The Annals of Human Genetics, 6, 13-25.
https://doi.org/10.1111/j.1469-1809.1934.tb02105.x
[10] Patil, G.P. and Rao, C.R. (1978) Weighted Distributions and Size-Biased Sampling with Applications to Wildlife Populations and Human Families. Biometrics, 34, 179-189. https://doi.org/10.2307/2530008
[11] Akman, H.O. and Gupta, R.C. (1992) A Comparison of Various Estimators of the Mean of an Inverse Gaussian Distribution. Journal of Statistical Computation and Simulation, 40, 71-81. https://doi.org/10.1080/00949659208811366
[12] Gupta, R.C. and Akman, O. (1995) On Reliability Studies of a Weighted Inverse Gaussian Distribution Model. Statistical Papers, 38, 445-452.
https://doi.org/10.1007/BF02925999
[13] Gupta, R.C. and Kundu, D. (2011) Weighted Inverse Gaussian—A Versatile Lifetime Model. Journal of Applied Statistics, 38, 2695-2708.
https://doi.org/10.1080/02664763.2011.567251
[14] Kupiec, P. (1995) Techniques for Verifying the Accuracy of Risk Measurement Models. The Journal of Derivatives Winter, 3, 73-84.
https://doi.org/10.3905/jod.1995.407942
[15] Rappoport, P. (1993) A New Approach: Average Shortfall. Technical Report, J.P. Morgan.
[16] Yamai, Y. and Yoshiba, T. (2002) On the Validity of Value-at-Risk Comparative Analyses with Expected Shortfall. Monetary and Economic Studies, 20, 57-85.
[17] Zhang, Y., Chu, J., Chan, S. and Chan, B. (2019) The Generalised Hyperbolic Distribution and Its Subclass in Analysis of a New Era of Cryptocurrencies: Ethereum and Its Financial Risk. Physica A, 526, 1-12.
https://doi.org/10.1016/j.physa.2019.04.136
[18] Bams, D. and Wielhouwer, J.L. (2001) Empirical Issues in Value at Risk. ASTIN Bulletin, 31, 299-315. https://doi.org/10.2143/AST.31.2.1007
[19] Abramowitz, M. and Stegun, I.A. (1972) Handbook of Mathematical Function. Dover, New York.
[20] Kostas, F. (2007) Tests of Fit for Symmetric Variance Gamma Distributions. UADPhilEcon, National and Kapodistrian University of Athens, Athens.
[21] Karlis, D. (2002) An EM Type Algorithm for Maximum Likelihood Estimation of the Normal-Inverse Gaussian Distribution. Statistics and Probability Letters, 57, 43-52. https://doi.org/10.1016/S0167-7152(02)00040-8
