Estimation of Aggregate Losses of Secondary Cancer Using PH-OPPL and PH-TPPL Distributions

Cynthia Mwende; Patrick Weke; Davis Bundi; Joseph Ottieno

doi:10.4236/ojs.2021.115049

Open Journal of Statistics > Vol.11 No.5, October 2021


Cynthia Mwende¹*, Patrick Weke², Davis Bundi², Joseph Ottieno²
¹Department of Mathematics and Actuarial Science, Kisii University, Kisii, Kenya.
²School of Mathematics, University of Nairobi, Nairobi, Kenya.

Abstract

Kenyan insurance firms have introduced insurance policies for chronic illnesses such as cancer; however, they face a major challenge in pricing these policies because cancer can transition through different stages, which leads to variation in the cost of treatment. This has made the estimation of aggregate losses for diseases with multiple stages of transition, such as cancer, an area of interest for many insurance firms. Mixture phase type distributions can address this challenge because they incorporate the transitions into the estimation of claim frequency while also accounting for the heterogeneity of claim data. In this paper, we estimate the aggregate losses of secondary cancer cases in Kenya using mixture phase type Poisson Lindley distributions. Phase type (PH) distributions for the one and two parameter Poisson Lindley are developed, as well as their compound distributions. The matrix parameters of the PH distributions are estimated using continuous Chapman-Kolmogorov equations, since the disease process of cancer is continuous, while severity is modeled using Pareto, Generalized Pareto and Weibull distributions. This study shows that aggregate losses for Kenyan data are best estimated by the PH-OPPL-Weibull model among the PH-OPPL models and by the PH-TPPL-Generalized Pareto model among the PH-TPPL models. Comparing the two best models, the PH-OPPL-Weibull model provided the best fit for secondary cancer cases in Kenya. This model is also recommended for other diseases that, like cancer, are dynamic in nature.

Keywords

PH One Parameter Poisson Lindley, PH Two Parameter Poisson Lindley, PH Three Parameter Poisson Lindley, Discrete Fourier Transform, Discretization

Share and Cite:

Mwende, C., Weke, P., Bundi, D. and Ottieno, J. (2021) Estimation of Aggregate Losses of Secondary Cancer Using PH-OPPL and PH-TPPL Distributions. Open Journal of Statistics, 11, 838-853. doi: 10.4236/ojs.2021.115049.

1. Introduction

Aggregate losses are estimated by incorporating both claim frequency and claim severity distributions. Shevchenko (2010) [1] reviewed methods used to calculate distributions of aggregate losses. Robertson (1992) [2] applied the Discrete Fourier Transform to the estimation of aggregate losses from frequency and severity distributions. Rono et al. (2020) [3] developed compound distributions to model extreme natural disasters in Kenya. Mohamed et al. (2010) [4] introduced a simulation approach to the estimation of aggregate losses, which can be employed when the frequency and severity distributions cannot be combined to derive a compound distribution. Aggregate loss distributions are based on the collective risk model, expressed as:

$S_{N} = \sum_{i = 1}^{N} X_{i}$ (1)

where:

$X_{i}$ is the severity of the $i$-th claim and $N$ is the claim count. The distribution of $N$ in this paper is assumed to follow mixed PH Poisson distributions.

Phase type distributions are constructed when mixture distributions are convolved, resulting in an interrelated Poisson process occurring in phases. Phase type distributions were introduced by Erlang (1909) [5] and have been advanced by Neuts (1981) [6] and Asmussen (2003) [7], among others. Bladt (2005) [8] introduced phase type distributions in risk theory, while O’Cinneide (2017) [9] discussed phase type distributions and their invariant polytopes. Wu et al. (2010) [10] developed phase type distributions when the frequency distributions follow the Panjer class $(a, b, 0)$, while Kok et al. (2010) [11] used phase type distributions of the Panjer class $(a, b, 1)$ to model claim frequency.

Markov chains were introduced by Andrei Markov (1856-1922). Nurul et al. (2019) [12] proposed a simple forecasting model for predicting future air quality, which used Markov chains as an operator for evaluating the long-run distribution of pollution. Zhou et al. (2018) [13] used Markov chains to model demand for stations in bike sharing systems. In this study, the concept of Markov chains is used to determine the matrices of the phase type distributions used in modeling claim frequency.

Frequency data are used to model occurrences in areas such as engineering, insurance and biology. The Poisson distribution is often used to model count data; however, it assumes that the variance to mean ratio is unity (equi-dispersion), which rarely holds for real data, so it is considered an inflexible model. Most real-life data exhibit either over-dispersion, where the variance exceeds the mean, or under-dispersion, where the mean exceeds the variance, and can be modeled using Poisson mixtures [14]. Poisson Lindley distributions are examples of Poisson mixtures in which the Poisson parameter follows a Lindley distribution. The one parameter Poisson Lindley distribution, which can model over-dispersed data, was introduced by Sankaran (1970) [15], while Shanker and Mishra (2014) [16] developed the two parameter Poisson Lindley distribution, which further research has shown can also model over-dispersed data.

In the insurance sector, when aggregate losses are calculated for chronic diseases that have various stages, such as cancer, the claim frequency distributions considered do not incorporate the different stages of such diseases. Incorporating phase type distributions addresses this shortcoming of ordinary distributions. Considering mixed phase type distributions further improves the modeling of claim frequency data, as it accounts for the heterogeneity of claim data. In this paper, we develop the PH one parameter Poisson Lindley distribution and the PH two parameter Poisson Lindley distribution, where the mixing distribution follows a PH Lindley distribution. The resulting PH distributions are used to model claim numbers of secondary cancers in Kenya. Section 1 gives a brief introduction to Poisson distributions and Poisson Lindley distributions.

The structure of this paper is as follows: Section 2 discusses the construction of phase type distributions using PH Lindley distributions, which are later applied in modeling the aggregate losses. Compound distributions from the frequency and severity distributions are developed in Section 3. Aggregate losses for the data are estimated using Discrete Fourier Transforms and the results are discussed in Section 4. Section 5 outlines the conclusions.

2. Proposed Phase Type Poisson Lindley Distributions

In this section we develop phase type distributions for the one parameter and two parameter Poisson Lindley distributions. Phase type Poisson Lindley distributions are obtained when the mixing distribution follows a phase type Lindley distribution.

2.1. Phase Type One Parameter Poisson Lindley Distribution

Definition 1. A random variable X is said to follow a phase type one parameter Poisson Lindley (PH-OPPL) distribution if:

$X \mid λ \sim \mathrm{Po}(λ)$

$λ \mid Λ \sim \mathrm{PH\text{-}OPL}(Λ)$

for $λ > 0$, where $Λ$ is an $M \times M$ matrix.

Theorem 1. If $X \sim \mathrm{PH\text{-}OPPL}(Λ)$, then the probability mass function of X is:

$f (x; Λ) = \vec{γ} \frac{Λ^{2}}{{(I + Λ)}^{x + 3}} [(x + 2) I + Λ] {\vec{1}}^{T}$ (2)

where $Λ$ is an $M \times M$ matrix and I is the identity matrix.

Proof:

If $X \mid λ \sim \mathrm{Po}(λ)$ and $λ \mid Λ \sim \mathrm{PH\text{-}OPL}(Λ)$, then the probability mass function of X is expressed as:

$P (x) = \int_{0}^{\infty} P r (x \mid λ) f (λ; Λ) d λ$

where $f (λ; Λ)$ is the $\mathrm{PH\text{-}OPL}(Λ)$ density.

$\begin{matrix} P (x) = \int_{0}^{\infty} \frac{e^{- λ} λ^{x}}{x!} \frac{Λ^{2}}{I + Λ} (1 + λ) e^{- Λ λ} d λ, \quad λ > 0 \\ = \frac{Λ^{2}}{I + Λ} \int_{0}^{\infty} [\frac{λ^{x}}{x!} e^{- λ (I + Λ)} + \frac{λ^{x + 1}}{x!} e^{- λ (I + Λ)}] d λ \\ = \frac{Λ^{2}}{I + Λ} [\frac{I}{{(I + Λ)}^{x + 1}} + \frac{(x + 1) I}{{(I + Λ)}^{x + 2}}] \\ = \vec{γ} \frac{Λ^{2}}{{(I + Λ)}^{x + 3}} [(x + 2) I + Λ] {\vec{1}}^{T} \end{matrix}$ (3)
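As a quick sanity check on Equation (2), the following sketch evaluates the scalar special case $M = 1$, where $Λ$ reduces to a single rate (called `theta` here) and the vectors $\vec{γ}$ and $\vec{1}$ drop out. The function names, the value `theta = 0.5` and the midpoint-rule integration grid are illustrative assumptions, not values from the paper.

```python
import math

def oppl_pmf(x, theta):
    """Scalar (M = 1) form of Equation (2); theta plays the role of Lambda."""
    return theta**2 * (x + 2 + theta) / (1 + theta) ** (x + 3)

def oppl_pmf_by_mixing(x, theta, upper=80.0, steps=80_000):
    """Midpoint-rule integral of the Poisson pmf times the Lindley density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        lindley = theta**2 / (1 + theta) * (1 + lam) * math.exp(-theta * lam)
        total += poisson * lindley * h
    return total

theta = 0.5  # hypothetical rate, chosen only for illustration
total_mass = sum(oppl_pmf(x, theta) for x in range(400))  # should be close to 1
closed_form = oppl_pmf(3, theta)
numerical = oppl_pmf_by_mixing(3, theta)  # should agree with closed_form
```

The closed form and the numerical mixture integral agree to within the integration error, and the probabilities sum to one, which is what the proof above implies for the scalar case.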

Properties of Phase Type One Parameter Poisson Lindley Distribution

The r-th factorial moment of the PH-OPPL distribution equals the r-th raw moment of the mixing PH-OPL distribution, because $E [X (X - 1) \cdots (X - r + 1) \mid λ] = λ^{r}$ for a Poisson variable. It is given by:

$\begin{matrix} E [X (X - 1) \cdots (X - r + 1)] = \int_{0}^{\infty} λ^{r} f (λ; Λ) d λ \\ = \frac{Λ^{2}}{I + Λ} \int_{0}^{\infty} λ^{r} e^{- Λ λ} (1 + λ) d λ \\ = \frac{Λ^{2}}{I + Λ} [\frac{Γ (r + 1)}{Λ^{r + 1}} + \frac{Γ (r + 2)}{Λ^{r + 2}}] \\ = \vec{γ} \frac{r! [(r + 1) I + Λ]}{Λ^{r} (Λ + I)} {\vec{1}}^{T} \end{matrix}$ (4)

The expectation and variance of the PH-OPPL distribution follow from Equation (4):

1) Expectation ($r = 1$)

$E (x) = \frac{1! [(1 + 1) I + Λ]}{Λ (Λ + I)} = \vec{γ} \frac{2 I + Λ}{Λ (Λ + I)} {\vec{1}}^{T}$ (5)

2) Variance, using $V a r (x) = E [x (x - 1)] + E (x) - {[E (x)]}^{2}$

$\begin{matrix} V a r (x) = \frac{2! [(2 + 1) I + Λ]}{Λ^{2} (Λ + I)} + \frac{2 I + Λ}{Λ (Λ + I)} - {[\frac{2 I + Λ}{Λ (Λ + I)}]}^{2} \\ = \vec{γ} [\frac{2 I + 4 Λ + Λ^{2}}{Λ^{2} {(Λ + I)}^{2}} + \frac{2 I + Λ}{Λ (Λ + I)}] {\vec{1}}^{T} \end{matrix}$ (6)
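To illustrate Equation (5) in the scalar case ($M = 1$, with $Λ$ replaced by a rate `theta`), the sketch below computes the mean and variance directly from the probabilities and compares the mean with the closed form. The names, the value `theta = 0.5` and the truncation point of the support are assumptions made for the example.

```python
def oppl_pmf(x, theta):
    """Scalar (M = 1) form of the PH-OPPL probability function."""
    return theta**2 * (x + 2 + theta) / (1 + theta) ** (x + 3)

def oppl_mean(theta):
    """Scalar form of Equation (5)."""
    return (2 + theta) / (theta * (1 + theta))

theta = 0.5           # hypothetical rate
support = range(400)  # truncation point; the neglected tail is negligible here

mean = sum(x * oppl_pmf(x, theta) for x in support)
second_moment = sum(x * x * oppl_pmf(x, theta) for x in support)
variance = second_moment - mean**2

overdispersed = variance > mean  # variance exceeds the mean in this example
```

In this example the variance is larger than the mean, which is the over-dispersion that motivates the use of Poisson Lindley mixtures for claim counts.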

The probability generating function of PH-OPPL distribution is given by:

$\begin{matrix} G (s) = \int_{0}^{\infty} e^{- λ (1 - s)} \frac{Λ^{2}}{I + Λ} (1 + λ) e^{- Λ λ} d λ \\ = \frac{Λ^{2}}{I + Λ} [\int_{0}^{\infty} λ e^{- λ (I + Λ - s I)} d λ + \int_{0}^{\infty} e^{- λ (Λ + I - s I)} d λ] \\ = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - s) I}{{[Λ + (1 - s) I]}^{2}}] {\vec{1}}^{T} \end{matrix}$ (7)

The parameter $Λ$ of the PH-OPPL distribution is estimated using the continuous Chapman-Kolmogorov equations.

2.2. Phase Type Two Parameter Poisson Lindley Distribution

Definition 2. A random variable X is said to follow a phase type two parameter Poisson Lindley (PH-TPPL) distribution if:

$X \mid λ \sim \mathrm{Po}(λ)$

$λ \mid Λ, α \sim \mathrm{PH\text{-}TPL}(Λ, α)$

for $α > 0$ and $λ > 0$, where $Λ$ is an $M \times M$ matrix.

Theorem 2. If $X \sim \mathrm{PH\text{-}TPPL}(Λ, α)$, then the probability mass function of X is expressed as:

$f (x; Λ, α) = \vec{γ} \frac{Λ^{2}}{{(I + Λ)}^{x + 2}} [I + \frac{(α + x) I}{α Λ + I}] {\vec{1}}^{T}$ (8)

where $α > 0$, $Λ$ is an $M \times M$ matrix and I is the identity matrix.

Proof:

If $X \mid λ \sim \mathrm{Po}(λ)$ and $λ \mid Λ, α \sim \mathrm{PH\text{-}TPL}(Λ, α)$, then the probability mass function of X is given by:

$P (x) = \int_{0}^{\infty} P r (X = x \mid λ) f (λ; Λ, α) d λ$

where $f (λ; Λ, α)$ is the $\mathrm{PH\text{-}TPL}(Λ, α)$ density.

$\begin{matrix} P (x) = \int_{0}^{\infty} \frac{e^{- λ} λ^{x}}{x!} \frac{Λ^{2}}{I + α Λ} (α + λ) e^{- Λ λ} d λ, \quad λ > 0 \\ = \frac{Λ^{2}}{I + α Λ} [\int_{0}^{\infty} \frac{α λ^{x}}{x!} e^{- λ (I + Λ)} d λ + \int_{0}^{\infty} \frac{λ^{x + 1}}{x!} e^{- λ (I + Λ)} d λ] \\ = \frac{Λ^{2}}{α Λ + I} [\frac{α Γ (x + 1) I}{x! {(I + Λ)}^{x + 1}} + \frac{Γ (x + 2)}{x! {(I + Λ)}^{x + 2}}] \\ = \vec{γ} \frac{Λ^{2}}{{(I + Λ)}^{x + 2}} [I + \frac{(α + x) I}{α Λ + I}] {\vec{1}}^{T} \end{matrix}$ (9)
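The scalar case ($M = 1$) of Equation (8) can be checked in the same spirit. The sketch below verifies numerically that the probabilities sum to one and that setting $α = 1$ recovers the one parameter form of Equation (2). The function names and the values `theta = 0.5` and `alpha = 2.0` are hypothetical choices for illustration.

```python
def tppl_pmf(x, theta, alpha):
    """Scalar (M = 1) form of Equation (8); theta plays the role of Lambda."""
    return theta**2 / (1 + theta) ** (x + 2) * (1 + (alpha + x) / (alpha * theta + 1))

def oppl_pmf(x, theta):
    """Scalar (M = 1) form of Equation (2), used for comparison."""
    return theta**2 * (x + 2 + theta) / (1 + theta) ** (x + 3)

theta, alpha = 0.5, 2.0  # hypothetical parameter values

total_mass = sum(tppl_pmf(x, theta, alpha) for x in range(400))

# With alpha = 1 the two parameter form should coincide with the one parameter form.
max_gap = max(abs(tppl_pmf(x, theta, 1.0) - oppl_pmf(x, theta)) for x in range(50))
```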

Properties of Phase Type Two Parameter Poisson Lindley Distribution

The r-th factorial moment of the PH-TPPL distribution is given by:

$\begin{matrix} E [X (X - 1) \cdots (X - r + 1)] = \sum_{x = 0}^{\infty} x (x - 1) \cdots (x - r + 1) f (x; Λ, α) \\ = \int_{0}^{\infty} [\sum_{x = 0}^{\infty} x (x - 1) \cdots (x - r + 1) \frac{e^{- λ} λ^{x}}{x!}] \frac{Λ^{2}}{I + α Λ} (α + λ) e^{- Λ λ} d λ \\ = \frac{Λ^{2}}{I + α Λ} [\int_{0}^{\infty} α λ^{r} e^{- Λ λ} d λ + \int_{0}^{\infty} λ^{r + 1} e^{- Λ λ} d λ] \\ = \frac{Λ^{2}}{I + α Λ} [α \frac{Γ (r + 1) I}{Λ^{r + 1}} + \frac{Γ (r + 2)}{Λ^{r + 2}}] \\ = \vec{γ} \frac{Γ (r + 1) I}{Λ^{r}} \frac{α Λ + (r + 1) I}{α Λ + I} {\vec{1}}^{T} \end{matrix}$ (10)

The expectation and variance of the PH-TPPL distribution follow from Equation (10):

1) Expectation

$\begin{matrix} E (x) = \frac{Λ^{2}}{α Λ + I} \int_{0}^{\infty} λ (α + λ) e^{- Λ λ} d λ \\ = \vec{γ} \frac{2 I + α Λ}{Λ (α Λ + I)} {\vec{1}}^{T} \end{matrix}$ (11)

2) Variance

$V a r (x) = E (x^{2}) - {[E (x)]}^{2}$

where $E (x^{2}) = E (x) + E [x (x - 1)]$, so that:

$\begin{matrix} E (x^{2}) = \frac{α Λ + 2 I}{Λ (α Λ + I)} + \frac{2 (α Λ + 3 I)}{Λ^{2} (α Λ + I)} \\ V a r (x) = \vec{γ} \{\frac{α Λ + 2 I}{Λ (α Λ + I)} + \frac{2 (α Λ + 3 I)}{Λ^{2} (α Λ + I)} - {[\frac{2 I + α Λ}{Λ (α Λ + I)}]}^{2}\} {\vec{1}}^{T} \end{matrix}$ (12)

The probability generating function of PH-TPPL distribution is given by:

$\begin{matrix} G (s) = \frac{Λ^{2}}{{(Λ + I)}^{2}} \sum_{x = 0}^{\infty} {[\frac{s I}{Λ + I}]}^{x} \\ + \frac{Λ^{2}}{{(Λ + I)}^{2} (α Λ + I)} \sum_{x = 0}^{\infty} (α + x) {[\frac{s I}{Λ + I}]}^{x} \\ = \vec{γ} \frac{α Λ^{2} [Λ + (1 - s) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - s) I]}^{2}} {\vec{1}}^{T} \end{matrix}$ (13)

Since the value of $Λ$ is known, the value of $α$ can be obtained from Equation (11) once the value of $E (x)$ is known.
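In the scalar case ($M = 1$) Equation (11) can be inverted in closed form: writing $θ$ for the rate, $E (x) θ (θ α + 1) = 2 + θ α$ gives $α = (2 - E (x) θ) / (θ (E (x) θ - 1))$. The sketch below implements this inversion; the function names and the numerical values are assumptions for illustration, and the matrix case would need a numerical solver instead.

```python
def tppl_mean(theta, alpha):
    """Scalar form of Equation (11)."""
    return (2 + theta * alpha) / (theta * (theta * alpha + 1))

def alpha_from_mean(mean, theta):
    """Invert the scalar form of Equation (11) for alpha.

    A positive alpha exists only when 1/theta < mean < 2/theta.
    """
    if not (1 / theta < mean < 2 / theta):
        raise ValueError("mean must lie strictly between 1/theta and 2/theta")
    return (2 - mean * theta) / (theta * (mean * theta - 1))

theta = 0.5                       # hypothetical known rate
mean = tppl_mean(theta, 2.0)      # mean implied by alpha = 2
recovered_alpha = alpha_from_mean(mean, theta)
```

The range check reflects that the scalar mean decreases from $2 / θ$ to $1 / θ$ as $α$ grows from zero to infinity, so means outside that interval cannot be matched.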

2.3. Shape of Probability Function of PH-OPPL and PH-TPPL Distributions

Matrix $Λ$ was determined using the continuous Chapman-Kolmogorov equations for cancer data in Kenya, and the values of $γ$ are the stationary probabilities obtained using the formula $π_{k} = π_{0} Λ^{k}$. The three state Markov model represents cancer patients who transit through the Healthy-Leukemia-Dead states, the four state model represents patients who transit through the Healthy-Liver-Colon-Dead states, the five state model represents the Healthy-Stomach-Pharynx-Colon-Dead states and the six state model represents patients transiting through the Healthy-Oesophagus-Stomach-Lung-Kidney-Dead states. The values of $Λ$ for the three, four, five and six state models, in that order, are:

Three state model:

$Λ = [\begin{matrix} 0.8783 & 0.1217 & 0 \\ 0 & 0.3938 & 0.6062 \\ 0 & 0 & 1.0000 \end{matrix}]$

Four state model:

$Λ = [\begin{matrix} 0.7900 & 0.2100 & 0 & 0 \\ 0 & 0.2898 & 0.7102 & 0 \\ 0 & 0 & 0.8985 & 0.1015 \\ 0 & 0 & 0 & 1.0000 \end{matrix}]$

Five state model:

$Λ = [\begin{matrix} 0.8364 & 0.1636 & 0 & 0 & 0 \\ 0 & 0.3892 & 0.6108 & 0 & 0 \\ 0 & 0 & 0.6688 & 0.3312 & 0 \\ 0 & 0 & 0 & 0.5524 & 0.4476 \\ 0 & 0 & 0 & 0 & 1.0000 \end{matrix}]$

Six state model:

$Λ = [\begin{matrix} 0.4851 & 0.5149 & 0 & 0 & 0 & 0 \\ 0 & 0.1223 & 0.8777 & 0 & 0 & 0 \\ 0 & 0 & 0.1533 & 0.8467 & 0 & 0 \\ 0 & 0 & 0 & 0.4410 & 0.5590 & 0 \\ 0 & 0 & 0 & 0 & 0.8668 & 0.1332 \\ 0 & 0 & 0 & 0 & 0 & 1.0000 \end{matrix}]$
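The formula $π_{k} = π_{0} Λ^{k}$ can be illustrated with the three state (Healthy-Leukemia-Dead) matrix. The sketch below uses plain Python lists; the starting vector, which places every patient in the Healthy state, and the helper names are assumptions made for the example rather than choices stated in the paper.

```python
# Three state transition matrix as reported above (Healthy, Leukemia, Dead).
LAMBDA_3 = [
    [0.8783, 0.1217, 0.0],
    [0.0, 0.3938, 0.6062],
    [0.0, 0.0, 1.0],
]

def vec_mat(vector, matrix):
    """Row vector times matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]

def state_probs(pi0, matrix, k):
    """Compute pi_k = pi_0 * matrix^k by repeated multiplication."""
    pi = list(pi0)
    for _ in range(k):
        pi = vec_mat(pi, matrix)
    return pi

pi0 = [1.0, 0.0, 0.0]                    # assumed start: everyone Healthy
pi_1 = state_probs(pi0, LAMBDA_3, 1)     # equals the first row of the matrix
pi_50 = state_probs(pi0, LAMBDA_3, 50)   # mass concentrates in the absorbing state
```

Because Dead is an absorbing state, the probabilities drift toward the last component as $k$ grows, while each $π_{k}$ still sums to one.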

Figure 1 shows the shape of the probability function of the phase type one parameter Poisson Lindley distribution, which is long tailed.

Figure 1. Pdf plots of PH-OPPL for different values of Λ, panels (a)-(d).

Figure 2 shows the shape of the probability function of the phase type two parameter Poisson Lindley distribution, which is also long tailed.

Figure 2. Pdf plots of PH-TPPL for different values of Λ, panels (a)-(d).

3. Compound Phase Type Distribution

In the actuarial field, a compound distribution describes the total losses in a group of insurance policies. In this section we develop compound phase type distributions (CPHD), which can be used to model secondary cancer cases.

Definition 3. Let N be a random variable with probability generating function $F (S)$ and let $X_{1}, \dots, X_{N}$ be a set of iid random variables, independent of N, with a common probability generating function $G (S)$. Then the probability generating function of the compound distribution is expressed as:

$H (S) = F [G (S)]$ (14)

Unlike ordinary compound distributions, which do not consider the transition phases of diseases, the CPHD incorporates the transition states. Probability generating functions of compound distributions are derived by composing the probability generating functions of the two distributions, as shown in Equation (14).
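A small numerical illustration of Equation (14): the sketch below composes the scalar ($M = 1$) form of the PH-OPPL pgf in Equation (7) with the pgf of a toy discrete severity, then checks that $H (1) = 1$ and that the slope of $H$ at $s = 1$ matches $E (N) E (X)$. The severity probabilities, the rate `theta` and the finite-difference step are hypothetical values chosen for the example.

```python
def oppl_pgf(s, theta):
    """Scalar (M = 1) form of Equation (7)."""
    return theta**2 / (1 + theta) * (theta + 2 - s) / (theta + 1 - s) ** 2

def severity_pgf(s, probs):
    """pgf of a discrete severity with P(X = k) = probs[k]."""
    return sum(p * s**k for k, p in enumerate(probs))

def compound_pgf(s, theta, probs):
    """H(s) = F[G(s)] as in Equation (14)."""
    return oppl_pgf(severity_pgf(s, probs), theta)

theta = 0.5                    # hypothetical frequency parameter
severity = [0.2, 0.5, 0.3]     # toy severity on {0, 1, 2}, not from the data

step = 1e-6
h_at_one = compound_pgf(1.0, theta, severity)
mean_aggregate = (h_at_one - compound_pgf(1.0 - step, theta, severity)) / step

mean_frequency = (2 + theta) / (theta * (1 + theta))
mean_severity = sum(k * p for k, p in enumerate(severity))
```

The backward difference is only a rough derivative, so the comparison with `mean_frequency * mean_severity` should be read with a small tolerance.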

Theorem 3 (Compound one parameter Poisson Lindley distribution). If $N \sim \mathrm{PH\text{-}OPPL}(Λ)$, then the pgf of the compound distribution is:

$H (S) = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - L_{x} [G (S)]) I}{{[Λ + (1 - L_{x} [G (S)]) I]}^{2}}] {\vec{1}}^{T}$ (15)

where $L_{x} [G (S)]$ is the Laplace transform of the severity distribution, used because the pgf of most continuous distributions is not available.

Proof:

$\begin{matrix} H (S) = F [G (S)] = F [L_{x} [G (S)]] \\ = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - L_{x} [G (S)]) I}{{[Λ + (1 - L_{x} [G (S)]) I]}^{2}}] {\vec{1}}^{T} \end{matrix}$ (16)

Theorem 4 (Compound two parameter Poisson Lindley distribution). If $N \sim \mathrm{PH\text{-}TPPL}(Λ, α)$, then the pgf of the compound distribution is:

$H (S) = \vec{γ} \frac{α Λ^{2} [Λ + (1 - L_{x} [G (S)]) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - L_{x} [G (S)]) I]}^{2}} {\vec{1}}^{T}$ (17)

where $L_{x} [G (S)]$ is as defined in Theorem 3.

Proof:

$\begin{array}{l} H (S) = F [G (S)] = F [L_{x} [G (S)]] \\ = \vec{γ} \frac{α Λ^{2} [Λ + (1 - L_{x} [G (S)]) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - L_{x} [G (S)]) I]}^{2}} {\vec{1}}^{T} \end{array}$ (18)

The continuous distributions considered in this research are the Weibull, Pareto and Generalized Pareto distributions; their Laplace transforms are derived and substituted into Equations (16) and (18) to get the pgf of their compound distributions using the PH-OPPL and PH-TPPL distributions respectively. The Laplace transforms of the Weibull, Pareto and Generalized Pareto distributions are derived as:

1) Weibull distribution

$L_{x} (S) = E [e^{- s x}]$

$\begin{matrix} L_{x} G (S) = \int_{0}^{\infty} e^{- s x} \frac{β}{α} {(\frac{x}{α})}^{β - 1} e^{- {(\frac{x}{α})}^{β}} d x \\ = \frac{β}{α} \int_{0}^{\infty} {(\frac{x}{α})}^{β - 1} e^{- \frac{x}{α} [s α + {(\frac{x}{α})}^{β - 1}]} \\ = \frac{β}{α} \frac{Γ β}{{[s α + {(\frac{x}{α})}^{β - 1}]}^{β}} \end{matrix}$ (19)

2) Pareto distribution

$L_{x} (S) = E [e^{- s x}]$

$\begin{matrix} L_{x} G (S) = α β^{α} \int_{0}^{\infty} \frac{e^{- s x}}{{(x + β)}^{α + 1}} d x = \frac{α}{β} \int_{0}^{\infty} e^{- β x} \sum_{k = 0}^{\infty} (\begin{matrix} - (α + 1) \\ k \end{matrix}) {(\frac{x}{β})}^{k} d x \\ = \frac{α}{β} \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{Γ (α + k)}{k! Γ α} \frac{k!}{β^{2 k + 1}} = \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{α}{Γ α} \frac{Γ (α + k)}{β^{2 k + 2}} \end{matrix}$ (20)

3) Generalized Pareto distribution

$L_{X} (S) = E [e^{- s x}]$

$\begin{matrix} L_{X} G (S) = \int_{0}^{\infty} e^{- s x} \frac{x^{α - 1}}{β (α, γ) {(x + λ)}^{α + γ}} d x \\ = \frac{1}{λ^{γ} β (α, γ)} \int_{0}^{\infty} x^{α} e^{- s x} \sum_{k = 0}^{\infty} (\begin{matrix} α + γ \\ k \end{matrix}) {\frac{x}{λ}}^{k} d x \\ = \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \int_{0}^{\infty} x^{γ + k + 1 - 1} e^{- s x} d x \\ = \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \frac{Γ α + k}{s^{α + k}} \end{matrix}$ (21)
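Closed-form Laplace transforms of severity distributions are not always convenient to evaluate, so a numerical cross-check can be useful. The sketch below approximates $E [e^{- s x}]$ for a Weibull severity by substituting $u = F (x)$ and applying the midpoint rule on $(0, 1)$. The function name, the parameter names `scale` and `shape` (standing for the Weibull $α$ and $β$) and the grid size are assumptions for illustration; the exponential special case `shape = 1` provides a known reference value $1 / (1 + s α)$.

```python
import math

def weibull_laplace(s, scale, shape, steps=100_000):
    """Approximate E[exp(-s X)] for a Weibull X.

    Uses the substitution u = F(x), so the integral runs over (0, 1) and
    x is recovered from the Weibull quantile function.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = scale * (-math.log(1.0 - u)) ** (1.0 / shape)
        total += math.exp(-s * x) * h
    return total

# Exponential special case (shape = 1): the transform is 1 / (1 + s * scale).
approx = weibull_laplace(0.5, 4.0, 1.0)
exact = 1.0 / (1.0 + 0.5 * 4.0)

# A non-exponential case (hypothetical parameters); the value lies in (0, 1).
general = weibull_laplace(0.5, 4.0, 2.0)
```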

Substituting Equations (19), (20) and (21) into Equation (16), the pgfs of the compound distributions of the PH one parameter Poisson Lindley with Weibull, Pareto and Generalized Pareto respectively are:

1) Compound PH-OPPL-Weibull distribution

$H (S) = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - \frac{β}{α} \frac{Γ β}{{[s α + {(\frac{x}{α})}^{β - 1}]}^{β}}) I}{{[Λ + (1 - \frac{β}{α} \frac{Γ β}{{[s α + {(\frac{x}{α})}^{β - 1}]}^{β}}) I]}^{2}}] {\vec{1}}^{T}$ (22)

2) Compound PH-OPPL-Pareto distribution

$H (S) = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{α}{Γ α} \frac{Γ (α + k)}{β^{2 k + 2}}) I}{{[Λ + (1 - \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{α}{Γ α} \frac{Γ (α + k)}{β^{2 k + 2}}) I]}^{2}}] {\vec{1}}^{T}$ (23)

3) Compound PH-OPPL-Generalized Pareto distribution

$H (S) = \vec{γ} \frac{Λ^{2}}{I + Λ} [\frac{Λ + (2 - \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \frac{Γ α + k}{s^{α + k}}) I}{{[Λ + (1 - \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \frac{Γ α + k}{s^{α + k}}) I]}^{2}}] {\vec{1}}^{T}$ (24)

Substituting Equations (19), (20) and (21) into Equation (18), the pgfs of the compound distributions of the PH two parameter Poisson Lindley with Weibull, Pareto and Generalized Pareto respectively are:

1) Compound PH-TPPL-Weibull distribution

$H (S) = \vec{γ} \frac{α Λ^{2} [Λ + (1 - \frac{β}{α} \frac{Γ β}{{[s α + {(\frac{x}{α})}^{β - 1}]}^{β}}) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - \frac{β}{α} \frac{Γ β}{{[s α + {(\frac{x}{α})}^{β - 1}]}^{β}}) I]}^{2}} {\vec{1}}^{T}$ (25)

2) Compound PH-TPPL-Pareto distribution

$H (S) = \vec{γ} \frac{α Λ^{2} [Λ + (1 - \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{α}{Γ α} \frac{Γ (α + k)}{β^{2 k + 2}}) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{α}{Γ α} \frac{Γ (α + k)}{β^{2 k + 2}}) I]}^{2}} {\vec{1}}^{T}$ (26)

3) Compound PH-TPPL-Generalized Pareto distribution

$H (S) = \vec{γ} \frac{α Λ^{2} [Λ + (1 - \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \frac{Γ α + k}{s^{α + k}}) I] + Λ^{2}}{(α Λ + I) {[Λ + (1 - \frac{1}{λ^{γ} β (α, γ)} \sum_{k = 0}^{\infty} \frac{- (α + γ)}{λ^{k}} \frac{Γ α + k}{s^{α + k}}) I]}^{2}} {\vec{1}}^{T}$ (27)

4. Data Analysis, Results and Discussions

4.1. Severity and Frequency Probabilities

The cancer data considered in this research were obtained from a medical facility in Kenya. The cancer transition states considered are Healthy-Leukemia-Dead for the three state model, Healthy-Liver-Colon-Dead for the four state model, Healthy-Stomach-Pharynx-Colon-Dead for the five state model and Healthy-Oesophagus-Stomach-Lung-Kidney-Dead for the six state model. The values of $Λ$ for the data are obtained using the continuous Chapman-Kolmogorov equations, expressed as:

$p_{i j} (ϕ_{A}, γ_{t} + Ψ_{d}) = \sum_{k = 1}^{n} p_{i k} (ϕ_{A}, γ_{t}) p_{k j} (γ_{t}, γ_{t} + Ψ_{d})$

$\begin{array}{l} \lim_{Ψ_{d} \to 0} \frac{p_{i j} (ϕ_{A}, γ_{t} + Ψ_{d}) - p_{i j} (ϕ_{A}, γ_{t})}{Ψ_{d}} \\ = \lim_{Ψ_{d} \to 0} \frac{\sum_{k \neq j} p_{i k} (ϕ_{A}, γ_{t}) p_{k j} (γ_{t}, γ_{t} + Ψ_{d}) - p_{i j} (ϕ_{A}, γ_{t}) [1 - p_{j j} (γ_{t}, γ_{t} + Ψ_{d})]}{Ψ_{d}} \end{array}$

$\frac{\partial}{\partial γ_{t}} p_{i j} (ϕ_{A}, γ_{t}) = \sum_{k \neq j} p_{i k} (ϕ_{A}, γ_{t}) ℑ_{k j} - p_{i j} (ϕ_{A}, γ_{t}) ℑ_{j}$

$p_{i j} (ϕ_{A}, γ_{t}) = 1 - e^{- ℑ_{i j} (ϕ_{A}) t}$ (28)

where:

$\lim_{Ψ_{d} \to 0} \frac{p_{k j} (γ_{t}, γ_{t} + Ψ_{d})}{Ψ_{d}} = ℑ_{k j}$

$\lim_{Ψ_{d} \to 0} \frac{1 - p_{j j} (γ_{t}, γ_{t} + Ψ_{d})}{Ψ_{d}} = ℑ_{j}$

The values of $Λ$ for three, four, five and six state using the data obtained were as shown in Section 2.3.

The severity distributions considered in this research are the Weibull, Pareto and Generalized Pareto distributions. DFT requires the severity probabilities to be discrete; hence they are discretized using the method of mass rounding, which for a span $h$ is expressed as:

$\begin{array}{l} f_{0} = F_{J} (\frac{h}{2}) \\ f_{x} = F_{J} (x h + \frac{h}{2}) - F_{J} (x h - \frac{h}{2}), \quad x = 1, 2, \dots, m - 1 \\ f_{m} = 1 - F_{J} (m h - \frac{h}{2}) \end{array}$
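The rounding scheme can be sketched as follows for a Weibull severity, whose distribution function is $F (x) = 1 - e^{- (x / α)^{β}}$. The function names, the span `h`, the number of points and the Weibull parameters are hypothetical values chosen for illustration; `m = 7` is used only because it yields eight probabilities.

```python
import math

def weibull_cdf(x, scale, shape):
    """Weibull distribution function with scale alpha and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def discretize_by_rounding(cdf, h, m):
    """Return probabilities f_0, ..., f_m on the points 0, h, ..., m*h."""
    probs = [cdf(h / 2)]                                  # f_0
    for x in range(1, m):
        probs.append(cdf(x * h + h / 2) - cdf(x * h - h / 2))  # f_x
    probs.append(1.0 - cdf(m * h - h / 2))                # f_m takes the tail
    return probs

# Hypothetical parameters, not the fitted values from the paper.
severity_probs = discretize_by_rounding(
    lambda x: weibull_cdf(x, scale=2.0, shape=1.5), h=1.0, m=7
)
```

Placing the remaining tail mass on the last point keeps the discretized probabilities summing to one, which the DFT step relies on.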

The pdfs of the Weibull, Pareto and Generalized Pareto distributions respectively are expressed as:

$\begin{array}{l} f (x) = \frac{β}{α} {(\frac{x}{α})}^{β - 1} e^{- {(\frac{x}{α})}^{β}}; \quad x > 0; \ α, β > 0 \\ f (x) = \frac{α β^{α}}{{(x + β)}^{α + 1}} \\ f (x) = \frac{x^{α - 1} λ^{γ} Γ (α + γ)}{Γ (γ) Γ (α) {(x + λ)}^{α + γ}} \end{array}$

The frequency and severity probabilities for secondary cancer cases are given in Table 1.

Table 1. Claim frequency and severity probabilities.

4.2. Discrete Fourier Transform

Several numerical methods are used in the estimation of aggregate losses, such as Monte Carlo simulation, the Panjer recursive model, Fourier transforms and direct numerical integration. The Panjer recursive model is applicable when the claim frequency distribution belongs to the Panjer class $(a, b, 0)$ or class $(a, b, 1)$. In this study we use the Discrete Fourier Transform (DFT) to estimate the aggregate losses. Robertson (1992) applied Fourier transforms in the computation of aggregate losses [2]. Shevchenko (2010) [1] reviewed these numerical methods and concluded that each method has its strengths and weaknesses; hence the method should be chosen according to the study. DFT is often preferred, as it is arguably the most elegant and powerful technique for evaluating aggregate loss probabilities, whether the claim amount $X_{i}$ is discrete or continuous [17].

The algorithm of DFT of aggregate losses requires computation of DFT of frequency and DFT of severity separately.

Definition 4 (Discrete Fourier Transform). Let $X (n)$, $n = 0, 1, \dots, N - 1$, be the severity or frequency probabilities of the claim data. The Discrete Fourier Transform of $X (n)$ is the mapping:

$X (k) = \sum_{n = 0}^{N - 1} X (n) e^{\frac{- i 2 π k n}{N}}, \quad k = 0, 1, 2, \dots, N - 1$ (29)

Applying Euler’s formula, Expression (29) becomes:

$X (k) = \sum_{n = 0}^{N - 1} X (n) [\cos (\frac{2 π k n}{N}) - i \sin (\frac{2 π k n}{N})]$

$X (k) = \sum_{n = 0}^{N - 1} X (n) W_{N}^{k n}$ (30)

where $W_{N} = e^{- i 2 π / N}$. This is the DFT of the severity or frequency probabilities. The severity and frequency probabilities are of length 8, so $W_{N}$ is a primitive 8th root of unity and Equation (30) can be rewritten as:

$X (k) = \sum_{n = 0}^{7} X (n) W_{8}^{k n}$ (31)

Each vector of frequency or severity probabilities is padded with as many zeros as it has elements, so that the convolution does not wrap around. The DFT algorithm is as follows:

1) Multiply the matrix $W_{N}^{k n}$ by the frequency probabilities and by the severity probabilities to get their DFTs.

2) Multiply the DFT of the frequency probabilities, element by element, with the DFT of the severity probabilities, and then multiply the resulting vector by the matrix $W_{N}^{k n}$.

3) Take the real parts, divide each value by the number of elements in the padded vector, and arrange the resulting probabilities in reverse order, except for the first probability.

4) The values corresponding to the original frequency and severity values are the aggregate loss probabilities.
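The four steps above can be sketched in plain Python as follows. This is a minimal illustration of the listed steps, which amount to a no-wrap (linear) convolution of the two padded vectors. The function names and the two-element input vectors are hypothetical, and the inputs are toy numbers rather than the values in Table 1.

```python
import cmath

def dft(values):
    """Multiply by the matrix W_N^{kn}, with W_N = exp(-2*pi*i/N)."""
    n = len(values)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(values[m] * w ** (k * m) for m in range(n)) for k in range(n)]

def aggregate_by_dft(freq, sev):
    """Follow steps 1 to 3 for two probability vectors of equal length."""
    if len(freq) != len(sev):
        raise ValueError("frequency and severity vectors must have equal length")
    padding = [0.0] * len(freq)          # pad with as many zeros as elements
    freq_padded = list(freq) + padding
    sev_padded = list(sev) + padding

    # Step 1: DFT of each padded vector.
    freq_hat = dft(freq_padded)
    sev_hat = dft(sev_padded)

    # Step 2: element-wise product, then a second multiplication by W.
    product = [a * b for a, b in zip(freq_hat, sev_hat)]
    second = dft(product)

    # Step 3: keep real parts, divide by the padded length, and reverse
    # every entry except the first.
    scaled = [z.real / len(second) for z in second]
    return [scaled[0]] + scaled[:0:-1]

# Toy inputs for illustration only.
result = aggregate_by_dft([0.5, 0.5], [0.25, 0.75])
```

Reversing all entries but the first turns the second forward transform into an inverse transform, which is why the output matches the direct convolution of the two vectors.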

The aggregate loss probabilities obtained using DFT are given in Table 2.

Table 2. Aggregate loss probabilities.

The values in Table 2 are represented graphically in Figure 3.

Figure 3. Aggregate loss probabilities, panels (a)-(c).

Figure 3(a) shows the aggregate loss probabilities using the PH-OPPL distribution with the severity distributions. It indicates that PH-OPPL with Weibull and with Pareto were similar to the actual aggregate loss probabilities, while PH-OPPL with Generalized Pareto overestimated the aggregate losses for the six state model. Figure 3(b) shows the aggregate loss probabilities using the PH-TPPL distribution: PH-TPPL with Pareto and with Generalized Pareto provided a better fit for the secondary cancer data, while PH-TPPL with Weibull overestimated the aggregate losses. However, PH-OPPL with Weibull and PH-TPPL with Generalized Pareto provided a better fit than the PH-OPPL-Pareto and PH-TPPL-Pareto models respectively; hence they are compared in Figure 3(c), which indicates that PH-OPPL with Weibull provided the best fit for aggregate loss data of secondary cancers in Kenya. The PH-OPPL-Weibull model can therefore be used to provide better estimates of aggregate losses for secondary cancer data in Kenya.

5. Conclusion

Mixed phase type distributions are developed to model secondary cancer cases in Kenya. Unlike ordinary distributions, which do not incorporate the transitions between states, the distributions proposed here take the transition states into consideration while modeling claim frequency data. The distributions are based on the Poisson and Lindley distributions; PH-OPPL-Weibull provided the best fit among the PH-OPPL models, while PH-TPPL-Generalized Pareto provided the best fit among the PH-TPPL models. This approach improves the estimation of aggregate losses, as it incorporates the transition probabilities of the different states of cancer as well as the heterogeneity of claim data. This in turn improves the pricing of insurance policies for diseases that transit through different states, such as cancer, and hence the financial position of insurance firms, since it improves the estimation of their reserves. The model, however, is only applicable in risk theory for diseases that have multiple transition states. Further research could factor in the patients who were censored in this study, and the same study could be carried out for other diseases with transition states, such as HIV/AIDS.

Data Availability

The data used to support the findings of this study are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Shevchenko, P.V. (2010) Calculation of Aggregate Loss Distribution. Journal of Operational Risk, 5, 3-40.
[2]	Robertson, J. (1992) The Computation of Aggregate Loss Distribution.
[3]	Rono, A., Ogutu, C. and Weke, P. (2020) On Compound Distributions for Natural Disaster Modelling in Kenya. International Journal of Mathematics and Mathematical Sciences, 2020, Article ID: 9398309. https://doi.org/10.1155/2020/9398309
[4]	Mohamed, M.A., Razali, A.M. and Ismail, N. (2010) Approximation of Aggregate Losses Using Simulation. Journal of Mathematics and Statistics, 6, 233-239. https://doi.org/10.3844/jmssp.2010.233.239
[5]	Erlang, A. (1909) Sandsynlighedsregning og telefonsamtaler. Nyt Tidsskrift for Matematik, 20, 33-39.
[6]	Neuts, M.F. (1981) Matrix-Geometric Solutions in Stochastic Models. Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore.
[7]	Asmussen, S. (2003) Applied Probability and Queues. Springer-Verlag, New York.
[8]	Bladt, M. (2005) A Review on Phase-Type Distributions and Their Use in Risk Theory. ASTIN Bulletin, 35, 145-161. https://doi.org/10.2143/AST.35.1.583170
[9]	O’Cinneide, C. (2017) Phase-Type Distributions and Invariant Polytopes.
[10]	Wu, X.Y. and Li, S.M. (2010) Matrix-Form Recursions for a Family of Compound Distributions. ASTIN Bulletin, 40, 351-368. https://doi.org/10.2143/AST.40.1.2049233
[11]	Kok, S. and Wu, X. (2010) Matrix-Form Recursive Evaluation of the Aggregate Claims Distribution Revisited. Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Melbourne.
[12]	Nurul, N.Z., Mahmod, O., Rajalingam, S., Hanita, D., Lazim, A. and Evizal, A.K. (2019) Markov Chain Model Development for Forecasting Air Pollution Index of Miri, Sarawak. Journal of Sustainability, 11, 5190. https://doi.org/10.3390/su11195190
[13]	Zhou, Y.J., Wang, L.L., Zhong, R. and Tan, Y.L. (2018) A Markov Chain Based Demand Prediction Model for Stations in Bike Sharing Systems. Journal of Mathematical Problems in Engineering, 2018, Article ID: 8028714. https://doi.org/10.1155/2018/8028714
[14]	Das, K.K., Ahmed, I. and Bhattacharjee, S. (2018) A New Three-Parameter Poisson-Lindley Distribution for Modelling Over-Dispersed Count Data. International Journal of Applied Engineering Research, 13, 16468-16477.
[15]	Sankaran, M. (1970) The Discrete Poisson-Lindley Distribution. Biometrics, 26, 145-149. https://doi.org/10.2307/2529053
[16]	Shanker, R. and Mishra, A. (2014) A Two-Parameter Poisson-Lindley Distribution. International Journal of Statistics and Systems, 9, 79-85.
[17]	Kemeny, J.G. and Snell, J.L. (2016) Finite Markov Chains. Springer-Verlag, Princeton.
