Parameters Estimation of a Bivariate Generalized Poisson Distribution with Applications to Metabolic Syndrome Data

Mohamed M. Shoukri

doi:10.4236/ojs.2024.145020

Open Journal of Statistics > Vol.14 No.5, October 2024

Department of Epidemiology and Biostatistics, Schulich School of Medicine and Dentistry, University of Western Ontario, London, Ontario, Canada.

Abstract

Background: Bivariate count data are commonly encountered in medicine, biology, engineering, epidemiology and many other applications. The Poisson distribution has been the model of choice for analyzing such data. In most cases mutual independence among the variables is assumed; however, this fails to take into account the correlation between the outcomes of interest. A special bivariate form of the multivariate Lagrange family of distributions, named the Bivariate Generalized Poisson Distribution, is considered in this paper. Objectives: We estimate the model parameters using the method of maximum likelihood and show that the model fits the count variables representing components of metabolic syndrome in spousal pairs. We use the likelihood local score to test the significance of the correlation between the counts. We also construct confidence intervals on the ratio of the two correlated Poisson means. Methods: Based on a random sample of pairs of count data, we show that the score test of independence is locally most powerful. We also provide a formula for sample size estimation for a given level of significance and a given power. The confidence intervals on the ratio of correlated Poisson means are constructed using the delta method, Fieller’s theorem, and the nonparametric bootstrap. We illustrate the methodologies on metabolic syndrome data collected from 4000 spousal pairs. Results: The bivariate Poisson model fitted the metabolic syndrome data quite satisfactorily. Moreover, the three methods of confidence interval estimation gave almost identical results, meaning that they have nearly the same interval width.

Keywords

Lagrange Distributions, Double Poisson, Maximum Likelihood Estimation, Score Test of Independence, Higher Order Moments, Non-Parametric Bootstrap

Share and Cite:

Shoukri, M. (2024) Parameters Estimation of a Bivariate Generalized Poisson Distribution with Applications to Metabolic Syndrome Data. Open Journal of Statistics, 14, 467-480. doi: 10.4236/ojs.2024.145020.

1. Introduction

For many years there has been an increase in statistical research aimed at developing multivariate distributions for count data. There are numerous ways by which we can generalize the distributions of univariate counts to multivariate forms. Applications of these multivariate distributions are found in many fields such as engineering, bioinformatics, genetic epidemiology and biomedicine. The interested reader is referred to the recent editions of the book by Winkelmann [1] and the book by Cameron and Trivedi [2] for thorough reviews of regression models for multivariate count data, and to the very recent work by Tzougas and di Cerchiara [3]. Needless to say, fitting these multivariate distributions to data is quite complicated and requires a great deal of computing resources. Therefore, there is a need to develop more flexible bivariate models and demonstrate their applicability to real life data.

The main purpose of this article is to investigate the statistical characteristics of a class of bivariate generalized Poisson models with a correlation parameter that offers sufficient flexibility for accommodating overdispersion and accounting for the positive correlation between the variables of interest. The distribution is named the “Bivariate Generalized Poisson Distribution” (BGPD), and it is a special member of a larger family of distributions developed by Shenton and Consul [4]. Some of the interesting properties of this class were studied further by Jain and Singh [5], and by Shoukri [6], who investigated some of the properties of the distribution as a member of the class of bivariate modified power series distributions.

The objectives of this paper are threefold. Firstly, we derive the locally most powerful score test of the hypothesis that the correlation parameter is zero, and then assess the power of this test. Secondly, we construct confidence intervals on the ratio of the means of the two correlated Poisson variables. Thirdly, we fit the model to metabolic syndrome data available from a random sample of spousal pairs.

2. The Bivariate Generalized Poisson Distribution (BGPD)

A bivariate random variable (X, Y) is said to have BGPD when the joint probability distribution is given by (1):

$\begin{matrix} P (X = x, Y = y) = \frac{{(1 + m_{1} x + m_{2} y)}^{x + y - 1}}{x! y!} θ_{1}^{x} θ_{2}^{y} \exp [- (θ_{1} + θ_{2}) \\ \cdot (1 + m_{1} x + m_{2} y)] \end{matrix}$ (1)

for $x, y \geq 0$ , $0 < θ_{i} < \infty$ , $m_{i} > 0 (i = 1, 2)$ , $0 < θ_{1} m_{1} + θ_{2} m_{2} < 1$ , and zero otherwise. The BGPD can be obtained by expanding the bivariate probability generating function (BPGF) $ϕ (t_{1}, t_{2}) = \exp [θ_{1} (t_{1} - 1) + θ_{2} (t_{2} - 1)]$ using the bivariate Lagrange expansion of an implicit function, as shown in [4].
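As a numerical check on (1), the probability function can be coded directly and summed over a truncated grid. The following is a minimal sketch only: the function name `dbgpd`, the truncation point of the grid, and the parameter values are assumptions made for illustration and do not come from [4].

```r
# Joint probability function (1) of the BGPD, evaluated on the log scale
# for numerical stability. 'dbgpd' is an illustrative name, not a package function.
dbgpd <- function(x, y, theta1, theta2, m1, m2) {
  w <- 1 + m1 * x + m2 * y
  log_p <- (x + y - 1) * log(w) - lgamma(x + 1) - lgamma(y + 1) +
    x * log(theta1) + y * log(theta2) - (theta1 + theta2) * w
  exp(log_p)
}

# Assumed example values satisfying 0 < theta1 * m1 + theta2 * m2 < 1
theta1 <- 0.30; theta2 <- 0.10; m1 <- 1.0; m2 <- 1.0

# Sum the probabilities over a truncated grid 0..150 in each coordinate
grid <- expand.grid(x = 0:150, y = 0:150)
total <- sum(dbgpd(grid$x, grid$y, theta1, theta2, m1, m2))
print(total)  # should be numerically close to 1
```

Working on the log scale avoids overflow in the factorials and in the power term when the counts are large.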

3. Recurrence Relation among Moments

From (1), since the probabilities sum to one, we can write

$\begin{matrix} \exp (θ_{1} + θ_{2}) = \sum_{x, y} {(x! y!)}^{- 1} {(1 + m_{1} x + m_{2} y)}^{x + y - 1} \\ \cdot θ_{1}^{x} θ_{2}^{y} \exp [- (θ_{1} + θ_{2}) (m_{1} x + m_{2} y)] \end{matrix}$ (2)

Therefore, from (2) we can see that the distribution possesses a power series expansion in terms of the parameters $θ_{1}$ and $θ_{2}$ .

Differentiating (2) partially with respect to $θ_{1}$ and $θ_{2}$ in turn, dividing by $\exp (θ_{1} + θ_{2})$ , and solving the two resulting equations for the means ${μ^{'}}_{1, 0} = E (X)$ and ${μ^{'}}_{0, 1} = E (Y)$ , we get

${μ^{'}}_{1, 0} = θ_{1} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 1}$ (3)

${μ^{'}}_{0, 1} = θ_{2} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 1}$ (4)

To obtain a recurrence relation among the higher non-central product moments, we write

${μ^{'}}_{r, s} = \sum_{x, y} x^{r} y^{s} P (X = x, Y = y)$ (5)

and define the central product moments

$μ_{r, s} = E [{(X - {μ^{'}}_{1, 0})}^{r} {(Y - {μ^{'}}_{0, 1})}^{s}]$ .

Writing $\partial_{i} = \partial / \partial θ_{i}$ , we can show that

${μ^{'}}_{r + 1, s} = θ_{1} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 1} [\{(1 - m_{2} θ_{2}) \partial_{1} + m_{2} θ_{2} \partial_{2}\} {μ^{'}}_{r, s}] + {μ^{'}}_{r, s} {μ^{'}}_{1, 0}$ (6)

By symmetry we can write:

${μ^{'}}_{r, s + 1} = θ_{2} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 1} [\{(1 - m_{1} θ_{1}) \partial_{2} + m_{1} θ_{1} \partial_{1}\} {μ^{'}}_{r, s}] + {μ^{'}}_{r, s} {μ^{'}}_{0, 1}$ (7)

These relationships among the moments are obtained following Kendall [7].

In particular, from (6) and (7):

${μ^{'}}_{1, 0} = θ_{1} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 1}$ (8)

$μ_{2, 0} = θ_{1} {(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{- 3} [{(1 - m_{2} θ_{2})}^{2} + θ_{1} θ_{2} m_{2}^{2}]$ (9)

are respectively the mean and variance of X in terms of the model parameters.

The values of ${μ^{'}}_{0, 1}$ and $μ_{0, 2}$ can be written down by symmetry. The coefficient of correlation $ρ$ between X and Y is given by

$ρ = \frac{θ_{1} θ_{2} [m_{1} (1 - m_{2} θ_{2}) + m_{2} (1 - m_{1} θ_{1})]}{{[θ_{1} θ_{2} (1 - 2 m_{1} θ_{1} + m_{1}^{2} θ_{1}^{2} + θ_{1} θ_{2} m_{1}^{2}) (1 - 2 m_{2} θ_{2} + m_{2}^{2} θ_{2}^{2} + θ_{1} θ_{2} m_{2}^{2})]}^{1 / 2}}$ (10)

As can be seen, from the restrictions on the model parameters, the correlation (10) cannot be negative.
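The closed forms (8)-(10) can be checked numerically against moments obtained by direct summation of (1). The sketch below is illustrative only: the function names `dbgpd` and `bgpd_moments`, the grid truncation, and the parameter values are assumptions chosen for the example.

```r
# Probability function (1); 'dbgpd' is an illustrative name.
dbgpd <- function(x, y, theta1, theta2, m1, m2) {
  w <- 1 + m1 * x + m2 * y
  exp((x + y - 1) * log(w) - lgamma(x + 1) - lgamma(y + 1) +
        x * log(theta1) + y * log(theta2) - (theta1 + theta2) * w)
}

# Means, variances (9) and correlation (10) in closed form
bgpd_moments <- function(theta1, theta2, m1, m2) {
  delta <- 1 - m1 * theta1 - m2 * theta2
  var_x <- theta1 * ((1 - m2 * theta2)^2 + theta1 * theta2 * m2^2) / delta^3
  var_y <- theta2 * ((1 - m1 * theta1)^2 + theta1 * theta2 * m1^2) / delta^3
  cov_xy <- theta1 * theta2 *
    (m1 * (1 - m2 * theta2) + m2 * (1 - m1 * theta1)) / delta^3
  c(mean_x = theta1 / delta, mean_y = theta2 / delta,
    var_x = var_x, var_y = var_y, rho = cov_xy / sqrt(var_x * var_y))
}

# Assumed example values with m1 * theta1 + m2 * theta2 = 0.34 < 1
theta1 <- 0.30; theta2 <- 0.20; m1 <- 0.8; m2 <- 0.5

# Moments by direct summation over a truncated grid
grid <- expand.grid(x = 0:150, y = 0:150)
p <- dbgpd(grid$x, grid$y, theta1, theta2, m1, m2)
mx <- sum(grid$x * p); my <- sum(grid$y * p)
vx <- sum((grid$x - mx)^2 * p); vy <- sum((grid$y - my)^2 * p)
cxy <- sum((grid$x - mx) * (grid$y - my) * p)
numeric_moments <- c(mean_x = mx, mean_y = my, var_x = vx, var_y = vy,
                     rho = cxy / sqrt(vx * vy))

closed_form <- bgpd_moments(theta1, theta2, m1, m2)
print(rbind(closed_form, numeric_moments))
```

The two rows should agree up to truncation and rounding error, and the correlation is positive, in line with the remark following (10).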

3.1. Maximum Likelihood Estimators

Assuming $m_{1}$ and $m_{2}$ to be known constants, let $(X_{i}, Y_{i}), i = 1, 2, \dots, n$ be a random sample of fixed size $n$ taken from the BGPD family given in (1), and let $z_{1} = \sum_{i = 1}^{n} x_{i}$ and $z_{2} = \sum_{i = 1}^{n} y_{i}$ denote the observed sample totals. It can be easily shown that the maximum likelihood (ML) estimators for $θ_{1}$ and $θ_{2}$ are as shown in (11) and (12):

${\hat{θ}}_{1} = z_{1} {(n + m_{1} z_{1} + m_{2} z_{2})}^{- 1}$ (11)

${\hat{θ}}_{2} = z_{2} {(n + m_{1} z_{1} + m_{2} z_{2})}^{- 1}$ (12)
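The estimators (11) and (12) are simple functions of the sample totals. A minimal sketch follows, assuming known $m_{1}$ and $m_{2}$ ; the function name `bgpd_mle` and the small data set are hypothetical and serve only to illustrate the calculation.

```r
# ML estimators (11)-(12) for known m1 and m2; z1 and z2 are the sample totals.
bgpd_mle <- function(x, y, m1, m2) {
  n <- length(x)
  z1 <- sum(x)
  z2 <- sum(y)
  denom <- n + m1 * z1 + m2 * z2
  c(theta1 = z1 / denom, theta2 = z2 / denom)
}

# Small hypothetical sample of paired counts (illustration only)
x <- c(0, 1, 0, 2, 1, 0, 3, 1)
y <- c(0, 0, 1, 1, 0, 0, 2, 1)
m1 <- 0.5; m2 <- 0.5
est <- bgpd_mle(x, y, m1, m2)
print(est)

# Fitted means theta_i / (1 - m1 * theta1 - m2 * theta2), as in (3)-(4)
delta_hat <- 1 - m1 * est[["theta1"]] - m2 * est[["theta2"]]
fitted_means <- unname(est / delta_hat)
print(fitted_means)
```

Substituting the estimates into (3) and (4) returns the sample means, which is a convenient check on the implementation.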

With some considerable algebra, inverting the Fisher information matrix, we can show that the asymptotic variances and covariance of the ML estimators of $θ_{1}$ and $θ_{2}$ are given respectively in (13)-(15):

$Var ({\hat{θ}}_{1}) = \frac{n θ_{1}}{{(n + m_{1})}^{2}} - \frac{n^{2} m_{1} θ_{1}^{2}}{{(n + m_{1})}^{2} (n + 2 m_{1})} - \frac{n m_{2} θ_{1} θ_{2}}{(n + m_{1}) (n + m_{1} + m_{2})}$ (13)

$Var ({\hat{θ}}_{2}) = \frac{n θ_{2}}{{(n + m_{2})}^{2}} - \frac{n^{2} m_{2} θ_{2}^{2}}{{(n + m_{2})}^{2} (n + 2 m_{2})} - \frac{n m_{1} θ_{1} θ_{2}}{(n + m_{2}) (n + m_{1} + m_{2})}$ (14)

$Cov ({\hat{θ}}_{1}, {\hat{θ}}_{2}) = \frac{n θ_{1} θ_{2}}{n + m_{1} + m_{2}} - \frac{n^{2} θ_{1} θ_{2}}{(n + m_{1}) (n + m_{2})}$ (15)

3.2. Properties of the BGPD

1: If $(X, Y) \sim \text{BGPD}$ , then $X$ and $Y$ are stochastically independent if and only if $m_{1} = m_{2} = 0$ .

2: If $(X, Y) \sim \text{BGPD}$ , then $X$ and $Y$ cannot be perfectly correlated.

Proof. Setting $ρ^{2}$ , given in (10), equal to unity implies that ${(1 - m_{1} θ_{1} - m_{2} θ_{2})}^{2} = 0$ . This cannot hold unless $m_{1} θ_{1} + m_{2} θ_{2} = 1$ , which contradicts the strict inequality $0 < m_{1} θ_{1} + m_{2} θ_{2} < 1$ .

3: If $(X_{i}, Y_{i}), i = 1, 2, \dots, n$ is a random sample taken from the BGPD family, the joint probability distribution of the sums $Z_{1} = X_{1} + X_{2} + \dots + X_{n}$ and $Z_{2} = Y_{1} + Y_{2} + \dots + Y_{n}$ is also a BGPD and its probability function takes the form

$\begin{array}{l} P (Z_{1} = z_{1}, Z_{2} = z_{2}) \\ = \frac{n {(n + m_{1} z_{1} + m_{2} z_{2})}^{z_{1} + z_{2} - 1}}{z_{1}! z_{2}!} θ_{1}^{z_{1}} θ_{2}^{z_{2}} \exp [- (θ_{1} + θ_{2}) (n + m_{1} z_{1} + m_{2} z_{2})] \end{array}$ (17)

4: The marginal distribution of $X$ is the generalized Poisson distribution given by Consul and Jain [8], where their $λ_{1}$ and $λ_{2}$ are taken as $θ_{1}$ and $m_{1} θ_{1}$ respectively. Thus, the marginal probability distributions of $X$ and $Y$ are given respectively in (18) and (19):

$P (X = x) = {(1 + m_{1} x)}^{x - 1} {(x!)}^{- 1} {(θ_{1} e^{- m_{1} θ_{1}})}^{x} / e^{θ_{1}}$ (18)

Similarly

$P (Y = y) = {(1 + m_{2} y)}^{y - 1} {(y!)}^{- 1} {(θ_{2} e^{- m_{2} θ_{2}})}^{y} / e^{θ_{2}}$ (19)

5: The regression equation of $Y$ on $X$ is given by

$E (Y | x) = \begin{cases} θ_{2} {(1 - m_{2} θ_{2})}^{- 1} , & x = 0 \\ θ_{2} {(1 - m_{2} θ_{2})}^{- 1} + θ_{2} m_{1} {(1 - m_{2} θ_{2})}^{- 1} x , & x \geq 1 \end{cases}$

3.3. Score Test on the Hypothesis of Independence (H₀:m = 0)

In the subsequent analyses we shall assume that $m_{1} = m_{2} = m$ . Let $(x_{i}, y_{i}), i = 1, 2, \dots, n$ be a simple random sample. The likelihood is given by:

$L = \prod_{i = 1}^{n} P (x_{i}, y_{i})$

Apart from an additive constant that does not involve the parameters, the log-likelihood function is:

$\begin{array}{l} ℓ = \log L = \sum_{i = 1}^{n} [(x_{i} + y_{i} - 1) \log (1 + m (x_{i} + y_{i}))] \\ + n \bar{x} \log θ_{1} + n \bar{y} \log θ_{2} - (θ_{1} + θ_{2}) \sum_{i = 1}^{n} [1 + m (x_{i} + y_{i})] \end{array}$ (20)

where $\bar{x} = \sum_{i = 1}^{n} x_{i} / n$ and $\bar{y} = \sum_{i = 1}^{n} y_{i} / n$ denote the sample means.

Differentiating the log-likelihood function (20) with respect to $m$ and evaluating at $m = 0$ , the score function is given by:

$U = {\frac{\partial ℓ}{\partial m} |}_{m = 0} = \sum_{i = 1}^{n} (x_{i} + y_{i}) (x_{i} + y_{i} - 1) - n (\bar{x} + \bar{y}) (θ_{1} + θ_{2})$

Replacing $(θ_{1} + θ_{2})$ with its MLE under $H_{0}: m = 0$ , namely $\bar{x} + \bar{y}$ , we have

$U = \sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2} + \sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2} + 2 \sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y}) - n (\bar{x} + \bar{y})$ (21)

Under $H_{0}: m = 0$ , from (21) we have:

$E [U] = 0$

Under $H_{0}$ the variables X and Y are independent, so the cross-product term in (21) has zero expectation, and for the remaining terms

$\begin{matrix} u = n S_{x}^{2} + n S_{y}^{2} - n (\bar{x} + \bar{y}) \\ = n (S_{x}^{2} - \bar{x}) + n (S_{y}^{2} - \bar{y}) \end{matrix}$

$E [u] = n μ_{20} + n μ_{02} - n ({μ^{'}}_{10} + {μ^{'}}_{01})$

and under $H_{0}: m = 0$

$μ_{20} = θ_{1}, μ_{02} = θ_{2}$

${μ^{'}}_{10} = θ_{1}, {μ^{'}}_{01} = θ_{2}$

Therefore, $E [u] = 0$ , and the variance of $u$ under the null hypothesis is:

$Var (u) = V_{0} = n^{2} [Var (S_{x}^{2}) + Var (\bar{x}) - 2 cov (\bar{x}, S_{x}^{2})] + n^{2} [Var (S_{y}^{2}) + Var (\bar{y}) - 2 cov (\bar{y}, S_{y}^{2})]$

For a Poisson variable with mean $θ$ we have, to order $n^{- 1}$ , $Var (S^{2}) = (θ + 2 θ^{2}) / n$ , $Var (\bar{x}) = θ / n$ and $cov (\bar{x}, S^{2}) = θ / n$ , so that

$Var (u) = 2 n (θ_{1}^{2} + θ_{2}^{2})$

Hence

$\begin{matrix} z^{'} = \frac{u^{2}}{Var (u)} = \frac{n^{2} {[S_{x}^{2} + S_{y}^{2} - \bar{x} - \bar{y}]}^{2}}{2 n (θ_{1}^{2} + θ_{2}^{2})} \\ = \frac{n {[(S_{x}^{2} - \bar{x}) + (S_{y}^{2} - \bar{y})]}^{2}}{2 ({\bar{x}}^{2} + {\bar{y}}^{2})} \end{matrix}$ (21)

where, in the last expression, $θ_{1}$ and $θ_{2}$ are replaced by their estimates $\bar{x}$ and $\bar{y}$ under $H_{0}$ . As $n \to \infty$ , the distribution of the test statistic $z^{'}$ is chi-square with one degree of freedom, $z^{'} \sim χ_{(1)}^{2}$ .
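As an illustration, the final form of the statistic, $n {[(S_{x}^{2} - \bar{x}) + (S_{y}^{2} - \bar{y})]}^{2} / [2 ({\bar{x}}^{2} + {\bar{y}}^{2})]$ , can be computed from paired counts. The sketch below uses the spousal counts cross classified in Table 2 of Section 4; the function name `score_test` is illustrative, and the use of the divisor $n - 1$ for the sample variances (the default of `var` in R) is an assumption, since the divisor is not stated in the text.

```r
# Paired counts reconstructed from the cross classification in Table 2
male   <- c(0, 1, 2, 3, 3, 4, 4)
female <- c(0, 0, 1, 2, 3, 3, 4)
freq   <- c(2337, 1228, 360, 44, 12, 7, 12)
x <- rep(male, times = freq)    # male counts
y <- rep(female, times = freq)  # female counts

# Score statistic: n * [(Sx^2 - xbar) + (Sy^2 - ybar)]^2 / (2 * (xbar^2 + ybar^2))
score_test <- function(x, y) {
  n <- length(x)
  num <- (var(x) - mean(x)) + (var(y) - mean(y))
  stat <- n * num^2 / (2 * (mean(x)^2 + mean(y)^2))
  c(chisq = stat,
    z = sign(num) * sqrt(stat),                          # signed square root
    p_value = pchisq(stat, df = 1, lower.tail = FALSE))  # chi-square(1) tail
}

res <- score_test(x, y)
print(res)
```

The signed square root `z` is reported alongside the chi-square form because overdispersion corresponds to positive values of the numerator.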

To evaluate the asymptotic power of the score test we need to find the non-null asymptotic distribution of $u$ .

When the null hypothesis does not hold, the expected value of $u$ , denoted by $M_{1} = E [u | m \neq 0]$ , is such that:

$\frac{M_{1}}{n} = (μ_{20} + μ_{02}) - ({μ^{'}}_{10} + {μ^{'}}_{01})$ (22)

Moreover, denoting the variance of the test statistic $u$ under the non-null distribution by $V_{1} = var (u | m \neq 0)$ , then $V_{1} = n D$ , where

$\begin{matrix} D = (μ_{40} - μ_{20}^{2}) + (μ_{20} - 2 μ_{30}) + (μ_{04} - μ_{02}^{2}) + (μ_{02} - 2 μ_{03}) \\ + 2 [μ_{22} - μ_{12} - μ_{21} - μ_{11}] \end{matrix}$ (23)

An expression for the estimated sample size needed to test the hypothesis of independence for a given type I error rate $α$ and power $1 - β$ is shown to be:

$n = \frac{{[Z_{β} \sqrt{D} + Z_{α} \sqrt{2 (θ_{1}^{2} + θ_{2}^{2})}]}^{2}}{{(M_{1} / n)}^{2}}$ (24)

where $M_{1} / n$ is the per-observation quantity in (22), which does not depend on $n$ . For example, in (24), a 5% type I error rate gives $Z_{α} = 1.645$ , and a power of 80% gives $Z_{β} = 0.84$ .

In the Appendix we give the algebraic expressions of the elements of D as functions of the population parameters.
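The sample size formula (24) is straightforward to evaluate once its ingredients are available. In the sketch below, `delta` stands for the per-observation mean shift $M_{1} / n$ of (22) and `D` for the quantity in (23); both must be supplied by the user, for example from the Appendix expressions. The function name `bgpd_sample_size` and the numerical inputs are hypothetical, chosen only to show the call, and a one-sided critical value is assumed.

```r
# Sample size from (24). 'delta' is M1 / n from (22); 'D' is from (23).
bgpd_sample_size <- function(delta, D, theta1, theta2,
                             alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha)  # one-sided critical value (assumption)
  z_beta <- qnorm(power)
  n <- (z_beta * sqrt(D) +
          z_alpha * sqrt(2 * (theta1^2 + theta2^2)))^2 / delta^2
  ceiling(n)                   # round up to a whole number of pairs
}

# Hypothetical inputs, for illustration only
n_required <- bgpd_sample_size(delta = 0.1, D = 1, theta1 = 0.3, theta2 = 0.1)
print(n_required)
```

Because `delta` enters through its square, halving the mean shift roughly quadruples the required number of pairs.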

3.4. Confidence Interval on the Ratio of BGPD Means

1) Delta Method

Let $R = {μ^{'}}_{10} / {μ^{'}}_{01}$ denote the ratio of the mean of X to the mean of Y. The point estimator of R is $\hat{R} = \bar{x} / \bar{y}$ . From Kendall [7], we have to the first order of approximation:

$Var (\hat{R}) = n^{- 1} [μ_{20} / {μ^{'}}_{01}^{2} + μ_{02} {μ^{'}}_{10}^{2} / {μ^{'}}_{01}^{4} - 2 μ_{11} {μ^{'}}_{10} / {μ^{'}}_{01}^{3}]$ (25)

The (1 − α) 100% confidence interval on R is therefore given by:

$\hat{R} \pm Z_{α / 2} {[Var (\hat{R})]}^{1 / 2}$ (26)

with the moments in (25) replaced by their sample estimates.
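A minimal sketch of the delta-method interval follows. It uses the standard first-order expansion for the variance of a ratio of means, in which the covariance term enters with a negative sign, and plugs in sample moments. The data are the spousal counts cross classified in Table 2 of Section 4; the function name `delta_ci` and the use of the divisor $n - 1$ in the sample moments are assumptions of this example.

```r
# Paired counts reconstructed from the cross classification in Table 2
male   <- c(0, 1, 2, 3, 3, 4, 4)
female <- c(0, 0, 1, 2, 3, 3, 4)
freq   <- c(2337, 1228, 360, 44, 12, 7, 12)
x <- rep(male, times = freq)
y <- rep(female, times = freq)

# First-order variance of xbar / ybar and the resulting Wald-type interval
delta_ci <- function(x, y, level = 0.95) {
  n <- length(x)
  mx <- mean(x)
  my <- mean(y)
  r_hat <- mx / my
  v <- (var(x) / my^2 + var(y) * mx^2 / my^4 -
          2 * cov(x, y) * mx / my^3) / n
  z <- qnorm(1 - (1 - level) / 2)
  c(ratio = r_hat, se = sqrt(v),
    lower = r_hat - z * sqrt(v), upper = r_hat + z * sqrt(v))
}

res_delta <- delta_ci(x, y)
print(res_delta)
```

The interval is symmetric about the point estimate by construction, which is reasonable only when the mean in the denominator is well away from zero.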

2) Fieller’s Interval

Let $δ = \bar{x} - R \bar{y}$ . Then $E (δ) = 0$ and

$Var (δ) = \frac{1}{n} [μ_{20} - 2 R μ_{11} + R^{2} μ_{02}]$

Hence, for large $n$ ,

$1 - α = P_{r} [\frac{| δ |}{\sqrt{Var (δ)}} \leq Z_{α / 2}]$

$1 - α = P_{r} [δ^{2} \leq Var (δ) Z_{α / 2}^{2}] = P_{r} [{(\bar{x} - R \bar{y})}^{2} \leq Var (δ) Z_{α / 2}^{2}]$

A point estimator of $Var (δ)$ is given by:

$\hat{V} = n^{- 1} [S_{x}^{2} - 2 R \hat{ρ} S_{x} S_{y} + R^{2} S_{y}^{2}]$

Therefore,

$1 - α = P_{r} [a R^{2} - b R + c \leq 0]$

where

$a = {\bar{y}}^{2} - Z_{α / 2}^{2} S_{y}^{2} / n$

$b = 2 (\bar{x} \bar{y} - Z_{α / 2}^{2} \hat{ρ} S_{x} S_{y} / n)$

and

$c = {\bar{x}}^{2} - Z_{α / 2}^{2} S_{x}^{2} / n$

Solving the quadratic in R we get, provided $a > 0$ and $b^{2} - 4 a c > 0$ ,

$R \in (A, B)$ , where

$A \equiv \frac{1}{2 a} [b - \sqrt{b^{2} - 4 a c}]$ (27)

$B \equiv \frac{1}{2 a} [b + \sqrt{b^{2} - 4 a c}]$ (28)
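The Fieller limits can be sketched as follows. The coefficients are obtained by expanding ${(\bar{x} - R \bar{y})}^{2} \leq Z_{α / 2}^{2} \hat{V}$ and collecting powers of R, which gives $a = {\bar{y}}^{2} - Z_{α / 2}^{2} S_{y}^{2} / n$ , $b = 2 (\bar{x} \bar{y} - Z_{α / 2}^{2} S_{xy} / n)$ and $c = {\bar{x}}^{2} - Z_{α / 2}^{2} S_{x}^{2} / n$ , with $S_{xy} = \hat{ρ} S_{x} S_{y}$ the sample covariance; the limits are the two roots of $a R^{2} - b R + c = 0$ . The data are the spousal counts cross classified in Table 2 of Section 4, and the function name `fieller_ci` is illustrative.

```r
# Paired counts reconstructed from the cross classification in Table 2
male   <- c(0, 1, 2, 3, 3, 4, 4)
female <- c(0, 0, 1, 2, 3, 3, 4)
freq   <- c(2337, 1228, 360, 44, 12, 7, 12)
x <- rep(male, times = freq)
y <- rep(female, times = freq)

# Fieller limits: roots of a R^2 - b R + c = 0
fieller_ci <- function(x, y, level = 0.95) {
  n <- length(x)
  mx <- mean(x)
  my <- mean(y)
  z2 <- qnorm(1 - (1 - level) / 2)^2
  a <- my^2 - z2 * var(y) / n
  b <- 2 * (mx * my - z2 * cov(x, y) / n)
  cc <- mx^2 - z2 * var(x) / n
  disc <- b^2 - 4 * a * cc
  # The solution set is a finite interval only when a > 0 and disc >= 0
  if (a <= 0 || disc < 0) stop("Fieller set is not a finite interval")
  c(lower = (b - sqrt(disc)) / (2 * a), upper = (b + sqrt(disc)) / (2 * a))
}

res_fieller <- fieller_ci(x, y)
print(res_fieller)
```

Unlike the delta-method interval, the Fieller limits need not be symmetric about the point estimate.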

In the data analysis section, we shall compare the above intervals to the non-parametric bootstrap confidence interval.

4. Application: Modeling Metabolic Syndrome in Spousal Pairs Using BGPD

The metabolic syndrome is the co-aggregation of hypertension, impaired glucose tolerance, dyslipidemia, and abdominal obesity and is associated with an increased risk of total and cardiovascular mortality in adults [9] [10]. Genetics as well as environmental influences have been implicated in obesity and several cardiovascular risk factors [11] [12]. Family is one of the most important factors affecting metabolic risk factors in children, in that family displays an interaction between genetic and shared environmental factors [13] [14]. Recent research showed that childhood and adolescent overweight has been increasing in Asian countries due to urbanization and economic development. For example, over the past 10 years, the rates of overweight among Korean children and adolescents aged 5 - 20 years have doubled [15], which may ultimately cause an increase in adverse cardiovascular outcomes. Globally, the prevalence of the metabolic syndrome is high among obese children and adolescents, and increases with increasing obesity [16]. The widely used definition of metabolic syndrome is that of the World Health Organization [17] [18]. The components of each definition and criteria for making the diagnosis of the metabolic syndrome are summarized in Table 1.

The abbreviations of the medical official bodies given in Table 1 are:

IDF ≡ International Diabetes Federation, WHO ≡ World Health Organization, EGIR ≡ European Group for the Study of Insulin Resistance, and NCEP ≡ National Cholesterol Education Program.

Table 1. Definition of the components of metabolic syndrome [Journal of the Royal Society of Medicine Vol. 99 September 2006].

Component	IDF		WHO		EGIR		NCEP	
	M	F	M	F	M	F	M	F
Central obesity (waist circumference, cm)	≥102	≥88	≥102	≥88	≥94	≥80	≥102	≥88
Raised TG (mmol/L)	≥1.7	≥1.7	≥1.7	≥1.7	≥2.0	≥2.0	≥1.7	≥1.7
Low HDL (mmol/L)	<1.03	<1.29	≤0.9	≤1	≤1	≤1	≤1.03	≤1.29
Hypertension (mmHg)	≥130/85	≥130/85	≥140/90	≥140/90	≥140/90	≥140/90	≥130/85	≥130/85
Fasting glucose (mmol/L)	≥5.6	≥5.6	≥6.1	≥6.1	≥6.1	≥6.1	≥6.1	≥6.1

The metabolic syndrome data:

Based on a random sample of 4000 spousal pairs, the female and male counts are cross classified in Table 2 and summarized in Table 3.

The test of independence had a value of U = 7.027, with a corresponding p-value = 2.11e−12. Therefore, the hypothesis of independence is not supported by the metabolic syndrome data.

Table 2. Cross classification of the counts of components of MS in spousal pairs.

Male count	Female count					Total
	0	1	2	3	4	
0	2337	0	0	0	0	2337
1	1228	0	0	0	0	1228
2	0	360	0	0	0	360
3	0	0	44	12	0	56
4	0	0	0	7	12	19
Total	3565	360	44	19	12	4000

Table 3. Summary statistics of counts.

	N	Mean	Std. Deviation	Variance
Male count	4000	0.5480	0.75421	0.569
Female count	4000	0.1382	0.45353	0.206

To fit the model to the data we shall use the method of moments to estimate the three population parameters. We shall use the notations $μ_{10}^{*}$ and $μ_{01}^{*}$ to denote the sample means of x and y respectively, while $μ_{11}^{*}$ denotes the sample covariance between x and y. Equating the sample moments to their population counterparts gives:

$θ_{1}^{*} = \frac{μ_{10}^{*}}{1 + m^{*} (μ_{10}^{*} + μ_{01}^{*})}$ (29)

$θ_{2}^{*} = \frac{μ_{01}^{*}}{1 + m^{*} (μ_{10}^{*} + μ_{01}^{*})}$ (30)

$m^{* 2} (μ_{10}^{*} + μ_{01}^{*}) μ_{10}^{*} μ_{01}^{*} + 2 m^{*} μ_{10}^{*} μ_{01}^{*} - μ_{11}^{*} = 0$ (31)

From Equations (29)-(31), the admissible moment estimator for $m$ is the positive root of the quadratic (31):

$m^{*} = \frac{{[{(μ_{10}^{*} μ_{01}^{*})}^{2} + (μ_{10}^{*} + μ_{01}^{*}) μ_{11}^{*} μ_{10}^{*} μ_{01}^{*}]}^{1 / 2} - μ_{10}^{*} μ_{01}^{*}}{(μ_{10}^{*} + μ_{01}^{*}) μ_{10}^{*} μ_{01}^{*}}$ (32)

The moment estimators of $θ_{1}$ and $θ_{2}$ are given respectively by $θ_{1}^{*}$ and $θ_{2}^{*}$ in Equations (33) and (34).

$θ_{1}^{*} = \frac{μ_{10}^{* 2} μ_{01}^{*}}{{[{(μ_{10}^{*} μ_{01}^{*})}^{2} + (μ_{10}^{*} + μ_{01}^{*}) μ_{11}^{*} μ_{10}^{*} μ_{01}^{*}]}^{1 / 2}}$ (33)

$θ_{2}^{*} = \frac{μ_{01}^{* 2} μ_{10}^{*}}{{[{(μ_{10}^{*} μ_{01}^{*})}^{2} + (μ_{10}^{*} + μ_{01}^{*}) μ_{11}^{*} μ_{10}^{*} μ_{01}^{*}]}^{1 / 2}}$ (34)

From the metabolic syndrome data we have, for males (x), $θ_{1}^{*} = 0.294$ , and for females (y), $θ_{2}^{*} = 0.074$ , and $m^{*} = 1.262$ . The familiar chi-square goodness of fit test, with 4 degrees of freedom, had a p-value = 0.141. This shows that the model gives a reasonable fit to the data.
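Equations (32)-(34) can be evaluated directly from the cross-classified counts in Table 2. The sketch below is illustrative: the function name `mom_bgpd` is an assumption, and the sample covariance is computed with the divisor $n - 1$ (the default of `cov` in R), so the resulting estimates may differ slightly from the values reported above.

```r
# Paired counts reconstructed from the cross classification in Table 2
male   <- c(0, 1, 2, 3, 3, 4, 4)
female <- c(0, 0, 1, 2, 3, 3, 4)
freq   <- c(2337, 1228, 360, 44, 12, 7, 12)
x <- rep(male, times = freq)
y <- rep(female, times = freq)

# Moment estimators (32)-(34); requires a positive sample covariance
mom_bgpd <- function(x, y) {
  m10 <- mean(x)
  m01 <- mean(y)
  m11 <- cov(x, y)
  root <- sqrt((m10 * m01)^2 + (m10 + m01) * m11 * m10 * m01)
  c(theta1 = m10^2 * m01 / root,
    theta2 = m01^2 * m10 / root,
    m = (root - m10 * m01) / ((m10 + m01) * m10 * m01))
}

est <- mom_bgpd(x, y)
print(round(est, 3))

# Residual of the quadratic (31) at the estimate; should be essentially zero
m10 <- mean(x); m01 <- mean(y)
resid31 <- est[["m"]]^2 * (m10 + m01) * m10 * m01 +
  2 * est[["m"]] * m10 * m01 - cov(x, y)
print(resid31)
```

A quick admissibility check is that the estimates satisfy $m^{*} (θ_{1}^{*} + θ_{2}^{*}) < 1$ , as required by the parameter restrictions in (1).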

The following short R code evaluates the nonparametric bootstrap confidence interval and is used to assess the large sample distribution of the ratio of correlated Poisson means.

R-CODE for bootstrapping the confidence interval on the ratio of means

library(boot)

# paired counts; 'data' is the data frame holding the spousal counts

x <- data$Male_count

y <- data$Female_count

df <- data.frame(x, y)

# the statistic takes the data and the indices of the resampled pairs

ratio_fun <- function(d, i) mean(d$x[i]) / mean(d$y[i])

boot_df <- boot(df, ratio_fun, R = 999)

boot_df

Bootstrap Statistics :

original bias std. error

ratio 3.964 0.0163 0.148
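The bootstrap output above gives the point estimate, bias and standard error; interval limits can then be derived from the replicates. The following base R sketch, which does not require the boot package, resamples the spousal pairs jointly and computes both a percentile interval and a normal-approximation interval. The data are reconstructed from Table 2; the seed, the number of replicates and the object names are arbitrary choices made for illustration.

```r
# Paired counts reconstructed from the cross classification in Table 2
male   <- c(0, 1, 2, 3, 3, 4, 4)
female <- c(0, 0, 1, 2, 3, 3, 4)
freq   <- c(2337, 1228, 360, 44, 12, 7, 12)
x <- rep(male, times = freq)
y <- rep(female, times = freq)

set.seed(2024)  # arbitrary seed, for reproducibility only
n <- length(x)
B <- 999        # number of bootstrap replicates

# Resample pairs (rows) with replacement so that the dependence is preserved
ratios <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  mean(x[idx]) / mean(y[idx])
})

boot_se <- sd(ratios)
perc_ci <- quantile(ratios, c(0.025, 0.975))                   # percentile interval
norm_ci <- mean(x) / mean(y) + c(-1, 1) * qnorm(0.975) * boot_se  # normal interval

print(boot_se)
print(perc_ci)
print(norm_ci)
```

Resampling whole pairs, rather than the two margins separately, is what allows the bootstrap to reflect the correlation between the male and female counts.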

Figure 1 indicates that the bootstrap distribution of the sample ratio of means is approximately normal. The confidence limits obtained by the three methods are given in Table 4.

Figure 1. Histogram and normal quantile plot for 1000 bootstrap replicates of the estimated ratio of means.

Table 4. The upper and lower limits of the 95% confidence interval on the ratio of means for the three methods (delta, Fieller’s, and bootstrap).

Limits	Delta method	Fieller’s	Bootstrap
Lower limit	3.417	3.467	3.674
Upper limit	4.514	4.575	5.254

5. Concluding Remarks

In this article, we discussed some of the interesting applications of a bivariate generalized Poisson distribution. The model has three parameters and has provided a satisfactory fit to the metabolic syndrome data collected from spousal pairs. The common parameter (m) captures overdispersion and accounts for the strength of the positive correlation between the two variables.

It is worth noting that this family of models, which is a member of the Lagrange family of bivariate distributions, is suitable for many applications, in particular in situations where queuing theory is applicable. For example, in the outpatient departments of large tertiary hospitals, patients waiting for admission usually form queues. If we have two types of queues (e.g. males and females), then the random variables would represent the numbers of patients admitted during a busy period of the admission office (single server queue). We also note that the marginal probability distributions have been applied to a variety of data from genomics and to case fatality due to COVID-19 [19].

Finally, it is worth noting that, because the structure of the model is not overly complicated, it can be shown to belong to the class of bivariate generalized linear models, where the inclusion of covariates will make this model an attractive alternative in many applications.

Appendix: Higher Order Cross-Product Moments

Denote

$Δ = (1 - m θ_{1} - m θ_{2})$

Using the recurrence relationship in (6) with $m_{1} = m_{2} = m$ , we can show that:

$μ_{11} = \frac{m θ_{1} θ_{2} (2 - m θ_{1} - m θ_{2})}{Δ^{3}}$

$\begin{matrix} μ_{12} = \frac{1}{Δ^{4}} [m θ_{1} θ_{2}^{2} (1 - m θ_{2}) - m θ_{1} θ_{2} (1 - m θ_{1}) (1 - m θ_{2}) \\ + m θ_{1} θ_{2} {(1 - m θ_{1})}^{2} + 2 m^{3} θ_{1}^{2} θ_{2}^{2}] \\ + \frac{1}{Δ^{5}} [3 m^{4} θ_{1}^{2} θ_{2}^{3} + 3 m^{2} θ_{1} θ_{2}^{2} {(1 - m θ_{1})}^{2} + 3 m^{3} θ_{1}^{2} θ_{2}^{2} (1 - m θ_{2}) \\ + 3 m θ_{1} θ_{2} {(1 - m θ_{1})}^{2} (1 - m θ_{2})] \end{matrix}$

$\begin{matrix} μ_{22} = E [{(x - {μ^{'}}_{10})}^{2} {(y - {μ^{'}}_{01})}^{2}] \\ = \frac{θ_{1}}{1 - m θ_{1} - m θ_{2}} {[(1 - m θ_{2}) \frac{\partial μ_{12}}{\partial θ_{1}} + m θ_{2} \frac{\partial μ_{12}}{\partial θ_{2}}] \\ + μ_{02} [\frac{{(1 - m θ_{2})}^{2}}{{(1 - m θ_{1} - m θ_{2})}^{2}} + \frac{m^{2} θ_{1} θ_{2}}{{(1 - m θ_{1} - m θ_{2})}^{2}}] \\ + 2 μ_{11} [\frac{m θ_{2} (1 - m θ_{2})}{{(1 - m θ_{1} - m θ_{2})}^{2}} + \frac{m θ_{2} (1 - m θ_{1})}{{(1 - m θ_{1} - m θ_{2})}^{2}}]} \end{matrix}$

where

$\begin{matrix} \frac{\partial μ_{12}}{\partial θ_{1}} = \frac{1}{Δ^{4}} [m θ_{2}^{2} (1 - m θ_{2}) - m θ_{2} (1 - m θ_{2}) (1 - 2 m θ_{1}) \\ + m θ_{2} [{(1 - m θ_{1})}^{2} - 2 m^{2} θ_{1} θ_{2} (1 - m θ_{1}) + 4 m^{3} θ_{1} θ_{2}^{2}] \\ + \frac{1}{Δ^{5}} [4 m^{2} θ_{1}^{2} θ_{2} (1 - m θ_{2}) - 4 m^{2} θ_{1} θ_{2} (1 - m θ_{1}) \times (1 - m θ_{2}) \\ + 4 m^{2} θ_{1} θ_{2} {(1 - m θ_{1})}^{2} + 8 m^{4} θ_{1}^{2} θ_{2}^{2} + 6 m^{4} θ_{1} θ_{2}^{3} + 3 m^{2} θ_{2}^{2} {(1 - m θ_{1})}^{2} \\ - 6 m^{3} θ_{1} θ_{2}^{2} (1 - m θ_{1}) + 6 m^{3} θ_{1} θ_{2}^{2} (1 - m θ_{2}) + 3 m θ_{2} {(1 - m θ_{1})}^{2} (1 - m θ_{2}) \\ - 6 m^{2} θ_{1} θ_{2} (1 - m θ_{1}) (1 - m θ_{2})] + \frac{5 m}{Δ^{6}} [3 m^{4} θ_{1}^{2} θ_{2}^{3} + 3 m^{2} θ_{1} θ_{2}^{2} {(1 - m θ_{1})}^{2} \\ + 3 m^{3} θ_{1}^{2} θ_{2}^{2} (1 - m θ_{2}) + 3 m θ_{1} θ_{2} {(1 - m θ_{1})}^{2} (1 - m θ_{2})] \end{matrix}$

and

$\begin{matrix} \frac{\partial μ_{12}}{\partial θ_{2}} = \frac{1}{Δ^{4}} [2 m θ_{1} θ_{2} (1 - m θ_{2}) - m^{2} θ_{1} θ_{2}^{2} - m θ_{1} (1 - m θ_{1}) (1 - m θ_{2}) \\ + m^{2} θ_{1} θ_{2} (1 - m θ_{1}) + m θ_{1} {(1 - m θ_{1})}^{2} + 4 m^{3} θ_{1}^{2} θ_{2} + 4 m^{3} θ_{1} θ_{2}^{2}] \\ + \frac{1}{Δ^{5}} [6 m^{4} θ_{1} θ_{2}^{3} + 3 m^{2} θ_{2}^{2} {(1 - m θ_{1})}^{2} - 6 m^{3} θ_{2}^{2} θ_{1} (1 - m θ_{1}) \\ + 6 m^{3} θ_{1} θ_{2}^{2} (1 - m θ_{2}) + 3 m θ_{2} {(1 - m θ_{1})}^{2} (1 - m θ_{2}) \end{matrix}$

$\begin{matrix} - 6 m^{2} θ_{1} θ_{2} (1 - m θ_{1}) (1 - m θ_{2})] + \frac{5 m}{Δ^{6}} [3 m^{4} θ_{1}^{2} θ_{2}^{3} + 3 m^{2} θ_{1} θ_{2}^{2} {(1 - m θ_{1})}^{2} \\ + 3 m^{2} θ_{1}^{2} θ_{2}^{2} (1 - m θ_{2}) + 3 m θ_{1} θ_{2} {(1 - m θ_{1})}^{2} (1 - m θ_{2})] \end{matrix}$

$μ_{30} = \frac{m^{3} θ_{1}^{2} θ_{2} + θ_{1} {(1 - m θ_{2})}^{3}}{Δ^{4}} + \frac{3 m^{3} θ_{1}^{3} θ_{2} + 3 m θ_{1}^{2} {(1 - m θ_{2})}^{2}}{Δ^{5}}$

And from the recurrence relation, we have:

$μ_{40} = \frac{θ_{1} (1 - m θ_{2})}{Δ} \cdot \frac{\partial μ_{30}}{\partial θ_{1}} + \frac{m θ_{1} θ_{2}}{Δ} \cdot \frac{\partial μ_{30}}{\partial θ_{2}} + 3 μ_{20} [\frac{{(1 - m θ_{2})}^{2}}{Δ^{2}} + \frac{m^{2} θ_{1} θ_{2}}{Δ^{2}}]$

The moment $μ_{40}$ depends on the partial derivatives $\frac{\partial μ_{30}}{\partial θ_{1}}$ and $\frac{\partial μ_{30}}{\partial θ_{2}}$ .

We can show that:

$\begin{matrix} \frac{\partial μ_{30}}{\partial θ_{1}} = \frac{1}{Δ^{4}} [2 m^{2} θ_{1} θ_{2} + {(1 - m θ_{2})}^{3}] \\ + \frac{1}{Δ^{5}} [13 m^{3} θ_{1}^{2} θ_{2} + 6 m θ_{1} {(1 - m θ_{2})}^{2} + 4 m θ_{1} {(1 - m θ_{2})}^{3}] \\ + \frac{1}{Δ^{6}} [15 m^{4} θ_{1}^{3} θ_{2} + 15 m^{2} θ_{1}^{2} {(1 - m θ_{2})}^{2}] \end{matrix}$

$\begin{matrix} \frac{\partial μ_{30}}{\partial θ_{2}} = \frac{1}{Δ^{4}} [m^{2} θ_{1}^{2} - 3 m θ_{1} {(1 - m θ_{2})}^{2}] \\ + \frac{1}{Δ^{5}} [3 m^{3} θ_{1}^{3} - 6 m^{2} θ_{1}^{2} (1 - m θ_{2}) + 4 m^{3} θ_{1}^{2} θ_{2} + 4 m θ_{1} {(1 - m θ_{2})}^{3}] \\ + \frac{1}{Δ^{6}} [15 m^{4} θ_{1}^{3} θ_{2} + 15 m^{2} θ_{1}^{2} {(1 - m θ_{2})}^{2}] \end{matrix}$

It should also be noted that $μ_{03}$ and $μ_{04}$ can be obtained by interchanging $θ_{1}$ and $θ_{2}$ in $μ_{30}$ and $μ_{40}$ .

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1]	Winkelmann, R. (2008) Econometric Analysis of Count Data. Springer Science & Business Media.
[2]	Cameron, A.C. and Trivedi, P.K. (2013) Regression Analysis of Count Data. 2nd Edition, Cambridge University Press.
[3]	Tzougas, G. and di Cerchiara, A.P. (2021) Bivariate Mixed Poisson Regression Models with Varying Dispersion. North American Actuarial Journal, 27, 211-241.
[4]	Shenton, L.R. and Consul, P.C. (1973) On Bivariate Lagrange and Borel-Tanner Distributions and Their Use in Queuing Theory. Sankhya, 35, 229-236.
[5]	Jain, G.C. and Singh, N. (1975) On Bivariate Power Series Distributions Associated with Lagrange Expansion. Journal of the American Statistical Association, 70, 951-954.
[6]	Shoukri, M.M. (1982) On the Generalization and Estimation for the Double Poisson Distribution. Trabajos de Estadistica y de Investigacion Operativa, 33, 97-109.
[7]	Kendall, M. and Ord, K. (1987) Advanced Theory of Statistics. 5th Edition, Griffin and Company.
[8]	Consul, P.C. and Jain, G.C. (1973) A Generalization of the Poisson Distribution. Technometrics, 15, 791-799.
[9]	Laaksonen, D.E. (2002) Metabolic Syndrome and Development of Diabetes Mellitus: Application and Validation of Recently Suggested Definitions of the Metabolic Syndrome in a Prospective Cohort Study. American Journal of Epidemiology, 156, 1070-1077.
[10]	Lakka, H. (2002) The Metabolic Syndrome and Total and Cardiovascular Disease Mortality in Middle-Aged Men. Journal of the American Medical Association, 288, 2709-2716.
[11]	Carmelli, D., Cardon, L.R. and Fabsitz, R. (1994) Clustering of Hypertension, Diabetes, and Obesity in Adult Male Twins: Same Genes or Same Environments? American Journal of Human Genetics, 55, 566-573.
[12]	Edwards, K.L., Newman, B., Mayer, E., Selby, J.V., Krauss, R.M. and Austin, M.A. (1997) Heritability of Factors of the Insulin Resistance Syndrome in Women Twins. Genetic Epidemiology, 14, 241-253.
[13]	Hong, Y., Pederson, N.L., et al. (1997) Genetic and Environmental Architecture of the Features of Insulin Resistance Syndrome. American Journal of Human Genetics, 60, 143-152.
[14]	Lee, K.E., Klein, B.E. and Klein, R. (2003) Familial Aggregation of Components of the Multiple Metabolic Syndrome in the Framingham Heart and Offspring Cohorts: Genetic Analysis Workshop Problem. BMC Genetics, 4, Article No. S94.
[15]	Park, H.S., Park, J.Y. and Cho, S. (2006) Familial Aggregation of the Metabolic Syndrome in Korean Families with Adolescents. Atherosclerosis, 186, 215-221.
[16]	Borch-Johnson, K. (2007) The Metabolic Syndrome in a Global Perspective. Danish Medical Bulletin, 54, 157-159.
[17]	Cameron, A.J., Shaw, J.E. and Zimmet, P.Z. (2004) The Metabolic Syndrome: Prevalence in Worldwide Populations. Endocrinology and Metabolism Clinics of North America, 33, 351-375.
[18]	World Health Organization (1999) Definition, Diagnosis, and Classification of Diabetes Mellitus and Its Complications. Report of a WHO Consultation. Part 1: Diagnosis and Classification of Diabetes Mellitus.
[19]	Al-Gahtani, S., Shoukri, M. and Al-Eid, M. (2021) Predictors of the Aggregate of COVID-19 Cases and Its Case-Fatality: A Global Investigation Involving 120 Countries. Open Journal of Statistics, 11, 259-277.
