Extreme Values Approach in Food Risk Modeling

Abstract

In this paper we estimate the incubation period of a possible pathology following exposure to dioxins during a poor diet. The tools developed for this purpose include the probabilistic extremal model and the stochastic behavior of the distribution tails of the contamination. We propose a cumulative distribution function for a random variable that follows both a Gaussian distribution and a GPD. A global optimization method is also explored for the efficient estimation of parameters of GPD.

Share and Cite:

Adedje, K. and Barro, D. (2022) Extreme Values Approach in Food Risk Modeling. Open Journal of Statistics, 12, 691-710. doi: 10.4236/ojs.2022.125041.

1. Introduction

It happens that food can have the unexpected opposite effect on the body when it is full of dangerous toxic substances. Cardiovascular diseases, diabetes, immune system degradation, cancer, etc. are among the consequences of consuming these foods. These events are qualified to be “extreme” insofar as the primary objective of the food is to maintain the health of the consumer. A good diet is, therefore, an essential element in the management of our health. It must be noted that most products or foods consumed today present a danger or risk of contamination, whether related to their composition or their conservation. This poses a public health problem. Faced with this observation, it noticed that in this work we set up a risk indicator that allows decision-makers to take reliable risk management measures for food and human safety. Increasingly, researchers are taking a particular interest in the modeling of extreme events in many fields, such that hydrology for flood analysis, meteorology, physics, finance, etc. However, we have developed a joint probability density of a random variable following both the generalized Pareto distribution (GPD) and the normal distribution.

In medicine, decision-makers are throwing a lot of money into clinical trials to determine the delay between contamination and the onset of symptoms of a disease in order to circumscribe the pandemic and take appropriate measures. The new estimate of the average incubation period of the SARS-CoV-2 coronavirus, i.e. the time between infection with the virus and the onset of the first symptoms of COVID-19, is based on data from patients who contracted the virus in China or through contact with an infected person. This update of clinical data, proposed in the Annals of Internal Medicine [1] and reported in the New England Journal of Medicine Journal Watch, confirms that most patients who become symptomatic develop the first symptoms within 11 or 12 days and the vast majority within 14 days of infection. This allows health authorities to quarantine infected people to break the chain of contamination. Hence this indicator in health management and safety is needed. Also, a team of researchers from Cornell University (New York) was able to show in the journal eLife that the incubation periods of infected patients are normally distributed. Their study contributed to the understanding of this variability in that many diseases take much longer to develop in some affected individuals than in others using the formalism of evolutionary graph theory.

The paper is organized as follows. Section 2 reminds the materials and methods used for the study within the framework of the extreme values theory and the concept of Optimization Preserving Operator (OPO). Section 3 presents the main results of the study. It provides the estimation of the extreme quantile, the incubation period, and a joint probability density function. Specifically the GPD parameters are estimated using the maximum likelihood method (MLM) and the numerical Alienor method (AM).

2. Materials and Methods

In this section, we remind an overview of extreme value tools, threshold determination, extremal index, and then a survey of the Alienor approach.

2.1. An Overview of Extreme Value Tools

The Peaks-Over-Threshold (POT) approach has the advantage to take into account all the observations that exceed a given threshold u large enough in the sample and not only the maximums per block (BM), see [2] for more details.

Let { X i , i = 1 , 2 , , n } be a sequence of random variables independent and identically distributed (i.i.d) of common law F and density f. According to Gnedenko [3], for X n , n = max 1 i n ( X i ) if there exist two real normalizing sequences ( a n ) n 1 > 0 and ( b n ) n 1 and a non-degenerated law γ , such that:

lim n P ( X n , n b n a n x ) = lim n F n ( a n x + b n ) = γ ( x ) ; (1)

then, is one of the three limits which are the Fréchet type laws, the Weibull type laws, and the Gumbel type laws, for more see in [4]. A parametrization of these three distributions into a single formula called Generalized Extreme Values (GEV) distribution is given by

γ ( x ) = { exp { [ 1 + γ ( x μ σ ) ] 1 γ } si γ 0 exp { exp [ ( x μ σ ) ] } si γ = 0. (2)

where { μ } , { σ > 0 } and { γ } are respectively parameters of location, scale, and shape. Notice that analytically the three extreme distributions are different from a modeling point of view, more details in [5]. In this paper, the POT approach is considered for the adequate estimation of quantiles in the sense that it is easier to obtain an excess sample. If F belongs to the catchment area of the distribution of extreme values (Weibull, Fréchet, Gumbel), then there is a function σ ( u ) strictly positive such as:

l i m u x F sup 0 < Y < x F u | F u ( y ) μ , σ ( u ) , γ ( y ) | = 0 ; (3)

where x F and μ , σ ( u ) , γ ( x ) are the right terminal point and the GPD function, respectively. In particular, in toxicology, the threshold of toxicity noted e ( u ) is derived from medical studies beyond which there may be danger to health. In the framework of our study, the value proposed by FAO/WHO is used for the efficient result. Generally, it is determined graphically by the Mean Excess Plot function under the R software. We call the mean function of the excesses of the random variable X over the threshold u < x F and note e ( u ) , the expectation function of the random variable Y excess of X over the threshold u defined by:

e ( u ) = E ( X u | X > u ) . (4)

The estimation of the GEV parameters, more precisely the estimation of the tail index γ plays a key role in the behavior of the law of extremes. There are two possible approaches for estimating γ . The BM approach uses the asymptotic GEV law and the POT one pointed out by the GPD models. Making the assumption of overestimation of the food risk, we are interested in the GPD approach. We use the maximum likelihood method to estimate the parameter γ of the GPD of excesses, see more (3.4). Most estimators of the tail parameters by GPD approach rely on the use of the order statistic, the i-th value of a sample of size n. The Hill estimator is the most used because it has a smaller asymptotic variance than the others. It is defined by the following statistic:

γ ^ k , n H = 1 k 1 j = 1 k 1 ln ( X j , n X k , n ) ; (5)

and it converges in law to the true value of the parameter for k large enough. A second estimator, the Pickands one is the simple and oldest of the tail estimators. It is defined by the statistic:

γ ^ k , n P = 1 ln 2 ln ( X k , n X 2 k , n X 2 k , n X 4 k , n ) ; (6)

where γ ^ k , n P is the estimation of the tail index which plays an essential role in the behaviour of the law of extremes (refer to [6] for more details).

2.2. A Survey of the Alienor’s Algorithm

The algorithm of the Alienor approach is as follows: the basic idea is to transform n-variable likelihood function into a single variable function θ , and the OPO, then, be used to obtain the global optimum (see [7] for more details). Let us consider n independent random variables X 1 , X 2 , , X n that are assumed to be identically distributed according to a given distribution F ( x , θ ) , where θ is the vector of parameters, with density function f ( x , θ ) . Let x 1 , x 2 , , x n be the realizations of these random variables which form the observed sample.

Step 1: definition of the objective function.

Let L be the likelihood function defined by

L : n + 3 [ 0,1 ]

( x 1 , x 2 , , x n , u , γ , σ ) i f ( x i , u , γ , σ )

we look for the values of u , γ and σ that maximize L. Given x i , we consider the likelihood L as a function of three variables u , γ and σ . By applying the following reductive transformation on the likelihood function L;

{ h 1 ( θ ) = u ( θ ) = ( min x i a ) × θ + a ; h 2 ( θ ) = γ ( θ ) = 1 2 ( 1 cos ( m × π θ ) ) ; h 3 ( θ ) = σ ( θ ) = 1 2 ( 1 cos ( m 2 × π θ ) ) ; m = 2 ; 3 , (7)

where u [ a , min x i ] , γ , σ [ 0, + ) and a = 1 m 2 , x denotes observations,

The problem becomes

G l o b . max u , γ , σ L ( u , γ , σ ) .

u [ 0, min x i ] , γ , σ [ 0, + ) ;

where the notation Glob. max means the maximum global. The problem of estimating parameters of the GPD distribution is considered as a global optimization problem.

We have a maximization problem that can be transformed into a minimization problem as follows

max θ [ 0,1 ] L ( θ ) = min θ [ 0,1 ] ( L ( θ ) ) ,

with

L : [ 0,1 ] [ 1,0 ]

θ L ( θ ) .

The sequence of steps in the algorithm is as follows:

Initialization

one randomly chooses θ 0 in [ 0 , θ max ] then,

We build the function

T L * ( θ ) = 1 2 [ L * ( θ ) L * ( θ 0 ) + | L * ( θ ) L * ( θ 0 ) | ] . (8)

Step 2: Solving algebraic equation

one solves T L * ( θ ) = 0 then,

The solution set is built

S L = { θ I , T L * ( θ ) = 0 } (9)

If S L = { θ } , then

θ is the global minimizer of L;

Otherwise go to the next step.

Step 3: modification of step i

one chooses θ i in S L = { θ I , T L * ( θ ) = 0 and L ( θ i ) < L ( θ i 1 ) } ;

Then

the function T L * ( θ ) = 1 2 [ L * ( θ ) L * ( θ i ) + | L * ( θ ) L * ( θ i ) | ] ,

Then,

one starts again at step 1;

The following section deals with our results.

3. Main Results

In estimating the parameters of the GPD and GEV distribution, the most popular and conditionally most efficient method is the maximum likelihood method. However, this method is biased towards small sample sizes. To overcome this drawback we propose an approach based on a global optimization method so called Alienor method.

3.1. Estimation of Extreme Quantile and Incubation Period

In this subsection, the estimation of extreme quantiles and the incubation period for chronic dioxin exposure are discussed. Let X 1 , X 2 , , X n be a sample of i.i.d random variables with distribution function F and x F a terminal point. Then, for a fixed threshold u < x F , consider the n observations x i ,1 , , x i , n exceeding the threshold u large enough. Let Y j = X i j u , j = 1 , , n , be the excess variables above the threshold u. Recall that the law of excesses beyond a sufficiently high u is:

F u ( y ) = P ( X u y / X > u ) = F ( u + y ) F ( u ) F ¯ ( u ) ; (10)

where F ¯ is the survival function of F given by F ¯ ( u ) = 1 F ( u ) .

Proposition 1. Let F be the common distribution of a sequence of i.i.d random ( X i ) i . Then, for all y , u > 0

F ¯ ( u + y ) = F ¯ ( u ) F ¯ u ( y ) ; (11)

where F ¯ is the GEV distribution of ( X i ) i

Proof. It is easy to see that

F u ( y ) F ¯ ( u ) = F ( u + y ) F ( u ) = 1 + F ( u + y ) + 1 F ( u ) = [ 1 F ( u + y ) ] + [ 1 F ( u ) ] = F ¯ ( u + y ) + F ¯ ( u ) .

Furthermore, it follows that

F ¯ ( u + y ) = F ¯ ( u ) F u ( y ) F ¯ ( u ) = F ¯ ( u ) [ 1 F u ( y ) ] = F ¯ ( u ) F ¯ u ( y ) . (12)

Application 1. By using (3) it comes that

F u ( y ) μ , σ ( u ) , γ ( y ) and F ¯ u ( y ) ¯ μ , σ ( u ) , γ ( y ) . (13)

For u large enough and y 0 . Therefore, an empirical estimator of F ¯ ( u ) is

F ¯ ^ ( u ) = 1 n i = 1 n I { X i > u } = N u n ; (14)

where I { X i > u } is an indicator function defined as

I { X i > u } = { 1 if X i > u for all i = 1 , , n 0 else (15)

and N u is the number of excesses above the threshold u. Notice that the law of F u excesses can be uniformly approximated by a generalized Pareto distribution defined as follows,

x > 0 and 1 + γ x σ > 0 , F u ( x ) = 1 ( 1 + γ x σ ) 1 γ forall γ 0. (16)

By using (11) and by posing F ¯ ( u + y ) = F ¯ ( x ) , and considering that F ¯ u = 1 F u , the form of the tail estimator is

F ¯ ^ ( x ) = N u n ( 1 + γ x σ ) 1 γ . (17)

Since the upper part of the tail is estimated of F ¯ , in addition the restriction of the domain of Fréchet and by inversion, the extreme quantile estimator x p t is obtained by the following Formula (see [8] for a new approach)

x ^ p t G P D = u + σ ^ γ ^ [ ( n N u ( 1 p t ) ) γ ^ 1 ] , for all γ ^ 0. (18)

1 − pt is equivalent to a confidence level of typically 95% or 99%, n is the sample size; N u is the number of observations above the threshold u; γ ^ and σ ^ are the estimators of the parameters of the GPD distribution. We, therefore, estimate the incubation period for the extreme value distribution GPD by:

T ^ G P D = n N d 1 1 γ ^ , σ ^ ( x p t ) , for all γ ^ , σ ^ > 0. (19)

3.2. Estimation of a Joint Probability Density Function

The calibration of the female contamination data shows that the distribution of the observations simultaneously follows the Laplace-Gauss law and the G D law. It is therefore necessary to reflect on the search for a joint law allowing a better appreciation of the phenomena. A joint probability density function based on the composite density function is a new way in modeling of the dependence of a random vector.

Proposition 2. Suppose that X and Y be two random variables from G D ( γ , σ 1 ) and ( m , σ 2 ) , respectively. Under some convergence conditions, the joint random variable U converges almost surely to the law whose probability density function is defined by:

g ( u ) = 1 u e π u for all u > 0. (20)

where u = ( y m σ 2 ) 2 , m the mean and σ 2 the standard deviation of the random variables Y.

Proof. Let X be a GPD random variable whose bi-parametric density [9] is given by

f X ( x ) = 1 σ 1 e ( 1 γ 1 ) ln ( 1 + γ σ 1 x ) forall x 0. (21)

According to the relation (21), we assume that

f X f Y ( y ) = f X [ f Y ( y ) ] ;

where f Y ( y ) = 1 σ 2 2 π e 1 2 ( y m σ 2 ) 2 is probability density function of the normal distribution. It follows that, for all y

f X f Y ( y ) = 1 σ 1 e ( 1 γ 1 ) ln ( 1 + γ σ 1 ( 1 σ 1 σ 2 2 π e 1 2 ( y m σ 2 ) 2 ) ) .

By setting z = y m σ 2 , it comes that

f Z ( z ) = 1 σ 1 ( 1 + γ σ 1 σ 2 2 π e 1 2 z 2 ) 1 + 1 γ .

Let’s set now γ = σ 1 σ 2 2 π , it is easy to find that,

1 + e 1 2 z 2 = 2 e 1 4 z 2 cosh ( 1 4 z 2 ) .

Finally, on obtains, for all z

f Z ( z ) = [ 1 2 σ 1 cosh ( 1 4 z 2 ) e 1 4 z 2 ] 1 + 1 γ p . s e 1 4 ( 1 + 1 γ ) z 2 .

Then, notice that for all z ,

cosh ( 1 4 z 2 ) 1 1 2 σ 1 cosh ( 1 4 z 2 ) 1 2 σ 1 < 1.

By setting ξ = 1 2 ( 1 + 1 γ ) , one has f ( z ) = e 1 2 ξ z 2 . Let’s now, show for which value of ξ , f ( z ) is a probability density function.

+ e 1 2 ξ z 2 d z = 2 0 + e 1 2 ξ z 2 d z . (22)

Let’s perform the following change of variable, by posing

t = 1 2 ξ z 2 z = 2 ξ t d z = d t 2 ξ t .

The relation (22) becomes

2 0 + e t d t 2 ξ t = 2 2 ξ 0 + t 1 2 e t d t = 2 2 ξ Γ ( 1 2 ) = 2 2 ξ π ;

Γ is gamma function. So, f is a probability density function when

ξ = 2 π et f Z ( z ) = e π z 2 .

The proof is completed by substituting u = z 2 , the probability density function of u is obtained by the following

If u > 0 ,

F ( u ) = ( U u ) = ( Z 2 u ) .

F is the cumulative distribution function. Which yields

F ( u ) = ( u Z u ) = u u e π z 2 d z .

Thus,

F ( u ) = 2 0 u e π z 2 d z and F ( u ) = 0 if u < 0.

The derivative of the cumulative distribution function is given by the probability density function as follows

g ( u ) = F ( u ) = F ( u ) u .

Recalling the following result

u ( 0 w ( u ) g ( x ) d x ) = w ( u ) g [ w ( u ) ] .

This leads to the desired result.

The previous proposition is interesting in practice because it allows to obtain the joint probability density resulting from a convex combination of G D and Laplace-Gauss law. This gives us the following statement.

Proposition 3. Let X be a random variable with a joint probability density from GPD and a Gaussian distribution. Then, the expectation and the variance are given respectively

E ( X ) = 1 2 π and V ( X ) = 1 2 π 2 . (23)

Proof. According to the above proposition 2, one has,

g ( x ) = { 1 x e π x if x > 0 0 else (24)

E ( x ) = + x g ( x ) d x = 0 + x × 1 x e π x d x

By setting u = π x , which leads to the following result

E ( x ) = π π 2 0 + u × 1 u e u d u = π π 2 0 + u 3 2 1 e u d u

We finally obtain

E ( x ) = π π 2 Γ ( 3 2 ) = π π 2 × 1 2 × π = 1 2 π

where Γ is the gamma function. Knowing that

V ( X ) = E ( X 2 ) ( E ( X ) ) 2 , (25)

one has

E ( X 2 ) = 0 + x 2 × 1 x e π x d x .

Proceeding as above, one obtains

E ( X 2 ) = π π 3 0 + u 2 × 1 u e u d u = π π 3 0 + u 5 2 1 e u d u

One finally obtains

E ( X 2 ) = π π 3 Γ ( 5 2 ) = π π 3 × 3 2 × 1 2 × π = 3 4 π 2 .

Finally, an elementary calculation allows us to obtain the following result

V ( X ) = 1 2 π 2 .

Remark. The Fisher dispersion index of the resulting density function is less than 1. Such a variation is referred to as underdispersion.

Based on this remark, we propose a density of probability that fits under-dispersed data.

3.3. The Poisson Mixture Law

The occurrence of certain phenomena may follow a Poisson law with parameter λ considered as a random variation. We will show that the Poisson law mixed with the joint law obtained previously gives an over-dispersed law. Since the prevalence of a condition differs from one stratum to another in a population, we arrive at the following definition.

Definition 1. Let X λ be a Poisson random variable of parameter λ which is the realization of a random variable u of continuous density g. We call the resulting law of a Poisson mixture by X λ , the law of a random variable Y of probability density function, for all y ; λ > 0 , θ > 0 , u > 0

p ( y , θ , λ ) = 0 + P ( X λ = y ) g ( u , λ ) d λ , with θ = λ u (26)

where P ( X λ = y ) = λ y y ! e λ and g ( u , λ ) = 1 u e 1 2 λ u is a probability density called a mixing law.

It may be noted that similar work can be found in [10]. The following result may now be formulated.

Proposition 4. Let Y be a positive random variable with the probability mass function p resulting from the Poisson mixing law and joint probability density from GPD and Laplace-Gauss. Then, for any positive constant θ , one has

p ( y , θ , λ ) = Γ ( y + 1 ) y ! λ θ ( 1 + λ 2 θ ) y (27)

with a known parameter λ .

Proof. Indeed, from (26), it is easy to show that

p ( y , θ , λ ) = 1 y ! u 0 + λ y e λ ( 1 + 1 2 u ) d λ . (28)

At this point, observe that by setting

t = λ ( 1 + 1 2 u ) λ = t 1 + 1 2 u ;

we find the probability mass function as follows

p ( y , θ , λ ) = 1 y ! u 0 + ( t 1 + 1 2 u ) y e t d t = 1 y ! u 1 ( 1 + 1 2 u ) y 0 + t y e t d t .

By posing u = λ θ , this leads to the following form

p ( y , θ , λ ) = Γ ( y + 1 ) y ! λ θ ( 1 + λ 2 θ ) y .

3.4. Estimation of the GPD Parameters by Maximum Likelihood Approach

The generalized Pareto distribution (GPD) is a two-parameter family of distributions that can be used to model exceedances over a threshold. Maximum likelihood estimators of the parameters are used, since they are asymptotically normal and asymptotically efficient. Numerical methods are required for maximizing the log-likelihood. The density of GPD function being [11]

f ( x ) = 1 σ ( 1 + γ ( x σ ) ) 1 γ 1 , for x > 0 if γ > 0.

We then deduce the log-likelihood function of the GPD distribution

log L ( x 1 , , x n , γ , σ ) = n ln σ ( 1 γ + 1 ) i = 1 n ln ( 1 + γ x i σ ) .

By deriving the log-likelihood function with respect to each parameter, one obtains the following system

{ n σ + ( 1 γ + 1 ) i = 1 n γ x i σ 2 + σ γ x i = 0 i = 1 n ln ( 1 + γ σ x i ) γ ( γ + 1 ) i = 1 n x i σ + γ x i = 0 (29)

By numerical algorithms, of the GPD function of the R software, one obtains the parameters in Table 1 based on data from dietary intake of dioxins of men exposure, such as (note that data is taken from Table 3)

M a l e = ( 181.8 , 137.7 , 230.7 , 178.2 , 181.5 , 192.6 , 113.1 , 75.6 , 59.4 , 45 , 38.1 )

3.5. Estimation of the GPD Parameters by Alienor’s Method

We now investigate the Alienor approach for efficient estimation of the parameters. Considering the bi-parametric probability density function of GPD law whose analytical expression is given by

f ( x ) = 1 σ ( 1 + γ ( x σ ) ) 1 γ 1 ;

where γ , σ , x are the shape parameter, the scale parameter, and the excess variable, respectively. By section (2.2), one has the steps as follows:

Step 1. Definition of the objective function.

The likelihood function associated with probability density function of GPD gives

Table 1. The parameters MLM.

L ( x 1 , , x n , γ , σ ) = ( 1 σ ) n i = 1 n ( 1 + γ σ x i ) 1 γ 1 .

The objective function noted

L * = L h : [ 0 , 1 ] [ 0 , 1 ]

θ L h ( θ ) .

Knowing that

h ( θ ) = γ ( θ ) = 1 2 ( 1 cos ( m × π θ ) )

h ( θ ) = σ ( θ ) = 1 2 ( 1 cos ( m 2 × π θ ) ) .

Thus,

L * ( θ ) = [ 1 1 2 ( 1 cos ( m 2 × π θ ) ) ] n × [ i = 1 n ( 1 + 1 cos ( m × π θ ) 1 cos ( m 2 × π θ ) x i ) 1 1 cos ( m × π θ ) 1 ] . (30)

We have a maximization problem that we can transform into a minimization problem as follows:

max θ [ 0 , 1 ] L * ( θ ) = min θ [ 0 , 1 ] ( L * ( θ ) ) .

L * : [ 0,1 ] [ 1,0 ]

θ L * ( θ ) .

Step 2. Application of O.P.O*

Let’s arbitrarily choose y 0 = L * ( θ o ) [ 1 , 0 ] and let’s apply O . P . O T L to the point y 0 . By Choosing θ 0 = 1 4 , m = 2 and for n= 8 distribution of contaminant observations (32) above the 70 kg/body weight threshold, by using (30), one obtains

L * ( θ 0 ) = [ 1 1 2 ( 1 cos ( 4 π × 1 4 ) ) ] 8 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 181.8 ] 1 1 cos ( 2 π × 1 4 ) 1 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 137.7 ] 1 1 cos ( 2 π × 1 4 ) 1

× [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 230.7 ] 1 1 cos ( 2 π × 1 4 ) 1 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 178.2 ] 1 1 cos ( 2 π × 1 4 ) 1 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 181.5 ] 1 1 cos ( 2 π × 1 4 ) 1

× [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 192.6 ] 1 1 cos ( 2 π × 1 4 ) 1 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 113.1 ] 1 1 cos ( 2 π × 1 4 ) 1 × [ 1 + 1 cos ( 2 π × 1 4 ) 1 cos ( 4 π × 1 4 ) × 75.6 ] 1 1 cos ( 2 π × 1 4 ) 1

we obtain after the previous calculation, the value of the objective function of the sample

L * ( θ 0 ) = 2.069063 × 10 23 (31)

By applying the operator (8) and (9) of (2.2), one has θ 0 = 7.749788 e 10 , the one that maximizes likelihood, this gives the following estimators (Table 2)

4. Some Technical Results

Using the method of maximum likelihood estimation encounters difficulties including lack of solutions of the likelihood. Moreover, in the case where solutions exist, the numerical resolution of the equations requires an iterative method which is often not convergent. Thus, to solve these difficulties, the concept of OPO is introduced.

Table 2. The parameters obtained by AM.

4.1. Description of the Data

The data were provided by Health Canada, which is an active participant in the international assessments of chemical contaminants in foods conducted by the Joint FAO/WHO Expert Committee on Food Additives. Dioxins are currently on the CODEX priority list for evaluation by JECFA, which is driving our interest in this area of research. The tolerable threshold for dioxins in humans, the recent JECFA evaluations is 70 pg/kg/month and the dioxin intake by age and sex is shown in the table which represents the extreme values observed in males and females. Our study determines the incubation period and the probability of occurrence of the event if we assume that exceeding the tolerable threshold for the organism (70 pg/kg/month) is dangerous for the health. Of course, the application is done on two cohorts. The first investigation points out the evolution of the quantity of the contaminant in the man’s body and the second is based on the woman’s one. Table 3 shows that women of childbearing age between 15 and 45 years old are daily exposed to dioxins between 1.21 and 1.78 pg/kg/day.

Table 3 shows that the newborn of 4 - 6 months is much more exposed to the effects of dioxins followed by those who are 1 - 4 years old. According to Figure 1, some foods contain large amount of dioxin (pg/g body weight), namely butter, poultry, chicken, milk then ground beef, egg, etc. And an excessive

Table 3. Daily and Monthly intake of dioxin.

DDI: is Daily Dioxin Intake (pg/kg bw/day) and MDI: is Monthly Dioxin Intake (pg/kg bw/month). bw: body weight.

Figure 1. Toxin levels in food.

consumption of these foods can be dangerous for your health (the data presented in this Figure 1 are available on request from the authors). The objective of this section is to assess exposure by means of two indicators, the probability that the exposure process exceeds a threshold based on the extreme value theory and the incubation period of the event. The ruin probability is obtained by GPD model.

4.2. Digital Application

One has used the intake of dioxin in the diet, an extreme distribution by age group and gender. Let us point out that the investigation was conducted based on two cohorts, male and female.

4.2.1. Male Gender

Cancers are diseases that are usually detected at the end of a person’s life. However, carcinogenic pathologies (what we consider extreme events) are the body’s response to contamination processes. Our approach is a probabilistic approach in modeling the risks incurred by a population in the exposure to food risks, telling when a person is at risk and developing a pathology following regular consumption of food contaminated by dioxins. We have the dietary intake of dioxin in males from 0 to 65 years of age and over. According to Table 3, we have the following data: (Figure 2)

M a l e = ( 181.8,137.7,230.7,178.2,181.5,192.6,113.1,75.6,59.4,45,38.1 ) (32)

The data on the evolution of the contamination of men show according to the visual diagnosis that from 0 to 100 pg/kg/month the distributions of the observations are stationary and from 100 to 250 pg/kg/month the behavior is Gaussian. This brings us to a normality test to see if the observations converge to a normal distribution in order to generate a sample size of 1000. We adjust our data by normal law and follow Shapiro’s test. The adjustment of the data of our sample by the normal distribution gives 0.91263 representing a high rate of

Figure 2. Evolution of dioxin contamination.

compatibility between our data and the normal distribution and a p-value equal to 0.263. This last value is thus higher than the significance threshold α = 0.05 , so the sample follows a normal distribution. The choice of the threshold is very important because it induces a great variability in the estimation of extreme quantiles and parameters of the law of excesses. There are different approaches for the choice of the threshold of the POT method. Usually, the threshold is determined graphically by the Mean Excess Plot (meplot function) in R software (Figure 3).

According to Figure 4, the threshold at value u = 75. But the threshold 70 recommended by FAO/WHO is used for the estimation of the parameters. It can be seen that the distribution of excesses converges towards a Generalized Pareto Distribution (GPD). Recall that the upward trend in 4 shows heavy tail behavior. In particular, a straight line with a positive gradient above a certain threshold is a sign of Pareto tail behavior. The survival function of the GPD law depends on the following parameters from Table 1. In the simulated sample size 1000, there are 811 observations that exceed the threshold 70.

γ ^ , σ ^ ( x P T ) = 1 ( 1 + 0.3576522 x p t 106.5762338 ) 1 0.3576522 si γ 0.

We used (18) an approach based on the excess method by the approximation of GPD (knowing that N u = 811 and 1-pt is equivalent to 95%)

x ^ p t G P D = 401.9712377

The incubation period obtained using the Formula (19) based on the maximum likelihood method is:

T ^ r G P D = 13.42598873

Using the result in Table 2 based on Alienor approach, we obtain:

T ^ r G P D = 2.12058

We applied the POT approach to male dietary dioxin intake data. We estimated the parameters of the GPD distribution by using the maximum likelihood method, then the extreme quantile and the incubation period. The estimation of

Figure 3. Diagnostic of threshold.

Figure 4. Evolution of dioxin contamination.

the incubation period allows us to conclude that male exposure to dioxins can cause serious damage to the organism approximately 13.4259 months after dioxin contamination, i.e. 13 months 12 days 18 hours after the man is considered at risk and can manifest a pathology with a probability of 0.811 but approximately 2 months based on Alienor approach.

4.2.2. Female Gender

We need to use the dietary intake of dioxin for females from 0 to 65 years of age and older according to Table 3.

F e m a l e = ( 181.8,137.7,230.7,178.2,181.5,192.6,113.1,53.4,45.9,36.3,29.7 ) . (33)

After diagnosis of Figure 4, we notice Pareto shape at the beginning of the histogram and Gaussian behavior thereafter. This has motivated the hypothesis of proposition 2. Figure 5 illustrates the trajectory of the joint probability law, by numerical resolution.

The distribution obtained previously is used in the model. The GPD was fitted by the maximum likelihood method from 469 values exceeding the threshold, among the 1000 simulated values. We can estimate that the ruin probability is 0.469. The estimated parameters are summarized in Table 4.

The adequacy of the data to the GPD law allows the estimation of an extreme quantile by Table 4.

Figure 5. The trajectory of the joined probability density.

Table 4. The parameters of scale and shape.

x ^ p t G P D = 377.4350556

And using Table 4 and the survival function gives.

γ ^ σ ^ ( x p t ) = 0.9367358776

The incubation period of the event obtained after calculus is

T ^ s i m G P D = 15.83842421

Estimating the incubation period allows us to conclude that exposure of women to dioxins can cause serious damage to the body approximately 15.83842421 months after exposure. That is to say that a woman who maintains a steady consumption pattern of dioxin-contaminated foods such as fish, poultry, and butter… can develop pathology in 15 months 25 days 3 hours after infection.

In summary, one obtains that the incubation period for dioxin exposure is defined as follows

Male: 13 months 12 days 18 hours;

Woman: 15 months 25 days 3 hours.

5. Conclusion and Discussions

With this paper, we have contributed to estimating the incubation period following exposure to dioxins, a risk indicator that will allow decision-makers to take measures to reduce the exposure of consumers, and waste incinerators (resulting from respiratory contamination). This paper developed also instead of the existing law a new probability density function for a phenomenon that follows both normal and GPD distributions. In the future, it will be taken into account the different aspects of contamination which are the interaction of contaminants with others once they enter the body and the different compartments in which they are stored.

Acknowledgements

The authors would like to thank Health Canada through Jose Roy, Substance Management Information Line Officer and Cynthia Bainbridge, Unit Head, Risk Management of Persistent Organic Pollutants for the data made available and we would also like to thank the referee for his details comments. His comments have led to a much clearer and better presentation of our results.

Data Availability

The data used to support the findings of this study were supplied by Health Canada.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Lauer, S.A., Grantz, K.H., Bi, Q.F., Jones, F.K., Zheng, Q.L., Meredith, H.R., Azman, A.S., Reich, N.G. and Lessler, J. (2020) The Incubation Period of Coronavirus Disease 2019 (COVID-19) from Publicly Reported Confirmed Cases: Estimation and Application. Annals of Internal Medicine, 172, 577-582.
https://doi.org/10.7326/M20-0504
[2] Tressou, J., Crepet, A., Bertail, P., Feinberg, M.H. and Leblanc, J.Ch. (2004) Probabilistic Exposure Assessment to Food Chemicals Based on Extreme Value Theory. Application to Heavy Metals from Fish and Sea Products. Food and Chemical Toxicology, 42, 1349-1358.
https://doi.org/10.1016/j.fct.2004.03.016
[3] Moncoulon, D. and Quantin, A. (2013) Modélisation des événements extrêmes d’inondation en France métropolitaine. La Houille Blanche, No. 1 , 22-26.
https://doi.org/10.1051/lhb/2013004
[4] Diakarya, B., Blami, K. and Moussa, S. (2012) Spatial Stochastic Framework for Sampling Time Parametric Max-Stable Processes. International Journal of Statistics and Probability, 1, 203.
https://doi.org/10.5539/ijsp.v1n2p203
[5] Nguyen, T.-H., El Outayek, S., Lim, S.H. and Nguyen, V.T.V. (2017) A Systematic Approach to Selecting the Best Probability Models for Annual Maximum Rainfalls—A Case Study Using Data in Ontario (Canada). Journal of Hydrology, 553, 49-58.
https://doi.org/10.1016/j.jhydrol.2017.07.052
[6] Diamoutene, A., Kamsu-Foguem, B., Noureddine, F. and Barro, D. (2018) Prediction of U.S. General Aviation Fatalities from Extreme Value Approach. Transportation Research Part A: Policy and Practice, 109, 65-75.
https://doi.org/10.1016/j.tra.2018.01.022
[7] Konfe, B.O., Cherruault, Y., Some, B. and Benneouala, T. (2005) Alienor Method Applied to Operational Research. Kybernetes, 34, 1211-1222.
https://doi.org/10.1108/03684920510605984
[8] Adedje, K.E. and Barro, D. (2022) A Stochastic Approach to Modeling Food Pattern. International Journal of Mathematics and Mathematical Sciences, 2022, Article ID: 9011873.
https://doi.org/10.1155/2022/9011873
[9] Velázquez, J.C., Van Der Weide, J.A.M, Hernández, E. and Hernández, H.H. (2014) Statistical Modelling of Pitting Corrosion: Extrapolation of the Maximum Pit Depth-Growth. International Journal of Electrochemical Science, 9, 4129-4143.
http://www.electrochemsci.org/papers/vol9/90804129.pdf
[10] Jongbloed, G. and Koole, G. (2014) Managing Uncertainty in Call Centres Using Poisson Mixtures, Applied Stochastic Models in Business and Industry, 17, 307-318.
https://doi.org/10.1002/asmb.444
[11] Barro, D., Diallo, M. and Bagré, R.G. (2016) Spatial Tail Dependence and Survival Stability in a Class of Archimedean Copulas. International Journal of Mathematics and Mathematical Sciences, 2016, Article ID: 8927248.
https://doi.org/10.1155/2016/8927248

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.