Probability Laws Derived from the Gamma Function

Abstract

Several densities, or probability laws, of continuous random variables derive from the Euler Gamma function. These laws form the basis of sampling theory, in particular hypothesis testing and estimation. The gamma, beta, and Student laws, together with the chi-square law and the normal law, are all distributions that result from applications of the Euler functions.

Share and Cite:

Toure, L. and Conde, S. (2024) Probability Laws Derived from the Gamma Function. Open Journal of Statistics, 14, 106-118. doi: 10.4236/ojs.2024.141005.

1. Introduction

The Euler functions have contributed to, and simplified the derivation of, important results in statistics, especially in the theory of sampling distributions. In this paper we first study some properties of the Gamma and Beta functions. In the second part we study several probability laws (probability distributions) derived from the Gamma function. In the last part we apply them to the probability laws of certain random variables drawn from normal populations.

2. Gamma Function

2.1. Definition

The Gamma function Γ is defined in [1] by:

$$\Gamma(n) = \int_0^{\infty} e^{-t}\, t^{n-1}\, dt \tag{1}$$

This is an improper integral of the third kind, which converges if $n > 0$ and diverges if $n \le 0$.

Fundamental Relationship

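$$\Gamma(n+1) = n\,\Gamma(n).$$

Indeed, integrating by parts,

$$\Gamma(n+1) = \int_0^{\infty} e^{-t}\, t^{n}\, dt = \left[-e^{-t}\, t^{n}\right]_0^{\infty} + n \int_0^{\infty} e^{-t}\, t^{n-1}\, dt,$$

and the bracketed term $\left[-e^{-t}\, t^{n}\right]_0^{\infty}$ is zero because $e^{-t} t^{n}$ vanishes both at $t = 0$ and as $t \to \infty$.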

We have

$$\Gamma(1) = \int_0^{\infty} e^{-t}\, dt = 1.$$

Therefore, for every positive integer $n$:

$$\Gamma(n+1) = n\,\Gamma(n) = n(n-1)\,\Gamma(n-1) = \cdots = n!\,\Gamma(1) = n!$$

$$\Gamma(n+1) = n! \tag{2}$$
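Equation (2) can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule and a truncation of the infinite range at an arbitrary upper limit; the function name `gamma_integral` and its parameters are illustrative and not part of the paper.

```python
import math

def gamma_integral(n, upper=60.0, steps=200_000):
    """Approximate Gamma(n) = integral_0^inf e^{-t} t^{n-1} dt with the midpoint rule.

    The infinite range is truncated at `upper`; both `upper` and `steps`
    are illustrative choices, adequate for moderate n >= 1.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h          # midpoint of the k-th subinterval
        total += math.exp(-t) * t ** (n - 1)
    return total * h

# Compare n!, the numerical integral for Gamma(n + 1), and the library value.
for n in range(1, 6):
    approx = gamma_integral(n + 1)
    print(n, math.factorial(n), round(approx, 6), round(math.gamma(n + 1), 6))
```

The three columns agree to the displayed precision, which is what the recurrence $\Gamma(n+1) = n\,\Gamma(n)$ with $\Gamma(1) = 1$ predicts.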

2.2. Asymptotic Formula for Γ(n)

If n is large, computing $\Gamma(n)$ directly becomes impractical. A useful result in such a case is provided by the relation

$$\Gamma(n+1) = \sqrt{2\pi n}\; n^{n}\, e^{-n}\, e^{\theta / (12(n+1))} \tag{3}$$

for $0 < \theta < 1$.

For practical applications, the last factor, which is very close to 1 for large n, can be omitted:

$$\Gamma(n+1) \approx \sqrt{2\pi n}\; n^{n}\, e^{-n}$$
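To see how quickly the approximation without the last factor becomes accurate, one can compare it with the library value of the Gamma function. This is a small sketch; the function name `stirling` and the chosen values of n are illustrative assumptions.

```python
import math

def stirling(n):
    """Leading-order approximation sqrt(2*pi*n) * n**n * exp(-n) to Gamma(n + 1)."""
    return math.sqrt(2 * math.pi * n) * float(n) ** n * math.exp(-n)

# Relative error of the approximation for increasing n.
for n in (1, 5, 10, 50, 100):
    exact = math.gamma(n + 1)
    rel_err = (exact - stirling(n)) / exact
    print(f"n={n:3d}  relative error={rel_err:.5f}")
```

The relative error shrinks roughly like $1/(12n)$, so the omitted factor matters less and less as n grows.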

2.3. Beta Function

2.3.1. Definition

$$B(p, q) = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} \tag{4}$$

2.3.2. Integral Expression of the Beta Function

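With the substitution $t = u^2$:

$$\Gamma(p) = \int_0^{\infty} e^{-t}\, t^{p-1}\, dt = 2 \int_0^{\infty} e^{-u^2} u^{2p-1}\, du.$$

Therefore:

$$\Gamma(p)\,\Gamma(q) = 4 \int_0^{\infty} e^{-u^2} u^{2p-1}\, du \int_0^{\infty} e^{-v^2} v^{2q-1}\, dv = 4 \int_0^{\infty}\!\!\int_0^{\infty} e^{-(u^2+v^2)}\, u^{2p-1}\, v^{2q-1}\, du\, dv.$$

In polar coordinates, with $u = \rho\cos\theta$ and $v = \rho\sin\theta$:

$$\begin{aligned}
\Gamma(p)\,\Gamma(q) &= 4 \int_{\rho=0}^{\infty}\int_{\theta=0}^{\pi/2} e^{-\rho^2} \rho^{2p-1+2q-1} (\cos\theta)^{2p-1} (\sin\theta)^{2q-1}\, \rho\, d\rho\, d\theta \\
&= 4 \int_{\rho=0}^{\infty} e^{-\rho^2} \rho^{2(p+q)-1}\, d\rho \int_{\theta=0}^{\pi/2} (\cos\theta)^{2p-1} (\sin\theta)^{2q-1}\, d\theta \\
&= 2\,\Gamma(p+q) \int_{\theta=0}^{\pi/2} (\cos\theta)^{2p-1} (\sin\theta)^{2q-1}\, d\theta
\end{aligned}$$

so

$$B(p, q) = 2 \int_{0}^{\pi/2} (\cos\theta)^{2p-1} (\sin\theta)^{2q-1}\, d\theta \tag{5}$$

In particular

$$B(1/2, 1/2) = \frac{[\Gamma(1/2)]^2}{\Gamma(1)} = [\Gamma(1/2)]^2 = 2 \int_0^{\pi/2} d\theta = \pi$$

$$\Gamma(1/2) = \sqrt{\pi} \tag{6}$$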

Substituting $t = \cos^2\theta$ in Equation (5), we find:

$$B(p, q) = \int_0^1 t^{p-1} (1-t)^{q-1}\, dt \tag{7}$$
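The agreement between the definition (4), the trigonometric form (5) and the special value (6) can be illustrated numerically. The sketch below assumes a midpoint rule; the function names `beta_gamma` and `beta_trig` and the test values p = 2.5, q = 3 are illustrative choices, not taken from the paper.

```python
import math

def beta_gamma(p, q):
    """B(p, q) from the Gamma-function definition, Equation (4)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def beta_trig(p, q, steps=100_000):
    """B(p, q) from the trigonometric integral, Equation (5), by the midpoint rule."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        th = (k + 0.5) * h
        total += math.cos(th) ** (2 * p - 1) * math.sin(th) ** (2 * q - 1)
    return 2 * total * h

print(beta_gamma(2.5, 3.0), beta_trig(2.5, 3.0))   # the two values should agree closely
print(math.gamma(0.5) ** 2, math.pi)               # Equation (6): Gamma(1/2)^2 = pi
```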

3. Random Variable

In this section we define random variables and some of their characteristics.

A random variable is any function that assigns a numerical value to each possible outcome.

Probability Density Function

Following [2] and [3], we describe the behavior of a continuous random variable X by specifying its probability density function f, which satisfies $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{+\infty} f(x)\, dx = 1$.

Remember that it is only meaningful to talk about the probability that a continuous random variable X lies in an interval. It is always the case that $P(X = x) = 0$ for every possible value x.

The probability that the value of X lies in an interval is the area under the density over that interval:

$P(X \le b) = \int_{-\infty}^{b} f(x)\, dx$ = area under the density function to the left of $x = b$.

$P(a \le X \le b) = \int_{a}^{b} f(x)\, dx$ = area under the density function between $x = a$ and $x = b$.

A probability density of the continuous random variable X is summarized by its:

mean: $\mu = \int_{-\infty}^{+\infty} x\, f(x)\, dx$.

variance: $\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx$.

4. Some Probability Laws Derived from the Gamma Function

Distributions derived from the Gamma law are distributions that arise from the gamma law through transformations or combinations with other distributions. These distributions are used in various fields such as failure time modeling in engineering, survival analysis, econometrics and other applications where positive continuous random variables are involved.

4.1. Gamma Distribution

This distribution plays an important role in statistics.

4.1.1. Definition

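Following [3] and [4], a random variable X is said to follow the gamma distribution with parameters $\alpha > 0$ and $\theta > 0$ if its density is

$$f(x) = \frac{\theta^{\alpha}}{\Gamma(\alpha)}\, e^{-\theta x}\, x^{\alpha-1} \quad \text{for } 0 < x < \infty \tag{8}$$

This function is a density because, by the definition of $\Gamma(\alpha)$ (see Equation (1)) and the substitution $y = \theta x$, $\int_0^{\infty} f(x)\, dx = 1$.

By definition, $E(X) = \int_0^{\infty} x\, f(x)\, dx$.

With the same substitution $y = \theta x$ we have

$$E(X) = \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{-\theta x} x^{\alpha}\, dx = \frac{1}{\theta\,\Gamma(\alpha)} \int_0^{\infty} e^{-y} y^{\alpha}\, dy = \frac{\Gamma(\alpha+1)}{\theta\,\Gamma(\alpha)} = \frac{\alpha}{\theta}$$

$$V(X) = E(X^2) - E^2(X).$$

We have

$$V(X) = \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{-\theta x} x^{\alpha+1}\, dx - \left(\frac{\alpha}{\theta}\right)^2 = \frac{1}{\theta^2\,\Gamma(\alpha)} \int_0^{\infty} e^{-y} y^{\alpha+1}\, dy - \left(\frac{\alpha}{\theta}\right)^2,$$

hence

$$V(X) = \frac{\Gamma(\alpha+2)}{\theta^2\,\Gamma(\alpha)} - \left(\frac{\alpha}{\theta}\right)^2 = \frac{(\alpha+1)\,\Gamma(\alpha+1)}{\theta^2\,\Gamma(\alpha)} - \left(\frac{\alpha}{\theta}\right)^2 = \frac{1}{\theta^2}\left(\alpha(\alpha+1) - \alpha^2\right) = \frac{\alpha}{\theta^2}.$$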
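The closed forms $E(X) = \alpha/\theta$ and $V(X) = \alpha/\theta^2$ can be checked by integrating the density (8) numerically. This is a minimal sketch assuming a midpoint rule on a truncated range; the names `gamma_pdf` and `moment` and the values α = 3, θ = 2 are illustrative assumptions.

```python
import math

def gamma_pdf(x, alpha, theta):
    """Gamma density of Equation (8), with theta multiplying x in the exponent."""
    return theta ** alpha / math.gamma(alpha) * math.exp(-theta * x) * x ** (alpha - 1)

def moment(k, alpha, theta, upper=80.0, steps=400_000):
    """k-th raw moment by the midpoint rule on [0, upper] (the truncation is an assumption)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * gamma_pdf((i + 0.5) * h, alpha, theta)
                   for i in range(steps))

alpha, theta = 3.0, 2.0                     # illustrative parameter values
mean = moment(1, alpha, theta)
var = moment(2, alpha, theta) - mean ** 2
print(mean, alpha / theta)                  # both close to 1.5
print(var, alpha / theta ** 2)              # both close to 0.75
```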

4.1.2. Application

Let us study the law of the variable $Y = \theta X$. Let $G(y)$ be the distribution function of the variable Y.

By definition:

$$G(y) = P(Y < y) = P(\theta X < y) = P\!\left(X < \frac{y}{\theta}\right) = F\!\left(\frac{y}{\theta}\right),$$

where F is the distribution function of the variable X. The probability density $g(y)$ of Y is obtained by differentiation:

$$g(y) = \frac{1}{\theta}\, f\!\left(\frac{y}{\theta}\right) = \frac{\theta^{\alpha-1}}{\Gamma(\alpha)}\, e^{-y} \left(\frac{y}{\theta}\right)^{\alpha-1} = \frac{1}{\Gamma(\alpha)}\, e^{-y}\, y^{\alpha-1}, \quad y > 0$$

4.2. Remarks

If the parameter θ is equal to 1, we write $\Gamma(\alpha, 1)$ or simply $\Gamma(\alpha)$; if $\theta \ne 1$, the random variable $Y = \theta X$ follows $\Gamma(\alpha)$.

The density of a $\Gamma(\alpha)$ law is:

$$f(x) = \frac{1}{\Gamma(\alpha)}\, e^{-x}\, x^{\alpha-1}, \quad x > 0 \tag{9}$$

If $\alpha = 1$, the gamma law $\Gamma(1, \theta)$ is called an exponential law with parameter θ.

4.3. The Beta Distribution

4.3.1. Type I Beta Distribution

The type I beta distribution is the distribution of a random variable X taking values in $0 < x < 1$, depending on two parameters n and p, whose density is:

$$f(x) = \frac{1}{B(n, p)}\, x^{n-1} (1-x)^{p-1}, \quad n, p > 0, \tag{10}$$

where $B(n, p) = \dfrac{\Gamma(n)\,\Gamma(p)}{\Gamma(n+p)}$, so that

$$f(x) = \frac{\Gamma(n+p)}{\Gamma(n)\,\Gamma(p)}\, x^{n-1} (1-x)^{p-1} \tag{11}$$

As for the previous distribution, we find:

$$E(X) = \frac{n}{n+p} \quad \text{and} \quad V(X) = \frac{np}{(n+p+1)(n+p)^2}$$

4.3.2. The Type II Beta Distribution

Let X be a random variable following a type I beta distribution with parameters $(n, p)$. Then, by definition, $Y = X/(1-X)$ follows a type II beta distribution, whose density is easily obtained by a change of variable:

$$f(y) = \frac{1}{B(n, p)}\, \frac{y^{n-1}}{(1+y)^{n+p}}, \quad y > 0 \tag{12}$$

From the properties of the function Γ we easily deduce the moments of Y:

$$E(Y) = \frac{1}{B(n, p)} \int_0^{+\infty} \frac{y^{n}}{(1+y)^{n+p}}\, dy = \frac{B(n+1, p-1)}{B(n, p)} = \frac{n}{p-1}, \quad \text{for } p > 1$$

$$E(Y^2) = \frac{1}{B(n, p)} \int_0^{+\infty} \frac{y^{n+1}}{(1+y)^{n+p}}\, dy = \frac{B(n+2, p-2)}{B(n, p)} = \frac{n(n+1)}{(p-1)(p-2)}, \quad \text{for } p > 2.$$
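The change of variable behind Equation (12) can be written out as follows. This is a sketch of the standard argument; the inverse map $x = y/(1+y)$ and the subscripts $f_X$, $f_Y$ are introduced here for illustration and are not notation from the paper.

```latex
\begin{aligned}
x &= \frac{y}{1+y}, \qquad 1 - x = \frac{1}{1+y}, \qquad \frac{dx}{dy} = \frac{1}{(1+y)^2}, \\[4pt]
f_Y(y) &= f_X\!\left(\frac{y}{1+y}\right)\frac{dx}{dy}
        = \frac{1}{B(n,p)} \left(\frac{y}{1+y}\right)^{n-1} \left(\frac{1}{1+y}\right)^{p-1} \frac{1}{(1+y)^2} \\[4pt]
       &= \frac{1}{B(n,p)}\, \frac{y^{\,n-1}}{(1+y)^{(n-1)+(p-1)+2}}
        = \frac{1}{B(n,p)}\, \frac{y^{\,n-1}}{(1+y)^{n+p}}, \qquad y > 0.
\end{aligned}
```

The same substitution, read backwards, turns the moment integrals for $E(Y)$ and $E(Y^2)$ into Beta integrals of the form (7), which is where the ratios $B(n+1, p-1)/B(n, p)$ and $B(n+2, p-2)/B(n, p)$ come from.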

4.4. The Normal Distribution

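One of the most important continuous probability distributions is the normal distribution (also called the normal curve or Gaussian distribution), defined by the equation:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad x \in \mathbb{R}, \tag{13}$$

where μ = mean, σ = standard deviation, π = 3.14159…, e = 2.71828…. We have $f(x) > 0$ for all x and $\int_{-\infty}^{+\infty} f(x)\, dx = 1$.

Indeed, with the change of variable

$$Z = \frac{X - \mu}{\sigma},$$

$$\int_{-\infty}^{+\infty} f(x)\, dx = \int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}\, dz.$$

Putting $t = z^2/2$ and using Equation (6), we have:

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}\, dz = \frac{2}{\sqrt{2\pi}} \int_0^{+\infty} e^{-\frac{1}{2} z^2}\, dz = \frac{1}{\sqrt{\pi}} \int_0^{+\infty} t^{-1/2} e^{-t}\, dt = \frac{\Gamma(1/2)}{\sqrt{\pi}} = 1.$$

THEOREM 1. If $X \sim N(\mu, \sigma)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.

The probability density of the variable Z is $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}$. Since this density is symmetric about 0, $E(Z) = 0$, and we now show that the variance of Z is equal to 1. Putting $t = z^2/2$, so that $z\, dz = dt$:

$$\mathrm{Var}(Z) = \int_{-\infty}^{+\infty} z^2\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}\, dz = \frac{2}{\sqrt{2\pi}} \int_0^{+\infty} z^2\, e^{-\frac{1}{2} z^2}\, dz = \frac{2}{\sqrt{\pi}} \int_0^{+\infty} t^{1/2} e^{-t}\, dt = \frac{2}{\sqrt{\pi}}\, \Gamma\!\left(\frac{3}{2}\right) = \frac{2}{\sqrt{\pi}} \cdot \frac{1}{2}\, \Gamma\!\left(\frac{1}{2}\right)$$

Since $\Gamma(1/2) = \sqrt{\pi}$ (see Equation (6)), $V(Z) = 1$.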

4.5. Chi-Square Distribution

When $\theta = 1/2$ and $\alpha = \nu/2$, with ν a positive integer, the Gamma distribution becomes the chi-square distribution, which is used in statistical tests and confidence interval estimates. It can also be defined as the distribution of a sum of squared independent standard normal deviates; the number of degrees of freedom of the distribution equals the number of standard normal deviates being summed.

We say that X follows a chi-square distribution with ν degrees of freedom, denoted $\chi^2_{\nu}$, if the probability density function of X is:

$$f(x) = \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)}\, e^{-x/2}\, x^{\nu/2 - 1} \quad \text{for } x > 0 \tag{14}$$

The chi-square distribution is thus a particular Gamma distribution, obtained with $\theta = 1/2$ and $\alpha = \nu/2$ in Equation (8). Therefore:

$$E(X) = \int_0^{\infty} x\, f(x)\, dx = \frac{\alpha}{\theta} = \nu \quad \text{and} \quad V(X) = \int_0^{\infty} x^2 f(x)\, dx - E^2(X) = \frac{\alpha}{\theta^2} = 2\nu.$$
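The description of the chi-square law as a sum of squared standard normal deviates can be illustrated by simulation. The following is a sketch under stated assumptions: the seed, the sample size and ν = 4 are arbitrary choices, and `chi_square_sample` is a hypothetical helper name.

```python
import random
import statistics

def chi_square_sample(nu, rng):
    """One draw of a chi-square variate: the sum of nu squared N(0, 1) deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(12345)          # fixed seed, chosen arbitrarily for reproducibility
nu = 4                              # illustrative number of degrees of freedom
draws = [chi_square_sample(nu, rng) for _ in range(20_000)]

print(statistics.fmean(draws))      # close to nu = 4
print(statistics.variance(draws))   # close to 2 * nu = 8
```

The sample mean and variance fluctuate around ν and 2ν, in line with $E(X) = \nu$ and $V(X) = 2\nu$.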

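4.6. The Fisher F Distribution

This law is related to the ratio of two independent quadratic forms. Suppose that $\chi_1^2$ and $\chi_2^2$ are independently distributed chi-square variables with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.

We define

$$F = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} \tag{15}$$

Its density function is:

$$g(f) = \frac{1}{B\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} f^{\frac{\nu_1 - 2}{2}} \left(1 + \frac{\nu_1}{\nu_2}\, f\right)^{-\frac{\nu_1 + \nu_2}{2}} \quad \text{for } f > 0 \tag{16}$$

$$E(F) = \frac{\nu_2}{\nu_2 - 2} \quad \text{for } \nu_2 > 2 \tag{17}$$

and

$$\mathrm{Var}(F) = \frac{2\nu_2^2\, (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)} \quad \text{for } \nu_2 > 4 \tag{18}$$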
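Equation (17) can be recovered from the Gamma density (8). The following is a sketch of the standard argument, which relies on the independence of the two chi-square variables and on the condition $\nu_2 > 2$; the intermediate quantity $E(1/\chi_2^2)$ is not computed in the paper and is introduced here for illustration.

```latex
\begin{aligned}
E\!\left(\frac{1}{\chi_2^2}\right)
  &= \int_0^{\infty} \frac{1}{x}\, \frac{\theta^{\alpha}}{\Gamma(\alpha)}\, e^{-\theta x} x^{\alpha-1}\, dx
   = \frac{\theta\, \Gamma(\alpha-1)}{\Gamma(\alpha)}
   = \frac{\theta}{\alpha - 1}
   = \frac{1}{\nu_2 - 2}
   \qquad \left(\theta = \tfrac{1}{2},\ \alpha = \tfrac{\nu_2}{2}\right), \\[4pt]
E(F)
  &= \frac{\nu_2}{\nu_1}\, E\!\left(\chi_1^2\right) E\!\left(\frac{1}{\chi_2^2}\right)
   = \frac{\nu_2}{\nu_1} \cdot \nu_1 \cdot \frac{1}{\nu_2 - 2}
   = \frac{\nu_2}{\nu_2 - 2}.
\end{aligned}
```

Note that the mean does not depend on $\nu_1$, and that the integral for $E(1/\chi_2^2)$ diverges when $\nu_2 \le 2$, which explains the restriction attached to (17).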

4.7. Student ‘t’ Distribution

Another distribution of considerable practical importance is that of the ratio of a normally distributed variate to the square root of an independently distributed chi-square variate.

More precisely, if X is normally distributed with mean μ and variance $\sigma^2$, if U has the chi-square distribution with ν degrees of freedom, and if X and U are independently distributed, we seek the distribution of the ratio t defined in Equation (22). Student's t-distribution has the probability density function

$$f(t) = \frac{1}{\sqrt{\nu\pi}}\, \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\, \frac{1}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}}, \quad \text{for } -\infty < t < +\infty \tag{19}$$

$$E(t) = 0 \quad \text{for } \nu > 1 \tag{20}$$

$$\mathrm{Var}(t) = \frac{\nu}{\nu - 2} \quad \text{for } \nu > 2 \tag{21}$$

$$t = \frac{(X - \mu)/\sigma}{\sqrt{U/\nu}} \tag{22}$$
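The density (19) and the variance (21) can be checked numerically. This is a minimal sketch assuming a midpoint rule on a truncated interval; the helper names `student_pdf` and `integrate`, the value ν = 5 and the truncation limit are illustrative assumptions.

```python
import math

def student_pdf(t, nu):
    """Student t density of Equation (19) with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def integrate(g, a, b, steps=200_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

nu = 5                                   # illustrative degrees of freedom
L = 200.0                                # truncation of the infinite range (an assumption)
total = integrate(lambda t: student_pdf(t, nu), -L, L)
var = integrate(lambda t: t * t * student_pdf(t, nu), -L, L)
print(total)                             # close to 1
print(var, nu / (nu - 2))                # both close to 5/3
```

Because the tails decay only polynomially, the truncation limit has to be fairly large for the variance integral; for ν close to 2 the numerical check becomes unreliable, which mirrors the condition $\nu > 2$ in (21).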

5. The Sampling Distribution

5.1. Distribution of Sample Variance

The variance $s^2$ of a sample is also a random variable; see [5] and [6]. It is given by the formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \tag{23}$$

We also know that, for a sample of size n drawn from a normal population of mean μ and variance $\sigma^2$, the quantity

$$\sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$$

follows the chi-square law with n degrees of freedom. Any sum of squares of random variables is associated with a number of degrees of freedom. Thus, the sum $\sum_{i=1}^{n} (x_i - \mu)^2$ has n degrees of freedom, but $\sum_{i=1}^{n} (x_i - \bar{x})^2$ has only $(n - 1)$ degrees of freedom. Hence the ratio

$$\chi^2 = \frac{(n-1)\, s^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}$$

follows a $\chi^2$ law with $(n - 1)$ degrees of freedom.

5.2. The Distribution of the Quotient of Two Variances

Consider two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. We take two independent samples of sizes $n_1$ and $n_2$ from them, with sample variances $s_1^2$ and $s_2^2$. We know that each ratio $(n_i - 1)\, s_i^2 / \sigma_i^2$ follows the chi-square law with $(n_i - 1)$ degrees of freedom. Therefore

$$F = \frac{(n_1 - 1)\, s_1^2 / \left((n_1 - 1)\, \sigma_1^2\right)}{(n_2 - 1)\, s_2^2 / \left((n_2 - 1)\, \sigma_2^2\right)} = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

is distributed as F with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.

5.3. Distribution of the Quantity $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$

Let $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ be a normal variate with mean 0 and variance 1.

Let $U = \dfrac{(n-1)\, s^2}{\sigma^2}$ be a chi-square variable with $(n - 1)$ degrees of freedom, and let U and Z be independent. Then the random variable

$$t = \frac{Z \sqrt{n - 1}}{\sqrt{U}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

is distributed as Student's t with $(n - 1)$ degrees of freedom.

6. Case Studies

As noted in the introduction, distributions derived from the gamma function are very important tools in the theory of statistical tests.

Our first example relates to the life expectancy of men and women in Guinea; Student's t-test indicates that, in general, women live longer than men in Guinea.

The second concerns the improvement of safety conditions in a company after certain measures were introduced; the test shows that safety has improved.

The third concerns the health conditions of children under 5 years old after a change in political regime.

The last example concerns the distribution of single-member deputies by gender and administrative region in Guinea in 2013. We want to know whether there is a dependency relationship between region and the gender of the elected deputies.

Example 1

The following example concerns the life expectancy of 25 people including 11 males and 14 females (2016 Guinea Statistical Yearbook).

male: 60.70 59.50 62.15 63.14 60.29 60.48 60.64 61.77 60.93 61.05 59.40.

female: 59.78 64.55 63.45 62.78 59.64 61.59 61.89 62.51 62.67 62.82 63.88 63.07 63.17 62.40.

We want to test the hypothesis that men and women have the same life expectancy. To do this, we compare the mean life expectancy of men (n = 11) with that of women (n = 14). As shown in Figure 1, the normality test confirms that life expectancy follows a normal distribution in both samples. Since the conditions of Student's t-test are met, we can use the SPSS mean-comparison procedure to compare the averages with the t-test for independent samples.

For each sample, SPSS gives the main descriptive parameters: the number of subjects, the average, the variance and the standard deviation (Figure 2).

Figure 1. Test of normality.

Figure 2. Independent samples test table.

The output also gives the difference between the average life expectancies of the two groups, −1.52922, which shows that the female subjects have a higher average life expectancy than the male subjects.

For a significance level α = 0.05, the test of equality of variances gives a P-value of 0.608, which is above the threshold, so we do not reject the assumption of equal variances. For the same threshold, the test of equality of means gives a P-value of 0.006, which is below the significance level. We conclude that the relationship between life expectancy and gender is significant.

We therefore conclude that in Guinea women live longer than men.
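The mean difference reported by SPSS can be reproduced directly from the data of Example 1. The sketch below assumes the equal-variance (pooled) form of the two-sample t statistic, consistent with the variance-equality test above; the function name `pooled_t` is illustrative, and the comparison value 2.069 is the two-sided 5% critical value for 23 degrees of freedom from standard t tables, not a figure taken from the paper.

```python
import math
import statistics

male = [60.70, 59.50, 62.15, 63.14, 60.29, 60.48, 60.64, 61.77, 60.93, 61.05, 59.40]
female = [59.78, 64.55, 63.45, 62.78, 59.64, 61.59, 61.89, 62.51,
          62.67, 62.82, 63.88, 63.07, 63.17, 62.40]

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    # Pooled variance: weighted average of the two sample variances (n - 1 divisors).
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return diff, diff / se, nx + ny - 2

diff, t_obs, df = pooled_t(male, female)
print(f"mean difference = {diff:.5f}")   # -1.52922, the value reported by SPSS
print(f"t = {t_obs:.3f}, df = {df}")
```

The absolute value of the statistic exceeds 2.069, which agrees with the small P-value reported for the test of equality of means.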

Example 2

The following are the average weekly losses of worker hours due to accidents in 10 industrial plants before and after a certain safety program was put into operation:

Before: 45 73 46 124 33 57 83 34 26 17.

After: 36 60 44 119 35 51 77 29 24 11.

Use the 0.05 level of significance to test whether the safety program is effective.

We cannot apply the independent samples test because the before and after weekly losses of worker hours in the same industrial plant are correlated.

The two observations from each plant form a natural pair, so we work with the differences D = before − after.

• Null hypothesis: $\mu_D = 0$,

Alternative hypothesis: $\mu_D > 0$.

• Level of significance: α = 0.05.

• Criterion: Reject the null hypothesis if $t_{ob} > 1.833$, the value of $t_{0.05}$ for 10 − 1 = 9 degrees of freedom, where $t_{ob} = \dfrac{\bar{D} - 0}{S_D/\sqrt{n}}$ and $\bar{D}$ and $S_D$ are the mean and the standard deviation of the differences.

• Calculations: the differences are

9 13 2 5 −2 6 6 5 2 6

Their mean is $\bar{d} = 5.2$ and their standard deviation is $s_D = 4.08$, so that $t_{ob} = \dfrac{5.2 - 0}{4.08/\sqrt{10}} = 4.03$.

• Decision: Since $t_{ob} = 4.03$ exceeds 1.833, the null hypothesis must be rejected at level α = 0.05. We conclude that the industrial safety program is effective.
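The calculations of Example 2 can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name `paired_t` and its `mu0` parameter are illustrative and not part of the paper.

```python
import math
import statistics

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

def paired_t(x, y, mu0=0.0):
    """Paired t statistic for H0: mean(x - y) = mu0, with n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)            # sample standard deviation (n - 1 divisor)
    return d_bar, s_d, (d_bar - mu0) / (s_d / math.sqrt(n)), n - 1

d_bar, s_d, t_ob, df = paired_t(before, after)
print(f"mean = {d_bar:.1f}, sd = {s_d:.2f}, t = {t_ob:.2f}, df = {df}")
# prints: mean = 5.2, sd = 4.08, t = 4.03, df = 9
```

The same function can be applied to any other pair of before and after series of equal length, such as the one in Example 3.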

Example 3

The following data represent the number (in hundreds) of children under five considered chronically malnourished, according to the natural areas in Guinea, before and after the change of political regime.

We want to know whether the sanitary conditions changed after the change of political regime.

Before: 26.7 21.0 31.1 43.1 34.5 34.6 31.7 40.0

After: 28.1 14.6 30.7 31.9 30.5 36.9 40.8 37.9

(2016 Guinea Statistical Yearbook)

• Null hypothesis: $\mu_D = 0$,

Alternative hypothesis: $\mu_D > 0$.

• Level of significance: α = 0.05.

• Criterion: Reject the null hypothesis if $t_{ob} > 1.895$, the value of $t_{0.05}$ for 8 − 1 = 7 degrees of freedom, where $t_{ob} = \dfrac{\bar{D} - 0}{S_D/\sqrt{n}}$ and $\bar{D}$ and $S_D$ are the mean and the standard deviation of the differences.

• Calculations: the differences (before − after) are

−1.4 6.4 0.4 11.2 4.0 −2.3 −9.1 2.1

Their mean is $\bar{d} = 1.41$ and their standard deviation is $s_D = 6.11$, so that $t_{ob} = \dfrac{1.41 - 0}{6.11/\sqrt{8}} = 0.654$.

• Decision: Since $t_{ob} = 0.654$ does not exceed 1.895, the null hypothesis is not rejected at level α = 0.05. We conclude that the data do not show an improvement in health conditions after the change of political regime.

Example 4

The following table shows the distribution of single-member deputies by sex and administrative region in 2013 in Guinea (2016 Guinea Statistical Yearbook).

In this example, the null hypothesis states that there is no association between sex and region.

• Null hypothesis: sex and region are independent.

Alternative hypothesis: sex and region are dependent.

• Level of significance: α = 0.05.

• Criterion: Reject the null hypothesis if $\chi^2 > 14.067$, the value of $\chi^2_{0.05,\,7}$ for (2 − 1)(8 − 1) = 7 degrees of freedom, where $\chi^2 = \sum_{i}\sum_{j} \dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$, with $o_{ij}$ the observed and $e_{ij}$ the expected cell frequencies.

• Calculations: the expected cell frequencies are $e_{11} = \dfrac{23 \times 86}{165} = 11.9879$, $e_{12} = \dfrac{19 \times 86}{165} = 9.9030$, and similarly $e_{13} = 11.4667$, $e_{14} = 11.4667$, $e_{15} = 10.9455$, $e_{16} = 9.3818$, $e_{17} = 8.8606$, $e_{18} = \dfrac{23 \times 86}{165} = 11.9879$, $e_{21} = 11.0121$, $e_{22} = 9.0970$, $e_{23} = 10.5333$, $e_{24} = 10.5333$, $e_{25} = 10.0545$, $e_{26} = 8.6182$, $e_{27} = 8.1394$ and $e_{28} = 11.0121$.

From these values we compute $\chi^2$ with the formula given in the criterion.

• Decision: Since $\chi^2 = 12.959 < 14.067$, the null hypothesis is not rejected at level α = 0.05.

We conclude that there is no evidence of an association between sex and region.

The SPSS software gives the same result.

SPSS reports a P-value of 0.073 for a bilateral test and 0.146 for a unilateral test. We do not reject the null hypothesis when the P-value is at least α, and we reject it otherwise. Since both values exceed α = 0.05, we do not reject the null hypothesis.
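The expected cell frequencies listed in Example 4 follow the usual rule for a test of independence: row total times column total divided by the grand total. The sketch below assumes marginal totals inferred from the expected frequencies reported in the text (the observed table itself is not reproduced), so the lists `row_totals` and `col_totals` are assumptions rather than data quoted from the source.

```python
# Marginal totals inferred from the expected frequencies reported in the text
# (the observed table is not reproduced here), so treat them as assumptions.
row_totals = [86, 79]
col_totals = [23, 19, 22, 22, 21, 18, 17, 23]
n = sum(row_totals)                      # grand total, 165

# Expected frequency under independence: e_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 4) for e in row])

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print("degrees of freedom:", df)         # 7
```

Given the observed counts, the statistic would then be the sum of (observed − expected)² / expected over the 16 cells.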

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Piskonov, N. (1980) Differential and Integral Calculus. Mir, Moscow.
[2] Graybill, F.A. and Mood, A.M. (1963) Introduction to the Theory of Statistics. 2nd Edition, McGraw-Hill Book Company, New York.
[3] Freud, M. (2005) Probability and Statistics for Engineers. 7th Edition, House of Electronics Industry, Beijing.
[4] Saporta, G. (2006) Probabilités, Analyse des Données et Statistique. 2nd Edition, Technip, Paris.
[5] Bertrand, F. and Bertrand, M.M. (2011) Statistique en 80 fiches pour les Scientifiques. Dunod, Paris.
[6] Avenel, M. and Riffault, F.J. (2005) Mathématiques Appliquées à la Gestion. Sup'Foucher, Vanves.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.