Derivation of Gaussian Probability Distribution: A New Approach

Abstract

The famous de Moivre–Laplace limit theorem derives the probability density function of the Gaussian distribution from the binomial probability mass function under specified conditions. The de Moivre–Laplace approach is cumbersome, as it relies heavily on several lemmas and theorems. This paper presents an alternative, less rigorous method of deriving the Gaussian distribution from a basic random experiment, conditional on some assumptions.

Citation: Adeniran, A., Faweya, O., Ogunlade, T. and Balogun, K. (2020) Derivation of Gaussian Probability Distribution: A New Approach. Applied Mathematics, 11, 436-446. doi: 10.4236/am.2020.116031.

1. Introduction

A well-celebrated, fundamental probability distribution for the class of continuous functions is the classical Gaussian distribution, named after the German mathematician Carl Friedrich Gauss in 1809.

Definition 1.1 Let $\mu$ and $\sigma$ be constants with $-\infty < \mu < \infty$ and $\sigma > 0$. The function

$$ f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \quad \text{for } -\infty < x < \infty \tag{1} $$

is called the normal probability density function of a random variable X with parameters μ and σ .

In both theory and applications, the Gaussian distribution is unquestionably the most essential and most widely referenced distribution in statistics.

The well-known method of deriving this distribution first appeared in the second edition of The Doctrine of Chances by Abraham de Moivre, published in 1738 (hence the name de Moivre–Laplace limit theorem) ( [1] [2] [3] [4] [5] ). The mathematical statement of this popular theorem follows.

Theorem 1.1 (de Moivre–Laplace limit theorem) As n grows large ($n \to \infty$), for x in the neighborhood of np and for moderate values of p ($p \not\approx 0$ and $p \not\approx 1$), we can approximate

$$ \binom{n}{x} p^x q^{n-x} \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(x-np)^2}{2npq}}, \quad p+q=1,\; p,q>0. \tag{2} $$

Explicitly, the theorem asserts the following: suppose $n \to +\infty$, and let p and q be probabilities with $p+q=1$. The function

$$ b(x;n,p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0,1,2,\ldots,n \tag{3} $$

called the binomial probability function, converges to the probability density function of the normal distribution as $n \to \infty$, with mean $np$ and standard deviation $\sqrt{np(1-p)}$.
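As a quick numerical illustration (ours, not part of the original paper), the following Python snippet evaluates both sides of (2) at a point near np; the values n = 1000 and p = 0.4 are arbitrary choices:

```python
from math import comb, exp, pi, sqrt

# Arbitrary illustrative parameters (not from the paper)
n, p = 1000, 0.4
q = 1 - p
x = int(n * p)  # a point in the neighborhood of np

exact = comb(n, x) * p**x * q**(n - x)                        # LHS of (2)
approx = exp(-(x - n*p)**2 / (2*n*p*q)) / sqrt(2*pi*n*p*q)    # RHS of (2)

print(exact, approx)  # the two values agree closely for large n
```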

De Moivre proved the result for $p = \tfrac{1}{2}$ ( [6] [7] ). [8] extended and generalized the proof to all values of p (the probability of success in any trial) such that p is neither too small nor too large. Feller's result was expounded by [9]. [10] [11] [12] [13] used the uniqueness property of the moment generating function to prove the same theorem.

In this paper, we attempt to answer the following question: is there any alternative procedure for deriving the Gaussian probability density function apart from the de Moivre–Laplace limit theorem approach, which relies heavily on many lemmas and theorems (the Stirling approximation formula, Maclaurin series expansion, etc.), as evidenced by the work of [8] and [9]?

2. Existing Technique

This section presents a summary proof of the existing de Moivre–Laplace limit theorem. First and foremost, the study states, with proof, the most important lemma of the de Moivre–Laplace limit theorem: the Stirling approximation principle.

Lemma 2.1 (Stirling Approximation Principle) Given an integer $n > 0$, the factorial of a large number n can be replaced with the approximation

$$ n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n $$

Proof 2.1 This lemma can be derived using the integral definition of the factorial,

$$ n! = \Gamma(n+1) = \int_0^{\infty} x^n e^{-x}\, dx \tag{4} $$

Note that the derivative of the logarithm of the integrand can be written

$$ \frac{d}{dx} \ln\left(x^n e^{-x}\right) = \frac{d}{dx}\left(n \ln x - x\right) = \frac{n}{x} - 1 \tag{5} $$

The integrand is sharply peaked, with the important contribution coming only from near $x = n$. Therefore, let $x = n + \delta$ where $\delta \ll n$, and write

$$ \ln\left(x^n e^{-x}\right) = n \ln(n+\delta) - (n+\delta) = n \ln\left[n\left(1+\frac{\delta}{n}\right)\right] - (n+\delta) = n\left[\ln(n) + \ln\left(1+\frac{\delta}{n}\right)\right] - (n+\delta) \tag{6} $$

Recall that the Maclaurin series of $f(x) = \ln(1+x)$ is $x - \frac{1}{2}x^2 + O(x^3)$. Therefore,

$$ \ln\left(x^n e^{-x}\right) \approx n\left[\ln(n) + \frac{\delta}{n} - \frac{1}{2}\frac{\delta^2}{n^2} + \cdots\right] - (n+\delta) = \ln\left(n^n\right) - n - \frac{\delta^2}{2n} + \cdots \tag{7} $$

Taking the exponential on both sides of the preceding Equation (7) gives

$$ x^n e^{-x} \approx e^{\ln\left(n^n\right) - n - \frac{\delta^2}{2n}} = e^{\ln\left(n^n\right)}\, e^{-n}\, e^{-\frac{\delta^2}{2n}} = \left(\frac{n}{e}\right)^n e^{-\frac{\delta^2}{2n}} \tag{8} $$

Plugging (8) into the integral expression for $n!$, that is, (4), and extending the lower limit to $-\infty$ (the integrand is negligible for $\delta < -n$) gives

$$ n! \approx \int_{-\infty}^{\infty}\left(\frac{n}{e}\right)^n e^{-\frac{\delta^2}{2n}}\, d\delta = \left(\frac{n}{e}\right)^n \int_{-\infty}^{\infty} e^{-\frac{\delta^2}{2n}}\, d\delta \tag{9} $$

From (9), let $I = \int_{-\infty}^{\infty} e^{-\frac{\delta^2}{2n}}\, d\delta$ and introduce $\kappa$ as a dummy variable, so that

$$ I^2 = \int_{-\infty}^{\infty} e^{-\frac{\delta^2}{2n}}\, d\delta \times \int_{-\infty}^{\infty} e^{-\frac{\kappa^2}{2n}}\, d\kappa = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2n}\left(\delta^2+\kappa^2\right)}\, d\delta\, d\kappa \tag{10} $$

Transforming $I^2$ from Cartesian to polar coordinates, $\delta = \rho\cos(\theta)$ and $\kappa = \rho\sin(\theta)$, so that $\delta^2 + \kappa^2 = \rho^2$, the Jacobian (J) of the transformation is

$$ J = \begin{vmatrix} \frac{\partial \delta}{\partial \rho} & \frac{\partial \delta}{\partial \theta} \\ \frac{\partial \kappa}{\partial \rho} & \frac{\partial \kappa}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos(\theta) & -\rho\sin(\theta) \\ \sin(\theta) & \rho\cos(\theta) \end{vmatrix} = \rho \tag{11} $$

Hence,

$$ I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-\frac{\rho^2}{2n}}\, |J|\, d\rho\, d\theta = \int_0^{2\pi}\int_0^{\infty} e^{-\frac{\rho^2}{2n}}\, \rho\, d\rho\, d\theta = n\int_0^{2\pi} \left[-e^{-u}\right]_0^{\infty} d\theta = n\int_0^{2\pi} d\theta = 2\pi n \tag{12} $$

where the substitution $u = \frac{\rho^2}{2n}$ (so that $\rho\, d\rho = n\, du$) evaluates the inner integral.

Therefore, $I = \sqrt{I^2} = \sqrt{2\pi n}$. Substituting for I in (9) gives

$$ n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \tag{13} $$
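Before proceeding, a short numerical check of (13) may be helpful (our illustration, not part of the original lemma; the chosen values of n are arbitrary):

```python
from math import e, factorial, pi, sqrt

# Ratio of n! to the Stirling approximation; it tends to 1 as n grows
for n in (5, 10, 50, 100):
    stirling = sqrt(2 * pi * n) * (n / e)**n
    print(n, factorial(n) / stirling)
# e.g. the ratio is about 1.017 at n = 5 and about 1.0008 at n = 100
```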

We now present the proof of Theorem 1.1 using the popular existing technique.

Proof 2.2 Using the result of Lemma 2.1, Equation (3) can be rewritten as

$$ f(x;n,p) \approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^n}{\sqrt{2\pi x}\left(\frac{x}{e}\right)^x \sqrt{2\pi(n-x)}\left(\frac{n-x}{e}\right)^{n-x}}\, p^x(1-p)^{n-x} = \frac{1}{\sqrt{2\pi}}\, \frac{n^{n+\frac{1}{2}}}{x^{x+\frac{1}{2}}(n-x)^{n-x+\frac{1}{2}}}\, p^x(1-p)^{n-x} \tag{14} $$

Multiplying both the numerator and the denominator of Equation (14) by $n^{-n-\frac{1}{2}}$ gives

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n}} \left(\frac{x}{n}\right)^{-x-\frac{1}{2}} \left(\frac{n-x}{n}\right)^{-n+x-\frac{1}{2}} p^x(1-p)^{n-x} \tag{15} $$

Since x is in the neighborhood of np, change variables to $x = np + \varepsilon$, where $\varepsilon$ measures the distance from the mean np of the binomial to the measured quantity x. Rewrite (15) in terms of $\varepsilon$ and simplify as follows:

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n}} \left(\frac{np+\varepsilon}{n}\right)^{-x-\frac{1}{2}} \left(\frac{n-np-\varepsilon}{n}\right)^{-n+x-\frac{1}{2}} p^x(1-p)^{n-x} = \frac{1}{\sqrt{2\pi n}} \left[p\left(1+\frac{\varepsilon}{np}\right)\right]^{-x-\frac{1}{2}} \left[(1-p)\left(1-\frac{\varepsilon}{n(1-p)}\right)\right]^{-n+x-\frac{1}{2}} p^x(1-p)^{n-x} = \frac{1}{\sqrt{2\pi n}} \left(1+\frac{\varepsilon}{np}\right)^{-x-\frac{1}{2}} \left(1-\frac{\varepsilon}{n(1-p)}\right)^{-n+x-\frac{1}{2}} p^{-\frac{1}{2}}(1-p)^{-\frac{1}{2}} $$

to get

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n p(1-p)}} \left(1+\frac{\varepsilon}{np}\right)^{-x-\frac{1}{2}} \left(1-\frac{\varepsilon}{n(1-p)}\right)^{-n+x-\frac{1}{2}} \tag{16} $$

Note that $x = \exp(\ln x)$. Therefore, rewriting (16) in exponential form, we have

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[\ln\left\{\left(1+\frac{\varepsilon}{np}\right)^{-x-\frac{1}{2}} \left(1-\frac{\varepsilon}{n(1-p)}\right)^{-n+x-\frac{1}{2}}\right\}\right] = \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[-\left(x+\frac{1}{2}\right)\ln\left(1+\frac{\varepsilon}{np}\right) + \left(-n+x-\frac{1}{2}\right)\ln\left(1-\frac{\varepsilon}{n(1-p)}\right)\right] $$

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[-\left(np+\varepsilon+\frac{1}{2}\right)\ln\left(1+\frac{\varepsilon}{np}\right) - \left(n(1-p)-\varepsilon+\frac{1}{2}\right)\ln\left(1-\frac{\varepsilon}{n(1-p)}\right)\right] \tag{17} $$

Suppose $f(x) = \ln(1+x)$; by Maclaurin series, $f(x) = x - \frac{1}{2}x^2 + O(x^3)$, and similarly $\ln(1-x) = -x - \frac{1}{2}x^2 + O(x^3)$. So $\ln\left(1+\frac{\varepsilon}{np}\right) \approx \frac{\varepsilon}{np} - \frac{1}{2}\left(\frac{\varepsilon}{np}\right)^2$ and $\ln\left(1-\frac{\varepsilon}{n(1-p)}\right) \approx -\frac{\varepsilon}{n(1-p)} - \frac{1}{2}\left(\frac{\varepsilon}{n(1-p)}\right)^2$. As a result,

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[-\varepsilon + \frac{\varepsilon^2}{2np} - \frac{\varepsilon^2}{np} + \varepsilon + \frac{\varepsilon^2}{2n(1-p)} - \frac{\varepsilon^2}{n(1-p)}\right] = \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[-\frac{1}{2}\left(\frac{\varepsilon^2}{np(1-p)}\right)\right] \tag{18} $$

Recall that $x = np + \varepsilon$, which implies that $\varepsilon^2 = (x-np)^2$. From the binomial distribution, $np = \mu$ and $np(1-p) = \sigma^2$, which implies that $\sqrt{np(1-p)} = \sigma$. Making the appropriate substitutions in Equation (18) yields

$$ f(x;n,p) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right] = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \quad \text{for } -\infty < x < \infty \tag{19} $$

This confirms the theorem.
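To see the convergence asserted by Theorem 1.1 numerically, the following sketch (ours; the value p = 0.3 and the choices of n are arbitrary) computes the largest absolute gap between the binomial pmf (3) and the normal density (19) over all x:

```python
from math import comb, exp, pi, sqrt

def max_abs_gap(n, p):
    """Largest |binomial pmf - normal density| over x = 0, ..., n."""
    mu, var = n * p, n * p * (1 - p)
    return max(
        abs(comb(n, x) * p**x * (1 - p)**(n - x)
            - exp(-(x - mu)**2 / (2 * var)) / sqrt(2 * pi * var))
        for x in range(n + 1)
    )

for n in (10, 100, 1000):
    print(n, max_abs_gap(n, 0.3))  # the gap shrinks as n grows
```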

We recommend that readers interested in a detailed proof of the theorem consult the study expounded by [9].

3. The Proposed Technique

Suppose a random experiment of throwing a needle or any other dart-like object at the origin of the Cartesian plane is performed, with the aim of hitting the centre (see Figure 1).

Due to human inconsistency and lack of perfection, the throws vary and generate random errors. To make the derivation possible and less rigorous, we make the following assumptions:

1) The errors are independent of the orientation of the coordinate system.

2) Errors in perpendicular directions are independent. This means that being too high doesn’t alter the probability of being off to the right.

Figure 1. The possible results of the dart experiment.

3) Small errors are more likely than large errors. That is, throws are more likely to land in region P than in either Q or R, since region P is closer to the target (origin). Similarly, for the same reason, region Q is more likely than region R. Furthermore, there is a higher tendency of hitting region V than either S or T, since V has the larger surface area and the distances from the origin are approximately the same.

From Figure 2, let the probability of the needle falling in the vertical strip from x to $x + \Delta x$ be denoted $p(x)\Delta x$. Similarly, let the probability of the needle falling in the horizontal strip from y to $y + \Delta y$ be $p(y)\Delta y$. Obviously, the function cannot be constant, due to the stochastic nature of the experiment. In this study, our interest is to obtain the form and characteristics of the function $p(x)$. From the second assumption, the probability of the needle falling in the shaded region ABCD (see Figure 2) is

$$ p(x)\Delta x \cdot p(y)\Delta y $$

Note that any region r units from the origin with area $\Delta x \Delta y$ has the same probability, which is a consequence of the assumption that errors do not depend on the orientation. We can therefore write

$$ p(x)\Delta x \cdot p(y)\Delta y = p(x)p(y)\Delta x \Delta y = g(r)\Delta x \Delta y \tag{20} $$

where

$$ g(r) = p(x)p(y) \tag{21} $$

From the fundamental rules of calculus, differentiating both sides of Equation (21) with respect to $\theta$ (using the product rule) gives

$$ 0 = p(x)\frac{d}{d\theta}p(y) + p(y)\frac{d}{d\theta}p(x) \tag{22} $$

Here, $\frac{d}{d\theta}g(r) = 0$ since $g(\cdot)$ is independent of the orientation. By transformation to polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$, we can rewrite the derivatives in Equation (22) as

Figure 2. A typical example of the experiment.

$$ 0 = p(x)\frac{d}{d\theta}p(r\sin\theta) + p(y)\frac{d}{d\theta}p(r\cos\theta) \tag{23} $$

Using the chain rule of differentiation, (23) becomes

$$ 0 = p(x)p'(y)\, r\cos\theta - p(y)p'(x)\, r\sin\theta \tag{24} $$

Rewriting Equation (24) by replacing $r\cos\theta$ with x and $r\sin\theta$ with y yields

$$ 0 = p(x)p'(y)\, x - p(y)p'(x)\, y \tag{25} $$
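As a symbolic sanity check (our sketch, using the SymPy library, which the paper itself does not use), differentiating $g(r) = p(r\cos\theta)\, p(r\sin\theta)$ with respect to $\theta$ reproduces the structure of Equations (24) and (25):

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
p = sp.Function('p')

# g(r) = p(x) p(y) with x = r*cos(theta), y = r*sin(theta)
g = p(r * sp.cos(theta)) * p(r * sp.sin(theta))

# The result is p(x)*p'(y)*r*cos(theta) - p(y)*p'(x)*r*sin(theta)
# (printed in SymPy's Derivative/Subs notation), i.e. Equation (24)
print(sp.diff(g, theta))
```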

Equation (25) can be put in a form that can be solved using the separation of variables technique:

$$ \frac{p'(x)}{p(x)\,x} = \frac{p'(y)}{p(y)\,y} \tag{26} $$

This differential equation can hold for all x and y, with x and y independent, if and only if the ratio defined by (26) is a constant. That is, if

$$ \frac{p'(x)}{p(x)\,x} = \frac{p'(y)}{p(y)\,y} = c. \tag{27} $$

Consider $\frac{p'(x)}{p(x)\,x} = c$ in (27) and rearrange to obtain

$$ \frac{p'(x)}{p(x)} = c\,x. \tag{28} $$

Integrating Equation (28) gives

$$ \ln p(x) = \frac{c x^2}{2} + k_1 \quad \text{so that} \quad p(x) = k e^{\frac{c x^2}{2}}; \quad \text{where } k = e^{k_1}. \tag{29} $$
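Equation (28) is a first-order separable ODE, and the solution (29) can be confirmed symbolically; a minimal sketch using SymPy (our addition):

```python
import sympy as sp

x, c = sp.symbols('x c')
p = sp.Function('p')

# Solve p'(x) = c*x*p(x), which is Equation (28) rearranged
sol = sp.dsolve(sp.Eq(p(x).diff(x), c * x * p(x)), p(x))
print(sol)  # Eq(p(x), C1*exp(c*x**2/2)) -- the form obtained in (29)
```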

By the third assumption, the constant must be negative, so we write the probability function (29) as

$$ p(x) = k e^{-\frac{c}{2}x^2}; \quad \text{where } c \in \mathbb{R}^+ \tag{30} $$

If there is a horizontal shift of the target from the origin to an arbitrary point $\mu$, which now marks the new centre/target, then the probability function in (30) becomes

$$ p(x) = k e^{-\frac{c}{2}(x-\mu)^2} \tag{31} $$

Differentiating (31) and setting the derivative equal to zero gives

$$ p'(x) = -ck(x-\mu)\, e^{-\frac{c(x-\mu)^2}{2}} = 0 \tag{32} $$

Since $e^{-\frac{c(x-\mu)^2}{2}} \neq 0$, this implies $x = \mu$. Therefore, Equation (31) has its maximum value at $x = \mu$ and points of inflexion at $x = \mu \pm \frac{1}{\sqrt{c}}$, as the symbolic sketch below confirms.
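A symbolic check of the stationary point and the points of inflexion (our sketch, again assuming SymPy; k, c > 0):

```python
import sympy as sp

x, mu = sp.symbols('x mu')
k, c = sp.symbols('k c', positive=True)

p = k * sp.exp(-c * (x - mu)**2 / 2)   # Equation (31)

print(sp.solve(sp.Eq(p.diff(x), 0), x))     # [mu]: the maximum
print(sp.solve(sp.Eq(p.diff(x, 2), 0), x))  # inflexion at mu +/- 1/sqrt(c)
```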

Clearly, (31) has given us the basic form of the Gaussian distribution, with constants k and c, and with the domain of X running from $-\infty$ to $\infty$. Therefore, for Equation (31) to be regarded as a proper probability density function, the total area under the curve must be 1. That is,

$$ \int_{-\infty}^{\infty} k e^{-\frac{c}{2}(x-\mu)^2}\, dx = 1 \tag{33} $$

For a function $f(x)$ symmetric about $x = \mu$, $\int_{-\infty}^{\infty} f(x)\, dx = 2\int_{\mu}^{\infty} f(x)\, dx$. Applying this property to Equation (33) yields

$$ \int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx = \frac{1}{2k}. \tag{34} $$

Squaring both sides of (34) gives

$$ \int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx \int_{\mu}^{\infty} e^{-\frac{c}{2}(y-\mu)^2}\, dy = \frac{1}{2k} \times \frac{1}{2k} \tag{35} $$

This is possible since x and y are just dummy variables. Recall that x and y are also independent, so we can write the product on the LHS of (35) as a double integral to produce

$$ \int_{\mu}^{\infty}\int_{\mu}^{\infty} e^{-\frac{c}{2}\left[(x-\mu)^2+(y-\mu)^2\right]}\, dx\, dy = \frac{1}{4k^2}. \tag{36} $$

Putting $z = x - \mu \Rightarrow dx = dz$ and $w = y - \mu \Rightarrow dy = dw$ in the preceding Equation (36) gives

$$ \int_0^{\infty}\int_0^{\infty} e^{-\frac{c}{2}\left(z^2+w^2\right)}\, dz\, dw = \frac{1}{4k^2}. \tag{37} $$

The double integral in (37) can be evaluated using polar coordinates $z = r\cos\theta$ and $w = r\sin\theta$, with the Jacobian (J) of the transformation

$$ J = \left|\frac{\partial(z,w)}{\partial(r,\theta)}\right| = \begin{vmatrix} \frac{\partial z}{\partial r} & \frac{\partial z}{\partial \theta} \\ \frac{\partial w}{\partial r} & \frac{\partial w}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r, \tag{38} $$

and

$$ z^2 + w^2 = (r\cos\theta)^2 + (r\sin\theta)^2 = r^2. \tag{39} $$

So, Equation (37) now becomes

$$ \int_0^{\frac{\pi}{2}}\int_0^{\infty} e^{-\frac{c r^2}{2}}\, |J|\, dr\, d\theta = \int_0^{\frac{\pi}{2}}\int_0^{\infty} e^{-\frac{c r^2}{2}}\, r\, dr\, d\theta = \frac{1}{4k^2}. \tag{40} $$

Evaluating the double integral in Equation (40) by first letting $u = \frac{c r^2}{2}$, so that the inner integral equals $\frac{1}{c}$ and the left-hand side equals $\frac{\pi}{2c}$, and then solving for k in the resulting equation yields

$$ k = \sqrt{\frac{c}{2\pi}} \tag{41} $$
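A numerical spot-check of (41) (our sketch; the values of c and $\mu$ are arbitrary, and SciPy's quadrature routine is assumed to be available):

```python
from math import exp, pi, sqrt
from scipy.integrate import quad

c, mu = 2.5, 1.0                # arbitrary test values
k = sqrt(c / (2 * pi))          # the constant from Equation (41)

total, _ = quad(lambda x: k * exp(-c * (x - mu)**2 / 2),
                float('-inf'), float('inf'))
print(total)  # ~1.0: with this k, p(x) integrates to one
```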

Putting (41) in (31), the probability density function, p ( x ) , becomes

$$ p(x) = \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^2} \tag{42} $$

Again, the integral of a probability density function over its domain equals 1. Therefore, from (42),

$$ \int_{-\infty}^{\infty} p(x)\, dx = \int_{-\infty}^{\infty} \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^2}\, dx = 2\int_{\mu}^{\infty} \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^2}\, dx = 1 \tag{43} $$

Further simplification of the preceding Equation (43) gives

$$ \int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx = \sqrt{\frac{\pi}{2c}} \tag{44} $$

One of the important goals in the mathematical theory of statistics is to obtain the mean and variance of any probability function under study. The mean, $\mu$, is defined as the value of the integral $\int_{-\infty}^{\infty} x\, p(x)\, dx$. The variance, $\sigma^2$, is the value of the integral $\int_{-\infty}^{\infty} (x-\mu)^2 p(x)\, dx$. Therefore, using Equation (42),

$$ \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^2}\, dx = 2\int_{\mu}^{\infty} (x-\mu)^2 \sqrt{\frac{c}{2\pi}}\, e^{-\frac{c}{2}(x-\mu)^2}\, dx \tag{45} $$

or equivalently as

$$ \sigma^2 = 2\sqrt{\frac{c}{2\pi}} \int_{\mu}^{\infty} (x-\mu)\left[(x-\mu)\, e^{-\frac{c}{2}(x-\mu)^2}\right] dx \tag{46} $$

Consider Equation (46) and use integration by parts ($\int u\, dv = uv - \int v\, du$) with $u = x-\mu \Rightarrow du = dx$ and $dv = (x-\mu)\, e^{-\frac{c}{2}(x-\mu)^2} dx \Rightarrow v = -\frac{1}{c} e^{-\frac{c}{2}(x-\mu)^2}$. We have

$$ \sigma^2 = 2\sqrt{\frac{c}{2\pi}} \left[-\frac{x-\mu}{c}\, e^{-\frac{c}{2}(x-\mu)^2} \Big|_{\mu}^{\infty} + \frac{1}{c}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx\right] = 2\sqrt{\frac{c}{2\pi}} \left[-\frac{1}{c}\lim_{n\to\infty}(n-\mu)\, e^{-\frac{c}{2}(n-\mu)^2} + \frac{1}{c}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx\right] = 2\sqrt{\frac{c}{2\pi}} \left[0 + \frac{1}{c}\int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx\right] = \frac{1}{c}\, 2\sqrt{\frac{c}{2\pi}} \int_{\mu}^{\infty} e^{-\frac{c}{2}(x-\mu)^2}\, dx $$

Putting (44) in the preceding equation gives

$$ \sigma^2 = \frac{1}{c} \times 2\sqrt{\frac{c}{2\pi}} \times \sqrt{\frac{\pi}{2c}} = \frac{1}{c} \quad \Rightarrow \quad c = \frac{1}{\sigma^2} \tag{47} $$
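Equation (47) can be spot-checked numerically as well (our sketch; the value c = 4 is arbitrary, so the variance should come out as 1/c = 0.25):

```python
from math import exp, pi, sqrt
from scipy.integrate import quad

c, mu = 4.0, 0.0                # arbitrary test values
p = lambda x: sqrt(c / (2 * pi)) * exp(-c * (x - mu)**2 / 2)

var, _ = quad(lambda x: (x - mu)**2 * p(x), float('-inf'), float('inf'))
print(var)  # ~0.25, agreeing with sigma^2 = 1/c
```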

Substituting (47) into (42), the derived probability density function has the form

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty. \tag{48} $$

Based on the three basic assumptions stated above, we have easily derived Equation (48), widely known as the normal or Gaussian distribution function, with mean $\mu$ and standard deviation $\sigma$.
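Finally, as an independent check (ours, not in the paper), the closed form (48) can be compared against a standard library implementation of the normal density; here we assume SciPy and arbitrary parameter values:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.5            # arbitrary parameters
xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)

# Equation (48) evaluated directly
derived = np.exp(-0.5 * ((xs - mu) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

print(np.allclose(derived, norm.pdf(xs, loc=mu, scale=sigma)))  # True
```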

To verify that Equation (48) is a proper probability density function with parameters $\mu$ and $\sigma$, we must show that the integral

$$ I = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx $$

is equal to 1.

Change variables of integration by letting z = x μ σ , which implies that d x = σ d z . Then

$$ I = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, \sigma\, dz = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} e^{-\frac{z^2}{2}}\, dz = \sqrt{\frac{2}{\pi}} \int_0^{\infty} e^{-\frac{z^2}{2}}\, dz $$

so that

$$ I^2 = \left[\sqrt{\frac{2}{\pi}} \int_0^{\infty} e^{-\frac{x^2}{2}}\, dx\right] \left[\sqrt{\frac{2}{\pi}} \int_0^{\infty} e^{-\frac{y^2}{2}}\, dy\right] = \frac{2}{\pi} \int_0^{\infty}\int_0^{\infty} e^{-\frac{x^2+y^2}{2}}\, dx\, dy $$

Here x, y are dummy variables. Switching to polar coordinates with the substitutions $x = r\cos\theta$, $y = r\sin\theta$ produces r as the Jacobian of the transformation. So

$$ I^2 = \frac{2}{\pi} \int_0^{\frac{\pi}{2}}\int_0^{\infty} e^{-\frac{r^2}{2}}\, r\, dr\, d\theta $$

Put $a = \frac{r^2}{2} \Rightarrow dr = \frac{da}{r}$. Therefore,

$$ I^2 = \frac{2}{\pi} \int_0^{\frac{\pi}{2}}\int_0^{\infty} e^{-a}\, r\, \frac{da}{r}\, d\theta = \frac{2}{\pi} \int_0^{\frac{\pi}{2}} \left[-e^{-a}\right]_0^{\infty} d\theta = \frac{2}{\pi} \int_0^{\frac{\pi}{2}} d\theta = 1 $$

Thus $I = 1$, indicating that (48) is a proper probability density function. Other properties of the distribution, such as moments, the moment generating function, the cumulant generating function, the characteristic function, and parameter estimation, can be found in [14] [15] [16].

4. Conclusion

While working towards the outlined objective, we were able to establish that there exists an approach that not only serves as an alternative derivation of the Gaussian probability density function but is also free from rigorous mathematical analysis and independent of auxiliary lemmas and theorems. This paper can be classified as a theoretical study of the Gaussian distribution and can serve as an excellent teaching reference in probability and statistics classes, where basic calculus, skill with algebraic expressions, the Maclaurin series expansion and the Euler integral of the second kind (the gamma function) are the only background requirements.

Acknowledgements

The authors are highly grateful to the editor and the anonymous referees for reading through the manuscript and for the constructive comments and suggestions that helped improve the revised version of the paper.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511802256
[2] Blume, J.D. and Royall, R.M. (2003) Illustrating the Law of Large Numbers (And Confidence Intervals). The American Statistician, 57, 51-57.
https://doi.org/10.1198/0003130031081
[3] Lesigne, E. (2005) Heads or Tails: An Introduction to Limit Theorems in Probability. Volume 28 of Student Mathematical Library, American Mathematical Society, Providence.
https://doi.org/10.1090/stml/028
[4] Walck, C. (2007) Handbook on Statistical Distributions for Experimentalists. Particle Physics Group, Fysikum University of Stockholm, Stockholm.
[5] Proschan, M.A. (2008) The Normal Approximation to the Binomial. The American Statistician, 62, 62-63.
https://doi.org/10.1198/000313008X267848
[6] Shao, J. (1999) Mathematical Statistics. Springer Texts in Statistics, Springer Verlag, New York.
[7] Soong, T.T. (2004) Fundamentals of Probability and Statistics for Engineers. John Wiley & Sons Ltd., Chichester.
[8] Feller, W. (1973) An Introduction to Probability Theory and Its Applications. Volume 1, Third Edition, John Wiley and Sons, Hoboken.
[9] Adeniran, A.T., Ojo, J.F. and Olilima, J.O. (2018) A Note on the Asymptotic Convergence of Bernoulli Distribution. Research & Reviews: Journal of Statistics and Mathematical Sciences, 4, 19-32.
[10] Inlow, M. (2010) A Moment Generating Function Proof of the Lindeberg-Lévy Central Limit Theorem. The American Statistician, 64, 228-230.
https://doi.org/10.1198/tast.2010.09159
[11] Bagui, S.C., Bhaumik, D.K. and Mehra, K.L. (2013) A Few Counter Examples Useful in Teaching Central Limit Theorem. The American Statistician, 67, 49-56.
https://doi.org/10.1080/00031305.2012.755361
[12] Bagui, S.C., Bagui, S.S. and Hemasinha, R. (2013) Non-Rigorous Proofs of Stirling's Formula. Mathematics and Computer Education, 47, 115-125.
[13] Bagui, S.C. and Mehra, K.L. (2016) Convergence of Binomial, Poisson, Negative-Binomial, and Gamma to Normal Distribution: Moment Generating Functions Technique. American Journal of Mathematics and Statistics, 6, 115-121.
[14] Casella, G. and Berger, R.L. (2002) Statistical Inference. Second Edition, Duxbury Thomson Learning: Integre Technical Publishing Co., Albuquerque.
[15] Young, G.A. and Smith, R.L. (2005) Essentials of Statistical Inference. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511755392
[16] Podgorski, K. (2009) Lecture Notes on Statistical Inference. Department of Mathematics and Statistics, University of Limerick, Limerick.
