Derivation of Gaussian Probability Distribution: A New Approach
1. Introduction
A celebrated and fundamental probability distribution for continuous random variables is the classical Gaussian distribution, named after the German mathematician Carl Friedrich Gauss, who presented it in 1809.
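Definition 1.1 Let $\mu$ and $\sigma$ be constants with $-\infty < \mu < \infty$ and $\sigma > 0$. The function
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty \tag{1}$$
is called the normal probability density function of a random variable X with parameters $\mu$ and $\sigma$.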
In both theory and application, the Gaussian distribution is arguably the most essential and most widely referenced distribution in statistics.
The well-known method of deriving this distribution first appeared in the second edition of the Doctrine of Chances by Abraham de Moivre, published in 1738 (hence the name de Moivre’s Laplace limit theorem) ([1] [2] [3] [4] [5]). The mathematical statement of de Moivre’s theorem follows.
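Theorem 1.1 (de Moivre’s Laplace limit theorem) As n grows large ($n \to \infty$), for x in the neighborhood of np, for moderate values of p ($p \neq 0$ and $p \neq 1$), we can approximate
$$\binom{n}{x} p^{x} q^{n-x} \simeq \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(x-np)^{2}}{2npq}}, \qquad p + q = 1,\; p, q > 0. \tag{2}$$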
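Explicitly, the theorem asserts the following. Suppose $n$ is a positive integer, and let p and q be probabilities with $p + q = 1$. The function
$$b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, \ldots, n, \tag{3}$$
called the binomial probability function, converges to the probability density function of the normal distribution as $n \to \infty$, with mean np and standard deviation $\sqrt{npq}$.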
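As an informal numerical illustration of Theorem 1.1, the following Python sketch compares the binomial probability function with the normal density having mean np and standard deviation $\sqrt{npq}$. The function names (`binomial_pmf`, `normal_pdf`, `max_abs_gap`) and the choice p = 0.3 are assumptions made for this example only; the sketch is a sanity check, not a proof.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability C(n, x) p^x q^(n-x), computed via logarithms for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
        + x * math.log(p) + (n - x) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def max_abs_gap(n: int, p: float) -> float:
    """Largest absolute difference between the two functions over x = 0..n."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return max(
        abs(binomial_pmf(x, n, p) - normal_pdf(x, mu, sigma))
        for x in range(n + 1)
    )


# The gap should shrink as n grows (p = 0.3 is an arbitrary moderate value).
for n in (10, 100, 1000):
    print(n, max_abs_gap(n, 0.3))
```

The logarithmic form is used only to avoid overflow and underflow for large n; it evaluates the same quantity as the binomial formula.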
Although de Moivre proved the result only for $p = \frac{1}{2}$ ([6] [7]), Feller [8] extended and generalized the proof to all values of p (the probability of success in any trial) such that p is neither too small nor too large. Feller's result was expounded by [9]. The authors of [10] [11] [12] [13] used the uniqueness property of the moment generating function to prove the same theorem.
In this paper, we attempt to answer the question: is there an alternative procedure for deriving the Gaussian probability density function, apart from the de Moivre’s Laplace limit theorem approach, which relies heavily on many lemmas and theorems (the Stirling approximation formula, Maclaurin series expansion, etc.), as evidenced by the work of [8] and [9]?
2. Existing Technique
This section presents a summary proof of the existing de Moivre’s Laplace limit theorem. First, we state, with proof, the most important lemma used in that theorem, the Stirling approximation principle.
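Lemma 2.1 (Stirling Approximation Principle) Given an integer $n > 0$, the factorial of a large number n can be replaced with the approximation
$$n! \approx \sqrt{2\pi n}\; n^{n} e^{-n}.$$
Proof 2.1 This lemma can be derived using the integral definition of the factorial,
$$n! = \Gamma(n+1) = \int_{0}^{\infty} x^{n} e^{-x}\, dx. \tag{4}$$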
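Note that the derivative of the logarithm of the integrand can be written
$$\frac{d}{dx} \ln\left(x^{n} e^{-x}\right) = \frac{d}{dx}\left(n \ln x - x\right) = \frac{n}{x} - 1. \tag{5}$$
The integrand is sharply peaked, with the contribution important only near $x = n$. Therefore, let $x = n + \xi$ where $|\xi| \ll n$, and write
$$\ln\left(x^{n} e^{-x}\right) = n \ln(n + \xi) - (n + \xi) = n \ln n + n \ln\left(1 + \frac{\xi}{n}\right) - n - \xi. \tag{6}$$
Recall that the Maclaurin series of $\ln(1 + x)$ is $x - \frac{x^{2}}{2} + \frac{x^{3}}{3} - \cdots$. Therefore,
$$\ln\left(x^{n} e^{-x}\right) = n \ln n + n\left(\frac{\xi}{n} - \frac{\xi^{2}}{2n^{2}} + \cdots\right) - n - \xi = n \ln n - n - \frac{\xi^{2}}{2n} + \cdots. \tag{7}$$
Taking the exponential on both sides of the preceding Equation (7) gives
$$x^{n} e^{-x} \approx e^{n \ln n}\, e^{-n}\, e^{-\frac{\xi^{2}}{2n}} = n^{n} e^{-n} e^{-\frac{\xi^{2}}{2n}}. \tag{8}$$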
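Plugging (8) into the integral expression for $n!$, that is, (4), gives
$$n! \approx \int_{-n}^{\infty} n^{n} e^{-n} e^{-\frac{\xi^{2}}{2n}}\, d\xi \approx n^{n} e^{-n} \int_{-\infty}^{\infty} e^{-\frac{\xi^{2}}{2n}}\, d\xi. \tag{9}$$
From (9), let $I = \int_{-\infty}^{\infty} e^{-\frac{\xi^{2}}{2n}}\, d\xi$ and, considering $\xi$ and $\eta$ as dummy variables, write
$$I^{2} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{\xi^{2} + \eta^{2}}{2n}}\, d\xi\, d\eta. \tag{10}$$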
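Transforming $I^{2}$ from Cartesian to polar coordinates yields $\xi = r\cos\theta$, $\eta = r\sin\theta$, which implies $\xi^{2} + \eta^{2} = r^{2}$, with Jacobian (J) of the transformation
$$J = \begin{vmatrix} \dfrac{\partial \xi}{\partial r} & \dfrac{\partial \xi}{\partial \theta} \\[2mm] \dfrac{\partial \eta}{\partial r} & \dfrac{\partial \eta}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r. \tag{11}$$
Hence,
$$I^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-\frac{r^{2}}{2n}}\, r\, dr\, d\theta = 2\pi n. \tag{12}$$
Therefore, $I = \sqrt{2\pi n}$. Substituting for I in (9) gives
$$n! \approx \sqrt{2\pi n}\; n^{n} e^{-n}. \tag{13}$$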
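The quality of the approximation $n! \approx \sqrt{2\pi n}\, n^{n} e^{-n}$ can be checked numerically. The short Python sketch below is illustrative only; the function names `stirling` and `relative_error` and the sample values of n are assumptions introduced for this example.

```python
import math


def stirling(n: int) -> float:
    """Stirling approximation sqrt(2 pi n) * n^n * e^(-n), written as (n/e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n


def relative_error(n: int) -> float:
    """Relative difference between the exact factorial and the approximation."""
    exact = math.factorial(n)
    return abs(exact - stirling(n)) / exact


# The relative error should decrease as n grows.
for n in (1, 5, 10, 50, 100):
    print(n, relative_error(n))
```

The sample values stay at or below n = 100 so that every intermediate quantity fits in a double-precision float.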
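We now prove Theorem 1.1 using the popular existing technique.
Proof 2.2 Using the result of Lemma 2.1, Equation (3) can be rewritten as
$$b(x; n, p) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x} \approx \frac{\sqrt{2\pi n}\; n^{n} e^{-n}}{\sqrt{2\pi x}\; x^{x} e^{-x}\, \sqrt{2\pi (n-x)}\; (n-x)^{n-x} e^{-(n-x)}}\, p^{x} q^{n-x} = \sqrt{\frac{n}{2\pi x (n-x)}}\; \frac{n^{n}}{x^{x} (n-x)^{n-x}}\, p^{x} q^{n-x}. \tag{14}$$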
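Multiplying both the numerator and the denominator of the square-root factor in Equation (14) by $\sqrt{n}$, and collecting powers of $\frac{x}{n}$ and $\frac{n-x}{n}$, we get
$$b(x; n, p) \approx \frac{1}{\sqrt{2\pi n}} \left(\frac{x}{n}\right)^{-x-\frac{1}{2}} \left(\frac{n-x}{n}\right)^{-(n-x)-\frac{1}{2}} p^{x} q^{n-x}. \tag{15}$$
Since x is in the neighborhood of np, change variables to $x = np + \delta$, where $\delta$ measures the distance between the mean, np, of the binomial and the measured quantity x. Rewrite (15) in terms of $\delta$ and simplify further, using
$$\frac{x}{n} = p\left(1 + \frac{\delta}{np}\right), \qquad \frac{n-x}{n} = q\left(1 - \frac{\delta}{nq}\right),$$
to get
$$b(x; n, p) \approx \frac{1}{\sqrt{2\pi npq}} \left(1 + \frac{\delta}{np}\right)^{-\left(np + \delta + \frac{1}{2}\right)} \left(1 - \frac{\delta}{nq}\right)^{-\left(nq - \delta + \frac{1}{2}\right)}. \tag{16}$$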
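Note that $a^{b} = e^{b \ln a}$. Therefore, rewriting (16) in exponential form, we have
$$b(x; n, p) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left[-\left(np + \delta + \tfrac{1}{2}\right) \ln\left(1 + \frac{\delta}{np}\right) - \left(nq - \delta + \tfrac{1}{2}\right) \ln\left(1 - \frac{\delta}{nq}\right)\right]. \tag{17}$$
Suppose $|\delta| \ll np$ and $|\delta| \ll nq$. Using the Maclaurin series, $\ln\left(1 + \frac{\delta}{np}\right) \approx \frac{\delta}{np} - \frac{\delta^{2}}{2n^{2}p^{2}}$ and similarly $\ln\left(1 - \frac{\delta}{nq}\right) \approx -\frac{\delta}{nq} - \frac{\delta^{2}}{2n^{2}q^{2}}$. So that, keeping only the leading terms for large n,
$$\left(np + \delta + \tfrac{1}{2}\right) \ln\left(1 + \frac{\delta}{np}\right) \approx \delta + \frac{\delta^{2}}{2np}$$
and
$$\left(nq - \delta + \tfrac{1}{2}\right) \ln\left(1 - \frac{\delta}{nq}\right) \approx -\delta + \frac{\delta^{2}}{2nq}.$$
As a result, since $\frac{1}{p} + \frac{1}{q} = \frac{1}{pq}$,
$$b(x; n, p) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{\delta^{2}}{2npq}}. \tag{18}$$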
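Recall that $x = np + \delta$, which implies that $\delta = x - np$. From the binomial distribution, $\mu = np$ and $\sigma^{2} = npq$, which implies that $\sigma = \sqrt{npq}$. Making the appropriate substitutions in Equation (18) yields
$$b(x; n, p) \approx \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}. \tag{19}$$
This confirms the theorem.
We recommend that readers interested in the detailed proof of the theorem consult the study expounded by [9].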
3. The Proposed Technique
Suppose a random experiment of throwing a needle, or any other dart-like object, at the origin of the Cartesian plane is performed with the aim of hitting the centre (see Figure 1).
Because human throws are never perfectly consistent, the landing points vary and generate random errors. To make the derivation possible and simpler, we make the following assumptions:
1) The errors are independent of the orientation of the coordinate system.
2) Errors in perpendicular directions are independent. This means that being too high does not alter the probability of being off to the right.
3) Small errors are more likely than large errors. That is, throws are more likely to land in region P than in either Q or R, since region P is closer to the target (origin). Similarly, and for the same reason, region Q is more likely than region R. Furthermore, a throw is more likely to hit region V than either S or T, since V has the larger area and the distances from the origin are approximately the same.
Figure 1. The possible results of the dart experiment.
From Figure 2, let the probability of the needle falling in the vertical strip from x to $x + \Delta x$ be denoted as $p(x)\,\Delta x$. Similarly, let the probability of the needle falling in the horizontal strip from y to $y + \Delta y$ be $p(y)\,\Delta y$. Obviously, the function $p$ cannot be constant, due to the stochastic nature of the experiment. In this study, our interest is to obtain the form and characteristics of the function $p(x)$. From the second assumption, the probability of the needle falling in the shaded region ABCD (see Figure 2) is
$$p(x)\,\Delta x \cdot p(y)\,\Delta y.$$
Note that any region r units from the origin with area $\Delta x \cdot \Delta y$ has the same probability, which is a consequence of the assumption that errors do not depend on the orientation. We can say that
$$p(x)\,\Delta x \cdot p(y)\,\Delta y = g(r)\,\Delta x\,\Delta y, \tag{20}$$
where
$$g(r) = p(x)\,p(y). \tag{21}$$
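By the product rule of calculus, differentiating both sides of Equation (21) with respect to $\theta$ gives
$$\frac{dg(r)}{d\theta} = p(x)\,\frac{dp(y)}{d\theta} + p(y)\,\frac{dp(x)}{d\theta} = 0. \tag{22}$$
Here, $\frac{dg(r)}{d\theta} = 0$ since $g(r)$ is independent of orientation. By transformation to polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$, we can rewrite the derivatives in Equation (22) as
$$p(x)\,\frac{d\,p(r\sin\theta)}{d\theta} + p(y)\,\frac{d\,p(r\cos\theta)}{d\theta} = 0. \tag{23}$$
Using the chain rule of differentiation, (23) becomes
$$p(x)\,p'(y)\,r\cos\theta - p(y)\,p'(x)\,r\sin\theta = 0. \tag{24}$$
Figure 2. The typical example of the experiment.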
Rewriting Equation (24) by replacing $r\cos\theta$ with x and $r\sin\theta$ with y yields
$$x\,p(x)\,p'(y) - y\,p(y)\,p'(x) = 0. \tag{25}$$
The above differential equation can be solved by separation of variables when put in the form
$$\frac{p'(x)}{x\,p(x)} = \frac{p'(y)}{y\,p(y)}. \tag{26}$$
Since x and y are independent, this differential equation can hold for all x and y if and only if the common ratio defined by (26) is a constant. That is, if
$$\frac{p'(x)}{x\,p(x)} = \frac{p'(y)}{y\,p(y)} = c. \tag{27}$$
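Consider $\frac{p'(x)}{x\,p(x)} = c$ in (27) and rearrange to have
$$\frac{p'(x)}{p(x)} = c\,x. \tag{28}$$
Integrating Equation (28) gives
$$\ln p(x) = \frac{c\,x^{2}}{2} + \ln k, \quad \text{that is,} \quad p(x) = k\,e^{\frac{c x^{2}}{2}}, \tag{29}$$
where $\ln k$ is the constant of integration.
By the third assumption, c must be negative. Writing $-c$ in place of c, with $c > 0$ from here on, the probability function (29) becomes
$$p(x) = k\,e^{-\frac{c x^{2}}{2}}. \tag{30}$$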
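The two properties used in this derivation can be checked numerically for a function of the form $p(x) = k\,e^{-c x^{2}/2}$ with $c > 0$: the ratio $p'(x)/(x\,p(x))$ is the same constant for every x, and the product $p(x)\,p(y)$ depends only on the distance r from the origin. The Python sketch below assumes that form; the values of `K` and `C` and the helper names are hypothetical choices made for illustration.

```python
import math

K = 1.0  # illustrative value of the constant k (assumption)
C = 2.0  # illustrative value of the constant c > 0 (assumption)


def p(x: float) -> float:
    """Candidate error density p(x) = k * exp(-c x^2 / 2)."""
    return K * math.exp(-0.5 * C * x * x)


def separation_ratio(x: float, h: float = 1e-5) -> float:
    """Estimate p'(x) / (x p(x)) using a central finite difference for p'."""
    derivative = (p(x + h) - p(x - h)) / (2.0 * h)
    return derivative / (x * p(x))


def joint_density(r: float, theta: float) -> float:
    """Product p(x) p(y) evaluated at x = r cos(theta), y = r sin(theta)."""
    return p(r * math.cos(theta)) * p(r * math.sin(theta))


# The ratio should be close to -C at every nonzero x.
for x in (0.3, 1.0, -1.7):
    print(f"x = {x:5.2f}   ratio = {separation_ratio(x):.6f}")

# At a fixed radius the joint density should not change with the angle.
for theta in (0.0, 0.7, 2.1):
    print(f"theta = {theta:4.2f}   p(x)p(y) = {joint_density(1.5, theta):.12f}")
```

The finite difference is used so that the check does not rely on the analytic derivative that the derivation itself produces.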
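If there is a horizontal shift of the target from the origin to an arbitrary point $x = \mu$, which now marks the new centre, then the probability function in (30) becomes
$$p(x) = k\,e^{-\frac{c (x-\mu)^{2}}{2}}. \tag{31}$$
Differentiating (31) and setting the derivative equal to zero gives
$$p'(x) = -k\,c\,(x - \mu)\,e^{-\frac{c (x-\mu)^{2}}{2}} = 0, \tag{32}$$
and since $k\,c\,e^{-\frac{c (x-\mu)^{2}}{2}} \neq 0$, this implies $x = \mu$. Therefore, Equation (31) has its maximum value at $x = \mu$ and points of inflexion at $x = \mu \pm \frac{1}{\sqrt{c}}$, where the second derivative $p''(x) = k\,c\left[c (x-\mu)^{2} - 1\right] e^{-\frac{c (x-\mu)^{2}}{2}}$ vanishes. Equation (31) has given us the basic form of the Gaussian distribution with constants k and c, and domain of X from $-\infty$ to $\infty$. For Equation (31) to be regarded as a proper probability density function, the total area under the curve must be 1. That is,
$$\int_{-\infty}^{\infty} k\,e^{-\frac{c (x-\mu)^{2}}{2}}\, dx = 1. \tag{33}$$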
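For a function $f(x)$ symmetric about $x = \mu$, $\int_{-\infty}^{\infty} f(x)\, dx = 2\int_{\mu}^{\infty} f(x)\, dx$. Applying this property to Equation (33) yields
$$\int_{\mu}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx = \frac{1}{2k}. \tag{34}$$
Squaring both sides of (34) gives
$$\left(\int_{\mu}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx\right)\left(\int_{\mu}^{\infty} e^{-\frac{c (y-\mu)^{2}}{2}}\, dy\right) = \frac{1}{4k^{2}}. \tag{35}$$
This is possible since x and y are just dummy variables. Recall that x and y are also independent, so we can write the product on the left-hand side of (35) as a double integral to produce
$$\int_{\mu}^{\infty}\int_{\mu}^{\infty} e^{-\frac{c}{2}\left[(x-\mu)^{2} + (y-\mu)^{2}\right]}\, dx\, dy = \frac{1}{4k^{2}}. \tag{36}$$
Putting $s = x - \mu$ and $t = y - \mu$ in the preceding Equation (36) gives
$$\int_{0}^{\infty}\int_{0}^{\infty} e^{-\frac{c}{2}\left(s^{2} + t^{2}\right)}\, ds\, dt = \frac{1}{4k^{2}}. \tag{37}$$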
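The double integral (37) can be evaluated using polar coordinates, $s = r\cos\theta$ and $t = r\sin\theta$, with Jacobian (J) of the transformation
$$J = \begin{vmatrix} \dfrac{\partial s}{\partial r} & \dfrac{\partial s}{\partial \theta} \\[2mm] \dfrac{\partial t}{\partial r} & \dfrac{\partial t}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r \tag{38}$$
and
$$s^{2} + t^{2} = r^{2}. \tag{39}$$
So, Equation (37) now becomes
$$\int_{0}^{\pi/2}\int_{0}^{\infty} e^{-\frac{c r^{2}}{2}}\, r\, dr\, d\theta = \frac{1}{4k^{2}}. \tag{40}$$
Evaluating the double integral $\int_{0}^{\pi/2}\int_{0}^{\infty} e^{-\frac{c r^{2}}{2}}\, r\, dr\, d\theta$ in Equation (40) by first letting $w = \frac{c r^{2}}{2}$ gives $\frac{\pi}{2c} = \frac{1}{4k^{2}}$, and solving for k in the resulting equation yields
$$k = \sqrt{\frac{c}{2\pi}}. \tag{41}$$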
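The normalizing constant can be sanity-checked by numerical integration: with $k = \sqrt{c/(2\pi)}$, the area under $k\,e^{-c(x-\mu)^{2}/2}$ should be 1 for any $c > 0$ and any $\mu$. The Python sketch below is a minimal check under that assumption; the helper names, the trapezoidal rule, and the truncation of the domain to twelve widths $1/\sqrt{c}$ on each side of $\mu$ are illustrative choices.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, steps: int = 20_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def area_under_density(c: float, mu: float) -> float:
    """Area under k * exp(-c (x - mu)^2 / 2) with k = sqrt(c / (2 pi))."""
    k = math.sqrt(c / (2.0 * math.pi))
    half_width = 12.0 / math.sqrt(c)  # tails beyond this range are negligible
    return trapezoid(
        lambda x: k * math.exp(-0.5 * c * (x - mu) ** 2),
        mu - half_width,
        mu + half_width,
    )


# Arbitrary sample parameters (assumptions for the illustration).
for c, mu in ((0.5, 0.0), (2.0, 1.5), (10.0, -3.0)):
    print(c, mu, area_under_density(c, mu))
```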
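Putting (41) in (31), the probability density function, $p(x)$, becomes
$$p(x) = \sqrt{\frac{c}{2\pi}}\; e^{-\frac{c (x-\mu)^{2}}{2}}. \tag{42}$$
Again, the integral of a probability density function over its domain is 1. Therefore, from (42),
$$\int_{-\infty}^{\infty} \sqrt{\frac{c}{2\pi}}\; e^{-\frac{c (x-\mu)^{2}}{2}}\, dx = 1. \tag{43}$$
Further simplification of the preceding Equation (43) gives
$$\int_{-\infty}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx = \sqrt{\frac{2\pi}{c}}. \tag{44}$$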
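One of the important goals in the mathematical theory of statistics is to obtain the mean and variance of any probability function under study. The mean, $\mu$, is defined to be the value of the integral $\int_{-\infty}^{\infty} x\,p(x)\, dx$. The variance, $\sigma^{2}$, is the value of the integral $\int_{-\infty}^{\infty} (x-\mu)^{2}\,p(x)\, dx$. Therefore, using Equation (42),
$$\sigma^{2} = \int_{-\infty}^{\infty} (x-\mu)^{2} \sqrt{\frac{c}{2\pi}}\; e^{-\frac{c (x-\mu)^{2}}{2}}\, dx \tag{45}$$
or equivalently
$$\sigma^{2} = \sqrt{\frac{c}{2\pi}} \int_{-\infty}^{\infty} (x-\mu) \cdot (x-\mu)\, e^{-\frac{c (x-\mu)^{2}}{2}}\, dx. \tag{46}$$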
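Consider Equation (46) and use integration by parts ($\int u\, dv = uv - \int v\, du$) with $u = x - \mu$ and $dv = (x-\mu)\, e^{-\frac{c (x-\mu)^{2}}{2}}\, dx$, so that $v = -\frac{1}{c}\, e^{-\frac{c (x-\mu)^{2}}{2}}$. We have
$$\sigma^{2} = \sqrt{\frac{c}{2\pi}} \left\{ \left[-\frac{(x-\mu)}{c}\, e^{-\frac{c (x-\mu)^{2}}{2}}\right]_{-\infty}^{\infty} + \frac{1}{c}\int_{-\infty}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx \right\} = \frac{1}{c}\sqrt{\frac{c}{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx.$$
Putting (44), that is, $\int_{-\infty}^{\infty} e^{-\frac{c (x-\mu)^{2}}{2}}\, dx = \sqrt{\frac{2\pi}{c}}$, in the preceding equation gives
$$\sigma^{2} = \frac{1}{c}\sqrt{\frac{c}{2\pi}}\sqrt{\frac{2\pi}{c}} = \frac{1}{c}, \quad \text{that is,} \quad c = \frac{1}{\sigma^{2}}. \tag{47}$$
Substituting (47) in (42), the derived probability density function has the form
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty. \tag{48}$$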
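The relation between the constant c and the variance can also be checked numerically: for the density $\sqrt{c/(2\pi)}\, e^{-c(x-\mu)^{2}/2}$, the mean should come out as $\mu$ and the variance as $1/c$. The Python sketch below assumes that density; the helper names and sample parameters are hypothetical, and the trapezoidal rule over a truncated range stands in for the exact integrals.

```python
import math
from typing import Callable, Tuple


def trapezoid(f: Callable[[float], float], a: float, b: float, steps: int = 20_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def mean_and_variance(c: float, mu: float) -> Tuple[float, float]:
    """Numerically integrate x p(x) and (x - mean)^2 p(x) for the derived density."""
    k = math.sqrt(c / (2.0 * math.pi))

    def density(x: float) -> float:
        return k * math.exp(-0.5 * c * (x - mu) ** 2)

    half_width = 12.0 / math.sqrt(c)  # tails beyond this range are negligible
    lo, hi = mu - half_width, mu + half_width
    mean = trapezoid(lambda x: x * density(x), lo, hi)
    variance = trapezoid(lambda x: (x - mean) ** 2 * density(x), lo, hi)
    return mean, variance


# Arbitrary sample parameters (assumptions for the illustration).
for c, mu in ((0.5, 0.0), (2.0, 1.5), (10.0, -3.0)):
    mean, variance = mean_and_variance(c, mu)
    print(f"c = {c}: mean = {mean:.9f}, variance = {variance:.9f}, 1/c = {1.0 / c:.9f}")
```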
Based on the three assumptions stated above, we have derived Equation (48), widely known as the Normal or Gaussian distribution function, with mean $\mu$ and standard deviation $\sigma$.
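To verify that Equation (48) is a proper probability density function with parameters $\mu$ and $\sigma$, we show that the integral
$$I = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\, dx$$
is equal to 1.
Change the variable of integration by letting $z = \frac{x-\mu}{\sigma}$, which implies that $dx = \sigma\, dz$. Then
$$I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^{2}}{2}}\, dz$$
so that
$$I^{2} = \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{z^{2} + w^{2}}{2}}\, dz\, dw.$$
Here $z$ and $w$ are dummy variables. Switching to polar coordinates by making the substitutions $z = r\cos\theta$, $w = r\sin\theta$ produces r as the Jacobian of the transformation. So
$$I^{2} = \frac{1}{2\pi} \int_{0}^{2\pi}\int_{0}^{\infty} e^{-\frac{r^{2}}{2}}\, r\, dr\, d\theta.$$
Put $a = \frac{r^{2}}{2}$, so that $da = r\, dr$. Therefore,
$$I^{2} = \frac{1}{2\pi} \int_{0}^{2\pi}\int_{0}^{\infty} e^{-a}\, da\, d\theta = \frac{1}{2\pi} \int_{0}^{2\pi} d\theta = 1.$$
Thus $I = 1$ (taking the positive root, since the integrand is positive), indicating that (48) is a proper probability density function. Other properties of the distribution, such as moments, the moment generating function, the cumulant generating function, the characteristic function, and parameter estimation, can be found in [14] [15] [16].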
4. Conclusion
In pursuing the stated objective, we have established that there exists an approach that not only serves as an alternative derivation of the Gaussian probability density function but is also free from rigorous mathematical analysis and independent of auxiliary lemmas and theorems. This paper can be classified as a theoretical study of the Gaussian distribution and can serve as a teaching reference in probability and statistics classes, where the only background requirements are basic calculus, skill with algebraic expressions, Maclaurin series expansion, and the Euler integral of the second kind (gamma function).
Acknowledgements
The authors are highly grateful to the editor and the anonymous referees for reading through the manuscript and for their constructive comments and suggestions, which helped improve the revised version of the paper.