Exact Statistical Distribution and Correlation of Human Height and Weight: Analysis and Experimental Confirmation

Abstract

The statistical relationship between human height and weight is of especial importance to clinical medicine, epidemiology, and the biology of human development. Yet, after more than a century of anthropometric measurements and analyses, there has been no consensus on this relationship. The purpose of this article is to provide a definitive statistical distribution function from which all desired statistics (probabilities, moments, and correlation functions) can be determined. The statistical analysis reported in this article provides strong evidence that height and weight in a diverse population of healthy adults constitute correlated bivariate lognormal random variables. This conclusion is supported by a battery of independent tests comparing empirical values of 1) probability density patterns, 2) linear and higher order correlation coefficients, 3) statistical and hyperstatistics moments up to 6th order, and 4) distance correlation (dCor) values to corresponding theoretical quantities: 1) predicted by the lognormal distribution and 2) simulated by use of appropriate random number generators. Furthermore, calculation of the conditional expectation of weight, given height, yields a theoretical power law that specifies conditions under which body mass index (BMI) can be a valid proxy of obesity. The consistency of the empirical data from a large, diverse anthropometric survey partitioned by gender with the predictions of a correlated bivariate lognormal distribution was found to be so extensive and close as to suggest that this outcome is not coincidental or approximate, but may be a consequence of some underlying biophysical mechanism.

Share and Cite:

Silverman, M. (2022) Exact Statistical Distribution and Correlation of Human Height and Weight: Analysis and Experimental Confirmation. Open Journal of Statistics, 12, 743-787. doi: 10.4236/ojs.2022.125044.

1. Introduction

Scientific interest in the values and correlations of anthropometric data traces back to the beginnings of modern statistics in the late 19th and early 20th centuries with the researches of Quetelet, Galton, Pearson, and others [1] [2] [3]. These many studies established the Gaussian function as the mathematical expression best approximating the distribution of such human features as height, weight, and other biometric attributes. So pervasive has been the Gaussian distribution that it is ubiquitously referred to as the “normal distribution”, a reference probably dating back to Quetelet’s influential study of “the average man” (L’homme Moyen) in 1835 [4].

Although more refined studies have revealed that anthropometric data can show deviations from normality, attempts to find relationships between human height and weight remained uncertain, controversial and based on approximate or indirect methods such as data fitting [5], mechanical modeling [6], and gene identification [7]. Height and weight are of particular importance since they directly relate to the body mass index (BMI) [8], which is a measure of obesity and a risk factor for metabolic disease [9] and Alzheimer’s Disease [10]. A previous paper by Silverman and Lipscombe [11], to be referred to as Part I, determined the mathematically exact statistical distribution of BMI.

The present paper, to be regarded as Part II, provides evidence for the proposition that, in a healthy adult human population with access to adequate nutrition, height and weight are distributed as correlated bivariate lognormal random variables. This conclusion is supported by a comprehensive investigation comprising four independent components:

1) Tests of the correlation functions of the height and weight of a large anthropometric data set of individuals, partitioned by gender, against predictions of the bivariate lognormal distribution;

2) Search for nonlinear correlations, not attributable to the bivariate lognormal distribution, by means of a sensitive nonparametric algorithm known as distance correlation [12] [13];

3) Comparison of statistical tests of the empirical anthropometric data set with identical tests performed on comparably sized populations artificially created with correlated lognormal random number generators (RNGs);

4) Tests of the marginal distributions of height and weight against predictions of associated univariate lognormal distributions, and of the natural logarithms of height and weight against predictions of associated univariate normal distributions.

The outcome of this four-part analysis shows that linear and higher-order correlations of human height and weight are predictable in terms of the single Pearson correlation coefficient for height and weight employed in the lognormal probability density function (PDF). Moreover, agreement between the empirical data and the predictions of lognormal theory is so extensive as to suggest that the lognormal distribution of adult human height and weight is not approximate, but an exact distribution possibly characteristic of a more fundamental underlying biophysical mechanism.

1.1. Marginal Distributions of Height and Weight

Part I [11] reported the exact probability density function of BMI that follows mathematically from the defining relation

$$B = W / H^{2} \tag{1}$$

in which height $H$ is expressed in meters (m), the corresponding weight $W$ is expressed in the mass unit kilograms (kg), and $B$ is the BMI expressed in kg/m². It is to be stressed that $H$, $W$, and therefore $B$, are random variables, which means that information and interpretations extrapolated from the BMI PDF refer to populations, and not to individuals, an essential point not always understood by the lay news media [11] [14].

The specific form that the general BMI density function takes depends on the statistical distributions of H and W. Such empirical distributions are often represented visually as histograms. However, if two random variables are not independent, then the histogram of each is a graphical representation of the marginal distribution of that variable, and provides no information regarding the correlation of the two variables. In Part I evidence was provided to show that height and weight of individuals measured in the Anthropometric Survey of U.S. Army Personnel (ANSUR)—a large data base comprising 4082 males and 1986 females [15] —were highly correlated. Figure 1 shows scatter plots of W against H for the separate male and female cohorts. The two patterns suggest a significant linear correlation. The superposed curves, to be discussed later, are the lines of regression (dashed red) and the conditional expectation functions (solid blue). Descriptive statistics are given in Table 1 for the two cohorts, together with theoretically predicted values, where appropriate. Details of Table 1 will be discussed at relevant points throughout the paper.

Figure 1. Correlation of weight and height for males (left) and females (right) of the ANSUR population. Lines of regression (dashed red) are obtained by the method of least squares. The conditional expectation functions of weight given height (solid blue) are calculated from the lognormal PDF (17), using ANSUR parameters in Table 1.

Table 1. Descriptive statistics of height and weight of ANSUR population.

It is to be recalled that a random variable $X$ is lognormal if its natural logarithm, symbolized by $Y = \ln X$, is normal. As a matter of standard notation used in this paper, random variables are represented by upper case letters (e.g. $X$), and realizations of that variable (referred to as variates) are represented by lower case letters (e.g. $x$). Histograms of the natural logarithms of the ANSUR heights and weights, partitioned by gender, were shown in Part I to be satisfactorily described by PDFs of Gaussian form

$$p_H^{(N)}(y) = \frac{1}{\sqrt{2\pi s_H^2}}\, e^{-\frac{(y - m_H)^2}{2 s_H^2}} \tag{2}$$

$$p_W^{(N)}(y) = \frac{1}{\sqrt{2\pi s_W^2}}\, e^{-\frac{(y - m_W)^2}{2 s_W^2}}\,. \tag{3}$$

A more detailed demonstration of the normality of $\ln H$ and $\ln W$ will be given in Section 6. From relations (2) and (3) follow the parent lognormal PDFs

$$p_H^{(\Lambda)}(x) = \frac{1}{\sqrt{2\pi s_H^2}}\, \frac{e^{-\frac{(\ln(x) - m_H)^2}{2 s_H^2}}}{x} \tag{4}$$

$$p_W^{(\Lambda)}(x) = \frac{1}{\sqrt{2\pi s_W^2}}\, \frac{e^{-\frac{(\ln(x) - m_W)^2}{2 s_W^2}}}{x} \tag{5}$$

with location parameters $(m_H, m_W)$ and scale parameters $(s_H, s_W)$ for height and weight, respectively. Numerical values of these parameters are given in Table 1.

Superscripts N and Λ in the above PDFs signify normal and lognormal distributions, as well as symbolize the associated random variables (RVs)

$$\left.\begin{aligned} X &= \Lambda(m, s^2) \\ Y &= N(m, s^2) \end{aligned}\right\} \quad \left\{\begin{aligned} Y &= \ln X \\ X &= e^{Y} \end{aligned}\right. \tag{6}$$

Note that parameters $(m, s^2)$ defining the random variables $Y$ and $X$ are the mean and variance of the normal variable $Y$. All statistics of the marginal distribution of $X$ are predictable in terms of the parameters $(m, s^2)$ of $Y$ [11] [16]. For example, the mean $\mu_X$, variance $\sigma_X^2$, skewness $Sk_X$, and kurtosis $K_X$ of $X$ take the forms [17]

$$\mu_X = e^{m + \frac{1}{2} s^2} \tag{7}$$

$$\sigma_X^2 = e^{2m} \left( e^{2 s^2} - e^{s^2} \right) \tag{8}$$

$$Sk_X = \left( e^{s^2} + 2 \right) \sqrt{e^{s^2} - 1} \tag{9}$$

$$K_X = e^{4 s^2} + 2 e^{3 s^2} + 3 e^{2 s^2} - 3\,. \tag{10}$$
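Relations (7) to (10) can be evaluated directly from the normal parameters. The following is a minimal sketch, not the author's code; the function name `lognormal_moments` and the numerical parameters are hypothetical and chosen only for illustration (they are not the ANSUR values of Table 1).

```python
import math

def lognormal_moments(m, s2):
    """Mean, variance, skewness and kurtosis of X = exp(Y), Y ~ N(m, s2),
    following Eqs. (7)-(10). Note that s2 is the variance of Y, not its
    standard deviation."""
    mean = math.exp(m + 0.5 * s2)                                  # Eq. (7)
    variance = math.exp(2.0 * m) * (math.exp(2.0 * s2) - math.exp(s2))  # Eq. (8)
    skewness = (math.exp(s2) + 2.0) * math.sqrt(math.exp(s2) - 1.0)     # Eq. (9)
    kurtosis = (math.exp(4.0 * s2) + 2.0 * math.exp(3.0 * s2)
                + 3.0 * math.exp(2.0 * s2) - 3.0)                  # Eq. (10)
    return mean, variance, skewness, kurtosis

# Hypothetical parameters, for illustration only.
mean, variance, skewness, kurtosis = lognormal_moments(m=0.5, s2=0.04)
```

As the variance $s^2$ shrinks toward zero, the skewness tends to 0 and the kurtosis to 3, the values for a normal distribution, which is one way to see why a narrow lognormal distribution is easily mistaken for a Gaussian.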

1.2. Correlation of Height and Weight

When analyzing lognormal variates, it is often strategically easier, indeed necessary, to work with the logarithms of the variates, since these are distributed normally. Figure 2 shows scatter plots (black) of the scaled variates of $Y_W \equiv \ln W$ against scaled variates of $Y_H \equiv \ln H$ for males and females respectively in the ANSUR data set. The scaled variables $(U, V)$

$$U \equiv (\ln H - m_H)/s_H \qquad V \equiv (\ln W - m_W)/s_W \tag{11}$$

with variates $(u, v)$ are measured with respect to their means and divided by their standard deviations, and are therefore dimensionless quantities distributed as standard normal variables of mean 0 and variance 1 if the variables $H$, $W$ are lognormal. Figure 2 likewise clearly shows a strong linear correlation of $U$ and $V$. The slope of the line of regression (dashed red) in each black scatter plot directly yields the corresponding Pearson correlation coefficient $r$ [11] defined by

Figure 2. Correlation of scaled log weight and scaled log height for males (top left) and females (top right) of the ANSUR population. Lower panels show corresponding scatter plots created with correlated lognormal random number generators using the same empirical parameters. The patterns display a strong linear correlation. The Pearson correlation coefficient is equal to the slope of the associated lines of regression (dashed).

$$r \equiv \langle U V \rangle = \frac{\langle (Y_H - m_H)(Y_W - m_W) \rangle}{s_H s_W} \equiv \frac{\mathrm{cov}(Y_H, Y_W)}{s_H s_W} \tag{12}$$

where cov signifies covariance, as defined in Equation (12). Angular brackets are used in this paper to indicate expectation values. The scatter plots in red in Figure 2 were obtained by computer simulation using correlated lognormal RNGs, the details of which will be discussed in a later section. Suffice it to say at this point that the computer simulations employed the same distribution parameters that were extracted from the ANSUR height and weight data and are labeled $(m_H, m_W, s_H, s_W, r)$ for both male and female cohorts in Table 1. Lines of regression (dashed blue) to the simulated scatter plots are nearly identical to those of the empirical plots.

Once the correlation coefficient $r$ has been determined empirically from the sample of normal variates $(u, v)$, the correlation coefficient $\rho$ of the parent lognormal variables $X_H \equiv e^{Y_H} = H$ and $X_W \equiv e^{Y_W} = W$ can be calculated theoretically from the relation [11]

$$\rho_{\mathrm{thy}} = \frac{e^{r s_H s_W} - 1}{\sqrt{\left( e^{s_H^2} - 1 \right)\left( e^{s_W^2} - 1 \right)}} \tag{13}$$

and compared with the empirical correlation coefficient obtained directly from the data according to

$$\rho_{\mathrm{emp}} \equiv \frac{\mathrm{cov}(X_H, X_W)}{\sigma_H \sigma_W} = \frac{\langle (H - \mu_H)(W - \mu_W) \rangle}{\sigma_H \sigma_W} \tag{14}$$

in analogy to Equation (12). The implementation of Equation (14) can be achieved algebraically and geometrically:

1) Algebraic method: If $h_i$ and $w_i$ are variates of $H$ and $W$, such as plotted in Figure 1, where $i = 1, \ldots, n$, then

$$\rho_{\mathrm{emp}} = \frac{1}{n \sigma_H \sigma_W} \sum_{i=1}^{n} (h_i - \mu_H)(w_i - \mu_W) \tag{15}$$

in the limit of large $n$.

2) Geometric method: $\rho_{\mathrm{emp}}$ is equal to the slope of the line of regression in a scatter plot of the scaled variables $(W - \mu_W)/\sigma_W$ against $(H - \mu_H)/\sigma_H$. To deduce $\rho_{\mathrm{emp}}$ from the line of regression in the plot of the unscaled variables $(H, W)$ in Figure 1, one multiplies the slope by the ratio of standard deviations $\sigma_H / \sigma_W$.
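The two methods, and their comparison with the theoretical value (13), can be sketched on synthetic data. This is an illustration under stated assumptions, not the procedure used in the paper: the parameter values are hypothetical (not those of Table 1), and the correlated pairs are produced by the textbook construction $v = r z_1 + \sqrt{1 - r^2}\, z_2$ from independent standard normal variates, which may differ from the RNG method the paper describes in a later section.

```python
import math
import random
import statistics

def rho_theory(r, s_h, s_w):
    """Eq. (13): correlation of the lognormal pair from normal-space parameters."""
    num = math.exp(r * s_h * s_w) - 1.0
    den = math.sqrt((math.exp(s_h ** 2) - 1.0) * (math.exp(s_w ** 2) - 1.0))
    return num / den

def rho_algebraic(hs, ws):
    """Eq. (15), using the sample means and standard deviations."""
    n = len(hs)
    mu_h, mu_w = statistics.fmean(hs), statistics.fmean(ws)
    sd_h, sd_w = statistics.pstdev(hs), statistics.pstdev(ws)
    total = sum((h - mu_h) * (w - mu_w) for h, w in zip(hs, ws))
    return total / (n * sd_h * sd_w)

def rho_geometric(hs, ws):
    """Least-squares slope of W on H, multiplied by sigma_H / sigma_W."""
    mu_h, mu_w = statistics.fmean(hs), statistics.fmean(ws)
    sxy = sum((h - mu_h) * (w - mu_w) for h, w in zip(hs, ws))
    sxx = sum((h - mu_h) ** 2 for h in hs)
    slope = sxy / sxx
    return slope * statistics.pstdev(hs) / statistics.pstdev(ws)

# Hypothetical parameters, for illustration only (not the ANSUR values).
m_h, s_h, m_w, s_w, r = 0.55, 0.04, 4.3, 0.16, 0.5

rng = random.Random(12345)          # fixed seed for reproducibility
hs, ws = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = z1                                        # scaled ln H
    v = r * z1 + math.sqrt(1.0 - r * r) * z2      # scaled ln W, correlation r with u
    hs.append(math.exp(m_h + s_h * u))
    ws.append(math.exp(m_w + s_w * v))

rho_alg = rho_algebraic(hs, ws)
rho_geo = rho_geometric(hs, ws)
rho_thy = rho_theory(r, s_h, s_w)
```

The algebraic and geometric estimates are the same quantity computed two ways, so they agree to rounding error; both differ from `rho_thy` only by sampling fluctuation.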

Five parameters $(m_1, m_2, s_1, s_2, r)$ are required to specify the PDF of two correlated bivariate normal RVs $(Y_1, Y_2)$

$$p_{Y_1,Y_2}^{(N,N)}(y_1, y_2) = \frac{1}{2\pi s_1 s_2 \sqrt{1 - r^2}}\, e^{-q_Y/2}$$
$$q_Y = \frac{1}{1 - r^2} \left[ \left( \frac{y_1 - m_1}{s_1} \right)^2 - 2r \left( \frac{y_1 - m_1}{s_1} \right)\left( \frac{y_2 - m_2}{s_2} \right) + \left( \frac{y_2 - m_2}{s_2} \right)^2 \right] \tag{16}$$

from which is derived the PDF of the parent bivariate lognormal RVs $(X_1, X_2)$ [11]

$$p_{X_1,X_2}^{(\Lambda,\Lambda)}(x_1, x_2) = \frac{1}{2\pi s_1 s_2 \sqrt{1 - r^2}}\, \frac{e^{-q_X/2}}{x_1 x_2}$$
$$q_X = \frac{1}{1 - r^2} \left[ \left( \frac{\ln(x_1) - m_1}{s_1} \right)^2 - 2r \left( \frac{\ln(x_1) - m_1}{s_1} \right)\left( \frac{\ln(x_2) - m_2}{s_2} \right) + \left( \frac{\ln(x_2) - m_2}{s_2} \right)^2 \right] \tag{17}$$

Double superscripts N and Λ signify that both variables are normal in PDF (16) and lognormal in PDF (17). The expectation operations in Equation (12) and Equation (14) are performed respectively with PDF (16) and PDF (17).
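For readers who wish to evaluate the joint density (17) numerically, a minimal sketch follows. The function names are hypothetical; the second function is the marginal density of Equations (4) and (5), included so that the factorization of (17) into a product of marginals at $r = 0$ can be checked.

```python
import math

def bivariate_lognormal_pdf(x1, x2, m1, m2, s1, s2, r):
    """Eq. (17): joint density of (X1, X2) = (exp(Y1), exp(Y2)),
    where (Y1, Y2) is bivariate normal with correlation r, |r| < 1."""
    a = (math.log(x1) - m1) / s1
    b = (math.log(x2) - m2) / s2
    q = (a * a - 2.0 * r * a * b + b * b) / (1.0 - r * r)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r) * x1 * x2
    return math.exp(-0.5 * q) / norm

def lognormal_pdf(x, m, s):
    """Eqs. (4)-(5): marginal lognormal density."""
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))
```

Both functions require strictly positive arguments `x1`, `x2`, `x`, since the lognormal density is defined only on the positive half-line.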

If, as proposed in this paper, H and W are correlated bivariate lognormal variables, then all measurable statistical information concerning adult human height, weight, and their correlations, should be predictable from their joint distribution Equation (17) in terms of the five parameters (2 means, 2 variances, and 1 Pearson correlation) that define a given population. This statement has important implications for the study of obesity and its associated illnesses.

The BMI (1) was introduced by Quetelet in 1835 [18] and has been widely used up to present times by clinicians and epidemiologists as a proxy for obesity under the assumption that it correlates strongly with weight, but is independent of height. This assumption is itself predicated on a by-no-means obvious assumption that human weight in a healthy adult population varies as the square of an individual’s height. These assumptions will be examined later in this paper both empirically and theoretically. It is to be noted at this point, however, that both assumptions have elicited criticism, e.g. [19] [20] [21], leading to proposals of alternative power-law measures such as the Benn Index [22] and Rohrer’s Index [23], non-power law correlations such as [24] [25], and empirical parametric models such as [26], all purporting to determine more satisfactorily than BMI a single optimal relationship between human weight and height.

With regard to the goal of capturing the relationship between height and weight, the following general statistical principles must be emphasized. First, an exact PDF of the bivariate distribution of two correlated random variables provides all the statistical information that can be learned about these two correlated variables. And second, there is no single optimal mathematical expression (apart from the PDF and its equivalent transformations¹) that completely captures the statistical relation between two correlated random variables. Rather, the PDF provides a potentially infinite number of mathematical expressions that, together, characterize the complete relation between the two variables. From a practical standpoint, however, the number of testable expressions that can meaningfully characterize the correlation of two variables is limited by the size of the sample, since the intrinsic uncertainty increases with the order (i.e. power) of the variables, and can eventually exceed the mean value for a fixed sample size. These points will be elaborated on in the following sections.

1.3. Organization

The remainder of this paper is organized as follows.

In Section 2 the relation between weight (W) and height (H) is examined by means of the conditional expectation functions of W, given H.

In Section 3 the proposition that human height and weight are correlated lognormal variables is tested by examining generalized correlation functions of data sets $(H, W)$ and $(\ln H, \ln W)$.

In Section 4 the preceding data sets are each examined for nonlinear correlations beyond those attributable to the bivariate lognormal distribution by a procedure known as distance correlation.

In Section 5 the marginal distributions of $(H, W)$ and $(\ln H, \ln W)$ are tested against predictions of the univariate lognormal and normal distributions, respectively.

Section 6 examines the implications of the distribution of $(H, W)$ for the body mass index.

Section 7 discusses the computer simulation of correlated lognormal variables.

And last, the results of this comprehensive investigation are summarized and interpreted in Section 8.

2. Conditional Expectation of Weight, Given Height

The conditional expectation $\langle W^p | H \rangle$ of $W^p$ (for $p = 1, 2, \ldots$) given $H$, defined by the ratio

$$\langle W^p(h) \rangle \equiv \frac{\int_0^\infty w^p\, p_{H,W}^{(\Lambda,\Lambda)}(h, w)\, dw}{\int_0^\infty p_{H,W}^{(\Lambda,\Lambda)}(h, w)\, dw}, \tag{18}$$

is a function $\langle W^p(h) \rangle$ of the continuous variate $h$ of $H$. Since $\langle W^p(h) \rangle$ derives from the joint PDF of $W$ and $H$, it is more informative than an empirical line of regression such as obtained by the method of least squares or, more generally, the method of maximum likelihood [27].

Calculation of $\langle W^p(h) \rangle$ in Equation (18) requires evaluation of two integrals whose kernel is the PDF $p_{H,W}^{(\Lambda,\Lambda)}(h, w)$ given by Equation (17). The integrals can be greatly simplified by the transformation (11) to variables² $u = (\ln(h) - m_H)/s_H$ and $v = (\ln(w) - m_W)/s_W$, which re-expresses the bivariate normal PDF (16) more simply in the form

$$f_{U,V}(u, v) = \frac{1}{2\pi \sqrt{1 - r^2}} \exp\left( -\frac{1}{2(1 - r^2)} \left( u^2 - 2 r u v + v^2 \right) \right) \tag{19}$$

and, through the inverse transformation

$$h = \exp(m_H + s_H u) \qquad w = \exp(m_W + s_W v) \tag{20}$$

leads to the conditional expectation

$$\langle W^p(u) \rangle \equiv \frac{\int_{-\infty}^{\infty} \left( \exp(s_W v + m_W) \right)^p f(u, v)\, dv}{\int_{-\infty}^{\infty} f(u, v)\, dv} \tag{21}$$

as a function of $u$. Both integrals in (21) are readily evaluated in closed form, after which replacement of the normal variable $u$ in terms of the lognormal variable $h$ yields the general relation

$$\langle W^p(h) \rangle = h^{\,p r s_W / s_H} \exp\left( p\, m_W + \tfrac{1}{2} p^2 s_W^2 (1 - r^2) - p\, r\, m_H s_W / s_H \right). \tag{22}$$

The conditional expectation of the $p$th power of $W$ is thus seen to have a power-law dependence on $H$ with exponent $p r s_W / s_H$. The lowest two orders $p = 1, 2$ are of primary interest
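The step from (21) to (22) is not spelled out in the text. One way to fill it in, offered here as a sketch based only on the standard bivariate normal form (19) and not as the author's own working, is to note that the ratio in (21) averages over the conditional density of $V$ given $U = u$, which is normal with mean $ru$ and variance $1 - r^2$:

```latex
% Conditional density of V given U = u, obtained from Eq. (19)
% by dividing out the marginal density of U:
\frac{f_{U,V}(u,v)}{\int_{-\infty}^{\infty} f_{U,V}(u,v')\,dv'}
  = \frac{1}{\sqrt{2\pi(1-r^2)}}
    \exp\!\left(-\frac{(v - r u)^2}{2(1-r^2)}\right)

% The moment generating function of N(ru, 1-r^2), evaluated at p s_W, gives
\langle W^p(u) \rangle
  = e^{p m_W}\,\bigl\langle e^{p s_W V} \,\bigm|\, U = u \bigr\rangle
  = \exp\!\left(p\,m_W + p\,r\,s_W\,u + \tfrac{1}{2}\,p^2 s_W^2\,(1-r^2)\right)

% Substituting u = (\ln h - m_H)/s_H, so that e^{p r s_W u}
% = h^{p r s_W/s_H} e^{-p r m_H s_W/s_H}:
\langle W^p(h) \rangle
  = h^{\,p r s_W/s_H}
    \exp\!\left(p\,m_W + \tfrac{1}{2}\,p^2 s_W^2\,(1-r^2)
                - p\,r\,m_H\,s_W/s_H\right)
```

The power law in $h$ therefore arises entirely from the term linear in $u$ in the conditional mean of $V$.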

$$\langle W^1(h) \rangle = h^{\,r s_W / s_H} \exp\left( m_W + \tfrac{1}{2} s_W^2 (1 - r^2) - r\, m_H s_W / s_H \right) \tag{23}$$

$$\langle W^2(h) \rangle = h^{\,2 r s_W / s_H} \exp\left( 2 m_W + 2 s_W^2 (1 - r^2) - 2 r\, m_H s_W / s_H \right) \tag{24}$$

and yield the conditional variance and standard deviation

$$\mathrm{var}(W | H) = \langle W^2(h) \rangle - \left( \langle W^1(h) \rangle \right)^2 = h^{\,2 r s_W / s_H}\, e^{\left( 2 m_W - 2 r m_H s_W / s_H \right)} \left( e^{2 s_W^2 (1 - r^2)} - e^{s_W^2 (1 - r^2)} \right) \tag{25}$$

$$\sigma_W(h) \equiv \sigma(W | H) = \sqrt{\mathrm{var}(W | H)} = h^{\,r s_W / s_H}\, e^{\left( m_W - r m_H s_W / s_H \right)} \left( e^{2 s_W^2 (1 - r^2)} - e^{s_W^2 (1 - r^2)} \right)^{1/2} \tag{26}$$

Substituting in Equations (23) and (26) the bivariate lognormal parameters for each gender cohort of the ANSUR sample listed in Table 1 leads to the expressions

$$\begin{aligned} \left( \langle W^1(h) \rangle \pm \sigma_W(h) \right)_M &= (27.7071 \pm 4.0621)\, h^{1.9987} \\ \left( \langle W^1(h) \rangle \pm \sigma_W(h) \right)_F &= (23.2156 \pm 3.1523)\, h^{2.1923} \end{aligned} \tag{27}$$

where subscripts M and F signify male and female, respectively.
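Expressions of the form (27) follow mechanically from relation (22) once the five lognormal parameters are known. The sketch below illustrates the calculation; the function names and the parameter values are hypothetical (the parameters are chosen so that the exponent $r s_W / s_H$ equals 2 and are not the ANSUR values of Table 1).

```python
import math

def conditional_moment(h, p, m_h, m_w, s_h, s_w, r):
    """Eq. (22): <W^p | H = h> for bivariate lognormal (H, W)."""
    exponent = p * r * s_w / s_h
    log_prefactor = (p * m_w
                     + 0.5 * p * p * s_w ** 2 * (1.0 - r * r)
                     - p * r * m_h * s_w / s_h)
    return h ** exponent * math.exp(log_prefactor)

def conditional_sd(h, m_h, m_w, s_h, s_w, r):
    """Eq. (26), computed as sqrt(<W^2> - <W>^2) from Eqs. (23)-(25)."""
    w1 = conditional_moment(h, 1, m_h, m_w, s_h, s_w, r)
    w2 = conditional_moment(h, 2, m_h, m_w, s_h, s_w, r)
    return math.sqrt(w2 - w1 * w1)

# Hypothetical parameters, for illustration only: exponent r*s_w/s_h = 2.
params = dict(m_h=0.56, m_w=4.4, s_h=0.04, s_w=0.16, r=0.5)

mean_w = conditional_moment(1.75, 1, **params)   # expected weight at h = 1.75 m
sd_w = conditional_sd(1.75, **params)            # conditional standard deviation
```

Because the mean (23) and the standard deviation (26) share the same power of $h$, their ratio is independent of height and equals $\sqrt{e^{s_W^2 (1 - r^2)} - 1}$. This is why each line of (27) can be written as a single power of $h$ multiplied by a constant plus or minus another constant.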

Plots of the conditional expectations $\langle W^1(h) \rangle$ in Equation (27) comprise the solid blue curves superposed on the scatter plots in Figure 1. Although $\langle W^1(h) \rangle$ is a power law and the line of regression (dashed red) is linear, the two curves are virtually indistinguishable over the densest part of the plots. However, it is important to bear in mind the conceptual difference between the two curves: the line of regression is merely a fit to data, whereas the mathematical relation (22), the specific exponent $p r s_W / s_H$, and the numerical values in expressions (27) are predictions drawn from the lognormal PDF. Figure 3 graphically displays the full information content of relation (27) by displaying the regions of ±1 standard deviation about the means for the two cohorts.

The numerical values of the exponents in relations (27) bear out the fundamental assumption underlying the use of BMI that weight is a quadratic function of height in a healthy adult population. Nevertheless, for a different set of values of the parameters $(s_H, s_W, r)$, such as may characterize a demographic different from the one represented by the ANSUR population, the lognormal predicted exponent could be different.

3. Tests of Correlation Functions of Height (H) and Weight (W)

Correlation functions of order (p, q), defined as follows

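$$C_{p,q}(s_H, s_W, r) \equiv \left\langle \left( \frac{H - \mu_H}{\sigma_H} \right)^p \left( \frac{W - \mu_W}{\sigma_W} \right)^q \right\rangle \tag{28}$$

$$R_{p,q}(r) \equiv \left\langle \left( \frac{\ln(H) - m_H}{s_H} \right)^p \left( \frac{\ln(W) - m_W}{s_W} \right)^q \right\rangle, \tag{29}$$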

Figure 3. Plots of the conditional expectation $\langle W^1(h) \rangle$ for male (solid blue line) and female (solid red line) of the ANSUR populations centered on regions (blue for males, red for females) of ±1 standard deviation $\sigma_W(h)$. The region of overlap appears purple.

generalize the standard covariance ($p = q = 1$) expressed in relations (12) and (14). Expectations (28) and (29) are to be implemented with the bivariate lognormal PDF (17) where indices $(p, q)$ independently take on integer values (1, 2, …).

As a matter of notation and terminology, evaluation of functions $C_{p,q}$ and $R_{p,q}$ by substitution of empirical parameters for the arguments yields the numerical correlation coefficients $c_{p,q}$, $r_{p,q}$, where $r_{1,1} \equiv r$ and $c_{1,1} \equiv \rho$ as conventionally defined. If the proposition that $(H, W)$ are bivariate lognormal variables is valid, then $c_{p,q}$ and $r_{p,q}$ should be predictable from the arguments shown in Equations (28) and (29) and the parameters $(s_H, s_W, r)$ listed by gender in Table 1. It is to be recalled that the first two parameters (standard deviations) were obtained empirically from the variates of the marginal distributions $\ln H$ and $\ln W$ of the ANSUR populations, whereas the third parameter (Pearson correlation coefficient) was obtained empirically from the joint distribution of $\ln H$ and $\ln W$, such as exhibited in Figure 2. The associated sets of means $(m_H, m_W)$, which are also listed by gender in Table 1, drop out of relations (28) and (29) by virtue of their expressions as ratios. Further details are given in Part I [11]. To facilitate reading the tables to follow, indices of the correlation functions and coefficients will be expressed as arguments, e.g. $R(p, q)$ and $r(p, q)$ in the tables.

Calculation of the correlation functions $C_{p,q}$ and $R_{p,q}$ requires evaluation of the expectation values

$$C_{p,q} = \int_0^\infty \!\! \int_0^\infty \left( \frac{h - \mu_H}{\sigma_H} \right)^p \left( \frac{w - \mu_W}{\sigma_W} \right)^q p_{(H,W)}^{(\Lambda,\Lambda)}(h, w)\, dh\, dw \tag{30}$$

$$R_{p,q} = \int_0^\infty \!\! \int_0^\infty \left( \frac{\ln(h) - m_H}{s_H} \right)^p \left( \frac{\ln(w) - m_W}{s_W} \right)^q p_{(H,W)}^{(\Lambda,\Lambda)}(h, w)\, dh\, dw\,. \tag{31}$$

As in the previous section, the two integrals can be greatly simplified by the transformation (11) to variables $u = (\ln(h) - m_H)/s_H$ and $v = (\ln(w) - m_W)/s_W$, which results in the bivariate normal PDF (19). Substitution for the means $(\mu_H, \mu_W)$ and standard deviations $(\sigma_H, \sigma_W)$ by use of Equations (7) and (8) then leads to the operational expressions

$$C_{p,q} = \left( e^{s_H^2} - 1 \right)^{-p/2} \left( e^{s_W^2} - 1 \right)^{-q/2} \int_{-\infty}^{\infty} \!\! \int_{-\infty}^{\infty} \left( e^{s_H u - \frac{1}{2} s_H^2} - 1 \right)^p \left( e^{s_W v - \frac{1}{2} s_W^2} - 1 \right)^q f_{U,V}(u, v)\, du\, dv \tag{32}$$

and

$$R_{p,q} = \int_{-\infty}^{\infty} \!\! \int_{-\infty}^{\infty} u^p v^q f_{U,V}(u, v)\, du\, dv\,. \tag{33}$$

In the symmetric case ($p = q$), the two correlation functions will simply be designated $C_p$ and $R_p$.
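The integral (32) need not be evaluated by numerical quadrature. One possible route, sketched here as an assumption rather than as the method used in the paper, is to expand both binomials and apply the standard identity $\langle e^{aU + bV} \rangle = \exp\left( \tfrac{1}{2}(a^2 + 2rab + b^2) \right)$ for standard normal $(U, V)$ with correlation $r$, which reduces (32) to a finite double sum. The function name `c_pq` and the numerical values are hypothetical.

```python
import math

def c_pq(p, q, s_h, s_w, r):
    """Eq. (32) by binomial expansion of both factors, using
    <exp(a U + b V)> = exp((a^2 + 2 r a b + b^2) / 2)."""
    total = 0.0
    for j in range(p + 1):
        for k in range(q + 1):
            sign = (-1) ** ((p - j) + (q - k))
            a, b = j * s_h, k * s_w
            # exponent of <exp(j(s_h U - s_h^2/2) + k(s_w V - s_w^2/2))>
            log_term = (0.5 * (a * a + 2.0 * r * a * b + b * b)
                        - 0.5 * j * s_h ** 2 - 0.5 * k * s_w ** 2)
            total += sign * math.comb(p, j) * math.comb(q, k) * math.exp(log_term)
    norm = ((math.exp(s_h ** 2) - 1.0) ** (-p / 2.0)
            * (math.exp(s_w ** 2) - 1.0) ** (-q / 2.0))
    return norm * total

# Hypothetical parameters, for illustration only.
c11 = c_pq(1, 1, s_h=0.3, s_w=0.4, r=0.6)
```

For $p = q = 1$ the sum collapses to $e^{r s_H s_W} - 1$, so that $C_{1,1}$ reproduces $\rho_{\mathrm{thy}}$ of Equation (13). The alternating signs cause cancellation between terms, so for very small $s_H$, $s_W$ and high orders the double-precision result loses accuracy and should be treated with care.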

The integral in relation (33) displays a number of symmetries: 1) $R_{p,q} = R_{q,p}$, 2) $R_{p,q} = 0$ if $p + q$ is odd, otherwise 3) if $p + q$ is even, then $R_{p,q}$ is of order $r^d$ where $d$ is the smaller of $p$ and $q$. Symmetries (2) and (3) do not necessarily hold for $C_{p,q}$. As a matter of terminology in the sections to follow, $R_{p,q}$ and $C_{p,q}$ will be referred to as “asymmetric odd” if $p + q$ is odd and “asymmetric even” if $p + q$ is even.
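The symmetries above can be checked with a short exact evaluation of (33). The sketch below is one possible approach, not necessarily the one used to produce the paper's tables: it writes $V = rU + \sqrt{1 - r^2}\, Z$ with $Z$ a standard normal variable independent of $U$, expands $V^q$ binomially, and uses the moments $\langle Z^n \rangle = (n - 1)!!$ for even $n$ and 0 for odd $n$. The function names are hypothetical.

```python
import math

def normal_moment(n):
    """<Z^n> for a standard normal Z: (n-1)!! if n is even, 0 if n is odd."""
    if n % 2:
        return 0
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result

def r_pq(p, q, r):
    """Eq. (33): <U^p V^q> for standard normal (U, V) with correlation r,
    using V = r U + sqrt(1 - r^2) Z with Z independent of U."""
    total = 0.0
    for k in range(q + 1):
        if (q - k) % 2:
            continue  # odd moments of Z vanish
        total += (math.comb(q, k) * r ** k
                  * (1.0 - r * r) ** ((q - k) // 2)
                  * normal_moment(p + k) * normal_moment(q - k))
    return total

r22 = r_pq(2, 2, 0.5)   # equals 1 + 2 r^2
r33 = r_pq(3, 3, 0.5)   # equals 9 r + 6 r^3
```

The lowest symmetric orders reduce to the standard bivariate normal results $R_{1,1} = r$, $R_{2,2} = 1 + 2r^2$, and $R_{3,3} = 9r + 6r^3$, which illustrates how the higher-order correlations are fixed by the single coefficient $r$.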

3.1. Calculation and Measurement of Correlation Functions $R_{p,q}$

Because the correlation of $U$ and $V$ is determined entirely by the parameter $r$ in the probability density $f(u, v)$, it is useful to examine the structure of $f(u, v)$ graphically. The left panels of Figure 4 display a sequence of theoretical 3-dimensional density plots for correlation coefficients $r$ = 0.0, 0.5, and 0.95, which span nearly the entire positive range of $r$. The maximum $r$ = 0.95 was chosen, rather than $r$ = 1, because the latter value is simply a straight line coincident with the diagonal axis. The right panels view the underside of the density plots, or, equivalently, the projection of the plots onto the $(u, v)$ plane. In the top panels, variables $U$ and $V$ are independent ($r$ = 0), and $f_{U,V}(u, v)$ factors into a product $f_U(u) f_V(v)$. Since the exponent in $f_{U,V}(u, v)$ for $r$ = 0 depends only on the sum $u^2 + v^2$, a transformation of variables converts that sum into the square of a radial variable, which accounts for the circular symmetry of the top right panel. As the correlation coefficient increases toward +1, the density function profile becomes increasingly linear along the diagonal for which $uv > 0$, as clearly shown in the bottom panels. For $r \to -1$, the profile (not shown) would approach linearity along the diagonal for which $uv < 0$.

Whereas the left side of Figure 4 shows the actual probability density profiles, the images on the right side show the smooth shapes that scatter plots of a sample of discrete paired variates $(u_i, v_i)$, $i = 1, \ldots, n$, approach as $n \to \infty$. This is borne out by Figure 5, which shows simulated plots of $n$ = 2000 pairs of correlated standard normal variates of increasing correlation coefficient $r$. In each panel, the quantity “rho” designates the correlation parameter supplied to the RNGs; the quantity “slope” is the slope of the line of regression (dashed blue), which equals the actual correlation coefficient of the simulated sample. The two numbers in each scatter plot are close, but not identical because the scatter plot comprises a finite random sample. The connection between the PDF $f_{U,V}(u, v)$ and the scatterplot of $(U, V)$ for each value of $r$ is particularly clear when one compares the right side panels of Figure 4 to the corresponding plots of Figure 5. The orientations of the two sets of figures may be different, but the distributions are invariant to orientation.
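A simulation of the kind shown in Figure 5 can be sketched in a few lines. The construction used here (mixing two independent standard normal variates) is a common textbook method and is an assumption; the paper's own RNG procedure is described in a later section and may differ. The function names and the seed are hypothetical.

```python
import math
import random

def simulate_pairs(n, rho, seed=0):
    """Return n pairs (u, v) of standard normal variates with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs

def regression_slope(pairs):
    """Least-squares slope of v on u."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    sxx = sum((u - mean_u) ** 2 for u, _ in pairs)
    return sxy / sxx

# Sample size as in Figure 5; the rho values match those of Figure 4.
slopes = {rho: regression_slope(simulate_pairs(2000, rho, seed=1))
          for rho in (0.0, 0.5, 0.95)}
```

Strictly, the slope equals the sample correlation coefficient multiplied by the ratio of the sample standard deviations of $v$ and $u$. For standard normal variates that ratio is close to 1, so the slope and the correlation coefficient agree to within sampling error, as do "slope" and "rho".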

Correlation function $R_{p,q}(r)$ expressed in Equation (33) can be evaluated in closed form, and depends only on the Pearson coefficient $r$. Table 2 lists the first six orders of the symmetric correlation functions and the most pertinent of the asymmetric correlation functions. It is seen that beyond the basic covariance (12), the higher-order symmetric correlations are increasingly nonlinear in $r$. Asymmetric correlation functions of the form $R_{p,1}$ in Table 2, where $p$ is the exponent of the scaled variable for $\ln H$, address the issue (referred to in the Introduction) concerning how weight correlates with powers of height. Also listed are asymmetric correlation functions characterizing how powers of weight correlate with powers of height.

Figure 4. Left Panels: Probability density profiles $f(u, v)$ of correlated standard normal variates $(u, v)$ for correlation coefficients $r$ = 0 (top), 0.5 (center), 0.95 (bottom). Right Panels: Profiles as viewed from the underside of the associated density plots. The patterns are the smooth shapes of scatter plots of discrete samples in the limit of infinite sample size.

Figure 5. Simulated scatter plots of 2000 correlated pairs of standard normal variates with correlation coefficients increasing from 0 to nearly 1. In each plot, “rho” is the numerical value of the correlation parameter supplied to the RNGs, and “slope” (of the line of regression) is the actual correlation coefficient produced by the simulation.

Table 2. Correlation functions $R(p, q)$ of powers of standard normal variables: $U^p$ and $V^q$.

Although the Pearson correlation coefficient $r$ is the expectation value of a composite random variable $UV$, it is regarded here as a nondistributed quantity since the values of $r$ for male and female cohorts used throughout this paper are fixed parameters extracted from the ANSUR data. The same is true of the means $(m_H, m_W)$ and variances $(s_H^2, s_W^2)$ for the male and female cohorts. The variance of the correlation coefficient $r_{p,q}$ therefore characterizes only the variation of the product $U^p V^q$ in the defining integral and can be expressed by [28]

$$\mathrm{var}(r_{p,q}) = r_{2p,2q} - r_{p,q}^2\,. \tag{34}$$

It then follows that the standard error (se) of $r_{p,q}$ takes the form

$$se(r_{p,q}) = \sqrt{\frac{\mathrm{var}(r_{p,q})}{n}} = \sqrt{\frac{r_{2p,2q} - r_{p,q}^2}{n}} \tag{35}$$

where $n$ is the sample size. By the same reasoning, the standard error of the correlation coefficient $c_{p,q}$ is

$$se(c_{p,q}) = \sqrt{\frac{\mathrm{var}(c_{p,q})}{n}} = \sqrt{\frac{c_{2p,2q} - c_{p,q}^2}{n}} \tag{36}$$

since $c_{p,q}$ likewise depends on the fixed ANSUR parameters. In general, however, the distribution function and moments of even the lowest correlation coefficient $r_{1,1}$ are difficult to obtain in closed form [29], and, at the time of writing, the author knows of no calculation in the literature of the exact closed-form distribution functions and moments of higher-order correlation coefficients of two normal or two lognormal variables.
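The standard error (35) requires the correlation function of doubled order, $R_{2p,2q}$. The sketch below evaluates it with the same binomial expansion of $V = rU + \sqrt{1 - r^2}\, Z$ described for Equation (33); this is an assumed route, not necessarily the author's. The function names and the value of $r$ are hypothetical, while the sample size 4082 is that of the ANSUR male cohort quoted earlier.

```python
import math

def _normal_moment(n):
    """<Z^n> for a standard normal Z: (n-1)!! if n is even, 0 if n is odd."""
    if n % 2:
        return 0
    return math.prod(range(n - 1, 0, -2))

def r_pq(p, q, r):
    """Eq. (33): <U^p V^q> for standard normal (U, V) with correlation r."""
    return sum(math.comb(q, k) * r ** k * (1.0 - r * r) ** ((q - k) // 2)
               * _normal_moment(p + k) * _normal_moment(q - k)
               for k in range(q + 1) if (q - k) % 2 == 0)

def se_r_pq(p, q, r, n):
    """Eq. (35): standard error of r_{p,q} for sample size n."""
    variance = r_pq(2 * p, 2 * q, r) - r_pq(p, q, r) ** 2   # Eq. (34)
    return math.sqrt(variance / n)

def relative_error(r_emp, r_thy, se):
    """z = (empirical - theoretical) / standard error."""
    return (r_emp - r_thy) / se

# r = 0.5 is a hypothetical value; n = 4082 is the male cohort size.
se_by_order = {p: se_r_pq(p, p, r=0.5, n=4082) for p in (1, 2, 3)}
```

For $p = q = 1$ the expression reduces to $\sqrt{(1 + r^2)/n}$. The standard error grows rapidly with the order, which is the practical limit on how many correlation orders a sample of fixed size can test.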

Theoretical and empirical ANSUR correlation coefficients $r_{p,q}$ are displayed in Table 3 for symmetric indices up to $p$ = 6 and for a selection of asymmetric indices ranging from (2, 1) to (6, 4). For other correlation orders, the corresponding standard errors were too large relative to the means for a comparison of theory and experiment to be meaningful. Examination of Table 3 shows striking agreement between lognormal theory and the ANSUR data.

For all but two correlation coefficients of the female cohort listed in the table, the magnitude of the difference between theoretical (thy) and empirical (emp) coefficients did not exceed 1 standard error. In other words, assuming, as justified by the Central Limit Theorem [30], that the relative error $z = \left( r_{p,q}|_{\mathrm{emp}} - r_{p,q}|_{\mathrm{thy}} \right) / se(r)$ follows a normal distribution, then rejection of the hypothesis $r_{p,q}|_{\mathrm{emp}} = r_{p,q}|_{\mathrm{thy}}$ at the conventional 5% threshold would require $|z| \geq 1.65$ [31], which was not the case for any of the correlation coefficients of the female cohort. The relative errors of the two exceptional coefficients $r_{3,2}$ and $r_{4,3}$ were approximately 1.074 and 1.243, respectively, both below this threshold.

Table 3. Correlation coefficients $r(p, q)$ of (Scaled $\ln H$)$^p$ and (Scaled $\ln W$)$^q$.

Relative errors of the male cohort were overall larger than those of the female cohort, although most did not exceed the 5% threshold value of 1.65. Exceptions occurred primarily among the higher orders of the asymmetric odd coefficients whose theoretical means were zero and standard errors large. Under such circumstances, large deviations are to be expected and would require a larger sample size for resolution. Moreover, expressions (35) and (36) give lower limits for the standard errors since, in accordance with stated assumptions, they do not take account of the variation in lognormal parameters. It is therefore likely that a more exact estimate of the relative errors would be lower than those listed in Table 3.

In summary, to appreciate how extensive and close is the agreement of the theoretical and empirical correlations displayed in Table 3, one must bear in mind the following context. Theoretical predictions of $r_{p,q}$ were obtained from the bivariate normal distribution of $\ln H$ and $\ln W$; empirical values of $r_{p,q}$ were obtained from the natural logarithms of the raw ANSUR sample. If adult heights and weights were not distributed lognormally, then the comprehensive correspondence by pure chance, especially in the female cohort, of these two sets of numbers would be extremely improbable. For example, if either or both of the attributes of height and weight were themselves normally distributed, as had long been assumed, the logarithm of the variates would depart significantly from a normal distribution, as was demonstrated in Part I [11].

Nevertheless, the results in Table 3 raise a curious question. Why does the female cohort appear to bear out the predictions of lognormal theory more closely than the male cohort despite the fact that the number of men sampled is about twice that of women? As discussed in Part I [11], the Anthropometric Survey of U.S. Army Personnel was undertaken to obtain data representative of the “Total Army” [15] with regard to making accurate decisions concerning clothing, protective equipment, workspaces, and other size-dependent, work-related matters. The survey measured more than 90 human attributes directly and compiled data demographically in terms of race, ethnicity, gender, age, and geographic location. For the analyses in this paper and in Part I, the data were partitioned by gender only. Therefore, both the male and female cohorts can be regarded as diverse samples of fundamentally healthy adults. However, since there are considerably fewer women in the U.S. Army than men, it is conceivable that, irrespective of other demographic characteristics, the women who joined the U.S. Army and took part in the ANSUR sample formed a more homogeneous group in regard to body type and physical fitness than the men. Such an explanation would seem likely, since there is no biophysical basis to believe that male height and weight would be statistically distributed by a probability function of different mathematical form than female height and weight.

3.2. Density Plots Associated with Correlation Functions $R_{p,q}$

Whereas the correlation coefficients $r_{p,q}$ are single numbers quantifying the correlation of $U^p$ and $V^q$ (i.e., powers of the scaled variables of $\ln H$ and $\ln W$) for a specified sample, the actual scatter plots of the two sets of variates yield a more comprehensive visual perspective of their correlation. Figure 6 shows such plots for symmetric correlation coefficients of orders p = 2, 3, 4, 5 of the ANSUR male cohort (sample size 4082). The patterns for the female cohort are similar, although less dense (sample size 1986), and are not shown.

The correlations expressed in Figure 6 are highly nonlinear in two ways. Geometrically, the overall patterns of order $p > 1$ do not show a well-defined linear variation such as seen in Figure 2 for $p = 1$. And algebraically, the associated theoretical correlation function is of order $r^p$ in the Pearson correlation parameter, as summarized in Table 2. The four plots in Figure 6 illustrate the characteristic property that, with increasing order p, the density of points clusters more tightly about the coordinate axes. Fluctuations for even p extend primarily into the first quadrant (since both variates are positive). Fluctuations for odd p extend primarily into the first and third quadrants for $r > 0$ (and into the second and fourth quadrants for $r < 0$; not shown).

Figure 6. Empirical scatter plots of $V^p$ against $U^p$ for p = 2 (top left), 3 (top right), 4 (bottom left), 5 (bottom right), compiled from the ANSUR data. Patterns show a highly nonlinear correlation of weight and height. U is the scaled variable for lnH; V is the scaled variable for lnW.

The empirical patterns and properties in Figure 6 are reproduced nearly identically (apart from random fluctuations) in the computer simulated patterns shown in Figure 7. The simulations were created by means of correlated lognormal RNGs using the ANSUR lognormal parameters in Table 1. The reproduction of empirical correlation scatter plots (and correlation coefficients) by computer simulation extends as well to asymmetric even and odd correlation functions, as displayed in Figure 8 for the pairs of variables $(U^2, V)$, $(U^3, V)$, and $(U^3, V^2)$. Plots in black are empirical; those in red are simulated for a population of corresponding size. Altogether, these results support the conclusion that the nonlinear correlations of height and weight stem exclusively from the properties of the lognormal distribution function and depend on no correlation parameters other than the Pearson coefficient r.

Probability density functions for the correlated powers $(U^p, V^q)$, which yield the patterns approached by scatter plots in Figures 6-8 in the limit of infinite sample size, can be constructed by means of the Dirac delta function as follows

$$f_{X,Y}^{(p,q)}(x, y) = \iint f_{U,V}(u, v)\, \delta(u^p - x)\, \delta(v^q - y)\, du\, dv. \tag{37}$$

Figure 7. Simulated scatter plots of $V^p$ against $U^p$ for p = 2 (top left), 3 (top right), 4 (bottom left), 5 (bottom right), obtained from correlated lognormal RNGs using the lognormal parameters in Table 1. The shapes of the patterns and extent of fluctuations closely resemble the corresponding empirical plots in Figure 6.

Figure 8. Empirical (black) and simulated (red) scatter plots of $V^q$ against $U^p$ for the asymmetric orders (p, q) = (2, 1) (top), (3, 1) (middle), (3, 2) (bottom).

The subscripts $(X, Y)$ represent the random variables whose lower-case variates are respectively $x = u^p$, $y = v^q$. Powers of standard normal variables like U, V do not, in general, follow known, named distributions to which variables X and Y can be assigned [32]. A brief recapitulation of the properties and identities of the Dirac delta function is given in Part I [11] and in mathematical physics books [33]. The integral (37) can be evaluated in closed form, but gives rise to long, cumbersome expressions for $p > 2$, which will not be reproduced here.

Plots of probability density (37) for the symmetric cases p = 2 and p = 3 are shown respectively in Figure 9 and Figure 10. Left-side panels show the density patterns from above the $(u^p, v^p)$ plane; right-side panels give complementary images from below. The function $f_{X,Y}^{(2)}(u^2, v^2)$ in Figure 9 shows a concentration of probability along the vertical axis with density decreasing with distance into the first quadrant, as shown empirically in Figure 7 (top left). Function $f_{X,Y}^{(3)}(u^3, v^3)$ in Figure 10 (left side) sharply delineates the “cross” of probability density along the coordinate axes, whereas the projection of the pattern onto the $(u^3, v^3)$ plane (right side) captures the point distribution in the corresponding plot of Figure 7 (top right).

Figure 9. Left panel: Plot of the probability density $f_{X,Y}^{(2)}(u^2, v^2)$ for r = 0.5. Right panel: View from the underside highlights the profile that the scatter plot of $V^2$ against $U^2$ in the top left of Figure 6 approaches in the limit of infinite sample size.

Figure 10. Left panel: Plot of the probability density $f_{X,Y}^{(3)}(u^3, v^3)$ for r = 0.5. Right panel: View from the underside shows the profile approached by the scatter plot of $V^3$ against $U^3$ in the top right of Figure 6 in the limit of infinite sample size.

3.3. Calculation and Measurement of Correlation Functions $C_{p,q}$

The function $C_{p,q}(s_H, s_W, r)$ in Equation (32) expresses the correlation of height and weight directly, rather than of their logarithms. The integral in Equation (32) can be evaluated in closed form; the lowest orders pertinent to this paper are given in Table 4 for two arbitrary, but correlated, lognormal variables $X_1$ and $X_2$ with bivariate parameter set $(m_1, m_2, s_1, s_2, r)$. Beyond the symmetric order $p = 4$ and asymmetric order $(p, q) = (4, 2)$, expressions for the functions become overly long and not especially informative. Some points to note: 1) symmetric functions $C_p(s_H, s_W, r)$ are invariant under the interchange of parameters $s_H$ and $s_W$; 2) asymmetric odd functions $C_{p,q}$ do not identically vanish, as do asymmetric odd $R_{p,q}$; 3) none of the functions $C_{p,q}$ vanishes for r = 0, as do the functions $R_{p,q}$ (except those of the form $R_{2m,2n}$ where m, n are integers > 0).

Substitution of the ANSUR parameters of Table 1 for male and female cohorts into the functions of Table 4 and variance of Equation (36) yield the corresponding empirical correlation coefficients summarized in Table 5. Agreement of the empirical values with lognormal theory is again excellent, apart from the highest orders where the standard errors are large relative to the means and signify that a larger sample size is required.

4. Test for Nonlinear Correlations by the Method of Distance Correlation

If adult human height and weight are bivariate lognormal variables, then all measures of their correlation must be calculable from the PDF (17). Evidence for the proposition of bivariate lognormality has been supported up to this point by the agreement of measured and lognormally predicted correlation coefficients and comparison of empirical and lognormally simulated probability density plots. The question remains, however, as to whether there may be nonlinear correlations beyond those intrinsic to the bivariate lognormal distribution. Correlation of distances [12] provides a sensitive method for testing the independence of random vectors.

As initially presented by its developers, the distance covariance (dCov) of two random vectors X and Y, defined by

$$\mathcal{V}(X, Y) \equiv \left\| g_{X,Y}(t, s) - g_X(t)\, g_Y(s) \right\|, \tag{38}$$

is a measure of the difference between the joint characteristic function (CF) [28]

$$g_{X,Y}(t, s) = \iint f_{X,Y}(x, y) \exp\left( i (t x + s y) \right) dx\, dy \tag{39}$$

and the product of the marginal CFs

$$g_X(t) = \int f_X(x) \exp(i t x)\, dx, \qquad g_Y(s) = \int f_Y(y) \exp(i s y)\, dy. \tag{40}$$

Table 4. Correlation functions C(p, q) of lognormal variables $X_1^p$ and $X_2^q$ (Notation: $E(x) \equiv \exp(x)$).

Table 5. Correlation coefficients c(p, q) of (Scaled H)$^p$ and (Scaled W)$^q$.

As indicated in Equations (39) and (40), the CF $g_Z$ is the Fourier transform of the corresponding probability density $f_Z$ of some specific random variable or set of random variables $Z$.

The actual evaluation of $\mathcal{V}(X, Y)$ in Equation (38), together with its properties and associated theorems, is given in Ref. [12]. For the purposes of this paper, suffice it to say that $\mathcal{V}(X, Y)$ involves a weighted integral of $|g_{X,Y}(t, s) - g_X(t)\, g_Y(s)|^2$ over the Fourier coordinates t, s. The associated quantity of distance variance (dVar), given by $\mathcal{V}(X, X)$, would then be the same weighted integral over $|g_{X,X}(t, s) - g_X(t)\, g_X(s)|^2$, with analogous expressions for $\mathcal{V}(Y, Y)$. The distance correlation (dCor), expressed by $\mathcal{R}(X, Y)$, is then defined in terms of dCov and dVar by the relation

$$\mathcal{R}(X, Y) \equiv \frac{|\mathcal{V}(X, Y)|}{\sqrt{|\mathcal{V}(X, X)|\, |\mathcal{V}(Y, Y)|}}. \tag{41}$$

Equation (41) resembles in form Equation (12) or Equation (14) for the Pearson correlation coefficient, but its properties are significantly different, as well as its empirical evaluation. The author is unaware of any theoretical derivations of the probability density function or statistical moments of R ( X , Y ) . However, if X and Y are standard normal variables, then R ( X , Y ) has been evaluated in the following closed form [12]

$$\mathcal{R}^{(N,N)}(X, Y) = \sqrt{\frac{\rho \arcsin(\rho) + \sqrt{1 - \rho^2} - \rho \arcsin(\rho/2) - \sqrt{4 - \rho^2} + 1}{1 + (\pi/3) - \sqrt{3}}} \tag{42}$$

where the superscript (N, N) signifies the special case of bivariate standard normality, and ρ is the associated Pearson correlation coefficient.
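As a rough numerical illustration of Equation (42), the closed form can be coded in a few lines. The sketch below assumes the squared-form result of Ref. [12] with the outer square root applied; the function name `dcor_bivariate_normal` is hypothetical and introduced here only for illustration.

```python
import math


def dcor_bivariate_normal(rho: float) -> float:
    """Closed-form dCor of two standard normal variables, after Eq. (42).

    Assumption: Eq. (42) is the square root of the ratio given in Ref. [12].
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    numerator = (
        rho * math.asin(rho)
        + math.sqrt(1.0 - rho**2)
        - rho * math.asin(rho / 2.0)
        - math.sqrt(4.0 - rho**2)
        + 1.0
    )
    denominator = 1.0 + math.pi / 3.0 - math.sqrt(3.0)
    # Clip tiny negative rounding residue before taking the square root.
    return math.sqrt(max(numerator, 0.0) / denominator)


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rho, dcor_bivariate_normal(rho))
```

The expression vanishes at ρ = 0 and equals 1 at ρ = 1, which provides a quick sanity check on any transcription of the formula.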

4.1. Statistical Application of Distance Correlation (dCor) to Height and Weight

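The procedure for applying dCor to a statistical system is as follows: Given samples $(x_i, y_i)$ for $i = 1, \ldots, n$ of two random variables X and Y, construct the statistic

$$A_{k,l} = a_{k,l} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}, \tag{43}$$

where $(k, l = 1, \ldots, n)$ and

$$a_{k,l} = |x_k - x_l|, \quad \bar{a}_{k\cdot} = \frac{1}{n} \sum_{l=1}^{n} a_{k,l}, \quad \bar{a}_{\cdot l} = \frac{1}{n} \sum_{k=1}^{n} a_{k,l}, \quad \bar{a}_{\cdot\cdot} = \frac{1}{n^2} \sum_{k,l=1}^{n} a_{k,l} \tag{44}$$

and the associated statistic

$$B_{k,l} = b_{k,l} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot} \tag{45}$$

where

$$b_{k,l} = |y_k - y_l|, \quad \bar{b}_{k\cdot} = \frac{1}{n} \sum_{l=1}^{n} b_{k,l}, \quad \bar{b}_{\cdot l} = \frac{1}{n} \sum_{k=1}^{n} b_{k,l}, \quad \bar{b}_{\cdot\cdot} = \frac{1}{n^2} \sum_{k,l=1}^{n} b_{k,l}. \tag{46}$$

The squares of the empirical dCov and dVar are then given by

$$\mathcal{V}_n^2(X, Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{k,l} B_{k,l} \tag{47}$$

$$\mathcal{V}_n^2(X, X) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{k,l}^2, \qquad \mathcal{V}_n^2(Y, Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} B_{k,l}^2 \tag{48}$$

from which follows the square of the empirical dCor

$$\mathcal{R}_n^2(X, Y) = \frac{\mathcal{V}_n^2(X, Y)}{\sqrt{\mathcal{V}_n^2(X, X)\, \mathcal{V}_n^2(Y, Y)}} \tag{49}$$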

corresponding to Equation (41). The statistic $\mathcal{R}_n(X, Y)$ 1) ranges between 0 and 1, 2) is 0 only if X and Y are independent, and 3) approaches the theoretical dCor $\mathcal{R}(X, Y)$ in the limit of infinite sample size n [12].
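The construction in Equations (43) to (49) can be sketched directly in code. The following is a minimal pure-Python illustration for scalar samples, not the implementation used for the results of this paper; the function names are hypothetical, and the cost grows as the square of the sample size.

```python
import math


def centered_distances(values):
    """Double-centered distance matrix A_kl of Eq. (43)."""
    n = len(values)
    a = [[abs(vk - vl) for vl in values] for vk in values]  # a_kl, Eq. (44)
    row_mean = [sum(row) / n for row in a]  # equals the column means by symmetry
    grand_mean = sum(row_mean) / n
    return [
        [a[k][l] - row_mean[k] - row_mean[l] + grand_mean for l in range(n)]
        for k in range(n)
    ]


def distance_correlation(x, y):
    """Empirical dCor R_n(X, Y): square root of Eq. (49)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    A = centered_distances(x)
    B = centered_distances(y)
    pairs = [(k, l) for k in range(n) for l in range(n)]
    dcov2 = sum(A[k][l] * B[k][l] for k, l in pairs) / n**2  # Eq. (47)
    dvar2_x = sum(A[k][l] ** 2 for k, l in pairs) / n**2     # Eq. (48)
    dvar2_y = sum(B[k][l] ** 2 for k, l in pairs) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # one sample is constant; dCor is taken as 0 by convention
    return math.sqrt(max(dcov2, 0.0) / denom)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(distance_correlation(x, [2.0 * v + 1.0 for v in x]))
    print(distance_correlation(x, [v * v for v in x]))
```

An exactly linear relation between the two samples gives a dCor of 1, because the centered distance matrices are then proportional to each other.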

4.2. Distance Correlation Test of the ANSUR Data

In the sequence of analyses of the ANSUR data to follow, the pair of variables $(X, Y)$ was taken to be the scaled sets 1) $(H, W)$, 2) $(\ln H, \ln W)$, and 3) $(\ln H, (\ln W)_{\mathrm{red}})$. It is to be recalled that a “scaled” variable is in dimensionless form of zero mean and unit variance. The subscript “red” in set (3) indicates that the scaled variates of $\ln W$ were reduced by subtraction of the regression of $\ln W$ on $\ln H$. The reason for this reduction and the way it was implemented will be clarified shortly.

The algorithm (Equations (43) to (48)) leading to Equation (49) is straightforward to implement by computer. However, for sample sizes on the order of thousands, the computation time is impractically long. To circumvent this difficulty, a sampling procedure analogous to bootstrapping [34] [35] was employed.

Each bivariate pair $(h_i, w_i)$ of height and weight in the ANSUR data set is labeled by an index $i$, referred to here as the “participant number”, that ranges from 1 to the full sample size n, which is $n_m = 4082$ for the male cohort and $n_f = 1986$ for the female cohort. Participants in the survey were apparently measured and recorded in random order, as shown in Figure 11, which displays scatter plots by gender of the scaled height and weight vs participant number. Although histograms of the variates, analyzed in Part I [11], are precisely matched by lognormal distributions, the point density plots in Figure 11 are well represented by uniform distributions across the entire range of participants. In other words, each vertical slice of points of sufficient width contains a statistical spread of variates equivalent to any other vertical slice of the same width. Given this uniform density, the ranges $n_m$ and $n_f$ were respectively partitioned into 20 and 10 subgroups of 200 participants each, as shown in Figure 12. Distance correlation of height and weight was then evaluated for 50 consecutive participants in each subgroup, starting at the participant numbers marked by diamond plotting symbols in the figure.

For example, $dCor_1$ was calculated from participants [200 to 249], $dCor_2$ from participants [400 to 449], and so on up to $dCor_{20}$ from participants [4000 to 4049] for males and up to $dCor_9$ from participants [1800 to 1849] for females. As with standard bootstrapping, this method of calculating dCor by repeated sampling not only circumvented what otherwise would have been an excessively long computer calculation, but it also provided a vector of dCor values from which to estimate dCor uncertainty in the absence of a known statistical distribution.
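The block-sampling scheme just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions: participant numbers are taken as 1-based, blocks open at multiples of 200 and span 50 participants, and the helper names (`block_starts`, `resampled_dcor`) are hypothetical. The demonstration data are simulated correlated normal variates, not ANSUR measurements.

```python
import math
import random
import statistics


def distance_correlation(x, y):
    """Sample dCor following Eqs. (43) to (49) for equal-length sequences."""
    n = len(x)

    def centered(values):
        d = [[abs(p - q) for q in values] for p in values]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)]
                for k in range(n)]

    A, B = centered(x), centered(y)
    idx = [(k, l) for k in range(n) for l in range(n)]
    dcov2 = sum(A[k][l] * B[k][l] for k, l in idx) / n**2
    dvar2_x = sum(A[k][l] ** 2 for k, l in idx) / n**2
    dvar2_y = sum(B[k][l] ** 2 for k, l in idx) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0.0 else 0.0


def block_starts(n_total, spacing=200, width=50):
    """1-based participant numbers opening each block: 200, 400, ..."""
    return [s for s in range(spacing, n_total + 1, spacing)
            if s + width - 1 <= n_total]


def resampled_dcor(x, y, spacing=200, width=50):
    """dCor evaluated on each block of `width` consecutive participants."""
    values = []
    for start in block_starts(len(x), spacing, width):
        lo = start - 1  # convert participant number to a 0-based index
        values.append(distance_correlation(x[lo:lo + width], y[lo:lo + width]))
    return values


if __name__ == "__main__":
    rng = random.Random(1)  # simulated stand-in for scaled (ln H, ln W)
    n, rho = 1986, 0.5
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v = [rho * ui + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0) for ui in u]
    d = resampled_dcor(u, v)
    print(len(d), statistics.fmean(d), statistics.stdev(d))
```

With these conventions the block counts come out as 20 for a cohort of 4082 and 9 for a cohort of 1986, matching the indices quoted in the text. How the spread of the block values is summarized (standard deviation or standard error) is a choice left open in this sketch.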

The results of the analyses of distance correlation are summarized in Table 6.

Section I of the table records the distance correlation of the scaled variables H and W. Particularly striking is the close agreement between the empirical values obtained from the ANSUR sample (columns 2 and 3 for male and female cohorts, respectively) and values created by computer simulation using a bivariate lognormal RNG (columns 4 and 5 for the closely corresponding sample sizes 4000 and 2000, respectively). Numerical values designated by “rho” at the top of columns 4 and 5 are the correlation parameters supplied to the RNG. These values (from Table 1) correspond to the empirical Pearson correlation coefficients of the variables $(\ln H, \ln W)$. The empirical Pearson correlation coefficients ρ (columns 2 and 3) of the variables $(H, W)$ are closely matched by the corresponding values (columns 4 and 5) obtained by computer simulation. In short, the dCor values of Section I are consistent with attributing the entire correlation of height and weight, including any nonlinear contributions, to the bivariate lognormal distribution.

Figure 11. Plots of height (left panels) and weight (right panels) of individual participants in the ANSUR sample: male cohort (top panels); female cohort (bottom panels).

Figure 12. Illustration of the resampling strategy for calculation of distance correlation of participants’ height and weight in the ANSUR population. Diamond plotting symbols mark participant numbers which begin each subgroup of 50 participants to be sampled over the range (a) 200 to 4000 (male cohort), (b) 200 to 1850 (female cohort).

Table 6. Test of nonlinear correlations by means of distance correlation: dCor$(X, Y)$.

Section II of the table records distance correlation of the scaled variables $(\ln H, \ln W)$. Computer simulated populations of sizes approximating the male and female ANSUR cohorts were generated by use of bivariate normal RNGs. Agreement between empirical and corresponding computer simulated values is again very close, especially for the female cohort. The outcome indicates that the full correlation of $\ln H$ and $\ln W$ is attributable to the bivariate normal distribution which, in turn, derives from the parent lognormal distribution function.

The distance correlation procedure tests random vectors for independence irrespective of their specific distributions, provided the first moments are finite [13]. It is a nonparametric test that can reveal correlations even when the Pearson correlation coefficient is null. However, if the Pearson correlation of two jointly normal vectors is null, then those vectors are fully independent, i.e., there is no latent nonlinear correlation. Section III of the table exploits this point to ascertain whether the correlation between height and weight exhibited in Figure 1 and Figure 2 has a nonlinear contribution not attributable to the parent lognormal or derived normal distributions.

The basic idea is to subtract from the ordinate of each point in the scatter plots in Figure 2 the corresponding ordinate of the line of regression. For correlations of scaled standard normal variables $(X, Y)$, the line of regression takes the simple form $y = r x$, where r is the slope of the line and is equal to the Pearson correlation coefficient. A scatter plot is then made of the reduced scaled log weight

$$(\ln W)_{\mathrm{red}} \equiv (\ln W - m_W)/s_W - r\, (\ln H - m_H)/s_H \tag{50}$$

against the scaled log height $(\ln H - m_H)/s_H$, as shown in Figure 13. The scatter plots (black points) for male (left) and female (right) ANSUR cohorts exhibit the same isotropic patterns (apart from fluctuations) as the simulated scatter plot for null correlation (rho = 0) in Figure 5. Quantitatively, the slopes of the lines of regression (dashed red) of the reduced plots are respectively $4.1118 \times 10^{-11}$ (male cohort) and $4.1011 \times 10^{-11}$ (female cohort). In other words, removal of the lines of regression from the empirical scatter plots of scaled $(\ln H, \ln W)$ has resulted in two normal random vectors of null Pearson correlation coefficient, and therefore presumably statistically independent. For comparison, the reduced scatterplots (black points) in Figure 13 are superposed on the original scatterplots (cyan points) of Figure 2 with their lines of regression (dashed blue).

Figure 13. Scatter pattern (black points) and associated line of regression (dashed red) of zero slope signifying a null Pearson correlation of log weight and log height when the log weight variates were reduced by corresponding values of the line of regression (dashed blue) of the original scatter plot (cyan points).
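The reduction of Equation (50) can be illustrated with a short sketch. It assumes population (divide-by-n) standardization and uses simulated log variates with placeholder parameters that are not the ANSUR values of Table 1; the function names are hypothetical.

```python
import math
import random
import statistics


def standardize(values):
    """Scale to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu_u, mu_v = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def reduce_log_weight(log_h, log_w):
    """Return scaled ln H, reduced scaled ln W of Eq. (50), and r."""
    u = standardize(log_h)
    v = standardize(log_w)
    r = pearson(u, v)
    v_red = [b - r * a for a, b in zip(u, v)]
    return u, v_red, r


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder parameters chosen for illustration only.
    z1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    log_h = [7.47 + 0.04 * a for a in z1]
    log_w = [4.40 + 0.17 * (0.5 * a + math.sqrt(0.75) * b) for a, b in zip(z1, z2)]
    u, v_red, r = reduce_log_weight(log_h, log_w)
    print(r, pearson(u, v_red))
```

Because the slope removed is the sample Pearson coefficient itself, the residual correlation vanishes identically up to floating-point rounding, which is why the regression slopes of the reduced plots are numerically negligible rather than merely small.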

The dCor values in Section III provide quantitative confirmation of the statistical independence of height and weight upon removal of the Pearson linear correlation. In subsection A, based on repeated sampling of the entire population in samples of size 50, the empirical dCor values (approximately 0.25) agree closely with dCor values (approximately 0.23 to 0.24) produced by two independent standard normal RNGs. The small deviations are attributable in part to the fact that the resulting Pearson correlation coefficients of the simulated populations were not precisely 0, but in the range 0.01 to 0.02, a consequence of the fluctuations intrinsic to finite sampling.

Potentially problematic is the discrepancy between the empirical (as well as simulated) dCor values obtained by repeated sampling and the values predicted by Equation (42), which should be close to 0 for two independent normal random variables. However, Equation (42) is strictly valid only in the limit of an infinite population. Subsection B, based on single sampling of a much larger subpopulation of 1000 participants, shows that empirical dCor values dropped to approximately 0.06, in much closer agreement with Equation (42). Evaluation of the distance correlation of a sample of 1000 required computation times longer than 8 hours. Thus, to test rigorously whether dCor approaches 0 asymptotically as a function of sample size would require impractically long computation times.

Altogether, the three sections of Table 6 consistently support the conclusion that the observed correlation between height and weight can be accounted for entirely by a bivariate lognormal distribution. In other words, the five parameters defining the bivariate lognormal distribution of height and weight suffice to predict any measurable function or test of the correlation of height and weight of a healthy adult human population.

5. Marginal Statistics of ANSUR Height and Weight Data

Previous sections concentrated on the bivariate lognormal correlation of height and weight. This section examines the marginal statistics of H and W, which are predicted to follow the respective univariate lognormal distributions $\Lambda(m_H, s_H^2)$ and $\Lambda(m_W, s_W^2)$, and of lnH and lnW, which are predicted to follow the respective univariate normal distributions $N(m_H, s_H^2)$ and $N(m_W, s_W^2)$, as discussed in Section 1.1.

Table 7 summarizes the outcomes of chi-square goodness-of-fit tests of the four variables H, W, lnH, lnW to their respective distributions, identified explicitly in column 3. As a reminder, the chi-square statistic $\chi_\nu^2$, in column 7 of the table, is determined empirically from the relation

$$(\chi_\nu^2)_{\mathrm{emp}} = \frac{1}{\kappa} \sum_{i=1}^{\kappa} \frac{(O_i - E_i)^2}{E_i} \tag{51}$$

where κ is the number of test categories (bins), $O_i$ is the observed value in the ith bin, and $E_i$ is the expected value in the ith bin. The subscript ν (Greek nu) is the number of degrees of freedom (d.o.f.) in column 5, equal to $\kappa - 1$. The chi-square tests were implemented with the Maple Statistics Package, which determined the number of bins as the integer closest to the square root of the sample size. The critical values in column 4 of the table are the values of $\chi_\nu^2$ resulting in P-values of 5%, which is the conventional threshold of statistical significance; i.e., a tested hypothesis is deemed unsupported if the P-value is below threshold. A P-value is the probability of obtaining a test result at least as extreme as the observed result $\chi_{\mathrm{obs}}^2$, and is calculated from the expression

$$P = \int_{\chi_{\mathrm{obs}}^2}^{\infty} p_{\chi_\nu^2}(z)\, dz \tag{52}$$

with chi-square PDF

$$p_{\chi_\nu^2}(z) = \frac{z^{\frac{1}{2}\nu - 1}\, e^{-z/2}}{2^{\nu/2}\, \Gamma(\nu/2)}. \tag{53}$$

Table 7. Chi-square tests of goodness of fit of ANSUR height and weight.

The outcomes summarized in the table show that all 8 propositions (the distributions of 4 variables of 2 genders) passed their respective chi-square tests with P-values far above threshold. This means that the propositions cannot be rejected on the basis of these tests. It does not necessarily mean, however, that the propositions are true.

For further confirmation, consider again the information in Table 1. In the first section of the table, empirical values of the mean, standard deviation, skewness, and kurtosis of ANSUR heights H and weights W are compared with corresponding values predicted by lognormal expressions (7) to (10), based on the parameters in the second section of the table, derived from the variates of lnH and lnW. Agreement of experiment and theory is seen to be within 1 standard error (se) in most cases. Table 1 employed the following published estimators of the standard errors for skewness and kurtosis [36]

$$se(n)_{Sk} = \sqrt{\frac{6 n (n - 1)}{(n - 2)(n + 1)(n + 3)}} \tag{54}$$

$$se(n)_{K} = 2 \sqrt{\frac{6 n (n - 1)^2}{(n - 2)(n + 5)(n^2 - 9)}}. \tag{55}$$

In the second section of Table 1 the empirical skewness values of both lnH and lnW for both cohorts are close to zero, as expected for normal variables. Likewise, the empirical kurtosis is very close to 3, as expected for normal variables. Skewness is a measure of the asymmetry of a distribution about the mean. Kurtosis (from the Greek root for “bulging”) is a measure of the curving or arching of the tails of a distribution; in other words, kurtosis is an indicator of the extent of outliers, relative to the normal distribution.
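For orientation, the estimators (54) and (55) are easily evaluated for the two cohort sizes. The sketch below is a direct transcription of those two expressions; the function names are hypothetical.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness, Eq. (54)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample kurtosis, Eq. (55)."""
    return 2.0 * math.sqrt(
        6.0 * n * (n - 1) ** 2 / ((n - 2) * (n + 5) * (n**2 - 9))
    )


if __name__ == "__main__":
    for label, n in (("male", 4082), ("female", 1986)):
        print(label, n, se_skewness(n), se_kurtosis(n))
```

For samples of this size the two expressions are close to the familiar large-n limits $\sqrt{6/n}$ and $\sqrt{24/n}$.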

Ordinarily, standardized statistical moments employed in physical science and medicine include at most only the first four orders (mean, variance, skewness, kurtosis). Beyond these, higher standardized moments, such as “hyperskewness” and “hyperkurtosis” [37], are rarely used in the author’s experience, presumably because they are less readily interpretable and have greater measures of uncertainty for a given sample size. Nevertheless, in testing a proposed statistical distribution, it is useful to examine these higher moments, particularly if the validity of all lower moments has been confirmed.

In the terminology and notation of this paper, the hyperstatistic S p ( X ) of the random variable X is the pth standardized central moment defined by the relation

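$$S_p(X) \equiv \frac{\left\langle (X - \mu_X)^p \right\rangle}{\left\langle (X - \mu_X)^2 \right\rangle^{p/2}} \equiv \frac{\mu_p(X)}{\mu_2^{p/2}(X)} \tag{56}$$

where $\mu_X$ is the mean of X, and $\mu_p(X)$ is the mean of the random variable

$$m_p(X) \equiv (X - \mu_X)^p \tag{57}$$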

referred to as the pth central moment. To simplify symbolic notation in the ensuing text, the argument (X) will be omitted whenever the context is clear.

Substitution in relation (56) of the univariate lognormal PDF, mean, and variance leads to the operational expression

$$S_p(X) = \frac{\int \left( e^{s u} - e^{\frac{1}{2} s^2} \right)^p p_X^{\Lambda}(u)\, du}{\left( e^{2 s^2} - e^{s^2} \right)^{p/2}} \tag{58}$$

where s is the standard deviation of the variable lnX. Skewness and kurtosis correspond respectively to p = 3, 4. Table 8 lists the theoretical expressions for hyperstatistics of order 1 through 6 derived from relation (58).

Table 8. Theoretical hyperstatistics of univariate lognormal distribution. (Notation: $E(x) \equiv \exp(x)$).
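Theoretical hyperstatistics of a lognormal variable can also be generated numerically without the closed forms of Table 8. The sketch below is an alternative route, not the derivation used in this paper: it assumes the standard raw moments $E[X^k] = \exp(k m + k^2 s^2/2)$ of a lognormal variable and expands the central moment binomially. Function names are hypothetical.

```python
import math


def lognormal_central_moment(p: int, s: float, m: float = 0.0) -> float:
    """p-th central moment of a lognormal variable with log-mean m, log-sd s.

    Binomial expansion of E[(X - mu)^p] in raw moments E[X^k].
    Note: alternating terms cancel strongly when s is small, so accuracy
    degrades for high p combined with very small s.
    """
    mean = math.exp(m + 0.5 * s * s)
    total = 0.0
    for k in range(p + 1):
        raw_k = math.exp(k * m + 0.5 * k * k * s * s)  # E[X^k]
        total += math.comb(p, k) * raw_k * (-mean) ** (p - k)
    return total


def lognormal_hyperstatistic(p: int, s: float) -> float:
    """Standardized central moment S_p of Eq. (56); independent of m."""
    mu_2 = lognormal_central_moment(2, s)
    return lognormal_central_moment(p, s) / mu_2 ** (p / 2.0)


if __name__ == "__main__":
    s = 0.17  # placeholder log-sd for illustration, not a Table 1 value
    for p in range(1, 7):
        print(p, lognormal_hyperstatistic(p, s))
```

The orders p = 3 and p = 4 can be checked against the well-known lognormal skewness $(e^{s^2} + 2)\sqrt{e^{s^2} - 1}$ and kurtosis $e^{4s^2} + 2e^{3s^2} + 3e^{2s^2} - 3$.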

Table 9 summarizes the empirical results for hyperstatistics of orders p = 5, 6 for the variables H and W of the ANSUR population. The empirical entries (column 3) are the sample statistics

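$$S_p(X)_{\mathrm{emp}} = \frac{\overline{m_p}}{\left( \overline{m_2} \right)^{p/2}} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^p \Bigg/ \left( \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^{p/2} \tag{59}$$

where $\bar{x}$ is the sample mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{60}$$

and the numerator

$$\overline{m_p} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^p \tag{61}$$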

is the expectation of the sample pth central moment (57). Theoretical entries in column 4 were calculated from Equation (58). Overall, agreement between theory and experiment appears reasonably close, but several statistics show what may be significant deviations. To ascertain whether any deviation between theory and experiment is statistically significant requires knowing the standard error of the mean statistic, but the author is unaware of any published expressions for the distributions or standard errors of hyperskewness and hyperkurtosis. To estimate the pertinent standard errors, three independent approaches were taken.
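The sample statistic of Equations (59) to (61) amounts to only a few lines of code. The following is a minimal sketch with a hypothetical function name.

```python
def sample_hyperstatistic(values, p):
    """Sample standardized central moment of order p, Eq. (59)."""
    n = len(values)
    mean = sum(values) / n                            # Eq. (60)
    m_p = sum((x - mean) ** p for x in values) / n    # Eq. (61)
    m_2 = sum((x - mean) ** 2 for x in values) / n
    return m_p / m_2 ** (p / 2.0)


if __name__ == "__main__":
    data = [-2.0, -1.0, 0.0, 1.0, 2.0]
    for p in (2, 3, 4, 5, 6):
        print(p, sample_hyperstatistic(data, p))
```

Both moments use the divisor n, as in Equation (61), so the statistic is exactly 1 for p = 2 and vanishes for odd p when the sample is symmetric about its mean.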

The first approach was to use the approximations of error propagation theory [38] together with expressions for variance and covariance of central moments in Chapter 10 of Ref. [28] to derive an estimate of the variance of hyperstatistic $S_p(X)$

$$\mathrm{var}(S_p) = \frac{\mathrm{var}(m_p)}{\mu_2^{p}} + \frac{p^2 \mu_p^2}{4 \mu_2^{p+2}}\, \mathrm{var}(m_2) - \frac{p\, \mu_p}{\mu_2^{p+1}}\, \mathrm{cov}(m_p, m_2) \tag{62}$$

in which

$$\mathrm{var}(m_p) = \mu_{2p} - \mu_p^2 + p^2 \mu_2\, \mu_{p-1}^2 - 2 p\, \mu_{p-1}\, \mu_{p+1} \tag{63}$$

and

$$\mathrm{cov}(m_p, m_q) = \mu_{p+q} - \mu_p \mu_q + p q\, \mu_2\, \mu_{p-1}\, \mu_{q-1} - p\, \mu_{p-1}\, \mu_{q+1} - q\, \mu_{p+1}\, \mu_{q-1}. \tag{64}$$

The standard error is then

$$se(S_p) = \sqrt{\frac{\mathrm{var}(S_p)}{n}}. \tag{65}$$

Table 9. Test of hyperstatistics of height and weight of ANSUR population.

Theoretical evaluations of Equation (65) by means of the univariate lognormal PDF are recorded in column 4.

The second approach was to evaluate the means of all moments in Equation (62) by their sample expectations given by Equation (61). The resulting empirical standard errors are recorded in column 3.
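The error-propagation estimate of Equations (62) to (65) can be sketched as a function of any supplier of central moments, whether theoretical (first approach) or sample-based (second approach). The sketch assumes a single moment order p throughout Equation (62); the helper names are hypothetical.

```python
import math


def var_central_moment(p, mu):
    """Eq. (63); mu(k) returns the k-th central moment."""
    return (mu(2 * p) - mu(p) ** 2 + p * p * mu(2) * mu(p - 1) ** 2
            - 2 * p * mu(p - 1) * mu(p + 1))


def cov_central_moments(p, q, mu):
    """Eq. (64)."""
    return (mu(p + q) - mu(p) * mu(q) + p * q * mu(2) * mu(p - 1) * mu(q - 1)
            - p * mu(p - 1) * mu(q + 1) - q * mu(p + 1) * mu(q - 1))


def var_hyperstatistic(p, mu):
    """Eq. (62): first-order error propagation for S_p = m_p / m_2^(p/2)."""
    mu_2 = mu(2)
    return (var_central_moment(p, mu) / mu_2**p
            + p * p * mu(p) ** 2 / (4.0 * mu_2 ** (p + 2)) * var_central_moment(2, mu)
            - p * mu(p) / mu_2 ** (p + 1) * cov_central_moments(p, 2, mu))


def se_hyperstatistic(p, mu, n):
    """Eq. (65)."""
    return math.sqrt(var_hyperstatistic(p, mu) / n)


def sample_moments(values):
    """Return mu(k) built from sample central moments, as in Eq. (61)."""
    n = len(values)
    mean = sum(values) / n
    return lambda k: sum((x - mean) ** k for x in values) / n


def normal_central_moment(k):
    """Central moments of a standard normal variable: 0 (odd k), (k-1)!! (even k)."""
    return 0.0 if k % 2 else float(math.prod(range(1, k, 2)))


if __name__ == "__main__":
    print(var_hyperstatistic(3, normal_central_moment))
    print(var_hyperstatistic(4, normal_central_moment))
```

For a standard normal variable the expressions reduce to the textbook large-sample variances 6/n for skewness and 24/n for kurtosis, which serves as a check on the transcription.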

In the third approach three independent simulations of the male and female ANSUR populations were made with correlated lognormal RNGs representing the variables H and W, from which empirical values of the individual hyperstatistics were obtained. The three values for each hyperstatistic (p = 5, 6) of the two variables (H, W) for the two genders (M, F) are listed in column 5 in the cells for the mean of $S_p(X)$. The sample standard error of each set of 3 independent means was calculated from the relation

$$se(S_p(X)) = \sqrt{\frac{1}{n (n - 1)} \sum_{k=1}^{3} \left( S_p(X)_k - \overline{S_p(X)} \right)^2} \tag{66}$$

and recorded as the first entry in the cells for standard error in column 5. Note that n in Equation (66) is 3 and not the ANSUR population size of 4082 males or 1986 females. Also, because a sample size of 3 is statistically small, one uses the unbiased sample estimate of variance in Equation (66) in which the denominator is $n - 1$ [39]. The second entry (separated from the first by a vertical bar) is the difference between the largest and smallest of the three estimated means of each $S_p(X)$.

In examining the three sets of estimated standard errors, one sees that they are approximately of the same magnitude, and that the empirical values of the hyperstatistics agree with theory within $\pm 2\, se$ for at least one of the three estimates. However, one glaring exception is the value of $se(S_6(W)_M)$ for the male cohort obtained by simulation, which is much larger than the other standard errors for the same statistic. This occurs because of what appears to be an exceptionally high value (~38) of the mean of $S_6(W)_M$ returned by one of the simulations. This occurrence raises an important conceptual issue that calls for caution when estimating standard errors under conditions where the exact statistical distribution is unknown.

As pointed out in Chapter 10 of Ref. [28], the approximations based on error propagation theory (or some variant thereof) give a valid measure of precision provided that the distribution of the statistic approaches normality in the limit of a sufficiently large sample. This is not the case for higher orders of the hyperstatistics. The mere fact that a mean value of the statistic $S_6(W)_M$ can arise within just 3 simulations that is 6 times the theoretical and 17 times the empirical standard errors of error propagation theory shows that such outlying values occur with much higher probabilities than would be predicted by a normal distribution. Under such circumstances, the appropriate way to proceed is to determine whether the empirical mean values of the hyperstatistics fall within the range between the lowest and highest corresponding statistics obtained by simulation; in other words, to rely on a kind of Monte Carlo validation. Evaluated this way, all the empirical hyperstatistics in Table 9 are seen to be consistent with theory.
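The simulation-based check can be sketched in a few lines. The example below is illustrative only: it draws univariate lognormal samples with Python's `random.lognormvariate` rather than the correlated bivariate RNGs used in this paper, the parameters are placeholders rather than Table 1 values, and the function names are hypothetical.

```python
import math
import random
import statistics


def sample_hyperstatistic(values, p):
    """Sample standardized central moment of order p, Eq. (59)."""
    n = len(values)
    mean = sum(values) / n
    m_p = sum((x - mean) ** p for x in values) / n
    m_2 = sum((x - mean) ** 2 for x in values) / n
    return m_p / m_2 ** (p / 2.0)


def simulate_hyperstatistics(p, m, s, n, runs=3, seed=0):
    """S_p from `runs` independent simulated lognormal samples of size n."""
    rng = random.Random(seed)
    return [
        sample_hyperstatistic([rng.lognormvariate(m, s) for _ in range(n)], p)
        for _ in range(runs)
    ]


def standard_error_of_runs(stats):
    """Eq. (66): standard error of the mean of the simulated statistics."""
    k = len(stats)
    mean = statistics.fmean(stats)
    return math.sqrt(sum((v - mean) ** 2 for v in stats) / (k * (k - 1)))


def within_simulated_range(empirical, stats):
    """Monte Carlo validation: is the empirical value inside [min, max]?"""
    return min(stats) <= empirical <= max(stats)


if __name__ == "__main__":
    # Placeholder lognormal parameters and sample size for illustration.
    sims = simulate_hyperstatistics(p=6, m=4.4, s=0.17, n=4000)
    print(sims, standard_error_of_runs(sims), max(sims) - min(sims))
```

With only three runs the range test is a coarse criterion; increasing `runs` would give a better picture of how heavy the tail of the sampling distribution of a high-order statistic is.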

Taken together, the chi-square tests and agreement of theoretical and empirical moments (or functions of moments) up to the 6th power of the variables support the propositions that H and W are marginally lognormal variables.

6. Implications for Body Mass Index (BMI)

The BMI, defined by the random variable B in Equation (1), has long been used as a measure of obesity and a risk factor for associated diseases under the assumption that it is correlated with weight but largely independent of height. In Section 2 the conditional expectation function of weight, given height, was derived on the basis of lognormal theory and shown to be very nearly a quadratic power law $W(H) \propto H^2$ (27) for the ANSUR male and female cohorts. If weight varies as the square of height, then the BMI (1) would be unaffected by variations in height, or, in other words, statistically independent of height.
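The independence condition can be made explicit with a short worked calculation. The sketch below assumes that Equation (1) defines $B = W/H^2$ and that $(\ln H, \ln W)$ is bivariate normal with parameters $(m_H, m_W, s_H, s_W, r)$; it uses only standard properties of the bivariate normal distribution and is offered as an illustration rather than a restatement of the derivation in Section 2.

```latex
\begin{align*}
  \ln B &= \ln W - 2 \ln H, \\
  \operatorname{cov}(\ln B, \ln H)
        &= \operatorname{cov}(\ln W, \ln H) - 2 \operatorname{var}(\ln H)
         = r\, s_H s_W - 2 s_H^2
         = s_H^2 \left( \frac{r\, s_W}{s_H} - 2 \right), \\
  \mathrm{E}\left[ W \mid H = h \right]
        &= \exp\!\left( m_W + \frac{r\, s_W}{s_H} (\ln h - m_H)
           + \tfrac{1}{2} s_W^2 (1 - r^2) \right)
         \propto h^{\, r s_W / s_H}.
\end{align*}
```

Because $\ln B$ and $\ln H$ are linear combinations of jointly normal variables, they are themselves jointly normal, so a vanishing covariance implies full independence. Under these assumptions, B is independent of H exactly when the power-law exponent $r s_W / s_H$ equals 2, and nearly independent when the exponent is close to 2.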

Figure 14 provides additional justification of the BMI assumption. The left panels of the figure display scatter plots of the correlation of scaled BMI and scaled height for the ANSUR male cohort (top), ANSUR female cohort (middle), and RNG simulated female cohort (bottom). The isotropic patterns (apart from fluctuations) are very close to what is expected for the correlation of independent vectors, as shown in the first panel of Figure 5. Quantitatively, the lines of regression (dashed red for ANSUR, dashed blue for simulation) yield Pearson correlation coefficients $2.096 \times 10^{-3}$ (male), $5.406 \times 10^{-2}$ (female), and $6.700 \times 10^{-2}$ (simulation). The three sets of variables are not standard normal variables, so a null Pearson correlation does not necessarily imply total independence.

However, evaluation of the distance correlation of the scaled variables $(H, B)$ for the male and female cohorts (black points) by the method of repeated sampling yielded empirical dCor values of 0.2475 ± 0.0116 and 0.2301 ± 0.0147, respectively, which are statistically equivalent to the dCor value for correlation of independent standard normal variables generated by computer simulation (red points). It is therefore reasonable to conclude that, if the ANSUR populations can serve as baselines, then the BMI and height of healthy adult male and female populations are largely statistically independent.

Figure 14. Scatter plots of body mass index (BMI) with height (left panels) and weight (right panels). Patterns in black were calculated from the ANSUR population of male (top panels) and female (middle panels) cohorts. Patterns in red (bottom panels) were simulated for a population of 2000 (corresponding to the size of the female cohort) by means of correlated bivariate lognormal RNGs. The slopes of the lines of regression (dashed red or dashed blue) of the left panels are close to zero, signifying independence of BMI and height. The slopes of the lines of regression of the right panels are close to 0.88 for the male cohort and 0.87 for the female cohort.
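For readers who wish to reproduce a distance correlation of the kind quoted above, the sample dCor of Refs [12] [13] can be computed directly from double-centered pairwise distance matrices. The following is a minimal sketch for one-dimensional samples; the function names are hypothetical, the test data are synthetic, and the code does not attempt to reproduce the repeated-sampling procedure used for the ANSUR cohorts. Memory use grows as the square of the sample size.

```python
import numpy as np

def _centered_distances(x):
    """Pairwise |x_i - x_j| matrix with row, column, and grand means removed."""
    d = np.abs(x[:, None] - x[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation of two equally long 1-D samples."""
    a = _centered_distances(np.asarray(x, dtype=float))
    b = _centered_distances(np.asarray(y, dtype=float))
    dcov2 = max((a * b).mean(), 0.0)  # squared sample distance covariance
    dvar_x = (a * a).mean()           # squared sample distance variance of x
    dvar_y = (b * b).mean()           # squared sample distance variance of y
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = rng.standard_normal(500)

dcor_independent = distance_correlation(x, y)     # small but nonzero at finite n
dcor_dependent = distance_correlation(x, x**2)    # nonlinear dependence
dcor_identical = distance_correlation(x, x)       # equals 1
print(dcor_independent, dcor_dependent, dcor_identical)
```

The sample dCor of independent variables is not zero at finite sample size, which is why the empirical values have to be compared with values simulated from independent variables under the same sampling scheme rather than with zero.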

By contrast, the right panels of Figure 14 (black for empirical and red for simulation) show that BMI and weight are strongly linearly correlated, which was a desirable characteristic of the BMI. The Pearson correlation coefficients are respectively 0.8814 (male cohort), 0.8706 (female cohort), and 0.8755 (simulation of female cohort) as deduced algebraically from Equation (15) or from the slopes of the lines of regression.

7. Computer Simulation of Correlated Lognormal Random Vectors

Throughout this paper computer simulation of correlated random variables (RVs) has been employed for both analytical and graphical comparisons with corresponding empirical results. A brief description of the implementation of these simulations is given in this section.

The essential objective is the simulation of a pair of correlated lognormal RVs of specified parameters $(m_1, m_2, s_1, s_2, r)$. The starting point for the construction is the well-known algebraic identity for decomposition of a general normal variable [16]

$$N_1(m_1, s_1^2) = m_1 + s_1 N_1(0, 1) \tag{67}$$

$$N_2(m_2, s_2^2) = m_2 + s_2 N_2(0, 1) \tag{68}$$

where $N_1(0, 1)$ and $N_2(0, 1)$ are independent standard normal variables (ISNVs) of mean 0 and variance 1 that serve as basis states. Each ISNV represents a random number generator (RNG) of one’s mathematical software. Populations of size n are simulated by creating sets of n variates from each ISNV. The covariance $\langle (N_1(m_1, s_1^2) - m_1)(N_2(m_2, s_2^2) - m_2) \rangle$ is theoretically 0 (because $\langle N_1(0, 1) N_2(0, 1) \rangle$ is zero) and empirically should approach 0 numerically in the limit of increasing sample size n.

To create a normal RV $N_2^{c}(m_2, s_2^2, r)$ correlated with the RV in Equation (67), one makes the following linear superposition

$$N_2^{c}(m_2, s_2^2, r) = m_2 + r s_2 N_1(0, 1) + \sqrt{1 - r^2}\, s_2 N_2(0, 1). \tag{69}$$
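That the parameter $r$ in Equation (69) is indeed the Pearson correlation coefficient of the two normal RVs can be verified directly. The short check below uses only $\langle N_1(0,1)^2 \rangle = 1$ and $\langle N_1(0,1) N_2(0,1) \rangle = 0$; the symbol $\rho$ for the resulting correlation coefficient is introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}\!\big[N_2^{c}\big]
  &= r^2 s_2^2 + (1 - r^2)\, s_2^2 = s_2^2, \\
\operatorname{Cov}\!\big[N_1(m_1, s_1^2),\, N_2^{c}\big]
  &= \big\langle s_1 N_1(0,1)\,
     \big( r s_2 N_1(0,1) + \sqrt{1 - r^2}\, s_2 N_2(0,1) \big) \big\rangle
   = r\, s_1 s_2, \\
\rho &= \frac{r\, s_1 s_2}{s_1 s_2} = r .
\end{align*}
```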

Note that, according to the algebraic rules [16] that govern manipulation of independent normal RVs,

$$a N_1(0, 1) + b N_2(0, 1) = N_1(0, a^2) + N_2(0, b^2) = N(0, a^2 + b^2), \tag{70}$$

where a and b are constants, one could combine the second and third terms in the right side of Equation (69) to recover the marginal distribution represented by Equation (68), since the correlation parameter r drops out ($r^2 s_2^2 + (1 - r^2) s_2^2 = s_2^2$). The set of RVs (67) and (69) then constitutes a pair of correlated bivariate normal RVs, which, when implemented numerically on a computer, generate the respective variates $(y_{1,i}, y_{2,i})$ for $i = 1, \ldots, n$. In the context of simulating samples of human height and weight, these variates are

$$y_{1,i} = \ln(h_i), \qquad y_{2,i} = \ln(w_i). \tag{71}$$

Once one has created the sets of normal variates (71), it remains only to exponentiate them

$$x_{1,i} = \exp(y_{1,i}) = h_i, \qquad x_{2,i} = \exp(y_{2,i}) = w_i \tag{72}$$

to simulate a sample of correlated bivariate lognormal RVs. With some mathematical applications such as Maple, one can work with the abstract vectors and simply enter a statement like $X = \exp(Y)$, whereupon the application will know to exponentiate the individual variates as in relation (72). Thus, the bivariate lognormal vectors corresponding to Equations (67) and (69) generated by Maple were defined and implemented by the forms

$$\Lambda_1(m_1, s_1^2) = \exp\big(m_1 + s_1 N_1(0, 1)\big) \tag{73}$$

$$\Lambda_2^{c}(m_2, s_2^2, r) = \exp\big(m_2 + r s_2 N_1(0, 1) + \sqrt{1 - r^2}\, s_2 N_2(0, 1)\big) \tag{74}$$
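The construction of Equations (73) and (74) translates almost line for line into code. The sketch below uses Python with NumPy rather than Maple, so it illustrates the algorithm and is not the implementation used in this paper. The function name `correlated_lognormal` is hypothetical and the parameter values are illustrative, not the ANSUR estimates.

```python
import numpy as np

def correlated_lognormal(m1, s1, m2, s2, r, n, rng):
    """Return n variates (x1, x2) built as in Equations (73) and (74)."""
    z1 = rng.standard_normal(n)  # N_1(0, 1)
    z2 = rng.standard_normal(n)  # N_2(0, 1), independent of z1
    y1 = m1 + s1 * z1                                        # Equation (67)
    y2 = m2 + r * s2 * z1 + np.sqrt(1.0 - r**2) * s2 * z2    # Equation (69)
    return np.exp(y1), np.exp(y2)                            # Equation (72)

rng = np.random.default_rng(12345)

# Illustrative parameters only: logs of height in metres and weight in kilograms.
h, w = correlated_lognormal(m1=0.56, s1=0.04, m2=4.4, s2=0.17, r=0.5, n=100_000, rng=rng)

r_log = np.corrcoef(np.log(h), np.log(w))[0, 1]
print(f"sample correlation of (ln h, ln w): {r_log:.4f}")
print(f"sample mean and sd of ln w: {np.log(w).mean():.4f}, {np.log(w).std():.4f}")
```

For a large sample the correlation of the logarithms should be close to the input value of r, and the mean and standard deviation of each logarithm close to the corresponding input parameters.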

For completeness, a final comment as to the actual nature of the RNGs employed in this paper is called for. All RNGs that employ a mathematical algorithm, in contrast to RNGs based on some random quantum process such as radioactive decay [40], generate pseudo-random numbers. These are sets of numbers that are generated reproducibly from a known starting point (seed value), yet nevertheless pass diverse statistical tests for randomness. The more stringent a test, and the more tests an RNG passes, the better the RNG.

The MersenneTwister algorithm supplied by the Maple RandomTools subpackage has passed the diehard tests of randomness [41] by G. Marsaglia as well as other tests [42] [43]. (The Mersenne Twister is not, in its standard form, a cryptographically secure generator, since its internal state can be reconstructed from a sufficiently long run of outputs; that property, however, is not required for statistical simulation.) It is safe to accept, therefore, that the independent normal basis states with which the simulation algorithm began and from which the correlated bivariate lognormal distributions were created were for all practical purposes sufficiently uncorrelated.
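The reproducibility of a seeded pseudo-random stream is easy to demonstrate. The sketch below uses NumPy's Mersenne Twister bit generator as a stand-in. It is not Maple's generator, the function name `normal_stream` is hypothetical, and the numbers it produces will not match those of the Maple RandomTools subpackage.

```python
import numpy as np

def normal_stream(seed, n):
    """n standard normal variates from a Mersenne Twister generator seeded with `seed`."""
    rng = np.random.Generator(np.random.MT19937(seed))
    return rng.standard_normal(n)

a = normal_stream(seed=42, n=5)
b = normal_stream(seed=42, n=5)
c = normal_stream(seed=43, n=5)

same_seed_identical = np.array_equal(a, b)       # same seed, same stream
different_seed_identical = np.array_equal(a, c)  # different seed, different stream
print(same_seed_identical, different_seed_identical)
```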

8. Conclusions and Interpretation

Knowledge of the exact distribution function of a random quantity provides the most complete statistical information attainable about that quantity. This is especially important in regard to anthropometric attributes the statistics of which are essential to clinical medicine and epidemiology.

The fundamental conclusion of this paper is that human height H and weight W in a population of healthy adults are statistically distributed as correlated bivariate lognormal random variables. Moreover, for all practical purposes, this distribution is thought not to be approximate, but empirically rigorous in samples of sufficient size. This means that five measurable parameters, comprising two means $(m_H, m_W)$, two variances $(s_H^2, s_W^2)$, and the Pearson linear correlation coefficient ($r$), suffice to determine all statistical attributes (probabilities, moments, correlations) regarding the relation of height and weight in a specified population.
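As an illustration of how the five parameters fix every moment, the general mixed moment of a bivariate lognormal pair has a closed form. The expression below assumes, as in Section 7, that $(m_H, m_W)$, $(s_H^2, s_W^2)$, and $r$ are the means, variances, and correlation coefficient of the normal variables $\ln H$ and $\ln W$; the symbol $\rho_{HW}$ is introduced here for illustration.

```latex
\begin{align*}
\langle H^p W^q \rangle
  &= \exp\!\Big( p\, m_H + q\, m_W
     + \tfrac{1}{2}\big( p^2 s_H^2 + 2 p q\, r\, s_H s_W + q^2 s_W^2 \big) \Big), \\
\rho_{HW}
  &= \frac{e^{\,r s_H s_W} - 1}
          {\sqrt{\big(e^{s_H^2} - 1\big)\big(e^{s_W^2} - 1\big)}} .
\end{align*}
```

Setting $(p, q) = (1, 0)$ gives the mean height $\exp(m_H + s_H^2/2)$, for example, and the second line gives the Pearson correlation coefficient of $H$ and $W$ themselves, which differs from $r$ unless the variances are small.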

In support of this conclusion, detailed statistical analyses of an extensive anthropometric data base of diverse individuals have shown the following:

• The variates of $H$ and $W$ of both gender cohorts pass chi-square goodness-of-fit tests for univariate lognormal distributions.

• The variates of $\ln H$ and $\ln W$ of both gender cohorts pass chi-square goodness-of-fit tests for univariate normal distributions.

• The sample moments of $(H, W)$ up to 6th order are consistent with the lognormal distribution.

• The sample moments of $(\ln H, \ln W)$ up to 6th order are consistent with the normal distribution.

• Theoretical correlation functions $R_{p,q}(r)$ of the normal distribution predict correctly the sample correlation coefficients $r_{p,q}$ of the variates of $(\ln H, \ln W)$.

• Theoretical correlation functions $C_{p,q}(r, s_H, s_W)$ of the lognormal distribution predict correctly the sample correlation coefficients $c_{p,q}$ of the variates of $(H, W)$.

• Computer simulations using correlated lognormal random number generators (RNGs) produce correlation functions and probability density plots of the variables $(H, W)$ that statistically match the corresponding empirical functions and plots.

• Computer simulations using correlated normal RNGs produce correlation functions and probability density plots of the variables $(\ln H, \ln W)$ that statistically match the corresponding empirical functions and plots.

• Empirically measured distance correlation (dCor) values of the variables $(H, W)$ agree with dCor values obtained from comparably sized populations simulated by correlated lognormal RNGs.

• Empirically measured distance correlation (dCor) values of the variables $(\ln H, \ln W)$ agree with dCor values obtained from comparably sized populations simulated by correlated normal RNGs.

• Removal of the line of regression in the empirical density plot of $\ln W$ vs $\ln H$ produces an isotropic density with null Pearson correlation coefficient. The density plot, null Pearson correlation coefficient, and dCor values statistically match the corresponding outcomes from two independent normal RNGs.

In short, taken altogether, the preceding extensive set of tests supports the proposition that the distribution and correlation of height and weight of a healthy adult human population are fully accounted for by a bivariate lognormal distribution.

A secondary point worth noting, given the importance of body mass index (BMI) to current medicine and epidemiology, is that the conditional expectation of weight, given height, theoretically derived from the lognormal distribution function yielded functional relations (23), (24) between weight and height. These functions, when evaluated with the lognormal parameters of the ANSUR male and female cohorts, led in both cases to a nearly exact quadratic power law (27), thereby justifying theoretically a long-held assumption underlying the use of BMI as a risk factor for obesity-related diseases.

In concluding this paper, it is useful to clarify what is meant by an “exact” statistical distribution. In the opinion of the author, who is an atomic and nuclear physicist, statistical distributions in science can arise, broadly speaking, in two ways.

The most fundamental way is as a consequence of a particular dynamical model. In physics, for example, the decay of radioactive nuclei is rigorously accounted for by a binomial probability function, based on a physical model of the independent decay of discrete, uncorrelated nuclei [44]. If the assumption of independence were found to be invalid (and there have been a considerable number of such challenges, all subsequently refuted by more careful experiment and analysis [45] [46] [47] [48]), the discovery would lead to deep new insights into the structure and behavior of matter.

The second, less fundamental way, but nevertheless one of practical utility, is by empirical recognition and subsequent verification. To return to the previous physics example, suppose that the phenomenon of radioactive decay had been discovered before there was any understanding or general acceptance of atoms as discrete units of matter (see Note 3). Then radioactive decay would have been empirically observed to be a Poisson process, and, indeed, the Poisson distribution is widely depicted in books as a rigorous physical law. (See, for example [49].) However, in retrospect, a Poisson process can be interpreted as a limiting case of a binomial process for a large number N of radioactive atoms with low probability p of decay, such that Np is the mean number of decays within a specified time interval.
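The convergence of the binomial to the Poisson distribution can be checked numerically. The sketch below holds the mean Np fixed at an arbitrary illustrative value and compares the two probability functions as N grows; the function names are hypothetical, and logarithms of the gamma function are used to avoid overflow for large N.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed via log-gamma."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, mu):
    """Poisson probability of k events with mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def max_abs_difference(n, mu, k_max=40):
    """Largest pointwise gap between the two distributions for k = 0..k_max (k_max <= n)."""
    p = mu / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(k_max + 1))

mu = 5.0  # illustrative mean number of decays per counting interval
for n in (50, 500, 5000, 50000):
    print(n, max_abs_difference(n, mu))
```

The gap shrinks roughly in proportion to p = Np/N, which is why the difference between the two distributions is observable only in experiments designed so that p is not negligibly small.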

The point of the foregoing examples is this: The rigorously exact distribution (binomial) revealed critical information about the constituents (discrete, independent) of the system. The apparently exact distribution (Poisson) was empirical and utilitarian, but revealed little about the system other than that the decay products were discrete. Under appropriately conceived radiation experiments, the difference between the binomial and Poisson distributions can be observed [50], and the fundamentality of the binomial distribution is established.

In regard to the statistical attributes of human height and weight, the consistency with a correlated bivariate lognormal distribution is, as shown in this paper, so extensive and close, that one must wonder whether it is a rigorously exact consequence of some biophysical mechanism or a limiting case of some other statistical process. How, for example, might a lognormal distribution arise from other distributions?

One such process might entail a random variable $X$ comprising a product of some set of arbitrarily distributed random variables, in which case application of the Central Limit Theorem to $\ln X$ could result in a normal distribution. Then the parent variable $X$ would itself be lognormal. It is difficult to conceive in detail, however, of mechanisms by which real biological processes responsible for human height and weight could engender such a hypothetical $X$ as to produce a correlated bivariate lognormal distribution.
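The multiplicative route to a lognormal distribution can be illustrated with a toy simulation. In the sketch below each variate is a product of independent positive factors drawn from a uniform distribution; the choice of uniform factors, their range, and their number are arbitrary assumptions made for illustration and carry no biological meaning.

```python
import numpy as np

def skewness(x):
    """Third central moment divided by the cube of the standard deviation."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(7)
n_samples, n_factors = 20_000, 50

# Each row holds the independent positive factors of one product variable X.
factors = rng.uniform(0.8, 1.2, size=(n_samples, n_factors))
x = factors.prod(axis=1)

skew_x = skewness(x)              # clearly positive: X is right-skewed
skew_log_x = skewness(np.log(x))  # near zero: ln X is approximately normal
print(f"skewness of X: {skew_x:.3f}, skewness of ln X: {skew_log_x:.3f}")
```

The logarithm of the product is a sum of independent terms, so its skewness is close to zero while the product itself is strongly right-skewed, as expected of an approximately lognormal variable.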

More generally, a lognormal distribution can also arise under circumstances where an intrinsically positive variable has a low mean and high variance, leading to a pronounced skewness. However, any of a large number of other skewed distributions could also arise, so the mechanism is not unique. Moreover, as demonstrated in this paper, whatever mechanism is invoked must produce not only the correct skewness, but also kurtosis and other hyperstatistics as well.

At this stage and until testable mechanisms are proposed, refutation of the exactness of the correlated bivariate lognormal distribution of human height and weight can only come from further detailed statistical analysis of larger populations. And if such future tests further confirm the exactness of the bivariate lognormal relation of height and weight, then, like the example of radioactivity cited above, this knowledge will have revealed something fundamental about the physical processes underlying human development.

Acknowledgements

The author thanks Trinity College for partial support through the research fund associated with the George A. Jarvis Chair of Physics.

Appendix—Glossary of Abbreviations

BMI—Body Mass Index

CDF—Cumulative Distribution Function

CF—Characteristic Function

d.o.f.—Degrees of Freedom

dCor—Distance Correlation

dCov—Distance Covariance

dVar—Distance Variance

ISNV—Independent Standard Normal Variable

PDF—Probability Density Function

RNG—Random Number Generator

RV—Random Variable

NOTES

1. These transformed functions are the characteristic function (CF), which is the Fourier transform of the PDF, and the cumulative distribution function (CDF), which is the integral of the PDF from some fixed point to the argument of the PDF. Thus, if $g(x)$ is the CDF, then the PDF is $f(x) = dg(x)/dx$.

2. Integration variables, in contrast to random variables, will be represented by lower case letters.

3. This supposition is actually historically correct. Radioactivity was discovered by Henri Becquerel in 1896, whereas opposition to the existence of atoms by some leading scientists of the day lasted until about 1910.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Stigler, S.M. (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Cambridge, 265-361.
[2] Bernstein, P.L. (1998) Against the Gods: The Remarkable Story of Risk. Wiley, New York, 152-171.
https://doi.org/10.2307/2685740
[3] Porter, T.M. (2004) Karl Pearson: The Scientific Life in a Statistical Age. Princeton University Press, Princeton, 235-239, 249-266.
https://doi.org/10.5944/empiria.8.2004.989
[4] Quetelet, L.A.J. (1835) A Treatise on Man and the Development of His Faculties. Cambridge University Press, Cambridge.
https://www.cambridge.org/core/books/treatise-on-man-and-the-development-of-his-faculties/AB13A647A6C8727C06AE5399D7422887
[5] Sager, G. (1987) Relation between Body Height and Weight in Adult Humans. Gegenbaurs morphologisches Jahrbuch, 133, 563-571.
[6] Rahmandad, H. (2014) Human Growth and Body Weight Dynamics: An Integrative Systems Model. PLOS ONE, 9, e114609.
https://doi.org/10.1371/journal.pone.0114609
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114609
[7] Lettre, G. (2011) Recent Progress in the Study of the Genetics of Height. Human Genetics, 129, 465-472.
https://doi.org/10.1007/s00439-011-0969-x
[8] Wikipedia (2022) Body Mass Index.
https://en.wikipedia.org/wiki/Body_mass_index
[9] World Health Organization (2021) Obesity and Overweight.
https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight
[10] Moody, J.N., et al. (2021) Body Mass Index and Polygenic Risk for Alzheimer’s Disease Predict Conversion to Alzheimer’s Disease. The Journals of Gerontology Series A Biological Sciences and Medical Sciences, 76, 1415-1422.
https://doi.org/10.1093/gerona/glab117
[11] Silverman, M.P. and Lipscombe, T.C. (2022) Exact Statistical Distribution of the Body Mass Index (BMI): Analysis and Experimental Confirmation. Open Journal of Statistics, 12, 324-356.
https://doi.org/10.4236/ojs.2022.123022
[12] Szekely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007) Measuring and Testing Dependence by Correlation of Distances. The Annals of Statistics, 35, 2769-2794.
https://doi.org/10.1214/009053607000000505
[13] Szekely, G.J. and Rizzo, M.L. (2009) Brownian Distance Covariance. The Annals of Applied Statistics, 3, 1236-1265.
https://doi.org/10.1214/09-AOAS312
[14] Callahan, A. (2021) Is BMI a Scam? The New York Times.
https://www.nytimes.com/2021/05/18/style/is-bmi-a-scam.html
[15] Gordon, C.C., et al. (2014) 2012 Anthropometric Survey of U.S. Army Personnel: Methods and Summary Statistics. Technical Report Natick/TR-15/007, U.S. Army Natick Soldier Research and Engineering Center, Natick.
https://www.openlab.psu.edu/ansur2
[16] Silverman, M.P. (2014) A Certain Uncertainty: Nature’s Random Ways. Cambridge University Press, Cambridge, 511-514.
https://doi.org/10.1017/CBO9781139507370
[17] Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011) Statistical Distributions. 4th Edition, Wiley, New York, 131-134.
https://doi.org/10.1002/9780470627242
[18] A’Hearn, B., Peracchi, F. and Vecchi, G. (2009) Height and the Normal Distribution: Evidence from Italian Military Data. Demography, 46, 1-25.
https://doi.org/10.1353/dem.0.0049
[19] Diverse Populations Collaborative Group (2005) Weight-Height Relationships and Body Mass Index: Some Observations from the Diverse Populations Collaboration. The American Journal of Physical Anthropology, 128, 220-229.
https://doi.org/10.1002/ajpa.20107
[20] Johnson, W., et al. (2020) Differences in the Relationship of Weight to Height, and Thus the Meaning of BMI According to Age, Sex, and Birth Year Cohort. Annals of Human Biology, 47, 199-207.
https://doi.org/10.1080/03014460.2020.1737731
[21] Sperrin, M., Marshall, A.D., Higgins, V., Renehan, A.G. and Buchan, I.E. (2015) Body Mass Index Relates Weight to Height Differently in Women and Older Adults: Serial Cross-Sectional Surveys in England (1992-2011). Journal of Public Health, 38, 607-613.
https://doi.org/10.1093/pubmed/fdv067
[22] Benn, R.T. (1971) Some Mathematical Properties of Weight-for-Height Indices Used as Measures of Adiposity. Journal of Epidemiology & Community Health, 25, 42-50.
https://doi.org/10.1136/jech.25.1.42
[23] Rohrer, F. (1921) Der Index der Körperfülle als Maß des Ernährungszustandes [The Index of Corpulence as a Measure of Nutritional Condition]. Münchener Medizinische Wochenschrift, 68, 580-582.
[24] Henneberg, M., Hugg, J. and Townsend, E.J. (1989) Body Weight/Height Relationship: Exponential Solution. American Journal of Human Biology, 1, 483-491.
https://doi.org/10.1002/ajhb.1310010412
[25] Cidras, M. (2015) Body Mass Exponential Index: An Age-Independent Anthropometric Nutritional Assessment. Open Access Library Journal, 2, 1-8.
https://doi.org/10.4236/oalib.1101943
[26] Trussell, J. and Bloom, D.E. (1979) A Model Distribution of Height or Weight at a Given Age. Human Biology, 51, 523-536.
[27] Edwards, A.W.F. (1992) Likelihood. The Johns Hopkins University Press, Baltimore, 70-143.
[28] Kendall, M.G. and Stuart, A. (1963) The Advanced Theory of Statistics Vol. 1: Distribution Theory. Hafner, New York, 94-119, 228-236.
[29] Hotelling, H. (1953) New Light on the Correlation Coefficient and Its Transforms. Journal of the Royal Statistical Society: Series B, 15, 193-232.
https://doi.org/10.1111/j.2517-6161.1953.tb00135.x
[30] Mood, A.M., Graybill, F.A. and Boes, D.C. (1974) Introduction to the Theory of Statistics. 3rd Edition, McGraw-Hill, New York, 195-198, 233-236.
[31] Chou, Y. (1969) Statistical Analysis: With Business and Economic Applications. Holt, Rinehart, and Winston, New York, 308-323.
[32] Haldane, J.B.S. (1942) Moments of the Distributions of Powers and Products of Normal Variates. Biometrika, 32, 226-242.
https://doi.org/10.1093/biomet/32.3-4.226
[33] Arfken, G.B. and Weber, H.J. (2005) Mathematical Methods for Physicists. 6th Edition, Elsevier, New York, 83-87.
[34] Wikipedia (2022) Bootstrapping (Statistics).
https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
[35] Efron, B. (1979) Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7, 1-26.
https://doi.org/10.1214/aos/1176344552
[36] Harding, B., Tremblay, C. and Cousineau, D. (2014) Standard Errors: A Review and Evaluation of Standard Error Estimators Using Monte Carlo Simulations. The Quantitative Methods for Psychology, 10, 107-123.
https://doi.org/10.20982/tqmp.10.2.p107
[37] Wikipedia (2022) Moment (Mathematics).
https://en.wikipedia.org/wiki/Moment_(mathematics)
[38] Silverman, M.P., Strange, W. and Lipscombe, T.C. (2004) The Distribution of Composite Measurements: How to Be Certain of the Uncertainties in What We Measure. American Journal of Physics, 72, 1068-1081.
https://doi.org/10.1119/1.1738426
[39] Kendall, M.G. and Stuart, A. (1961) The Advanced Theory of Statistics Vol. 2: Inference and Relationship. Charles Griffin & Co., London, 1-8.
[40] Walker, J. (1996) HotBits: Genuine Random Numbers, Generated by Radioactive Decay.
https://www.fourmilab.ch/hotbits
[41] Wikipedia (2022) Diehard Tests.
https://en.wikipedia.org/wiki/Diehard_tests
[42] Maplesoft.com (2022) Overview of the RandomTools [MersenneTwister] Subpackage.
https://www.maplesoft.com/support/help/maple/view.aspx
[43] Mapleprimes.com (2019) Are Maple’s Pseudo Random Number Generators Good Generators? Post by mmcdara 3900.
https://mapleprimes.com/posts/211598-Are-Maples-Pseudo-Random-Number-Generators
[44] Lapp, R.E. and Andrews, H.L. (1972) Nuclear Radiation Physics. Prentice-Hall, Englewood Cliffs, 36-40.
[45] Silverman, M.P., Strange, W., Silverman, C.R. and Lipscombe, T.C. (1999) Tests of Alpha-, Beta-, and Electron Capture Decays for Randomness. Physics Letters A, 262, 265-273.
https://doi.org/10.1016/S0375-9601(99)00668-4
[46] Silverman, M.P. and Strange, W. (2009) Search for Correlated Fluctuations in the Decay of Na-22. Europhysics Letters, 87, Article No. 32001.
https://doi.org/10.1209/0295-5075/87/32001
[47] Silverman, M.P. (2015) Search for Non-Standard Radioactive Decay Based on Distribution of Activities. Europhysics Letters, 110, Article No. 52001.
https://doi.org/10.1209/0295-5075/110/52001
[48] Silverman, M.P. (2016) Search for Anomalies in the Decay of Radioactive Mn-54. Europhysics Letters, 114, Article No. 62001.
https://doi.org/10.1209/0295-5075/114/62001
[49] Miller, D.G. (1972) Radioactivity and Radiation Detection. Gordon and Breach, New York, 88-99.
[50] Foster, J., Kouris, K., Matthews, I.P. and Spyrou, N.M. (1983) Binomial vs Poisson Statistics in Radiation Studies. Nuclear Instruments and Methods in Physics Research, 212, 301-305.
https://doi.org/10.1016/0167-5087(83)90706-8

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.