1. Introduction
Statistical decision functions, such as point and set estimators or test statistics, are typically compared in terms of loss functions. In frequentist decision theory, the risk function is the expected value of the loss with respect to the sampling distribution of the data, and it depends on the parameter of the model. The risk function is usually summarized by optimality criteria, i.e. by applying a suitable real-valued functional (see for instance [1]). We here focus on Bayesian criteria that require a prior probability distribution for the parameter of the model, which induces a distribution on the risk function as well. Among several alternatives, the most popular Bayesian criterion is the Bayes risk, the expected value of the risk function with respect to the prior. This approach is often referred to as hybrid frequentist-Bayesian, since it is based on a Bayesian summary of a frequentist risk function. However, relying solely on averaging might be a limitation, since the expected value is not always a good summary of the entire distribution of a random variable. In the context of clinical trials, this point has been raised by several authors for the power function of a test, a quantity that is closely related to the risk. For instance, [2]-[5] argue that examining the whole distribution of the random power provides better insight into the probability of success of an experiment. Specifically, the authors of [3] consider one-sided testing on the location parameter of a normal model and point out that the expected value of the power may be a poor representation of its distribution. Therefore, they derive the expressions of the cumulative distribution function and probability density function of the random power and study their qualitative features. Along these lines, in [5] this approach is adapted to the scale parameter of distributions belonging to exponential families.
In the present article, we borrow the aforementioned ideas and propose to study the probability distribution of the risk function induced by a prior in the context of point and interval estimation. Inspection of the shape of the random risk distribution allows one to assess the impact of prior assumptions and sample size on the quality of a candidate estimator. In particular, the proposed approach is developed for both point and set estimation of the parameters of the Pareto model. We show that, for all the problems considered, the random risk functions are scale transformations of the random parameters, of their squares or, for the quantile considered in Section 3.3, of the inverse square of the Pareto index. Hence, for any design prior, we find explicit expressions for the expected value and the probability density function (pdf) of the risk under standard loss functions. Furthermore, assuming conjugate priors, we show that the resulting risk density functions are still related to the design prior family.
The paper is structured as follows. In Section 2, we introduce the notation and formalize the problem for the Pareto model. In Section 3, we derive explicit results for the risk functions of estimators of the model parameters: specifically, in Sections 3.1 and 3.2 we focus on the shape and scale parameters, respectively, and in Section 3.3 on a quantile of the distribution. Section 4 illustrates an application of the proposed methodology to the estimation of the Pareto index of the wealth distribution of the World’s Billionaires. Finally, Section 5 contains a discussion.
2. Methodology
Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a random sample whose distribution depends on an unknown parameter $\omega \in \Xi$, where Ξ is the parameter space. Let $\tau = \tau(\omega)$ be a scalar function of $\omega$ and let $d(\mathbf{X}_n)$ be a point or an interval estimator of $\tau$, with $d(\mathbf{x}_n)$ denoting the corresponding estimate. Frequentist decision theory typically employs the risk function
$$R_d(\omega) = \mathbb{E}_{\omega}\left[L\big(\tau, d(\mathbf{X}_n)\big)\right] \qquad (1)$$
to assess the performance of the decision function, where $L(\tau, d)$ is the loss function of d and $\mathbb{E}_\omega$ is the expected value with respect to the sampling distribution $f_n(\cdot \mid \omega)$. For any decision function d, $R_d(\omega)$ is a function of $\omega$. Within the hybrid frequentist-Bayesian framework, the parameter is treated as a random variable, denoted by Ω, with prior distribution $\Pi_D$, where the subscript D refers to the design: this prior is often called the design prior, to stress its use in the pre-experimental evaluation of the decision d. The expected value of $R_d(\Omega)$ with respect to $\Pi_D$ is the Bayes risk of d, defined as $r_d = \mathbb{E}_{\Pi_D}[R_d(\Omega)]$. If Ω is an absolutely continuous random variable and $\pi_D$ is its density function,
$$r_d = \int_{\Xi} R_d(\omega)\,\pi_D(\omega)\,d\omega. \qquad (2)$$
The Bayes risk is typically employed for the evaluation of a given d, the comparison of alternative estimators and the identification of optimal decision functions. However, a thorough inspection of the features of the random risk function $R_d(\Omega)$ might be more informative than looking at its expectation only. More precisely, let
$$Y = R_d(\Omega) \qquad (3)$$
be the random risk function, $\mathcal{Y}$ its sample space, and g the pdf of Y. The behavior of the random risk can be studied through the density g and some of its relevant summaries. For instance, the Bayes risk can be retrieved as the expected value of Y, i.e.
$$r_d = \mathbb{E}[Y] = \int_{\mathcal{Y}} y\, g(y)\, dy. \qquad (4)$$
In Section 3, we consider point and interval estimators for the parameters of the Pareto model and derive closed-form expressions for g and $r_d$ under standard loss functions.
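When g has no closed form, definitions (3) and (4) suggest a direct simulation: draw Ω from the design prior and evaluate $R_d$ at each draw. The following is a minimal Python sketch, assuming NumPy; the function names, the toy risk $R_d(\omega) = \omega^2/10$ and the Gamma(3, rate 2) design prior are hypothetical choices for illustration only, for which $r_d = \mathbb{E}[\Omega^2]/10 = 0.3$.

```python
import numpy as np

def simulate_random_risk(risk_fn, prior_sampler, n_draws=100_000, seed=0):
    """Monte Carlo draws of Y = R_d(Omega), with Omega drawn from the design prior (eq. 3)."""
    rng = np.random.default_rng(seed)
    omega = prior_sampler(rng, n_draws)
    return risk_fn(omega)

def summarize_risk(y):
    """Monte Carlo estimates of the Bayes risk E[Y] (eq. 4) and of the median of Y."""
    return {"bayes_risk": float(np.mean(y)), "median": float(np.median(y))}

# Hypothetical toy setting: R_d(omega) = omega**2 / 10 and Omega ~ Gamma(shape 3, rate 2).
def toy_risk(omega):
    return omega**2 / 10.0

def toy_prior(rng, size):
    return rng.gamma(shape=3.0, scale=1.0 / 2.0, size=size)  # NumPy uses scale = 1 / rate

y = simulate_random_risk(toy_risk, toy_prior)
summary = summarize_risk(y)
print(summary)  # bayes_risk should be close to (3 * 4 / 2**2) / 10 = 0.3
```

Because Y is right-skewed in this toy case, its median lies well below the Bayes risk, which is exactly the kind of information that the expectation alone hides.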
2.1. Pareto Model
Suppose that $\mathbf{X}_n = (X_1, \ldots, X_n)$ is a random sample from the following Pareto density with parameters (θ, η):
$$f(x \mid \theta, \eta) = \frac{\theta\,\eta^{\theta}}{x^{\theta+1}}\,\mathbb{1}_{[\eta, +\infty)}(x), \qquad (5)$$
where θ > 0 is the shape parameter, often called the Pareto index, and η > 0 is the scale parameter. In the following, when the parameters θ and η are considered random, they will be denoted as Θ and H.
In Sections 3.1 and 3.2, we derive closed-form expressions for $r_d$ and g for point and interval estimators of θ and η. Specifically, for point estimation we consider maximum likelihood estimators and the quadratic loss function, whereas for interval estimation we use the length of $(1-\alpha)$ confidence intervals as loss function. We show that all the corresponding random risk functions are scale transformations either of the random parameters or of their squares. As a consequence, by adopting conjugate densities as design priors (Gamma, a special case of the Generalized Gamma, for Θ and Pareto for $H^{-1}$), each resulting density g belongs to the same family as the design prior (Generalized Gamma for Θ, Inverse Pareto for H).
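Samples from (5) are easy to generate by inverting the distribution function $F(x) = 1 - (\eta/x)^\theta$, $x \ge \eta$, which is convenient for checking the results of Section 3 by simulation. A minimal sketch, assuming NumPy; the helper names are illustrative:

```python
import numpy as np

def pareto_pdf(x, theta, eta):
    """Pareto density (5): theta * eta**theta / x**(theta + 1) for x >= eta, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    safe_x = np.maximum(x, eta)  # avoid evaluating the power outside the support
    return np.where(x >= eta, theta * eta**theta / safe_x ** (theta + 1), 0.0)

def sample_pareto(rng, size, theta, eta):
    """Inverse-CDF sampling: X = eta * V**(-1/theta) with V uniform on (0, 1]."""
    v = 1.0 - rng.uniform(size=size)  # rng.uniform is on [0, 1), so v is on (0, 1]
    return eta * v ** (-1.0 / theta)

rng = np.random.default_rng(1)
x = sample_pareto(rng, 200_000, theta=3.0, eta=2.0)
print(x.min(), x.mean())  # min is at least eta; mean close to theta * eta / (theta - 1) = 3
```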
2.2. Elicitation of Conjugate Design Prior Distributions
We consider independent conjugate design prior distributions for the two parameters. In particular, we adopt a Gamma density for the Pareto index Θ and a Pareto density for the reciprocal of the scale parameter H. Prior parameters are usually elicited using moments (such as mean and variance) or relevant quantiles of historical data.
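Prior for Θ. For a given pair $(a_\theta, b_\theta)$ (both positive), the prior for the Pareto index θ is the Gamma density with shape $a_\theta$ and rate $b_\theta$
$$\pi_D(\theta) = \frac{b_\theta^{a_\theta}}{\Gamma(a_\theta)}\,\theta^{a_\theta-1}e^{-b_\theta\theta}, \qquad \theta > 0. \qquad (6)$$
If the prior expectation m and variance v are elicited for the shape parameter, for instance on the basis of historical data, the corresponding values for the prior parameters $a_\theta$ and $b_\theta$ are
$$a_\theta = \frac{m^2}{v}, \qquad b_\theta = \frac{m}{v}. \qquad (7)$$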
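Equation (7) follows from solving $\mathbb{E}[\Theta] = a_\theta/b_\theta = m$ and $\mathrm{Var}[\Theta] = a_\theta/b_\theta^2 = v$. A small Python helper, with a hypothetical name and illustrative inputs, together with a round-trip check:

```python
def gamma_hyperparameters(m, v):
    """Shape a_theta and rate b_theta of a Gamma prior with mean m and variance v (eq. 7)."""
    if m <= 0 or v <= 0:
        raise ValueError("m and v must be positive")
    return m**2 / v, m / v

a_theta, b_theta = gamma_hyperparameters(2.0, 0.5)  # illustrative elicited values
print(a_theta, b_theta)                              # 8.0 4.0
print(a_theta / b_theta, a_theta / b_theta**2)       # recovers mean 2.0 and variance 0.5
```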
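Prior for H. Adopting a Pareto density with shape $a_\eta$ and scale $1/b_\eta$ for $H^{-1}$ implies that the prior density of the scale parameter H is
$$\pi_D(\eta) = \frac{a_\eta\,\eta^{a_\eta-1}}{b_\eta^{a_\eta}}, \qquad 0 < \eta < b_\eta. \qquad (8)$$
This density is often referred to as Inverse Pareto with parameters $(a_\eta, b_\eta)$, but the denomination is not unique in the literature (see, for instance, [6]).
The two parameters $a_\eta$ and $b_\eta$ are usually chosen by eliciting either (i) the expectation m and variance v, or (ii) two quantiles. In case (i), since $\mathbb{E}[H] = a_\eta b_\eta/(a_\eta+1)$ and $v/m^2 = 1/[a_\eta(a_\eta+2)]$, the resulting pair of hyperparameters is
$$a_\eta = \sqrt{1 + \frac{m^2}{v}} - 1, \qquad b_\eta = m\,\frac{a_\eta + 1}{a_\eta}. \qquad (9)$$
In case (ii), two quantiles are elicited (for example, $q_1$ and $q_2$ are the quantiles at levels $p_1$ and $p_2$ respectively, with $p_1 < p_2$). Since the p-quantile of (8) is $b_\eta\,p^{1/a_\eta}$, the prior parameters are
$$a_\eta = \frac{\log(p_2/p_1)}{\log(q_2/q_1)}, \qquad b_\eta = q_2\,p_2^{-1/a_\eta}. \qquad (10)$$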
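The elicitation formulas (9) and (10) can be checked numerically by mapping the hyperparameters back to the moments and quantiles of (8). A sketch in plain Python; the helper names and the input values are illustrative assumptions:

```python
import math

def inverse_pareto_from_moments(m, v):
    """(a_eta, b_eta) of density (8) from mean m and variance v (eq. 9)."""
    a = math.sqrt(1.0 + m**2 / v) - 1.0
    return a, m * (a + 1.0) / a

def inverse_pareto_from_quantiles(q1, p1, q2, p2):
    """(a_eta, b_eta) from quantiles q1 < q2 at levels p1 < p2 (eq. 10)."""
    a = math.log(p2 / p1) / math.log(q2 / q1)
    return a, q2 * p2 ** (-1.0 / a)

def inverse_pareto_moments(a, b):
    """Mean and variance of density (8): a*b/(a+1) and a*b**2/((a+1)**2*(a+2))."""
    return a * b / (a + 1.0), a * b**2 / ((a + 1.0) ** 2 * (a + 2.0))

def inverse_pareto_quantile(p, a, b):
    """p-quantile of density (8), obtained by inverting F(eta) = (eta / b)**a."""
    return b * p ** (1.0 / a)

# Round trips with illustrative values.
a1, b1 = inverse_pareto_from_moments(100.0, 50.0)
print(inverse_pareto_moments(a1, b1))  # approximately (100.0, 50.0)
a2, b2 = inverse_pareto_from_quantiles(85.0, 0.05, 100.0, 0.5)
print(inverse_pareto_quantile(0.05, a2, b2), inverse_pareto_quantile(0.5, a2, b2))  # about 85 and 100
```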
Example: Income Data
We now show the elicitation procedure for the design priors that will be used to implement the proposed methodology in the examples of Sections 3.1 and 3.2. To this end, we exploit data from the numerical example in Section 5 of [7]. The authors consider a dataset previously analyzed by [8] on the annual wages (in multiples of 100 US dollars) of a random sample of 30 production-line workers in a large industrial firm. Specifically, income data are analyzed by adopting a Pareto model for the observations and independent conjugate densities for the two parameters. In our examples, we use the same prior assumptions as in [7], namely:
(i) expectation and variance for the shape parameter Θ, as specified in [7];
(ii) median and 5th percentile equal to 100 and 85 respectively, for the scale parameter H.
Then, according to (7) and (10), we elicit the prior hyperparameters as follows:
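(i) prior hyperparameters for Θ, obtained by plugging the elicited expectation m and variance v into (7):
$$a_\theta = m^2/v, \qquad b_\theta = m/v; \qquad (11)$$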
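(ii) prior hyperparameters for H, obtained from (10) with $(q_1, p_1) = (85, 0.05)$ and $(q_2, p_2) = (100, 0.5)$:
$$a_\eta = \frac{\log 10}{\log(100/85)} \approx 14.17, \qquad b_\eta = 100 \cdot 2^{1/a_\eta} \approx 105.01. \qquad (12)$$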
These values are employed as hyperparameters in the design priors for Θ and H in Sections 3.1 and 3.2, where we continue this example by deriving the density of the random risk for point and interval estimators.
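The hyperparameters for H follow from (10) with $(q_1, p_1) = (85, 0.05)$ and $(q_2, p_2) = (100, 0.5)$, i.e. the elicited 5th percentile and median. A quick computation in Python:

```python
import math

# Elicited quantiles of H from (ii): 5th percentile 85 and median 100.
q1, p1, q2, p2 = 85.0, 0.05, 100.0, 0.5
a_eta = math.log(p2 / p1) / math.log(q2 / q1)  # eq. (10)
b_eta = q2 * p2 ** (-1.0 / a_eta)
print(round(a_eta, 2), round(b_eta, 2))  # 14.17 105.01
```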
3. Distribution of Risk Functions
In this section, we provide explicit expressions of the risk functions, Bayes risks and densities g for point and interval estimation of the parameters θ and η of the Pareto model. In Section 3.3, we extend the analysis to inference on a quantile of the Pareto model, which is a function of both parameters.
3.1. Inference on the Pareto Index θ (η Known)
3.1.1. Point Estimation
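It is easy to check (see [9]) that the maximum likelihood (ML) estimator of θ is
$$\hat\theta = \frac{n}{\sum_{i=1}^n \log(X_i/\eta)},$$
which is distributed as an Inverse Gamma with shape n and scale nθ [10]. Noting that $\mathbb{E}_\theta[\hat\theta] = \frac{n}{n-1}\,\theta$ and that $\mathrm{Var}_\theta[\hat\theta] = \frac{n^2\theta^2}{(n-1)^2(n-2)}$, using the quadratic loss function we obtain
$$R_{\hat\theta}(\theta) = \mathbb{E}_\theta\big[(\hat\theta - \theta)^2\big] = k_n\,\theta^2, \qquad k_n = \frac{n+2}{(n-1)(n-2)}, \qquad n > 2.$$
Let us now consider the random risk of $\hat\theta$, that is $Y = R_{\hat\theta}(\Theta) = k_n\Theta^2$. For any design prior density $\pi_D$ on Θ, we have that
$$r_{\hat\theta} = k_n\,\mathbb{E}[\Theta^2], \qquad (13)$$
$$g(y) = \frac{1}{2\sqrt{k_n y}}\,\pi_D\!\left(\sqrt{y/k_n}\right), \qquad y > 0. \qquad (14)$$
Assuming for Θ a conjugate Gamma prior with shape parameter $a_\theta$ and rate parameter $b_\theta$, from Equations (13) and (14) it follows that
$$r_{\hat\theta} = k_n\,\frac{a_\theta(a_\theta+1)}{b_\theta^2}, \qquad (15)$$
$$g(y) = \frac{b_\theta^{a_\theta}}{2\,\Gamma(a_\theta)\,k_n^{a_\theta/2}}\; y^{a_\theta/2-1}\exp\left\{-b_\theta\sqrt{y/k_n}\right\}, \qquad y > 0, \qquad (16)$$
that is a Generalized Gamma density [10] of the form $\frac{p}{s^{d}\,\Gamma(d/p)}\,y^{d-1}e^{-(y/s)^p}$ with parameters $(s, d, p) = (k_n/b_\theta^2,\ a_\theta/2,\ 1/2)$ in the parameterization of [11]. This result is a consequence of the closure of the Generalized Gamma distribution under both scale and power transformations.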
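Both the risk constant $k_n$ and density (16) can be checked numerically. The sketch below, assuming NumPy and SciPy, (1) estimates $\mathbb{E}_\theta[(\hat\theta-\theta)^2]$ by simulation and (2) compares (16) with a finite-difference derivative of $P(Y \le y) = P(\Theta \le \sqrt{y/k_n})$. The values of n, θ, η and the prior hyperparameters $(a_\theta, b_\theta) = (6, 3)$ are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma

def k_n_theta(n):
    """R(theta) = k_n * theta**2 for the ML estimator of theta (requires n > 2)."""
    return (n + 2) / ((n - 1) * (n - 2))

def bayes_risk_theta_point(n, a, b):
    """Eq. (15): k_n * E[Theta**2] with Theta ~ Gamma(shape a, rate b)."""
    return k_n_theta(n) * a * (a + 1) / b**2

def risk_density_theta_point(y, n, a, b):
    """Eq. (16), evaluated on the log scale for numerical stability (y > 0)."""
    k = k_n_theta(n)
    y = np.asarray(y, dtype=float)
    log_g = (a * np.log(b) - np.log(2.0) - gammaln(a) - 0.5 * a * np.log(k)
             + (0.5 * a - 1.0) * np.log(y) - b * np.sqrt(y / k))
    return np.exp(log_g)

# 1) Monte Carlo check of R(theta) = k_n * theta**2 (illustrative n, theta, eta).
rng = np.random.default_rng(2)
n, theta, eta = 30, 1.5, 1.0
x = eta * (1.0 - rng.uniform(size=(50_000, n))) ** (-1.0 / theta)
theta_hat = n / np.log(x / eta).sum(axis=1)
mc_risk = np.mean((theta_hat - theta) ** 2)

# 2) Check (16) against a finite-difference derivative of the distribution function of Y.
a, b = 6.0, 3.0  # hypothetical Gamma design prior

def risk_cdf(y):
    """P(Y <= y) = P(Theta <= sqrt(y / k_n)) under the Gamma(a, b) prior."""
    return gamma.cdf(np.sqrt(y / k_n_theta(n)), a, scale=1.0 / b)

y0, eps = 0.15, 1e-6
numeric_g = (risk_cdf(y0 + eps) - risk_cdf(y0 - eps)) / (2 * eps)
print(mc_risk, k_n_theta(n) * theta**2)
print(numeric_g, risk_density_theta_point(y0, n, a, b))
```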
3.1.2. Interval Estimation
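Since $\hat\theta$ follows an Inverse Gamma distribution, it is easy to check (see [9]) that $2n\theta/\hat\theta$ is a pivotal quantity for θ with distribution $\chi^2_{2n}$. Therefore, the $(1-\alpha)$ equal-tails confidence interval is
$$\left[\frac{\hat\theta}{2n}\,\chi^2_{2n,\alpha/2},\ \frac{\hat\theta}{2n}\,\chi^2_{2n,1-\alpha/2}\right],$$
where $\chi^2_{2n,p}$ denotes the p-level quantile of the $\chi^2_{2n}$ distribution. Using the length of the interval as loss function, the loss is
$$\frac{\hat\theta}{2n}\left(\chi^2_{2n,1-\alpha/2} - \chi^2_{2n,\alpha/2}\right)$$
and, since $\mathbb{E}_\theta[\hat\theta] = n\theta/(n-1)$, the resulting risk function is
$$R(\theta) = h_n\,\theta, \qquad h_n = \frac{\chi^2_{2n,1-\alpha/2} - \chi^2_{2n,\alpha/2}}{2(n-1)}.$$
Note that, for any $\alpha \in (0,1)$, $h_n$ tends to 0 as n diverges.
In this case, the random risk is $Y = h_n\Theta$ and it is straightforward to check that
$$r = h_n\,\mathbb{E}[\Theta], \qquad g(y) = \frac{1}{h_n}\,\pi_D\!\left(\frac{y}{h_n}\right), \qquad y > 0.$$
Using again for Θ a Gamma prior density with parameters $(a_\theta, b_\theta)$, we have that
$$g(y) = \frac{(b_\theta/h_n)^{a_\theta}}{\Gamma(a_\theta)}\,y^{a_\theta-1}e^{-b_\theta y/h_n}, \qquad y > 0. \qquad (17)$$
As a consequence of the above-mentioned closure under scale transformations, g is still a Gamma density, with shape $a_\theta$ and rate $b_\theta/h_n$.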
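A simulation check of $h_n$ and of the nominal coverage, assuming NumPy and SciPy. It uses the fact that $T = \sum_i \log(X_i/\eta)$ is Gamma with shape n and rate θ, so that $\hat\theta = n/T$; the values n = 20 and θ = 1 are illustrative only. The frozen SciPy distribution returned by the hypothetical helper `risk_distribution_theta_interval` corresponds to density (17).

```python
import numpy as np
from scipy.stats import chi2, gamma

def h_n_theta(n, alpha=0.05):
    """R(theta) = h_n * theta for the equal-tails interval based on 2 n theta / theta_hat."""
    width = chi2.ppf(1 - alpha / 2, 2 * n) - chi2.ppf(alpha / 2, 2 * n)
    return width / (2 * (n - 1))

def risk_distribution_theta_interval(n, a, b, alpha=0.05):
    """Eq. (17): Y = h_n * Theta is Gamma with shape a and rate b / h_n (SciPy scale = h_n / b)."""
    return gamma(a, scale=h_n_theta(n, alpha) / b)

# Monte Carlo check of the expected length and of the coverage (illustrative n and theta).
rng = np.random.default_rng(3)
n, theta = 20, 1.0
t = rng.gamma(shape=n, scale=1.0 / theta, size=200_000)  # T ~ Gamma(n, rate theta)
theta_hat = n / t
lower = theta_hat * chi2.ppf(0.025, 2 * n) / (2 * n)
upper = theta_hat * chi2.ppf(0.975, 2 * n) / (2 * n)
print(np.mean(upper - lower), h_n_theta(n) * theta)
print(np.mean((lower <= theta) & (theta <= upper)))  # close to 0.95
```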
3.1.3. Example: Income Data (Cont.)
Let us consider again the income data example of Section 2.2. Assume now that the scale parameter η is known, while the Pareto index θ needs to be estimated. The values of the prior hyperparameters are given by (11).
Figure 1 shows risk density (16) for the ML estimator (left panel) and risk density (17) for the equal-tails 95% confidence interval (right panel). In both cases, as an example, we consider two sample sizes, $n_1 < n_2$. Note that the density of the risk function shrinks towards 0 as the sample size increases, as a consequence of the consistency of the estimators. Expectation, median and mode of g are reported in Table 1 for $n_1$ and $n_2$ for point estimation.
Since g is markedly right-skewed, the mean is larger than the median and the mode. Because n enters the random risk only through the scale factor $k_n$ (or $h_n$), the shape of g, and hence its skewness, does not change with n; however, the absolute discrepancies between the three summaries shrink as the sample size increases.
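The three summaries of density (16) reported in Table 1 have simple expressions: the mean is (15), the median is $k_n$ times the squared median of Θ (since Y is increasing in Θ), and setting the derivative of $\log g$ to zero gives the mode $k_n(a_\theta-2)^2/b_\theta^2$ for $a_\theta > 2$. The sketch below, assuming SciPy, evaluates them for hypothetical hyperparameters (not the values of (11)); for a fixed prior, the ratios between the summaries do not change with n, since g only changes scale.

```python
from scipy.stats import gamma

def summaries_theta_point(n, a, b):
    """Mean, median and mode of Y = k_n * Theta**2 with Theta ~ Gamma(shape a, rate b)."""
    k = (n + 2) / ((n - 1) * (n - 2))
    mean = k * a * (a + 1) / b**2                     # eq. (15)
    median = k * gamma.median(a, scale=1.0 / b) ** 2  # Y is increasing in Theta
    mode = k * max(a - 2.0, 0.0) ** 2 / b**2          # root of d/dy log g(y); 0 if a <= 2
    return mean, median, mode

a, b = 6.0, 3.0  # hypothetical hyperparameters
for n in (30, 50):
    mean, median, mode = summaries_theta_point(n, a, b)
    print(n, round(mean, 3), round(median, 3), round(mode, 3))
```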
Figure 1. Example (income data). Density g of the risk function when estimating the Pareto index, for the two sample sizes considered (black and red curves). Left panel: point estimation, density (16); right panel: interval estimation, density (17).
Table 1. Example (income data). Summaries of the density g of the random risk function for point estimation of θ.
| Summary | $n_1$ | $n_2$ |
|---|---|---|
| mean | 0.166 | 0.093 |
| median | 0.127 | 0.071 |
| mode | 0.063 | 0.036 |
3.2. Inference on the Scale Parameter η (θ Known)
3.2.1. Point Estimation
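The maximum likelihood estimator of the parameter η is the sample minimum $\hat\eta = X_{(1)}$, which is a Pareto random variable with parameters $(n\theta, \eta)$. Therefore, under the quadratic loss function,
$$R_{\hat\eta}(\eta) = \mathbb{E}_\eta\big[(X_{(1)} - \eta)^2\big] = \frac{2\,\eta^2}{(n\theta-1)(n\theta-2)}, \qquad n\theta > 2.$$
Then, the random risk of $\hat\eta$ is $Y = k_n H^2$ and, similarly to the previous case,
$$r_{\hat\eta} = k_n\,\mathbb{E}[H^2], \qquad g(y) = \frac{1}{2\sqrt{k_n y}}\,\pi_D\!\left(\sqrt{y/k_n}\right),$$
where $\pi_D$ is the design prior density for H and $k_n = 2/[(n\theta-1)(n\theta-2)]$.
When we assume the conjugate prior (8) for H, i.e. an Inverse Pareto density with parameters $(a_\eta, b_\eta)$, we obtain
$$r_{\hat\eta} = k_n\,\frac{a_\eta\,b_\eta^2}{a_\eta+2}, \qquad (18)$$
$$g(y) = \frac{(a_\eta/2)\,y^{a_\eta/2-1}}{(k_n b_\eta^2)^{a_\eta/2}}, \qquad 0 < y < k_n b_\eta^2, \qquad (19)$$
that is an Inverse Pareto density with parameters $(a_\eta/2,\ k_n b_\eta^2)$. Note that closure with respect to scale and power transformations also holds for the distribution of the reciprocal of a Pareto random variable.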
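A Monte Carlo sketch for this subsection, assuming NumPy: it checks $R(\eta) = k_n\eta^2$ for the sample minimum and the Bayes risk (18), drawing H from (8) by inversion ($H = b_\eta U^{1/a_\eta}$). The values of n, θ and η are illustrative; the hyperparameters $(a_\eta, b_\eta) \approx (14.17, 105.01)$ are those obtained from (10) with the elicited quantiles of the income data example, rounded.

```python
import numpy as np

def k_n_eta(n, theta):
    """R(eta) = k_n * eta**2 for the sample minimum (requires n * theta > 2)."""
    s = n * theta
    return 2.0 / ((s - 1.0) * (s - 2.0))

def bayes_risk_eta_point(n, theta, a, b):
    """Eq. (18): k_n * E[H**2], where E[H**2] = a * b**2 / (a + 2) under prior (8)."""
    return k_n_eta(n, theta) * a * b**2 / (a + 2.0)

def sample_inverse_pareto(rng, size, a, b):
    """Inverse-CDF draws from (8): F(eta) = (eta / b)**a, so H = b * V**(1 / a)."""
    v = 1.0 - rng.uniform(size=size)  # v is on (0, 1]
    return b * v ** (1.0 / a)

rng = np.random.default_rng(4)
# 1) Monte Carlo check of R(eta) for the sample minimum (illustrative values).
n, theta, eta = 30, 1.5, 100.0
x = eta * (1.0 - rng.uniform(size=(100_000, n))) ** (-1.0 / theta)
mc_risk = np.mean((x.min(axis=1) - eta) ** 2)
# 2) Monte Carlo check of (18): Y = k_n * H**2 with H drawn from prior (8).
a, b = 14.17, 105.01
y = k_n_eta(n, theta) * sample_inverse_pareto(rng, 200_000, a, b) ** 2
print(mc_risk, k_n_eta(n, theta) * eta**2)
print(y.mean(), bayes_risk_eta_point(n, theta, a, b))
```

Every draw of Y lies below $k_n b_\eta^2$, the upper endpoint of the support of (19).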
3.2.2. Interval Estimation
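A pivotal quantity for determining an interval estimator for η is $X_{(1)}/\eta$, which is a Pareto random variable with parameters $(n\theta, 1)$. Recalling that the p-quantile of this distribution is $(1-p)^{-1/(n\theta)}$, the $(1-\alpha)$ equal-tails confidence interval for η is
$$\left[X_{(1)}\,(\alpha/2)^{1/(n\theta)},\ X_{(1)}\,(1-\alpha/2)^{1/(n\theta)}\right].$$
The length of the interval is $X_{(1)}\left[(1-\alpha/2)^{1/(n\theta)} - (\alpha/2)^{1/(n\theta)}\right]$ and, since $\mathbb{E}_\eta[X_{(1)}] = n\theta\eta/(n\theta-1)$, the risk function is
$$R(\eta) = h_n\,\eta, \qquad h_n = \frac{n\theta}{n\theta-1}\left[(1-\alpha/2)^{1/(n\theta)} - (\alpha/2)^{1/(n\theta)}\right],$$
with $h_n$ decreasing to 0 as n increases. The random risk of the interval is $Y = h_n H$. Assuming again the conjugate prior density (8) for H, we obtain
$$r = h_n\,\mathbb{E}[H] = h_n\,\frac{a_\eta\,b_\eta}{a_\eta+1}, \qquad (20)$$
and the resulting g is again a density of the form (8), with parameters $(a_\eta, h_n b_\eta)$:
$$g(y) = \frac{a_\eta\,y^{a_\eta-1}}{(h_n b_\eta)^{a_\eta}}, \qquad 0 < y < h_n b_\eta. \qquad (21)$$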
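A simulation check of the coverage and expected length of this interval, assuming NumPy and illustrative values of n, θ, η and α; it samples $X_{(1)}$ directly from its Pareto(nθ, η) distribution.

```python
import numpy as np

def h_n_eta(n, theta, alpha=0.05):
    """R(eta) = h_n * eta for the equal-tails interval based on X_(1) / eta ~ Pareto(n * theta, 1)."""
    s = n * theta
    return s / (s - 1.0) * ((1.0 - alpha / 2) ** (1.0 / s) - (alpha / 2) ** (1.0 / s))

rng = np.random.default_rng(5)
n, theta, eta, alpha = 30, 1.5, 100.0, 0.05  # illustrative values
s = n * theta
x_min = eta * (1.0 - rng.uniform(size=200_000)) ** (-1.0 / s)  # X_(1) ~ Pareto(n * theta, eta)
lower = x_min * (alpha / 2) ** (1.0 / s)
upper = x_min * (1.0 - alpha / 2) ** (1.0 / s)
coverage = np.mean((lower <= eta) & (eta <= upper))
mean_length = np.mean(upper - lower)
print(coverage, mean_length, h_n_eta(n, theta) * eta)
# Under prior (8) with parameters (a, b), Y = h_n * H has density (8) with parameters (a, h_n * b).
```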
3.2.3. Remark
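We notice that the forms $k_n\eta^2$ and $h_n\eta$ of the risk functions are not specific to the estimation method considered above. As an example, for θ > 1 the moment estimator of η is $\tilde\eta = \frac{\theta-1}{\theta}\,\bar X_n$, with random risk function $\tilde k_n H^2$, where, for θ > 2, $\tilde k_n = \frac{1}{n\theta(\theta-2)}$.
As regards interval estimation, recalling that, for θ > 2, the asymptotic distribution of $\tilde\eta$ is normal with mean η and variance $\eta^2/[n\theta(\theta-2)]$, the asymptotic $(1-\alpha)$ confidence interval is $\tilde\eta\left(1 \pm z_{1-\alpha/2}/\sqrt{n\theta(\theta-2)}\right)$, with length $2z_{1-\alpha/2}\,\tilde\eta/\sqrt{n\theta(\theta-2)}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of a standard normal. Then the random risk is given by $\tilde h_n H$, where $\tilde h_n = \frac{2z_{1-\alpha/2}}{\sqrt{n\theta(\theta-2)}}$, which decreases to 0 with n.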
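The constants of this remark can also be checked by simulation. The sketch assumes NumPy and uses the standard-library `statistics.NormalDist` for $z_{1-\alpha/2}$; θ = 6 is an illustrative choice that keeps the variance of the squared error finite (θ > 4), and the other values are illustrative as well.

```python
import numpy as np
from statistics import NormalDist

def k_tilde(n, theta):
    """Risk constant of the moment estimator of eta under quadratic loss (requires theta > 2)."""
    return 1.0 / (n * theta * (theta - 2.0))

def h_tilde(n, theta, alpha=0.05):
    """Risk constant of the asymptotic interval eta_tilde * (1 +/- z / sqrt(n theta (theta - 2)))."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2)
    return 2.0 * z / np.sqrt(n * theta * (theta - 2.0))

rng = np.random.default_rng(6)
n, theta, eta = 50, 6.0, 1.0
x = eta * (1.0 - rng.uniform(size=(100_000, n))) ** (-1.0 / theta)
eta_tilde = (theta - 1.0) / theta * x.mean(axis=1)  # unbiased moment estimator
mc_risk = np.mean((eta_tilde - eta) ** 2)
mc_length = np.mean(h_tilde(n, theta) * eta_tilde)  # average length of the asymptotic interval
print(mc_risk, k_tilde(n, theta) * eta**2)
print(mc_length, h_tilde(n, theta) * eta)
```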
3.2.4. Example: Income Data (Cont.)
Let us consider again the income data example of Section 2.2. Assume now that the scale parameter η needs to be estimated, while the Pareto index is known and equal to $(a_\theta - 1)/b_\theta$, the mode of the Gamma prior density elicited in Section 2.2. The values of the prior hyperparameters for H are given by (12).
Table 2. Example (income data). Summaries of the density g of the random risk function for point estimation of η.
| Summary | $n_1$ | $n_2$ |
|---|---|---|
| mean | 9.124 | 3.201 |
| median | 9.442 | 3.312 |
| mode | 10.413 | 3.652 |
Figure 2 shows risk density (19) for the ML estimator (left panel) and risk density (21) for the equal-tails 95% confidence interval (right panel). In both cases, as an example, two sample sizes $n_1 < n_2$ have been considered. Again, a larger sample size induces higher densities for lower values of the risk. Note, however, that in this case g is an increasing function on its support (hence markedly left-skewed) but, again, the main summaries of g get closer to each other, in absolute terms, as n increases. Numerical values of expectation, median and mode of g are given in Table 2 for $n_1$ and $n_2$ for point estimation.
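Because density (19) only changes scale with n and θ, the ratios mean/mode $= a_\eta/(a_\eta+2)$ and median/mode $= (100/b_\eta)^2$ do not depend on the sample size: the elicited prior median of H is 100, and the mode of the increasing density (19) is its upper endpoint $k_n b_\eta^2$. A short check in Python, computing $(a_\eta, b_\eta)$ from (10):

```python
import math

a_eta = math.log(0.5 / 0.05) / math.log(100 / 85)  # hyperparameters for H via (10)
b_eta = 100 * 0.5 ** (-1 / a_eta)

# For Y = k_n * H**2 with density (19): mean = k_n*a*b**2/(a+2), median = k_n*100**2, mode = k_n*b**2.
mean_over_mode = a_eta / (a_eta + 2)
median_over_mode = (100 / b_eta) ** 2
print(round(mean_over_mode, 3), round(median_over_mode, 3))
```

These values can be compared with the ratios of the entries in Table 2, e.g. 9.124/10.413 ≈ 0.876 and 9.442/10.413 ≈ 0.907.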
Figure 2. Example (income data). Density g of the risk function when estimating the scale parameter, for the two sample sizes considered (black and red curves). Left panel: point estimation, density (19); right panel: interval estimation, density (21).
3.3. Inference on Quantiles
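In many relevant applications that involve the Pareto model, the quantity of inferential interest is a specific quantile of the distribution, for instance, the Value at Risk [12]. In this section, we extend the approach of the previous sections to this problem. Specifically, for a Pareto distribution with parameters (θ, η), the γ-quantile is $q_\gamma = \eta\,(1-\gamma)^{-1/\theta}$, which is a function of both parameters. The maximum likelihood estimator of this quantity is
$$\hat q_\gamma = \hat\eta\,(1-\gamma)^{-1/\hat\theta}, \qquad \hat\eta = X_{(1)}, \quad \hat\theta = \frac{n}{\sum_{i=1}^n \log(X_i/X_{(1)})}. \qquad (22)$$
For the sake of analytical tractability, let us consider the loss function
$$L(q_\gamma, \hat q_\gamma) = \left(\log\hat q_\gamma - \log q_\gamma\right)^2,$$
i.e. the quadratic loss on the logarithmic scale. Writing $\ell_\gamma = -\log(1-\gamma)$, we have $\log\hat q_\gamma - \log q_\gamma = \log(X_{(1)}/\eta) + \ell_\gamma\,(1/\hat\theta - 1/\theta)$, where $\log(X_{(1)}/\eta)$ is exponential with rate nθ, $n/\hat\theta$ is Gamma with shape n − 1 and rate θ, and the two are independent. The corresponding risk function is therefore
$$R(\theta, \eta) = \frac{c_n}{\theta^2}, \qquad c_n = \frac{n\ell_\gamma^2 - 2\ell_\gamma + 2}{n^2},$$
which does not depend on η. It follows that the random risk is $Y = c_n\Theta^{-2}$, so that $r = c_n\,\mathbb{E}[\Theta^{-2}]$, and its density is
$$g(y) = \frac{\sqrt{c_n}}{2\,y^{3/2}}\,\pi_D\!\left(\sqrt{c_n/y}\right), \qquad y > 0.$$
When a Gamma prior with parameters $(a_\theta, b_\theta)$ is adopted, the expression of g becomes
$$g(y) = \frac{b_\theta^{a_\theta}\,c_n^{a_\theta/2}}{2\,\Gamma(a_\theta)}\;y^{-a_\theta/2-1}\exp\left\{-b_\theta\sqrt{c_n/y}\right\}, \qquad y > 0, \qquad (23)$$
that is the density function of a Generalized Inverse Gamma: $1/Y$ is Generalized Gamma with parameters $(1/(b_\theta^2 c_n),\ a_\theta/2,\ 1/2)$ in the parametrization of [11]. In this case, $r = c_n\,b_\theta^2/[(a_\theta-1)(a_\theta-2)]$ for $a_\theta > 2$.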
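A sketch, assuming NumPy and SciPy, that checks $c_n$ by simulating the ML estimator (22) with both parameters unknown, and compares density (23) with a finite-difference derivative of $P(Y \le y) = P(\Theta \ge \sqrt{c_n/y})$. All numerical values, including the Gamma hyperparameters (6, 3), are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma

def c_n_quantile(n, gam):
    """R(theta, eta) = c_n / theta**2 for the ML estimator (22) under log-quadratic loss."""
    ell = -np.log(1.0 - gam)
    return (n * ell**2 - 2.0 * ell + 2.0) / n**2

def risk_density_quantile(y, n, gam, a, b):
    """Eq. (23): density of Y = c_n / Theta**2 with Theta ~ Gamma(shape a, rate b), y > 0."""
    c = c_n_quantile(n, gam)
    y = np.asarray(y, dtype=float)
    log_g = (a * np.log(b) + 0.5 * a * np.log(c) - np.log(2.0) - gammaln(a)
             - (0.5 * a + 1.0) * np.log(y) - b * np.sqrt(c / y))
    return np.exp(log_g)

# 1) Monte Carlo check of the risk (illustrative theta, eta, n; both parameters estimated).
rng = np.random.default_rng(7)
n, theta, eta, gam = 30, 1.5, 100.0, 0.95
x = eta * (1.0 - rng.uniform(size=(100_000, n))) ** (-1.0 / theta)
eta_hat = x.min(axis=1)
theta_hat = n / np.log(x / eta_hat[:, None]).sum(axis=1)
err = (np.log(eta_hat) - np.log(1.0 - gam) / theta_hat) - (np.log(eta) - np.log(1.0 - gam) / theta)
mc_risk = np.mean(err**2)

# 2) Check (23) against a finite-difference derivative of the distribution function of Y.
a, b = 6.0, 3.0  # hypothetical Gamma design prior

def risk_cdf(y):
    """P(Y <= y) = P(Theta >= sqrt(c_n / y)) under the Gamma(a, b) prior."""
    return gamma.sf(np.sqrt(c_n_quantile(n, gam) / y), a, scale=1.0 / b)

y0, eps = 0.1, 1e-6
numeric_g = (risk_cdf(y0 + eps) - risk_cdf(y0 - eps)) / (2 * eps)
print(mc_risk, c_n_quantile(n, gam) / theta**2)
print(numeric_g, risk_density_quantile(y0, n, gam, a, b))
```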
Example: Income Data (Cont.)
Risk densities can be a useful tool for comparing different estimators. Using the same prior information as in the income data example of Section 2.2, i.e. (11) and (12), we consider the maximum likelihood estimator (22) of the 95th percentile of a Pareto population and the nonparametric estimator given by the 95th sample percentile. Since we do not have an analytic expression for the distribution of the risk of the nonparametric estimator, we simulated it; the resulting histograms are reported in Figures 3 and 4. The red curves in the same figures are the plots of density (23) of the risk of the maximum likelihood estimator. The advantage, in terms of risk, of the maximum likelihood estimator appears evident from Figures 3 and 4. Table 3 reports the mean and median of the risk associated with the two estimators for a moderate ($n_1$) and a large ($n_2$) sample size.
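The comparison above relies on simulation. The following Python sketch shows one possible way to organize it, assuming NumPy: for each draw of (Θ, H) from the design priors, the log-quadratic risk of both estimators is estimated by an inner Monte Carlo loop. Several choices are assumptions rather than details given in the text: the sample-quantile definition (NumPy's default linear interpolation), the numbers of outer and inner draws, and the Gamma hyperparameters for Θ, (6, 3), which are hypothetical. The hyperparameters for H, approximately (14.17, 105.01), are those obtained from (10) with the elicited quantiles. The printed summaries are therefore not expected to reproduce Table 3.

```python
import numpy as np

def log_risks(theta, eta, n, gam, reps, rng):
    """Monte Carlo log-quadratic risks of the ML estimator (22) and of the sample gam-quantile."""
    x = eta * (1.0 - rng.uniform(size=(reps, n))) ** (-1.0 / theta)
    log_q = np.log(eta) - np.log(1.0 - gam) / theta
    eta_hat = x.min(axis=1)
    theta_hat = n / np.log(x / eta_hat[:, None]).sum(axis=1)
    log_q_ml = np.log(eta_hat) - np.log(1.0 - gam) / theta_hat
    log_q_np = np.log(np.quantile(x, gam, axis=1))  # NumPy's default (linear) sample quantile
    return np.mean((log_q_ml - log_q) ** 2), np.mean((log_q_np - log_q) ** 2)

rng = np.random.default_rng(8)
n, gam, draws, reps = 30, 0.95, 200, 2_000
a_theta, b_theta = 6.0, 3.0   # hypothetical Gamma design prior for Theta
a_eta, b_eta = 14.17, 105.01  # design prior (8) for H, hyperparameters from the elicited quantiles
theta_draws = rng.gamma(shape=a_theta, scale=1.0 / b_theta, size=draws)
eta_draws = b_eta * (1.0 - rng.uniform(size=draws)) ** (1.0 / a_eta)  # inverse-CDF draws from (8)
risks = np.array([log_risks(t, e, n, gam, reps, rng) for t, e in zip(theta_draws, eta_draws)])
print("mean risk (ML, nonparametric):", risks.mean(axis=0))
print("median risk (ML, nonparametric):", np.median(risks, axis=0))
```

Each row of `risks` is one draw of the random risk pair; histograms of the two columns play the role of Figures 3 and 4 under these assumptions.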
Figure 3. Example (income data). Histogram of the simulated risk of the nonparametric estimator of the 95th percentile and risk density (red curve) of the maximum likelihood estimator, for the moderate sample size $n_1$.
Table 3. Example (income data). Summaries of the density g of the random risk for the ML estimator and the nonparametric estimator of the quantile.
| Estimator | Summary | $n_1$ | $n_2$ |
|---|---|---|---|
| ML estimator | mean | 0.148 | 0.045 |
| ML estimator | median | 0.092 | 0.028 |
| Nonparametric estimator | mean | 0.278 | 0.095 |
| Nonparametric estimator | median | 0.171 | 0.057 |
Figure 4. Example (income data). Histogram of the simulated risk of the nonparametric estimator of the 95th percentile and risk density (red curve) of the maximum likelihood estimator, for the large sample size $n_2$.
4. Application: Prediction of Pareto Index for the World’s Billionaires
As argued by [13]: “The Pareto distribution is commonly used to represent situations where a small portion of the population controls a disproportionately large share of resources, such as income or wealth distribution.”
In the present application, we consider the World’s Billionaires List which is published yearly by Forbes Magazine at https://www.forbes.com/billionaires/.
Historical data reporting wealth (net worth) of billionaires are available at https://stats.areppim.com/stats/links_billionairexlists.htm.
We specifically refer to the data for 2018, which is the most recent complete list available. The two plots in Figure 5 show that the Pareto density fits the data quite accurately.
Figure 5. Application (World’s Billionaires). Fit of the Pareto distribution to the 2018 billionaires’ net worths (left panel: whole distribution; right panel: the most crowded net-worth interval, from 0 to 40 billion).
Our goal is to obtain the density g of the random risk for point and interval estimators of next year’s Pareto index. To elicit the design prior $\pi_D$, we use the above-mentioned 2018 list as historical data. As in Section 3.1, we adopt a Gamma design prior density with parameters $(a_\theta, b_\theta)$, fixed through Equation (7), where m and v are based on historical data. Specifically, we set $m = \hat\theta_{2018}$ (i.e. the ML estimate of θ based on the 2018 data) and $v = 0.008$ (i.e. the estimated variance of the ML estimates of θ in the period 2001-2018). The resulting prior hyperparameters are $a_\theta = m^2/v$ and $b_\theta = m/v$. Using (16) and (17), we obtain Figure 6, which shows the densities of the random risk for point and interval estimation of θ. We consider two different sample sizes ($n_1 < n_2$) to highlight the advantage that larger samples produce in terms of loss: as n increases, the distribution of the risk becomes more concentrated on values close to 0. In this case, g is nearly symmetric, and the Bayes risk, median and mode for both point and interval estimation are almost coincident for both $n_1$ and $n_2$, as reported in Table 4.
For additional insight, in Figure 7, we consider a much larger prior variance (sixteen times as much), with other prior choices being equal. The increase in prior variance amplifies the skewness of the distributions and, eventually, reduces the gain in terms of loss produced by a larger sample size. As expected, in this case, the values of the mean are larger than those of the median and mode for both point and interval estimators (see Table 5).
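The equality of the interval-estimation means in Tables 4 and 5 is expected: for the interval, $r = h_n\mathbb{E}[\Theta] = h_n m$ does not involve v, whereas for the ML estimator $r = k_n(m^2+v)$ grows with v. The sketch below, assuming NumPy and SciPy, with a hypothetical prior mean m = 1 and an illustrative n = 20 (not the values behind the tables), reproduces these qualitative patterns: the interval mean is unchanged, while the gap between mean and mode widens as v grows.

```python
import numpy as np
from scipy.stats import chi2, gamma

def risk_summaries(n, m, v, alpha=0.05):
    """Mean/median/mode of the random risk for the ML estimator and the interval for theta."""
    a, b = m**2 / v, m / v                                    # eq. (7)
    k = (n + 2) / ((n - 1) * (n - 2))                         # ML estimator: Y = k_n * Theta**2
    width = chi2.ppf(1 - alpha / 2, 2 * n) - chi2.ppf(alpha / 2, 2 * n)
    h = width / (2 * (n - 1))                                 # interval: Y = h_n * Theta
    med = gamma.median(a, scale=1.0 / b)                      # median of Theta
    point = {"mean": k * a * (a + 1) / b**2, "median": k * med**2,
             "mode": k * max(a - 2.0, 0.0) ** 2 / b**2}
    interval = {"mean": h * a / b, "median": h * med, "mode": h * max(a - 1.0, 0.0) / b}
    return point, interval

m = 1.0  # hypothetical prior mean for the Pareto index
results = {v: risk_summaries(20, m, v) for v in (0.008, 16 * 0.008)}
for v, (point, interval) in results.items():
    print(v, {key: round(float(val), 3) for key, val in point.items()},
          {key: round(float(val), 3) for key, val in interval.items()})
```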
Table 4. Application (World’s Billionaires). Summaries of the density g of the random risk for the point and interval estimators of θ.
| Estimator | Summary | $n_1$ | $n_2$ |
|---|---|---|---|
| ML estimator | mean | 0.068 | 0.030 |
| ML estimator | median | 0.067 | 0.030 |
| ML estimator | mode | 0.065 | 0.029 |
| Confidence interval | mean | 0.893 | 0.633 |
| Confidence interval | median | 0.891 | 0.631 |
| Confidence interval | mode | 0.886 | 0.628 |
Figure 6. Application (World’s Billionaires). Density g of the random risk for the maximum likelihood estimator (left panel) and for the 95% confidence interval (right panel) for the Pareto index θ. Curves obtained for the two sample sizes considered (red and black curves), given prior variance 0.008.
Table 5. Application (World’s Billionaires). Summaries of the density g of the random risk for the point and interval estimators of θ when considering a much larger prior variance (sixteen times as much).
| Estimator | Summary | $n_1$ | $n_2$ |
|---|---|---|---|
| ML estimator | mean | 0.076 | 0.033 |
| ML estimator | median | 0.062 | 0.027 |
| ML estimator | mode | 0.038 | 0.017 |
| Confidence interval | mean | 0.893 | 0.633 |
| Confidence interval | median | 0.857 | 0.607 |
| Confidence interval | mode | 0.784 | 0.556 |
Figure 7. Application (World’s Billionaires). Density g of the random risk for the maximum likelihood estimator (left panel) and for the 95% confidence interval (right panel) for the Pareto index θ. Curves obtained for the two sample sizes considered (red and black curves), given a prior variance of 16 × 0.008.
5. Closing Remarks
In this paper, we study the distribution of the risk function for point and interval estimation in the Pareto model, when interest is in the shape parameter θ (Pareto index), in the scale parameter η, or in a quantile that is a function of both θ and η. Using conjugate priors, we obtain closed-form expressions for both the expected value and the density function of the risk of each estimator under suitable losses. Interestingly, owing to the analytical form of the risk functions, in all the cases considered the densities of the risk belong to the same family as the corresponding design prior or to a closely related one (Generalized Gamma or Generalized Inverse Gamma for Θ, Inverse Pareto for H). This is a consequence of the closure of both the Generalized Gamma and the Inverse Pareto families with respect to scale and power transformations. Inspection of the shape of the density functions allows one to evaluate the impact of prior assumptions and sample size on the risk and to select an appropriate summary of the risk distribution. All these ideas are illustrated through a numerical example related to income data [7] and an application based on the Forbes World’s Billionaires list. Future developments of this work may be devoted to sample size determination in the spirit of [14].