Bias Correction Technique for Estimating Quantiles of Finite Populations under Simple Random Sampling without Replacement

Nicholas Makumi; Romanus Odhiambo Otieno; George Otieno Orwa; Festus Were; Habineza Alexis

doi:10.4236/ojs.2021.115050

Open Journal of Statistics > Vol.11 No.5, October 2021

Nicholas Makumi¹, Romanus Odhiambo Otieno², George Otieno Orwa³, Festus Were³, Habineza Alexis¹
¹Pan African University, Institute for Basic Sciences, Technology and Innovation, Nairobi, Kenya.
²Department of Mathematics, Meru University of Science and Technology, Meru, Kenya.
³Department of Statistics and Actuarial Sciences, JKUAT, Nairobi, Kenya.

Abstract

In this paper, the problem of nonparametric estimation of the finite population quantile function using a multiplicative bias correction technique is considered. A robust estimator of the finite population quantile function based on multiplicative bias correction is derived with the aid of a superpopulation model. Most studies have concentrated on kernel smoothers in the estimation of regression functions, and this technique has also been applied in the existing nonparametric estimators of the finite population quantile reviewed here. A major problem with the use of nonparametric kernel-based regression over a finite interval, such as in the estimation of finite population quantities, is bias at boundary points. By correcting the boundary problems associated with previous model-based estimators, the multiplicative bias corrected estimator produced better results in estimating the finite population quantile function. Furthermore, the asymptotic behavior of the proposed estimator is presented. It is observed that the estimator is asymptotically unbiased and statistically consistent when certain conditions are satisfied. The simulation results show that the suggested estimator performs well in terms of relative bias, mean squared error, and relative root mean squared error. As a result, the multiplicative bias corrected estimator is strongly suggested for survey sampling estimation of the finite population quantile function.

Keywords

Quantile Function, Kernel Estimator, Multiplicative Bias Correction Technique, Simple Random Sampling without Replacement

Share and Cite:

Makumi, N. , Otieno, R. , Orwa, G. , Were, F. and Alexis, H. (2021) Bias Correction Technique for Estimating Quantiles of Finite Populations under Simple Random Sampling without Replacement. Open Journal of Statistics, 11, 854-869. doi: 10.4236/ojs.2021.115050.

1. Introduction

In recent years, the estimation of population distribution functions in the context of survey sampling has received considerable attention. A particular focus of this attention has been the median, which is often considered a more appropriate measure of location than the mean, especially when the variable of interest follows a skewed distribution. Estimators of the population mean or total can typically be improved substantially when appropriate auxiliary information is available. Accordingly, the use of auxiliary information in sample quantile estimators seems highly desirable. Use of known auxiliary information at both the estimation stage and the selection stage contributes to better estimation strategies in survey sampling. If such information is not fully known or is missing, and information on the auxiliary variable(s) is relatively cheap to obtain, one may consider taking a large preliminary sample to estimate the population mean(s) of the auxiliary variable(s).

In traditional kernel estimation, it is generally held that the performance of a kernel method depends largely on the smoothing bandwidth and very little on the type of kernel. Most kernels used are symmetric and remain fixed once chosen. This may be suitable for estimating curves with unbounded support, but not for curves that have compact support and are discontinuous at boundary points. For curves of this kind, a fixed kernel shape leads to boundary bias, because the fixed symmetric kernel allocates weight outside the support of the distribution when smoothing close to the boundary. In addition, standard kernel methods yield wiggly estimates in the tail of the distribution, since reducing the boundary bias requires a small bandwidth that prevents the pooling of enough data. Moreover, as noted in [1] for the estimation of a probability density function, the standard kernel estimator “works well for densities not far from Gaussian in shape”; however, it can perform very poorly when the shape is far from Gaussian, particularly near the boundary.

Boundary bias is a well-known problem, and several scholars have proposed ways to eliminate it. In the context of nonparametric regression, [2] [3] [4] proposed the use of boundary kernels, while [5] used Richardson’s extrapolation to combine two kernel estimates with different bandwidths. In density estimation, [6] proposed data reflection, [7] considered empirical transformations, and [8] proposed a framework of jackknife methods for correcting boundary bias. More recently, it has been shown by [9] [10] that, in nonparametric regression, the local linear smoother is free of boundary bias and achieves the optimal convergence rate for mean integrated squared error. It is interesting to note that a local linear smoother uses a fixed kernel in its initial form, while the local least squares regression implicitly employs different kernels at different places. The transformation method is among the numerous methods suggested to deal with data on $[0, + \infty)$ . In order to minimize the boundary bias in the density estimation framework, [1] [11] [12], among others, studied general transformation methods. A transformation may work only under specific conditions, and it is important to select the appropriate transformation by analyzing the subject matter and related studies.

The estimation of population quantiles is of great interest when a parametric form for the underlying distribution is not available. Quantile estimation plays an important role in a broad range of statistical applications: the Q-Q plot, goodness of fit, the computation of extreme quantiles, and value at risk in insurance business and financial risk management. Also, a large class of actuarial risk measures can be defined as functionals of quantiles (see [13]). Most contributions estimate the pth quantile using a kernel function under simple random sampling (SRS). The reader is referred to [14] [15] [16].

Quantile estimation has been used intensively in many fields. Most of the existing quantile estimators suffer from either bias or inefficiency at high probability levels. In order to correct the bias problems, [17] suggested several nonparametric quantile estimators based on the beta kernel and applied them to transformed data. A Monte Carlo study showed that those estimators improve on the efficiency of the traditional ones, not only for light-tailed distributions but also for heavy-tailed ones, when the probability level is close to 1. [18] used a transformed kernel estimate. In their study, they overcame this inconsistency by using a new approach based on the modified Champernowne distribution, which behaves like the Pareto distribution.

As a result, the aim of this paper is to develop a nonparametric estimator for the quantile function of finite populations using a bias corrected approach to address the shortcomings of previously studied estimation methods. There are two unique features about this approach. One is that it ensures an accurate estimate and the other is that it reduces the estimation bias with negligible increase in variance.

The Multiplicative Bias Correction (MBC) approach was first considered in [19], and the results obtained showed that the estimator of the regression function had desirable properties compared to existing estimators, including solving the boundary problems. This form of correction is especially well suited to adjusting non-negative regression functions because it does not change the sign of the regression function, and it yields an accurate estimate while reducing the estimation bias with a negligible increase in variance. Although there is always a bias-variance trade-off for nonparametric smoothers in finite samples, smoothers can be constructed whose asymptotic bias converges to zero while maintaining the same asymptotic variance. For a deeper discussion of the Multiplicative Bias Correction technique, we refer the reader to [20] [21] [22] [23].

Outline of the paper

In Section 2, we propose an estimator for finite population quantile function using a bias correction technique. Asymptotic properties of the proposed estimator are derived in Section 3. Empirical study of the results is given in Section 4 and the conclusion of the findings is given in Section 5.

2. Proposed Estimator

In survey sampling, we are often interested in studying the distribution of a specific variable of interest, Y. An efficient way to describe the distribution function is to assess the quantiles of the distribution. By the pth quantile of the distribution, we mean the value Q such that $P (Y \leq Q) = p$ . One way of designing quantile estimators is to invert an estimator of the distribution function. Let ${\hat{F}}_{y} (t)$ denote an estimator of $P (Y \leq t)$ . Since the estimator ${\hat{F}}_{y}$ is often a step function, the resulting quantile estimator may not be smooth.

In this section we discuss a quantile estimator derived from a model-based multiplicative bias correction distribution function estimator that integrates auxiliary information. This distribution function estimator was introduced by [24]. The quantile estimator is based on inverting the [24] distribution function estimator. We derive a Bahadur representation for the quantile estimator.

Let $F (y) = P (Y \leq y)$ be a probability distribution function. The population quantile of order $α$ is defined as

$Q (α) = \inf \{t : F (t) > α\}$ (1)

for $0 < α < 1$ . If F is continuous and strictly increasing, then

$Q (α) = F^{- 1} ( α )$

is the unique solution to Equation (1). In general, $Q (α)$ satisfies

$P \{Y < Q (α)\} \leq α \leq P \{Y \leq Q (α)\},$ (2)

or equivalently

$F (Q (α) -) \leq α \leq F (Q (α)) .$ (3)

Suppose that the $X_{i}$'s, for $i = 1, 2, \dots, N$ , are independent and identically distributed (i.i.d) random variables with corresponding survey values $Y_{i} (i = 1, 2, \dots, N)$ . By definition, $Y_{1}, \dots, Y_{N}$ are independent, identically distributed random variables, each with common distribution function F. For all real t, the empirical population distribution function for $Y_{1}, \dots, Y_{N}$ is defined to be

$F_{N_{y}} (t) = \frac{1}{N} \sum_{i = 1}^{N} I (y_{i} \leq t)$ (4)

where

$I (y_{i} \leq t) = \begin{cases} 1 & \text{if } y_{i} \leq t, \\ 0 & \text{otherwise.} \end{cases}$

The sample quantile of order $α$ is defined as

$Q_{N_{y}} (α) = \inf \{t : F_{N_{y}} (t) > α\}$ (5)

The sample quantile of order $α$ is a strongly consistent estimator of $Q (α)$ , unless $F (Q (α)) = α$ and $F (Q (α)) = F (Q (α) + ε)$ for some $ε > 0$ (i.e., unless F is flat in a right neighborhood of $Q (α)$ ). See ( [25] ).

Theorem 1 ( [26] ). Let $Y_{1}, \dots, Y_{n}$ be a random sample of size n with common distribution function F, and let $0 < p < 1$ . If $y = q (p)$ is the unique solution of $F (y -) \leq p \leq F (y)$ , then ${\hat{q}}_{n} (p) \overset{a.s.}{\to} q (p)$ as $n \to \infty$ .
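The inversion of an empirical distribution function described above can be illustrated with a minimal sketch. The function names `ecdf` and `quantile_by_inversion` and the `strict` flag are illustrative assumptions, not part of the paper; `strict=True` applies the rule $F (t) > α$ used in Equation (5), while `strict=False` applies the alternative rule $F (t) \geq α$ .

```python
import numpy as np

def ecdf(values, t):
    """Empirical distribution function: share of values that are <= t."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values <= t))

def quantile_by_inversion(values, alpha, strict=True):
    """Smallest observed value whose empirical distribution function
    exceeds alpha (strict=True) or reaches alpha (strict=False)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = ordered.size
    # The empirical distribution function is at least k / n at the k-th order statistic.
    steps = np.arange(1, n + 1) / n
    side = "right" if strict else "left"
    k = np.searchsorted(steps, alpha, side=side)
    return float(ordered[min(k, n - 1)])

population = [4.0, 1.0, 3.0, 2.0]
median_strict = quantile_by_inversion(population, 0.5)
median_weak = quantile_by_inversion(population, 0.5, strict=False)
```

With an even number of distinct values, the two rules can select different order statistics for the median, which is why the choice of inequality in the definition matters in small populations.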

Suppose a sample s of n units is drawn by simple random sampling without replacement from a finite population U of size N, and let $r = U - s$ denote the set of non-sampled units. Let Y be the survey variable associated with an auxiliary variable X, which are assumed to follow a superpopulation model under the model-based approach. A commonly used working model for the finite population is

$Y_{i} = μ (x_{i}) + σ (x_{i}) e_{i}, i = 1, 2, \dots, N$ (6)

where $σ (x_{i})$ is a known function of $x_{i}$ that accounts for heteroscedasticity and the $e_{i}$'s are independent and identically distributed (i.i.d) random variables with mean 0 and variance $σ^{2}$ , $E (Y_{i}) = μ (x_{i})$ and

$C o v (Y_{i}, Y_{j}) = \begin{cases} σ^{2} (x_{i}) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$

Under model-based approach Equation (4) can be expressed as

$F_{N_{y}}^{*} (t) = \frac{1}{N} [\sum_{i \in s} I (y_{i} \leq t) + \sum_{j \in r} I (y_{j} \leq t)]$ (7)

where $\sum_{i \in s} I (y_{i} \leq t)$ represents the sampled part, which is known, while $\sum_{j \in r} I (y_{j} \leq t)$ is the non-sampled part, which is unknown.

The problem is estimating the second term of Equation (7). To estimate Equation (7), [24] proposed a multiplicative bias corrected estimator for finite population distribution given by

${\hat{F}}_{M B C} (t) = \frac{1}{N} \{\sum_{i \in s} I (y_{i} \leq t) + \sum_{j \in r} \hat{H} (t - \hat{μ} (x_{j}))\}$ (8)

where $\hat{μ} (x_{j})$ is the model-based nonparametric estimator for $μ (x_{j})$ and $\hat{H} (t - \hat{μ} (x_{j}))$ is the estimated distribution function of the residuals defined by $e_{j} = y_{j} - \hat{μ} (x_{j})$ .

In this study, we propose a multiplicative bias corrected quantile estimator for finite population based on finite population distribution in Equation (8) given by

$\begin{matrix} {\hat{Q}}_{M B C} (α) = \inf \{t \in U : {\hat{F}}_{M B C} (t) > α\}, \\ {\hat{F}}_{M B C} ({\hat{Q}}_{M B C} (α)) = \frac{1}{N} \{\sum_{i \in s} I (y_{i} \leq {\hat{Q}}_{M B C} (α)) + \sum_{j \in r} \hat{H} ({\hat{Q}}_{M B C} (α) - \hat{μ} (x_{j}))\} \end{matrix}$ (9)

The problem is to compute ${\hat{Q}}_{M B C} (α)$ for any given $α$ . From the sample of n units of a population of size N, we observe $y_{1}, \dots, y_{n}$ . The general method is formulated as follows: first obtain an estimator of the distribution function, ${\hat{F}}_{M B C} (t)$ , and then estimate the quantile by taking the inverse.
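The two-step procedure (estimate the distribution function, then invert it) can be sketched as follows. This is only an illustration under stated assumptions: the fitted values `mu_hat_s` (sampled units) and `mu_hat_r` (non-sampled units) are taken as given inputs, $\hat{H}$ is taken to be the empirical distribution function of the sample residuals, and the function names are hypothetical. The paper does not spell out these implementation details.

```python
import numpy as np

def estimated_cdf(t, y_s, mu_hat_s, mu_hat_r):
    """Equation (8)-style estimate of the finite population distribution function at t."""
    y_s = np.asarray(y_s, dtype=float)
    mu_hat_s = np.asarray(mu_hat_s, dtype=float)
    mu_hat_r = np.asarray(mu_hat_r, dtype=float)
    residuals = y_s - mu_hat_s                 # sample residuals
    sampled_part = np.sum(y_s <= t)            # known part of the sum
    shifted = t - mu_hat_r                     # t - mu_hat(x_j) for each non-sampled unit
    # Assumption: H_hat is the empirical distribution function of the residuals.
    h_hat = np.mean(residuals[None, :] <= shifted[:, None], axis=1)
    population_size = y_s.size + mu_hat_r.size
    return float((sampled_part + h_hat.sum()) / population_size)

def estimated_quantile(alpha, y_s, mu_hat_s, mu_hat_r):
    """Smallest candidate point t with estimated distribution function above alpha."""
    y_s = np.asarray(y_s, dtype=float)
    mu_hat_s = np.asarray(mu_hat_s, dtype=float)
    mu_hat_r = np.asarray(mu_hat_r, dtype=float)
    residuals = y_s - mu_hat_s
    # The estimate is a step function; it can only jump at these points.
    jumps = (mu_hat_r[:, None] + residuals[None, :]).ravel()
    candidates = np.unique(np.concatenate([y_s, jumps]))
    for t in candidates:
        if estimated_cdf(t, y_s, mu_hat_s, mu_hat_r) > alpha:
            return float(t)
    return float(candidates[-1])

y_s = [1.0, 2.0, 4.0]          # toy sample values
mu_hat_s = [1.5, 2.0, 3.5]     # toy fitted values for sampled units
mu_hat_r = [1.0, 3.0]          # toy fitted values for non-sampled units
q_half = estimated_quantile(0.5, y_s, mu_hat_s, mu_hat_r)
```

The linear scan over candidate points is chosen for readability; because the estimated distribution function is non-decreasing, a bisection search over the sorted candidates would serve equally well for large populations.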

3. Properties of Proposed Estimator

3.1. Asymptotic Unbiasedness

In simple random sampling, since $n {\hat{F}}_{M B C} (Q_{N_{y}} (α))$ is a hypergeometrically distributed variable, we have

$E [{\hat{F}}_{M B C} (Q_{N_{y}} (α))] = F_{N_{y}} (Q_{N_{y}} (α)) = α$ (10)

and

$V a r [{\hat{F}}_{M B C} (Q_{N_{y}} (α))] = \frac{1 - f}{n} α (1 - α)$ (11)

where $f = n / N$ is the sampling fraction. If the sample size n is sufficiently large, then ${\hat{F}}_{M B C} (Q_{N_{y}} (α))$ is approximately normal.

Theorem 2 ( [27] ). Let x be in the interval $A_{1}$ containing $q (γ_{1}^{0})$ as an interior point. Then the sample quantile,

${\hat{q}}_{r n} (γ) = q (γ) - {[f (q (γ))]}^{- 1} [F_{r n} (q (γ)) - F (q (γ))] + R_{r n}^{*} (γ)$ (12)

with $R_{r n}^{*} (γ) = o_{p} (n_{r}^{- 1 / 2})$ uniformly in $γ$ for $γ$ in $H_{1}$ , where $H_{1} = \{γ : F (x) = γ \text{ and } x \in A_{1}\}$ .

Proof: For proof see [27].

We now study the properties of the ${\hat{Q}}_{M B C} (α)$ estimator. For this, a linear approximation is needed because ${\hat{Q}}_{M B C} (α)$ is not a continuous function. The estimator ${\hat{Q}}_{M B C} (α)$ can be expressed asymptotically as a linear function of the estimated distribution function evaluated at the quantile $Q_{N_{y}} (α)$ by the Bahadur representation (see [28] ) together with the results from Theorem 2 above.

Let $F_{M B C}$ be the multiplicative bias corrected distribution function of the density $f_{M B C}$ .

Theorem 3 (Taylor’s Theorem). Let $k \geq 1$ be an integer and let the function $f : ℝ \to ℝ$ be k times differentiable at the point $a \in ℝ$ . Then there exists a function $h_{k} : ℝ \to ℝ$ such that

$f (x) = f (a) + \sum_{n = 1}^{k} \frac{f^{(n)} (a)}{n!} {(x - a)}^{n} + h_{k} (x) {(x - a)}^{k},$ (13)

and $\lim_{x \to a} h_{k} (x) = 0$ . This is called the Peano form of the remainder.

Then using Taylor series expansion of the function $F_{M B C} ({\hat{Q}}_{M B C} (α))$ around $Q_{N_{y}} (α)$ we can write

$\begin{array}{l} F_{M B C} ({\hat{Q}}_{M B C} (α)) = \sum_{n = 0}^{\infty} \frac{F_{M B C}^{(n)} (Q_{N_{y}} (α))}{n!} {[{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)]}^{n} \\ = F_{M B C} (Q_{N_{y}} (α)) + {F^{'}}_{M B C} (Q_{N_{y}} (α)) [{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)] \\ + \frac{1}{2} {F^{''}}_{M B C} (Q_{N_{y}} (α)) {[{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)]}^{2} + \dots \\ = F_{M B C} (Q_{N_{y}} (α)) + f_{M B C} (Q_{N_{y}} (α)) [{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)] + O (n^{- \frac{1}{2}}) \end{array}$ (14)

where ${F^{'}}_{M B C} (Q_{N_{y}} (α)) = f_{M B C} (Q_{N_{y}} (α))$ . Following [29], this expansion holds since $F_{M B C}$ has two derivatives in a neighborhood of $Q_{N_{y}} (α)$ , the second derivative is bounded in this neighborhood, and ${F^{'}}_{M B C} (Q_{N_{y}} (α))$ is positive.

Then solving for ${\hat{Q}}_{M B C} (α)$ in Equation (14) we have the Bahadur’s representation as

$\begin{array}{l} {\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α) \\ = \frac{1}{f_{M B C} (Q_{N_{y}} (α))} [F_{M B C} ({\hat{Q}}_{M B C} (α)) - F_{M B C} (Q_{N_{y}} (α))] + O (n^{- \frac{1}{2}}) \end{array}$ (15)

Moreover, it can be shown that

$\begin{array}{l} F_{M B C} ({\hat{Q}}_{M B C} (α)) - F_{M B C} (Q_{N_{y}} (α)) \\ = {\hat{F}}_{M B C} ({\hat{Q}}_{M B C} (α)) - {\hat{F}}_{M B C} (Q_{N_{y}} (α)) + O (n^{- \frac{1}{2}}) \end{array}$ (16)

Substituting the above results of Equation (16) in Equation (15) yields

$\begin{array}{l} {\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α) \\ = \frac{1}{f_{M B C} (Q_{N_{y}} (α))} ({\hat{F}}_{M B C} ({\hat{Q}}_{M B C} (α)) - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}}) \\ = \frac{1}{f_{M B C} (Q_{N_{y}} (α))} (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}}) \end{array}$ (17)

where $f_{M B C} (.)$ denotes the derivative of the limiting value of $F_{M B C} (.)$ as $N \to \infty$ and ${\hat{F}}_{M B C} ({\hat{Q}}_{M B C} (α)) = α$ .

The linear approximation, previously used by [30] [31], helps to study the asymptotic properties of the estimator. The estimator ${\hat{Q}}_{M B C} (α)$ is asymptotically unbiased because ${\hat{F}}_{M B C} (t)$ is an unbiased estimator of $F_{N_{y}} (t)$ (see [24] ). Indeed,

$\begin{array}{l} E [{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)] \\ = E [\frac{1}{f_{M B C} (Q_{N_{y}} (α))} (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}})] \\ = \frac{1}{f_{M B C} (Q_{N_{y}} (α))} E (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}}) \end{array}$ (18)

but $E [α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))] = 0$ and by using Equation (18) it can be seen that

$E [{\hat{Q}}_{M B C} (α)] = Q_{N_{y}} (α) + O (n^{- \frac{1}{2}})$ (19)

The bias of ${\hat{Q}}_{M B C} (α)$ is of order $O (n^{- \frac{1}{2}})$ and thus converges to zero as $n \to \infty$ . Therefore, ${\hat{Q}}_{M B C} (α)$ is asymptotically unbiased.

3.2. Asymptotic Variance

The asymptotic variance of ${\hat{Q}}_{M B C} (α)$ is obtained as follows. Consider the Bahadur representation:

${\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α) = \frac{1}{f_{M B C} (Q_{N_{y}} (α))} (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}})$ (20)

Then applying variance on both side of Equation (20) we have

$\begin{array}{l} V a r [{\hat{Q}}_{M B C} (α) - Q_{N_{y}} (α)] \\ = V a r [\frac{1}{f_{M B C} (Q_{N_{y}} (α))} (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) + O (n^{- \frac{1}{2}})] \\ = {[\frac{1}{f_{M B C} (Q_{N_{y}} (α))}]}^{2} V a r (α - {\hat{F}}_{M B C} (Q_{N_{y}} (α))) \\ = {[\frac{1}{f_{M B C} (Q_{N_{y}} (α))}]}^{2} \{V a r (α) + V a r ({\hat{F}}_{M B C} (Q_{N_{y}} (α)))\} \\ = {[\frac{1}{f_{M B C} (Q_{N_{y}} (α))}]}^{2} \frac{1 - f}{n} α (1 - α) \\ = \frac{1 - f}{n} α (1 - α) {[f_{M B C} (Q_{N_{y}} (α))]}^{- 2} \end{array}$ (21)
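To make the final expression in Equation (21) concrete, the following sketch evaluates it numerically. It assumes that f denotes the sampling fraction n / N, and it treats the density value at the quantile as a known input; the function name and the numbers used are purely illustrative.

```python
def asymptotic_variance(alpha, n, population_size, density_at_quantile):
    """Evaluate (1 - f) / n * alpha * (1 - alpha) / density^2, as in Equation (21)."""
    f = n / population_size  # assumption: f is the sampling fraction n / N
    return (1.0 - f) / n * alpha * (1.0 - alpha) / density_at_quantile ** 2

# Illustrative values only: median, n = 100, N = 1000, density 0.4 at the quantile.
variance_at_median = asymptotic_variance(0.5, 100, 1000, 0.4)
standard_error = variance_at_median ** 0.5
```

The expression shows the two drivers of precision: the variance shrinks with the sample size through the factor (1 - f) / n, and it grows where the density at the quantile is small, which is typically the case in the tails.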

3.3. Asymptotic Mean Squared Error

The asymptotic mean squared error of the estimator ${\hat{Q}}_{M B C} (α)$ is given by

$M S E ({\hat{Q}}_{M B C} (α)) = V a r ({\hat{Q}}_{M B C} (α)) + {[B i a s ({\hat{Q}}_{M B C} (α))]}^{2}$ (22)

Substituting Equations (19) and (21) we get

$\begin{matrix} M S E ({\hat{Q}}_{M B C} (α)) = \frac{1 - f}{n} α (1 - α) {[f_{M B C} (Q_{N_{y}} (α))]}^{- 2} + {[O (n^{- \frac{1}{2}})]}^{2} \\ = \frac{1 - f}{n} α (1 - α) {[f_{M B C} (Q_{N_{y}} (α))]}^{- 2} + O (\frac{1}{n}) \end{matrix}$ (23)

Equation (23) tends to zero as $n \to \infty$ and thus $M S E ({\hat{Q}}_{M B C} (α)) \to 0$ . This shows that ${\hat{Q}}_{M B C} (α)$ is asymptotically consistent.

4. Empirical Study

The main purpose of this section is to compare the performance of the proposed estimator MBCQE with the existing quantile estimators: RKMQE, CDQE, FAQE and NWQE. In this study, two populations are considered, which are generated from the regression model given by

$y_{i} = m (x_{i}) + e_{i}$

where $1 \leq i \leq 1000$ with the following mean functions described in Table 1.

Table 1. Mean functions used in the simulation study.

A population of 1000 auxiliary values $x_{i}$ is generated as independent and identically distributed uniform random variables, $x_{i} \in [0,1]$ . The mean functions represent a class of correct and incorrect model specifications for the estimators being considered. The errors are assumed to be independent and identically distributed (i.i.d) normal random variables with mean 0 and standard deviation $σ = 1$ . The population values $y_{i}$ are generated from the mean functions by adding the errors $e_{i}$ in each of the cases, so that each population contains 1000 units. For each case, 1000 samples are drawn using simple random sampling without replacement.
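The population set-up described above can be sketched in a few lines. The sketch rests on assumptions that should be read as placeholders: the particular linear mean function `linear_mean`, the seeds, and the sample size of 400 used in the example call are hypothetical choices for illustration, since the exact mean functions are those listed in Table 1.

```python
import numpy as np

def linear_mean(x):
    # Hypothetical linear mean function; Table 1 lists the ones actually used.
    return 1.0 + 2.0 * (x - 0.5)

def simulate_population(mean_function, size=1000, seed=0):
    """Generate uniform auxiliary values and responses y = m(x) + e with N(0, 1) errors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=size)
    e = rng.normal(0.0, 1.0, size=size)
    return x, mean_function(x) + e

def draw_srswor(population_size, sample_size, seed=0):
    """Indices of a simple random sample without replacement and of its complement."""
    rng = np.random.default_rng(seed)
    sampled = rng.choice(population_size, size=sample_size, replace=False)
    non_sampled = np.setdiff1d(np.arange(population_size), sampled)
    return np.sort(sampled), non_sampled

x, y = simulate_population(linear_mean)
s, r = draw_srswor(len(x), 400)
```

Repeating `draw_srswor` with different seeds on a fixed population gives the repeated samples over which design-based summaries such as bias can be averaged.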

Nadaraya-Watson kernel weights are used in the smoothing of $y_{i}$ to obtain the rough estimator, ${\tilde{μ}}_{n} (x_{i}) = \sum_{j = 1}^{n} w_{j} (x_{i}; l) y_{j}$ , of the mean function $m (x_{i})$ . A ratio $β_{i} = \frac{y_{i}}{{\tilde{μ}}_{n} (x_{i})}$ is evaluated and smoothed further to obtain the correction factor $\hat{α} (x_{i})$ , which is then used together with the rough estimator to obtain the multiplicative bias corrected estimator, ${\hat{μ}}_{n} (x_{i})$ , of the mean function.
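The smoothing steps above (rough estimator, ratio, smoothed correction factor, product) can be sketched as follows. The Gaussian kernel, the use of one common bandwidth for both smoothing steps, and the function names are assumptions made for illustration; the sketch also assumes the rough estimator is never zero at the sample points, so that the ratio is well defined.

```python
import numpy as np

def nw_weights(x_eval, x_obs, bandwidth):
    """Nadaraya-Watson weights with a Gaussian kernel; each row sums to one."""
    x_eval = np.asarray(x_eval, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    u = (x_eval[:, None] - x_obs[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2)
    return kernel / kernel.sum(axis=1, keepdims=True)

def mbc_smoother(x_eval, x_obs, y_obs, bandwidth):
    """Multiplicative bias corrected estimate of the mean function at x_eval."""
    y_obs = np.asarray(y_obs, dtype=float)
    # Step 1: rough (pilot) estimator at the sample points.
    pilot_at_obs = nw_weights(x_obs, x_obs, bandwidth) @ y_obs
    # Step 2: ratios of observed responses to the pilot fit.
    ratio = y_obs / pilot_at_obs
    # Step 3: smooth the ratios to get the correction factor at x_eval.
    weights_eval = nw_weights(x_eval, x_obs, bandwidth)
    correction = weights_eval @ ratio
    # Step 4: corrected estimator = correction factor times pilot estimator.
    pilot_at_eval = weights_eval @ y_obs
    return correction * pilot_at_eval

x_obs = np.linspace(0.0, 1.0, 5)
fitted = mbc_smoother([0.1, 0.5, 0.9], x_obs, np.full(5, 2.0), bandwidth=0.2)
```

For a constant response the pilot fit is exact, every ratio equals one, and the correction leaves the estimate unchanged, which gives a quick sanity check of an implementation.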

The proposed Multiplicative Bias Corrected Quantile Estimator (MBCQE),

${\hat{Q}}_{M B C} (α) = \inf \{t \in U : {\hat{F}}_{M B C} (t) > α\}$

was compared with the following existing estimators of the quantile function for finite populations:

1) Chambers and Dunstan Quantile Estimator (CDQE):

${\hat{Q}}_{y, C D, α} = \inf \{t | {\hat{F}}_{y, C D} (t) \geq α\}$

2) Nadaraya Watson Quantile Estimator (NWQE):

${\tilde{Q}}_{N W} (p) = \inf \{x : {\hat{F}}_{n} (x) \geq p\}, 0 < p < 1$

3) Rao Kovar Mantel Quantile Estimator (RKMQE):

${\hat{Q}}_{R K M; α} = {\hat{F}}_{r k m}^{- 1} ( α )$

4) Dorfman and Hall Quantile Estimator (FAQE):

${\hat{Q}}_{D H} (p) = \inf \{t : {\hat{F}}_{D H} (t) \geq p\}$

The results of this simulation study are summarized in Table 2. Table 2 shows the unconditional biases, Relative Mean Error (RME) and Relative Root Mean Squared Error (RRMSE) for the estimators at various values of the quantile level $α$ (i.e. 0.25, 0.5 and 0.75). Linear and cosine mean functions were used to obtain the tabulated results. Similar results and conclusions can be obtained using other mean functions such as quadratic, sine, bump etc.

Table 2. Unconditional biases, relative mean errors and relative root mean squared errors.

To analyze the performance of the proposed estimator against the specified estimators, the unconditional Relative Mean Error and Relative Root Mean Squared Error for the estimator ${\hat{Q}}_{n, X} (p)$ are computed as

$RME = \frac{1}{Q (p)} \{\frac{1}{N} \sum_{s = 1}^{N} ({\hat{Q}}_{n, X}^{(s)} (p) - Q (p))\}$ (24)

and

$RRMSE = \sqrt{\frac{1}{Q (p)} \{\frac{1}{N} \sum_{s = 1}^{N} {({\hat{Q}}_{n, X}^{(s)} (p) - Q (p))}^{2}\}}$ (25)

where ${\hat{Q}}_{n, X}^{(s)} (p)$ is the quantile estimate corresponding to the sth simulated sample $\{X_{1}^{s}, X_{2}^{s}, \dots, X_{n}^{s}\}$ and N is the number of replications. The RME measures how close the estimator is, on average, to the actual value, while the RRMSE measures the accuracy of the estimator. For instance, MBCQE is said to be “better” than, or preferable to, the other estimators if its RRMSE is comparably smaller.
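The two performance measures can be computed from a vector of replicate estimates as in the sketch below. It follows Equations (24) and (25) as printed, with the factor 1 / Q(p) inside the square root in the RRMSE, and therefore assumes a positive true quantile. The function names and the toy numbers are illustrative only.

```python
import numpy as np

def relative_mean_error(estimates, true_quantile):
    """Equation (24): average deviation from the true quantile, divided by it."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(estimates - true_quantile) / true_quantile)

def relative_root_mse(estimates, true_quantile):
    """Equation (25) as printed: root of the mean squared deviation over Q(p)."""
    estimates = np.asarray(estimates, dtype=float)
    mean_squared_deviation = np.mean((estimates - true_quantile) ** 2)
    # Assumption: true_quantile > 0, so the quantity under the root is non-negative.
    return float(np.sqrt(mean_squared_deviation / true_quantile))

replicates = [1.9, 2.1, 2.0, 2.4]   # toy replicate estimates
rme_value = relative_mean_error(replicates, 2.0)
rrmse_value = relative_root_mse(replicates, 2.0)
```

A positive RME indicates overestimation on average, while the RRMSE penalizes deviations in either direction.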

The bias of a quantile estimator refers to the deviation of the expected value of the estimator from the true quantile value. All of the quantile estimators considered here are biased, but MBCQE exhibits a comparatively smaller bias. MBCQE can be seen to be a very efficient estimator of the empirical quantile function at all levels of the α-quantile, followed closely by RKMQE and FAQE. CDQE proved to be a very inefficient estimator at all levels of α.

Further, the estimators were compared with respect to the empirical quantile function, which affirmed the results tabulated above. Table 3 and Table 4 give a tabulation of all the estimators listed above.

Table 3. Quantile estimates for linear mean function.

Table 4. Quantile estimates for cosine mean function.

CDQE overestimates the empirical quantile function at all points, while MBCQE gives an almost perfect estimation of the empirical quantile function. On the other hand, NWQE underestimates the true quantile function at some points towards the lower tail, while it overestimates the same function at other points along the upper tail.

The conditional performance of the proposed estimator was also assessed and compared with that of the other existing quantile estimators. To do this, 200 random samples, all of size 400, were selected and the mean of the auxiliary values $x_{i}$ was computed for each sample to obtain 200 values of $\bar{X}$ . These sample means were then sorted in ascending order and grouped into clusters of size 20, giving a total of 10 groups. Further, the group means of the sample means of the auxiliary variable were calculated to get $\bar{\bar{X}}$ . Empirical means and biases were then computed for all the estimators RKMQE, CDQE, FAQE, NWQE and MBCQE. The conditional biases were plotted against $\bar{\bar{X}}$ to reveal the pattern generated. Figures 1-6 show the behavior of the conditional biases, relative absolute biases and mean squared errors realized by all the estimators of quantile functions under linear and cosine mean functions at various values of the quantile level $α$ (i.e. 0.25, 0.5 and 0.75).
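The grouping step in the conditional analysis can be sketched as follows. The function name, argument names and toy inputs are hypothetical; the sketch only shows how replicate results are ordered by the sample mean of the auxiliary variable, split into groups of 20, and averaged within groups.

```python
import numpy as np

def conditional_bias(sample_xbars, estimates, true_value, group_size=20):
    """Group replicates by ordered sample mean of x and average within groups."""
    sample_xbars = np.asarray(sample_xbars, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    order = np.argsort(sample_xbars)                  # ascending sample means
    n_groups = sample_xbars.size // group_size
    idx = order[: n_groups * group_size].reshape(n_groups, group_size)
    group_xbar = sample_xbars[idx].mean(axis=1)       # group means of the sample means
    group_bias = estimates[idx].mean(axis=1) - true_value
    return group_xbar, group_bias

# Toy inputs: 200 replicate sample means and matching quantile estimates.
toy_xbars = np.linspace(0.4, 0.6, 200)[::-1]
toy_estimates = 2.0 * toy_xbars
group_xbar, group_bias = conditional_bias(toy_xbars, toy_estimates, true_value=1.0)
```

Plotting `group_bias` against `group_xbar` for each estimator gives curves of the kind described for the conditional bias figures; a curve close to zero across all groups indicates little sensitivity to sample balance.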

In most cases, there are significant differences among the bias characteristics of the various estimators. A detailed examination of the plots reveals that MBCQE and RKMQE have lower levels of bias overall, as indicated by the proximity of the plotted curves to the horizontal (no bias) line at 0.0 on the vertical axis. Interestingly, despite the rather entangled nature of some of the plots, MBCQE emerges clearly as the least biased estimator for nearly every group mean of the sample means of the auxiliary variable and every quantile level. For the median, several estimators exhibit identical bias, and for most of the estimators, bias is not symmetrical with respect to quantile level.

Figure 1. Conditional biases, RAB and MSE for the estimators using a linear mean function at $α = 0.25$ .

Figure 2. Conditional biases, RAB and MSE for the estimators using a linear mean function at $α = 0.5$ .

Figure 3. Conditional biases, RAB and MSE for the estimators using a linear mean function at $α = 0.75$ .

Figure 4. Conditional biases, RAB and MSE for the estimators using a cosine mean function at $α = 0.25$ .

Figure 5. Conditional biases, RAB and MSE for the estimators using a cosine mean function at $α = 0.5$ .

Figure 6. Conditional biases, RAB and MSE for the estimators using a cosine mean function at $α = 0.75$ .

Plots of Conditional MSE versus group means of the means of auxiliary variables similarly reveal coincident behavior for the quantiles. MBCQE and RKMQE produce generally the lowest MSE values. In particular, MBCQE yields the lowest MSE in most cases among all other estimators. MBCQE is consistently better than all other estimators for both bias and MSE. All of these estimators are asymptotically unbiased and they all exhibit MSE consistency in that the MSE values tend toward zero as sample size increases.

From the plots it can be seen that MBCQE and RKMQE performed comparably well, and better than all other estimators of the true quantile function. It can also be seen that sample balancing does not affect the performance of the estimators.

5. Conclusions and Suggestions

In conclusion, using the results from Tables 2-4 and Figures 1-6, MBCQE was found to be an efficient estimator of the quantile function for a finite population. NWQE was found to be the least efficient of all the estimators, with large conditional bias, relative absolute bias and mean squared error compared to the other estimators. MBCQE can therefore be used in estimating quantile functions for various units in the population in various sectors of the economy. Finally, further work can be done on the construction of confidence intervals for the proposed estimator, and a researcher can investigate other bias correction strategies, such as Adaptive Boosting and Bootstrap bias reduction techniques, in quantile function estimation.

Acknowledgements

Sincere thanks to the Pan-African University Institute of Basic Sciences, Technology and Innovation (PAUSTI) for funding this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Wand, M.P., Marron, J.S. and Ruppert, D. (1991) Transformations in Density Estimation. Journal of the American Statistical Association, 86, 343-353. https://doi.org/10.1080/01621459.1991.10475041
[2]	Gasser, T. and Müller, H.-G. (1979) Kernel Estimation of Regression Functions. In: Smoothing Techniques for Curve Estimation, Springer, Berlin, 23-68. https://doi.org/10.1007/BFb0098489
[3]	Müller, H.-G. (1991) Smooth Optimum Kernel Estimators near Endpoints. Biometrika, 78, 521-530. https://doi.org/10.1093/biomet/78.3.521
[4]	Müller, H.-G. and Wang, J.-L. (1994) Hazard Rate Estimation under Random Censoring with Varying Kernels and Bandwidths. Biometrics, 50, 61-76. https://doi.org/10.2307/2533197
[5]	John, R. (1984) Boundary Modification for Kernel Regression. Communications in Statistics—Theory and Methods, 13, 893-900. https://doi.org/10.1080/03610928408828728
[6]	Schuster, E.F. (1985) Incorporating Support Constraints into Nonparametric Estimators of Densities. Communications in Statistics—Theory and Methods, 14, 1123-1136. https://doi.org/10.1080/03610928508828965
[7]	Marron, J.S. and Ruppert, D. (1994) Transformations to Reduce Boundary Bias in Kernel Density Estimation. Journal of the Royal Statistical Society: Series B (Methodological), 56, 653-671. https://doi.org/10.1111/j.2517-6161.1994.tb02006.x
[8]	Jones, M.C. (1993) Simple Boundary Correction for Kernel Density Estimation. Statistics and Computing, 3, 135-146. https://doi.org/10.1007/BF00147776
[9]	Fan, J. and Gijbels, I. (1992) Variable Bandwidth and Local Linear Regression Smoothers. The Annals of Statistics, 20, 2008-2036. https://doi.org/10.1214/aos/1176348900
[10]	Fan, J. (1993) Local Linear Regression Smoothers and Their Minimax Efficiencies. The Annals of Statistics, 21, 196-216. https://doi.org/10.1214/aos/1176349022
[11]	Devroye, L. (1985) Nonparametric Density Estimation. The L1 View. John Wiley, Hoboken.
[12]	Jones, M.C., Linton, O. and Nielsen, J.P. (1995) A Simple Bias Reduction Method for Density Estimation. Biometrika, 82, 327-338. https://doi.org/10.1093/biomet/82.2.327
[13]	Denuit, M., Goovaerts, M. and Kaas, R. (2006) Actuarial Theory for Dependent Risks: Measures, Orders and Models. John Wiley & Sons, Hoboken. https://doi.org/10.1002/0470016450
[14]	Nadaraya, E.A. (1964) Some New Estimates for Distribution Functions. Theory of Probability & Its Applications, 9, 497-500. https://doi.org/10.1137/1109069
[15]	Lio, Y. and Padgett, W. (1991) A Note on the Asymptotically Optimal Bandwidth for Nadaraya’s Quantile Estimator. Statistics & Probability Letters, 11, 243-249. https://doi.org/10.1016/0167-7152(91)90150-P
[16]	Jones, M.C. (1992) Estimating Densities, Quantiles, Quantile Densities and Density Quantiles. Annals of the Institute of Statistical Mathematics, 44, 721-727. https://doi.org/10.1007/BF00053400
[17]	Charpentier, A. and Oulidi, A. (2010) Beta Kernel Quantile Estimators of Heavy-Tailed Loss Distributions. Statistics and Computing, 20, 35-55. https://doi.org/10.1007/s11222-009-9114-2
[18]	Sayah, A. (2012) Kernel Quantile Estimation for Heavy-Tailed Distributions.
[19]	Linton, O. and Nielsen, J.P. (1994) A Multiplicative Bias Reduction Method for Nonparametric Regression. Statistics & Probability Letters, 19, 181-187. https://doi.org/10.1016/0167-7152(94)90102-3
[20]	Malenje, B.M., Mokeira, W.O., Odhiambo, R. and Orwa, G.O. (2016) A Multiplicative Bias Corrected Nonparametric Estimator for a Finite Population Mean. American Journal of Theoretical and Applied Statistics, 5, 317-325. https://doi.org/10.11648/j.ajtas.20160505.21
[21]	Hengartner, N., Matzner-Løber, E., Rouviere, L. and Burr, T. (2009) Multiplicative Bias Corrected Nonparametric Smoothers.
[22]	Burr, T., Hengartner, N., Matzner-Løber, E., Myers, S. and Rouviere, L. (2010) Smoothing Low Resolution Gamma Spectra. IEEE Transactions on Nuclear Science, 57, 2831-2840. https://doi.org/10.1109/TNS.2010.2054110
[23]	Stephane, K.T., Otieno, R.O. and Mageto, T. (2017) A Multiplicative Bias Correction for Nonparametric Approach and the Two Sample Problem in Sample Survey. Open Journal of Statistics, 7, 1053. https://doi.org/10.4236/ojs.2017.76073
[24]	Onsongo, W.M., Otieno, R.O. and Orwa, G.O. (2018) Bias Reduction Technique for Estimating Finite Population Distribution Function under Simple Random Sampling without Replacement. International Journal of Statistics and Applications, 8, 259-266.
[25]	Serfling, R. (1980) Approximation Theorems of Mathematical Statistics. John Wiley and Sons, New York. https://doi.org/10.1002/9780470316481
[26]	Francisco, C.A. (1987) Estimation of Quantiles and the Interquartile Range in Complex Surveys. The Annals of Statistics.
[27]	Francisco, C.A. and Fuller, W.A. (1991) Quantile Estimation with a Complex Survey Design. The Annals of Statistics, 19, 454-469. https://doi.org/10.1214/aos/1176347993
[28]	Chambers, R.L. and Dunstan, R. (1986) Estimating Distribution Functions from Survey Data. Biometrika, 73, 597-604. https://doi.org/10.1093/biomet/73.3.597
[29]	Bahadur, R.R. (1966) A Note on Quantiles in Large Samples. The Annals of Mathematical Statistics, 37, 577-580. https://doi.org/10.1214/aoms/1177699450
[30]	Kuk, A.Y. and Mak, T. (1989) Median Estimation in the Presence of Auxiliary Information. Journal of the Royal Statistical Society: Series B (Methodological), 51, 261-269. https://doi.org/10.1111/j.2517-6161.1989.tb01763.x
[31]	Chen, J. and Wu, C. (2002) Estimation of Distribution Function and Quantiles Using the Model-Calibrated Pseudo Empirical Likelihood Method. Statistica Sinica, 12, 1223-1239.
