Bayesian Interval Estimation of the Prevalence Rate Using Pool Testing Strategy with Retesting

Velmah Kiprop; Matiri George Munene; Luke Akongo Orawo

doi:10.4236/ojs.2025.153017

Open Journal of Statistics > Vol.15 No.3, June 2025

Mathematics Department, Egerton University, Njoro Nakuru, Kenya.

Abstract

Accurate estimation of disease prevalence is crucial for effective public health intervention and resource allocation. Generating data by individual testing methods is often impractical and expensive for large populations, particularly when disease prevalence is low. Pool testing involves combining samples from multiple individuals into a pool and performing a single test, and offers a cost-effective and efficient alternative. In pool testing strategy with retesting, if a pool tests negative, it is classified as non-defective, whereas if it is positive, then a retest is needed. The retesting strategy mitigates the effects of initial test errors, thereby enhancing the accuracy of the estimation of the prevalence rate. Evidence in the literature indicates that the traditional Wald method has been used to construct approximate confidence intervals for the prevalence rate. However, this interval estimation method is based on the normality approximation and hence may not be accurate when the true prevalence rate is close to zero. In this paper, we propose a Bayesian interval estimation approach which is not affected by extreme values of the prevalence rate and allows for incorporating prior information about the prevalence rate. We assumed that the prior distribution for the unknown prevalence rate $p$ is a Beta distribution with parameters $α_{0}$ and $β_{0}$ and based on pool testing outcomes for the $n$ pools each of size $k$ , $100 (1 - α) %$ credible intervals were constructed from the resulting posterior distribution. Simulation studies were carried out to compare the efficiencies of the Bayesian and Wald interval estimation methods for various values of $p$ .

Keywords

Pool Testing, Specificity, Sensitivity, Trinomial Distribution, Posterior Distribution, Gridding Method, Credible Intervals, Coverage Probability

Share and Cite:

Kiprop, V. , Munene, M. and Orawo, L. (2025) Bayesian Interval Estimation of the Prevalence Rate Using Pool Testing Strategy with Retesting. Open Journal of Statistics, 15, 323-336. doi: 10.4236/ojs.2025.153017.

1. Introduction

Estimating disease prevalence is a central task in epidemiological studies. Traditional individual testing can be very costly and, in most cases, requires a lot of time and resources, especially when it is carried out on large samples. The pool testing technique, which began with Dorfman [1], provides an economical alternative: several clinical samples are combined into a pool or group, and the pool is subjected to a single diagnostic test. Pool testing is particularly suitable when the population has a low disease prevalence rate. Pool testing with retesting consists of testing a pool and, if the pool tests positive, testing it again. When a pool tests positive on the retest, individual members of that pool are tested and the pool is categorized as either defective or non-defective. This helps to reduce the loss of sensitivity which may be evident in pool testing. It has been demonstrated that pool testing with retesting decreases the number of misclassifications, increases the efficiency of the testing kit, and increases the efficiency of the estimator of the prevalence rate [2] [3]. Nyongesa [4] considered a single-stage pool design to estimate the prevalence of a disease in the presence of inspection errors. He investigated the effects of specificity, sensitivity, and sample size on the efficiency of the estimator of the prevalence rate and found that the estimator is more efficient for large sample sizes and low specificity and sensitivity.

Bayesian techniques have been applied in many epidemiological settings, such as disease monitoring, outbreak simulation, and prevalence quantification. For instance, McDonald and Hodgson [5] applied Bayesian techniques to estimate disease prevalence from imperfect diagnostic tests, and their study showed that this approach is more efficient than the other methods considered in the same context.

The Bayesian approach to prevalence rate estimation combines prior information about the prevalence rate with the data collected. To do this, one assigns a prior distribution to the prevalence rate, forms the likelihood of the prevalence rate based on the observed data, and then applies Bayes' Theorem to obtain the posterior distribution of the prevalence rate. Estimation of the prevalence rate is then based on the posterior distribution. The Bayesian method has been used in the estimation of the prevalence rate in group testing, as explained by Liu and Liu [6]. Their study described Bayesian interval estimation using Beta priors and binomial likelihoods and showed the superiority of Bayesian credible intervals in the group testing paradigm. The authors also explained how the addition of prior information could enhance estimation, especially where the prevalence was low.

Spiegelhalter and Best [7] discussed the application of Bayesian methods to complex models in epidemiology, noting that Bayesian credible intervals are suitable for modeling costs and effects in cost-effectiveness analysis. They demonstrated that, even with multiple sources of evidence and uncertainty, the Bayesian approach is a powerful tool for decision-making in the field of public health. Gelman et al. [8] describe the general Bayesian approach to statistical data analysis and, for single-parameter models such as the binomial distribution, give practical examples of the construction and interpretation of Bayesian credible intervals for the unknown parameter of interest.

Brett, Rohani, and Drake [9] used Bayesian methods to anticipate epidemic transitions from imperfect observational data. To measure the uncertainty of the estimates of some important epidemiological parameters, they used Bayesian credible intervals and pointed out that intervals of this kind are rather robust to data uncertainty and model misspecification.

Orawo [10] compared four methods for constructing an interval estimate of the binomial proportion: the Wilson, Clopper-Pearson, likelihood, and Wald confidence intervals. The simulation study established that the Clopper-Pearson interval was conservative for both small and large samples, whereas the Wald confidence interval had coverage probabilities lower than the nominal level and suffered from overshoot. The Wilson and likelihood intervals had similar coverage probabilities, close to the nominal level, with the likelihood interval being slightly shorter.

Hepworth [11] compared four confidence interval methods for estimating the proportion of positively classified units when the groups are of unequal size and the sample is very large. He stated that these methods should be compared to the exact likelihood ratio, which is preferable to the precise method. Biggerstaff [12] considered confidence interval methods for the difference of two proportions estimated from pooled samples.

The applicability of Bayesian statistical methods to interval estimation of the prevalence rate under the pool testing with retesting strategy has remained relatively unexplored. The traditional Wald method has been applied in constructing the confidence interval for the prevalence rate based on data generated by different pool testing designs; however, it is not accurate when the normal approximation to the binomial is poor. This is expected to occur when the prevalence rate is low and close to zero. Other intervals, such as the Clopper-Pearson and Wilson intervals for a proportion, are restricted to binomial sampling and thus cannot be computed directly for the more complex trinomial probability model that results from pool testing with retesting, whose probabilities involve the prevalence rate. This paper considers the numerical construction of Bayesian credible intervals and compares them with the Wald confidence intervals on the basis of coverage probabilities and mean interval lengths computed from simulated data.

The paper is organized as follows: Section 2 describes the pool testing with retesting design. In Section 3, the trinomial probability model is derived from the retesting design. The Bayesian credible intervals and Wald confidence intervals are outlined in Section 4. In Section 5, the simulation results on the coverage probability and mean interval length of the interval estimates generated by the two methods are presented. Section 6 is devoted to concluding remarks.

2. The Retesting Design

Suppose that a population of size $N$ is divided into $n$ groups, each of size $k$. Each of the $n$ pools is subjected to an initial test. Pools that test positive are retested, whereas further testing is discontinued for pools that test negative. If a pool tests positive on retesting, its constituent members are tested individually for the presence of the characteristic of interest. Retesting is intended to improve the sensitivity and specificity of the testing scheme. It has also been shown that retesting reduces misclassification significantly compared to the Dorfman procedure and enhances the efficiency of the testing kit.

Figure 1 illustrates the process of pool testing with retesting.

The figure shows the $n$ constructed groups and the test result on the $i^{\text{th}}$ group, $i = 1, 2, \dots, n$.

Figure 1. Pool testing with retesting design.

The following indicator functions were used to classify the binary observations generated by the pool testing with retesting procedure:

$T_{i} = \begin{cases} 1, & \text{if the } i^{\text{th}} \text{ pool tests positive on the test kit} \\ 0, & \text{otherwise} \end{cases}$

$T_{i}^{1} = \begin{cases} 1, & \text{if the } i^{\text{th}} \text{ pool tests positive on the retest on the test kit} \\ 0, & \text{otherwise} \end{cases}$

$D_{i} = \begin{cases} 1, & \text{if the } i^{\text{th}} \text{ pool is positive} \\ 0, & \text{otherwise} \end{cases}$

$T_{i j} = \begin{cases} 1, & \text{if the } j^{\text{th}} \text{ individual in the } i^{\text{th}} \text{ pool tests positive on the test kit} \\ 0, & \text{otherwise} \end{cases}$

and

$δ_{i j} = \begin{cases} 1, & \text{if the } j^{\text{th}} \text{ individual in the } i^{\text{th}} \text{ group is positive, with probability } p \\ 0, & \text{otherwise} \end{cases}$

The observations on the constituent members of the $i^{\text{th}}$ group are denoted by $(δ_{i 1}, δ_{i 2}, \dots, δ_{i k})$. By definition,

$P (D_{i} = 0) = P (δ_{i 1} = 0, δ_{i 2} = 0, \dots, δ_{i k} = 0)$ .

If the constituent members of a group are assumed to act independently of each other, then

$P (D_{i} = 0) = {(1 - p)}^{k}$ , where $p$ is the prevalence rate.

3. The Model Formation

Let the random variables $X_{1}$ and $X_{2}$ denote the number of groups that test positive and negative on the initial test, respectively. Also, denote by $X_{11}$ and $X_{12}$ the number of groups that test positive and negative on the retest, respectively. Using the indicator functions $T_{i}$, $T_{i}^{1}$ and $D_{i}$ and Figure 1, the probability of declaring the $i^{\text{th}}$ pool negative on the initial test, $λ_{1} = \Pr (T_{i} = 0)$, is derived as follows. By the law of total probability,

$\begin{aligned} λ_{1} &= \Pr (T_{i} = 0, D_{i} = 0 \text{ or } D_{i} = 1) \\ &= \Pr (T_{i} = 0, D_{i} = 0) + \Pr (T_{i} = 0, D_{i} = 1) \\ &= \Pr (D_{i} = 0) \Pr (T_{i} = 0 \mid D_{i} = 0) + \Pr (D_{i} = 1) \Pr (T_{i} = 0 \mid D_{i} = 1) \\ &= {(1 - p)}^{k} β + (1 - {(1 - p)}^{k}) (1 - η), \end{aligned}$

where $η$ is the sensitivity (the probability of correctly classifying a defective pool), and $β$ is the specificity (the probability of correctly classifying a non-defective pool).

Similarly, the probability that a pool is classified as positive on the initial test and negative on the retest is $λ_{2} = \Pr (T_{i} = 1, T_{i}^{1} = 0)$, which is obtained as

$\begin{aligned} λ_{2} &= \Pr (T_{i} = 1, T_{i}^{1} = 0, D_{i} = 1 \text{ or } D_{i} = 0) \\ &= \Pr (T_{i} = 1, T_{i}^{1} = 0, D_{i} = 0) + \Pr (T_{i} = 1, T_{i}^{1} = 0, D_{i} = 1) \\ &= \Pr (T_{i} = 1, T_{i}^{1} = 0 \mid D_{i} = 0) \Pr (D_{i} = 0) + \Pr (T_{i} = 1, T_{i}^{1} = 0 \mid D_{i} = 1) \Pr (D_{i} = 1) \\ &= β (1 - β) {(1 - p)}^{k} + η (1 - η) (1 - {(1 - p)}^{k}) . \end{aligned}$

Finally, the probability of declaring a pool positive on retesting the initially announced positive pool can be derived as

$\begin{aligned} λ_{3} &= \Pr (T_{i} = 1, T_{i}^{1} = 1) \\ &= \Pr (T_{i} = 1, T_{i}^{1} = 1, D_{i} = 1 \text{ or } D_{i} = 0) \\ &= \Pr (T_{i} = 1, T_{i}^{1} = 1, D_{i} = 0) + \Pr (T_{i} = 1, T_{i}^{1} = 1, D_{i} = 1) \\ &= \Pr (T_{i} = 1, T_{i}^{1} = 1 \mid D_{i} = 0) \Pr (D_{i} = 0) + \Pr (T_{i} = 1, T_{i}^{1} = 1 \mid D_{i} = 1) \Pr (D_{i} = 1) \\ &= {(1 - β)}^{2} {(1 - p)}^{k} + η^{2} (1 - {(1 - p)}^{k}) . \end{aligned}$
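As a quick numerical check of the three probabilities, the following minimal Python sketch evaluates $λ_{1}$, $λ_{2}$ and $λ_{3}$ and confirms that they sum to one. The function name `pool_probabilities` and the example values $p = 0.01$, $k = 10$ are illustrative assumptions; the default sensitivity and specificity of 0.95 follow the values stated in the concluding section.

```python
def pool_probabilities(p, k, sens=0.95, spec=0.95):
    """Return (lam1, lam2, lam3) for a pool of size k at prevalence p.

    sens is the sensitivity (eta) and spec the specificity (beta).
    """
    clean = (1.0 - p) ** k      # P(D_i = 0): no positive member in the pool
    dirty = 1.0 - clean         # P(D_i = 1)
    # Negative on the initial test.
    lam1 = clean * spec + dirty * (1.0 - sens)
    # Positive on the initial test, negative on the retest.
    lam2 = clean * (1.0 - spec) * spec + dirty * sens * (1.0 - sens)
    # Positive on both the initial test and the retest.
    lam3 = clean * (1.0 - spec) ** 2 + dirty * sens ** 2
    return lam1, lam2, lam3


# Illustrative values only (not taken from the paper's tables).
lam1, lam2, lam3 = pool_probabilities(p=0.01, k=10)
print(lam1, lam2, lam3, lam1 + lam2 + lam3)
```

Because the three outcomes are exhaustive and mutually exclusive, the sum is one for any $p$, which is why $1 - λ_{1} - λ_{2}$ can stand in for $λ_{3}$ in the probability model.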

The above three probabilities $λ_{1}, λ_{2}$ and $λ_{3}$ are used to formulate the model of group testing with retesting as the joint probability distribution of the random variables $X_{2}, X_{11}$ and $X_{12}$ with joint probability mass function

$f (x_{2}, x_{11}; p) = \frac{n!}{x_{2}! x_{11}! (n - x_{2} - x_{11})!} λ_{1}^{x_{2}} λ_{2}^{x_{11}} {(1 - λ_{1} - λ_{2})}^{n - x_{2} - x_{11}}$ , (1)

which is a trinomial probability model with parameters $n, λ_{1}, λ_{2}$ .

The likelihood function of the parameter $p$ is proportional to the trinomial probability mass function in (1). Taking $C = 1 / (\frac{n!}{x_{2}! x_{11}! x_{12}!})$ as the constant of proportionality, the likelihood function is given as

$L (p) \propto λ_{1}^{x_{2}} λ_{2}^{x_{11}} {(1 - λ_{1} - λ_{2})}^{n - x_{2} - x_{11}}$ . (2)

Up to an additive constant that does not depend on $p$, the log-likelihood function is

$l (p) = x_{2} \log (λ_{1}) + x_{11} \log (λ_{2}) + (n - x_{2} - x_{11}) \log (1 - λ_{1} - λ_{2})$ . (3)
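As a worked special case, added here for illustration and not part of the original derivation, suppose that the sensitivity and specificity are equal, $η = β$, which is the setting assumed in the simulation study. Then $λ_{2}$ no longer depends on $p$:

$λ_{2} = β (1 - β) {(1 - p)}^{k} + β (1 - β) (1 - {(1 - p)}^{k}) = β (1 - β)$ ,

so the term $x_{11} \log (λ_{2})$ in the log-likelihood is constant in $p$. Maximizing the remaining terms with respect to $λ_{1}$ gives

$\hat{λ}_{1} = (1 - λ_{2}) \frac{x_{2}}{n - x_{11}}$ ,

and, since $λ_{1} = (1 - η) + (β + η - 1) {(1 - p)}^{k}$ , the maximizer in terms of $p$ is

$\hat{p} = 1 - {\left( \frac{\hat{λ}_{1} - (1 - η)}{β + η - 1} \right)}^{1 / k}$ ,

provided the ratio inside the brackets lies in $[0, 1]$. For example, with the hypothetical counts $n = 100$, $k = 10$, $x_{2} = 86$, $x_{11} = 5$ and $η = β = 0.95$, this gives $\hat{λ}_{1} = 0.9525 \times 86 / 95 \approx 0.8623$ and $\hat{p} \approx 0.0102$.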

4. Interval Estimation Methods

Interval estimation is a statistical technique used to estimate the range within which a population parameter is expected to lie with a certain level of confidence. It provides information regarding the closeness of a point estimate to the true parameter value by giving a range of plausible values for the parameter.

4.1. Wald Confidence Interval

The Wald confidence interval is the most widely used approximate confidence interval and is based on the asymptotic normality property of the maximum likelihood estimator (MLE) of the parameter of interest. Let $\hat{p}$ be the MLE of the prevalence rate $p$ obtained by maximizing the log-likelihood function in equation (3). Then, by the asymptotic property of the MLE, the sampling distribution of $\frac{\hat{p} - p}{S E (\hat{p})}$ is approximately standard normal. The standard error is defined as $S E (\hat{p}) = \frac{1}{\sqrt{I (\hat{p})}}$ , where $I (\hat{p})$ is the observed Fisher information. Given $α \in (0, 1)$ , the $100 (1 - α) \%$ Wald confidence interval of the prevalence rate $p$ is given by $\hat{p} \pm Z_{\frac{α}{2}} S E (\hat{p})$ , where $Z_{\frac{α}{2}}$ is the upper $100 (\frac{α}{2})^{\text{th}}$ percentile of the standard normal distribution. The most commonly used values of $α$ are 0.1, 0.05, and 0.01, which correspond to confidence levels of 90%, 95%, and 99%, respectively.
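The construction can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the MLE is located by a coarse grid search refined with a ternary search (which assumes the log-likelihood is unimodal in $p$), and the observed information is approximated by a central finite difference. The function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def log_likelihood(p, x2, x11, n, k, sens=0.95, spec=0.95):
    """Log-likelihood (3), up to an additive constant."""
    clean = (1.0 - p) ** k
    lam1 = clean * spec + (1.0 - clean) * (1.0 - sens)
    lam2 = clean * (1.0 - spec) * spec + (1.0 - clean) * sens * (1.0 - sens)
    return (x2 * math.log(lam1) + x11 * math.log(lam2)
            + (n - x2 - x11) * math.log(1.0 - lam1 - lam2))


def wald_interval(x2, x11, n, k, alpha=0.05, grid_size=2000):
    def f(p):
        return log_likelihood(p, x2, x11, n, k)

    # Coarse grid search to bracket the maximiser.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    best = max(range(grid_size), key=lambda i: f(grid[i]))
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, grid_size - 1)]
    # Ternary search refines the maximiser inside the bracket.
    for _ in range(100):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    p_hat = 0.5 * (lo + hi)
    # Observed information: minus the second derivative at p_hat.
    h = min(1e-5, p_hat / 2.0)
    info = -(f(p_hat + h) - 2.0 * f(p_hat) + f(p_hat - h)) / h ** 2
    if info <= 0.0:
        # Happens when the maximum sits on the boundary of (0, 1).
        raise ValueError("Wald interval undefined: non-positive information")
    se = 1.0 / math.sqrt(info)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Hypothetical data: 100 pools of size 10, 86 negative initially,
# 5 positive initially but negative on the retest.
p_hat, se, (lower, upper) = wald_interval(x2=86, x11=5, n=100, k=10)
print(p_hat, se, lower, upper)
```

Nothing in this construction keeps the lower limit above zero, which is one practical symptom of the normal approximation failing at low prevalence.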

4.2. Bayesian Credible Intervals

In the Bayesian credible interval approach, the prevalence rate parameter $p$ is treated as a random variable with a probability density function on the parameter space $Θ = (0, 1)$ . The joint probability mass function in equation (1) is regarded as the conditional mass function of $X_{2}, X_{11}, X_{12}$ given $p$ and written as $f (x_{2}, x_{11}, x_{12} \mid p)$ . The unknown parameter is assigned a probability distribution, denoted by $π (p)$ and called the prior distribution of $p$ . This reflects the experimenter's subjective belief regarding which values of $p$ are more or less likely over the whole parameter space. The evidence about $p$ from the prior distribution $π (p)$ and the likelihood function $L (p)$ is combined by means of Bayes' Theorem to obtain the posterior distribution. In this study, we use the Beta distribution as a prior for the parameter of interest because it is flexible, takes many shapes, and is widely used in Bayesian statistics for modeling an unknown proportion. The parameters of the Beta distribution can be chosen to reflect prior information about a small proportion. Since the prevalence rate is close to zero, using a non-informative uniform distribution over the interval (0, 1) may lead to inaccurate interval estimates.

Let $g (x_{2}, x_{11}, x_{12}, p) = f (x_{2}, x_{11}, x_{12} \mid p) π (p)$ be the joint probability distribution of $X_{2}, X_{11}, X_{12}$ and $p$ . The posterior distribution of $p$ given $x_{2}, x_{11}, x_{12}$ , denoted by $π (p \mid x_{2}, x_{11}, x_{12})$ , is defined as

$π (p \mid x_{2}, x_{11}, x_{12}) = \frac{g (x_{2}, x_{11}, x_{12}, p)}{\int_{0}^{1} g (x_{2}, x_{11}, x_{12}, p) d p}$ . (4)

The credible interval of the prevalence rate $p$ is constructed using the above posterior distribution and is defined as follows: for fixed $α \in (0, 1)$ , a subset $Θ^{*}$ of the parameter space $Θ$ is called a $100 (1 - α) \%$ credible interval for $p$ if and only if the posterior probability of the subset $Θ^{*}$ is at least $1 - α$ , that is,

$\int_{Θ^{*}} π (p \mid x_{2}, x_{11}, x_{12}) d p \geq 1 - α$ . (5)

For a given value $α \in (0, 1)$ there are many competing $100 (1 - α) \%$ credible intervals, and the optimal credible interval should be short and include only those values of $p$ which are very likely according to the posterior distribution. Such a credible interval is called a highest posterior density (HPD) interval and is defined as follows: for a fixed $α \in (0, 1)$ , a subset $Θ^{*}$ of the parameter space $Θ$ is called a $100 (1 - α) \%$ HPD credible interval for $p$ if and only if the subset $Θ^{*}$ has the following form:

$Θ^{*} = \{ p : π (p \mid x_{2}, x_{11}, x_{12}) \geq c \}$ , (6)

where the threshold $c$ depends on $α$ and the data, and is the largest constant such that the posterior probability of $Θ^{*}$ is at least $1 - α$ . If the posterior density is symmetric about its finite mean, then the $100 (1 - α) \%$ HPD credible interval $Θ^{*}$ will be the shortest, equal-tailed and symmetric about the posterior mean.
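The HPD definition in (6) can be illustrated on a discretized density: grid points are admitted in order of decreasing density until their total mass reaches $1 - α$. The sketch below is a generic illustration under the assumption that the density is unimodal, so that the admitted points form a single interval; the skewed example density, proportional to $p (1 - p)^{30}$, is hypothetical and is not the posterior of the pooling model.

```python
def hpd_interval(grid, weights, alpha=0.05):
    """Approximate HPD interval from a density evaluated on a grid.

    Assumes a unimodal density, so the highest-density points form
    one interval that can be reported by its end points.
    """
    total = sum(weights)
    # Visit grid points from highest to lowest density.
    order = sorted(range(len(grid)), key=lambda i: weights[i], reverse=True)
    mass, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        mass += weights[i] / total
        if mass >= 1.0 - alpha:
            break
    return grid[min(chosen)], grid[max(chosen)], mass


# Hypothetical skewed density on (0, 1), known only up to a constant.
grid = [(i + 0.5) / 5000 for i in range(5000)]
weights = [p * (1.0 - p) ** 30 for p in grid]
lower, upper, mass = hpd_interval(grid, weights)
print(lower, upper, mass)
```

For a right-skewed density such as this one, the HPD interval sits closer to zero than the equal-tailed interval with the same posterior mass.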

We note that it is not possible to derive the analytical form of the posterior density in (4) and therefore we apply the gridding approximation algorithm to simulate it. Suppose that G values of the prevalence rate $p$ are simulated from the posterior density. Then an equal-tailed credible interval is computed as $(q_{\frac{α}{2}}, q_{1 - \frac{α}{2}})$ , where $q_{z}$ is the z-quantile of the posterior distribution.
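A minimal sketch of the gridding approximation is given below. It is an illustration under stated assumptions rather than the authors' code: the posterior is evaluated on equally spaced cell midpoints in (0, 1) and normalized to sum to one, and the quantiles are read directly from the cumulative grid weights instead of from $G$ simulated draws. The prior parameters $α_{0} = 1$, $β_{0} = 20$ and the data values are hypothetical.

```python
import math


def grid_posterior(x2, x11, n, k, a0, b0, sens=0.95, spec=0.95,
                   grid_size=20000):
    """Discretised posterior of p under a Beta(a0, b0) prior."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = []
    for p in grid:
        clean = (1.0 - p) ** k
        lam1 = clean * spec + (1.0 - clean) * (1.0 - sens)
        lam2 = (clean * (1.0 - spec) * spec
                + (1.0 - clean) * sens * (1.0 - sens))
        log_lik = (x2 * math.log(lam1) + x11 * math.log(lam2)
                   + (n - x2 - x11) * math.log(1.0 - lam1 - lam2))
        log_prior = (a0 - 1.0) * math.log(p) + (b0 - 1.0) * math.log(1.0 - p)
        log_post.append(log_lik + log_prior)
    top = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def equal_tailed_interval(grid, probs, alpha=0.05):
    """Read the alpha/2 and 1 - alpha/2 quantiles off the grid CDF."""
    lower = upper = None
    cum = 0.0
    for p, w in zip(grid, probs):
        cum += w
        if lower is None and cum >= alpha / 2.0:
            lower = p
        if upper is None and cum >= 1.0 - alpha / 2.0:
            upper = p
    return lower, upper


# Hypothetical data and prior, for illustration only.
grid, probs = grid_posterior(x2=86, x11=5, n=100, k=10, a0=1.0, b0=20.0)
lower, upper = equal_tailed_interval(grid, probs)
print(lower, upper)
```

Drawing $G$ values from the grid with probabilities `probs` and taking their empirical quantiles would approximate the same limits; the grid resolution then bounds the attainable accuracy in either case.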

5. Simulation Study

In this section, simulation studies are conducted to evaluate the performance of the Bayesian interval estimation method relative to the traditional Wald method. Various simulation scenarios with different prevalence rates $p$ , pool sizes $k$ , and numbers of pools $n$ are considered. For each scenario, the performances of the Bayesian credible interval and the Wald confidence interval for the prevalence rate $p$ were compared on the basis of coverage probability and expected length. Let $s$ and $r$ denote the observed values of the random variables $X_{2}$ and $X_{11}$ , and let $f (s, r; p)$ denote the value of the trinomial probability mass function in (1). It follows that, for any interval estimation method for the prevalence rate $p$ , the actual coverage probability at a fixed value of $p$ is given by

$C P (p, n, k) = \sum_{s} \sum_{r} I (s, r, n) f (s, r; p)$ , (7)

where the sums run over all $s, r \geq 0$ with $s + r \leq n$ , and the indicator function $I (s, r, n)$ takes the value 1 if the interval covers the value of $p$ and the value 0 otherwise. Let $L (X_{2}, X_{11})$ and $U (X_{2}, X_{11})$ denote the lower and upper confidence limits, respectively. The expected length of the random interval $[L (X_{2}, X_{11}), U (X_{2}, X_{11})]$ is given by

$E L (p, n, k) = \sum_{s} \sum_{r} [U (s, r) - L (s, r)] f (s, r; p)$ . (8)
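For a small number of pools, the sums in (7) and (8) can be evaluated exactly by enumerating every outcome of the trinomial model. The sketch below is an illustration under stated assumptions: `coverage_and_length` accepts any interval function, and the example plugs in a coarse-grid equal-tailed credible interval with a hypothetical Beta(1, 20) prior, $n = 20$ pools of size $k = 5$ and $p = 0.05$. All function names are hypothetical.

```python
import math


def pool_probabilities(p, k, sens=0.95, spec=0.95):
    clean = (1.0 - p) ** k
    lam1 = clean * spec + (1.0 - clean) * (1.0 - sens)
    lam2 = clean * (1.0 - spec) * spec + (1.0 - clean) * sens * (1.0 - sens)
    return lam1, lam2, 1.0 - lam1 - lam2


def trinomial_pmf(x2, x11, n, lams):
    lam1, lam2, lam3 = lams
    coef = math.comb(n, x2) * math.comb(n - x2, x11)
    return coef * lam1 ** x2 * lam2 ** x11 * lam3 ** (n - x2 - x11)


def coverage_and_length(interval_fn, p, n, k):
    """Exact coverage probability and expected length at prevalence p."""
    lams = pool_probabilities(p, k)
    cover = length = 0.0
    for x2 in range(n + 1):
        for x11 in range(n - x2 + 1):
            prob = trinomial_pmf(x2, x11, n, lams)
            lo, hi = interval_fn(x2, x11)
            cover += prob * (lo <= p <= hi)
            length += prob * (hi - lo)
    return cover, length


def make_credible_fn(n, k, a0=1.0, b0=20.0, alpha=0.05, grid_size=1000):
    """Equal-tailed credible interval on a coarse grid (hypothetical prior)."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    logs = []
    for p in grid:
        l1, l2, l3 = pool_probabilities(p, k)
        prior = (a0 - 1.0) * math.log(p) + (b0 - 1.0) * math.log(1.0 - p)
        logs.append((math.log(l1), math.log(l2), math.log(l3), prior))

    def interval(x2, x11):
        lp = [x2 * a + x11 * b + (n - x2 - x11) * c + d
              for a, b, c, d in logs]
        top = max(lp)
        w = [math.exp(v - top) for v in lp]
        total, cum, lo, hi = sum(w), 0.0, None, grid[-1]
        for p, wi in zip(grid, w):
            cum += wi / total
            if lo is None and cum >= alpha / 2.0:
                lo = p
            if cum >= 1.0 - alpha / 2.0:
                hi = p
                break
        return lo, hi

    return interval


n, k, p_true = 20, 5, 0.05
cover, mean_len = coverage_and_length(make_credible_fn(n, k), p_true, n, k)
print(cover, mean_len)
```

Exact enumeration visits $(n + 1)(n + 2) / 2$ outcomes, so it is practical only for modest $n$; for larger designs the same quantities are usually estimated by simulation.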

For the simulation study we arbitrarily set the number of pools at $n = 100$ . The 95% Bayesian credible intervals and 95% Wald confidence intervals were constructed for 1000 simulations of $(X_{2}, X_{11})$ for fixed values of the prevalence rate $p$ and pool size $k$ . The coverage probability and expected length were then computed for each of the two interval estimation methods. The values of the prevalence rates and pool sizes used in the simulation study are $p = 0.003, 0.01, 0.05, 0.1$ and $k = 5, 15, 20$ . Table 1 and Table 2 show the computed coverage probabilities and mean interval lengths (in brackets) for the Bayesian credible intervals and the Wald confidence intervals, respectively, for various values of $p$ and $k$ . The Bayesian credible intervals have coverage probabilities that are higher than the nominal level, and their mean interval lengths are extremely short. Their coverage probabilities draw closer to the nominal level as the pool size increases. On the other hand, the coverage probabilities for the Wald confidence intervals are lower but on average closer to the nominal level, and their mean interval lengths are very large compared to those of the credible intervals.
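The data-generating step of such a simulation can be sketched as follows. This is a hedged illustration, not the authors' code: each of the $n$ pools independently falls into one of the three outcome classes with probabilities $λ_{1}$, $λ_{2}$ and $1 - λ_{1} - λ_{2}$, and the counts in the first two classes give one replicate of $(X_{2}, X_{11})$ as they enter the probability model (1). The function name `simulate_counts`, the seed, and the values $p = 0.01$, $k = 10$ are assumptions.

```python
import random


def simulate_counts(p, k, n, reps, sens=0.95, spec=0.95, seed=0):
    """Draw `reps` replicates of (x2, x11) from the trinomial model."""
    rng = random.Random(seed)  # seeded for reproducibility
    clean = (1.0 - p) ** k
    lam1 = clean * spec + (1.0 - clean) * (1.0 - sens)
    lam2 = clean * (1.0 - spec) * spec + (1.0 - clean) * sens * (1.0 - sens)
    lam3 = 1.0 - lam1 - lam2
    draws = []
    for _ in range(reps):
        # Class 0 carries lam1, class 1 carries lam2, class 2 the remainder.
        outcomes = rng.choices((0, 1, 2), weights=(lam1, lam2, lam3), k=n)
        draws.append((outcomes.count(0), outcomes.count(1)))
    return draws


# 1000 replicates with n = 100 pools, matching the simulation size in the text;
# p and k are illustrative.
draws = simulate_counts(p=0.01, k=10, n=100, reps=1000)
print(draws[:5])
```

An interval is then computed for each replicate, and the proportion of intervals containing the true $p$ and their average length estimate the coverage probability and expected length.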

To investigate the effects of group size $k$ and the prevalence rate $p$ on the performance of the two methods of interval estimation, coverage probabilities and mean lengths of the credible and Wald confidence intervals were computed repeatedly for different pairs of values of $k$ and $p$ . The following three pairs of values of $k$ and $p$ were arbitrarily chosen and used in the simulation study: $(10, 0.1)$ , $(20, 0.1)$ and $(20, 0.01)$ . Figure 2 and Figure 3 show plots of coverage probabilities and mean interval lengths, respectively, of the 95% Bayesian credible intervals and 95% Wald confidence intervals when $k = 10$ and $p = 0.1$ . The coverage probabilities (in Figure 2) for both types of interval estimates show short upward and downward spikes and are close to the nominal level; however, most coverage probabilities for the Wald confidence intervals are below the nominal level, whereas those of the Bayesian credible intervals are evenly distributed about the nominal level. The graphs in Figure 3 demonstrate that, on average, the mean interval length of the Bayesian credible intervals is smaller than that of the Wald confidence intervals.

Table 1. Coverage probabilities and mean interval lengths (in brackets) of the nominal 95% Bayesian credible intervals for the prevalence rate $p$ , by pool size $k$ .

$p$	$k = 5$	$k = 15$	$k = 20$
0.003	0.973 [0.0000156]	0.957 [0.0000088]	0.956 [0.000005806]
0.01	0.986 [0.00001702]	0.963 [0.0000102]	0.952 [0.00000941]
0.05	0.983 [0.00003764]	0.965 [0.0000258]	0.959 [0.0000240]
0.1	0.988 [0.00004985]	0.972 [0.00003864]	0.980 [0.00004966]

Table 2. Coverage probabilities and mean interval lengths (in brackets) of the nominal 95% Wald confidence intervals for the prevalence rate $p$ , by pool size $k$ .

$p$	$k = 5$	$k = 15$	$k = 20$
0.003	0.998 [0.030517]	0.909 [0.008167]	0.950 [0.00673]
0.01	0.931 [0.020388]	0.938 [0.010986]	0.941 [0.009811]
0.05	0.942 [0.04274]	0.953 [0.02934]	0.938 [0.02775]
0.1	0.945 [0.06341]	0.949 [0.05522]	0.961 [0.06386]

Figure 2. Plots of coverage probabilities of the 95% credible intervals and 95% Wald confidence intervals when $k = 10$ and $p = 0.1$ .

Figure 3. Plots of mean interval lengths of the 95% credible intervals and 95% Wald confidence intervals when $k = 10$ and $p = 0.1$ .

Figure 4 and Figure 5 show the graphs of coverage probabilities and mean interval lengths for the 95% credible intervals and 95% Wald confidence intervals when the group size is increased to 20 and the prevalence rate is kept constant at 0.1. Figure 4 shows that the coverage probability of the Wald confidence interval rises above the nominal level when the group size is increased to 20. On the other hand, the coverage probabilities of the Bayesian credible interval are close to the nominal level and are not affected by the increase in group size. In Figure 5, the mean interval lengths of the 95% Wald confidence intervals have long upward and downward spikes compared to those of the credible intervals, which are short and uniform. This implies that the Wald method produces wider confidence intervals when the group size is increased and hence is less accurate. Increasing group size reduces the number of groups (less data) and hence the Wald method performs poorly since it is based on large sample theory.

Figure 4. Plots of coverage probabilities of the 95% credible intervals and 95% Wald confidence intervals when $k = 20$ and $p = 0.1$ .

Figure 5. Plots of mean interval lengths of the 95% credible intervals and 95% Wald confidence intervals when $k = 20$ and $p = 0.1$ .

Lastly, Figure 6 and Figure 7 present the plots of coverage probabilities and mean interval lengths for the 95% credible intervals and 95% Wald confidence intervals when the group size is kept constant at 20 and prevalence rate is decreased to 0.01. In Figure 6 it can be observed that the coverage probabilities of the interval estimates produced by both interval estimation methods are on average less than the nominal level; the upward and downward spikes show the same pattern. However, the plots of mean interval lengths in Figure 7 indicate that the performance of the Wald method decreases further with decrease in the prevalence rate while that of the Bayesian method is good.

Figure 6. Plots of coverage probabilities of the 95% credible intervals and 95% Wald confidence intervals when $k = 20$ and $p = 0.01$ .

Figure 7. Plots of mean interval lengths of the 95% credible intervals and 95% Wald confidence intervals when $k = 20$ and $p = 0.01$ .

5.1. Results Interpretation

The simulation results demonstrate that Bayesian credible intervals generally outperform Wald confidence intervals, particularly in scenarios with low prevalence rates and smaller pool sizes. This superior performance is attributed to the Bayesian method's ability to incorporate prior information and its robustness to variability, making it less sensitive to small sample sizes and low prevalence rates. In contrast, the Wald method relies on normal approximation, which becomes unreliable when the prevalence rate is close to zero or the sample size is limited. Consequently, the Wald intervals tend to be wider and less accurate under such conditions, whereas Bayesian intervals maintain consistent coverage probabilities close to the nominal level and have narrower interval widths.

However, in cases where the prevalence rate and sample size are relatively large, both methods tend to perform similarly, as the Wald method’s reliance on asymptotic normality becomes more reliable. Despite this, the Bayesian approach still provides a slight advantage by yielding more stable and efficient interval estimates.

5.2. Practical Implications

The findings have significant implications for public health decision-making. Accurate prevalence estimation is crucial in managing infectious disease outbreaks and allocating healthcare resources efficiently. The use of Bayesian credible intervals allows for more precise estimation, especially in low-prevalence settings, enabling policymakers to make data-driven decisions with greater confidence. Furthermore, the ability to incorporate prior information can enhance predictive accuracy when prior data or expert knowledge is available. In contrast, reliance on the Wald method in such scenarios might lead to overestimating uncertainty, resulting in either excessive caution or insufficient intervention measures. Adopting Bayesian methods can therefore optimize decision-making processes, particularly in surveillance and screening programs.

6. Conclusions

The results of the above simulation study indicate that the Bayesian interval estimation method produced more accurate interval estimates than the traditional Wald method, with coverage probabilities close to the nominal level of 0.95 and shorter interval widths, for all the cases considered. The Wald confidence intervals also had coverage probabilities close to the nominal level for all the cases considered; however, their mean interval widths were large, and hence they were less precise. In Figure 7, the graph of mean interval width for the Wald confidence interval has long spikes, implying that as the pool size increases the Wald confidence interval becomes wider and hence inaccurate. On the other hand, the graph of Bayesian interval widths in the same figure has short spikes, indicating that the precision of the credible intervals is not significantly affected by increased pool size and hence that the method is more efficient in estimating the prevalence rate. Finally, it can be observed that the accuracy of the interval estimates constructed by both methods is good at lower prevalence rates and pool sizes and deteriorates (although only slightly for the Bayesian method) with an increase in the prevalence rate and pool size. This may be due to the fact that the efficiency of pool testing is low at a higher prevalence rate. Also, the accuracy of the Wald method depends on normal approximation to the binomial, which may be poor for all the prevalence rates used in the simulation study. In this paper we assumed that the specificity and sensitivity of the testing kit are equal and fixed at 0.95. However, they can be varied in a future study to investigate their effects on the efficiency of the two interval estimation methods for the prevalence rate of a rare trait.

This study demonstrates that the Bayesian interval estimation method provides more accurate and efficient interval estimates for prevalence rates compared to the traditional Wald method. The Bayesian approach consistently achieves coverage probabilities closer to the nominal level, even when prevalence rates are low or sample sizes are small. In contrast, the Wald method often produces wider intervals and lower coverage probabilities under similar conditions due to its reliance on normal approximation. The practical implications of these findings are significant for public health decision-making. Accurate prevalence estimation enables better resource allocation, outbreak management, and screening program optimization. By incorporating prior information and maintaining robustness to variability, Bayesian methods enhance the reliability of prevalence estimates, particularly in low-prevalence scenarios. Future research should investigate the impact of varying the specificity, the sensitivity, and the number of pools on interval estimation efficiency. Adjusting these parameters could provide further insights into optimizing the Bayesian approach for different diagnostic contexts. Our next research project is to construct likelihood-based confidence intervals for the prevalence rate in the pool testing with retesting design and compare them with the Bayesian credible intervals using simulation.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Dorfman, R. (1943) The Detection of Defective Members of Large Populations. The Annals of Mathematical Statistics, 14, 436-440. https://doi.org/10.1214/aoms/1177731363
[2]	Spiegelhalter, D.J. and Best, N.G. (2003) Bayesian Approaches to Multiple Sources of Evidence and Uncertainty in Complex Cost‐Effectiveness Modelling. Statistics in Medicine, 22, 3687-3709. https://doi.org/10.1002/sim.1586
[3]	Tamba, C.L. and Nandelenga, M.W. (2014) Computation of Moments in Group Testing with Retesting and with Errors in Inspection. International Journal of Contemporary Advanced Mathematics (IJCM), 3, 1-15.
[4]	Nyongesa, L.K. (2017) Pool Testing Algorithm for Estimating Prevalence with Imperfect Test. International Journal of Statistics and Systems, 12, 823-830.
[5]	Liu, A. and Liu, C. (2012) Bayesian Group Testing for the Estimation of the Prevalence Rate. Journal of Statistical Planning and Inference, 142, 750-763.
[6]	Hepworth, G. (2005) Confidence Intervals for Proportions Estimated by Group Testing with Groups of Unequal Size. Journal of Agricultural, Biological, and Environmental Statistics, 10, 478-497. https://doi.org/10.1198/108571105x81698
[7]	Orawo, L.A. (2021) Confidence Intervals for the Binomial Proportion: A Comparison of Four Methods. Open Journal of Statistics, 11, 806-816. https://doi.org/10.4236/ojs.2021.115047
[8]	Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013) Bayesian Data Analysis. 3rd Edition, CRC Press.
[9]	Brett, T.S., Rohani, P. and Drake, J.M. (2018) Anticipating Epidemic Transitions with Imperfect Data. PLOS Computational Biology, 14, e1006204.
[10]	McDonald, J.L. and Hodgson, D.J. (2018) Prior Precision, Prior Accuracy, and the Estimation of Disease Prevalence Using Imperfect Diagnostic Tests. Frontiers in Veterinary Science, 5, Article 83. https://doi.org/10.3389/fvets.2018.00083
[11]	Helman, S.K., Thompson, R.A. and Fox, S.E. (2020) Bayesian Latent Class Analysis to Estimate Disease Prevalence and Diagnostic Test Accuracy in Wildlife Populations. Preventive Veterinary Medicine, 180, Article 105030.
[12]	Biggerstaff, B.J. (2008) Confidence Intervals for the Difference of Two Proportions Estimated from Pooled Samples. Journal of Agricultural, Biological, and Environmental Statistics, 13, 478-496. https://doi.org/10.1198/108571108x379055
