Analysis of a Two-Stage Adaptive Negative Binomial Group Testing Model for Estimating Prevalence of a Rare Trait

Abstract

Group testing is an efficient method for classifying observations and estimating trait prevalence in a population. However, using appropriate group sizes is crucial for maximizing its benefits, and adaptive schemes have been developed to address improper group size selection. Existing adaptive schemes are based on a Binomial sampling model, which requires all groups to be tested before successes are recorded. In certain scenarios, such as infectious disease outbreaks, estimates must be reported as soon as positives are detected. A two-stage adaptive Negative Binomial group testing model was constructed for such cases. The adaptive model adjusts the group size based on the estimate from the previous stage, so that the group size used minimizes the mean squared error and variance of the prevalence rate estimate. The maximum likelihood estimation method was employed to find the model’s parameter estimate, and its properties were also investigated. The comparative analysis highlighted the superiority of the adaptive model over the non-adaptive model, especially under low prevalence, emphasizing the importance of incorporating adaptivity in group testing procedures, particularly in disease screening and surveillance, such as for COVID-19.

Share and Cite:

Akomboh, J., Waliaula, W.R., Tamba, C. and Okenye, J.O. (2023) Analysis of a Two-Stage Adaptive Negative Binomial Group Testing Model for Estimating Prevalence of a Rare Trait. Open Access Library Journal, 10, 1-14. doi: 10.4236/oalib.1110960.

1. Introduction

Group testing, also known as pool testing or batch testing, is a method that involves combining individuals into pools and conducting tests on these pools rather than on each individual. The tests are used to detect the presence of infections or defects, as seen in epidemiological studies. The idea was first introduced by Dorfman in 1943 as a cost-saving technique for detecting soldiers with syphilis [1]. Since Dorfman’s pioneering work, group testing has been applied in various fields, including epidemiology, quality control, and genetics, with two main objectives: classification [2] [3] and estimation of prevalence [4]. The classification or identification of individuals as either positive or negative for a trait is the first objective of group testing, championed by [1]. To enhance cost-effectiveness and reduce the number of tests required, an adaptation of the Dorfman testing scheme has been investigated and extended to incorporate multi-stage testing [5]. Apart from classification problems, group testing has also been used extensively to estimate the prevalence rate of a trait. This is the second objective of group testing, pioneered by [6], and it is the main focus of this study. That study used the maximum likelihood estimation (MLE) method, which was found to be reliable when the population was small. Nyongesa [7] extended the estimation work using the MLE method and incorporated testing errors to account for practical situations where errors are bound to happen during the testing procedure.

Subsequent studies on estimation focused on design matters, especially the selection of group sizes used in group testing procedures [8]. The main aim of emphasizing the selection of group sizes was to reduce the chances of obtaining all-negative or all-positive groups during testing. Another aim was to minimize the MSE by incorporating prior information in choosing k [9]. All these studies were carried out using the Binomial sampling model. Katholi and Unnasch [10] later suggested the use of Negative Binomial sampling for efficient estimation of p when the prevalence rate is small. Point and interval estimation has been explored under the Negative Binomial model using equal group sizes and has been found to be efficient in surveillance cases where a quick response is desired [11]. Recent works that have considered inverse binomial models include estimation using the Bayesian approach as well as confidence interval estimation [12].

Group testing can be categorized into two forms: non-adaptive and adaptive group testing schemes. Non-adaptive group testing involves testing groups of a fixed size to obtain dichotomous results. On the other hand, adaptive group testing adjusts the group sizes from one stage to the next, allowing for more flexibility and efficiency [13]. Adaptive estimators have been developed to improve efficiency by reducing the Mean Squared Error under the binomial model [9]. Other probabilistic models, such as the beta-binomial, geometric, and hyper-geometric models, have also been considered [14]. For urgent situations where the prevalence rate of a trait needs to be estimated quickly, the Negative Binomial sampling model has been found to be preferable over the Binomial model [11]. The Negative Binomial model has been applied in emergency situations like disease outbreaks and natural disasters to measure risk promptly. Despite the benefits of group testing, its success depends on choosing appropriate group sizes. Inadequate group size selection can lead to deficiencies in the procedure. To address these deficiencies, this study aimed to investigate a two-stage adaptive Negative Binomial group testing procedure for estimating the prevalence rate of a rare trait.

2. Methodology

2.1. The Model

Negative Binomial sampling holds significant importance in the context of biological sample collection. If the proportion of individuals possessing a specific trait is denoted by p, and sampling continues until a predetermined number x of individuals with the trait is observed, then the number of individuals sampled follows a Negative Binomial distribution. Combining Negative Binomial sampling with group testing provides an attractive approach for delivering early estimates during the screening process.

This model operates under the assumption that the number of pools having a trait of interest is pre-established, and the testing process continues until the desired number of positive pools is identified.

The adaptive scheme involved testing groups in stages and adjusting the group sizes from one stage to the next. The group size used at a stage depended entirely on the outcome of the preceding stages. This implies that k’s were determined sequentially as the experiment progressed. The value of k1 is determined by optimizing the variance of the estimator obtained from the non-adaptive Negative Binomial group testing scheme, which serves as prior information. In Stage One, the number of positive pools to be observed X1 is fixed, and T1 the number of groups to be tested before observing X1 positive pools follows a Negative Binomial distribution.

In Stage Two, the group size k2 is constructed by minimizing the variance of the estimator obtained in Stage One.

$$k_2 = \arg\min_{k}\,\bigl[\operatorname{Var}(\hat{p}_1)\bigr]_{p=\hat{p}_1} \tag{1}$$

where $k_2$ is the group size that minimizes the variance of $\hat{p}_1$, the estimator obtained in Stage One, with the variance evaluated at $p = \hat{p}_1$. Minimizing this variance is crucial for enhancing the precision of the overall estimation procedure, and it gives a more reliable and stable estimate from the adaptive model.
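The selection rule in Equation (1) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the variance is taken in the form of Equation (12) with the number of tests t treated as fixed, and it assumes a plain grid search over integer group sizes. The function names, the default `t=1`, and the search limit `k_max` are hypothetical choices made for this example.

```python
def stage_one_variance(p, k, t):
    """Variance in the form of Equation (12), treating the number of tests t as fixed.

    Intended for small p (rare trait); very large p and k can underflow.
    """
    pi = 1.0 - (1.0 - p) ** k                      # P(group tests positive)
    return pi / (t * k**2 * (1.0 - p) ** (k - 2))


def choose_k2(p_hat_1, t=1, k_max=500):
    """Integer k in 1..k_max that minimizes the variance evaluated at p = p_hat_1."""
    return min(range(1, k_max + 1),
               key=lambda k: stage_one_variance(p_hat_1, k, t))


if __name__ == "__main__":
    # Hypothetical Stage One estimates, for illustration only.
    for p_hat_1 in (0.01, 0.05):
        print(p_hat_1, choose_k2(p_hat_1))
```

Under this assumed form, t only rescales the variance, so the minimizing k does not depend on it. If the optimum lies beyond `k_max`, the search simply returns `k_max`, so the limit should be set with the expected prevalence in mind.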

The number of positive pools to be observed in Stage Two X2 is fixed, and the number of groups to be tested T2 to achieve X2 positive pools follows a Negative Binomial distribution, which depends on the output of Stage One. This adaptive model ensures an efficient allocation of resources and provides a robust strategy for large-scale screening and identification of positive groups in group testing scenarios.

The derivation presented focuses on two models: the usual Negative Binomial model and the proposed two-stage adaptive Negative Binomial group testing model for estimating the prevalence of a rare trait.

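Under the usual non-adaptive Negative Binomial model, the estimator $\hat{p}_N$ is obtained as in Katholi and Unnasch (2006) [10]. The number of groups tested, $T$, follows a Negative Binomial distribution with parameters $x$ and $\pi(p)$:

$$f(t \mid p) = \binom{t-1}{x-1}\,[\pi(p)]^{x}\,[1-\pi(p)]^{t-x} \tag{2}$$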

The likelihood function corresponding to Equation (2) is expressed as

$$L(p \mid t, x) \propto [\pi(p)]^{x}\,[1-\pi(p)]^{t-x} \tag{3}$$

The natural log-likelihood function is given, up to an additive constant, as

$$\log L(p \mid t, x) = x\log[\pi(p)] + (t-x)\log[1-\pi(p)] + \text{const} \tag{4}$$

The maximum likelihood estimator of the non-adaptive model is obtained as the solution to

$$\frac{\partial \log L(\cdot)}{\partial p} = 0 \tag{5}$$

which, after dividing through by $\pi'(p)$, is equivalent to

$$\frac{x}{\pi(p)} - \frac{t-x}{1-\pi(p)} = 0 \tag{6}$$

where $\pi(p) = 1-(1-p)^{k}$ is the probability of obtaining a positive group. Solving Equation (6) gives $\pi(p) = x/t$, which yields the result obtained by Katholi and Unnasch (2006) [10]:

$$\hat{p}_N = 1 - \left(1 - \frac{x}{t}\right)^{1/k} \tag{7}$$
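As a quick illustration of Equation (7), the closed-form estimator can be evaluated directly. This is a minimal sketch; the function name and the example values x = 30, t = 100, k = 10 are hypothetical and are not results from the study.

```python
def nonadaptive_mle(x, t, k):
    """Closed-form MLE of Equation (7): p_hat = 1 - (1 - x/t)^(1/k).

    x: number of positive groups required (fixed in advance)
    t: number of groups tested to reach x positives
    k: group size
    """
    if not (0 < x <= t):
        raise ValueError("need 0 < x <= t")
    return 1.0 - (1.0 - x / t) ** (1.0 / k)


if __name__ == "__main__":
    # Hypothetical outcome: 30 positive groups found after testing 100 groups of size 10.
    p_hat = nonadaptive_mle(x=30, t=100, k=10)
    print(round(p_hat, 6))
```

Plugging the estimate back into $\pi(p) = 1-(1-p)^k$ recovers the observed fraction of positive groups x/t, which is a convenient sanity check on any implementation.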

The variance of $\hat{p}_N$ was obtained from the Cramér-Rao lower bound, which is the reciprocal of the Fisher information of the likelihood function:

$$\operatorname{Var}(\hat{p}_N) = \frac{1}{I(\hat{p}_N)} \tag{8}$$

where the Fisher information is given as

$$I(\hat{p}_N) = E\left[-\frac{\partial^2}{\partial p^2}\log L(\cdot)\right] \tag{9}$$

The second derivative of the log-likelihood function, after substituting $x = t\,\pi(p)$, is given as

$$\frac{\partial^2}{\partial p^2}\log L(\cdot) = -\frac{t\,[\pi'(p)]^{2}}{\pi(p)\,[1-\pi(p)]} \tag{10}$$

It is worth noting that $\pi(p) = 1-(1-p)^{k}$ and $\pi'(p) = k(1-p)^{k-1}$. Substituting these into Equation (10) and changing the sign gives

$$-\frac{\partial^2}{\partial p^2}\log L(\cdot) = \frac{t\,k^{2}(1-p)^{2k-2}}{[1-(1-p)^{k}]\,(1-p)^{k}} \tag{11}$$

Since $E(T) = x/\pi$, taking the expectation and then the inverse of Equation (11) yields

$$\operatorname{Var}(\hat{p}_N) = \frac{1-(1-p)^{k}}{t\,k^{2}(1-p)^{k-2}} \tag{12}$$
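The expectation step can be spelled out in more detail. The following is a sketch reconstructed from the log-likelihood in Equation (4), the definition in Equation (9), and the stated mean $E(T) = x/\pi$; it is offered as a check on the algebra rather than as the authors' own derivation. Here $\pi$, $\pi'$, and $\pi''$ denote $\pi(p)$ and its first two derivatives with respect to $p$.

```latex
\begin{aligned}
\frac{\partial^2 \log L}{\partial p^2}
  &= x\left[\frac{\pi''}{\pi} - \frac{(\pi')^2}{\pi^2}\right]
   - (t-x)\left[\frac{\pi''}{1-\pi} + \frac{(\pi')^2}{(1-\pi)^2}\right], \\[4pt]
E(T - x) &= \frac{x}{\pi} - x = \frac{x(1-\pi)}{\pi}, \\[4pt]
-E\left[\frac{\partial^2 \log L}{\partial p^2}\right]
  &= \frac{x(\pi')^2}{\pi^2} + \frac{x(\pi')^2}{\pi(1-\pi)}
   = \frac{x(\pi')^2}{\pi^2(1-\pi)} .
\end{aligned}
```

The terms in $\pi''$ cancel once the expectation is taken. Writing $x = t\,\pi$ in the last expression recovers the form in Equation (11), and leaving it in terms of $x$ gives the per-stage term that appears in Equation (19).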

The proposed two-stage adaptive model aimed at optimizing resource allocation and estimation efficiency. The study considered two sets of desired positive groups, X1 and X2. The first stage estimator was based on the Negative Binomial distribution with a prior derived from the non-adaptive estimator. In the second stage, the study introduces the two-stage adaptive estimator.

Stage Two proceeds by testing groups of size $k_2$, where $T_2$ is the number of groups tested to obtain $X_2$ positive groups. Thus, $T_2$ is conditioned on $T_1$, and it follows that $T_2$ has a Negative Binomial distribution. Specifically,

$$T_2 \mid T_1 \sim \text{NegativeBinomial}\bigl(X_2,\ \pi_{2|1} = 1-(1-p)^{k_2(t_1)}\bigr)$$

Equation (13) gives the joint distribution of $T_1$ and $T_2$ as

$$f(T_2, T_1) = f(T_2 \mid T_1)\times f(T_1) = \text{NegativeBinomial}\bigl(X_1,\ \pi = 1-(1-p)^{k_1}\bigr)\times\text{NegativeBinomial}\bigl(X_2,\ \pi_{2|1} = 1-(1-p)^{k_2(t_1)}\bigr) \tag{13}$$

The joint distribution of $T_1$ and $T_2$ was used to derive the final two-stage adaptive estimator $\hat{p}_2$ and is given as

$$f(t_1, t_2) = \binom{t_1-1}{x_1-1}\,[1-(1-p)^{k_1}]^{x_1}\,(1-p)^{k_1(t_1-x_1)} \times \binom{t_2-1}{x_2-1}\,[1-(1-p)^{k_2}]^{x_2}\,(1-p)^{k_2(t_2-x_2)} \tag{14}$$

The binomial coefficients $\binom{t_1-1}{x_1-1}$ and $\binom{t_2-1}{x_2-1}$ do not depend on $p$, so they act as constants of proportionality and are replaced with $\propto$, giving

$$f(t_1, t_2) \propto [1-(1-p)^{k_1}]^{x_1}\,(1-p)^{k_1(t_1-x_1)} \times [1-(1-p)^{k_2}]^{x_2}\,(1-p)^{k_2(t_2-x_2)} \tag{15}$$

The natural log-likelihood function for the joint distribution of $T_1$ and $T_2$ was obtained, up to an additive constant, as

$$\ln L = x_1\ln[1-(1-p)^{k_1}] + k_1(t_1-x_1)\ln(1-p) + x_2\ln[1-(1-p)^{k_2}] + k_2(t_2-x_2)\ln(1-p) + \text{const} \tag{16}$$

The adaptive estimator is obtained by setting the score function in Equation (17) to zero and solving iteratively, since the resulting equation has no closed-form solution.

$$\frac{\partial \log L}{\partial p} = \frac{x_1\,\pi_1'}{\pi_1} - \frac{(t_1-x_1)\,\pi_1'}{1-\pi_1} + \frac{x_2\,\pi_2'}{\pi_2} - \frac{(t_2-x_2)\,\pi_2'}{1-\pi_2} \tag{17}$$

where $\pi_1 = 1-(1-p)^{k_1}$ and $\pi_2 = 1-(1-p)^{k_2}$, with derivatives $\pi_1' = k_1(1-p)^{k_1-1}$ and $\pi_2' = k_2(1-p)^{k_2-1}$.
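The iterative solution of the score equation can be sketched as follows. The paper does not state which iterative method was used, so bisection is an assumption made here for simplicity; it is applicable because the log-likelihood is concave in p, so the score is decreasing and has a single root. The function names and the `(x, t, k)` tuple layout are hypothetical. The term $(t_i-x_i)\pi_i'/(1-\pi_i)$ is written in its algebraically equal form $(t_i-x_i)k_i/(1-p)$ to avoid dividing by a quantity that can underflow.

```python
def score(p, stages):
    """Derivative of the two-stage log-likelihood with respect to p.

    stages: list of (x, t, k) tuples, one per stage, where x is the number of
    positive groups, t the number of groups tested, and k the group size.
    """
    total = 0.0
    for x, t, k in stages:
        pi = 1.0 - (1.0 - p) ** k              # P(group tests positive)
        dpi = k * (1.0 - p) ** (k - 1)         # derivative of pi with respect to p
        # (t - x) * dpi / (1 - pi) simplifies to (t - x) * k / (1 - p)
        total += x * dpi / pi - (t - x) * k / (1.0 - p)
    return total


def adaptive_mle(stages, lo=1e-10, hi=1.0 - 1e-10, tol=1e-12):
    """Root of the score by bisection; assumes t > x in at least one stage."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, stages) > 0:             # still left of the maximum
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical data: Stage One (x1=10, t1=50, k1=5), Stage Two (x2=30, t2=100, k2=10).
    print(adaptive_mle([(10, 50, 5), (30, 100, 10)]))
```

With a single stage, the root coincides with the closed-form estimator in Equation (7), which gives a simple way to validate the solver before using it on two-stage data.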

The variance of the adaptive estimator was derived using the Fisher information, which is given as

$$I = E\left[-\frac{\partial^2}{\partial p^2}\log L(\cdot)\right] \tag{18}$$

Thus, the variance of the adaptive estimator was derived and obtained as

$$\operatorname{Var}(\hat{p}_A) = \frac{1}{\displaystyle\sum_{i=1}^{2}\frac{x_i\,k_i^{2}\,(1-p)^{2k_i-2}}{\pi_i^{2}\,(1-\pi_i)}} \tag{19}$$

This variance was used to construct the Wald confidence interval as

$$\hat{p}_A \pm Z_{\alpha/2}\sqrt{\operatorname{Var}(\hat{p}_A)} \tag{20}$$
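Equations (19) and (20) can be evaluated with a few lines of code. The sketch below is illustrative only: the function names and the `(x, k)` tuple layout are hypothetical, the variance is evaluated at the point estimate (the usual plug-in approach, assumed here), and clipping the interval to [0, 1] is an added convenience that the paper does not mention.

```python
from statistics import NormalDist


def adaptive_variance(p, stages):
    """Equation (19): reciprocal of the summed per-stage information.

    stages: list of (x, k) tuples, with x positive groups required and group size k.
    """
    info = 0.0
    for x, k in stages:
        pi = 1.0 - (1.0 - p) ** k
        info += x * k**2 * (1.0 - p) ** (2 * k - 2) / (pi**2 * (1.0 - pi))
    return 1.0 / info


def wald_interval(p_hat, stages, alpha=0.05):
    """Equation (20), with the variance evaluated at p_hat (plug-in assumption)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * adaptive_variance(p_hat, stages) ** 0.5
    # Clipping to [0, 1] is an assumption of this sketch, not part of Equation (20).
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Hypothetical design: 15 positive groups at k = 20, then 15 at k = 10.
    print(wald_interval(0.03, [(15, 20), (15, 10)]))
```

For a single stage with k = 1, the variance reduces to $p^2(1-p)/x$, the familiar result for inverse sampling of individuals, which is a useful check on the formula.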

2.2. Simulation

Data was simulated using an algorithm which mimics the Negative Binomial process as illustrated in Figure 1.

Figure 1. Flow chart for negative binomial group testing simulation.

Steps for Simulation

Step 1: Specify p, k and X, then set x = 0 and T = 0.

Step 2: Generate k Bernoulli(p) random variables.

Set Y = (Y1, …, Yk).

Step 3: Increase T by one. If the sum of the yi is greater than 0, the group is positive and a success is recorded (x is increased by one). If the sum of the yi is 0, the group is considered negative.

Step 4: If x is not equal to X, return to Step 2. If x = X, the procedure stops.

Step 5: Report T and calculate the estimate of p.
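The simulation steps above can be sketched in code as follows. This is a minimal illustration under the same assumptions as the steps (independent individuals and an error-free test); the function names and the seeded generator are hypothetical choices, and the estimate is computed with the closed-form estimator of Equation (7).

```python
import random


def simulate_nb_group_testing(p, k, X, rng=None):
    """Test groups of size k until X positive groups are seen; return T."""
    rng = rng or random.Random()
    x = 0   # positive groups observed so far (Step 1)
    T = 0   # groups tested so far
    while x < X:                                   # Step 4: stop once x == X
        # Step 2: one Bernoulli(p) draw per individual in the group
        y = [1 if rng.random() < p else 0 for _ in range(k)]
        T += 1
        # Step 3: the group is positive if at least one member is positive
        if sum(y) > 0:
            x += 1
    return T                                       # Step 5: report T


def estimate_p(X, T, k):
    """Closed-form estimate of p from Equation (7)."""
    return 1.0 - (1.0 - X / T) ** (1.0 / k)


if __name__ == "__main__":
    rng = random.Random(2023)                      # seed chosen arbitrarily
    T = simulate_nb_group_testing(p=0.05, k=10, X=30, rng=rng)
    print(T, estimate_p(30, T, 10))
```

Repeating the call many times with different seeds gives the replicate estimates from which bias, variance, and MSE can be computed.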

3. Results and Discussion

3.1. Relationship between t, p, and k

The Negative Binomial sampling method used in group testing experiments involves a random number of trials, while the required positive pools and group size are fixed. The number of tests needed to obtain the required positive groups depends on various variables, so it’s important to understand how changing the values of p and k affects the number of testing trials required.

As the probability of success increases in the negative binomial group testing model, the number of trials required to obtain the desired number of positive groups generally decreases as illustrated in Figure 2. Examining the plots, for a fixed value of k, it can be observed that as p increases, the number of trials decreases. This trend holds true across different values of k. This behavior is expected because a higher probability of success implies a greater likelihood of encountering positive groups during testing. Therefore, fewer trials are needed to reach the desired number of positive groups when the probability of success is higher. It is worth noting that increasing the group size from 5 to 100 in the negative binomial group testing model typically leads to a reduction in the number of trials required to achieve the desired number of positive groups as well.
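The trend described above follows from the mean of the Negative Binomial distribution, $E(T) = X/\pi$ with $\pi = 1-(1-p)^k$. A short sketch makes this concrete; the function name and the grid of p values are hypothetical, and X = 30 is used only because it is the setting adopted later in the study.

```python
def expected_tests(p, k, X):
    """Expected number of groups tested, E(T) = X / pi, with pi = 1 - (1 - p)^k."""
    pi = 1.0 - (1.0 - p) ** k
    return X / pi


if __name__ == "__main__":
    for k in (5, 20, 50, 100):
        row = [round(expected_tests(p, k, X=30), 1) for p in (0.01, 0.05, 0.10)]
        print(k, row)
```

Because $\pi$ increases with both p and k, the expected number of tests decreases in both, and it can never fall below X, since at least X groups must be tested to observe X positives.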

Figure 2. Plots of T versus p for k = 5, 20, 50, 100.

3.2. Adaptive Estimator and Its Properties

The maximum likelihood estimates and their properties, including the variance, bias, and mean squared error, for the adaptive group testing model are presented in Table 1. The results are organized by group size and true probability, with the predetermined number of desired positive groups set at X = 30, as in [15].

Table 1. Adaptive estimator with its properties for k = 5, 10, 20, 50, 100 when X = 30.

A scrutiny of Table 1 shows that the estimated probabilities tend to increase as the true probability increases, although the increase is generally small. It is important to note that the MLE values of the adaptive model exhibit monotonic behavior, because the model dynamically adjusts the group size based on the Stage One outcome, which leads to more consistent and accurate estimates. The MLE generally increases as p increases for all values of k, as found by [16]. The variance of the estimated probabilities remains relatively small across different values of p. The bias is negative, indicating a slight underestimation of the true population probability, but it remains relatively small and consistent across all values of p, which indicates the effectiveness of the adaptive group testing model in reducing bias. The MSE, which is the sum of the variance and the squared bias, provides an overall measure of estimation accuracy.
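The three properties discussed here can be computed from replicate estimates as in the sketch below. It is illustrative only: the function name and the three example estimates are hypothetical and are not values from Table 1. The variance uses the divisor n, which is an assumption made so that the identity MSE = variance + bias² holds exactly.

```python
def summarize_estimates(estimates, p_true):
    """Return (bias, variance, mse) of replicate estimates of p_true."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - p_true                                   # negative: underestimation
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - p_true) ** 2 for e in estimates) / n
    return bias, variance, mse


if __name__ == "__main__":
    # Hypothetical replicate estimates for a true prevalence of 0.05.
    bias, variance, mse = summarize_estimates([0.046, 0.049, 0.052], p_true=0.05)
    print(bias, variance, mse)
```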

3.3. Relationship between $\hat{p}$ and p

The results presented in this section examine the relationship between the adaptive maximum likelihood estimates and the true probability for different values of group size. The results are represented in four graphs for k = 5, 20, 50, 100. These findings highlight how the adaptive model adjusts its estimates based on observed outcomes, and how the performance of the adaptive approach varies across group sizes.

The relationship was further investigated by plotting the values of $\hat{p}$ against p while varying the waiting parameter X for different values of group size k.

Figure 3 illustrates the relationship between the Maximum Likelihood Estimation values and the true probability for different combinations of X in the adaptive approach. The MLE values generally increase as p increases, as well as with larger values of X and k. However, the rate and pattern of this increase vary depending on the specific combination of X and k. For X = 30, the MLE graph shows a relatively steep curve, indicating that small changes in p lead to noticeable changes in the estimated probabilities of success. For X = 20, the MLE values also increase as p increases, but the curve is relatively flat, suggesting a less pronounced change in the MLE values as p varies. The adaptive MLE in this scenario exhibits a slower rate of increase compared to when X = 30, implying a less sensitive response to changes in p.

Figure 3. Plots for Adaptive $\hat{p}$ versus p for k = 5, 20, 50, 100 and X = 10, 20, 30.

Similar patterns are observed for the other combinations of X and k. The MLE values increase with increasing p, but the specific patterns and sensitivities depend on the combination of X and k. The adaptive approach consistently provides more conservative estimates with lower MLE values, indicating a cautious approach in estimating the prevalence of the rare trait.

3.4. Relationship between Variance of $\hat{p}$ and p

We examined the relationship between the variance of the estimated proportion $\hat{p}$ and the true proportion p in the context of the two-stage adaptive negative binomial model. Figure 4 provides insights into this relationship for different combinations of X and k.

Figure 4 presents the interplay between X, k, and the variance of $\hat{p}$ in the two-stage adaptive negative binomial model. It highlights the impact of the true proportion p, the desired number of positive groups X, and the group size k on the variability of the estimates. As the true proportion p increases, the variance of the estimated proportion $\hat{p}$ also tends to increase, although the magnitude of the increase varies across different X and k values. This suggests that higher probabilities of success lead to greater variability in the estimates, highlighting the increased uncertainty associated with higher p values. Comparing different X values, we find that as X increases, the variance of $\hat{p}$ generally tends to increase. This implies that aiming for a higher number of positive groups introduces more variability into the estimates. Analyzing the effect of k, we notice that for a fixed X value, as k increases, the variance of $\hat{p}$ tends to decrease. This indicates that larger group sizes result in more precise estimates and lower variability. Larger group sizes provide more information, reducing the sampling error and enhancing the precision of the estimates. The insights obtained from this analysis can inform the selection of appropriate values for X and k to optimize the accuracy and reliability of the model in estimating the proportion of successes in a population.

Figure 4. Variance of $\hat{p}$ for k = 5, 10, 20, 50 and X = 20, 30.

4. Comparison of the Model

We conducted a model comparison to evaluate the performance of the two-stage adaptive negative binomial group testing model for estimating the prevalence of a rare trait over the non-adaptive model. In this study, we used two statistical measures, Asymptotic Relative Efficiency (ARE) and the Relative Mean Squared Error (RMSE), to compare the efficiency and accuracy of the proposed two-stage adaptive negative binomial group testing model with an existing non-adaptive model by [6].

The estimator of the non-adaptive model is denoted by $\hat{p}_N$, since it is developed under the usual Negative Binomial model, while the proposed estimator is denoted by $\hat{p}_A$, since it is developed under the adaptive Negative Binomial model. The ARE was then obtained as

$$\text{ARE} = \frac{\operatorname{Var}(\hat{p}_N)}{\operatorname{Var}(\hat{p}_A)} \tag{21}$$

ARE values of greater than one implied that our model is more efficient than the non-adaptive model. ARE measures how much more efficient the adaptive model is compared to the non-adaptive model as the sample size approaches infinity. Higher ARE values indicate that the adaptive model provides better estimates and inferences. The comparison was done for different combinations of group size k and the desired number of positive groups X.
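A minimal sketch of how the ARE ratio could be evaluated is given below. It assumes, for illustration, that both variances are taken as the reciprocal of the summed Fisher information terms of the form appearing in Equation (19), with the non-adaptive design represented as a single stage. The function names, the `(x, k)` tuple layout, and the example designs are hypothetical and do not reproduce the configurations behind Figure 5.

```python
def fisher_information(p, stages):
    """Sum of per-stage information terms; stages = [(x, k), ...]."""
    info = 0.0
    for x, k in stages:
        pi = 1.0 - (1.0 - p) ** k                  # P(group tests positive)
        info += x * k**2 * (1.0 - p) ** (2 * k - 2) / (pi**2 * (1.0 - pi))
    return info


def asymptotic_relative_efficiency(p, nonadaptive, adaptive):
    """ARE = Var(p_N) / Var(p_A), with each variance taken as 1 / information."""
    var_n = 1.0 / fisher_information(p, nonadaptive)
    var_a = 1.0 / fisher_information(p, adaptive)
    return var_n / var_a


if __name__ == "__main__":
    # Hypothetical designs: one stage of 30 positives at k = 50, versus
    # two stages of 15 positives each at k = 50 and k = 20.
    print(asymptotic_relative_efficiency(0.01, [(30, 50)], [(15, 50), (15, 20)]))
```

When the two designs are identical in group size and total number of positive groups, the ratio equals one, so any departure from one reflects the change in group size between stages.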

The study found that as X increased, there was an overall increasing trend in the ARE values, indicating that the efficiency gain of the adaptive model over the non-adaptive model grows with the number of positive groups required (Figure 5). On the other hand, increasing k for a fixed X value led to a decreasing trend in the ARE values, implying that the relative advantage of the adaptive model narrows as the group size grows.

The RMSE was used to compare the mean squared errors of the estimators obtained from the constructed adaptive model with the one by [6]. This is a convenient way of comparing the MSE of the estimates obtained using different procedures. A good model is expected to produce an estimator with a small MSE. For this study, RMSE was computed as

$$\text{RMSE} = \frac{\operatorname{MSE}(\hat{p}_N)}{\operatorname{MSE}(\hat{p}_A)} \tag{22}$$

The study found that the proposed estimator was more efficient than that of [10], since the values of RMSE were greater than one (Figure 6). It is also worth noting that as the true probability of success p increases, the RMSE decreases, indicating that the improvement in estimation accuracy is most pronounced at low prevalence. In addition, aiming for a greater number of positive groups and increasing the group size also contributed to higher RMSE values, suggesting more accurate estimation and improved model accuracy, especially when the prevalence is low.

Figure 5. Plots of ARE versus p for k = 5, 10, 20, 50, 100 and X = 20, 30.

Figure 6. Plots of RMSE versus p for k = 20, 50 and X = 20, 30.

5. Conclusion and Recommendations

In conclusion, this study achieved its objectives by developing and analyzing a two-stage adaptive negative binomial model in group testing for estimating the prevalence of a rare trait using MLE. The adaptive estimator demonstrated superior performance compared to the non-adaptive estimator, providing more accurate and smaller estimates while maintaining low variance and bias. The simulations further confirmed the superiority of the adaptive model, showing better efficiency and a lower mean squared error (MSE). The study recommends that future research incorporate imperfect tests in the model to reflect real-world scenarios and evaluate their impact on estimation accuracy. In addition, extending the adaptive estimator to multi-stage group testing procedures could enhance the model’s applicability to larger populations and improve logistical considerations for estimation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Dorfman, R. (1943) The Detection of Defective Members of Large Populations. The Annals of Mathematical Statistics, 14, 436-440. https://doi.org/10.1214/aoms/1177731363
[2] Sobel, M. and Elashoff, R.M. (1975) Group Testing with a New Goal, Estimation. Biometrika, 62, 181-193. https://doi.org/10.1093/biomet/62.1.181
[3] Tamba, C.L. (2012) Computational Statistical Model for Group Testing with Retesting. Ph.D. Thesis, Egerton University, Kenya.
[4] Wanyonyi, R.W. (2015) Estimation of Proportion of a Trait by Batch Testing Model in a Quality Control Process. American Journal of Theoretical and Applied Statistics, 4, 610-613. https://doi.org/10.11648/j.ajtas.20150406.34
[5] Bilder, C.R., Tebbs, J.M. and Chen, P. (2010) Informative Retesting. Journal of the American Statistical Association, 105, 942-955. https://doi.org/10.1198/jasa.2010.ap09231
[6] Thompson, K.H. (1962) Estimation of the Proportion of Vectors in a Natural Population of Insects. Biometrics, 18, 568-576. https://doi.org/10.2307/2527902
[7] Nyongesa, L.K. (2004) Testing for the Presence of Disease by Pooling Samples. New Zealand Journal of Statistics, 46, 383-390. https://doi.org/10.1111/j.1467-842X.2004.00337.x
[8] Chiang, C.L. and Reeves, W.C. (1962) Statistical Estimation of Virus Infection Rates in Mosquito Vector Populations. American Journal of Epidemiology, 75, 377-391. https://doi.org/10.1093/oxfordjournals.aje.a120259
[9] Hughes-Oliver, J.M. and Swallow, W.H. (1994) A Two-Stage Adaptive Group-Testing Procedure for Estimating Small Proportions. Journal of the American Statistical Association, 89, 982-993. https://doi.org/10.1080/01621459.1994.10476832
[10] Katholi, C.R. and Unnasch, T.R. (2006) Important Experimental Parameters for Determining Infection Rates in Arthropod Vectors Using Pool Screening Approaches. The American Journal of Tropical Medicine and Hygiene, 74, 779-785. https://doi.org/10.4269/ajtmh.2006.74.779
[11] Pritchard, N.A. and Tebbs, J.M. (2010) Estimating Disease Prevalence Using Inverse Binomial Pooled Testing. Journal of Agricultural, Biological, and Environmental Statistics, 16, 70-87. https://doi.org/10.1007/s13253-010-0036-4
[12] Pritchard, N.A. and Tebbs, J.M. (2011) Bayesian Inference for Disease Prevalence Using Negative Binomial Group Testing. Biometrical Journal, 53, 40-56. https://doi.org/10.1002/bimj.201000148
[13] Okoth, A.W., Nyongesa, L.K. and Kwatch, B.O. (2017) Multi-Stage Adaptive Pool Testing Model with Test Errors; Improved Efficiency. IOSR Journal of Mathematics, 13, 43-55. https://doi.org/10.9790/5728-1301024355
[14] Turechek, W.W. and Madden, L.V. (2003) A Generalized Linear Modeling Approach for Characterizing Disease Incidence in a Spatial Hierarchy. Phytopathology, 93, 458-466. https://doi.org/10.1094/PHYTO.2003.93.4.458
[15] Xiong, W. (2015) The Optimal Group Size Using Inverse Binomial Group Testing Considering Misclassification. Communications in Statistics—Theory and Methods, 45, 4600-4610. https://doi.org/10.1080/03610926.2014.923461
[16] Swallow, W.H. (1985) Group Testing for Estimating Infection Rates and Probabilities of Disease Transmission. Phytopathology, 75, 882. https://doi.org/10.1094/Phyto-75-882

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.