Sample Size Affects Ethnobotanical Index Values: Bootstrap as a Remedial Approach
1. Introduction
The concept of quantitative ethnobotany is relatively new and was introduced in 1987 by Prance and coworkers [1]. Quantitative ethnobotany may be defined as “the application of quantitative techniques to the direct analysis of contemporary plant use data” [2] [3]. Quantification and associated hypothesis testing help to generate quality information, which in turn contributes substantially to resource conservation and development [4]. Quantitative ethnobotany is very useful for making decisions about the sustainable use of plant resources [5] [6]. Recent studies have increasingly used ecological indices to analyze ethnobotanical aspects of plants. The use of ecological indices in ethnobotany dates back to 1980 [7] with the paper of Phillips and Gentry on cultural value indices [2] [3]. Since the introduction of these indices, other studies have proposed various ethnobotanical indices, often for the same purpose, leaving users with an overabundance of choices and some confusion [8] [9] [10] [11] [12].
Despite their widespread use, these ethnobotanical indices may be misused. Like ecological indices, ethnobotanical indices are statistics calculated from sample data drawn from a population and used to draw conclusions about that population [13]. In most cases, scientists determine the sample size using an accepted sampling technique. However, the values of the calculated indices can depend on the sample size, in which case conclusions about the population can be biased. This is a major problem in quantitative ethnobotany that calls for solutions.
To our knowledge, very few studies have addressed this research gap in quantitative ethnobotany. Begossi evaluated the effect of sampling effort on the diversity of plants used by local people [14]. This author examined how many informants are needed to analyze the diversity of plants used in a population, and found that plant diversity varies with the sample size considered.
The present study proposes bootstrapping as an approach to compute stable indices in quantitative ethnobotany. Bootstrapping, first introduced by Efron and Tibshirani, is an accepted and widely used resampling technique that can generate samples of any desired size from the original sample [15]. The method’s general concept is to resample the original sample taken from the population a large number of times, with statistical inference based on the results of the resamples thus obtained. The data used for the simulation of the ethnobotanical indices in the present study relate to the uses of Jatropha curcas L. by the local people of Benin (West Africa). According to Maes et al., J. curcas is the most promising species as a sustainable option for biofuels [16]. It is a shrub of the family Euphorbiaceae, native to Central America. In Benin, J. curcas is found throughout the country [17]. It produces seeds containing up to 35% oil that is easily converted into biodiesel [18]. This oil has been found to be useful for medicinal and veterinary purposes, as an insecticide, and for soap production [19]. The other organs of the species are used in the pharmacopoeia because of their many therapeutic properties [17]. This study aimed at: i) evaluating the effect of sampling effort on index values in quantitative ethnobotany, and ii) using the most stable indices to analyze the sociolinguistic differences in use value of J. curcas.
2. Material and Methods
2.1. Material
2.1.1. Theory of Bootstrap
The general principle of the bootstrap method is to resample, a large number of times, a sample initially taken from a population [20]. Parameter estimation and statistical inference are then based on the results of the bootstrapped samples obtained.
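The best-known application of the bootstrap is the estimation of the mean, $\mu$ say, of a population with distribution function $F$, from data drawn by sampling randomly from that population. The mean is:
$$\mu = \int x \, dF(x) \tag{1}$$
The sample mean is the same functional of the empirical distribution function, i.e. of
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \tag{2}$$
where $X_1, \ldots, X_n$ denote the data and $I(\cdot)$ is the indicator function. Therefore the bootstrap estimator of the population mean, $\mu$, is the sample mean, $\bar{X}$:
$$\bar{X} = \int x \, d\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{3}$$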
Likewise, the bootstrap estimator of a population variance is the corresponding sample variance; the bootstrap estimator of a population correlation coefficient is the corresponding empirical correlation coefficient; and so on.
More generally, if $\theta_0 = \theta(F)$ denotes the true value of a parameter, where $\theta$ is a functional, then
$$\hat{\theta} = \theta(\hat{F}) \tag{4}$$
is the bootstrap estimator of $\theta_0$.
The bootstrap has three variants: parametric (simulation), semi-parametric (adding noise), and nonparametric (resampling). These variants differ in the way the sample is drawn from the population data. In contrast to the nonparametric and semi-parametric bootstraps, the parametric bootstrap assumes that the data come from a known distribution with unknown parameters (for example, a Poisson or negative binomial distribution for counts, or a normal distribution for continuous characters). Three resampling schemes are used to draw bootstrap samples: the ordinary, balanced, and moving block bootstraps. In the ordinary bootstrap, the bootstrap samples are drawn by simple random sampling, with replacement, from the values in the approximated population. The number of bootstrap samples needed for accurate estimation of standard errors ranges from 50 to 1000, while 1000 or more bootstrap samples are recommended for calculating confidence intervals [15]. In this study, 1000 bootstrap samples were considered sufficient to obtain the desired results.
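The contrast between the nonparametric (resampling) and parametric (simulation) variants can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function names, the small data vector, and the choice of a normal model for the parametric case are all assumptions made for the example.

```python
import random
import statistics


def nonparametric_bootstrap_se(data, n_boot=1000, seed=0):
    """Standard error of the mean from the ordinary (resampling) bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_boot):
        # Simple random sample of size n, with replacement, from the observed data
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


def parametric_bootstrap_se(data, n_boot=1000, seed=0):
    """Same quantity, but each resample is simulated from a fitted normal model."""
    rng = random.Random(seed)
    n = len(data)
    mu_hat = statistics.fmean(data)     # estimated mean of the assumed normal model
    sigma_hat = statistics.stdev(data)  # estimated standard deviation of that model
    means = []
    for _ in range(n_boot):
        simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        means.append(statistics.fmean(simulated))
    return statistics.stdev(means)


# Hypothetical continuous measurements (not data from the study)
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 5.6]
se_np = nonparametric_bootstrap_se(sample)
se_p = parametric_bootstrap_se(sample)
print(f"nonparametric SE: {se_np:.3f}, parametric SE: {se_p:.3f}")
```

Both estimates should be close to the classical standard error of the mean for data like these; the two variants tend to diverge when the assumed distribution fits the data poorly.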
2.1.2. Framework of the Bootstrap
Many statistical problems can be represented as follows: given a functional $f_t$ from a class $\{f_t : t \in T\}$, we wish to determine the value of a parameter $t$ that solves the population equation,
$$E\{f_t(F, \hat{F}) \mid F\} = 0 \tag{5}$$
where $F$ denotes the population distribution function and $\hat{F}$ is the empirical distribution function, computed from the sample $X_1, \ldots, X_n$.
The basic steps in the bootstrap procedure are [15]:
· Step 1. Construct an empirical probability distribution, $\hat{F}$, from the sample by placing a probability of 1/n at each observed value $X_1, \ldots, X_n$ of the sample. This is the empirical distribution function of the sample, which is the nonparametric maximum likelihood estimate of the population distribution, $F$.
· Step 2. From the empirical distribution function, $\hat{F}$, draw a random sample of size n with replacement. This is a resample.
· Step 3. Calculate the statistic of interest, $\hat{\theta}$, for this resample, yielding $\hat{\theta}^*$.
· Step 4. Repeat steps 2 and 3 B times, where B is a large number, in order to create B resamples. The practical size of B depends on the tests to be run on the data. Typically, B is at least equal to 1000 when an estimate of a confidence interval around $\hat{\theta}$ is required, as in this study.
· Step 5. Construct the relative frequency histogram from the B values of $\hat{\theta}^*$ by placing a probability of 1/B at each point, $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B$. The distribution obtained is the bootstrapped estimate of the sampling distribution of $\hat{\theta}$. This distribution can now be used to make inferences about the parameter $\theta_0$, which is to be estimated by $\hat{\theta}$.
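The five steps above can be sketched in a few lines of Python. The function names and the example data are hypothetical, and the percentile interval shown is only one common way to use the bootstrap distribution for inference; the source does not state which interval method was applied.

```python
import random
import statistics


def bootstrap_distribution(sample, statistic, n_boot=1000, seed=42):
    """Return the sorted bootstrap replicates of `statistic` (Steps 1 to 5)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_boot):                     # Step 4: repeat B times
        # Steps 1-2: drawing uniformly with replacement gives each observed
        # value probability 1/n, i.e. sampling from the empirical distribution
        resample = rng.choices(sample, k=n)
        replicates.append(statistic(resample))  # Step 3: statistic on the resample
    # Step 5: the B replicates, each carrying weight 1/B, approximate the
    # sampling distribution of the statistic
    return sorted(replicates)


def percentile_interval(replicates, level=0.95):
    """Percentile confidence interval read off the sorted replicates."""
    b = len(replicates)
    lower = replicates[int((1 - level) / 2 * b)]
    upper = replicates[int((1 + level) / 2 * b) - 1]
    return lower, upper


# Hypothetical data: number of use forms cited by each of 11 respondents
uses_cited = [2, 3, 1, 4, 2, 2, 5, 3, 1, 2, 3]
reps = bootstrap_distribution(uses_cited, statistics.fmean)
ci_low, ci_high = percentile_interval(reps)
print(f"mean = {statistics.fmean(uses_cited):.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Any statistic that takes a list of observations can be passed as `statistic`, which is what makes the same loop reusable for ethnobotanical indices.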
2.1.3. Ethnobotanical Indices
According to Phillips, quantitative ethnobotany is the application of quantitative techniques to analyze plant use data [21]. These methods can be used to answer the question: how important are particular species, or plants in general, to people? Quantitative ethnobotany provides an answer in terms of the Relative Cultural Importance (RCI) of plants and vegetation to people [22]. The technique most used by ethnobotanists to assess the RCI of plants is the calculation of an ethnobotanical index. Depending on how they assess RCI, ethnobotanical indices can be grouped into three categories [3]: informant consensus, subjective allocation, and uses totaled. The informant consensus approach is the most used in quantitative ethnobotany [23] [24]. The indices considered in this study are selected from the informant consensus approach (Table 1). They are defined to determine the number of interviewees who use a given species and the distribution of the use among the interviewees (ID), to measure the degree of homogeneity of the interviewees’ knowledge (IE), and to measure the importance of the use categories of the species and their contribution to the total use value (UE).
2.1.4. Case Study
In total, 44 localities were surveyed, four in each Department of Benin except the Department of Littoral, where agricultural production systems are almost non-existent. The sample size (704) for this study was calculated using the standard approximation of the binomial distribution [26], with a margin of error of 3% and a proportion of individuals who had previously used the organs of J. curcas of 79.17% (obtained from an exploratory study). Ethnobotanical surveys were therefore conducted among 704 individual respondents. The respondents were randomly selected from each locality, and the data collected from each respondent concerned the use forms of J. curcas.
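The sample size reported above can be checked with a short calculation. The sketch assumes the usual normal approximation formula n = z² p (1 − p) / d² with z = 1.96 (a 95% confidence level); the source gives the proportion and the margin of error but does not state the value of z, so that value is an assumption.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Sample size from the normal approximation of the binomial distribution.

    p      : expected proportion (0.7917 from the exploratory study)
    margin : margin of error (0.03 in the study)
    z      : normal quantile; 1.96 is an assumed 95% confidence level
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_required = sample_size_for_proportion(p=0.7917, margin=0.03)
print(n_required)  # 704
```

Under these assumptions the formula gives about 703.9, which rounds up to the 704 respondents surveyed.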
2.2. Methods
2.2.1. Simulation Approach
The sampling distribution of each ethnobotanical index (Table 1) was studied by simulation using MATLAB (version 2011). The initial sample was a binary matrix of 704 rows representing the respondents and 10 columns representing the different use forms. Knowledge of a given use form was coded 1, and 0 otherwise. The algorithm for characterizing the distribution of the ethnobotanical indices considered is as follows:
Step 1. From the initial sample x of 704 interviewees, one bootstrapped sample $x^*$, of size n, was obtained by simple random sampling with replacement;
Step 2. The four bootstrapped ethnobotanical indices ID*, IE*, UD* and UE* were computed;
Step 3. Steps 1 and 2 were repeated 1000 times and the average values of the four ethnobotanical indices were computed;
Step 4. Steps 1 to 3 were repeated for 200 different values of sample size n between 50 and 10,000 in increments of 50;
Step 5. Position (mean, median, 1st quartile, 3rd quartile, minimum, maximum) and dispersion (standard deviation) parameters were computed for the 200 values of each ethnobotanical index to characterize their empirical distribution; frequency histograms of index values were also constructed;
Step 6. For each index, a trend line was constructed by considering the 200 index values as a function of the sample sizes n considered.
Table 1. Ethnobotanical indices computed.
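The simulation steps above can be sketched in Python as follows. Because the index formulas of Table 1 are not reproduced in this text, the sketch uses a Shannon-type use diversity and a Pielou-type use equitability as stand-ins; these definitions, the function names, and the synthetic respondent matrix are assumptions for illustration and are not the authors' MATLAB code or data. The demonstration also uses a much smaller grid than the study (which used 1000 replicates for each of 200 sample sizes) so that it runs quickly in pure Python.

```python
import math
import random
import statistics


def use_diversity(matrix):
    """Shannon-type diversity of citations across use forms (assumed definition)."""
    totals = [sum(col) for col in zip(*matrix)]  # citations per use form
    grand = sum(totals)
    if grand == 0:
        return 0.0
    return -sum((t / grand) * math.log(t / grand) for t in totals if t > 0)


def use_equitability(matrix):
    """Pielou-type evenness: diversity divided by its maximum (assumed definition)."""
    n_uses = len(matrix[0])
    if n_uses < 2:
        return 0.0
    return use_diversity(matrix) / math.log(n_uses)


def mean_bootstrapped_index(matrix, index_fn, n, n_boot, rng):
    """Steps 1 to 3: average of the index over n_boot resamples of size n."""
    values = []
    for _ in range(n_boot):
        resample = rng.choices(matrix, k=n)  # rows drawn with replacement
        values.append(index_fn(resample))
    return statistics.fmean(values)


def index_trend(matrix, index_fn, sizes, n_boot=1000, seed=1):
    """Step 4: repeat Steps 1 to 3 for every sample size in `sizes`."""
    rng = random.Random(seed)
    return {n: mean_bootstrapped_index(matrix, index_fn, n, n_boot, rng) for n in sizes}


# Synthetic stand-in for the 704 x 10 binary respondent-by-use matrix
gen = random.Random(0)
cite_prob = [0.8, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02, 0.02]
respondents = [[1 if gen.random() < p else 0 for p in cite_prob] for _ in range(704)]

# Reduced grid for a quick demonstration
trend = index_trend(respondents, use_diversity, sizes=range(50, 201, 50), n_boot=50)
for n, value in trend.items():
    print(n, round(value, 3))
```

The dictionary returned by `index_trend` holds one averaged index value per sample size, which is the input needed for the summary statistics of Step 5 and the trend line of Step 6.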
2.2.2. Application to Assess the Ethnic Differences in Use Value of J. curcas
The 704 respondents were classified by sociolinguistic group (Bariba, Dendi, Fon, Haoussa, Lokpa, Nago, Otamari, Peulh, Wama), age group, and gender. Young people were those under 30 years, adults were those between 30 and 60 years, and old people were those over 60 years. For each category related to age, gender or sociolinguistic group, the sample was resampled 1000 times with the same size of 100 to eliminate the effect of sample size on the index values. The different categories were compared using permutation analysis of variance [27].
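A permutation analysis of variance of the kind referred to above can be sketched as a one-way F-test whose null distribution is obtained by shuffling group labels. This is a generic illustration under that assumption; the exact procedure of reference [27] is not described in the text, and the function names and the simulated index values are hypothetical.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)
    n_total = len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = statistics.fmean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_anova(groups, n_perm=999, seed=7):
    """Return the observed F and a permutation P value."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign values to groups at random
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            exceed += 1
    # The observed arrangement counts as one permutation
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical bootstrapped index values for three groups (not study data)
gen = random.Random(3)
group_values = [[gen.gauss(mu, 0.05) for _ in range(30)] for mu in (0.5, 0.7, 0.9)]
f_obs, p_value = permutation_anova(group_values)
print(f"F = {f_obs:.1f}, permutation P value = {p_value:.3f}")
```

Because the P value comes from the permuted data themselves, the test does not rely on the normality assumption of the classical F-test, which is one reason to prefer it for index values.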
3. Results and Discussion
3.1. Simulation Results
3.1.1. Effect of Variation in Sample Size on the Ethnobotanical Index Values
The local people interviewed listed a total of ten uses for J. curcas, with medicinal and bioenergy uses predominating (Table 2). The characteristics of the sampling distribution for the ethnobotanical indices are presented in Table 3. Figure 1 presents the simulated population values of the four ethnobotanical indices considered. This figure shows that the use diversity value (UD) (Figure 1(c)) and the use equitability value (UE) (Figure 1(d)) varied very little as a function of the number of respondents, with average values of 1.82 and 0.55, respectively. The interviewee diversity (ID) (Figure 1(a)) and equitability (IE) (Figure 1(b)) values, on the other hand, varied significantly depending on the number of respondents considered. The interviewee diversity (ID) (Figure 1(a)) values ranged from 5 to 13 on a scale of 1 to 13. The largest values of these indices are also those with the highest frequencies in the simulated population (Figure 1). ID (Figure 1(a)) was more sensitive to the sample size used, so that only cases with the same number of respondents can be compared, which partly supports the findings of Reyes-García et al. [11]. UD (Figure 1(c)) and UE (Figure 1(d)), on the other hand, are less affected by sample size than ID (Figure 1(a)). These findings are consistent with previous findings in species ecology. In fact, Pielou’s evenness, which measures the homogeneity of the distribution of species in a population, does not depend on the number of species [28]. Furthermore, UD (Figure 1(c)) was found to be more sensitive to the number of uses considered than to the number of respondents, which explains its relatively low variance as the number of respondents increases.
Table 2. Ethnobotanical uses of J. curcas.
*Total number of citations for each use divided by the total number of citations for all uses.
Table 3. Empirical parameters of the distribution of ethnobotanical indices related to J. curcas.
Figure 1. Simulated population values of the four ethnobotanical indices considered.
3.1.2. Determination of the Minimal Sample Size for the Stability of the Indices
Figure 2 presents the trend of the four ethnobotanical indices considered as a function of sample size. The value of the interviewee diversity (ID) (Figure 2(a)) increases steadily with the sample size, while the other three indices reach a plateau beyond characteristic values. For interviewee equitability (IE) (Figure 2(c)), the maximum value (0.99) was noted around 1000 interviewees (greater than the study sample size), while for the use diversity (UD) (Figure 2(b)), the maximum value of 1.82 was obtained for a sample size of 750 (slightly higher than the study sample size of 704). The same trend was noted for the use equitability (UE) (Figure 2(d)).
3.2. Application Results: Ethnic Differences in Use Value of J. curcas
The permutation analysis of variance showed significant differences between age-sex categories for all indices (P value < 0.05; Table 4). Adults (men and women) differed markedly in knowledge from the other age-sex groups (Table 4). The knowledge of adults on J. curcas was more homogeneous than that of the other age-sex groups (IE = 0.98; see Table 1 for interpretation of the indices). Considering the UE index, knowledge about the use categories of J. curcas was more homogeneous among old people, in particular old women (UE = 0.716).
Figure 2. Evolutionary trend of ethnobotanical indices as a function of sample size.
There were significant differences between socio-cultural groups for all indices (P value < 0.05; Table 5). We noted a high diversity of knowledge for the Fon compared to other socio-cultural groups. Moreover, the knowledge of the Nago on J. curcas was more homogeneous than that of the other socio-cultural groups (IE = 0.981).
Table 4. Mean (m), standard deviation (s) and probability values of ethnobotanical indices as a function of age-sex category.
ID = interviewee diversity index; IE = interviewee equitability index; UD = use diversity index; UE = use equitability index. AW: adult women; AM: adult men; YM: young men; YW: young women; OW: old women; OM: old men.
Table 5. Mean (m), standard deviation (s) and probability values of ethnobotanical indices as a function of socio-cultural group.
ID = interviewee diversity index; IE = interviewee equitability index; UD = use diversity index; UE = use equitability index.
Regarding the use diversity and the use equitability, the highest values of the indices were obtained for the Wama socio-cultural group. People’s knowledge of the uses of J. curcas is very homogeneous (IE index close to 1), which can be explained by the fact that the uses of the plant are not very diversified and have been transmitted across several generations. As a result, the new generations share the same knowledge about the plant. Moreover, the present study revealed that the interviewee diversity index (ID) linked to J. curcas varies significantly according to the age-sex category of the interviewees, unlike the results obtained with Vitex doniana [29], and also varies between the sociolinguistic groups of the people surveyed, as for Parkia biglobosa [30] and Adansonia digitata [31].
4. Conclusion
This study shows that the bootstrap is a statistical tool that can help avoid the effect of sample size on the estimation of ethnobotanical indices. The larger the sample size, the more accurate and stable the estimation of ethnobotanical indices. We recommend that scientists consider the results of this study to better appreciate the cultural importance of plant species.
Acknowledgements
The authors would like to thank Mr Gai Alier John Makuei for proofreading this manuscript. They also appreciate the efforts of the journal’s amiable editorial team and its paper referee and presenter.