Crowdsourced Sampling of a Composite Random Variable: Analysis, Simulation, and Experimental Test
ABSTRACT
A composite random variable is a product (or sum of products) of statistically
distributed quantities. Such a variable can represent the solution to a multi-factor
quantitative problem submitted to a large, diverse, independent, anonymous
group of non-expert respondents (the “crowd”). The objective of this research
is to examine the statistical distribution of solutions from a large crowd to a
quantitative problem involving image analysis and object counting. Theoretical
analysis by the author, covering a range of conditions and types of factor
variables, predicts that composite random variables are distributed
log-normally to an excellent approximation. If the factors in a problem are
themselves distributed log-normally, then their product is rigorously
log-normal. A crowdsourcing experiment devised by the author and implemented
with the assistance of a BBC (British Broadcasting Corporation) television
show yielded a sample of approximately 2000 responses consistent with a
log-normal distribution. The sample mean was within ~12% of the true count.
By comparison, a Monte Carlo simulation (MCS) of the experiment, employing either
normal or log-normal random variables as factors to model the processes by
which a crowd of 1 million might arrive at their estimates, resulted in a
visually perfect log-normal distribution with a mean response within ~5% of the
true count. The results of this research suggest that a well-modeled MCS, by
simulating a sample of responses from a large, rational, and incentivized
crowd, can provide a more accurate solution to a quantitative problem than
might be attainable by directly sampling either a smaller crowd or an
uninformed crowd of any size that guesses randomly.
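
The claim that a product of log-normal factors is itself log-normal can be checked numerically with a small Monte Carlo sketch. The code below is only an illustration and is not the author's simulation: the three factors, their parameters, the sample size, and the names `simulate_products` and `log_moments` are all hypothetical. It relies on the standard result that, for independent factors, the logarithm of the product is a sum of normal variables, so its mean is the sum of the factor means and its variance is the sum of the factor variances.

```python
import math
import random
import statistics


def simulate_products(n_samples, factor_params, seed=0):
    """Return n_samples products of independent log-normal factors.

    factor_params is a list of (mu, sigma) pairs, where mu and sigma are
    the mean and standard deviation of ln(factor).
    """
    rng = random.Random(seed)
    products = []
    for _ in range(n_samples):
        value = 1.0
        for mu, sigma in factor_params:
            value *= rng.lognormvariate(mu, sigma)
        products.append(value)
    return products


def log_moments(samples):
    """Return the mean and standard deviation of ln(sample)."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)


# Hypothetical factors (assumed for illustration only), given as
# (mean of ln(factor), standard deviation of ln(factor)).
FACTORS = [
    (math.log(40.0), 0.30),
    (math.log(25.0), 0.25),
    (math.log(3.0), 0.20),
]

samples = simulate_products(50_000, FACTORS)
mu_hat, sigma_hat = log_moments(samples)

# Theory: ln(product) is normal with summed means and summed variances.
mu_theory = sum(mu for mu, _ in FACTORS)
sigma_theory = math.sqrt(sum(sigma ** 2 for _, sigma in FACTORS))

print(f"mean of ln(product):  simulated {mu_hat:.4f}, theory {mu_theory:.4f}")
print(f"stdev of ln(product): simulated {sigma_hat:.4f}, theory {sigma_theory:.4f}")
```

Replacing `rng.lognormvariate` with a normal draw restricted to positive values would give a rough way to probe the approximate (rather than exact) log-normality predicted for other factor types, under whatever truncation rule one chooses to assume.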