TITLE:
Extraction of Information from Crowdsourcing: Experimental Test Employing Bayesian, Maximum Likelihood, and Maximum Entropy Methods
AUTHORS:
M. P. Silverman
KEYWORDS:
Crowdsourcing, Bayesian Priors, Maximum Likelihood, Principle of Maximum Entropy, Parameter Estimation, Log-Normal Distribution
JOURNAL NAME:
Open Journal of Statistics, Vol.9 No.5, October 24, 2019
ABSTRACT:
A crowdsourcing experiment in which viewers (the “crowd”) of a British Broadcasting Corporation (BBC) television show submitted estimates of the number of coins in a tumbler was shown in an antecedent paper (Part 1) to follow a log-normal distribution Λ(m, s²). The coin-estimation experiment is an archetype of a broad class of image analysis and object counting problems suitable for solution by crowdsourcing. The objective of the current paper (Part 2) is to determine the location and scale parameters (m, s) of Λ(m, s²) by both Bayesian and maximum likelihood (ML) methods and to compare the results. One outcome of the analysis is the resolution, by means of Jeffreys’ rule, of questions regarding the appropriate Bayesian prior. It is shown that Bayesian and ML analyses lead to the same expression for the location parameter, but different expressions for the scale parameter, which become identical in the limit of an infinite sample size. A second outcome of the analysis concerns use of the sample mean as the measure of information of the crowd in applications where the distribution of responses is not sought or known. In the coin-estimation experiment, the sample mean was found to differ widely from the mean number of coins calculated from Λ(m, s²). This discordance raises critical questions concerning whether, and under what conditions, the sample mean provides a reliable measure of the information of the crowd. This paper resolves that problem by use of the principle of maximum entropy (PME). The PME yields a set of equations for finding the most probable distribution consistent with given prior information and only that information. If there is no solution to the PME equations for a specified sample mean and sample variance, then the sample mean is an unreliable statistic, since no measure can be assigned to its uncertainty. Parts 1 and 2 together demonstrate that the information content of crowdsourcing resides in the distribution of responses (very often log-normal in form), which can be obtained empirically or by appropriate modeling.
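
As an illustration of the two quantities compared in the abstract, the following minimal Python sketch computes the standard maximum likelihood estimates of the location and scale parameters of a log-normal distribution and contrasts the raw sample mean with the mean implied by the fitted distribution, exp(m + s²/2). The function names, the synthetic data, and the parameter values 6.0 and 0.8 are assumptions chosen for illustration only; they are not the BBC data or the paper's results, and the Bayesian expressions derived in the paper are not reproduced here.

```python
import math
import random


def lognormal_ml(samples):
    """Standard ML estimates (m, s) for a log-normal sample.

    m is the mean of the log-values; s is the square root of their
    variance computed with divisor n (the ML form, not n - 1).
    """
    logs = [math.log(x) for x in samples]
    n = len(logs)
    m_hat = sum(logs) / n
    s2_hat = sum((y - m_hat) ** 2 for y in logs) / n
    return m_hat, math.sqrt(s2_hat)


def lognormal_mean(m, s):
    """Mean of a log-normal distribution with parameters (m, s)."""
    return math.exp(m + 0.5 * s * s)


# Hypothetical crowd estimates drawn from a log-normal distribution.
rng = random.Random(0)
estimates = [rng.lognormvariate(6.0, 0.8) for _ in range(2000)]

m_hat, s_hat = lognormal_ml(estimates)
sample_mean = sum(estimates) / len(estimates)
model_mean = lognormal_mean(m_hat, s_hat)

print(f"ML location m = {m_hat:.3f}, ML scale s = {s_hat:.3f}")
print(f"sample mean = {sample_mean:.1f}, log-normal mean = {model_mean:.1f}")
```

Because these synthetic responses are drawn from an exact log-normal distribution, the two means should land close together; the wide discordance described in the abstract was observed in the actual crowd responses and is not reproduced by this idealized data.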