Dirichlet Compound Multinomials Statistical Models ()
Affiliation(s)
ABSTRACT
This contribution deals with a generative approach for the analysis of textual data. Instead of creating heuristic rules forthe representation of documents and word counts, we employ a distribution able to model words along texts considering different topics. In this regard, following Minka proposal (2003), we implement a Dirichlet Compound Multinomial (DCM) distribution, then we propose an extension called sbDCM that takes explicitly into account the different latent topics that compound the document. We follow two alternative approaches: on one hand the topics can be unknown, thus to be estimated on the basis of the data, on the other hand topics are determined in advance on the basis of a predefined ontological schema. The two possible approaches are assessed on the basis of real data.
KEYWORDS
Share and Cite:
Copyright © 2024 by authors and Scientific Research Publishing Inc.
This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.