Dirichlet Compound Multinomials Statistical Models

PP. 2089-2097
DOI: 10.4236/am.2012.312A288

ABSTRACT

This contribution deals with a generative approach to the analysis of textual data. Instead of creating heuristic rules for the representation of documents and word counts, we employ a distribution able to model words across texts while accounting for different topics. In this regard, following Minka's proposal (2003), we implement a Dirichlet Compound Multinomial (DCM) distribution, and then propose an extension, called sbDCM, that explicitly takes into account the different latent topics that make up a document. We follow two alternative approaches: on the one hand, the topics can be unknown and thus estimated from the data; on the other hand, the topics are determined in advance on the basis of a predefined ontological schema. The two approaches are assessed on real data.
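
As an illustration of the DCM distribution mentioned above, the following is a minimal sketch of its log-probability for a single document, written in the standard Dirichlet-multinomial (Polya) form. It is not the authors' implementation and does not cover the sbDCM extension; the function name `dcm_log_pmf` and the argument names `counts` and `alpha` are assumptions chosen for this example.

```python
import math


def dcm_log_pmf(counts, alpha):
    """Log-probability of a word-count vector under a DCM (hypothetical helper).

    counts: non-negative integer counts, one per vocabulary word.
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha values must be positive")

    n = sum(counts)      # document length
    a_sum = sum(alpha)   # total concentration

    # Multinomial coefficient: n! / prod(x_w!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalising ratio: Gamma(sum alpha) / Gamma(n + sum alpha)
    log_norm = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_words = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_norm + log_words
```

With a two-word vocabulary, `alpha = [1.0, 1.0]` and `counts = [1, 1]`, the function returns log(1/3): a uniform Dirichlet prior makes each of the three possible splits of two words equally likely.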

Share and Cite:

P. Cerchiello and P. Giudici, "Dirichlet Compound Multinomials Statistical Models," Applied Mathematics, Vol. 3 No. 12A, 2012, pp. 2089-2097. doi: 10.4236/am.2012.312A288.

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.