Applied Mathematics

Volume 3, Issue 12 (December 2012)

ISSN Print: 2152-7385   ISSN Online: 2152-7393

Google-based Impact Factor: 0.58

Dirichlet Compound Multinomials Statistical Models

PP. 2089-2097
DOI: 10.4236/am.2012.312A288

ABSTRACT

This contribution deals with a generative approach to the analysis of textual data. Instead of creating heuristic rules for the representation of documents and word counts, we employ a distribution able to model words across texts while accounting for different topics. In this regard, following Minka's proposal (2003), we implement a Dirichlet Compound Multinomial (DCM) distribution; we then propose an extension, called sbDCM, that explicitly takes into account the different latent topics that compose a document. We follow two alternative approaches: on the one hand, the topics can be unknown and must therefore be estimated from the data; on the other hand, the topics are determined in advance on the basis of a predefined ontological schema. Both approaches are assessed on real data.
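The abstract does not reproduce the model's formulas, so the following is only a minimal sketch of the standard DCM (Dirichlet-multinomial) log-probability of a word-count vector, not the authors' implementation and not the sbDCM extension. The function name `dcm_log_pmf` and the parameter values used below are hypothetical, chosen for illustration.

```python
from math import lgamma, exp


def dcm_log_pmf(counts, alpha):
    """Log-probability of a word-count vector under a DCM with parameters alpha.

    Uses the standard Dirichlet-multinomial form:
    n! / prod(x_w!) * Gamma(A) / Gamma(n + A) * prod(Gamma(x_w + a_w) / Gamma(a_w)),
    where n = sum(counts) and A = sum(alpha).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a_total = sum(alpha)
    # Multinomial coefficient n! / prod(x_w!), computed on the log scale.
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalising ratio Gamma(A) / Gamma(n + A).
    log_norm = lgamma(a_total) - lgamma(n + a_total)
    # Per-word terms Gamma(x_w + a_w) / Gamma(a_w).
    log_terms = sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_terms


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with a flat prior (alpha = 1 for each word).
    print(exp(dcm_log_pmf([2, 1], [1.0, 1.0])))
```

With two words and both parameters equal to 1, the DCM reduces to a uniform distribution over the n + 1 possible count splits, so the example evaluates to approximately 0.25 for a document of three words. Working on the log scale with `lgamma` avoids overflow for long documents.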

Share and Cite:

Cerchiello, P. and Giudici, P. (2012) Dirichlet Compound Multinomials Statistical Models. Applied Mathematics, 3, 2089-2097. doi: 10.4236/am.2012.312A288.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.