[1]
|
Shannon, C.E. (1948) A Mathematical Theory of Communication. Bell Systems Technical Journal, 27, 379-423, 623-656.
|
[2]
|
Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.
|
[3]
|
Basu, A., Harris, I.R., Hjort, N.L. and Jones, M.C. (1998) Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika, 85, 549-559. http://dx.doi.org/10.1093/biomet/85.3.549
|
[4]
|
Beran, R. (1977) Minimum Hellinger Distance Estimates for Parametric Models. Annals of Statistics, 5, 445-463.
http://dx.doi.org/10.1214/aos/1176343842
|
[5]
|
Jaynes, E.T. (1957) Information Theory and Statistical Mechanics. Physical Review, 106, 620-630.
http://dx.doi.org/10.1103/PhysRev.106.620
|
[6]
|
Kullback, S. and Leibler, R.A. (1951) On Information and Sufficiency. Annals of Mathematical Statistics, 22, 79-86.
http://dx.doi.org/10.1214/aoms/1177729694
|
[7]
|
Callen, H.B. (1985) Thermodynamics and an Introduction to Thermostatistics. 2nd Edition, John Wiley & Sons, Hoboken, NJ.
|
[8]
|
Kittel, C. and Kroemer, H. (1980) Thermal Physics. W. H. Freeman, San Francisco, CA.
|
[9]
|
Isozaki, T., Kato, N. and Ueno, M. (2009) “Data Temperature” in Minimum Free Energies for Parameter Learning of Bayesian Networks. International Journal on Artificial Intelligence Tools, 18, 653-671.
http://dx.doi.org/10.1142/S0218213009000342
|
[10]
|
Hofmann, T. (1999) Probabilistic Latent Semantic Analysis. Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI-99), Stockholm, 30 July-1 August 1999, 289-296.
|
[11]
|
LeCun, Y. and Huang, F.J. (2005) Loss Functions for Discriminative Training of Energy-Based Models. Proceedings of International Workshop on Artificial Intelligence and Statistics (AISTATS-05), Barbados, 6-8 January 2005, 206-213.
|
[12]
|
Pereira, F., Tishby, N. and Lee, L. (1993) Distributional Clustering of English Words. In: Proceedings of Annual Meeting on Association for Computational Linguistics (ACL-93), Association for Computational Linguistics, Stroudsburg, 183-190. http://dx.doi.org/10.3115/981574.981598
|
[13]
|
Ueda, N. and Nakano, R. (1995) Deterministic Annealing Variant of the EM Algorithm. Proceedings of Advances in Neural Information Processing Systems 7 (NIPS 7), Denver, 29 November-1 December 1994, 545-552.
|
[14]
|
Watanabe, K., Shiga, M. and Watanabe, S. (2009) Upper Bound for Variational Free Energy of Bayesian Networks. Machine Learning, 75, 199-215. http://dx.doi.org/10.1007/s10994-008-5099-x
|
[15]
|
Jones, M.C., Hjort, N.L., Harris, I.R. and Basu, A. (2001) A Comparison of Related Density-Based Minimum Divergence Estimators. Biometrika, 88, 865-873. http://dx.doi.org/10.1093/biomet/88.3.865
|
[16]
|
Windham, M.P. (1995) Robustifying Model Fitting. Journal of the Royal Statistical Society B, 57, 599-609.
|
[17]
|
Pöschel, T., Ebeling, W., Frömmel, C. and Ramírez, R. (2003) Correction Algorithm for Finite Sample Statistics. The European Physical Journal E, 12, 531-541. http://dx.doi.org/10.1140/epje/e2004-00025-4
|