Modeling the Browsing Behavior of World Wide Web Users


The World Wide Web is now essential to the general public. From a data analysis viewpoint, it provides rich opportunities to gather observational data on a large scale. This paper focuses on modeling the behavior of visitors to an academic website. The conventional probability models, which earlier studies used to fit data from a commercial website, capture the power-law behavior in our data but fail to capture other important features such as the long tail. We propose a new model based on the identities of the users. Qualitative and quantitative tests comparing how well each model fits our data show that the new model outperforms the two conventional probability models.

Share and Cite:

F. Phoa and J. Sanchez, "Modeling the Browsing Behavior of World Wide Web Users," Open Journal of Statistics, Vol. 3 No. 2, 2013, pp. 145-154. doi: 10.4236/ojs.2013.32016.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.