An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification

HTML  XML Download Download as PDF (Size: 1224KB)  PP. 2349-2356  
DOI: 10.4236/cs.2016.79205    1,651 Downloads   2,655 Views  Citations
Author(s)

ABSTRACT

Web log mining is analysis of web log files with web page sequences. Discovering user access patterns from web access are necessary for building adaptive web servers, to improve e-commerce, to carry out cross-marketing, for web personalization, to predict web access sequence etc. In this paper, a new agglomerative clustering technique is proposed to identify users with similar interest, and to determine the motivation for visiting a website. Using this approach, web usage mining is done through different stages namely data cleaning, preprocessing, pattern discovery and pattern analysis. Results are given to explain how this approach produces tight usage clusters than the existing web usage mining techniques. Rather than traditional distance based clustering, the similarity measure is considered during clustering process in order to reduce computational complexity. This paper also deals with the problem of assessing the quality of user session clusters and cluster validity is measured by using statistical test, which measures the distances of clusters distributions to infer their dissimilarity and distinguish level. Using such statistical measures, it is proved that cluster accuracy is improved to the extent of 0.83, over existing k-means clustering with validity measure 0.26, FCM (Fuzzy C Means) clustering with validity measure 0.56. Rough set based clustering with validity measure 0.54 Generation of dense clusters is essential for finding interesting patterns needed for further mining and analysis.

Share and Cite:

Anitha, A. (2016) An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification. Circuits and Systems, 7, 2349-2356. doi: 10.4236/cs.2016.79205.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.