TITLE:
Clustering Categorical Data Based on Within-Cluster Relative Mean Difference
AUTHORS:
Jinxia Su, Chunjing Su
KEYWORDS:
Clustering, Categorical Variable, Distinctive Attribute, Pooled Within-Cluster Mean Relative Difference, Hamming Distance
JOURNAL NAME:
Open Journal of Statistics, Vol.7 No.2, April 20, 2017
ABSTRACT: Clustering of categorical variables has received intensive attention. In a dataset with categorical features, some features contribute more than others to the clustering procedure. In this paper, we propose a simple method to find such distinctive features by comparing the pooled within-cluster mean relative difference; we then partition the data on these features and give the subspace of each subgroup. Applications to the zoo data and the soybean data illustrate the performance of the proposed method.