TITLE:
Ensemble-based active learning for class imbalance problem
AUTHORS:
Yanping Yang, Guangzhi Ma
KEYWORDS:
Class Imbalance, Active learning, Ensemble, Random Subspace, Misclassification Cost
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.3 No.10, October 28, 2010
ABSTRACT: Class imbalance is a common problem in medical diagnosis. Although unlabeled data are abundant, labeled data are difficult and expensive to obtain. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. Artificial data are generated according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimensionality. Member classifiers are selected based on misclassification cost estimation: the minority class is assigned higher weights for misclassification costs, while each testing sample has a variable penalty factor that induces the ensemble to correct its current errors. In our experiments with UCI disease datasets, F-value and G-means, rather than classification accuracy, are used as the evaluation measures. Compared with other ensemble methods, our method shows the best performance and needs fewer labeled samples.
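The abstract names F-value and G-means as evaluation measures but does not define them. The sketch below assumes the commonly used definitions: F-value as the harmonic mean of precision and recall on the minority (positive) class, and G-mean as the geometric mean of sensitivity and specificity. The paper's exact formulas may differ (for example, a weighted F-measure), and the function name, the label encoding, and the toy labels are hypothetical, chosen only for illustration.

```python
import math


def f_value_and_g_mean(y_true, y_pred, positive=1):
    """Return (F-value, G-mean), treating `positive` as the minority class.

    Assumed definitions (not taken from the paper):
      F-value = 2 * precision * recall / (precision + recall)
      G-mean  = sqrt(sensitivity * specificity)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    pr_sum = precision + recall
    f_value = 2 * precision * recall / pr_sum if pr_sum else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


# Toy imbalanced sample: 2 minority cases out of 10 (hypothetical labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet both measures are zero because no minority case is found.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
f_trivial, g_trivial = f_value_and_g_mean(y_true, always_majority)

# A classifier that finds one of the two minority cases, with one false alarm.
some_minority = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
f_partial, g_partial = f_value_and_g_mean(y_true, some_minority)
```

Under these assumed definitions, the always-majority classifier scores zero on both measures despite its high accuracy, which illustrates why accuracy alone can be misleading on imbalanced data.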