Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains

PP. 152-168
DOI: 10.4236/jcc.2017.57014
ABSTRACT

To improve the performance of classifiers in subjective domains, this paper defines a metric, computed by means of K-means clustering, that measures the quality of subjectively labelled training data (QoSTD). The QoSTD is then used to weight the predicted class scores and so adjust the likelihoods of instances. In addition, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. To verify the effectiveness of the proposed method, binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set using support vector machine (SVM) and discriminant analysis (DA) models. The experimental results show that the consistency between the likelihoods of instances and the corresponding observations increases notably for the classes, especially when the training data set has a relatively low QoSTD. The results also indicate how mislabelled instances can be eliminated from the training data set before retraining classifiers in subjective domains.

Share and Cite:

Dai, Y. (2017) Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains. Journal of Computer and Communications, 5, 152-168. doi: 10.4236/jcc.2017.57014.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.