A novel over-sampling method and its application to miRNA prediction

PP. 236-248
DOI: 10.4236/jbise.2013.62A029

ABSTRACT

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation across many biological processes. Most current computational methods, both comparative and non-comparative, classify human precursor microRNA (pre-miRNA) hairpins against both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising results by applying class imbalance learning methods, the problem has not yet been solved completely by existing methods because of the imbalanced class distribution in the datasets. For example, SMOTE is a well-known, general over-sampling method that addresses this problem; however, in some cases it fails to improve classification performance, or even reduces it. Therefore, we developed a novel over-sampling method named incremental-SMOTE to distinguish human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs. Experimental results on the pre-miRNA datasets from Batuwita et al. showed that our method achieved better Sensitivity and G-mean than the control (no over-sampling), SMOTE, and several modified successors of SMOTE, including safe-level-SMOTE and borderline-SMOTE. In addition, we applied the novel method to five imbalanced benchmark datasets from the UCI Machine Learning Repository and achieved improvements in Sensitivity and G-mean. These results suggest that our method outperforms SMOTE and several of its successors in various biomedical classification problems, including miRNA classification.
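For readers unfamiliar with the baseline and the metrics named above, the following is a minimal sketch, not the authors' incremental-SMOTE (which the abstract does not describe). It shows a simplified form of plain SMOTE, where each synthetic sample is interpolated between a randomly chosen minority sample and one of its k nearest minority neighbours, and it computes Sensitivity and G-mean, taking G-mean to be the square root of Sensitivity times Specificity. The function names, parameters, and the random choice of seed samples are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified baseline SMOTE (illustrative, not incremental-SMOTE).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def sensitivity_and_g_mean(y_true, y_pred, positive=1):
    """Return (Sensitivity, G-mean) for a binary problem.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, math.sqrt(sensitivity * specificity)
```

Because G-mean multiplies the two per-class rates, a classifier that ignores the minority class scores zero regardless of its overall accuracy, which is why it is a common choice for imbalanced datasets such as these.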

Share and Cite:

Dang, X., Hirose, O., Saethang, T., Tran, V., Nguyen, L., Le, T., Kubo, M., Yamada, Y. and Satou, K. (2013) A novel over-sampling method and its application to miRNA prediction. Journal of Biomedical Science and Engineering, 6, 236-248. doi: 10.4236/jbise.2013.62A029.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.