Combined Use of k-Mer Numerical Features and Position-Specific Categorical Features in Fixed-Length DNA Sequence Classification

PP. 390-401
DOI: 10.4236/jbise.2017.108030

ABSTRACT

k-mer frequency is widely used to classify DNA sequences because it converts variable-length sequences into fixed-length numerical feature vectors. However, when the sequences to be classified all have the same fixed length, the subsequence starting at each specific position of a sequence can also be used as a categorical feature. In a performance evaluation on six datasets of fixed-length DNA sequences, our algorithm, which combines both kinds of features, achieved performance comparable to or better than that of other state-of-the-art algorithms.
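
The two feature types described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_frequencies`, `positional_features`), the choice of relative frequencies over raw counts, and the handling of ambiguous bases are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer, in lexicographic order.

    The result always has 4**k entries, whatever the length of seq.
    (Hypothetical helper; normalizing by window count is an assumption.)
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing non-ACGT symbols such as N
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def positional_features(seq, k):
    """Subsequence of length k starting at each position, as categorical values.

    Only meaningful when all sequences share one length, so that "pos3"
    refers to the same position in every sequence. (Hypothetical helper.)
    """
    return {f"pos{i}": seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    seq = "ACGTAC"
    print(kmer_frequencies(seq, 2))     # 16 numerical features
    print(positional_features(seq, 2))  # 5 categorical features
```

In a classifier, the numerical vector and the categorical values (for example, after one-hot encoding) would be concatenated into a single feature vector per sequence; how the paper combines them is not stated in the abstract.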

Share and Cite:

Phan, D., Nguyen, N., Lumbanraja, F., Faisal, M., Abapihi, B., Purnama, B., Delimayanti, M., Kubo, M. and Satou, K. (2017) Combined Use of k-Mer Numerical Features and Position-Specific Categorical Features in Fixed-Length DNA Sequence Classification. Journal of Biomedical Science and Engineering, 10, 390-401. doi: 10.4236/jbise.2017.108030.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.