Journal of Biomedical Science and Engineering

Volume 11, Issue 6 (June 2018)

ISSN Print: 1937-6871   ISSN Online: 1937-688X

Google-based Impact Factor: 0.66

Improving Protein Sequence Classification Performance Using Adjacent and Overlapped Segments on Existing Protein Descriptors

PP. 126-143
DOI: 10.4236/jbise.2018.116012

ABSTRACT

In protein sequence classification research, it is common to convert a variable-length protein sequence into a fixed-length numerical vector using various descriptors, for instance, k-mer composition. Such position-independent descriptors are useful because they are applicable to sequences of any length; however, the positional information of subsequences is discarded even though it might contribute substantially to classification performance. To solve this problem, we divided the original sequence into several segments and then calculated the numerical features for each of them. This enables us to partially introduce positional information (for instance, the compositions of serine in the anterior and posterior segments of a sequence). Through comprehensive experiments on the number of segments and the length of the overlapping region, we found that our classification approach with sequence segmentation and feature selection is effective in improving performance. We evaluated our approach on three protein classification problems and achieved significant improvement in all cases in which the dataset has a sufficient number of amino acids in each sequence. This result shows that the additional segments used here for protein sequence classification have great potential for solving other sequence problems in bioinformatics.
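The segmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_segments`, `composition`, `segmented_features`), the near-equal splitting rule, the way `overlap` extends each segment on both sides, and the choice to prepend whole-sequence features are all assumptions made for illustration, since the abstract does not specify these details. Amino acid composition (the 1-mer case) stands in for the descriptor.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_segments(seq, n_segments, overlap=0):
    """Split seq into n_segments adjacent, near-equal parts.

    Assumption: each part is extended by `overlap` residues on both
    sides (clipped at the sequence ends) to form overlapped segments.
    """
    if n_segments < 1 or overlap < 0:
        raise ValueError("n_segments must be >= 1 and overlap >= 0")
    length = len(seq)
    segments = []
    for i in range(n_segments):
        start = i * length // n_segments
        end = (i + 1) * length // n_segments
        segments.append(seq[max(0, start - overlap):min(length, end + overlap)])
    return segments


def composition(segment):
    """Return the fraction of each standard amino acid in the segment."""
    counts = Counter(segment)
    total = len(segment)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def segmented_features(seq, n_segments, overlap=0):
    """Concatenate whole-sequence composition with per-segment compositions."""
    features = composition(seq)
    for segment in split_segments(seq, n_segments, overlap):
        features.extend(composition(segment))
    return features
```

For a toy sequence such as `"SSSSAAAA"` split into two segments, the whole-sequence serine composition is 0.5, while the anterior and posterior segments give 1.0 and 0.0, which is the kind of positional signal the abstract refers to. The feature vector length stays fixed at 20 × (1 + `n_segments`) regardless of sequence length.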

Share and Cite:

Faisal, M., Abapihi, B., Nguyen, N., Purnama, B., Delimayanti, M., Phan, D., Lumbanraja, F., Kubo, M. and Satou, K. (2018) Improving Protein Sequence Classification Performance Using Adjacent and Overlapped Segments on Existing Protein Descriptors. Journal of Biomedical Science and Engineering, 11, 126-143. doi: 10.4236/jbise.2018.116012.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.