Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach

Abstract

The β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been developed for predicting β-turns and β-turn types. However, because the datasets are class-imbalanced, prediction performance is still inadequate. In this study, we propose a novel over-sampling technique, FOST, to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets show that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from the UCI Machine Learning Repository and achieved significant improvements in G-mean and sensitivity. This suggests that our method is also effective for imbalanced data other than β-turns and β-turn types.
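For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how sensitivity and G-mean are commonly computed for a binary classifier. It uses the standard definitions (sensitivity = TP / (TP + FN), G-mean = square root of sensitivity times specificity); the function name and the toy labels are illustrative assumptions and are not taken from the paper, and the sketch does not implement FOST.

```python
import math

def sensitivity_and_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, G-mean) for binary labels.

    Hypothetical helper for illustration only; it is not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Sensitivity: fraction of minority (positive) samples that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of majority (negative) samples that are recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # G-mean is low whenever either class is predicted poorly.
    return sensitivity, math.sqrt(sensitivity * specificity)

# Made-up toy labels: 2 positive (turn) samples among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, gmean = sensitivity_and_gmean(y_true, y_pred)
print(f"sensitivity={sens:.3f}, G-mean={gmean:.3f}")
```

On this toy input, plain accuracy is 8 out of 10 even though half of the minority samples are missed, which is why imbalance-aware metrics such as these are usually reported alongside it.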

Share and Cite:

Nguyen, L., Dang, X., Le, T., Saethang, T., Tran, V., Ngo, D., Gavrilov, S., Nguyen, N., Kubo, M., Yamada, Y. and Satou, K. (2014) Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach. Journal of Biomedical Science and Engineering, 7, 927-940. doi: 10.4236/jbise.2014.711090.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.