Using the improved position specific scoring matrix and ensemble learning method to predict drug-binding residues from protein sequences

PP. 304-312
DOI: 10.4236/ns.2012.45043

ABSTRACT

Identifying the drug-binding residues on the surface of proteins is a vital step in drug discovery and is important for understanding protein function. Most previous studies rely on the structural information of proteins, but the structures of most proteins are not available. This article therefore proposes a sequence-based method that combines support vector machine (SVM)-based ensemble learning with an improved position specific scoring matrix (PSSM). To take the local environment of a drug-binding site into account, an improved PSSM profile scaled by a sliding window and a smoothing window was used to improve the prediction result. In addition, a new SVM-based ensemble learning method was developed to deal with the imbalanced data classification problem that commonly arises in binding site prediction. On a dataset of 985 drug-binding residues, the method achieved an area under the curve (AUC) of 0.9264. Furthermore, on an independent dataset of 349 drug-binding residues used to evaluate the prediction model, the prediction accuracy was 84.68%. These results suggest that the method is effective for predicting drug-binding sites in proteins. The code and all datasets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Ensem_DBS.zip.
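
The abstract names two ideas that are easier to picture with a small example: a PSSM profile processed with a smoothing window and a sliding window, and an ensemble that works around class imbalance. The sketch below is illustrative only and is not the authors' released code. The window sizes, the zero padding at the sequence termini, summing (rather than averaging) rows during smoothing, and the scheme of splitting the non-binding residues into subsets paired with all binding residues are all assumptions, since the abstract does not specify them. The function names are hypothetical, and the toy matrix has two columns where a real PSSM has twenty.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def smooth_pssm(pssm: Matrix, smooth_window: int = 3) -> Matrix:
    """Replace each row by the sum of the rows in a window centered on it.

    Rows that would fall outside the sequence are skipped, which is
    equivalent to zero padding (an assumption of this sketch).
    """
    half = smooth_window // 2
    n = len(pssm)
    width = len(pssm[0]) if pssm else 0
    smoothed = []
    for i in range(n):
        row = [0.0] * width
        for j in range(max(0, i - half), min(n, i + half + 1)):
            for k in range(width):
                row[k] += pssm[j][k]
        smoothed.append(row)
    return smoothed


def window_features(pssm: Matrix, center: int, slide_window: int = 5) -> List[float]:
    """Concatenate the rows around `center` into one feature vector.

    Positions beyond either terminus contribute zero vectors.
    """
    half = slide_window // 2
    width = len(pssm[0])
    feats: List[float] = []
    for j in range(center - half, center + half + 1):
        if 0 <= j < len(pssm):
            feats.extend(pssm[j])
        else:
            feats.extend([0.0] * width)
    return feats


def balanced_subsets(positives: Sequence, negatives: Sequence) -> List[Tuple[Sequence, Sequence]]:
    """Pair all positives with successive chunks of negatives of similar size.

    Each pair could train one base classifier (for example an SVM).
    """
    size = max(1, len(positives))
    return [(positives, negatives[s:s + size]) for s in range(0, len(negatives), size)]


def majority_vote(votes: Sequence[int]) -> int:
    """Combine 0/1 predictions from the base classifiers; ties count as 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


# Toy usage: three residues, two PSSM columns.
toy = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
smoothed = smooth_pssm(toy, smooth_window=3)
features_first_residue = window_features(smoothed, center=0, slide_window=3)
```

In a full pipeline, each residue's feature vector would be passed to every base classifier trained on one balanced subset, and `majority_vote` would merge their outputs into a single binding or non-binding call.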

Share and Cite:

Li, J., Zhang, Y., Qin, W., Guo, Y., Yu, L., Pu, X., Li, M. and Sun, J. (2012) Using the improved position specific scoring matrix and ensemble learning method to predict drug-binding residues from protein sequences. Natural Science, 4, 304-312. doi: 10.4236/ns.2012.45043.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.