Predicting DNA methylation status using word composition

PP. 672-676
DOI: 10.4236/jbise.2010.37091

ABSTRACT

Background: DNA methylation influences gene expression patterns and can thereby change genetic function. Computational analysis of the methylation status of nucleotides can help to explore the underlying reasons why methylation develops. Results: We present a DNA sequence-based method, named the word composition encoding method, that analyzes the methylation status of CpG dinucleotides using 5-bp (5-mer) DNA fragments. The prediction accuracy is 75.16% when all 5-bp word compositions are used (4^5 = 1024 in total). Furthermore, the 5-bp DNA fragments (words) with the greatest impact on methylation status are identified by the mRMR (Maximum Relevance Minimum Redundancy) feature selection method. As a result, 58 words are selected and used to build a compact predictor, which achieves 77.45% prediction accuracy. When the word composition encoding method and the feature selection strategy are coupled, the meaning of these words can be analyzed through their contribution to the prediction. Biological evidence in the literature supports the view that the DNA sequence surrounding a CpG dinucleotide affects its methylation. Conclusions: The main contribution of this paper is to identify and analyze the key DNA words, taken from the neighborhood of the CpG dinucleotides, that induce DNA methylation.
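To make the word composition encoding concrete, the following minimal Python sketch maps a DNA sequence to a vector of 4^5 = 1024 features, one per 5-bp word. The abstract does not give these details, so several choices here are assumptions made only for illustration: words are counted in overlapping windows, counts are normalized to frequencies, and windows containing characters other than A, C, G, T are skipped. The names `build_words` and `word_composition` are hypothetical and do not come from the paper.

```python
from itertools import product

ALPHABET = "ACGT"


def build_words(k=5):
    """List all k-bp words over A, C, G, T in lexicographic order (4**k words)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def word_composition(seq, k=5):
    """Encode a DNA sequence as the frequency of each k-bp word.

    Assumptions (not stated in the abstract): windows overlap, counts are
    divided by the number of valid windows, and any window containing a
    character outside A/C/G/T is ignored.
    """
    words = build_words(k)
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    valid = 0
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            valid += 1
    if valid == 0:
        # No usable window: return an all-zero vector.
        return [0.0] * len(words)
    return [c / valid for c in counts]


if __name__ == "__main__":
    # Hypothetical sequence flanking a CpG site, for illustration only.
    features = word_composition("ACGTACGTAC")
    print(len(features))  # number of 5-bp word features
```

A vector of this kind, computed from the sequence around each CpG dinucleotide, is the sort of input on which a feature selection step such as mRMR could then operate to pick out a small subset of words.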

Share and Cite:

Lu, L. , Lin, K. , Qian, Z. , Li, H. , Cai, Y. and Li, Y. (2010) Predicting DNA methylation status using word composition. Journal of Biomedical Science and Engineering, 3, 672-676. doi: 10.4236/jbise.2010.37091.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.