A reduced computational load protein coding predictor using equivalent amino acid sequence of DNA string with period-3 based time and frequency domain analysis


Developing efficient gene prediction algorithms is a fundamental task in genomics. In genomic signal processing, the identification of protein coding regions in DNA sequences rests on the period-3 property exhibited by nucleotides in exons. Several approaches based on signal processing tools and numerical representations have been applied to this problem in pursuit of more accurate predictions. This paper presents a new indicator sequence, called the amino acid indicator sequence, which is derived from the DNA string. It is used with existing signal-processing-based time-domain and frequency-domain methods to predict coding regions within the DNA sequences of eukaryotic cells, which can be billions of bases long, and it reduces the computational load by one-third. Each triplet of bases, called a codon, instructs the cell machinery to synthesize an amino acid. The codon sequence therefore uniquely identifies an amino acid sequence, which defines a protein. The protein coding region is thus characterized by the codons underlying the amino acid sequence, and this property is used to detect period-3 regions from the amino acid sequence. Physico-chemical properties of amino acids are used for the numerical representation. Various accuracy measures, such as exonic peaks, discriminating factor, sensitivity, specificity, miss rate, wrong rate and approximate correlation, are used to demonstrate the efficacy of the proposed predictor. The proposed method is validated on various organisms using the standard data sets HMR195, Burset and Guigo, and KEGG. The simulation results show that the proposed method is an effective approach for protein coding region prediction.
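The mapping from a DNA string to a numerical amino acid sequence can be illustrated with a minimal sketch. It assumes the standard genetic code and uses the Kyte-Doolittle hydropathy scale as a stand-in physico-chemical property; the abstract does not say which properties the authors use, so the scale, the function names `translate` and `amino_acid_signal`, and the value 0.0 for stop codons are all illustrative assumptions rather than the paper's method.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: Kyte-Doolittle hydropathy as the physico-chemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna: str, frame: int = 0) -> str:
    """Map each complete codon in the given reading frame to one amino acid letter."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing unknown bases (e.g. 'N') are mapped to 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def amino_acid_signal(dna: str, frame: int = 0) -> list[float]:
    """Numerical sequence with one value per codon instead of one per base."""
    # Stop codons and unknown residues get 0.0 (an arbitrary illustrative choice).
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in translate(dna, frame)]


if __name__ == "__main__":
    dna = "ATGGCCAAGTGA"
    print(translate(dna))            # amino acid letters, one per codon
    print(amino_acid_signal(dna))    # numerical representation
    print(len(dna), len(amino_acid_signal(dna)))  # sequence lengths before and after
```

Because each codon collapses to a single value, the numerical sequence passed to any subsequent time-domain or frequency-domain analysis is a third of the length of the nucleotide sequence, which is where a saving in computation would come from under this sketch's assumptions.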

Share and Cite:

Meher, J., Dash, G., Meher, P. and Raval, M. (2011) A reduced computational load protein coding predictor using equivalent amino acid sequence of DNA string with period-3 based time and frequency domain analysis. American Journal of Molecular Biology, 1, 79-86. doi: 10.4236/ajmb.2011.12010.

Conflicts of Interest

The authors declare no conflicts of interest.


[1] Burge, C.B. and Karlin, S. (1998) Finding the genes in genomic DNA. Current Opinion in Structural Biology, 8, 346-354. doi:10.1016/S0959-440X(98)80069-9
[2] Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511574931
[3] Wang, Z., Chen, Y.Z. and Li, Y.X. (2004) A brief review of computational gene prediction methods. Genomics Proteomics Bioinformatics, 2, 216-221.
[4] Fickett, J.W. (1982) Recognition of protein coding regions in DNA sequences. Nucleic Acids Research, 10, 5303-5318. doi:10.1093/nar/10.17.5303
[5] Silverman, B.D. and Linsker, R. (1986) A measure of DNA periodicity. Journal of Theoretical Biology, 118, 295-300. doi:10.1016/S0022-5193(86)80060-1
[6] Tiwari, S., Ramachandran, S. and Bhattacharya, A. (1997) Prediction of probable genes by Fourier analysis of genomic sequences. CABIOS, 13, 263-270.
[7] Anastassiou, D. (2000) Frequency-domain analysis of biomolecular sequences. Bioinformatics, 16, 1073-1081. doi:10.1093/bioinformatics/16.12.1073
[8] Anastassiou, D. (2001) Genomic Signal Processing. IEEE Signal Processing Magazine, 8-20. doi:10.1109/79.939833
[9] Vaidyanathan, P.P. and Yoon, B.J. (2002) Digital filters for gene prediction applications. Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, 3-6 November 2002, 306-310.
[10] Fuentes, A., Ginori, J. and Abalo, R. (2008) A new predictor of coding regions in genomic sequences using a combination of different approaches. International Journal of Biological, Biomedical and Medical Sciences.
[11] Jesus, P., Chalco, M. and Carrer, H. (2008) Identification of protein coding regions using the modified Gabor-wavelet transform. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5, 198-207.
[12] Galleani, L. and Garello, R. (2010) The minimum entropy mapping spectrum of a DNA sequence. IEEE Transactions on Information Theory, 56, 771-783. doi:10.1109/TIT.2009.2037041
[13] Tuqan, J. and Rushdi, A. (2008) A DSP approach for finding the codon bias in DNA sequences. IEEE Journal of Selected Topics in Signal Processing, 2, 343-356. doi:10.1109/JSTSP.2008.923851
[14] Akhtar, M., Epps, J. and Ambikairajah, E. (2007) On DNA numerical representations for period-3 based exon prediction. Proceedings of IEEE International Workshop on Genomic Signal Processing and Statistics, Tuusula, 1-4. doi:10.1109/GENSIPS.2007.4365821
[15] Akhtar, M., Epps, J. and Ambikairajah, E. (2008) Signal processing in sequence analysis: Advances in eukaryotic gene prediction. IEEE Journal of Selected Topics in Signal Processing, 2, 310-321. doi:10.1109/JSTSP.2008.923854
[16] Voss, R. (1992) Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Physical Review Letters, 68, 3805-3808. doi:10.1103/PhysRevLett.68.3805
[17] Zhang, R. and Zhang, C.T. (1994) Z curves, an intuitive tool for visualizing and analyzing the DNA sequences. Journal of Biomolecular Structure and Dynamics, 11, 767-782.
[18] Rushdi, A. and Tuqan, J. (2006) Gene identification using the Z-curve representation. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, 14-19 May 2006, 1024-1027.
[19] Cristea, P.D. (2002) Genetic signal representation and analysis. Proc. SPIE Conference, International Biomedical Optics Symposium (BIOS'02), 4623, 77-84.
[20] Brodzik, A.K. and Peters (2005) Symbol-balanced quaternionic periodicity transform for latent pattern detection in DNA sequences. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 5, 18-23 March 2005, 373-376.
[21] Rosen, G.L. (2006) Signal processing for biologically-inspired gradient source localization and DNA sequence analysis. Ph.D. Thesis, Georgia Institute of Technology, Atlanta.
[22] Nair, T.M., Tambe, S.S. and Kulkarni, B.D. (1994) Application of artificial neural networks for prokaryotic transcription terminator prediction. FEBS Letters, 346, 273-277. doi:10.1016/0014-5793(94)00489-7
[23] Nair, A.S. and Sreenathan, S.P. (2006) A coding measure scheme employing electron-ion interaction pseudopotential (EIIP). Bioinformation, 1, 197-202.
[24] Nair, A.S. and Sreenathan, S.P. (2006) An improved digital filtering technique using frequency indicators for locating exons. Journal of the Computer Society of India, 36.
[25] Burset, M. and Guigó, R. (1996) Evaluation of gene structure prediction programs. Genomics, 34, 353-367. doi:10.1006/geno.1996.0298
[26] Rogic, S., Mackworth, A. and Ouellette, F. (2001) Evaluation of gene-finding programs on mammalian sequences. Genome Research, 11, 817-832. doi:10.1101/gr.147901
[27] Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28, 27-30. doi:10.1093/nar/28.1.27
[28] Biju, I. and Gajendra P.S.R. (2004) EGPred: Prediction of eukaryotic genes using ab initio methods after combining with sequence similarity approaches. Genome Research, 14, 1756-1766. doi:10.1101/gr.2524704

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.