TITLE:
Improved Comb Filter based Approach for Effective Prediction of Protein Coding Regions in DNA Sequences
AUTHORS:
Jayakishan Meher, Pramod K. Meher, Gananath Dash
KEYWORDS:
Cascaded Differentiator Comb (CDC), Generalized Comb Filter (GCF), Indicator Sequence, Period-3, Signal Processing
JOURNAL NAME:
Journal of Signal and Information Processing, Vol.2 No.2, May 26, 2011
ABSTRACT: The prediction of protein coding regions in DNA sequences is an important problem in computational biology. Nucleotides in the protein coding regions, or exons, of a DNA sequence are observed to exhibit a period-3 property. Identifying period-3 regions therefore helps locate genes within the DNA sequences of eukaryotic cells, which can be billions of nucleotides long. This property allows signal processing based time-domain and frequency-domain methods to predict these regions efficiently, and several such approaches have been applied to the problem. This paper describes novel and efficient comb filter-based techniques for predicting protein coding regions based on the period-3 behavior of codon sequences. The proposed method is validated on the Burset/Guigo1996, HMR195 and KEGG standard datasets using various prediction measures. It is shown that the cascaded differentiator comb (CDC) filter can predict protein coding regions with better prediction efficiency and lower computational complexity than other signal processing techniques based on the period-3 property.
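As a rough illustration of the indicator sequences and period-3 property named above, the following Python sketch maps a DNA string to four binary indicator sequences and measures spectral power at the period-3 frequency (2π/3) in a sliding window. This is a generic DFT-based baseline, not the CDC or generalized comb filter proposed in the paper; the function names, window length, and step size are assumptions chosen for illustration only.

```python
import cmath

# Twiddle factors exp(-j*2*pi*k/3) for k = 0, 1, 2 (the period-3 frequency).
_TWIDDLE = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]


def indicator_sequences(dna):
    """Return one binary indicator sequence per base (A, C, G, T)."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def period3_power(dna):
    """Sum over the four bases of |DFT at frequency 2*pi/3|^2."""
    total = 0.0
    for seq in indicator_sequences(dna).values():
        acc = sum(x * _TWIDDLE[n % 3] for n, x in enumerate(seq))
        total += abs(acc) ** 2
    return total


def sliding_period3(dna, window=30, step=3):
    """Period-3 power in each window; window and step are illustrative values."""
    last_start = len(dna) - window
    return [period3_power(dna[i:i + window]) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    periodic = "ATG" * 20   # strongly period-3 toy sequence
    flat = "A" * 60         # no period-3 component
    print(period3_power(periodic), period3_power(flat))
    print(sliding_period3(periodic))
```

In such a scheme, windows with high period-3 power would be candidate coding regions. A filter-based method like the one the abstract describes would replace the windowed DFT with a filter tuned to the same frequency.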