Positional Information Storage in Sequence Patterns


We build a model of how well-defined positional information is stored in probabilistic sequence patterns. Once a pattern is defined, the effect of any mutation in it can be judged. We show that the frequency of beneficial mutations can, in general, be high, and that the same mutation can be either advantageous or deleterious depending on the pattern’s context. The model allows one to treat positional information as a physical quantity, to formulate its conservation law, and to model its continuous evolution in a whole genome, with meaningful applications of basic physical principles such as optimal efficiency and channel capacity. A plausible example of an optimal solution analytically describes phase-transition-like behavior. The model shows that, in principle, error-free information can be stored on sequences with arbitrarily low conservation. The described theoretical framework offers novel, general perspectives on long-standing paradoxes such as the excess of junk DNA in large genomes and the related G-value and C-value paradoxes. We also expect it to affect a number of fundamental concepts in population genetics, including the neutral theory, the cost-of-selection dilemma, and the error catastrophe, among others.
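
The abstract does not state the model's formulas. As a minimal sketch only, assuming a probabilistic pattern is represented as a position weight matrix (one base distribution per position) and that the information held at a position is measured Schneider-style as log2(4) - H, so at most 2 bits per nucleotide, the code below computes the information stored in a pattern and scores a point mutation by its change in log2-likelihood. The pattern values, the function names `position_information`, `pattern_information` and `mutation_effect`, and the likelihood-based mutation score are hypothetical illustrations, not the authors' definitions.

```python
import math

# Hypothetical probabilistic pattern (illustrative values, not from the paper):
# one probability distribution over A, C, G, T per position, i.e. a position
# weight matrix. Every probability is kept > 0 so log-likelihoods are finite.
Pattern = list[dict[str, float]]

PATTERN: Pattern = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},      # strongly prefers A
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},      # strongly prefers G
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # unconstrained position
]


def position_information(column: dict[str, float]) -> float:
    """Bits stored at one position, ASSUMED here to be log2(4) - H(column),
    i.e. the entropy deficit relative to a uniform four-letter background."""
    if not math.isclose(sum(column.values()), 1.0):
        raise ValueError("column probabilities must sum to 1")
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(4) - entropy


def pattern_information(pattern: Pattern) -> float:
    """Total information of the pattern, summed over positions
    (assumes positions are independent)."""
    return sum(position_information(col) for col in pattern)


def mutation_effect(pattern: Pattern, seq: str, pos: int, new_base: str) -> float:
    """Hypothetical score: change in log2-likelihood of `seq` under the pattern
    when the base at `pos` is replaced by `new_base`.
    Positive = fits the pattern better, negative = fits it worse."""
    old_p = pattern[pos][seq[pos]]
    new_p = pattern[pos][new_base]
    return math.log2(new_p) - math.log2(old_p)


if __name__ == "__main__":
    print(f"{pattern_information(PATTERN):.3f} bits")        # 1.286 bits
    # The same A->G substitution at positions with different preferences:
    print(f"{mutation_effect(PATTERN, 'AAC', 0, 'G'):.3f}")  # -2.807
    print(f"{mutation_effect(PATTERN, 'AAC', 1, 'G'):.3f}")  # 2.807
    print(f"{mutation_effect(PATTERN, 'AAC', 2, 'A'):.3f}")  # 0.000
```

With these illustrative numbers, the same A→G substitution lowers the likelihood at position 0 but raises it at position 1, while any change at the uniform position 2 is neutral, which mirrors the abstract's point that whether a mutation helps or hurts depends on the pattern it falls in.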

Cite as:

A. Shadrin, A. Grigoriev and D. Parkhomchuk, "Positional Information Storage in Sequence Patterns," Computational Molecular Bioscience, Vol. 3, No. 2, 2013, pp. 18-26. doi: 10.4236/cmb.2013.32003.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.