The Syhomy of the Genetic Code Is the Path to the Real Speech Characteristics of the Encoded Proteins
1. The Wobble Hypothesis by F. Crick
Much has been written about F. Crick’s hypothesis, including by the author himself, but most judgments rest on a formulation from F. Crick’s book “What Mad Pursuit” (1988) [5]. Here are the key words: “An important point to notice is that although the genetic code has certain regularities―in several cases it is the first two bases that encode one amino acid, the nature of the third being irrelevant―its structure otherwise makes no obvious sense.”
However, some significant additional issues stem from this brief message, and they are the subject of this article. The “standard” genetic protein code was obtained by M. Nirenberg’s group as a result of studying protein synthesis in E. coli. This work produced the table of the standard genetic code. It reflects the functions of protein genes as a static code structure, where all codons UNAMBIGUOUSLY encode amino acids and stop positions. It is important that, according to the Wobble Hypothesis, half of the known 64 codons, i.e. 32, are redundant for the 20 known amino acids. The 21st amino acid, selenocysteine, and its coding are discussed later in this article. Redundant codons are synonyms: groups of codons that, with varying degrees of repetition, encode the same amino acid or stop position, with a different one for each group. These are the main provisions of M. Nirenberg’s model, later followed by F. Crick. This understanding has prevailed for 50 years, since M. Nirenberg received the Nobel Prize for this model in 1968. Theoretical and experimental results have now accumulated that suggest amendments to this understanding of protein genetic coding. They are as follows.
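The redundancy of the standard table can be illustrated with a minimal sketch that writes the table out and counts the codons available for each amino acid or stop. The compact 64-letter encoding (one-letter amino acid codes, `*` for stop, bases ordered U, C, A, G) and the names `BASES`, `AMINO` and `CODE` are illustrative assumptions, not taken from the cited works.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # mRNA alphabet; the text sometimes writes T where mRNA has U
# Standard code, one letter per codon, third base varying fastest ('*' = stop).
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons stand for each meaning (20 amino acids plus stop).
codons_per_meaning = Counter(CODE.values())

for meaning, count in sorted(codons_per_meaning.items(),
                             key=lambda kv: (-kv[1], kv[0])):
    print(meaning, count)
```

The counts range from one codon (Met, Trp) to six (Leu, Ser, Arg), which is the static picture that the following sections set out to amend.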
2. Unambiguity and Degeneracy Factor of the E. coli Protein Code
The table of the standard code is functionally divided into two symmetric and equal parts. In one part, 32 codons UNAMBIGUOUSLY and REDUNDANTLY encode only amino acids. These codons are synonyms. The other 32 codons (not synonyms), called homonyms [1] [2] [3], AMBIGUOUSLY encode amino acids and stop positions, and not always in accordance with the standard code table. Namely, each codon-homonym simultaneously encodes two different amino acids, or an amino acid and a stop position. This means that, to ensure correct protein synthesis, a CHOICE must be made: either between two different amino acids, or between an amino acid and a stop position. The deciphered amino acids or stop positions in this case may not correspond to the table of the standard code, since the ribosome recognizes and selects them for codons-homonyms DYNAMICALLY, while it reads and logically analyzes the mRNA context. This contradicts M. Nirenberg’s and F. Crick’s dogma of unambiguous coding, which was accepted as ‘carved in stone’ until the works [1] [2] [3] and the article [4], which experimentally proves that the codon of the amino acid selenocysteine simultaneously encodes another amino acid, cysteine. This gave reason to doubt the evidence for the dogma and called for an explanation of the phenomenon that would not break the dogma of unambiguous coding but confirm it from the standpoint of the linguistic principle of homonymy, that is, the real (not metaphorical) textuality of genes (mRNA). Homonymy operates in the process of protein biosynthesis as the counterpart of synonymy (also a linguistic notion), the encoding of one amino acid by many codons. The latter corresponds to F. Crick’s Wobble Hypothesis and is experimentally proven by the presence of isoacceptor tRNAs. The most general definition of synonymy and homonymy can be formulated as follows.
Synonymy is when one meaning is represented (coded) by many different words. Homonymy is when one word represents many different meanings. This is demonstrated in the new view of Table 1 of the genetic code, where you can see the functional and symmetrical division of the code into codons-synonyms and codons-homonyms. Groupings of codons by family are carried out according to the Lagerkvist scheme [6] , where the family-forming factor is the first two nucleotides in codons (triplets). The families themselves are grouped by us differently―on the basis of synonymy and homonymy.
Table 1. The table of the genetic (protein) code. The table is symmetrically divided into codons-synonyms (in blue) and syhoms (in red). Table adapted from article [3].
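The family grouping described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the cited works: a family is taken to be the four codons sharing the first two nucleotides, it is called a synonym family when all four codons have the same meaning in the standard table, and a mixed (syhom) family otherwise. The names `synonym_families` and `syhom_families` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group the meanings of the 64 codons by their first two nucleotides (doublet).
families = {}
for codon, meaning in CODE.items():
    families.setdefault(codon[:2], []).append(meaning)

# A family whose four codons share one meaning is unmixed; otherwise it is mixed.
synonym_families = sorted(d for d, m in families.items() if len(set(m)) == 1)
syhom_families = sorted(d for d, m in families.items() if len(set(m)) > 1)

print("synonym families:", synonym_families)
print("syhom families:  ", syhom_families)
print("codons in each group:", 4 * len(synonym_families), 4 * len(syhom_families))
```

On the standard table this gives eight families of each kind, that is, 32 codons in each half, which is the symmetric division shown in Table 1.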
3. The Choice of Amino Acids and Stop Positions in the Case of Ribosome Interaction with the Codon-Homonyms on mRNA
Such a CHOICE is made by the ribosome due to the fact that it (and/or the whole cell) takes into account the context of the given mRNA. This choice automatically implies quasi-consciousness of the protein synthesizing system, more precisely, its biocomputer functions [7]. Quasi-consciousness is present because mRNA (gene copy) is a text in a literal, non-metaphorical sense [1] [2] [3].
The situation of UNAMBIGUOUS coding by synonyms is determined by the fact that in each of the 8 codon families, ALL TRIPLETS (codons) are DIFFERENT. For this reason, in the triplet families, coding is performed by ALL three letters (nucleotides) and all triplets in each family encode only one amino acid. This coding is UNAMBIGUOUS AND REDUNDANT. Replacement of the third nucleotides in codons does not change the coding.
The situation of PRIMARY UNAMBIGUITY of coding by HOMONYM-triplets from the beginning (before the ribosome reads the mRNA) holds for half of the codons, those which are not synonyms (i.e., in fact, homonyms). It arises because the 3rd codon nucleotide, the key participant in the work of the genome-biocomputer of each cell, “does not plan” to participate in the coding in the static state before the ribosome reads the mRNA, and can potentially be any of the 4 possible ones. Recall that F. Crick did not comment on such cases of ribosome dynamics. So, the coding is done by the first two nucleotides (doublets). At the same time, in 6 homonym-families, pairs of IDENTICAL doublets encode different amino acids. In two families, it happens as follows. The doublet of the TA-family encodes tyrosine and stop, each twice. Of the two TG-doublet pairs, one pair encodes cysteine; the other pair encodes stop and tryptophan. In general, this means that here, too, there is a homonymy factor, but with important additional characteristics. This phenomenon was discovered by the group of M. Nirenberg and F. Crick using the example of the T(U)T(U) codon family [8], where the triplet UUU simultaneously encodes phenylalanine and leucine. This simultaneity was not understood by F. Crick and M. Nirenberg, so they did not see it as contradicting their postulate about the UNAMBIGUITY of coding of amino acids and stops by all 64 codons. This was believed until the work of Turanov et al. [4], which demonstrated the same simultaneity of coding for selenocysteine and cysteine that Crick and Nirenberg had detected long before for the UUU codon [8]. This work [4] experimentally demonstrated and theoretically substantiated the phenomenon of UGA codon ambiguity [1] [2] [3], and it brought the first doubts about the evidence for the dogma of unambiguous coding of amino acids and stop positions by all 64 codons. A new representation of the genetic code (Table 2) presents this new and significant information, regarding the homonymy of half the codons, in a clearly visible way. The table is taken from work [1].

Table 2. Synonymous-homonymous two-dimensionality of the genetic code.
The biofunction of such synonymous-homonymous dualism (within the mixed codon families) is, perhaps, to give the code even more flexibility. This duality actually means hybridization of code capabilities in eight mixed codon families. Therefore, it would be more convenient to call them SYHOM-families (a portmanteau of the words SYnonym and HOMonym), and to call this characteristic SYHOMY.
An important point: the standard genetic code table of the E. coli protein code, adopted by the scientific community, is STATIC and does not reflect the most important factor, the dynamics of protein biosynthesis in vivo. This is a reason why the majority has an incomplete understanding of the key linguistic function of the third nucleotide in the syhom-codons, the codons which take the genome to the level of real, non-metaphorical, textual constructions of DNA and RNA. Syhomy provides endless horizons of semantic (quasi-speech) coding to the protein-synthesizing system. This is especially important for the functions of human brain neurons, where the acts of thinking and consciousness are realized, probably, through the materialization of short-lived DNA-RNA-PROTEIN texts in the form of physical fields as materialized equivalents of thoughts [9].
Thus, we see synonymous-homonymic degeneracy (SYHOMY) of the protein genetic code, which amends the previous code model. This fact reflects the unity of the opposite functions of codons-synonyms and codons-syhoms, where synonyms stand for redundancy and coding accuracy, and syhoms for flexibility and adaptability of the code to environmental changes.
4. Amendment to F. Crick’s Wobble Hypothesis
For synonyms, all 4 different nucleotides (T, C, A, G) in the 3rd position of codons can replace one another freely. This does not affect their coding functions. But for syhoms, this fundamentally contradicts the official standard genetic code table. This does not mean the denial of unambiguous coding. Unambiguity is achieved by contextual orientations of ribosomes on mRNA. In syhoms, at the level of mRNA translation into proteins, substitutions of 3rd nucleotides in cases of reading frame shifts may lead to anomalous, context-dependent choices of amino acids and/or stop positions. This will happen if, during a reading frame shift, the syhom 3rd nucleotide takes the position of the 1st or 2nd syhom nucleotide.
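The positional effect of a reading frame shift can be made concrete with a small sketch. The mRNA fragment and the helper names `split_codons` and `position_in_codon` are hypothetical and serve only to show how a nucleotide that is third in the original frame becomes second or first after a shift of one or two nucleotides.

```python
def split_codons(mrna, frame=0):
    """Split an mRNA string into complete triplets, starting at offset `frame`."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


def position_in_codon(index, frame=0):
    """Return 1, 2 or 3: the place of nucleotide `index` inside its codon."""
    return (index - frame) % 3 + 1


mrna = "AUGUAUUGA"  # hypothetical fragment: AUG UAU UGA in frame 0

for frame in (0, 1, 2):
    # Index 2 is the third nucleotide (G) of the first codon in frame 0.
    print(frame, split_codons(mrna, frame), position_in_codon(2, frame))
```

In frame 0 the nucleotide at index 2 is a third nucleotide; after a shift by one it is read as a second nucleotide, and after a shift by two as a first nucleotide.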
Such substitutions are random and, contrary to what F. Crick’s Wobble Hypothesis requires, are not indifferent to the biosynthesis of proteins. The values of the first two nucleotides of syhoms are dictated by mRNA contexts, and the roles of the third nucleotide are reduced to: 1) participation in the coding with a “delegated function”, and 2) (in addition) physical strengthening of the codon-anticodon pair on the ribosome.
What is a “delegated function”? Since the order of nucleotides in the protein gene and, consequently, in mRNA is rigid (hereditary), a ribosome that reads mRNA and interacts with a codon-syhom doublet (the 1st and 2nd triplet nucleotides) meets a situation of uncertainty associated with the 3rd, wobbling nucleotide, according to F. Crick (and the Nature of the code). What is the 3rd nucleotide’s linguistic/semantic role in the codon-syhom text? Probably, it is actualized by “delegating” the missing linguistic/semantic function to the 3rd syhom nucleotide, according to the scheme of contextual orientations. Here is an example from linguistics. Consider the sentences: 1) “He heard the caS mewing”; 2) “Tom usually wears a cowboy hat, but today he’s wearing a baseball caL”. Proceeding from the contexts, in the first sentence the letter S in the word caS should be delegated the function of the letter T, and in the second sentence the letter L should be delegated the function of the letter P. As a result, the syhom-doublets lose their semantic uncertainty and, as part of the integral triplet (the former syhom), acquire the only correct and unambiguous semantics of the coding triplet, namely the choice of the amino acid and/or stop position. Syhom dualism is lost, resulting in unambiguity. Unambiguous coding is acquired, but in a dynamic act of the ribosome reading mRNA. This is the strategic consequence of the synonymous-homonymous degeneracy (two-dimensionality, syhomy) of the protein code. The principle is simple, like everything ingenious “invented” by Nature. This is akin to the reassignment (recodification) of codons when a biosystem is in a stressful state (heat shock, the presence of exogenous antibiotics, amino acid starvation). Such reassignment has long been known, but it is not yet understood with respect to temporarily ambiguous doublets within codons-syhoms.
In linguistics, there is a canonical example of the role of word endings in a sentence (within the context) for delegating meaning to previously incomprehensible “words”: “The iggle squiggs trazed wombly in the harlish hoop” (see Note 1). It seems like nonsense, but in fact you may intuitively sense the meaning. The endings of the “words” delegate a relatively clear meaning to them. It is possible that something similar happens in DNA and mRNA texts. The synonymous-homonymous two-dimensionality of codons-syhoms can be seen in Table 3, with the TA codon family as an example, where paired synonymy takes place. For the TG family, paired synonymy works only partially: for the codons TGA and TGG there is no synonymy, since they encode Stop and Trp; this is an exception. Paired synonymy, as well as paired opposing homonymy, is observed for all other syhom families.

Table 3. Two-dimensional synonymous-homonymous redundancy (syhomy) of the protein triplet code.
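Paired synonymy within the mixed families can be checked against the standard table with a short sketch. It assumes that the two pairs of a family are defined by the third nucleotide, pyrimidine (U, C) versus purine (A, G); this split, and the names `pair_meanings`, `fully_paired` and `exceptions`, are illustrative assumptions rather than definitions from Table 3.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PYRIMIDINES, PURINES = "UC", "AG"


def pair_meanings(doublet):
    """Meanings of the pyrimidine-ending and purine-ending pair of a family."""
    y_pair = tuple(CODE[doublet + b] for b in PYRIMIDINES)
    r_pair = tuple(CODE[doublet + b] for b in PURINES)
    return y_pair, r_pair


doublets = ["".join(p) for p in product(BASES, repeat=2)]
# Mixed (syhom) families: the four codons do not all share one meaning.
mixed = [d for d in doublets if len({CODE[d + b] for b in BASES}) > 1]

# Paired synonymy holds when each pair is internally synonymous.
fully_paired = [d for d in mixed
                if all(len(set(pair)) == 1 for pair in pair_meanings(d))]
exceptions = [d for d in mixed if d not in fully_paired]

for d in mixed:
    print(d, pair_meanings(d))
print("exceptions:", exceptions)
```

For the UA family the pairs are Tyr/Tyr and stop/stop, and for UG they are Cys/Cys and stop/Trp, as described in the text. With the table as encoded here, the check also flags the AU family, because AUA encodes isoleucine while AUG encodes methionine.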
In syhom-codons, substitution of the third (3’) nucleotide will result in context-dependent choices of amino acids and/or stop positions. Such substitutions are random and can only arise from accidental radiation-induced, chemical or artificially induced mutations. Only these can replace, or rather damage, the third (3’) nucleotides in the syhom-codons, which are hereditarily rigid.
One may propose the following rule: the third nucleotides in syhom-codons take on the delegated meanings of the four nucleotides (A, U, G, C) chosen by the ribosome nanobiocomputer in the course of reading the mRNA context. In turn, this choice determines which “amino acid-tRNA-anticodon:codon-syhom” complex will be involved in including the selected amino acid in the growing peptide chain.
5. Why Stop Codons Are in Syhom Families
Termination, the end of protein synthesis, is carried out when one of the stop codons (UAG, UAA, UGA) appears in the A-site of the ribosome. Because no tRNAs correspond to these codons, peptidyl-tRNA remains bound to the P-site of the ribosome. Here, the specific proteins RF1 or RF2 are involved, which catalyze the release of the polypeptide chain from the peptidyl-tRNA, as well as RF3, which promotes dissociation of RF1 or RF2 from the ribosome. RF1 recognizes UAA or UAG in the A-site; RF2 recognizes UAA or UGA.
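The recognition pattern just described can be written as a small lookup. This is only a sketch of the stated codon specificities of RF1 and RF2; the names `RELEASE_FACTOR_CODONS` and `recognizing_factors` are hypothetical.

```python
# Stop codons recognized by each class-1 release factor, as stated in the text.
RELEASE_FACTOR_CODONS = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}


def recognizing_factors(codon):
    """Return the release factors that recognize `codon` in the A-site."""
    return sorted(rf for rf, codons in RELEASE_FACTOR_CODONS.items()
                  if codon in codons)


for codon in ("UAA", "UAG", "UGA", "UGG"):
    print(codon, recognizing_factors(codon))
```

UAA is recognized by both factors, UAG only by RF1 and UGA only by RF2, while a sense codon such as UGG is recognized by neither.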
This is preceded by an important event: the decision to stop protein synthesis at one of the three stop codons (syhoms). The “decision” in this case is not an empty metaphor, but the result of the work of a nanobiocomputer, which the protein-synthesizing system probably is [7]. It is the nanobiocomputer that analyzes the CONTEXT of mRNA sequences, and then, and only then, one of the three ambiguous syhom triplets (either stop or amino acid) acquires the value of either stop or amino acid.
Why so? Imagine that stop functions belonged to some codons-synonyms. Then the strategic function of analyzing the textual, semantic component of genes (mRNA) would be lost. After all, synonyms strictly, unambiguously and redundantly encode amino acids, which follows from the invariance of natural native gene texts (mRNA). In contrast to the strict unambiguity of codons-synonyms, the stop-syhoms exist in mRNA in a ‘standby mode’, awaiting the meaning of the mRNA (gene) context. Depending on the context, a decision is made on the exact meaning of the ambiguous codon-syhom: to be an amino acid code and continue protein synthesis, or to act as a stop codon and end it.
6. Discussion
The study presents a logically non-contradictory idea: ribosomes (or the entire protein-synthesizing system) “choose” the necessary amino acids and stop positions. When the ribosome traverses non-synonymous codons (syhom-codons), it actually reads and considers the meanings of mRNA contexts. A choice is made between two similar tRNA anticodons, which carry different amino acids. These anticodons are recognized, considered and selected by the complex “ribosome + syhom-codons within the mRNA context”, based on the meaning of the mRNA context. The choice of the semantics of the syhom-codons and, respectively, of one of the two tRNAs carrying two different amino acids, or alternatively of an amino acid or a stop signal, occurs due to the semantic orientation of the ribosome within the mRNA contexts, functioning as a nanobiocomputer. It might seem that this contradicts the canon of genetics about unambiguous genetic coding of all amino acids. However, this “choice” does not negate the correct key thesis about unambiguous amino acid coding during protein biosynthesis. The apparent contradiction is removed by a special function of the third nucleotide in the 32 non-synonymous syhom-codons. The strategic importance of this fact is that protein coding passes under the governance of the laws of linguistics, previously unknown in relation to genome operation. The function of the third nucleotide in the 32 syhom-codons is a distinctive semantic marking of the synonymous and synonymously-homonymous (syhom) triplet families involved in protein biosynthesis. This process involves linguistic (human speech-like) laws for constructing protein texts (speech) from amino acid letters. Such an understanding of genome operation now ceases to be metaphorical and acquires an exact meaning, based not only on pure logic, but also on experimental proof of the complex mixed semantic duality of syhom-codons [4].
The “choice” of amino acids and stop positions in protein biosynthesis ceases to be a metaphor and becomes one of the scientific facts of a more developed Mendelian genetics and molecular biology. The role of the third (3’) nucleotide in syhom-codons during protein biosynthesis is based on theoretical analysis [1] [2] [3] and experimental work [4]. Its role is seen here from a broader view than the existing understanding. The third (3’) nucleotide functionally and symmetrically divides the codons into families of 32 synonyms and 32 syhom-codons. Syhom-codons have the strategic function of participating in the activation of nonlocal nanobiocomputer ribosomal analysis of mRNA as a real context in the mRNA language. Such an analysis is a natural necessity for selecting one of two different amino acids, or an amino acid or a stop position, when a ribosome traverses syhom-codons, which have a double-coding function. This was theoretically substantiated earlier [1] [2] [3]. Experimental work [4] confirmed this theory: it demonstrated that two different amino acids, selenocysteine and cysteine, are coded by a single UGA-syhom-codon in Euplotes crassus, which, to a certain extent, is in principle applicable to the human genome. This result does not call into question the dogma of unambiguous coding of amino acids and stop positions by the cells’ genomes, but it requires us to introduce some significant corrections into the long-accepted and uncontested model of genetic coding. These amendments are based on a broadened understanding of the special linguistic (semantic) role of the third (3’) nucleotide in codons and on the acceptance of the idea of real rather than metaphorical textuality of protein genes.
Recognition of the speech-like nature of genes (mRNA) and the role of the codon’s third (3’) nucleotide in this process leads to a simple statement about the quasi-intelligence (biocomputing) of the protein-synthesizing-system and its ability to consider the specific (actual) mRNA context (meaning) for the decision-making choice between amino acids and stops in syhom situations, based on gene text (mRNA) meaning.
Why are the additional characteristics of the protein code proposed here more pragmatic than M. Nirenberg’s and F. Crick’s code model [8], which is tactically correct but strategically incomplete? And why were the attempts to see more within the code than its creators did unsuccessful? These attempts were made in the works of Lagerkvist [6] and Rumer [10]. Lagerkvist was mistaken in believing that mixed codons (see Note 2; syhoms, according to the new terminology) appear in mRNA with low probability. Rumer saw symmetry in the genetic code, classifying it according to the strength of codon-anticodon hydrogen bonds, which is quite close to the division of codon families into synonyms and syhoms. However, they did not see that the synthesis of proteins would be correct if the codons were functionally divided into two mutually complementary symmetric groups, as is actually the case. One of them, the synonyms, provides the accuracy and redundancy of amino acid coding. The other, the syhoms, provides flexibility and adaptability of synthesized proteins to environmental changes, due to changes in the amino acid composition and sequences of synthesized proteins. This is the wisdom of the protein code.
For this representation of the genetic code model, the publication of Lolle et al. about the recurrent genetics of some plants is worth reviewing [11]. This study demonstrated that there are no differences in the DNA sequences of the wild-type Ler gene and the HTH gene of the mutant Arabidopsis thaliana plant, which are responsible for the direct relationship between the biological properties of the cuticle, cell adhesion and reproduction of Arabidopsis. The authors write: “In every case, the sequence of the reverted HTH allele matched the Ler wild-type sequence exactly”. This means that Lolle and Pruitt found the effect of a return to part of the ancestral genetics of Arabidopsis. This fact is fantastic because the returned “wild” gene and the mutant gene are identical in sequence, which is inexplicable in Mendelian genetics. But it can be explained from the standpoint of linguistic-wave genetics. Why does the same gene manifest in different phenotypes?
To obtain an answer within the framework of the considered amendments to the protein code model, it is necessary to check the collinearity of mRNAs and their protein products for the wild and mutant genes. It can be predicted that the amino acid sequences of the products of these genes will be different. They will differ in amino acid composition, since the adjacent DNA sequences at the 3’ and 5’ ends of the wild and mutant genes are different, which results in context variations and, hence, variations of meaning for the same codons in the mRNA of the wild and mutant genes. The authors write that the high level of reversion from the mutant to the wild type produced, at the nucleotide level, an exact duplicate of the wild-type gene observed in previous generations. Unfortunately, the nucleotide sequences of the wild and mutant coding regions given by the authors are not divided into codons. But the other point is obvious: the adjacent DNA sequences at the 3’ and 5’ ends of both genes are different; hence, the contextual content of their mRNAs is different. This allows one to predict different amino acid sequences of the protein products of both “pseudo-identical” genes and, naturally, different morphogenesis of the plant regions encoded by these genes.
A detailed analysis of the work by Lolle et al. [11], together with the study of Turanov et al. [4], is of interest, since their main results encourage geneticists to research genetic protein coding strategies further. As you can see, much more needs to be clarified. This new understanding in genetics makes it easier to anticipate possible faults in recombinant technologies of artificial hybridization of various genes. Such artificial hybridization may lead to semantic uncertainty at the level of mRNA meanings, which determine the choice and accuracy of amino acid and stop position coding by syhom-codons. The paradox of the situation in genetics is that, over the 50 years of existence of the protein code model, it has never been checked on a large scale: on hundreds of proteins, with full statistics and “proteins - mRNA codons” collinearity. If any inconsistencies with the standard code table are found for E. coli proteins, this would not deny the code model of M. Nirenberg and F. Crick. It would mean that the principles of genetic coding of proteins, especially in a linguistic, quasi-speech direction, are unlimited.
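A collinearity check of the kind proposed above could be organized along the following lines. This is a minimal sketch under stated assumptions: the mRNA fragment and the “observed” protein are hypothetical and are not data from [4] or [11], `U` is used as the one-letter code for selenocysteine, and `collinearity_mismatches` is an illustrative helper that reports every position where the observed residue differs from the standard table.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def collinearity_mismatches(mrna, protein):
    """Compare each codon of `mrna` (frame 0) with the residue found in `protein`.

    Returns tuples (position, codon, table_meaning, observed_residue) for the
    positions where the observed residue differs from the standard table.
    """
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return [(pos, codon, CODE[codon], residue)
            for pos, (codon, residue) in enumerate(zip(codons, protein))
            if CODE[codon] != residue]


# Hypothetical example: UGA is read as selenocysteine (U) instead of stop.
mrna = "AUGUGUUGAGGCUAA"
observed_protein = "MCUG"

print(collinearity_mismatches(mrna, observed_protein))
```

Run over many sequenced proteins and their mRNAs, a report of this kind would show how often, and at which codons, the observed residues depart from the static table.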
NOTES
1. Translator’s note: This is an English equivalent by H. A. Gleason to the original Russian example given by Acad. V. S. Shcherba: “Глокая куздра штеко будланула бокра и курдячит бокрёнка”.
2. The term “mixed” was introduced by Lagerkvist.