Minimal Secondary Structure Formation on mRNAs with a Shine-Dalgarno Sequence for Chromosomal Genes in Rhodobacter sphaeroides
1. Introduction
Translation initiation, a rate-limiting step in protein biosynthesis, involves the recognition, attachment, and adaptation of the mRNA to the 30S subunit of the ribosome [1]. Messenger RNA recognition is facilitated by the non-random distribution of purines about 5 to 10 nucleotides upstream of the start codon [2] [3]. This purine-rich sequence (typically 3 to 6 nucleotides long), known as the Shine-Dalgarno (SD) sequence, is complementary to a conserved region at the 3’ end of the 16S rRNA located in the platform of the 30S subunit [4] [5]. Complementary base pairing between the 16S rRNA and the mRNA attaches the mRNA to the 30S platform. Trans-acting initiation factors (IF1, IF2, and IF3) and ribosomal proteins mediate this attachment to the small subunit of the ribosome and help to unfold the mRNA for its accommodation in the channel of the ribosome. Although mutations in the SD sequence have been shown to alter protein expression levels up to 250-fold, SD itself is not obligatory for translation of some genes, e.g. rpsA in Escherichia coli [5]. In some cases where there is no complementarity between the 16S rRNA and the sequence upstream of the mRNA start site, ribosomal protein S1 has been shown to interact with AU-rich regions to facilitate translation initiation [3].
A recent study in Lactococcus lactis revealed that in cases where mRNA-16S rRNA and/or mRNA-ribosomal protein interaction is absent, mRNA stability, or the lack thereof, contributes significantly to translation initiation efficiency [2] [6]. Hence, analyzing mRNA secondary structure is critical to understanding translation initiation, as the formation of highly stable hairpin structures around a start codon could not only occlude translation from that codon, but also drive translation from a weaker downstream start codon with less secondary structure interference [7] [8] [9] [10] [11]. Since SD serves as a recognition signal for the selection of the right reading frame for translation, it is expected that this sequence is somewhat sensitive to secondary structure formation.
A study of mRNA stability across alphaproteobacterial, gammaproteobacterial, cyanobacterial, plastid, metazoan mitochondrial, fungal mitochondrial and plant mitochondrial genomes was previously performed [12]. The results for 5000 randomly sampled genes from each group revealed that, on average, mRNAs without SD have less secondary structure than mRNAs with SD in organisms where SD-dependent and SD-independent translation coexist [12]. Furthermore, in these organisms, mRNAs with and without SD generally have minimal secondary structure around the start codon, compared to the regions upstream and downstream of the start codon. The secondary structure analysis was based on predicting the minimum free energy (MFE) of mRNAs with the RNAfold function in the publicly available Vienna RNA package [13].
The aforementioned organism-specific studies highlight the possible influence of SD-16S rRNA interaction in minimizing secondary structure formation to promote translation initiation in unipartite genomes. Our study seeks to assess the influence of SD on secondary structure formation for mRNAs in multipartite genomes, using Rhodobacter sphaeroides, which has two chromosomes and five plasmids, as a model. Two hypotheses were tested: 1) mRNAs with SD and mRNAs without SD retain similar stability in chromosomes and plasmids, and 2) secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD.
2. Materials and Methods
A total of 3579 protein-coding gene sequences of R. sphaeroides were sampled from the National Center for Biotechnology Information (NCBI) database, and then analyzed using the Bayesian estimation method below [14] [15].
(1)
About 70% of these genes were predicted to be organized in operons. The 27 SD motifs (Table 1) classified by Prodigal [16] were searched for, and only 19 ribosomal binding site (RBS) motifs were identified in the R. sphaeroides genome.
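The motif search can be pictured as scanning the region upstream of each start codon for a given motif at an allowed spacer distance. The following is a minimal sketch of that idea, not Prodigal's actual algorithm: the function name `find_sd_spacer`, the example motif `GGAGG`, and the spacer ranges are illustrative assumptions, and mismatch positions (the "x" in some motifs) are not handled.

```python
def find_sd_spacer(upstream, motif, min_spacer, max_spacer):
    """Return the spacer (nt between motif end and start codon) of the
    qualifying motif occurrence closest to the start codon, or None.

    `upstream` is the sequence ending immediately before the start codon.
    All names and parameters here are hypothetical, for illustration only.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    best = None
    pos = upstream.find(motif)
    while pos != -1:
        # Number of nucleotides between the motif's last base and the start codon.
        spacer = len(upstream) - (pos + len(motif))
        if min_spacer <= spacer <= max_spacer:
            if best is None or spacer < best:
                best = spacer
        pos = upstream.find(motif, pos + 1)  # allow overlapping occurrences
    return best


if __name__ == "__main__":
    # Illustrative upstream sequence: 3 nt, the motif, then a 7-nt spacer.
    example_upstream = "TTTGGAGGACGTTAC"
    print(find_sd_spacer(example_upstream, "GGAGG", 5, 10))
```

A gene for which no motif is found at any allowed spacer would, under this sketch, fall into the no-consensus category.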
Generally, given that the set of non-overlapping secondary structures, P, for a sequence, S, follows a Boltzmann Distribution and L is the length of the sequence, wherein,
(2)
then probability of a base pair (i, j) for S is given below [17] [18]:
(3)
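For orientation, the standard Boltzmann-ensemble form of these quantities can be written as follows. This is a sketch in assumed notation, with E(s) the free energy of structure s, R the gas constant, T the absolute temperature, and Z the partition function; it is not necessarily identical in form to Equations (2) and (3).

```latex
% Boltzmann probability of a structure s in the set P of admissible structures of S
\Pr(s \mid S) = \frac{e^{-E(s)/RT}}{Z},
\qquad
Z = \sum_{s' \in P} e^{-E(s')/RT}

% Probability that positions i and j are paired, with 1 \le i < j \le L
p_{ij} = \sum_{\substack{s \in P \\ (i,j) \in s}} \Pr(s \mid S)
```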
Minimum free energy (MFE) was used as a measure of secondary structure formation as previously described [12] [19]. The MFE value is computed by summing the energy contributions of consecutive stacked base pairs according to nearest-neighbor pairing rules [19] [20]. The RNAfold function in the MATLAB Bioinformatics Toolbox [21] [22], which implements the Turner energy model [19] [23] [24], was used to compute the MFE values in this study. The MATLAB implementation of RNAfold incorporates sequence-dependent adjustments to the thermodynamic parameters to improve free energy minimization for RNA structure prediction [21].
This revised function performs better sequence knowledge-based computations of MFE, and a low MFE value for an input sequence indicates that the sequence is stable [25] [26] [27]. Furthermore, less secondary structure around the start codon and the SD of mRNAs would suggest that both the accessibility of the start codon and the exposure of the SD sequence for complementary pairing with 16S rRNA might be necessary for efficient translation initiation [8] [11] [28].
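Free energy minimization of this kind rests on dynamic programming over candidate base pairs. As a simplified stand-in, the sketch below maximizes the number of nested base pairs (Nussinov-style) instead of minimizing Turner nearest-neighbor energies, so it does not reproduce RNAfold output. The function name, the allowed pair set (Watson-Crick plus G-U wobble), and the default minimum hairpin loop of three unpaired nucleotides are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (non-crossing) base pairs in `seq`.

    Simplified Nussinov-style recursion, NOT the Turner energy model:
    each nucleotide pairs with at most one partner, and a hairpin loop
    must contain at least `min_loop` unpaired nucleotides.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in allowed:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))  # a three-pair stem closing an AAA loop
```

A real MFE calculation replaces the "+1 per pair" score with stacking and loop energies, but the recursion over subsequences has the same shape.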
No known alternative or non-SD binding motifs were found in the search scheme. Sequence patterns with a random distribution of nucleotides (no consensus SD) were represented as bin 0. Experimentally validated MFE values at T = 37 °C were obtained for each input DNA sequence from the NCBI database, using the RNAfold algorithm [13] [25] [26] in the MATLAB Bioinformatics Toolbox [21] [22].
Table 1. RBS Motifs classified by Prodigal.
a. Table was adapted with permission from Hyatt et al., 2010 [28], under license http://creativecommons.org/licenses/by/2.0/legalcode. Changes were made to the original table to incorporate depiction of Non-SD RBS as Bin Number = −1. An “x” in the middle of a motif indicates that a mismatch is allowed. The rightmost column shows the spacer distance allowed between the translation start and the motif. The bin number in the leftmost column indicates the initial “score” assigned by Prodigal to the RBS motif in the first iteration.
These energy values were computed by applying dynamic programming, and the corresponding mRNA structures and mountain plots were deduced. Genes were separated based on location on specific chromosome and plasmids, and a sliding window (spanning 50 nucleotides) analysis was performed on the region of 200 nucleotides, −100 to +100, on the mRNA. The following constraints were implemented in the probabilistic determination of base pairing: 1) One nucleotide can be paired to at most one other nucleotide; 2) the smallest number of unpaired nucleotides in the loop is three [29]. Although these requirements may not be biologically relevant (indicative of the formation of pseudoknots), they make identification of secondary structure more realistic and probable [30] [31].
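The sliding window analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions: the step size of one nucleotide is not given in the text, positions are reported as zero-based offsets relative to the first nucleotide of the start codon, and `energy_fn` is a placeholder for whatever folding routine returns the MFE of a subsequence (the toy scorer in the demonstration is not a thermodynamic model).

```python
def sliding_window_profile(region, energy_fn, window=50, step=1, first_offset=-100):
    """Score every `window`-nt subsequence of `region`.

    Returns a list of (offset, energy) tuples, where `offset` is the
    position of the window's first nucleotide relative to the start codon
    (assumed convention: the region begins at `first_offset`).
    `energy_fn` is any callable mapping a sequence to a number.
    """
    profile = []
    for start in range(0, len(region) - window + 1, step):
        subsequence = region[start:start + window]
        profile.append((first_offset + start, energy_fn(subsequence)))
    return profile


if __name__ == "__main__":
    # Placeholder scorer (hypothetical): more G/C gives a lower "energy".
    def toy_energy(s):
        return -(s.count("G") + s.count("C"))

    region = "G" * 100 + "A" * 100  # 200 nt covering -100 to +100
    profile = sliding_window_profile(region, toy_energy)
    print(len(profile), profile[0], profile[-1])
```

Passing the folding routine in as a parameter keeps the windowing logic independent of the particular MFE implementation.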
A Kruskal-Wallis rank sum test, with adjustments for tied ranks, was performed to evaluate statistically significant differences in the distribution of MFE values within each of the two chromosomes and five plasmids. Post-hoc analyses were completed using the Mann-Whitney and Kruskal-Wallis tests to evaluate any pairwise differences, with Bonferroni correction for Type I error, among different regions upstream and downstream of the start codon.
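To make the test statistic concrete, the sketch below computes the Kruskal-Wallis H statistic with the standard tie correction, together with a Bonferroni-adjusted significance threshold. It is a plain-Python illustration rather than the software used in the study; the function names are assumptions, the example values are made up, and the conversion of H to a P value (via the chi-squared distribution with k − 1 degrees of freedom) is omitted.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with correction for tied ranks.

    `groups` is a list of lists of numeric observations (e.g. MFE values
    per bin). Tied observations receive the average of their ranks.
    """
    counts = Counter(v for g in groups for v in g)
    n_total = sum(counts.values())
    # Assign midranks: a value occurring t times gets the mean of its t ranks.
    rank_of = {}
    ranks_used = 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = ranks_used + (t + 1) / 2
        ranks_used += t
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Made-up values purely for demonstration.
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
    print(bonferroni_alpha(0.05, 10))
```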
3. Results and Discussion
3.1. Chromosomal Genes with SD Have mRNAs with Less Secondary Structure
Figure 1 and Figure 2 show the distribution of MFE values obtained for genes located in chromosome 1 and chromosome 2, respectively. MFE values for genes with SD are significantly different from values for genes without SD (P < 0.001) in both chromosomes. Furthermore, comparing genes with SD to those without SD (bin 0) reveals a nonrandom distribution of median MFE values for these genes, wherein the median MFE value for bin 0 is lower than the medians for most of the other bin numbers. This suggests that mRNAs with SD form less stable secondary structures than those without SD for genes located in chromosomes 1 and 2.
Figure 1. mRNA stability for genes located in Chromosome 1. Dashed line indicates a baseline comparison of median MFE values of other bins with median MFE value of bin 0. Here, mRNAs without SD (bin 0) have a relatively higher stability (P < 0.001, χ² = 83.56) [Kruskal-Wallis test].
Figure 2. mRNA stability for genes located in Chromosome 2. Dashed line indicates a baseline comparison of median MFE values of other bins with median MFE value of bin 0. Here, mRNAs without SD (bin 0) have a relatively higher stability (P < 0.001, χ² = 52.846) [Kruskal-Wallis test].
3.2. The Role of SD Is Independent of mRNA Secondary Structure in Plasmid-Encoded Genes
Figure 3 shows the combined distribution of MFE values for genes located in the plasmids. An MFE value comparison parallel to that of the two chromosomes reveals a random distribution of medians for all the bins, including bin 0. Moreover, MFE values do not differ significantly among bins for Plasmid A (P = 0.088), Plasmid B (P = 0.148), Plasmid C (P = 0.341), Plasmid D (P = 0.186) or Plasmid E (P = 0.644). This suggests that in plasmids, there is no apparent difference in stability between mRNAs with SD and those lacking SD. This may indicate less efficient translation of transcripts for plasmid genes compared to those of chromosomal genes. However, since these endogenous plasmids exist in multiple copies in the cell, loss of translation efficiency may be compensated by the overabundance of transcripts available for protein synthesis [32].
3.3. The Impact of SD on mRNA Stability around the Start Codon Is Influenced by Intrinsic Genome Composition
Sliding window analysis also revealed that mRNAs with SD are less stable than those without SD for genes on chromosomes 1 and 2 (Figure 4 and Figure 5, respectively), refuting the first hypothesis that mRNAs with SD and mRNAs without SD retain similar stability, although no statistically significant difference is seen for the plasmids, especially for regions upstream of the start codon. Furthermore, a pronounced maximum in mRNA instability around the start codon is only seen for genes located in chromosome 2 (Figure 5). A similar RNA stability is maintained for genes in chromosome 1 and plasmids. The variability seen in free energy for genes in Figure 6 is due to the high standard deviation in means for some of the plasmids. Nonetheless, the second hypothesis that secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD is refuted. This indicates that SD is only sensitive to secondary structure formation globally on the mRNA, and that the influence of SD on mRNA free energy is organism-specific and possibly influenced by intrinsic genome composition.
Figure 3. mRNA stability for genes located in plasmids. Dashed line indicates a baseline comparison of median MFE values of other bins with median MFE value of bin 0. No statistically significant difference is seen in mRNA stability. Plasmid A (P = 0.088, χ² = 19.003), Plasmid B (P = 0.148, χ² = 18.25), Plasmid C (P = 0.341, χ² = 13.393), Plasmid D (P = 0.186, χ² = 14.918), Plasmid E (P = 0.644, χ² = 3.365) [Kruskal-Wallis test].
Figure 4. Sliding window analysis of mRNA stability for genes in Chromosome 1. The arrow indicates the position of the start codon. A cartoon depiction of secondary structure formation for mRNAs with SD is also shown on the graph.
Figure 5. Sliding window analysis of mRNA stability for genes in Chromosome 2. The arrow indicates the position of the start codon. A cartoon depiction of secondary structure formation for mRNAs with SD is also shown on the graph.
Figure 6. Sliding window analysis of mRNA stability for genes on Plasmids A, B, C, D and E. The arrow indicates the position of the start codon. Cartoon depictions of secondary structure formation for mRNAs with SD and mRNAs without SD are also shown on the graph.
Even though one would expect a less stable initiation region on the mRNA, it is possible that mRNA instability has an adverse effect on translation because it reduces the half-life of the mRNA [33]. Therefore, a tradeoff between mRNA stability and start codon accessibility might come into play, especially for essential genes on chromosome 1 that are retained mostly in a single copy. This highlights the possible contribution of factors other than the presence of SD and mRNA stability (for start codon accessibility), such as protein stability, codon bias and GC content, in determining translation efficiency [9] [34] [35] [36].
4. Conclusion
In summary, our work on R. sphaeroides has shown a possible underlying influence of organism specificity on mRNA stability in SD-dependent and SD-independent translation systems. In R. sphaeroides, Chromosomes 1 and 2, which mostly exist in single copies, contain less stable mRNAs in the SD-dependent initiation system, on the premise that the presence of SD implies its use in driving translation of the mRNA. This is not the case for the plasmids, which exist in multiple copies and in which mRNA stability is not significantly different between SD-dependent and SD-independent translation systems. Further analyses of mRNA stability around the start codon also show replicon-specific formation of secondary structure for both mRNAs with SD and those without SD. Future efforts could, therefore, be directed at elucidating the effects of intrinsic genomic features other than the presence of SD and mRNA secondary structure in order to assess the efficiency of translation in bacteria.
Authors’ Contributions
DO read the literature, organized the data, generated figures, performed statistical tests, and drafted the manuscript. HC generated MFE values with the MATLAB Bioinformatics Toolbox, provided much insight on how best to graph and display data, and helped to draft the manuscript. MC proposed the study, guided the researchers performing the data mining and analyses, provided much insight on how best to display data and interpret outcomes, and helped to draft the manuscript. All authors have read and approved the final manuscript.
Authors’ Information
DO was a graduate student in the Department of Biological Sciences at Sam Houston State University when this work was completed and is currently a medical student at The University of Calgary Cumming School of Medicine. HC is a Professor in the Department of Computer Science at Sam Houston State University. MC is a Professor in the Department of Biological Sciences at Sam Houston State University.
Acknowledgements
We would like to thank the Department of Biological Sciences at Sam Houston State University for their support during the completion of this project.