Translational Selection Is Ubiquitous in Prokaryotes
Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns, represented as the sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size), while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, and these trends of enrichment or depletion are conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases.
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
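For orientation, the codon adaptation index mentioned above as the baseline distance measure can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it does not reproduce the Random Forest classifier or the intergenic-composition control. The function names (`codons`, `relative_adaptiveness`, `cai`), the toy sequences, and the pseudocount of 0.5 added to every codon count are assumptions made for the example. The index itself follows the usual definition: each codon is weighted by its frequency relative to the most frequent synonymous codon in a reference set (here standing in for ribosomal protein genes), and a gene's score is the geometric mean of those weights.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount (an assumption of this sketch) keeps weights non-zero
    for codons that never occur in the reference set.
    """
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    synonymous = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonymous[aa].append(codon)

    weights = {}
    for aa, family in synonymous.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        best = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of codon weights over the informative codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no codons with synonymous alternatives")
    return math.exp(sum(logs) / len(logs))


# Toy reference set standing in for ribosomal protein genes (hypothetical data).
reference = ["AAAAAAAAAAAG"]  # Lys codons: AAA x3, AAG x1
w = relative_adaptiveness(reference)
print(cai("ATGAAAAAA", w), cai("ATGAAAAAG", w), cai("ATGAAGAAG", w))
```

A gene built only from the codons preferred in the reference set scores 1, and the score falls as rarer synonymous codons are substituted in. A measure of this kind compares each gene to a single reference profile, which is the type of approach the classifier described above is reported to outperform.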
Published in:
Translational Selection Is Ubiquitous in Prokaryotes. PLoS Genet 6(6): e1001004. doi:10.1371/journal.pgen.1001004
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1001004
Tags:
Genetics, Reproductive medicine
Article published in:
PLOS Genetics, 2010, Issue 6