Emergence, Retention and Selection: A Trilogy of Origination for Functional Proteins from Ancestral LncRNAs in Primates
Although gene duplication has long been regarded as the predominant mechanism for creating new genes, recent reports suggest that new proteins can also evolve “de novo” from non-coding DNA regions. These de novo genes are also called “motherless” genes because they lack ancestral proteins as precursors, and recently we and others found that lncRNAs may represent an intermediate stage in their origination. To further elucidate this lncRNA-to-protein transition, we identified 64 hominoid-specific de novo genes and report a new mechanism for the origination of functional de novo proteins from ancestral non-coding transcripts: these non-coding “precursors” are generally no more selectively constrained than other lncRNA loci, and the existence of these de novo proteins is consistent with neutral expectations; however, a population genetics study of 67 human individuals and 82 macaques revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution.
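One widely used signature of purifying selection on a protein-coding gene is a ratio of nonsynonymous to synonymous polymorphism (pN/pS) below 1. The sketch below computes this ratio for a single coding sequence. It is an illustration only, not the authors' pipeline; the function names are hypothetical, and it assumes an uppercase, in-frame ACGT coding sequence with SNPs supplied as (0-based CDS position, alternative base) pairs.

```python
from itertools import product
from math import nan

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(cds):
    """Return (synonymous sites, nonsynonymous sites), Nei-Gojobori style.

    Each position contributes (synonymous alternatives / 3) synonymous sites;
    the remainder is nonsynonymous. As a simplification, changes to a stop
    codon are counted as nonsynonymous. A trailing partial codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    syn_alternatives = 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn_alternatives += 1
    syn_sites = syn_alternatives / 3
    return syn_sites, usable - syn_sites


def classify_snps(cds, snps):
    """Split SNPs into (synonymous count, nonsynonymous count)."""
    n_syn = n_non = 0
    for pos, alt in snps:
        start, offset = pos - pos % 3, pos % 3
        codon = cds[start:start + 3]
        mutant = codon[:offset] + alt + codon[offset + 1:]
        if CODE[mutant] == CODE[codon]:
            n_syn += 1
        else:
            n_non += 1
    return n_syn, n_non


def pn_ps(cds, snps):
    """Polymorphism-based pN/pS; values well below 1 suggest purifying selection."""
    s_sites, n_sites = site_counts(cds)
    n_syn, n_non = classify_snps(cds, snps)
    if s_sites == 0 or n_sites == 0 or n_syn == 0:
        return nan  # undefined without sites or synonymous polymorphisms
    return (n_non / n_sites) / (n_syn / s_sites)


# Toy CDS ATG GCT: position 3 G->A (GCT->ACT, Ala->Thr) is nonsynonymous,
# position 5 T->C (GCT->GCC, Ala->Ala) is synonymous.
print(pn_ps("ATGGCT", [(3, "A"), (5, "C")]))  # 0.2
```

A ratio estimated from a handful of SNPs in one gene is very noisy. In practice one would pool genes, compare against a neutral reference class such as non-coding sites, or use a McDonald-Kreitman-style test that also incorporates divergence.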
Published in:
Emergence, Retention and Selection: A Trilogy of Origination for Functional Proteins from Ancestral LncRNAs in Primates. PLoS Genet 11(7): e1005391. doi:10.1371/journal.pgen.1005391
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1005391
Tags:
Genetics, Reproductive Medicine
Published in:
PLOS Genetics, 2015, Issue 7