Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs
Known protein-coding gene exons make up less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here, using a large set of RNA-seq data covering a wide array of human tissue types, we directly observed that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of these RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs, suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.
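Two quantities in this summary are interval arithmetic at heart: the fraction of a region covered by transcribed intervals, and whether trait-associated SNPs fall inside lincRNA loci more often than chance would predict. The Python sketch below illustrates the general idea of a simple SNP-enrichment test. It is not the authors' pipeline. It assumes a single background region of known length, 0-based half-open coordinates, and a uniform null in which each SNP is equally likely to land on any background base. The function names and toy data are hypothetical.

```python
from math import comb


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals (one chromosome)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(intervals, region_length):
    """Fraction of a region's bases covered by at least one interval."""
    covered = sum(e - s for s, e in merge_intervals(intervals))
    return covered / region_length


def binomial_sf(k, n, p):
    """One-sided tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def snp_enrichment(snp_positions, feature_intervals, background_length):
    """Observed vs expected SNPs inside features under a uniform null (assumption)."""
    merged = merge_intervals(feature_intervals)
    # Count SNPs that land inside any merged feature interval.
    observed = sum(any(s <= pos < e for s, e in merged) for pos in snp_positions)
    # Under the uniform null, each SNP hits a feature with probability equal
    # to the fraction of background bases the features cover.
    p_hit = sum(e - s for s, e in merged) / background_length
    expected = len(snp_positions) * p_hit
    fold = observed / expected if expected > 0 else float("inf")
    p_value = binomial_sf(observed, len(snp_positions), p_hit)
    return observed, expected, fold, p_value


if __name__ == "__main__":
    # Hypothetical toy data: lincRNA loci and SNP positions in a 1,000 bp background.
    lincrnas = [(100, 200), (150, 300), (800, 900)]
    snps = [120, 250, 260, 850, 500, 50]
    print("covered fraction:", covered_fraction(lincrnas, 1000))
    print("observed, expected, fold, p:", snp_enrichment(snps, lincrnas, 1000))
```

With these toy inputs, 4 of the 6 SNPs fall in lincRNA intervals that cover 30% of the background, so about 1.8 are expected by chance (fold enrichment roughly 2.2, one-sided binomial p ≈ 0.07). A uniform null is the simplest choice. A real analysis would more likely restrict the background to intergenic space and compare against matched or permuted SNP sets, because SNP density and linkage disequilibrium are not uniform along the genome.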
Published in:
Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs. PLoS Genet 9(6): e1003569. doi:10.1371/journal.pgen.1003569
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003569