A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing
While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error dominates the results (G-to-A). It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
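The idea of searching for clusters of same-type mismatches above a quality threshold can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a gap-free, already aligned read and reference of equal length, and it assumes that the accuracy figures quoted above correspond to phred-style quality scores, where 99% accuracy is Q20 and 99.99% is Q40. The function names and the `min_q` and `min_cluster` parameters are hypothetical, chosen only for illustration.

```python
import math


def phred_to_accuracy(q):
    """Base-call accuracy implied by a phred score: 1 - 10**(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Inverse mapping: phred score implied by a base-call accuracy."""
    return -10 * math.log10(1.0 - accuracy)


def mismatch_clusters(reference, read, quals, min_q=20, min_cluster=3):
    """Group high-quality mismatches by type, e.g. ('G', 'A').

    Assumption: `reference`, `read`, and `quals` describe a gap-free
    alignment, so index i refers to the same position in all three.
    Returns {(ref_base, read_base): [positions]} for every mismatch type
    seen at least `min_cluster` times among bases with quality >= `min_q`.
    """
    if not (len(reference) == len(read) == len(quals)):
        raise ValueError("reference, read, and quals must have equal length")

    by_type = {}
    for pos, (ref_base, read_base, q) in enumerate(zip(reference, read, quals)):
        if ref_base != read_base and q >= min_q:
            by_type.setdefault((ref_base, read_base), []).append(pos)

    # Keep only mismatch types that recur often enough to count as a cluster.
    return {k: v for k, v in by_type.items() if len(v) >= min_cluster}


# Toy data: four G-to-A mismatches (one of low quality) and one G-to-T.
reference = "GATTGCAGGTCG"
read = "AATTACAAATCT"
quals = [40] * 12
quals[8] = 15  # low-quality base call at position 8

clusters = mismatch_clusters(reference, read, quals, min_q=20, min_cluster=3)
```

Raising `min_q` from 20 to 40 in this sketch mirrors the stricter threshold described above: a recurring mismatch type that survives only at the lower threshold is more likely to be a sequencing artifact than an editing event.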
Published in:
A Survey of Genomic Traces Reveals a Common Sequencing Error, RNA Editing, and DNA Editing. PLoS Genet 6(5): e1000954. doi:10.1371/journal.pgen.1000954
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1000954
Article published in: PLOS Genetics, 2010, Issue 5