Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data
Authors:
Modupeore O. Adetunji (1); Susan J. Lamont (2); Behnam Abasht (1); Carl J. Schmidt (1)
Author affiliations:
(1) Department of Animal and Food Sciences, University of Delaware, Newark, Delaware, United States of America
(2) Department of Animal Science, Iowa State University, Ames, Iowa, United States of America
Published in:
PLoS ONE 14(9)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0216838
Abstract
Transcriptome sequencing (RNA-seq) delivers a wealth of information, yet variant detection from RNA-seq data remains challenging because of the complexity of the transcriptome. Because RNA-seq reveals the active regions of the genome, SNPs detected from RNA-seq can be valuable for understanding the phenotypic diversity between populations. We therefore present a novel computational workflow named VAP (Variant Analysis Pipeline) that takes advantage of multiple splice-aware RNA-seq aligners to call SNPs in non-human models using RNA-seq data only. We applied VAP to RNA-seq data from a highly inbred chicken line and achieved high accuracy when compared with the matching whole genome sequencing (WGS) data. Over 65% of WGS coding variants were identified from RNA-seq. Further, our results revealed SNPs resulting from post-transcriptional modifications, such as RNA editing, which may represent potentially functional variation that would otherwise have been missed in genomic data. Although it can detect variants in expressed regions only, our method proves to be a reliable alternative for SNP identification using RNA-seq data. The source code and user manuals are available at https://modupeore.github.io/VAP/.
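The idea of combining calls from several aligners and checking them against WGS can be illustrated with a minimal sketch. It is not VAP's implementation: the abstract does not say how the aligner outputs are combined, so the sketch assumes a simple intersection of call sets, and the `Snp` record, the function names, and the toy coordinates are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Snp(NamedTuple):
    """Hypothetical minimal SNP record: chromosome, 1-based position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_snps(call_sets: Iterable[Set[Snp]]) -> Set[Snp]:
    """Keep only SNPs reported by every aligner's call set.

    Assumption: consensus means strict intersection. The real pipeline may
    combine aligner outputs differently.
    """
    sets = list(call_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


def fraction_recovered(rna_snps: Set[Snp], wgs_snps: Set[Snp]) -> float:
    """Fraction of WGS variants that also appear in the RNA-seq calls."""
    if not wgs_snps:
        return 0.0
    return len(rna_snps & wgs_snps) / len(wgs_snps)


# Toy call sets standing in for three splice-aware aligners (made-up data).
aligner_a = {Snp("1", 100, "A", "G"), Snp("1", 200, "C", "T"), Snp("2", 50, "G", "A")}
aligner_b = {Snp("1", 100, "A", "G"), Snp("1", 200, "C", "T")}
aligner_c = {Snp("1", 100, "A", "G"), Snp("1", 200, "C", "T"), Snp("3", 10, "T", "C")}

# Toy WGS truth set for the same sample (made-up data).
wgs = {
    Snp("1", 100, "A", "G"),
    Snp("1", 200, "C", "T"),
    Snp("2", 50, "G", "A"),
    Snp("4", 5, "A", "C"),
}

consensus = consensus_snps([aligner_a, aligner_b, aligner_c])
recovered = fraction_recovered(consensus, wgs)
print(f"{len(consensus)} consensus SNPs, {recovered:.0%} of WGS variants recovered")
```

Requiring agreement across aligners trades sensitivity for specificity: a call seen by only one aligner is dropped, which lowers the fraction of WGS variants recovered but removes aligner-specific artifacts. In practice the call sets would be read from VCF files rather than built by hand.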
Keywords:
Gene expression – Genome analysis – Molecular genetics – Alleles – Transcriptome analysis – RNA sequencing – Genotyping – RNA editing
Article published in:
PLOS One
2019, Issue 9