HaTSPiL: A modular pipeline for high-throughput sequencing data analysis
Authors:
Edoardo Morandi aff001; Matteo Cereda aff002; Danny Incarnato aff001; Caterina Parlato aff002; Giulia Basile aff002; Francesca Anselmi aff001; Andrea Lauria aff001; Lisa Marie Simon aff001; Isabelle Laurence Polignano aff001; Francesca Arruga aff002; Silvia Deaglio aff002; Elisa Tirtei aff004; Franca Fagioli aff004; Salvatore Oliviero aff001
Author affiliations:
aff001: Department of Life Sciences and System Biology, University of Turin, Turin, Italy
aff002: Italian Institute for Genomic Medicine (IIGM), Turin, Italy
aff003: Department of Medical Sciences, University of Turin, Turin, Italy
aff004: Paediatric Onco-Haematology, Stem Cell Transplantation and Cellular Therapy Division, City of Science and Health of Turin, Regina Margherita Children’s Hospital, Turin, Italy
Published in:
PLoS ONE 14(10)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0222512
Abstract
Background
Next-generation sequencing (NGS) methods are widely adopted for many scientific purposes, from pure research to health-related studies. Decreasing costs per analysis have led to large volumes of generated data and, in turn, to improved software for analysing them. As a consequence, many approaches have been developed to chain different software tools into reliable and reproducible workflows. However, the wide range of NGS applications makes it challenging to manage many different workflows without losing reliability.
Methods
Here we present HaTSPiL, a high-throughput sequencing pipeline implemented as a Python-powered CLI tool and designed to handle different data analysis approaches with a high level of reliability. The software relies on barcoding filenames with a human-readable naming convention that encodes all the sample information the software needs to choose workflows and parameters automatically. HaTSPiL is highly modular and customisable, allowing users to extend its features for specific needs.
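The idea of a filename barcode that drives workflow selection can be illustrated with a minimal Python sketch. The naming convention, the `parse_barcode` function, the `Barcode` class and the workflow names below are all hypothetical and chosen only for illustration; they are not HaTSPiL's actual barcode format or API, which is documented in its repository.

```python
import re
from dataclasses import dataclass

# Hypothetical convention, NOT HaTSPiL's real barcode format:
#   <project>-<sample>-<analyte>[.R1|.R2].fastq[.gz]
# where <analyte> is "D" (DNA) or "R" (RNA).
BARCODE_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)-(?P<sample>[A-Za-z0-9]+)-(?P<analyte>[DR])"
    r"(?:\.R[12])?\.fastq(?:\.gz)?$"
)

# Hypothetical mapping from analyte code to a workflow name.
WORKFLOWS = {"D": "dna_variant_calling", "R": "rna_expression"}


@dataclass(frozen=True)
class Barcode:
    """Sample information decoded from a filename."""

    project: str
    sample: str
    analyte: str

    @property
    def workflow(self) -> str:
        # The workflow is derived from the barcode alone, with no extra config.
        return WORKFLOWS[self.analyte]


def parse_barcode(filename: str) -> Barcode:
    """Decode a filename, rejecting names that do not follow the convention."""
    match = BARCODE_RE.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    return Barcode(**match.groupdict())
```

Under these assumptions, `parse_barcode("PRJ1-S042-D.R1.fastq.gz").workflow` selects the DNA workflow, while a file that does not match the convention is rejected up front rather than being processed with guessed parameters.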
Conclusions
HaTSPiL is Free Software released under the MIT license and is available at https://github.com/dodomorandi/hatspil.
Keywords:
User interfaces – Research validity – Next-generation sequencing – Computer software – Software tools – Software design – Programming languages – Mutational analysis
This article was published in
PLOS One
2019, Issue 10