A Likelihood-Based Framework for Variant Calling and Mutation Detection in Families
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
Vyšlo v časopise:
A Likelihood-Based Framework for Variant Calling and Mutation Detection in Families. PLoS Genet 8(10): e32767. doi:10.1371/journal.pgen.1002944
Kategorie:
Research Article
prolekare.web.journal.doi_sk:
https://doi.org/10.1371/journal.pgen.1002944
Souhrn
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
Zdroje
1. ConsortiumTGP (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.
2. BilguvarK, OzturkAK, LouviA, KwanKY, ChoiM, et al. (2010) Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations. Nature 467: 207–210.
3. NgSB, BighamAW, BuckinghamKJ, HannibalMC, McMillinMJ, et al. (2010) Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 42: 790–793.
4. NgSB, BuckinghamKJ, LeeC, BighamAW, TaborHK, et al. (2010) Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42: 30–35.
5. NejentsevS, WalkerN, RichesD, EgholmM, ToddJA (2009) Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science 324: 387–389.
6. PritchardJK, CoxNJ (2002) The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet 11: 2417–2423.
7. CirulliET, GoldsteinDB (2010) Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11: 415–425.
8. ManolioTA, CollinsFS, CoxNJ, GoldsteinDB, HindorffLA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753.
9. RoachJC, GlusmanG, SmitAF, HuffCD, HubleyR, et al. (2010) Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328: 636–639.
10. LiB, LealSM (2008) Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83: 311–321.
11. LiB, LealSM (2009) Discovery of rare variants via sequencing: implications for the design of complex trait association studies. PLoS Genet 5: e1000481 doi:10.1371/journal.pgen.1000481.
12. XuB, RoosJL, LevyS, van RensburgEJ, GogosJA, et al. (2008) Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet 40: 880–885.
13. VissersLE, de LigtJ, GilissenC, JanssenI, SteehouwerM, et al. (2010) A de novo paradigm for mental retardation. Nat Genet 42: 1109–1112.
14. GreenwaySC, PereiraAC, LinJC, DePalmaSR, IsraelSJ, et al. (2009) De novo copy number variants identify new genes and loci in isolated sporadic tetralogy of Fallot. Nat Genet 41: 931–935.
15. SebatJ, LakshmiB, MalhotraD, TrogeJ, Lese-MartinC, et al. (2007) Strong association of de novo copy number mutations with autism. Science 316: 445–449.
16. GirardSL, GauthierJ, NoreauA, XiongL, ZhouS, et al. (2011) Increased exonic de novo mutation rate in individuals with schizophrenia. Nat Genet 43: 860–863.
17. O'RoakBJ, DeriziotisP, LeeC, VivesL, SchwartzJJ, et al. (2011) Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet
18. NealeBM, KouY, LiuL, Ma'ayanA, SamochaKE, et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature
19. O'RoakBJ, VivesL, GirirajanS, KarakocE, KrummN, et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature
20. SandersSJ, MurthaMT, GuptaAR, MurdochJD, RaubesonMJ, et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature
21. DePristoMA, BanksE, PoplinR, GarimellaKV, MaguireJR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 43: 491–498.
22. LiH, HandsakerB, WysokerA, FennellT, RuanJ, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
23. CanningsC, ThompsonE (1978) Probability functions on complex pedigrees. Adv Appl Probab 10: 26–61.
24. ElstonRC, StewartJ (1970) A new test of association for continuous variables. Biometrics 26: 305–314.
25. ElstonRC, StewartJ (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21: 523–542.
26. LiH, RuanJ, DurbinR (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18: 1851–1858.
27. DanecekP, AutonA, AbecasisG, AlbersCA, BanksE, et al. (2011) The variant call format and VCFtools. Bioinformatics 27: 2156–2158.
28. Lange K (1997) Mathematical and Statisitcal Methods for Genetics Analysis. Dietz K Gail M, Kricheberg A, Tsiatis A, Samet J, editor New York: Springer.
29. Press W, Flannery BP, Teukolsky SA (1992) Numerical Recipes in C: The Art of Scientific Computing: Cambridge University Press.
30. HudsonRR (1990) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7.
31. ChenWM, AbecasisGR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81: 913–926.
32. ChenW, LiB (2011) LD based genotype calling in trios. Submitted xxx: xxx–xxx.
33. AlbersCA, LunterG, MacArthurDG, McVeanG, OuwehandWH, et al. (2011) Dindel: accurate indel calls from short-read data. Genome Res 21: 961–973.
34. DePristoMA, BanksE, PoplinR, GarimellaKV, MaguireJR, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43: 491–498.
35. McKennaA, HannaM, BanksE, SivachenkoA, CibulskisK, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20: 1297–1303.
36. LiH (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993.
37. SchaffnerSF, FooC, GabrielS, ReichD, DalyMJ, et al. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Res 15: 1576–1583.
38. AwadallaP, GauthierJ, MyersRA, CasalsF, HamdanFF, et al. (2010) Direct measure of the de novo mutation rate in autism and schizophrenia cohorts. Am J Hum Genet 87: 316–324.
39. LiH, DurbinR (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
40. PiliaG, ChenWM, ScuteriA, OrruM, AlbaiG, et al. (2006) Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet 2: e132 doi:10.1371/journal.pgen.0020132.
41. SannaS, PitzalisM, ZoledziewskaM, ZaraI, SidoreC, et al. (2010) Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat Genet 42: 495–497.
42. CartwrightR, HussinJ, KeebleJ, StoneEA, AwadallaP (2012) A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data. J Stat Appl Genet Mol Biol 11.
43. ConradDF, KeeblerJE, DePristoMA, LindsaySJ, ZhangY, et al. (2011) Variation in genome-wide mutation rates within and between human families. Nature genetics 43: 712–714.
44. LathropGM, LalouelJM (1984) Easy calculations of lod scores and genetic risks on small computers. Am J Hum Genet 36: 460–465.
45. LathropGM, LalouelJM, JulierC, OttJ (1984) Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci U S A 81: 3443–3446.
46. LathropGM, LalouelJM, JulierC, OttJ (1985) Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am J Hum Genet 37: 482–498.
47. LiY, SidoreC, KangHM, BoehnkeM, AbecasisGR (2011) Low-coverage sequencing: implications for design of complex trait association studies. Genome Res 21: 940–951.
Štítky
Genetika Reprodukčná medicínaČlánok vyšiel v časopise
PLOS Genetics
2012 Číslo 10
- Je „freeze-all“ pro všechny? Odborníci na fertilitu diskutovali na virtuálním summitu
- Gynekologové a odborníci na reprodukční medicínu se sejdou na prvním virtuálním summitu
Najčítanejšie v tomto čísle
- A Mutation in the Gene Causes Alternative Splicing Defects and Deafness in the Bronx Waltzer Mouse
- Classical Genetics Meets Next-Generation Sequencing: Uncovering a Genome-Wide Recombination Map in
- Mutations in (Hhat) Perturb Hedgehog Signaling, Resulting in Severe Acrania-Holoprosencephaly-Agnathia Craniofacial Defects
- Regulation of ATG4B Stability by RNF5 Limits Basal Levels of Autophagy and Influences Susceptibility to Bacterial Infection