
A Likelihood-Based Framework for Variant Calling and Mutation Detection in Families


Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next-generation sequence data in family samples. Our framework identifies variant sites accurately, assigns individual genotypes, and handles de novo mutation events, increasing the sensitivity and specificity of both variant calling and de novo mutation detection. Through simulations, we show that explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants and have high specificity for detecting de novo mutation events. The improvement in accuracy over the standard approach is particularly pronounced for low-frequency variants. Furthermore, the family-aware calling framework dramatically reduces Mendelian inconsistencies and benefits family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
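
The abstract does not spell out the model itself, so the following is only a minimal sketch of what a family-aware likelihood calculation can look like for a single biallelic site in a parent-offspring trio. It is not the authors' implementation. Everything in it is an illustrative assumption: the binomial read-count model, the Hardy-Weinberg prior for the parents, the symmetric per-transmission mutation rate `mu`, and all function names and default values.

```python
from itertools import product
from math import comb

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def read_likelihood(n_ref, n_alt, genotype, error=0.01):
    """Binomial likelihood of the read counts given a genotype (assumed model)."""
    p_alt = {0: error, 1: 0.5, 2: 1.0 - error}[genotype]
    return comb(n_ref + n_alt, n_alt) * p_alt**n_alt * (1.0 - p_alt) ** n_ref


def founder_prior(genotype, alt_freq):
    """Hardy-Weinberg prior for an unrelated founder (assumed)."""
    q = alt_freq
    return ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q**2)[genotype]


def transmit_alt(parent_genotype, mu):
    """Probability that a parent passes on the alternate allele,
    allowing the transmitted allele to mutate with probability mu."""
    p = parent_genotype / 2.0
    return p * (1.0 - mu) + (1.0 - p) * mu


def child_given_parents(child, father, mother, mu):
    """Mendelian transmission probability with de novo mutation."""
    a = transmit_alt(father, mu)
    b = transmit_alt(mother, mu)
    return ((1.0 - a) * (1.0 - b), a * (1.0 - b) + (1.0 - a) * b, a * b)[child]


def trio_posterior(reads, alt_freq=0.01, mu=1e-8, error=0.01):
    """Posterior over (father, mother, child) genotype configurations.

    reads maps 'father', 'mother', 'child' to (n_ref, n_alt) read counts.
    """
    joint = {}
    for f, m, c in product(GENOTYPES, repeat=3):
        joint[(f, m, c)] = (
            founder_prior(f, alt_freq)
            * founder_prior(m, alt_freq)
            * child_given_parents(c, f, m, mu)
            * read_likelihood(*reads["father"], f, error)
            * read_likelihood(*reads["mother"], m, error)
            * read_likelihood(*reads["child"], c, error)
        )
    total = sum(joint.values())
    return {config: value / total for config, value in joint.items()}


if __name__ == "__main__":
    # Hypothetical read counts: both parents look homozygous reference,
    # while the child shows balanced reference and alternate reads.
    reads = {"father": (30, 0), "mother": (30, 0), "child": (15, 15)}
    posterior = trio_posterior(reads)
    best = max(posterior, key=posterior.get)
    print("most probable (father, mother, child) genotypes:", best)
```

Because the child's genotype is tied to the parents' genotypes through the transmission term, well-covered parents can sharpen a poorly covered child's call, and a configuration that is only reachable through mutation is penalized by `mu` rather than ruled out. Larger pedigrees would need a general pedigree likelihood algorithm instead of this brute-force enumeration over 27 configurations.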


Published in: A Likelihood-Based Framework for Variant Calling and Mutation Detection in Families. PLoS Genet 8(10): e1002944. doi:10.1371/journal.pgen.1002944
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1002944


Tags: Genetics, Reproductive medicine

Article published in the journal PLOS Genetics, 2012, Issue 10