A Hybrid Likelihood Model for Sequence-Based Disease Association Studies


In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitations of classical single-marker association analysis for rare variants have been a challenge in such studies, and a new generation of statistical methods for case-control association studies has been developed to meet it. A common approach to association analysis of rare variants is to use burden-style collapsing methods, which combine rare variant data within individuals, either within a gene or across genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with a shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.
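To make the burden-style collapsing idea concrete, the following is a minimal sketch in Python. It is not the hybrid likelihood model proposed here, and it does not model variant positions. It only illustrates a simple carrier-indicator collapse followed by a one-sided Fisher exact test. The function names, the genotype encoding (rows are individuals, columns are rare variants in one gene, entries are rare-allele counts), and the toy data are all assumptions made for illustration.

```python
from math import comb


def collapse_carriers(genotypes):
    """Collapse each individual's rare-variant genotypes in one gene into a
    single indicator: 1 if the individual carries any rare allele, else 0.
    (Hypothetical helper; the encoding is an assumption.)"""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def fisher_exact_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value for an excess of carriers among cases,
    computed from the hypergeometric distribution."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(n_cases, carriers)
    p_value = 0.0
    # Sum the probabilities of tables at least as extreme as the observed one.
    for k in range(case_carriers, upper + 1):
        p_value += comb(n_cases, k) * comb(n_controls, carriers - k) / denom
    return p_value


# Toy data (illustrative only): 4 cases and 4 controls, 3 rare variant sites.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

case_flags = collapse_carriers(cases)
control_flags = collapse_carriers(controls)

p = fisher_exact_one_sided(
    sum(case_flags), len(case_flags), sum(control_flags), len(control_flags)
)
print(f"carriers: cases={sum(case_flags)}, controls={sum(control_flags)}, p={p:.4f}")
```

A one-sided collapse of this kind treats every rare variant as pointing in the same direction and ignores where in the gene each variant falls, which is one way to see why protective variants and positional clustering are singled out above as cases that benefit from a richer model.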

Published in: A Hybrid Likelihood Model for Sequence-Based Disease Association Studies. PLoS Genet 9(1): e1003224. doi:10.1371/journal.pgen.1003224
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003224

