Properties and Modeling of GWAS when Complex Disease Risk Is Due to Non-Complementing, Deleterious Mutations in Genes of Large Effect
Current genome-wide association studies (GWAS) have high power to detect intermediate-frequency SNPs making modest contributions to complex disease, but they are underpowered to detect rare alleles of large effect (RALE). This has led to speculation that the bulk of variation for most complex diseases is due to RALE. One concern with existing models of RALE is that they do not make explicit assumptions about the evolution of a phenotype and its molecular basis. Rather, much of the existing literature relies on arbitrary mapping of phenotypes onto genotypes obtained either from standard population-genetic simulation tools or from non-genetic models. We introduce a novel simulation of a 100-kilobase gene region, based on the standard definition of a gene, in which mutations are unconditionally deleterious, arise continuously, have partially recessive and non-complementing effects on phenotype (analogous to what is widely observed for most Mendelian disorders), and are interspersed with neutral markers that can be genotyped. Genes evolving according to this model exhibit a characteristic GWAS signature consisting of an excess of marginally significant markers. Existing tests for an excess burden of rare alleles in cases have low power, while a simple new statistic has high power to identify disease genes evolving under our model. The structure of linkage disequilibrium between causative mutations and significantly associated markers under our model differs fundamentally from that seen when rare causative markers are assumed to be neutral. Rather than tagging single haplotypes bearing a large number of rare causative alleles, we find that significant SNPs in a GWAS tend to tag single causative mutations of small effect relative to other mutations in the same gene. Our results emphasize the importance of evaluating the power to detect associations under models that are genetically and evolutionarily motivated.
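To make "partially recessive and non-complementing" concrete, the sketch below shows one possible genotype-to-phenotype map for a single gene region. It is a hypothetical illustration, not the authors' model: the function name `genetic_value`, the dominance value `h = 0.25`, the additive summing of effects within a haplotype, and the geometric mean used for two damaged haplotypes are all assumptions chosen for clarity.

```python
import math
from typing import Sequence


def genetic_value(hap1: Sequence[float], hap2: Sequence[float], h: float = 0.25) -> float:
    """Hypothetical genotype-to-phenotype map for one gene region.

    hap1, hap2: effect sizes (> 0) of the causative mutations on each
    haplotype; an empty sequence is a wild-type haplotype.
    h: assumed dominance coefficient; 0 <= h < 0.5 makes mutations
    partially recessive.
    """
    if not 0.0 <= h < 0.5:
        raise ValueError("h must lie in [0, 0.5) for partial recessivity")
    if any(e <= 0 for e in (*hap1, *hap2)):
        raise ValueError("causative effects are assumed to be positive")
    # Assumption: effects of mutations in cis (same haplotype) add up.
    b1, b2 = sum(hap1), sum(hap2)
    if hap1 and hap2:
        # Both copies carry damage (mutations in trans): they fail to
        # complement, so the genotype behaves like a mutant homozygote.
        # The geometric mean is one illustrative choice of combination.
        return math.sqrt(b1 * b2)
    # At most one damaged copy: the intact copy masks all but a fraction h.
    return h * (b1 + b2)


# Usage: the same two hypothetical mutations placed in cis versus in trans.
cis = genetic_value([0.2, 0.3], [])   # both mutations on one haplotype
trans = genetic_value([0.2], [0.3])   # one mutation on each haplotype
print(cis, trans)
```

Under this map the same pair of mutations has a larger effect in trans than in cis, which is the classical failure-to-complement signature; a homozygote for a single mutation expresses its full effect, while a heterozygote expresses only the fraction `h`.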
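The abstract does not define the authors' new statistic, so the following is only a generic, hypothetical stand-in for the idea of "an excess of marginally significant markers": count the markers in a gene region whose single-marker allelic chi-square p-value falls below a marginal threshold `alpha`, and calibrate that count by permuting case/control labels. The names `allelic_pvalue`, `count_marginal`, and `excess_marginal_test`, the default `alpha = 0.05`, `n_perm = 199`, and the toy data are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def allelic_pvalue(genotypes: Sequence[int], is_case: Sequence[bool]) -> float:
    """Allelic chi-square test (1 df) for one biallelic marker.

    genotypes: minor-allele counts (0, 1 or 2), one per individual.
    """
    a = b = c = d = 0  # case alt, case ref, control alt, control ref
    for g, case in zip(genotypes, is_case):
        if case:
            a += g
            b += 2 - g
        else:
            c += g
            d += 2 - g
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic marker or empty group: no information
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def count_marginal(markers: Sequence[Sequence[int]], is_case: Sequence[bool], alpha: float) -> int:
    """Number of markers with single-marker p-value below alpha.

    markers[j][i] is the genotype of individual i at marker j.
    """
    return sum(1 for m in markers if allelic_pvalue(m, is_case) < alpha)


def excess_marginal_test(markers: Sequence[Sequence[int]], is_case: Sequence[bool],
                         alpha: float = 0.05, n_perm: int = 199,
                         seed: int = 1) -> Tuple[int, float]:
    """Observed count and its label-permutation p-value."""
    rng = random.Random(seed)
    observed = count_marginal(markers, is_case, alpha)
    labels: List[bool] = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype association
        if count_marginal(markers, labels, alpha) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 100 cases then 100 controls. Markers 0-2 carry a
# heterozygous minor allele in every case; markers 3-4 are monomorphic.
is_case = [True] * 100 + [False] * 100
tagging = [1] * 100 + [0] * 100
markers = [tagging, tagging, tagging, [0] * 200, [0] * 200]
observed, p_perm = excess_marginal_test(markers, is_case)
print(observed, p_perm)
```

Permuting labels keeps the linkage disequilibrium among markers intact, so the null distribution of the count automatically accounts for correlated markers within the region; a real analysis would use far more permutations.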
Published in:
Properties and Modeling of GWAS when Complex Disease Risk Is Due to Non-Complementing, Deleterious Mutations in Genes of Large Effect. PLoS Genet 9(2): e1003258. doi:10.1371/journal.pgen.1003258
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003258
Published in PLOS Genetics, 2013, Issue 2.