All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs
Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.
Vyšlo v časopise:
All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs. PLoS Genet 9(4): e32767. doi:10.1371/journal.pgen.1003449
Kategorie:
Research Article
prolekare.web.journal.doi_sk:
https://doi.org/10.1371/journal.pgen.1003449
Souhrn
Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.
Zdroje
1. GlazierAM, NadeauJH, AitmanTJ (2002) Finding genes that underlie complex traits. Science 298: 2345–2349.
2. HirschhornJN, DalyMJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6: 95–108.
3. HindorffLA, SethupathyP, JunkinsHA, RamosEM, MehtaJP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362–9367.
4. ManolioTA, CollinsFS, CoxNJ, GoldsteinDB, HindorffLA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753.
5. YangJ, BenyaminB, McEvoyBP, GordonS, HendersAK, et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565–569.
6. YangJ, ManolioTA, PasqualeLR, BoerwinkleE, CaporasoN, et al. (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 43: 519–525.
7. StahlEA, WegmannD, TrynkaG, Gutierrez-AchuryJ, DoR, et al. (2012) Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet 44: 483–489.
8. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological): Blackwell Publishing. pp. 289–300.
9. SunL, CraiuRV, PatersonAD, BullSB (2006) Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genet Epidemiol 30: 519–530.
10. YooYJ, PinnaduwageD, WaggottD, BullSB, SunL (2009) Genome-wide association analyses of North American Rheumatoid Arthritis Consortium and Framingham Heart Study data utilizing genome-wide linkage results. BMC Proc 3 Suppl 7S103.
11. LiC, LiM, LangeEM, WatanabeRM (2008) Prioritized subset analysis: improving power in genome-wide association studies. Hum Hered 65: 129–141.
12. LinW-Y, LeeW-C (2010) Incorporating prior knowledge to facilitate discoveries in a genome-wide association study on age-related macular degeneration. BMC Research Notes 3: 1–5.
13. SunL, RommensJM, CorvolH, LiW, LiX, et al. (2012) Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nat Genet 44: 562–569.
14. HuangB, RangrejJ, PatersonAD, SunL (2007) The multiplicity problem in linkage analysis of gene expression data - the power of differentiating cis- and trans-acting regulators. BMC Proc 1 Suppl 1: S142.
15. KnightJ, BarnesMR, BreenG, WealeME (2011) Using functional annotation for the empirical determination of Bayes Factors for genome-wide association study analysis. PLoS ONE 6: e14808 doi:10.1371/journal.pone.0014808.
16. SmithEN, KollerDL, PanganibanC, SzelingerS, ZhangP, et al. (2011) Genome-wide association of bipolar disorder suggests an enrichment of replicable associations in regions near genes. PLoS Genet 7: e1002134 doi:10.1371/journal.pgen.1002134.
17. Efron B (2010) Large-scale inference : empirical Bayes methods for estimation, testing, and prediction. Cambridge ; New York: Cambridge University Press. xii, 263 p. p.
18. SchwederT, SpjotvollE (1982) Plots of P-Values to Evaluate Many Tests Simultaneously. Biometrika 69: 493–502.
19. YangJ, WeedonMN, PurcellS, LettreG, EstradaK, et al. (2011) Genomic inflation factors under polygenic inheritance. Eur J Hum Genet 19: 807–812.
20. DevlinB, RoederK (1999) Genomic control for association studies. Biometrics 55: 997–1004.
21. HamshereML, WaltersJT, SmithR, RichardsAL, GreenE, et al. (2012) Genome-wide significant associations in schizophrenia to ITIH3/4, CACNA1C and SDCCAG8, and extensive replication of associations reported by the Schizophrenia PGC. Mol Psychiatry
22. BenjaminiY, HochbergY (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
23. CraiuRV, SunL (2008) Choosing the lesser evil: Trade-off between false discovery rate and non-discovery rate. Statistica Sinica 18: 861–879.
24. ConsortiumIS, PurcellSM, WrayNR, StoneJL, VisscherPM, et al. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460: 748–752.
25. SchwederT, SpjøtvollE (1982) Plots of P-values to evaluate many tests simultaneously. Biometrika 69: 493–502.
26. FlintJ, MackayTF (2009) Genetic architecture of quantitative traits in mice, flies, and humans. Genome Res 19: 723–733.
27. KeaneTM, GoodstadtL, DanecekP, WhiteMA, WongK, et al. (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477: 289–294.
28. SoHC, GuiAH, ChernySS, ShamPC (2011) Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet Epidemiol 35: 310–317.
29. SoHC, YipBH, ShamPC (2010) Estimating the total number of susceptibility variants underlying complex diseases from genome-wide association studies. PLoS ONE 5: e13898 doi:10.1371/journal.pone.0013898.
30. PawitanY, SengKC, MagnussonPK (2009) How many genetic variants remain to be discovered? PLoS ONE 4: e7969 doi:10.1371/journal.pone.0007969.
31. Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics. Essex, England: Longman. xiii, 464 p. p.
32. VisscherPM, GoddardME, DerksEM, WrayNR (2012) Evidence-based psychiatric genetics, AKA the false dichotomy between common and rare variant hypotheses. Mol Psychiatry 17: 474–485.
33. MignoneF, GissiC, LiuniS, PesoleG (2002) Untranslated regions of mRNAs. Genome Biol 3: REVIEWS0004.
34. SiepelA, BejeranoG, PedersenJS, HinrichsAS, HouM, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034–1050.
35. KingMC, WilsonAC (1975) Evolution at two levels in humans and chimpanzees. Science 188: 107–116.
36. CooperGM, ShendureJ (2011) Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 12: 628–640.
37. SpeliotesEK, WillerCJ, BerndtSI, MondaKL, ThorleifssonG, et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42: 937–948.
38. HeidIM, JacksonAU, RandallJC, WinklerTW, QiL, et al. (2010) Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42: 949–960.
39. FrankeA, McGovernDP, BarrettJC, WangK, Radford-SmithGL, et al. (2010) Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet 42: 1118–1125.
40. AndersonCA, BoucherG, LeesCW, FrankeA, D'AmatoM, et al. (2011) Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 43: 246–252.
41. Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium (2011) Genome-wide association study identifies five new schizophrenia loci. Nat Genet 43: 969–976.
42. Psychiatric GWAS Consortium Bipolar Disorder Working Group (2011) Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet 43: 977–983.
43. Tobacco and Genetics Consortium (2010) Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat Genet 42: 441–447.
44. EhretGB, MunroePB, RiceKM, BochudM, JohnsonAD, et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478: 103–109.
45. TeslovichTM, MusunuruK, SmithAV, EdmondsonAC, StylianouIM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713.
46. Purcell S (2009) Plink. 1.07 ed.
47. PurcellS, NealeB, Todd-BrownK, ThomasL, FerreiraMA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
48. HsuF, KentWJ, ClawsonH, KuhnRM, DiekhansM, et al. (2006) The UCSC Known Genes. Bioinformatics 22: 1036–1046.
49. StoreyJD, TaylorJE, SiegmundD (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B-Statistical Methodology 66: 187–205.
50. StoreyJD (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society Series B-Statistical Methodology 64: 479–498.
51. SchwartzmanA, LinX (2011) The effect of correlation in false discovery rate estimation. Biometrika 98: 199–214.
Štítky
Genetika Reprodukčná medicínaČlánok vyšiel v časopise
PLOS Genetics
2013 Číslo 4
- Je „freeze-all“ pro všechny? Odborníci na fertilitu diskutovali na virtuálním summitu
- Gynekologové a odborníci na reprodukční medicínu se sejdou na prvním virtuálním summitu
Najčítanejšie v tomto čísle
- The G4 Genome
- Neutral Genomic Microevolution of a Recently Emerged Pathogen, Serovar Agona
- The Histone Demethylase Jarid1b Ensures Faithful Mouse Development by Protecting Developmental Genes from Aberrant H3K4me3
- The Tissue-Specific RNA Binding Protein T-STAR Controls Regional Splicing Patterns of Pre-mRNAs in the Brain