Gene-Based Tests of Association
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.
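As a rough illustration of the two ingredients named above (greedy model selection within a gene, and permutation-based p-values), the following minimal sketch may help. It is not the authors' implementation: it assumes a quantitative trait, uses the plain Bayesian information criterion (BIC) as a stand-in for the Bayesian model score, which this summary does not specify, and yields a within-gene p-value only, with no genome-wide correction. The function names `greedy_bic_select` and `gene_p_value` are hypothetical.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def greedy_bic_select(genotypes, phenotype):
    """Forward-select SNPs in one gene; return (selected indices, BIC gain).

    genotypes: list of SNP columns (allele dosages per individual).
    phenotype: quantitative trait values, one per individual.
    Assumption: plain BIC stands in for the Bayesian model score.
    """
    n = len(phenotype)
    resid = _center(phenotype)
    cols = [_center(g) for g in genotypes]
    orig_ss = [_dot(c, c) for c in cols]
    rss = max(_dot(resid, resid), 1e-12)
    null_bic = n * math.log(rss / n)
    best_bic, selected = null_bic, []
    while len(selected) < len(cols):
        # Find the SNP that removes the most residual variance, given
        # that columns are kept orthogonal to the SNPs already selected.
        best_j, best_drop = None, 0.0
        for j, c in enumerate(cols):
            cc = _dot(c, c)
            if j in selected or cc <= 1e-8 * orig_ss[j]:
                continue  # already used, constant, or redundant (in LD)
            drop = _dot(c, resid) ** 2 / cc
            if drop > best_drop:
                best_j, best_drop = j, drop
        if best_j is None:
            break
        new_rss = max(rss - best_drop, 1e-12)
        new_bic = n * math.log(new_rss / n) + (len(selected) + 1) * math.log(n)
        if new_bic >= best_bic:
            break  # the extra SNP does not pay for its complexity penalty
        c = cols[best_j]
        cc = _dot(c, c)
        beta = _dot(c, resid) / cc
        resid = [r - beta * x for r, x in zip(resid, c)]
        # Gram-Schmidt step: remove the chosen SNP from the remaining ones.
        for j in range(len(cols)):
            if j != best_j and j not in selected:
                b = _dot(cols[j], c) / cc
                cols[j] = [x - b * y for x, y in zip(cols[j], c)]
        selected.append(best_j)
        rss, best_bic = new_rss, new_bic
    return selected, null_bic - best_bic


def gene_p_value(genotypes, phenotype, n_perm=200, seed=0):
    """Permutation p-value for the gene-level BIC gain (within-gene only)."""
    rng = random.Random(seed)
    _, observed = greedy_bic_select(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        _, stat = greedy_bic_select(genotypes, shuffled)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the whole selection procedure is rerun on each shuffled phenotype, the permutation p-value in this sketch accounts for having searched over all SNPs and model sizes within the gene. The number of selected SNPs gives a crude count of independent effects.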
Published in:
Gene-Based Tests of Association. PLoS Genet 7(7): e1002177. doi:10.1371/journal.pgen.1002177
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002177
Tags
Genetics, Reproductive medicine
This article was published in
PLOS Genetics
2011, Issue 7