Improved Statistics for Genome-Wide Interaction Analysis
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.
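To make the idea of a case-only statistic "based on correlation coefficients" concrete, here is a minimal sketch in Python. It is not the authors' adjusted statistic, whose formula is not given in this summary. It uses the textbook large-sample result that, when two loci are independent, n·r² is approximately chi-square distributed with 1 degree of freedom. The function names, the 0/1/2 allele-count coding, and the assumptions that the disease is rare and the loci are in linkage equilibrium in the general population are all illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a locus is monomorphic in this sample")
    return sxy / math.sqrt(sxx * syy)


def case_only_correlation_test(g1, g2):
    """Hypothetical case-only test: correlation of genotypes among cases.

    g1, g2: genotypes at two loci for the same cases, coded as
    allele counts 0, 1, 2 (an assumption of this sketch).
    Returns (statistic, p_value), with statistic = n * r^2 referred
    to a chi-square distribution with 1 degree of freedom.
    """
    n = len(g1)
    r = pearson_r(g1, g2)
    stat = n * r * r
    # Chi-square (1 df) upper tail: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy genotypes for nine cases, invented purely for illustration
    locus_a = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    locus_b = [0, 1, 2, 0, 1, 1, 0, 2, 2]
    print(case_only_correlation_test(locus_a, locus_b))
```

A statistic of this form tests only for correlation between the two loci among cases. If the loci are in linkage disequilibrium in the general population, it will be inflated regardless of any interaction, which is one reason the case-only design relies on the independence assumption stated above.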
Published in:
Improved Statistics for Genome-Wide Interaction Analysis. PLoS Genet 8(4): e1002625. doi:10.1371/journal.pgen.1002625
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002625
References
1. Wu X, Dong H, Luo L, Zhu Y, Peng G (2010) A novel statistic for genome-wide interaction analysis. PLoS Genet 6: e1001131. doi:10.1371/journal.pgen.1001131
2. WTCCC (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661-678.
3. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889-894.
4. Todd J, Walker N, Cooper J, Smyth D, Downes K (2007) Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet 39: 857-864.
5. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL (2008) Meta-analysis of genomewide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40: 638-645.
6. Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH (2008) Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet 40: 955-962.
7. Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS (2011) Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis. Nat Genet 43: 329-332.
8. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA (2009) Finding the missing heritability of complex diseases. Nature 461: 747-753.
9. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565-569.
10. Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 43: 519-525.
11. Siemiatycki J, Thomas DC (1981) Biological models and statistical interactions: an example from multistage carcinogenesis. International Journal of Epidemiology 10: 383-387.
12. Thompson WD (1991) Effect modification and the limits of biological inference from epidemiologic data. Journal of Clinical Epidemiology 44: 221-232.
13. Phillips PC (1998) The language of gene interaction. Genetics 149: 1167-1171.
14. Cordell HJ (2002) Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Molec Genet 11: 2463-2468.
15. McClay JL, van den Oord EJ (2006) Variance component analysis of polymorphic metabolic systems. J Theor Biol 240: 149-159.
16. Phillips PC (2008) Epistasis–the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9: 855-867.
17. Cordell HJ (2009) Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet 10: 392-404.
18. Clayton DG (2009) Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genet 5: e1000540. doi:10.1371/journal.pgen.1000540
19. Wang X, Elston R, Zhu X (2010) The meaning of interaction. Hum Hered 70: 269-277.
20. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ (2007) Exploiting gene-environment interaction to detect genetic associations. Hum Hered 63: 111-119.
21. Gauderman WJ (2002) Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol 155: 478-484.
22. Chapman J, Clayton D (2007) Detecting association using epistatic information. Genet Epidemiol 31: 894-909.
23. Piegorsch WW, Weinberg CR, Taylor JA (1994) Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Statistics in Medicine 13: 153-162.
24. Yang Q, Khoury MJ, Sun F, Flanders WD (1999) Case-only design to measure gene-gene interaction. Epidemiology 10: 167-170.
25. Weinberg CR, Umbach DM (2000) Choosing a retrospective design to assess joint genetic and environmental contributions to risk. Am J Epidemiol 152: 197-203.
26. Bhattacharjee S, Wang Z, Ciampa J, Kraft P, Chanock S (2010) Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. Am J Hum Genet 86: 331-342.
27. Zaykin DV, Meng Z, Ehm MG (2006) Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet 78: 737-746.
28. Sasieni P (1997) From genotypes to genes: doubling the sample size. Biometrics 53: 1253-1261.
29. Brown A (1975) Sample sizes required to detect linkage disequilibrium between two or three loci. Theoretical Population Biology 8: 184-201.
30. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.
31. Lewontin R (1988) On measures of gametic disequilibrium. Genetics 120: 849-852.
32. Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7: 781-791.
33. Wellek S, Ziegler A (2009) A genotype based approach to assessing the association between single nucleotide polymorphisms. Hum Hered 67: 128-139.
34. Kam-Thong T, Czamara D, Tsuda K, Borgwardt K, Lewis CM (2011) EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units. Eur J Hum Genet 19: 465-471.
35. Delaneau O, Marchini J, Zagury JF (2011) A linear complexity phasing method for thousands of genomes. Nature Methods. doi:10.1038/nmeth.1785
36. Mukherjee B, Chatterjee N (2008) Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics 64: 685-694.
37. Chatterjee N, Carroll RJ (2005) Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika 92: 399-418.
38. Ciampa J, Yeager M, Amundadottir L, Jacobs K, Kraft P (2011) Large-scale exploration of gene-gene interactions in prostate cancer using a multistage genome-wide association study. Cancer Res 71: 3287-3295.
Article published in:
PLOS Genetics, 2012, Issue 4