Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
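The liability threshold idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single standardized covariate, a liability of the form beta × covariate + residual with unit total variance, and disease whenever liability exceeds a threshold set by population prevalence. The function names, the example values of `beta` and `prevalence`, and the correlation-based 1-df statistic are all illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def posterior_mean_residual(is_case, covariate, prevalence, beta):
    """Expected residual liability given disease status and covariate.

    Assumed model (illustrative only):
        liability = beta * covariate + e,  e ~ N(0, 1 - beta**2)
        case  <=>  liability > threshold,  threshold = Phi^-1(1 - prevalence)
    Conditioning on status truncates e, so its mean is a truncated-normal mean.
    """
    sigma = sqrt(1.0 - beta ** 2)
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    a = (threshold - beta * covariate) / sigma  # standardized cut point for e
    pdf, cdf = STD_NORMAL.pdf(a), STD_NORMAL.cdf(a)
    if is_case:
        return sigma * pdf / (1.0 - cdf)  # mean of e given e above the cut
    return -sigma * pdf / cdf  # mean of e given e below the cut


def score_statistic(genotypes, residuals):
    """N times the squared Pearson correlation of genotype and residual.

    Under the null of no association this is approximately chi-square
    with 1 degree of freedom (hypothetical test form for this sketch).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_r = sum(residuals) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_rr = sum((r - mean_r) ** 2 for r in residuals)
    s_gr = sum((g - mean_g) * (r - mean_r) for g, r in zip(genotypes, residuals))
    return n * s_gr ** 2 / (s_gg * s_rr)


if __name__ == "__main__":
    # Hypothetical inputs: 8% prevalence, covariate effect 0.4 on liability.
    samples = [(True, -1.0, 2), (True, 1.0, 1), (False, 0.0, 0), (False, 0.5, 1)]
    residuals = [posterior_mean_residual(s, c, 0.08, 0.4) for s, c, _ in samples]
    genotypes = [g for _, _, g in samples]
    print(residuals)
    print(score_statistic(genotypes, residuals))
```

Under these assumptions, a case with a low covariate value receives a larger expected residual liability than a case with a high covariate value. This is one way to see why low-BMI cases could be more informative about genetic risk than high-BMI cases.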
Published in:
Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies. PLoS Genet 8(11): e1003032. doi:10.1371/journal.pgen.1003032
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003032
Article published in:
PLOS Genetics, 2012, Issue 11