Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests


Many common diseases are driven by complex interactions between multiple genetic factors. Disease-specific genome-wide association studies have been the most prominent tool for cataloging such factors: they examine genetic variation across a population and its association with the disease. However, these studies often fail to capture interactions between genes, despite their importance. Interactions are notoriously difficult to investigate because testing the large number of possible interactions with contemporary statistical methods requires very large sample sizes and substantial computational resources. We have taken a step forward by developing a new statistical methodology that substantially reduces these requirements, making the study of interactions more feasible. We show that our methodology makes it possible to study interactions on a large scale without compromising statistical accuracy. We further demonstrate its utility on coronary artery disease data and discover three distinct interactions that might provide new clues to the disease's pathophysiology. The study of genetic interactions has the potential to link disease genes into disease networks that describe in more detail how genes interact and how those interactions drive disease.
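To make the scale of the problem concrete, the sketch below counts the pairwise tests in an exhaustive two-SNP scan and the per-test significance threshold they imply. It uses a plain Bonferroni correction, a standard textbook adjustment chosen here only for illustration; it is not the stage-wise likelihood ratio procedure of the paper. The function name `bonferroni_thresholds` and the marker counts are hypothetical.

```python
from math import comb


def bonferroni_thresholds(n_snps: int, alpha: float = 0.05) -> tuple[int, float, float]:
    """Return (number of SNP pairs, single-SNP threshold, pairwise threshold).

    Bonferroni divides the family-wise error rate alpha by the number of tests.
    This is an illustrative correction, not the paper's stage-wise method.
    """
    n_pairs = comb(n_snps, 2)      # m * (m - 1) / 2 unordered SNP pairs
    single_snp = alpha / n_snps    # one marginal test per SNP
    pairwise = alpha / n_pairs     # one interaction test per SNP pair
    return n_pairs, single_snp, pairwise


if __name__ == "__main__":
    # Hypothetical marker counts, chosen only for illustration.
    for m in (1_000, 50_000, 500_000):
        pairs, single, pair = bonferroni_thresholds(m)
        print(f"{m:>7,} SNPs: {pairs:>15,} pairs; "
              f"threshold {single:.1e} per SNP vs {pair:.1e} per pair")
```

For 500,000 SNPs this gives roughly 1.25 × 10^11 pairs and a per-pair threshold near 4 × 10^-13, compared with 1 × 10^-7 for single-SNP tests. Reaching the same power at such a stringent threshold requires a larger sample, which is the burden the summary describes.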

Published in: Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests. PLoS Genet 11(9): e1005502. doi:10.1371/journal.pgen.1005502
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1005502

