Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis
Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software.
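As an illustration of what an edge weight that "quantifies the synergy between variants with respect to the phenotype" could look like, the following minimal sketch computes interaction information, I(A,B;C) - I(A;C) - I(B;C), for two discrete markers A and B and a phenotype C. The choice of this particular measure is an assumption here, since the abstract does not state the formula. The function names and the toy XOR data are hypothetical and are not taken from the authors' software.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x_cols, y):
    """I(X; Y) in bits, where X is the joint of one or more discrete columns."""
    return entropy(*x_cols) + entropy(y) - entropy(*x_cols, y)


def synergy(snp_a, snp_b, phenotype):
    """Interaction information I(A,B;C) - I(A;C) - I(B;C) (assumed edge weight)."""
    joint = mutual_information([snp_a, snp_b], phenotype)
    return (joint
            - mutual_information([snp_a], phenotype)
            - mutual_information([snp_b], phenotype))


# Hypothetical toy data: the phenotype is the XOR of two binary markers,
# so neither marker is informative alone but the pair is fully informative.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]

edge_weight = synergy(snp_a, snp_b, phenotype)
main_effect_a = mutual_information([snp_a], phenotype)
print(edge_weight, main_effect_a)
```

In this toy case each marker has zero main effect while the pair carries one full bit about the phenotype, which is the kind of purely epistatic signal that single-locus tests miss.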
Published in:
Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis. PLoS Genet 5(3): e1000432. doi:10.1371/journal.pgen.1000432
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1000432
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2009, Issue 3