
Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis


Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate for common multigenic diseases, because such methods test association with the phenotype one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data because they account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data-mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine-learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software.
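To make "edges quantify the synergy between variants with respect to the phenotype" concrete, here is a minimal sketch of a GAIN-style matrix. It assumes that the diagonal holds each SNP's main effect as the mutual information I(A;C) with the phenotype C, and that each edge holds the pairwise interaction information (McGill's multivariate information transmission), I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), estimated from simple counts. The function names (`entropy`, `mutual_information`, `interaction_information`, `gain_matrix`) are hypothetical illustrations, not the API of the McKinneyLab software.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from joint counts."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(a, b, c):
    """Synergy of variants a and b with respect to phenotype c.

    I(A,B;C) - I(A;C) - I(B;C): positive when the pair tells more about
    the phenotype jointly than the two variants do separately.
    """
    joint_ab = list(zip(a, b))
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


def gain_matrix(genotypes, phenotype):
    """Hypothetical GAIN-style matrix for a list of SNP genotype vectors.

    Diagonal: main effect I(A_i;C). Off-diagonal: pairwise synergy.
    """
    m = len(genotypes)
    g = [[0.0] * m for _ in range(m)]
    for i in range(m):
        g[i][i] = mutual_information(genotypes[i], phenotype)
    for i, j in combinations(range(m), 2):
        w = interaction_information(genotypes[i], genotypes[j], phenotype)
        g[i][j] = g[j][i] = w
    return g


if __name__ == "__main__":
    # Toy XOR-style data: the phenotype depends only on the pair of SNPs.
    snp_a = [0, 0, 1, 1]
    snp_b = [0, 1, 0, 1]
    status = [0, 1, 1, 0]  # case/control = snp_a XOR snp_b
    print(gain_matrix([snp_a, snp_b], status))  # [[0.0, 1.0], [1.0, 0.0]]
```

In this purely epistatic toy case neither SNP has a main effect (diagonal of 0 bits), yet the pair carries one full bit about the phenotype, so the edge weight is 1.0. This is the kind of interaction that a one-locus-at-a-time test misses. Plug-in entropy estimates are biased upward for small samples, so a real analysis would need far more individuals than this illustration.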


Published in: Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis. PLoS Genet 5(3): e1000432. doi:10.1371/journal.pgen.1000432
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1000432


