
Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies


Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly, because genes may interact with each other to affect disease risk. Much knowledge about biological pathways and interactions has accumulated in the literature, and appropriately incorporating such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have recently been developed to prioritize genes using prior biological knowledge such as pathways, most treat the genes in a pathway as an exchangeable set and ignore the pathway's topological structure. However, how genes are related to each other within a pathway may be very informative for identifying association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF) model that incorporates pathway topology into association analysis. We show that the conditional distribution of our MRF model takes a simple logistic regression form, and we propose an iterated conditional modes algorithm as well as a decision-theoretic approach for statistical inference of each gene's association with disease. Simulation studies show that our proposed framework is more effective at identifying genes associated with disease than a single-gene-based method. We also illustrate the usefulness of our approach by applying it to a real data example.
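To make the idea of a logistic-form conditional distribution and iterated conditional modes (ICM) concrete, the following is a minimal sketch, not the model as specified in the paper. It assumes a generic auto-logistic prior in which each neighboring gene in the pathway contributes a vote of +1 or -1, a single interaction parameter `tau`, a baseline `h`, and a per-gene `evidence` value standing in for a log-likelihood ratio from a single-gene test. All names, parameter values, and the toy pathway are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conditional_prob(gene, labels, neighbors, evidence, h, tau):
    """P(gene is associated | labels of all other genes, data).

    Assumed logistic form: logit = h + tau * (neighbor votes) + evidence.
    """
    # Each neighbor votes +1 if currently labeled associated, -1 otherwise.
    vote = sum(1 if labels[j] == 1 else -1 for j in neighbors[gene])
    return sigmoid(h + tau * vote + evidence[gene])

def icm(neighbors, evidence, h=-1.0, tau=0.5, max_sweeps=50):
    """Iterated conditional modes: update each label to its conditional mode."""
    # Start from the labels suggested by single-gene evidence alone.
    labels = {g: 1 if h + evidence[g] > 0 else 0 for g in neighbors}
    for _ in range(max_sweeps):
        changed = False
        for g in neighbors:
            p = conditional_prob(g, labels, neighbors, evidence, h, tau)
            new_label = 1 if p > 0.5 else 0
            if new_label != labels[g]:
                labels[g] = new_label
                changed = True
        if not changed:  # stop once a full sweep changes no label
            break
    return labels

# Hypothetical toy pathway: chain A - B - C, plus an isolated gene D.
pathway = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
# Hypothetical evidence values: B and D have the same weak signal.
evidence = {"A": 3.0, "B": 0.8, "C": 3.0, "D": 0.8}
result = icm(pathway, evidence)
```

In this toy example, B and D carry identical single-gene evidence, but B sits between two strongly supported genes and is labeled as associated, while the isolated gene D is not. This is the kind of borrowing of strength across pathway topology that treating a pathway as an exchangeable gene set cannot provide.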


Published in: Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies. PLoS Genet 7(4): e1001353. doi:10.1371/journal.pgen.1001353
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1001353





Article published in the journal PLOS Genetics, 2011, Issue 4