Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies
Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly, because genes may interact with each other to affect disease risk. Much knowledge about biological pathways and interactions has accumulated in the literature, and appropriately incorporating such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have recently been developed to prioritize genes using prior biological knowledge such as pathways, most treat the genes in a given pathway as an exchangeable set and ignore the pathway's topological structure. However, how genes are related to each other within a pathway may be very informative for identifying association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF) model that incorporates pathway topology into association analysis. We show that the conditional distribution of our MRF model takes a simple logistic regression form, and we propose an iterated conditional modes algorithm as well as a decision-theoretic approach for statistical inference of each gene's association with disease. Simulation studies show that our proposed framework identifies genes associated with disease more effectively than a single gene–based method. We also illustrate the usefulness of our approach by applying it to a real data example.
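To make the idea of iterated conditional modes on a pathway graph concrete, the following is a minimal sketch and not the model specified in the article. It assumes a binary association label per gene, a conditional log-odds that is linear in the neighbors' labels (a logistic form), and a gene-level z-score that is N(0, 1) under no association and N(alt_mean, 1) under association. The names `h`, `tau`, `alt_mean`, and all functions and numbers below are hypothetical choices made for illustration only.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def conditional_log_odds(i, labels, neighbors, z, h, tau, alt_mean):
    """Log-odds that gene i is associated, given its neighbors' current labels.

    Prior part (assumed auto-logistic form): h + tau * sum over neighbors of (+1/-1).
    Data part (assumed Gaussian z-scores): log-likelihood ratio of z[i].
    """
    neighbor_vote = sum(2 * labels[j] - 1 for j in neighbors[i])
    prior = h + tau * neighbor_vote
    llr = normal_logpdf(z[i], alt_mean, 1.0) - normal_logpdf(z[i], 0.0, 1.0)
    return prior + llr


def single_gene_calls(z, alt_mean):
    """Baseline that ignores the pathway: call gene i when its own LLR is positive."""
    return [1 if zi > alt_mean / 2 else 0 for zi in z]


def icm(neighbors, z, h=-1.0, tau=1.0, alt_mean=3.0, max_sweeps=50):
    """Iterated conditional modes: update each label to its conditional mode
    in turn, and stop when a full sweep changes nothing."""
    labels = single_gene_calls(z, alt_mean)  # start from the data-only calls
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(z)):
            log_odds = conditional_log_odds(i, labels, neighbors, z, h, tau, alt_mean)
            new_label = 1 if log_odds > 0 else 0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical pathway: a chain of five genes 0-1-2-3-4 with made-up z-scores.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
z = [3.5, 1.4, 3.2, 0.1, -0.3]

baseline = single_gene_calls(z, alt_mean=3.0)
with_pathway = icm(neighbors, z)
print(baseline)      # [1, 0, 1, 0, 0]
print(with_pathway)  # [1, 1, 1, 0, 0]
```

In this toy example, gene 1 has a moderate z-score that the data-only baseline does not call, but it sits between two strongly associated neighbors, so the neighbor term lifts its conditional log-odds above zero. How strongly neighbors should influence each other (here `tau`) is a modeling choice that this sketch simply fixes by hand.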
Published in:
Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies. PLoS Genet 7(4): e1001353. doi:10.1371/journal.pgen.1001353
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1001353
Tags
Genetics, Reproductive medicine
Article published in the journal
PLOS Genetics
2011, Issue 4