Pathways of Distinction Analysis: A New Technique for Multi–SNP Analysis of GWAS Data


Genome-wide association studies (GWAS) have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, complex diseases (cancers, diabetes, and Alzheimer's disease, among others) are unlikely to have a single causative gene. Thus, there is a pressing need for multi–SNP analysis methods that can reveal system-level differences between cases and controls. Here, we present a novel multi–SNP GWAS analysis method called Pathways of Distinction Analysis (PoDA). The method uses GWAS data and known pathway–gene and gene–SNP associations to identify pathways that, ideally, permit cases to be distinguished from controls. The technique is based upon the hypothesis that, if a pathway is related to disease risk, cases will appear more similar to other cases than to controls (or vice versa) for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds true, i.e., pathways containing SNPs for which the samples exhibit greater within-class than across-class similarity. Importantly, PoDA improves on existing single–SNP and SNP–set enrichment analyses in that it does not require the SNPs in a pathway to exhibit independent main effects. This permits PoDA to reveal pathways in which epistatic interactions drive risk. In this paper, we detail the PoDA method and apply it to two GWAS: one of breast cancer and the other of liver cancer. The results obtained strongly suggest that there exist pathway-wide genomic differences that contribute to disease susceptibility. PoDA thus provides an analytical tool that is complementary to existing techniques and has the power to enrich our understanding of disease genomics at the systems level.
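
The hypothesis above, that for a disease-related pathway a sample resembles its own class more than the other class, can be illustrated with a small sketch. This is not the PoDA statistic itself, which the abstract does not specify. It is a toy relative-distance score under stated assumptions: genotypes are coded as minor-allele counts (0, 1, or 2), a pathway is given as a list of SNP column indices, and the function names (`allele_freqs`, `relative_distance`, `pathway_scores`) and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Genotype = List[int]  # minor-allele counts (0, 1, or 2), one entry per SNP


def allele_freqs(samples: List[Genotype], snps: Sequence[int]) -> List[float]:
    """Minor-allele frequency of each selected SNP across a group of samples."""
    n = len(samples)
    return [sum(s[j] for s in samples) / (2.0 * n) for j in snps]


def relative_distance(sample: Genotype, cases: List[Genotype],
                      controls: List[Genotype], snps: Sequence[int]) -> float:
    """Sum over pathway SNPs of (distance to controls - distance to cases).

    Positive values mean the sample sits closer to the case group.
    """
    p_case = allele_freqs(cases, snps)
    p_ctrl = allele_freqs(controls, snps)
    score = 0.0
    for k, j in enumerate(snps):
        y = sample[j] / 2.0  # the sample's own allele fraction at SNP j
        score += abs(y - p_ctrl[k]) - abs(y - p_case[k])
    return score


def pathway_scores(cases: List[Genotype], controls: List[Genotype],
                   snps: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Score every sample, leaving it out of its own group's frequencies."""
    case_scores = [
        relative_distance(s, cases[:i] + cases[i + 1:], controls, snps)
        for i, s in enumerate(cases)
    ]
    ctrl_scores = [
        relative_distance(s, cases, controls[:i] + controls[i + 1:], snps)
        for i, s in enumerate(controls)
    ]
    return case_scores, ctrl_scores


# Toy data (invented for illustration): 4 SNPs, of which SNPs 0 and 1
# are assumed to belong to the pathway being tested.
cases = [[2, 2, 0, 1], [2, 1, 1, 0], [1, 2, 0, 1]]
controls = [[0, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 0]]
pathway_snps = [0, 1]

case_scores, ctrl_scores = pathway_scores(cases, controls, pathway_snps)
print("case scores:   ", case_scores)
print("control scores:", ctrl_scores)
```

In this toy setting, a pathway for which the hypothesis holds yields case scores that tend to be positive and control scores that tend to be negative. Each sample is left out of its own group's frequencies so that it cannot inflate its own similarity. Because the score aggregates over all SNPs in the pathway, no single SNP needs to separate the classes on its own.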

Published in: Pathways of Distinction Analysis: A New Technique for Multi–SNP Analysis of GWAS Data. PLoS Genet 7(6): e1002101. doi:10.1371/journal.pgen.1002101
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1002101

