Pathways-Driven Sparse Regression Identifies Pathways and Genes Associated with High-Density Lipoprotein Cholesterol in Two Asian Cohorts
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast, gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts, we identify a number of candidate pathways, including those associated with cardiomyopathy, and T cell receptor and PPAR signalling.
Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
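The resampling idea in the abstract (ranking pathways by how consistently they are selected across subsamples) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes non-overlapping groups, a plain group lasso penalty fitted by proximal gradient descent, half-sample subsampling, and simulated data. The function names `group_lasso` and `selection_frequencies`, the penalty weight `lam`, and all data sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient group lasso for NON-overlapping groups (an assumption).

    Minimises (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2,
    where each entry of `groups` is an index array for one hypothetical pathway.
    """
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the least-squares gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        for g in groups:
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            # Group soft-thresholding: a whole group is zeroed or shrunk together.
            z[g] = 0.0 if norm <= thresh else z[g] * (1.0 - thresh / norm)
        b = z
    return b

def selection_frequencies(X, y, groups, lam, n_subsamples=20, seed=0):
    """Fraction of random half-samples in which each group is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(len(groups))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)   # centre predictors within the subsample
        ys = y[idx] - y[idx].mean()         # centre the trait within the subsample
        b = group_lasso(Xs, ys, groups, lam)
        counts += np.array([np.any(b[g] != 0) for g in groups], dtype=float)
    return counts / n_subsamples

# Simulated example: 8 hypothetical pathways of 5 predictors each;
# only the first pathway affects the trait.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 40))
groups = [np.arange(5 * k, 5 * k + 5) for k in range(8)]
y = X[:, :5].sum(axis=1) + rng.standard_normal(200)

freq = selection_frequencies(X, y, groups, lam=0.3)
ranking = np.argsort(-freq)  # pathways ordered by selection frequency
print("selection frequencies:", freq)
print("pathway ranking:", ranking)
```

Ranking by selection frequency rather than by a single fit is what makes the ordering less sensitive to the particular sample drawn. The method described in the abstract additionally handles variants that overlap multiple pathways and selects genes within pathways, neither of which this sketch attempts.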
Published in:
Pathways-Driven Sparse Regression Identifies Pathways and Genes Associated with High-Density Lipoprotein Cholesterol in Two Asian Cohorts. PLoS Genet 9(11): e1003939. doi:10.1371/journal.pgen.1003939
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003939
Tags: Genetics, Reproductive medicine
The article was published in PLOS Genetics, 2013, Issue 11