Regularized Machine Learning in the Genetic Prediction of Complex Traits
This article has no abstract.
Published in:
Regularized Machine Learning in the Genetic Prediction of Complex Traits. PLoS Genet 10(11): e1004754. doi:10.1371/journal.pgen.1004754
Category:
Review
DOI:
https://doi.org/10.1371/journal.pgen.1004754
References
1. WeiZ, WangK, QuHQ, ZhangH, BradfieldJ, et al. (2009) From disease association to risk assessment: an optimistic view from genome-wide association studies on type 1 diabetes. PLoS Genet 5: e1000678.
2. OkserS, LehtimäkiT, EloLL, MononenN, PeltonenN, et al. (2010) Genetic variants and their interactions in the prediction of increased pre-clinical carotid atherosclerosis: the cardiovascular risk in young Finns study. PLoS Genet 6: e1001146.
3. KruppaJ, ZieglerA, KönigIR (2012) Risk estimation and risk prediction using machine-learning methods. Hum Genet 131: 1639–1654.
4. WeiZ, WangW, BradfieldJ, LiJ, CardinaleC, et al. (2013) Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease. Am J Hum Genet 92: 1008–1012.
5. OkserS, PahikkalaT, AittokallioT (2013) Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 6: 5. doi:10.1186/1756-0381-6-5
6. SzymczakS, BiernackaJM, CordellHJ, González-RecioO, KönigIR, et al. (2009) Machine learning in genome-wide association studies. Genet Epidemiol 33 Suppl 1: S51–S57.
7. MooreJH, AsselbergsFW, WilliamsSM (2010) Bioinformatics challenges for genome-wide association studies. Bioinformatics 26: 445–455.
8. KooperbergC, LeBlancM, ObenchainV (2010) Risk prediction using genome-wide association studies. Genet Epidemiol 34: 643–652.
9. KraftP, WacholderS, CornelisMC, HuFB, HayesRB, et al. (2009) Beyond odds ratios: communicating disease risk based on genetic profiles. Nat Rev Genet 10: 264–269.
10. AshleyEA, ButteAJ, WheelerMT, ChenR, KleinTE, et al. (2010) Clinical assessment incorporating a personal genome. Lancet 375: 1525–1535.
11. ManolioTA (2013) Bringing genome-wide association findings into clinical use. Nat Rev Genet 14: 549–558.
12. LehnerB (2011) Molecular mechanisms of epistasis within and between genes. Trends Genet 27: 323–331.
13. LehnerB (2007) Modelling genotype-phenotype relationships and human disease with genetic interaction networks. J Exp Biol 210: 1559–1566.
14. MooreJH, WilliamsSM (2009) Epistasis and its implications for personal genetics. Am J Hum Genet 85: 309–320.
15. AshworthA, LordCJ, Reis-FilhoJS (2011) Genetic interactions in cancer progression and treatment. Cell 145: 30–38.
16. BroughR, FrankumJR, Costa-CabralS, LordCJ, AshworthA (2011) Searching for synthetic lethality in cancer. Curr Opin Genet Dev 21: 34–41.
17. CordellHJ (2009) Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet 10: 392–404.
18. GibsonG (2010) Hints of hidden heritability in GWAS. Nat Genet 42: 558–560.
19. InouyeM, RipattiS, KettunenJ, LyytikäinenLP, OksalaN, et al. (2012) Novel loci for metabolic networks and multi-tissue expression studies reveal genes for atherosclerosis. PLoS Genet 8: e1002907.
20. RipattiS, TikkanenE, Orho-MelanderM, HavulinnaAS, SilanderK, et al. (2010) A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet 376: 1393–1400.
21. WineingerNE, HarperA, LibigerO, SrinivasanSR, ChenW, et al. (2013) Front Genet 4: 86.
22. SilverM, ChenP, LiR, ChengCY, WongTY, et al. (2013) Pathways-driven sparse regression identifies pathways and genes associated with high-density lipoprotein cholesterol in two Asian cohorts. PLoS Genet 9: e1003939.
23. CheR, Motsinger-ReifAA (2013) Evaluation of genetic risk score models in the presence of interaction and linkage disequilibrium. Front Genet 4: 138.
24. AbrahamG, KowalczykA, ZobelJ, InouyeM (2013) Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease. Genet Epidemiol 37: 184–195.
25. EvansDM, VisscherPM, WrayNR (2009) Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum Mol Genet 18: 3525–3531.
26. ShiG, BoerwinkleE, MorrisonAC, GuCC, ChakravartiA, et al. (2011) Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS. Genet Epidemiol 35: 111–118.
27. JakobsdottirJ, GorinMB, ConleyYP, FerrellRE, WeeksDE (2009) Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet 5: e1000337.
28. WrayNR, YangJ, HayesBJ, PriceAL, GoddardME, et al. (2013) Pitfalls of predicting complex traits from SNPs. Nat Rev Genet 14: 507–515.
29. JostinsL, BarrettJC (2011) Genetic risk prediction in complex disease. Hum Mol Genet 20: R182–188.
30. PahikkalaT, OkserS, AirolaA, SalakoskiT, AittokallioT (2012) Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms Mol Biol 7: 11. doi:10.1186/1748-7188-7-11
31. ChatterjeeN, WheelerB, SampsonJ, HartgeP, ChanockS, et al. (2013) Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet 45: 400–405.
32. DudbridgeF (2013) Power and Predictive Accuracy of Polygenic Risk Scores. PLoS Genet 9: e1003348.
33. DoCB, HindsDA, FranckeU, ErikssonN (2012) Comparison of family history and SNPs for predicting risk of complex disease. PLoS Genet 8: e1002973.
34. YangJ, BenyaminB, McEvoyBP, GordonS, HendersAK, et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565–569.
35. MakowskyR, PajewskiNM, KlimentidisYC, VazquezAI, DuarteCW, et al. (2011) Beyond missing heritability: prediction of complex traits. PLoS Genet 7: e1002051.
36. MaherB (2008) Personal genomes: The case of the missing heritability. Nature 456: 18–21.
37. EichlerEE, FlintJ, GibsonG, KongA, LealSM, et al. (2010) Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11: 446–450.
38. GibsonG (2012) Rare and common variants: twenty arguments. Nat Rev Genet 13: 135–145.
39. MihaescuR, PencinaMJ, AlonsoA, LunettaKL, HeckbertSR, et al. (2013) Incremental value of rare genetic variants for the prediction of multifactorial diseases. Genome Med 5: 76.
40. HuntKA, MistryV, BockettNA, AhmadT, BanM, et al. (2013) Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 498: 232–235.
41. ManorO, SegalE (2013) Predicting disease risk using bootstrap ranking and classification algorithms. PLoS Comput Biol 9: e1003200.
42. MooreCB, WallaceJR, WolfeDJ, FraseAT, PendergrassSA, et al. (2013) Low frequency variants, collapsed based on biological knowledge, uncover complexity of population stratification in 1000 genomes project data. PLoS Genet 9: e1003959.
43. ZhouH, SehlME, SinsheimerJS, LangeK (2010) Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26: 2375–2382.
44. BloomJS, EhrenreichIM, LooWT, LiteTL, KruglyakL (2013) Finding the sources of missing heritability in a yeast cross. Nature 494: 234–237.
45. Rat Genome Sequencing and Mapping Consortium (2013) Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat Genet 45: 767–775.
46. BurgaA, LehnerB (2012) Beyond genotype to phenotype: why the phenotype of an individual cannot always be predicted from their genome sequence and the environment that they experience. FEBS J 279: 3765–3775.
47. LehnerB (2013) Genotype to phenotype: lessons from model organisms for human genetics. Nat Rev Genet 14: 168–178.
48. QueitschC, CarlsonKD, GirirajanS (2012) Lessons from model organisms: phenotypic robustness and missing heritability in complex disease. PLoS Genet 8: e1003041.
49. BurgaA, LehnerB (2013) Predicting phenotypic variation from genotypes, phenotypes and a combination of the two. Curr Opin Biotechnol 24: 803–809.
50. ParkS, LehnerB (2013) Epigenetic epistatic interactions constrain the evolution of gene expression. Mol Syst Biol 9: 645.
51. HuangY, WuchtyS, PrzytyckaTM (2013) eQTL epistasis - challenges and computational approaches. Front Genet 4: 51.
52. ManorO, SegalE (2013) Robust prediction of expression differences among human individuals using only genotype information. PLoS Genet 9: e1003396.
53. GoldingerA, HendersAK, McRaeAF, MartinNG, GibsonG, et al. (2013) Genetic and Non-Genetic Variation Revealed for the Principal Components of Human Gene Expression. Genetics 195: 1117–1128.
54. GalvanA, IoannidisJP, DraganiTA (2010) Beyond genome-wide association studies: genetic heterogeneity and individual predisposition to cancer. Trends Genet 26: 132–141.
55. MachielaMJ, ChenCY, ChenC, ChanockSJ, HunterDJ, et al. (2011) Evaluation of polygenic risk scores for predicting breast and prostate cancer risk. Genet Epidemiol 35: 506–514.
56. UrbachD, LupienM, KaragasMR, MooreJH (2012) Cancer heterogeneity: origins and implications for genetic association studies. Trends Genet 28: 538–543.
57. GibsonG, VisscherPM (2013) From personalized to public health genomics. Genome Med 5: 60.
58. BrombergY (2013) Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 425: 3993–4005.
59. WuJ, PfeifferRM, GailMH (2013) Strategies for developing prediction models from genome-wide association studies. Genet Epidemiol 37: 768–777.
60. WarrenH, CasasJP, HingoraniA, DudbridgeF, WhittakerJ (2014) Genetic prediction of quantitative lipid traits: comparing shrinkage models to gene scores. Genet Epidemiol 38: 72–83.
61. de Los CamposG, VazquezAI, FernandoR, KlimentidisYC, SorensenD (2013) Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet 9: e1003608.
62. Hennings-YeomansPH, CooperGF (2012) Improving the prediction of clinical outcomes from genomic data using multiresolution analysis. IEEE/ACM Trans Comput Biol Bioinform 9: 1442–1450.
63. SolovieffN, CotsapasC, LeePH, PurcellSM, SmollerJW (2013) Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet 14: 483–495.
64. SilverM, JanousovaE, HuaX, ThompsonPM, MontanaG, et al. (2012) Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression. Neuroimage 63: 1681–1694.
65. SchifanoED, LiL, ChristianiDC, LinX (2013) Genome-wide association analysis for multiple continuous secondary phenotypes. Am J Hum Genet 92: 744–759.
66. MarttinenP, GillbergJ, HavulinnaA, CoranderJ, KaskiS (2013) Genome-wide association studies with high-dimensional phenotypes. Stat Appl Genet Mol Biol 12: 413–431.
67. MutshindaCM, NoykovaN, SillanpääMJ (2012) A hierarchical Bayesian approach to multi-trait clinical quantitative trait locus modeling. Front Genet 3: 97.
68. HartleySW, MontiS, LiuCT, SteinbergMH, SebastianiP (2012) Bayesian methods for multivariate modeling of pleiotropic SNP associations and genetic risk prediction. Front Genet 3: 176.
69. HartleySW, SebastianiP (2013) PleioGRiP: genetic risk prediction with pleiotropy. Bioinformatics 29: 1086–1088.
70. BottoloL, Chadeau-HyamM, HastieDI, ZellerT, LiquetB, et al. (2013) GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm. PLoS Genet 9: e1003657.
71. MarttinenP, PirinenM, SarinAP, GillbergJ, KettunenJ, et al. (2014) Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression. Bioinformatics 30: 2026–2034.
72. CarterGW, HaysM, ShermanA, GalitskiT (2012) Use of pleiotropy to model genetic interactions in a population. PLoS Genet 8: e1003010.
73. KimYA, PrzytyckaTM (2012) Bridging the gap between genotype and phenotype via network approaches. Front Genet 3: 227.
74. BebekG, KoyutürkM, PriceND, ChanceMR (2012) Network biology methods integrating biological data for translational science. Brief Bioinform 13: 446–459.
75. MitraK, CarvunisAR, RameshSK, IdekerT (2013) Integrative approaches for finding modular structure in biological networks. Nat Rev Genet 14: 719–732.
76. Upstill-GoddardR, EcclesD, FliegeJ, CollinsA (2013) Machine learning approaches for the discovery of gene-gene interactions in disease data. Brief Bioinform 14: 251–260.
77. LuC, LatourelleJ, O'ConnorGT, DupuisJ, KolaczykED (2013) Network-guided sparse regression modeling for detection of gene-by-gene interactions. Bioinformatics 29: 1241–1249.
78. SuC, AndrewA, KaragasMR, BorsukME (2013) Using Bayesian networks to discover relations between genes, environment, and disease. BioData Min 6: 6.
79. BushWS, MooreJH (2012) Chapter 11: Genome-wide association studies. PLoS Comput Biol 8: e1002822.
80. SunX, LuQ, MukherjeeS, CranePK, ElstonR, et al. (2014) Analysis pipeline for the epistasis search - statistical versus biological filtering. Front Genet 5: 106.
81. SebastianiP, SolovieffN, SunJ (2012) Naive Bayesian classifier and genetic risk score for genetic risk prediction of a categorical trait: not so different after all! Front Genet 3: 26.
82. TibshiraniR (1996) Regression shrinkage and selection via the Lasso. J Royal Stat Soc B 58: 267–288.
83. ZouH, HastieT (2005) Regularization and variable selection via the elastic net. J Royal Stat Soc B 67: 301–320.
84. WaldmannP, MészárosG, GredlerB, FuerstC, SölknerJ (2013) Evaluation of the lasso and the elastic net in genome-wide association studies. Front Genet 4: 270.
85. WuTT, ChenYF, HastieT, SobelE, LangeK (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25: 714–721.
86. HoggartCJ, WhittakerJC, De IorioM, BaldingDJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4: e1000130.
87. AbrahamG, KowalczykA, ZobelJ, InouyeM (2012) SparSNP: fast and memory-efficient analysis of all SNPs for phenotype prediction. BMC Bioinformatics 13: 88.
88. ChenGK (2012) A scalable and portable framework for massively parallel variable selection in genetic association studies. Bioinformatics 28: 719–720.
89. HoffmanGE, LogsdonBA, MezeyJG (2013) PUMA: a unified framework for penalized multiple regression analysis of GWAS data. PLoS Comput Biol 9: e1003101.
90. BreimanL (2001) Random Forests. Machine Learning 45: 5–32.
91. GoldsteinBA, HubbardAE, CutlerA, BarcellosLF (2010) An application of Random Forests to a genome-wide association dataset: methodological considerations and new findings. BMC Genet 11: 49.
92. BoulesteixAL, BenderA, Lorenzo BermejoJ, StroblC (2012) Random forest Gini importance favours SNPs with large minor allele frequency: impact, sources and recommendations. Brief Bioinform 13: 292–304.
93. LiJ, DasK, FuG, LiR, WuR (2011) The Bayesian lasso for genome-wide association studies. Bioinformatics 27: 516–523.
94. PeltolaT, MarttinenP, JulaA, SalomaaV, PerolaM, et al. (2012) Bayesian variable selection in searching for additive and dominant effects in genome-wide data. PLoS ONE 7: e29115.
95. ZhouX, CarbonettoP, StephensM (2013) Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet 9: e1003264.
96. MiltonJN, GordeukVR, TaylorJG, GladwinMT, SteinbergMH, et al. (2014) Prediction of fetal hemoglobin in sickle cell anemia using an ensemble of genetic risk prediction models. Circ Cardiovasc Genet 7: 110–115.
97. BrownG, WyattJL, TinoP (2005) Managing diversity in regression Ensembles. J Mach Learn Res 6: 1621–1650.
98. PoggioT, RifkinR, MukherjeeS, RakhlinA (2002) Bagging regularizes. CBCL Memo 214. MIT AI lab Available: http://cbcl.mit.edu/publications/ai-publications/2002/AIM-2002-003.pdf. Accessed 24 June 2014.
99. GerfoLL, RosascoL, OdoneF, De VitoE, VerriA (2008) Spectral algorithms for supervised learning. Neural Comput 20: 1873–1897.
100. MitchellTJ, BeauchampJJ (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83: 1023–1036.
101. Robnik-SikonjaM, KononenkoI (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 53: 23–69.
102. YangP, HoJW, YangYH, ZhouBB (2011) Gene-gene interaction filtering with ensemble of filters. BMC Bioinformatics 12 Suppl 1: S10.
103. McKinneyBA, CroweJE, GuoJ, TianD (2009) Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis. PLoS Genet 5: e1000432.
104. ZhaoY, ChenF, ZhaiR, LinX, WangZ, et al. (2012) Correction for population stratification in random forest analysis. Int J Epidemiol 41: 1798–1806.
105. RakitschB, LippertC, StegleO, BorgwardtK (2013) A Lasso multi-marker mixed model for association mapping with population structure correction. Bioinformatics 29: 206–214.
106. YangJ, ZaitlenNA, GoddardME, VisscherPM, PriceAL (2014) Advantages and pitfalls in the application of mixed-model association methods. Nat Genet 46: 100–106.
107. HajilooM, SapkotaY, MackeyJR, RobsonP, GreinerR, et al. (2013) ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction. BMC Bioinformatics 14: 61.
108. MengYA, YuY, CupplesLA, FarrerLA, LunettaKL (2009) Performance of random forest when SNPs are in linkage disequilibrium. BMC Bioinformatics 10: 78.
109. BottaV, LouppeG, GeurtsP, WehenkelL (2014) Exploiting SNP correlations within random forest for genome-wide association studies. PLoS ONE 9: e93379.
110. MaloN, LibigerO, SchorkNJ (2008) Accommodating linkage disequilibrium in genetic-association analyses via ridge regression. Am J Hum Genet 82: 375–385.
111. HeQ, LinDY (2011) A variable selection method for genome-wide association studies. Bioinformatics 27: 1–8.
112. OberU, ErbeM, LongN, PorcuE, SchlatherM, et al. (2011) Predicting genetic values: a kernel-based best linear unbiased prediction with genomic data. Genetics 188: 695–708.
113. WimmerV, AlbrechtT, AuingerHJ, SchönCC (2012) Synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics 28: 2086–2087.
114. OberU, AyrolesJF, StoneEA, RichardsS, ZhuD, et al. (2012) Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genet 8: e1002685.
115. WimmerV, LehermeierC, AlbrechtT, AuingerHJ, WangY, et al. (2013) Genome-wide prediction of traits with different genetic architecture through efficient variable selection. Genetics 195: 573–587.
116. ZhangZ, OberU, ErbeM, ZhangH, GaoN, et al. (2014) Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS ONE 9: e93017.
117. SpeedD, BaldingDJ (2014) MultiBLUP: improved SNP-based prediction for complex traits. Genome Res 24: 1550–1557.
118. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678.
119. PedregosaF, VaroquauxG, GramfortA, MichelV, ThirionB, et al. (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12: 2825–2830.
Tags
Genetics, Reproductive medicine
Article published in the journal
PLOS Genetics
2014, Issue 11