Polygenic Modeling with Bayesian Sparse Linear Mixed Models


Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a “Bayesian sparse linear mixed model” (BSLMM) that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html.
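The hybrid idea described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' software or their MCMC algorithm: it assumes that every variant carries a small normally distributed effect and that a fraction of variants carry an additional larger effect, then computes the realized PVE of the simulated phenotype. The function names and the parameters pi, sigma_a and sigma_b are hypothetical choices made for this illustration.

```python
import numpy as np


def simulate_genotypes(n, p, rng):
    """Simulate an n x p matrix of 0/1/2 allele counts, standardized by column."""
    maf = rng.uniform(0.1, 0.5, size=p)  # hypothetical allele frequencies
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def simulate_effects(p, pi, sigma_a, sigma_b, rng):
    """Draw p effect sizes from an assumed two-part prior.

    Every variant gets a small effect with standard deviation sigma_b / sqrt(p);
    each variant additionally gets, with probability pi, a larger effect with
    standard deviation sigma_a.
    """
    small = rng.normal(0.0, sigma_b / np.sqrt(p), size=p)
    is_large = rng.random(p) < pi
    large = np.where(is_large, rng.normal(0.0, sigma_a, size=p), 0.0)
    return small + large, is_large


def realized_pve(genetic, noise):
    """Share of variance due to the genetic component in one simulated sample."""
    vg = np.var(genetic)
    return vg / (vg + np.var(noise))


rng = np.random.default_rng(0)
X = simulate_genotypes(n=500, p=200, rng=rng)
settings = [
    ("LMM-like (pi = 0)", 0.0, 0.0, 1.0),
    ("sparse (sigma_b = 0)", 0.05, 0.3, 0.0),
    ("hybrid", 0.05, 0.3, 0.7),
]
for label, pi, sigma_a, sigma_b in settings:
    beta, is_large = simulate_effects(X.shape[1], pi, sigma_a, sigma_b, rng)
    noise = rng.normal(0.0, 1.0, size=X.shape[0])
    pve = realized_pve(X @ beta, noise)
    print(f"{label}: {int(is_large.sum())} large effects, PVE = {pve:.3f}")
```

Setting pi to 0 leaves only the small effects shared by all variants, which corresponds to the LMM assumption. Setting sigma_b to 0 leaves only the sparse set of larger effects, which corresponds to the sparse regression assumption. The hybrid setting allows both at once.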

Published in: Polygenic Modeling with Bayesian Sparse Linear Mixed Models. PLoS Genet 9(2): e1003264. doi:10.1371/journal.pgen.1003264
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003264
