Beyond Missing Heritability: Prediction of Complex Traits
Despite rapid advances in genomic technology, our ability to account for phenotypic variation using genetic information remains limited for many traits. This has unfortunately resulted in limited application of genetic data towards preventive and personalized medicine, one of the primary impetuses of genome-wide association studies. Recently, a large proportion of the “missing heritability” for human height was statistically explained by modeling thousands of single nucleotide polymorphisms (SNPs) concurrently. However, it is currently unclear how gains in explained genetic variance will translate to the prediction of yet-to-be-observed phenotypes. Using data from the Framingham Heart Study, we explore the genomic prediction of human height in training and validation samples while varying the statistical approach used, the number of SNPs included in the model, the validation scheme, and the number of subjects used to train the model. In our training datasets, we are able to explain a large proportion of the variation in height (h² up to 0.83, R² up to 0.96). However, the proportion of variance accounted for in validation samples is much smaller (ranging from 0.15 to 0.36 depending on the degree of familial information used in the training dataset). While such R² values vastly exceed what has been previously reported using a reduced number of pre-selected markers (<0.10), given the heritability of the trait (∼0.80), substantial room for improvement remains.
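The gap between training and validation R² described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' analysis: the genotypes and phenotypes are simulated rather than taken from the Framingham Heart Study, ridge regression stands in for the whole-genome regression methods used in the study, and the sample sizes, number of SNPs, penalty `lam`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def simulate_genotypes(n, p, rng):
    """Draw n individuals x p SNPs coded 0/1/2 (simulated, unrelated individuals)."""
    freqs = rng.uniform(0.1, 0.9, size=p)
    return rng.binomial(2, freqs, size=(n, p)).astype(float)

def ridge_fit(X, y, lam):
    """Ridge regression of y on all SNPs at once; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solve an n x n system, which is cheaper when p > n.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), yc)
    beta = Xc.T @ alpha
    return y_mean - x_mean @ beta, beta

def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

rng = np.random.default_rng(0)
n_train, n_valid, p, h2 = 500, 500, 2000, 0.8  # illustrative values only

# Simulated additive trait with heritability h2: var(g) = 1, var(e) = (1 - h2) / h2.
X = simulate_genotypes(n_train + n_valid, p, rng)
g = X @ rng.normal(0.0, 1.0, size=p)
g = (g - g.mean()) / g.std()
y = g + rng.normal(0.0, np.sqrt((1.0 - h2) / h2), size=g.shape)

X_train, y_train = X[:n_train], y[:n_train]
X_valid, y_valid = X[n_train:], y[n_train:]

lam = 200.0  # assumed penalty; in practice it would be tuned or estimated
intercept, beta = ridge_fit(X_train, y_train, lam)

r2_train = r_squared(y_train, intercept + X_train @ beta)
r2_valid = r_squared(y_valid, intercept + X_valid @ beta)
print(f"training R2:   {r2_train:.2f}")
print(f"validation R2: {r2_valid:.2f}")
```

Because the simulated individuals are unrelated, this sketch corresponds most closely to a validation scheme with no familial information in the training data. With many more markers than training subjects, the model fits the training phenotypes closely while predicting held-out phenotypes far less well, which is the pattern the abstract reports.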
Published in:
Beyond Missing Heritability: Prediction of Complex Traits. PLoS Genet 7(4): e1002051. doi:10.1371/journal.pgen.1002051
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002051
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2011, Issue 4