Using Whole-Genome Sequence Data to Predict Quantitative Trait Phenotypes in
Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic phenotype prediction has been applied to data from single-nucleotide polymorphism (SNP) genotyping platforms, but not to complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the cross-validated correlation between predicted genetic values and observed phenotypes, and found a predictive ability of 0.239±0.008 for starvation resistance and 0.230±0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selecting the 5% of SNPs with either the highest absolute effect or the highest variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium, and not from population structure or from SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
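The GBLUP workflow summarized above (a genomic relationship matrix built from SNPs, prediction of genetic values, and cross-validated correlation with observed phenotypes) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the relationship matrix follows the commonly used centered cross-product form G = ZZ'/(2Σp(1−p)), genotypes of inbred lines are coded 0 or 2, the heritability `h2` is fixed rather than estimated, the fold count is arbitrary, and the data are simulated. All function names are hypothetical, and only the Python standard library is used.

```python
import random

def genomic_relationship_matrix(M):
    """M: lines x SNPs, allele counts (0 or 2 for inbred lines). Assumed form: ZZ' / (2*sum p(1-p))."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]  # allele frequencies
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]    # centered genotypes
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def gblup_predict(G, y, train, test, h2=0.5):
    """Predict test lines as mu + G[test, train] (G[train, train] + lam*I)^-1 (y_train - mu)."""
    lam = (1.0 - h2) / h2                       # residual-to-genetic variance ratio (assumed known)
    mu = sum(y[i] for i in train) / len(train)  # training mean as the only fixed effect
    A = [[G[i][k] + (lam if i == k else 0.0) for k in train] for i in train]
    alpha = solve(A, [y[i] - mu for i in train])
    return [mu + sum(G[t][i] * a for i, a in zip(train, alpha)) for t in test]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cross_validated_predictive_ability(G, y, folds=5, h2=0.5, seed=0):
    """Mean over folds of the correlation between predictions and observed phenotypes."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cors = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred = gblup_predict(G, y, train, test, h2)
        cors.append(pearson(pred, [y[i] for i in test]))
    return sum(cors) / len(cors)

def simulate_toy_data(n_lines, n_snps, seed=0):
    """Simulated inbred-line genotypes and phenotypes with heritability near 0.5 (illustration only)."""
    rng = random.Random(seed)
    M = [[rng.choice((0, 2)) for _ in range(n_snps)] for _ in range(n_lines)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    y = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0.0, n_snps ** 0.5)
         for row in M]
    return M, y

M, y = simulate_toy_data(n_lines=40, n_snps=200)
G = genomic_relationship_matrix(M)
print(round(cross_validated_predictive_ability(G, y), 3))
```

In practice the variance ratio would be estimated from the data (for example by restricted maximum likelihood) rather than fixed, and a matrix library would replace the hand-written solver for anything beyond toy dimensions.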
Published in:
Using Whole-Genome Sequence Data to Predict Quantitative Trait Phenotypes in. PLoS Genet 8(5): e1002685. doi:10.1371/journal.pgen.1002685
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002685
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2012, Issue 5