Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits
Finding genes involved in adaptation to the environment has long been of interest to evolutionary biologists and ecologists. Most commonly, researchers look for loci whose allelic states correlate with a particular trait or with an environmental variable such as temperature. The implicit assumption behind such methods is that environmentally driven natural selection shapes variation in adaptive traits through associated changes in allele frequencies. Both environmental and phenotypic variation are therefore relevant for detecting adaptive genes, although our knowledge of how the two types of variation relate to adaptation is incomplete. Here we present a method that aims to identify adaptive genes by combining phenotypic and environmental data. We first predict trait variation from a set of environmental variables, as a way to extract the most biologically relevant information from the environment, and then look for genes associated with both the predicted and the observed trait. Using simulations and published data from the model plant Arabidopsis thaliana, we show that this approach may find adaptive genes more effectively than existing methods. We also demonstrate that predicted traits can be used to identify relevant loci in individuals for which no phenotypic data are available.
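The two steps described above (predict the trait from the environment, then test genotypes against the predicted and observed trait together) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: plain ordinary least squares stands in for the environment-to-trait model, the joint test is a per-SNP multivariate regression scored with Wilks' lambda, population structure and kinship are ignored (a real Arabidopsis GWAS would need to control for them), and the data are simulated. The function names `fit_env_model`, `predict_trait` and `joint_snp_scan` are hypothetical.

```python
import numpy as np


def fit_env_model(env, trait):
    """Fit trait ~ intercept + environmental variables by ordinary least squares.

    OLS is only a stand-in for whatever environment-to-trait model is preferred.
    """
    X = np.column_stack([np.ones(len(env)), env])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef


def predict_trait(coef, env):
    """Predict the trait for any individual with environmental data,
    including individuals that were never phenotyped."""
    X = np.column_stack([np.ones(len(env)), env])
    return X @ coef


def joint_snp_scan(genotypes, y_obs, y_pred):
    """Test each SNP against the observed and predicted trait jointly.

    For every SNP, both traits are regressed on (intercept, genotype) and
    Wilks' lambda = det(E) / det(E + H) is computed. With one tested predictor
    and two responses, F = (1 - lambda) / lambda * (n - 3) / 2 follows F(2, n - 3)
    under standard multivariate-normal assumptions. Population structure is ignored.
    """
    n, m = genotypes.shape
    Y = np.column_stack([y_obs, y_pred])
    Yc = Y - Y.mean(axis=0)
    total = Yc.T @ Yc  # E + H: residual SSCP of the intercept-only model
    f_stats = np.empty(m)
    for j in range(m):
        X = np.column_stack([np.ones(n), genotypes[:, j]])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        E = resid.T @ resid  # residual SSCP with the SNP in the model
        wilks = np.linalg.det(E) / np.linalg.det(total)
        f_stats[j] = (1.0 - wilks) / wilks * (n - 3) / 2.0
    return f_stats


# Hypothetical simulated data: 200 inbred lines (genotypes coded 0/1),
# 50 SNPs, 3 environmental variables; only the first 150 lines are "phenotyped".
rng = np.random.default_rng(0)
n, m, k, n_pheno = 200, 50, 3, 150
causal = 7  # hypothetical adaptive SNP
genotypes = rng.integers(0, 2, size=(n, m)).astype(float)
env = rng.normal(size=(n, k))
env[:, 0] += 2.0 * genotypes[:, causal]  # allele state tracks the environment
trait = env @ np.array([1.0, -0.5, 0.0]) + 1.5 * genotypes[:, causal] + rng.normal(size=n)

coef = fit_env_model(env[:n_pheno], trait[:n_pheno])  # fit on phenotyped lines only
y_pred_all = predict_trait(coef, env)  # predicted trait exists for all 200 lines
f_stats = joint_snp_scan(genotypes[:n_pheno], trait[:n_pheno], y_pred_all[:n_pheno])
print("top SNP:", int(np.argmax(f_stats)), "F =", round(float(f_stats.max()), 1))
```

Because the predicted trait depends only on environmental data, `predict_trait` also returns values for the 50 unphenotyped lines, which could then be scanned on their own with a single-trait test. If p-values are wanted, each F statistic could be converted with `scipy.stats.f.sf(F, 2, n_pheno - 3)`.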
Published in:
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits. PLoS Genet 11(10): e1005594. doi:10.1371/journal.pgen.1005594
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1005594