
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits


Finding genes involved in adaptation to the environment has long been of interest to evolutionary biologists and ecologists. Most commonly, researchers look for loci whose differences in allelic state correlate with differences in a particular trait or environmental variable such as temperature. The implicit assumption behind such methods is that natural selection by the environment will shape variation in adaptive traits through associated changes in allele frequencies. This means that both environmental and phenotypic variation are relevant for detecting adaptive genes, although we have incomplete knowledge of how the two types of variation relate to adaptation. Here we present a method that aims to identify adaptive genes by combining phenotypic and environmental data. We first predict trait variation from a set of environmental variables as a way to extract the most biologically relevant information from the environment and then look for genes associated with both the predicted and observed trait. Using simulations and published data from the model plant Arabidopsis thaliana, we show that this approach may find adaptive genes more effectively than existing methods. We also demonstrate that predicted traits can be used to identify relevant loci in individuals for which no phenotypic data are available.
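The two-step idea described above (predict the trait from environmental variables, then look for loci associated with both the predicted and the observed trait) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: ridge regression stands in for whatever predictor is actually used, the sum of squared correlations stands in for a proper joint association test, no correction for population structure is applied, and all function names, parameters, and simulated effect sizes are hypothetical.

```python
import numpy as np

def ridge_predict_cv(env, trait, lam=1.0, n_folds=5, seed=0):
    """Out-of-fold ridge predictions of a trait from environmental variables.

    Assumption: ridge regression is an illustrative stand-in predictor.
    """
    rng = np.random.default_rng(seed)
    n = len(trait)
    folds = np.array_split(rng.permutation(n), n_folds)
    predicted = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        mu_x = env[train_idx].mean(axis=0)
        mu_y = trait[train_idx].mean()
        x = env[train_idx] - mu_x
        y = trait[train_idx] - mu_y
        # Closed-form ridge solution on the training fold only
        beta = np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)
        predicted[test_idx] = (env[test_idx] - mu_x) @ beta + mu_y
    return predicted

def snp_scores(genotypes, phenotype):
    """Squared Pearson correlation of every SNP column with a phenotype."""
    g = genotypes - genotypes.mean(axis=0)
    p = phenotype - phenotype.mean()
    num = g.T @ p
    den = np.sqrt((g ** 2).sum(axis=0) * (p ** 2).sum())
    return (num / den) ** 2

# Hypothetical simulated data: 300 inbred lines, 5 environmental variables, 200 SNPs
rng = np.random.default_rng(1)
n, n_env, n_snps = 300, 5, 200
env = rng.normal(size=(n, n_env))
genotypes = rng.integers(0, 2, size=(n, n_snps)).astype(float)

# SNP 0 is made "adaptive": its allele tracks the first environmental variable
causal = 0
genotypes[:, causal] = (env[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Observed trait: causal SNP effect, a purely environmental effect, and noise
trait = 1.0 * genotypes[:, causal] + 0.5 * env[:, 1] + rng.normal(size=n)

# Step 1: predict the trait from the environment
predicted = ridge_predict_cv(env, trait)

# Step 2: score each SNP against both the observed and the predicted trait
joint = snp_scores(genotypes, trait) + snp_scores(genotypes, predicted)
top = int(np.argmax(joint))
print("top-ranked SNP index:", top)
```

Because `ridge_predict_cv` needs only environmental variables once it has been fitted, the same kind of predictor could in principle supply a predicted trait for individuals without phenotypic data, which is the use case mentioned in the last sentence of the abstract.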


Published in: Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits. PLoS Genet 11(10): e1005594. doi:10.1371/journal.pgen.1005594
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1005594





Article published in: PLOS Genetics, 2015, Issue 10