New Routes to Phylogeography: A Bayesian Structured Coalescent Approximation
When studying infectious diseases, it is often important to understand how germs spread from location to location, from person to person, or even from one part of the body to another. Using phylogeographic methods, it is possible to recover the history of spread of pathogens (or other organisms) by studying their genetic material. Here we show that some popular, fast phylogeographic methods are inaccurate, and we introduce a new, more reliable method to address the problem. By comparing phylogeographic methods based on principled population models with fast alternatives, we found that different approaches can give diametrically opposed results, and we offer concrete examples in the context of the ongoing Ebola outbreak in West Africa and the worldwide outbreaks of Avian Influenza Virus and Tomato Yellow Leaf Curl Virus. We found that the most popular phylogeographic method often produces completely inaccurate conclusions. One reason for its popularity has been its computational speed, which has allowed users to analyse large genetic datasets with complex models. More accurate approaches have until now been considerably slower, and we therefore propose a new method, called BASTA, that achieves good accuracy in a reasonable time. We are relying more and more on genetic sequencing to learn about the origin and spread of infections, and as this role continues to grow, it will be essential to use accurate phylogeographic methods when designing policies to prevent or curb the spread of disease.
Published in:
New Routes to Phylogeography: A Bayesian Structured Coalescent Approximation. PLoS Genet 11(8): e1005421. doi:10.1371/journal.pgen.1005421
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1005421
Article published in: PLOS Genetics, 2015, Issue 8