The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
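As a point of reference for the tree-based probabilities mentioned above, the following minimal sketch computes gene tree topology probabilities for three taxa with one sampled allele per species. It uses the standard three-taxon multispecies coalescent result: the topology matching the species tree has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t), where t is the internal branch length in coalescent units. The second function assumes a deliberately simplified hybridization scenario in which the single lineage from B follows one of two parental trees with probabilities gamma and 1 - gamma. This is an illustrative special case, not the general network method presented in the article, and all function and parameter names are hypothetical.

```python
import math

# The three possible rooted topologies for taxa A, B, C.
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def tree_topology_probs(species_topology, t):
    """Gene tree topology probabilities on a three-taxon species tree.

    species_topology: one of TOPOLOGIES, the species tree shape.
    t: internal branch length in coalescent units (must be >= 0).
    Assumes one sampled allele per species.
    """
    if species_topology not in TOPOLOGIES:
        raise ValueError("unknown topology: " + species_topology)
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability of each discordant topology: the two sister lineages fail
    # to coalesce on the internal branch (e^-t), then one of the two
    # discordant pairings happens first in the root population.
    p_discordant = math.exp(-t) / 3.0
    return {
        topo: (1.0 - 2.0 * p_discordant) if topo == species_topology else p_discordant
        for topo in TOPOLOGIES
    }


def network_topology_probs(gamma, t_ab, t_bc):
    """Simplified hybridization case (an assumption for illustration only).

    B is treated as a hybrid whose single sampled lineage follows the
    parental tree ((A,B),C) with probability gamma and ((B,C),A) with
    probability 1 - gamma; t_ab and t_bc are the internal branch lengths
    of those two parental trees in coalescent units.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    p_ab = tree_topology_probs("((A,B),C)", t_ab)
    p_bc = tree_topology_probs("((B,C),A)", t_bc)
    return {topo: gamma * p_ab[topo] + (1.0 - gamma) * p_bc[topo] for topo in TOPOLOGIES}


if __name__ == "__main__":
    for topo, p in network_topology_probs(0.3, 1.0, 2.0).items():
        print(topo, round(p, 4))
```

On a species tree the two discordant topologies are always equally probable, whereas the mixture generally makes them unequal. That asymmetry is the kind of signal that lets gene tree topologies separate hybridization from incomplete lineage sorting alone.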
Published in:
The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection. PLoS Genet 8(4): e1002660. doi:10.1371/journal.pgen.1002660
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002660
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2012, Issue 4