The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection


Gene tree topologies have proven to be a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data are consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies for obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.
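To make the idea of a gene tree topology probability concrete, the following is a minimal sketch of the simplest special case only, not the general method the article presents. It assumes three taxa (A, B, C), one sampled lineage per species, and branch lengths in coalescent units. Under the multispecies coalescent, the gene tree matching the species tree has probability 1 - (2/3)e^(-t) and each of the other two topologies has probability (1/3)e^(-t). If B is a hybrid whose single lineage follows one parent with probability gamma and the other with probability 1 - gamma, the topology distribution in this restricted case is a mixture of the two parental species trees. The function names, the parameters `gamma`, `t1`, `t2`, and the example counts are all hypothetical and chosen for illustration.

```python
import math

# The three rooted gene tree topologies on taxa A, B, C.
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def tree_probs(species_topology, t):
    """Gene tree topology distribution for a 3-taxon species tree.

    t is the internal branch length in coalescent units (assumption:
    one sampled lineage per species).
    """
    if species_topology not in TOPOLOGIES:
        raise ValueError("unknown topology")
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def network_probs(gamma, t1, t2):
    """Topology distribution when B is a hybrid of the A and C lineages.

    With probability gamma the B lineage follows the parent sister to A
    (species tree ((A,B),C), internal branch t1); otherwise it follows
    the parent sister to C (species tree ((B,C),A), internal branch t2).
    This mixture holds only in this simplified single-lineage case.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    p1 = tree_probs("((A,B),C)", t1)
    p2 = tree_probs("((B,C),A)", t2)
    return {g: gamma * p1[g] + (1.0 - gamma) * p2[g] for g in TOPOLOGIES}


def log_likelihood(counts, probs):
    """Multinomial log-likelihood (up to a constant) of topology counts."""
    return sum(n * math.log(probs[g]) for g, n in counts.items() if n > 0)


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration; not data from the study.
    counts = {"((A,B),C)": 50, "((A,C),B)": 10, "((B,C),A)": 40}
    tree_ll = log_likelihood(counts, tree_probs("((A,B),C)", 1.0))
    net_ll = log_likelihood(counts, network_probs(0.5, 1.0, 1.0))
    print("tree log-likelihood:", tree_ll)
    print("network log-likelihood:", net_ll)
```

Comparing the two log-likelihoods, ideally with a penalty for the extra network parameters, is one way to ask whether a hypothesis with hybridization is better supported than strict divergence. Networks with more taxa or several lineages per species require the general computation described in the article.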


Published in: The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection. PLoS Genet 8(4): e1002660. doi:10.1371/journal.pgen.1002660
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1002660





Published in the journal PLOS Genetics, 2012, Issue 4