Inferring Demographic History from a Spectrum of Shared Haplotype Lengths
There has been much recent excitement about the use of genetics to elucidate ancestral history and demography. Whole-genome data from humans and other species are revealing complex stories of divergence and admixture that went undetected in previous, smaller data sets. A central challenge is to estimate the timing of past admixture and divergence events, for example the time at which Neanderthals exchanged genetic material with humans and the time at which modern humans left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past admixture events, along with population divergence times and changes in effective population size. We infer demography from a collection of pairwise sequence alignments by summarizing the length distribution of their tracts of identity by state (IBS) and maximizing an analytic composite likelihood derived from a Markovian coalescent approximation. Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method power by shaping the distribution of shared IBS tracts. In simulated data, we accurately infer the timing and strength of admixture events, population size changes, and divergence times over a variety of ancient and recent time scales. Using the same technique, we analyze the deeply sequenced trio parents from the 1000 Genomes Project. The data show evidence of extensive gene flow between Africa and Europe after the time of divergence, as well as substructure and gene flow among ancestral hominids. In particular, we infer that recent African-European gene flow and ancient ghost admixture into Europe are both necessary to explain the spectrum of IBS sharing in the trios, rejecting simpler models that contain less population structure.
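To make the summary statistic concrete, here is a minimal Python sketch that turns one pairwise alignment into IBS tract lengths and a binned length spectrum, then fits a toy composite likelihood. The function names (`ibs_tract_lengths`, `tract_spectrum`, `geometric_composite_loglik`, `fit_geometric`) are hypothetical. The geometric tract-length model is a deliberately simple stand-in assumed for illustration only. It is not the Markovian-coalescent likelihood described above, which accounts for recombination, population size changes, divergence, and admixture. Tracts are scored as if independent, which is what makes the likelihood "composite": in real data, neighbouring tracts are correlated through linkage.

```python
from collections import Counter
import math


def ibs_tract_lengths(diff_positions):
    """Return interior IBS tract lengths for one pair of aligned haplotypes.

    `diff_positions` holds the coordinates where the two haplotypes carry
    different alleles. Each length is the distance from one difference to the
    next, so it is always >= 1. The two edge tracts (before the first and after
    the last difference) are censored by the alignment ends and are dropped.
    """
    pos = sorted(set(diff_positions))
    return [b - a for a, b in zip(pos, pos[1:])]


def tract_spectrum(lengths, bin_width):
    """Bin tract lengths into a histogram: bin index -> number of tracts."""
    return Counter(length // bin_width for length in lengths)


def geometric_composite_loglik(lengths, p):
    """Toy composite log-likelihood (an assumption, not the paper's model).

    Every tract is treated as an independent draw from a geometric
    distribution P(L = l) = (1 - p)**(l - 1) * p for l >= 1, where p plays
    the role of a per-base probability of hitting a difference.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    n = len(lengths)
    total = sum(lengths)
    return (total - n) * math.log1p(-p) + n * math.log(p)


def fit_geometric(lengths):
    """Closed-form maximizer of the toy likelihood: p_hat = n / sum(lengths)."""
    if not lengths:
        raise ValueError("need at least one complete tract")
    return len(lengths) / sum(lengths)


if __name__ == "__main__":
    lengths = ibs_tract_lengths([3, 10, 12, 30])
    print(lengths)                      # [7, 2, 18]
    print(tract_spectrum(lengths, 5))   # bins 1, 0 and 3 hold one tract each
    p_hat = fit_geometric(lengths)
    print(round(p_hat, 4))              # 0.1111
    print(geometric_composite_loglik(lengths, p_hat))
```

For example, differences at positions 3, 10, 12, and 30 give interior tracts of lengths 7, 2, and 18, and the toy estimate is 3/27 ≈ 0.111. In the setting the abstract describes, spectra from many pairwise alignments would be summarized together, and the maximized parameters would be demographic ones (population sizes, divergence times, admixture times and fractions) rather than a single per-base mismatch probability.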
Published in:
Inferring Demographic History from a Spectrum of Shared Haplotype Lengths. PLoS Genet 9(6): e1003521. doi:10.1371/journal.pgen.1003521
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003521
Tags
Genetics, Reproductive Medicine
Published in PLOS Genetics, 2013, Issue 6