Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data


Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and “ancient” Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
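To make the modelling idea in the abstract concrete, the following is a minimal simulation sketch, not the TreeMix implementation. It assumes the usual form of the Gaussian approximation to drift, in which a child population's allele frequency is normally distributed around its parent's frequency x with variance c·x(1 − x). The drift parameters, the four populations A, B, C and D, the mixture weight `w`, and the function names are all hypothetical choices made for illustration. The covariance is taken relative to the simulated root frequencies, which are known here but would not be observed in real data.

```python
import numpy as np


def drift(freq, c, rng):
    """One branch of drift under the Gaussian approximation.

    Assumed form: child ~ Normal(parent, c * parent * (1 - parent)),
    where c is a hypothetical drift parameter for the branch.
    """
    sd = np.sqrt(c * freq * (1.0 - freq))
    child = freq + sd * rng.normal(0.0, 1.0, size=freq.shape)
    return np.clip(child, 0.0, 1.0)  # keep frequencies in [0, 1]


def simulate(n_snps=20000, w=0.3, seed=0):
    """Simulate frequencies for A, B, C and an admixed population D."""
    rng = np.random.default_rng(seed)
    root = rng.uniform(0.3, 0.7, size=n_snps)  # ancestral frequencies
    ab = drift(root, 0.02, rng)                # ancestor shared by A and B
    a = drift(ab, 0.01, rng)
    b = drift(ab, 0.01, rng)
    c = drift(root, 0.03, rng)                 # C splits off at the root
    # D takes a fraction w of its ancestry from C and the rest from B.
    d = drift(w * c + (1.0 - w) * b, 0.01, rng)
    return root, np.vstack([a, b, c, d])


def covariance_from_root(root, freqs):
    """Covariance of frequency deviations from the root, per population pair."""
    dev = freqs - root
    return dev @ dev.T / freqs.shape[1]


if __name__ == "__main__":
    root, freqs = simulate()
    cov = covariance_from_root(root, freqs)
    for name, row in zip("ABCD", cov):
        print(name, np.round(row, 4))
```

In this toy setting, A and C share no branch, so their covariance is close to zero, while D shows clearly positive covariance with both B and C. A bifurcating tree that places D next to B would predict no shared drift between D and C, which is the kind of misfit that a migration edge in the graph is meant to absorb.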


Published in: Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data. PLoS Genet 8(11): e1002967. doi:10.1371/journal.pgen.1002967
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1002967




Article published in the journal PLOS Genetics, 2012, Issue 11