Robust Demographic Inference from Genomic and SNP Data
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
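To make the idea of a simulation-based composite likelihood on the SFS concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single constant-size neutral population, estimates the expected SFS proportions from simulated coalescent trees, and scores an observed SFS with a multinomial composite log-likelihood over polymorphic sites only. The function names (`expected_sfs_by_simulation`, `composite_log_likelihood`) and the observed counts are hypothetical and chosen purely for illustration.

```python
import math
import random


def expected_sfs_by_simulation(n_samples, n_trees, rng):
    """Estimate expected SFS proportions under a constant-size neutral model.

    Each simulated coalescent tree contributes the total length of branches
    that have i sampled descendants (i = 1 .. n_samples - 1). Averaged over
    trees and normalised, these lengths approximate the probability that a
    polymorphic site has a derived allele carried by i samples.
    """
    branch_len = [0.0] * n_samples  # index i: branches with i descendants
    for _ in range(n_trees):
        lineages = [1] * n_samples  # descendant count of each lineage
        while len(lineages) > 1:
            k = len(lineages)
            # Waiting time to the next coalescence among k lineages.
            t = rng.expovariate(k * (k - 1) / 2.0)
            for d in lineages:
                branch_len[d] += t
            # Merge two lineages chosen uniformly at random.
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] + lineages[j]
            lineages = [d for idx, d in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
    total = sum(branch_len[1:])
    return [b / total for b in branch_len[1:]]


def composite_log_likelihood(observed_counts, expected_props, floor=1e-12):
    """Multinomial composite log-likelihood (up to an additive constant).

    Sites are treated as independent, which is what makes the likelihood
    'composite'. The floor avoids log(0) for classes never seen in simulations.
    """
    return sum(
        m * math.log(max(p, floor))
        for m, p in zip(observed_counts, expected_props)
    )


rng = random.Random(1)
expected = expected_sfs_by_simulation(n_samples=6, n_trees=2000, rng=rng)

# Hypothetical observed SFS for 6 sampled gene copies (made-up counts).
observed = [60, 30, 20, 15, 12]

ll_simulated = composite_log_likelihood(observed, expected)
ll_flat = composite_log_likelihood(observed, [0.2] * 5)
print(ll_simulated > ll_flat)
```

In a full inference scheme the simulation step would be repeated for each candidate set of demographic parameters, and an optimiser would search for the parameters that maximise this likelihood. The sketch evaluates only a single fixed model against a flat alternative.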
Published in:
Robust Demographic Inference from Genomic and SNP Data. PLoS Genet 9(10): e1003905. doi:10.1371/journal.pgen.1003905
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003905
The article was published in the journal:
PLOS Genetics
2013, Issue 10