A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations
Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.
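The comparison described above can be illustrated with a minimal sketch. It assumes a numeric genotype matrix (rows are individuals or populations, columns are SNPs) and a matching matrix of geographic coordinates. The function names, the similarity score t = sqrt(1 - D) (where D is the minimized, scaled sum of squared distances), and the permutation test are illustrative assumptions. They are not stated in the abstract and are not necessarily the authors' exact procedure.

```python
import numpy as np


def pca_map(genotypes, n_components=2):
    """Project rows of a genotype matrix onto the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the column-centered matrix; u * s gives the PC coordinates.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def procrustes_similarity(target, source):
    """Similarity in [0, 1] after the best translation, scaling, and
    rotation/reflection of `source` onto `target` (assumed score: sqrt(1 - D))."""
    x = target - target.mean(axis=0)
    y = source - source.mean(axis=0)
    x = x / np.linalg.norm(x)  # scale both maps to unit Frobenius norm
    y = y / np.linalg.norm(y)
    # For unit-norm maps, the minimized distance is D = 1 - (sum of singular
    # values of x^T y)^2, so sqrt(1 - D) is the sum of singular values.
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    return float(min(1.0, singular_values.sum()))


def permutation_p_value(target, source, n_perm=1000, seed=0):
    """Hypothetical significance test: shuffle which row of `source` is paired
    with which row of `target` and count scores at least as large as observed."""
    rng = np.random.default_rng(seed)
    observed = procrustes_similarity(target, source)
    hits = 0
    for _ in range(n_perm):
        shuffled = source[rng.permutation(len(source))]
        if procrustes_similarity(target, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    geography = rng.normal(size=(30, 2))       # synthetic longitude/latitude
    loadings = rng.normal(size=(2, 200))       # synthetic SNP gradients
    genotypes = geography @ loadings + rng.normal(scale=2.0, size=(30, 200))
    genetic_map = pca_map(genotypes)
    print(procrustes_similarity(geography, genetic_map))
    print(permutation_p_value(geography, genetic_map, n_perm=200))
```

The synthetic data in the demo only exercises the functions; real analyses would start from genotype calls and sampling coordinates for each population.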
Published in:
A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations. PLoS Genet 8(8): e1002886. doi:10.1371/journal.pgen.1002886
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002886
Tags
Genetics, Reproductive medicine
Article published in the journal
PLOS Genetics
2012 Issue 8