Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data
Analyses investigating low frequency variants have the potential to explain additional genetic heritability of many complex human traits. However, natural differences in the frequency of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool uses expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionarily conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionarily conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and on the types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.
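The collapsing idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tool. The function names, the toy genotype data, and the population labels are hypothetical. Two further assumptions are made for the sake of a runnable example: a variant is binned when its alternate allele frequency is below 0.03 in at least one of the two populations, and burden differences are assessed with a simple permutation test, since the abstract does not say which statistical test was used.

```python
import random

MAF_CUTOFF = 0.03  # low frequency threshold quoted in the abstract


def alt_freq(genotypes):
    """Alternate allele frequency from per-individual allele counts (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_burden(variants, pop_a, pop_b, cutoff=MAF_CUTOFF):
    """Sum low frequency alternate alleles per individual within each bin.

    Assumption: a variant is binned when its alternate allele frequency is
    below the cutoff in at least one of the two populations.
    """
    bins = {}
    for v in variants:
        geno_a, geno_b = v["genotypes"][pop_a], v["genotypes"][pop_b]
        if min(alt_freq(geno_a), alt_freq(geno_b)) >= cutoff:
            continue  # common in both populations, so not collapsed
        acc_a, acc_b = bins.setdefault(
            v["bin"], ([0] * len(geno_a), [0] * len(geno_b))
        )
        for i, g in enumerate(geno_a):
            acc_a[i] += g
        for i, g in enumerate(geno_b):
            acc_b[i] += g
    return bins


def permutation_p(burden_a, burden_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in mean burden.

    Assumption: a stand-in test chosen for illustration only.
    """
    rng = random.Random(seed)
    n_a = len(burden_a)

    def mean_diff(values):
        left, right = values[:n_a], values[n_a:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    pooled = burden_a + burden_b
    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def genotypes(n_carriers, n_total=20):
    """Hypothetical helper: n_carriers heterozygotes followed by non-carriers."""
    return [1] * n_carriers + [0] * (n_total - n_carriers)


# Toy data: two populations of 20 individuals, three variants in two gene bins.
variants = [
    {"bin": "GENE1", "genotypes": {"POP_A": genotypes(1), "POP_B": genotypes(10)}},
    {"bin": "GENE1", "genotypes": {"POP_A": genotypes(16), "POP_B": genotypes(16)}},
    {"bin": "GENE2", "genotypes": {"POP_A": genotypes(1), "POP_B": genotypes(1)}},
]

bins = collapse_burden(variants, "POP_A", "POP_B")
p_values = {name: permutation_p(a, b) for name, (a, b) in bins.items()}
```

In this toy data the second variant is common in both populations and is skipped, so only the first variant contributes to the GENE1 bin. A real analysis would also correct the per-bin p-values for the number of bins tested.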
Published in:
Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data. PLoS Genet 9(12): e1003959. doi:10.1371/journal.pgen.1003959
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003959