Abundance of ethnically biased microsatellites in human gene regions

Authors: Nick Kinney aff001;  Lin Kang aff001;  Laurel Eckstrand aff003;  Arichanah Pulenthiran aff001;  Peter Samuel aff001;  Ramu Anandakrishnan aff001;  Robin T. Varghese aff001;  P. Michalak aff001;  Harold R. Garner aff001
Author affiliations: Edward Via College of Osteopathic Medicine, Blacksburg, VA, United States of America aff001;  Gibbs Cancer Center & Research Institute, Spartanburg, SC, United States of America aff002;  Virginia-Maryland College of Veterinary Medicine, Blacksburg, VA, United States of America aff003;  Institute of Evolution, University of Haifa, Haifa, Israel aff004
Published in: PLoS ONE 14(12)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0225216


Microsatellites, a type of short tandem repeat (STR), have been used for decades as putatively neutral markers to study the genetic structure of diverse human populations. However, recent studies have demonstrated that some microsatellites contribute to gene expression, cis heritability, and phenotype. As a corollary, some microsatellites may contribute to differential gene expression and RNA/protein structure stability in distinct human populations. To test this hypothesis, we investigate the genotype frequencies, functional relevance, and adaptive potential of microsatellites in five super-populations (ethnicities) drawn from the 1000 Genomes Project. We discover 3,984 ethnically biased microsatellite loci (EBML); for each EBML, at least one ethnicity has genotype frequencies statistically different from those of the remaining four. South Asian, East Asian, European, and American EBML show significant overlap; in contrast, the set of African EBML is mostly unique. We cross-reference the 3,984 EBML with 2,060 previously identified expression STRs (eSTRs); repeats known to affect gene expression (64 total) are over-represented. The most significant pathway enrichments are those associated with the matrisome: a broad collection of genes encoding the extracellular matrix and its associated proteins. At least 14 of the EBML have established links to human disease. Analysis of the 3,984 EBML with respect to known selective sweep regions in the genome shows that allelic variation in some of them is likely associated with adaptive evolution.
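The abstract does not say which statistical test marks a locus as ethnically biased. As an illustration only, the sketch below assumes a Pearson chi-square test on a contingency table of genotype counts, comparing one super-population against the other four pooled. The function name `one_vs_rest_chi2` and the toy counts are hypothetical and are not taken from the paper; only the five super-population codes (AFR, AMR, EAS, EUR, SAS) come from the 1000 Genomes Project.

```python
from typing import Dict, List, Tuple


def one_vs_rest_chi2(counts: Dict[str, List[int]], target: str) -> Tuple[float, int]:
    """Pearson chi-square statistic for `target` versus all other populations pooled.

    `counts` maps a population label to genotype counts at one locus; every
    list must use the same genotype order. Returns (statistic, degrees of freedom).
    """
    target_row = list(counts[target])
    n_genotypes = len(target_row)

    # Pool the genotype counts of every population except the target.
    rest_row = [0] * n_genotypes
    for pop, row in counts.items():
        if pop == target:
            continue
        if len(row) != n_genotypes:
            raise ValueError("all populations need the same genotype columns")
        for j, c in enumerate(row):
            rest_row[j] += c

    table = [target_row, rest_row]
    row_totals = [sum(r) for r in table]
    if 0 in row_totals:
        raise ValueError("each side of the comparison needs at least one sample")
    col_totals = [table[0][j] + table[1][j] for j in range(n_genotypes)]
    grand_total = sum(row_totals)

    statistic = 0.0
    used_columns = 0
    for j in range(n_genotypes):
        if col_totals[j] == 0:
            continue  # genotype never observed; it carries no information
        used_columns += 1
        for i in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected

    # A 2 x k table has (2 - 1) * (k - 1) degrees of freedom.
    return statistic, max(used_columns - 1, 0)


if __name__ == "__main__":
    # Hypothetical genotype counts at a single locus (three genotype classes).
    toy_locus = {
        "AFR": [120, 60, 20],
        "AMR": [40, 80, 80],
        "EAS": [35, 85, 80],
        "EUR": [45, 75, 80],
        "SAS": [38, 82, 80],
    }
    for population in toy_locus:
        stat, dof = one_vs_rest_chi2(toy_locus, population)
        print(population, round(stat, 2), dof)
```

A p-value would follow from the chi-square distribution with the returned degrees of freedom (for example via `scipy.stats.chi2.sf`). Because thousands of loci are tested, a real analysis would also need a multiple-testing correction, which this sketch leaves out.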

Principal component analysis – Gene expression – Sequence motif analysis – Population genetics – Ethnicities – Introns – Human genomics – Contingency tables



