Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine-map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportions and identify ancestry-informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort, using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
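The regression idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pooled allele frequency at each SNP is approximately a weighted average of the reference-panel frequencies, with weights (the admixture proportions) that sum to one. The function names `estimate_admixture` and `rank_aims`, the `min_delta` threshold, and the simulated frequencies are all hypothetical and serve only to show the shape of the calculation.

```python
import numpy as np


def estimate_admixture(pool_freq, ref_freqs):
    """Least-squares admixture proportions under a sum-to-one constraint.

    pool_freq : (n_snps,) allele frequencies estimated from the pooled sample
    ref_freqs : (n_snps, n_refs) allele frequencies in the reference panels
    Assumed model (illustrative): pool_freq ~ ref_freqs @ m, with sum(m) == 1.
    Non-negativity of m is NOT enforced in this sketch.
    """
    pool_freq = np.asarray(pool_freq, dtype=float)
    ref_freqs = np.asarray(ref_freqs, dtype=float)
    # Enforce sum(m) == 1 by substitution: m_last = 1 - sum(m_others).
    last = ref_freqs[:, -1]
    X = ref_freqs[:, :-1] - last[:, None]
    y = pool_freq - last
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.append(coef, 1.0 - coef.sum())


def rank_aims(freq_a, freq_b, min_delta=0.5):
    """Indices of SNPs whose absolute frequency difference is >= min_delta,
    ordered from largest to smallest difference.
    min_delta is a hypothetical cutoff, not a value taken from the paper.
    """
    delta = np.abs(np.asarray(freq_a, dtype=float) - np.asarray(freq_b, dtype=float))
    idx = np.flatnonzero(delta >= min_delta)
    return idx[np.argsort(-delta[idx])]


# Simulated data only: two reference panels and a pool that is an 80/20 mix
# with added measurement noise standing in for pooling error.
rng = np.random.default_rng(0)
ref = rng.uniform(0.05, 0.95, size=(5000, 2))
true_m = np.array([0.8, 0.2])
pool = ref @ true_m + rng.normal(0.0, 0.03, size=5000)

m_hat = estimate_admixture(pool, ref)
candidate_aims = rank_aims(ref[:, 0], ref[:, 1])
print("estimated proportions:", m_hat)
print("number of candidate AIMs:", candidate_aims.size)
```

Handling the sum-to-one constraint by substitution keeps the example to ordinary least squares. A real analysis would likely also constrain the proportions to be non-negative and account for SNP-specific pooling error, neither of which is attempted here.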
Published in:
Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples. PLoS Genet 6(3): e1000866. doi:10.1371/journal.pgen.1000866
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1000866
Tags
Genetics, Reproductive medicine
Article published in the journal
PLOS Genetics
2010, Issue 3