Inference of Relationships in Population Data Using Identity-by-Descent and Identity-by-State
Large, population-based datasets are assumed to carry accurate sample annotations, whether the samples correspond to known relationships or to unrelated individuals. These annotations are key for a broad range of genetics applications. Many methods assess relatedness by estimating identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions; we developed a novel approach that estimates IBD0, IBD1, and IBD2 from IBS observed within windows. Combined with genome-wide IBS information, it provides an intuitive and practical graphical approach that can analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances, non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. The combined method allowed us to distinguish related individuals from those with atypical heterozygosity rates and to determine which individuals were outliers with respect to their designated population. Distant relatedness becomes increasingly difficult to identify using genome-wide IBS methods alone. Our IBD method, however, identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. Applying this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
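To make the windowed IBS idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes genotypes are coded as 0, 1, or 2 copies of a reference allele, so that the IBS state of a pair at one SNP is 2 minus the absolute genotype difference. The function names, the non-overlapping window scheme, and the window size are hypothetical choices made for illustration; the step that converts windowed IBS counts into IBD0/IBD1/IBD2 estimates is not reproduced here.

```python
from collections import Counter


def ibs_state(g1, g2):
    """Number of alleles shared identical-by-state (0, 1, or 2) at one SNP.

    Assumes genotypes are coded as 0, 1, or 2 copies of the same allele.
    """
    return 2 - abs(g1 - g2)


def windowed_ibs(geno_a, geno_b, window_size):
    """Count (IBS0, IBS1, IBS2) SNPs in consecutive non-overlapping windows.

    The window scheme and size are illustrative assumptions.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have the same length")
    windows = []
    for start in range(0, len(geno_a), window_size):
        pairs = zip(geno_a[start:start + window_size],
                    geno_b[start:start + window_size])
        counts = Counter(ibs_state(a, b) for a, b in pairs)
        windows.append((counts[0], counts[1], counts[2]))
    return windows


def windows_lacking_ibs0(windows):
    """Indices of windows with no IBS0 SNPs.

    A long run of such windows is compatible with the pair sharing at
    least one haplotype identical-by-descent across that region.
    """
    return [i for i, (ibs0, _, _) in enumerate(windows) if ibs0 == 0]


if __name__ == "__main__":
    # Toy genotypes for two individuals at eight SNPs (made-up values).
    sample_a = [0, 2, 1, 2, 0, 1, 1, 2]
    sample_b = [2, 2, 1, 0, 0, 1, 2, 2]
    per_window = windowed_ibs(sample_a, sample_b, window_size=4)
    print(per_window)
    print(windows_lacking_ibs0(per_window))
```

With real array data the windows would span many SNPs each, and the absence of IBS0 would only be informative over long stretches, such as the megabase-scale regions mentioned in the abstract.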
Published in:
Inference of Relationships in Population Data Using Identity-by-Descent and Identity-by-State. PLoS Genet 7(9): e1002287. doi:10.1371/journal.pgen.1002287
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1002287
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2011, Issue 9