Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies
The majority of associations between common genetic variation and human traits come from genome-wide association studies, which have analyzed millions of single-nucleotide polymorphisms in millions of samples. These kinds of studies pose serious statistical challenges to discovering new associations. Finite resources restrict the number of candidate associations that can brought forward into validation samples, introducing the need for a significance threshold. This threshold creates a phenomenon called the Winner’s Curse, in which candidate associations close to the discovery threshold are more likely to have biased overestimates of the variant’s true association in the sampled population. We survey all human quantitative trait association studies that validated at least one signal. We find the majority of these studies do not publish sufficient information to actually support their claims of replication. For studies that did, we computationally correct the Winner’s Curse and evaluate replication performance. While all variants combined replicate significantly less than expected, we find that the subset of studies that (1) perform both discovery and replication in samples of the same ancestry; and (2) report accurate per-variant sample sizes, replicate as expected. This study provides strong, rigorous evidence for the broad reliability of genome-wide association studies. We furthermore provide a model for more efficient selection of variants as candidates for replication, as selecting variants using cursed discovery data enriches for variants with little real evidence for trait association.
Vyšlo v časopise:
Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet 13(7): e32767. doi:10.1371/journal.pgen.1006916
Kategorie:
Research Article
prolekare.web.journal.doi_sk:
https://doi.org/10.1371/journal.pgen.1006916
Souhrn
The majority of associations between common genetic variation and human traits come from genome-wide association studies, which have analyzed millions of single-nucleotide polymorphisms in millions of samples. These kinds of studies pose serious statistical challenges to discovering new associations. Finite resources restrict the number of candidate associations that can brought forward into validation samples, introducing the need for a significance threshold. This threshold creates a phenomenon called the Winner’s Curse, in which candidate associations close to the discovery threshold are more likely to have biased overestimates of the variant’s true association in the sampled population. We survey all human quantitative trait association studies that validated at least one signal. We find the majority of these studies do not publish sufficient information to actually support their claims of replication. For studies that did, we computationally correct the Winner’s Curse and evaluate replication performance. While all variants combined replicate significantly less than expected, we find that the subset of studies that (1) perform both discovery and replication in samples of the same ancestry; and (2) report accurate per-variant sample sizes, replicate as expected. This study provides strong, rigorous evidence for the broad reliability of genome-wide association studies. We furthermore provide a model for more efficient selection of variants as candidates for replication, as selecting variants using cursed discovery data enriches for variants with little real evidence for trait association.
Zdroje
1. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42(Database issue):D1001–1006. doi: 10.1093/nar/gkt1229 24316577
2. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–569. doi: 10.1038/ng.608 20562875
3. Zhong H, Prentice RL. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics. 2008;9(4):621–634. doi: 10.1093/biostatistics/kxn001 18310059
4. Sun L, Dimitromanolakis A, Faye LL, Paterson AD, Waggott D, Bull SB. BR-squared: a practical solution to the winner’s curse in genome-wide scans. Hum Genet. 2011;129(5):545–552. doi: 10.1007/s00439-011-0948-2 21246217
5. Zhong H, Prentice RL. Correcting “winner’s curse” in odds ratios from genomewide association findings for major complex human diseases. Genet Epidemiol. 2010;34(1):78–91. doi: 10.1002/gepi.20437 19639606
6. Xiao R, Boehnke M. Quantifying and correcting for the winner’s curse in genetic association studies. Genet Epidemiol. 2009;33(5):453–462. doi: 10.1002/gepi.20398 19140131
7. Xiao R, Boehnke M. Quantifying and correcting for the winner’s curse in quantitative-trait association studies. Genet Epidemiol. 2011;35(3):133–138. doi: 10.1002/gepi.20551 21284035
8. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851–861. doi: 10.1038/nature06258 17943122
9. Ntzani EE, Liberopoulos G, Manolio TA, Ioannidis JP. Consistency of genome-wide associations across major ancestral groups. Hum Genet. 2012;131(7):1057–1071. doi: 10.1007/s00439-011-1124-4 22183176
10. Burdett T, Hall P, Hastings E, Hindorff L, Junkins H, Klemm A, et al.. The NHGRI-EBI Catalog of published genome-wide association studies;. Available from: www.ebi.ac.uk/gwas.
11. Parra EJ, Marcini A, Akey J, Martinson J, Batzer MA, Cooper R, et al. Estimating African American admixture proportions by use of population-specific alleles. Am J Hum Genet. 1998;63(6):1839–1851. doi: 10.1086/302148 9837836
12. United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects: The 2015 Revision; 2015. Volume II: Demographic Profiles.
13. Colby SL, Ortman JM. Projections of the Size and Composition of the U.S. Population: 2014 to 2060. Washington, DC: U.S. Census Bureau; 2014. P25–1143.
14. Ioannidis JPA, Trikalinos TA. Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol. 2005;58(6):543–9. doi: 10.1016/j.jclinepi.2004.10.019 15878467
15. Dwan K, Gamble C, Williamson PR, Kirkham JJ, “Reporting Bias Group”. Systematic review of the empirical evidence of study publication bias and outcome reporting bias—an updated review. PLoS ONE. 2013;8(7):e66844. doi: 10.1371/journal.pone.0066844 23861749
16. Littner Y, Mimouni FB, Dollberg S, Mandel D. Negative results and impact factor: a lesson from neonatology. Arch Pediatr Adolesc Med. 2005;159(11):1036–7. doi: 10.1001/archpedi.159.11.1036 16275793
17. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006;38(2):209–213. doi: 10.1038/ng1706 16415888
18. Ferguson JP, Cho JH, Yang C, Zhao H. Empirical Bayes correction for the Winner’s Curse in genetic association studies. Genet Epidemiol. 2013;37(1):60–68. doi: 10.1002/gepi.21683 23012258
19. Palmer C. GWAS Winner’s Curse Correction Tool; 2016. https://github.com/cpalmer718/gwas-winners-curse.
20. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–2191. doi: 10.1093/bioinformatics/btq340 20616382
21. Hankin RKS. Special functions in R: introducing the gsl package. R News. 2006;6(4).
22. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi: 10.1086/519795 17701901
23. Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet. 2014;15(5):335–346. doi: 10.1038/nrg3706 24739678
24. Hong Y. On computing the distribution function for the Poisson binomial distribution. Computational Statistics & Data Analysis. 2013;59:41–51. doi: 10.1016/j.csda.2012.10.006
25. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucl Acids Res. 2010;38(16):e164–e164. doi: 10.1093/nar/gkq603 20601685
26. Wolfe D, Dudek S, Ritchie MD, Pendergrass SA. Visualizing genomic information across chromosomes with PhenoGram. BioData Min. 2013;6(1):18. doi: 10.1186/1756-0381-6-18 24131735
Štítky
Genetika Reprodukčná medicínaČlánok vyšiel v časopise
PLOS Genetics
2017 Číslo 7
- Je „freeze-all“ pro všechny? Odborníci na fertilitu diskutovali na virtuálním summitu
- Gynekologové a odborníci na reprodukční medicínu se sejdou na prvním virtuálním summitu
Najčítanejšie v tomto čísle
- Genetic compensation: A phenomenon in search of mechanisms
- The gene drive bubble: New realities
- Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies