TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies
To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al. (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype; that is, TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open-source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor.
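The combination step described above (several univariate p-values merged into one trait-based p-value, with a correction for correlated components) can be illustrated with a minimal sketch of an extended Simes procedure. Everything below is an assumption made for illustration and is not taken from the TATES program: the function names are hypothetical, the effective number of independent tests is estimated from eigenvalues as m_e = m - sum over eigenvalues above 1 of (eigenvalue - 1), following a GATES-style formulation, and the caller is assumed to supply a correlation matrix for the p-values directly. How that matrix is obtained from the phenotype correlations is not covered by this summary.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (pure Python)."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; |theta| <= pi/4.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def effective_number(corr):
    """Assumed estimate: m_e = m - sum of (eigenvalue - 1) over eigenvalues above 1."""
    eigenvalues = symmetric_eigenvalues(corr)
    return len(corr) - sum(ev - 1.0 for ev in eigenvalues if ev > 1.0)

def extended_simes(pvalues, corr):
    """Combine p-values as min over j of m_e * p_(j) / m_e(j).

    p_(j) is the j-th smallest p-value and m_e(j) is the effective number
    of independent tests among the j smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    m_e = effective_number(corr)
    best = 1.0
    for j in range(1, m + 1):
        top = order[:j]
        sub = [[corr[r][col] for col in top] for r in top]
        best = min(best, m_e * pvalues[order[j - 1]] / effective_number(sub))
    return min(best, 1.0)

if __name__ == "__main__":
    # Hypothetical p-values for three phenotypes at one variant.
    p = [0.01, 0.04, 0.5]
    corr = [[1.0, 0.6, 0.3],
            [0.6, 1.0, 0.2],
            [0.3, 0.2, 1.0]]
    print(extended_simes(p, corr))
```

With an identity correlation matrix the effective numbers equal the raw counts, so the sketch reduces to the ordinary Simes test; with perfectly correlated components the effective number collapses to one and the smallest p-value is returned unchanged.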
Published in:
TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies. PLoS Genet 9(1): e1003235. doi:10.1371/journal.pgen.1003235
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1003235
References
1. Corvin A, Craddock N, Sullivan PF (2010) Genome-wide association studies: a primer. Psychol Med 40: 1063–1077.
2. McClellan J, King MC (2010) Genetic heterogeneity in human disease. Cell 141: 210–217.
3. Dowell RD, Ryan O, Jansen A, Cheung D, Agarwala S, et al. (2010) Genotype to phenotype: a complex problem. Science 328: 469–469.
4. Medland S, Neale MC (2010) An integrated phenomic approach to multivariate allelic association. Eur J Hum Genet 18: 233–239.
5. Minica CC, Boomsma DI, van der Sluis S, Dolan CV (2010) Genetic Association in Multivariate Phenotypic Data: Power in Five Models. Twin Res Hum Genet 13: 525–543.
6. Van der Sluis S, Verhage M, Posthuma D, Dolan CV (2010) Phenotypic Complexity, Measurement Bias, and Poor Phenotypic Resolution Contribute to the Missing Heritability Problem in Genetic Association Studies. PLoS One 5: e13929.
7. Van der Sluis S, Posthuma D, Nivard MG, Verhage M, Dolan CV (2012) Power in GWAS: lifting the curse of the clinical cut-off. Mol Psych doi:10.1038/mp.2012.65.
8. Van der Maas HLJ, Dolan CV, Grasman RPPP, Wicherts JM, Huizenga HM, et al. (2006) A Dynamic model of general intelligence: the positive manifold of intelligence by mutualism. Psychol Rev 113: 842–861.
9. Cramer AOJ, Waldorp LJ, van der Maas HLJ, Borsboom D (2010) Comorbidity: a network perspective. Behav Brain Sci 33: 137–193.
10. Borsboom D, Cramer AOJ, Schmittmann VD, Epskamp S, Waldorp LJ (2011) The small world of psychopathology. PLoS One 6: e27407.
11. O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FCF, Elliott P, et al. (2012) MultiPhen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS One 7: e34861.
12. Ferreira MAR, Purcell SM (2009) A multivariate test of association. Bioinformatics 25: 132–133.
13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
14. Cole DA, Maxwell SE, Avrey RD, Salas E (1994) How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychol Bull 115: 465–474.
15. Li M-X, Gui H-S, Kwan JSH, Sham PC (2011) GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet 88: 283–293.
16. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23: 1294–1296.
17. Aulchenko YS, Struchalin MV, van Duijn CM (2010) ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics 11: 134.
18. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34: 816–834.
19. Li Y, Willer CJ, Sanna S, Abecasis GR (2009) Genotype Imputation. Annu Rev Genomics Hum Genet 10: 387–406.
20. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet 39: 906–913.
21. Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30: 97–101.
22. Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM (2004) PBAT: Tools for family-based association studies. Am J Hum Genet 74: 367–369.
23. Lawley DN, Maxwell AE (1971) Factor analysis as a statistical method. London: Butterworth.
24. Carroll JB (1993) Human Cognitive abilities: A survey of factor analytic studies. Cambridge University press.
25. Achenbach TM (1991) Manual for the Child Behavior Checklist/4–18. Burlington, VT: University of Vermont, Department of Psychiatry.
26. Digman JM (1997) Higher-order factors of the big five. J Pers Soc Psychol 73: 1246–1256.
27. Henderson ND, Turri MG, DeFries JC, Flint J (2004) QTL analysis of multiple behavioural measures of anxiety in mice. Behav Genet 34: 267–293.
28. Brzustowics LM, Bassett AS (2008) Phenotype matters: The case for careful characterization of relevant traits. Am J Psychiat 165: 1096–1098.
29. Bloss CS, Schiabor KM, Schork NJ (2010) Human behavioral informatics in genetic studies of neuropsychiatric disease: multi-variate profile-based analysis. Brain Res Bull 83: 177–188.
30. Houle D, Govindaraju DR, Omholt S (2010) Phenomics: the next challenge. Nat Rev Genet 11: 855–866.
31. R Development Core Team (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
32. Rasch G (1980) Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.
33. Jöreskog KG, Sörbom D (1996–2001) LISREL 8 User's Reference Guide, SSI Scientific Software International. Suite. USA
34. Whitlock MC (2005) Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. J Evolution Biol 18: 1368–1373.
35. Neale MC, Boker SM, Xie G, Maes HH (2006) Mx: statistical modeling, 7th edn. Department of Psychiatry, VCU, Richmond.
36. Muthén LK, Muthén BO (1998–2012) Mplus User's Guide. Seventh Edition. Los Angeles, CA: Muthén & Muthén
Tags
Genetics, Reproductive medicine
Article published in
PLOS Genetics
2013 Issue 1