An Evolutionary Framework for Association Testing in Resequencing Studies
Sequencing technologies are becoming cheap enough to apply to large numbers of study participants and promise to provide new insights into human phenotypes by bringing to light rare and previously unknown genetic variants. We develop a new framework for the analysis of sequence data that incorporates all of the major features of previously proposed approaches, including those focused on allele counts and allele burden, but is both more general and more powerful. We harness population genetic theory to provide prior information on effect sizes and to create a pooling strategy for information from rare variants. Our method, EMMPAT (Evolutionary Mixed Model for Pooled Association Testing), generates a single test per gene (substantially reducing multiple testing concerns), facilitates graphical summaries, and improves the interpretation of results by allowing calculation of attributable variance. Simulations show that, relative to previously used approaches, our method increases the power to detect genes that affect phenotype when natural selection has kept alleles with large effect sizes rare. We demonstrate our approach on a population-based re-sequencing study of association between serum triglycerides and variation in ANGPTL4.
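To make the contrast with earlier approaches concrete, the following is a minimal sketch of a simple allele-burden analysis, the kind of method the abstract groups under "allele counts and allele burden". It is not EMMPAT and uses no population genetic prior. The function names (`allele_frequencies`, `rare_burden`, `least_squares_slope`), the frequency threshold, and the toy genotype and trait values are all assumptions made for illustration.

```python
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Allele frequency per variant.

    genotypes[i][j] is the number of copies (0, 1 or 2) of the minor
    allele that person i carries at variant j.
    """
    n_people = len(genotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_people)
        for j in range(n_variants)
    ]


def rare_burden(genotypes: Sequence[Sequence[int]], max_freq: float = 0.05) -> List[int]:
    """Per-person count of alleles at variants with frequency in (0, max_freq]."""
    freqs = allele_frequencies(genotypes)
    rare = [j for j, f in enumerate(freqs) if 0 < f <= max_freq]
    return [sum(row[j] for j in rare) for row in genotypes]


def least_squares_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: 10 people, one common variant and two rare ones.
GENOTYPES = [
    [1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0],
    [0, 0, 0], [0, 0, 0], [2, 0, 1], [1, 0, 0], [2, 0, 0],
]
# Hypothetical trait values (arbitrary units), lower in rare-allele carriers.
TRAIT = [100, 100, 70, 100, 100, 100, 100, 70, 100, 100]

if __name__ == "__main__":
    burden = rare_burden(GENOTYPES, max_freq=0.1)
    print("burden per person:", burden)
    print("slope of trait on burden:", least_squares_slope(burden, TRAIT))
```

In this toy data the two carriers of a rare allele have lower trait values, so the fitted slope is negative. A burden score of this kind gives every rare variant the same weight, whereas the framework described above draws on population genetic theory to inform effect sizes.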
Published in:
An Evolutionary Framework for Association Testing in Resequencing Studies. PLoS Genet 6(11): e1001202. doi:10.1371/journal.pgen.1001202
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1001202
Tags
Genetics, Reproductive Medicine
Article published in
PLOS Genetics
2010, Issue 11