The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling
Genome-wide association studies in human populations have facilitated the creation of genomic profiles, which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operating characteristic (ROC) curve is a well-established measure of how well a test classifies diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the test explains 100% of the genetic variance, AUC has a maximum value that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or the heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be inverted to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this proportion varied from 0.10 to 0.74 across diseases. We thus provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles, together with a strategy, available as an online calculator, to estimate the proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk).
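The relationship between AUC, heritability, and prevalence described in the abstract can be illustrated with a minimal sketch. It is not quoted from the abstract. It assumes a standard liability threshold model (liability is standard normal, and an individual is affected when liability exceeds a threshold set by the prevalence) and it approximates the distribution of predicted genetic risk in cases and in controls as normal. The function and parameter names are hypothetical, chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _threshold_terms(prevalence):
    """Return (T, i, v) for a liability threshold model (assumed model)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    T = _STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold
    z = _STD_NORMAL.pdf(T)                     # normal density at the threshold
    i = z / prevalence                         # mean liability of cases
    v = -i * prevalence / (1.0 - prevalence)   # mean liability of controls
    return T, i, v


def auc_from_variance_explained(h2_liability, prevalence, proportion_explained=1.0):
    """Approximate AUC of a genetic predictor (normal approximation, assumed).

    proportion_explained=1.0 gives the maximum AUC for this heritability
    and prevalence.
    """
    if not 0.0 < h2_liability <= 1.0:
        raise ValueError("h2_liability must be in (0, 1]")
    if not 0.0 < proportion_explained <= 1.0:
        raise ValueError("proportion_explained must be in (0, 1]")
    T, i, v = _threshold_terms(prevalence)
    h2x = proportion_explained * h2_liability  # liability variance explained by the test
    # Variance of predicted genetic risk within cases and within controls.
    var_cases = h2x * (1.0 - h2x * i * (i - T))
    var_controls = h2x * (1.0 - h2x * v * (v - T))
    # AUC = P(risk of a random case > risk of a random control).
    return _STD_NORMAL.cdf((i - v) * h2x / sqrt(var_cases + var_controls))


def variance_explained_from_auc(auc, h2_liability, prevalence):
    """Invert the expression above: proportion of genetic variance explained."""
    if not 0.5 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0.5 and 1")
    T, i, v = _threshold_terms(prevalence)
    Q = _STD_NORMAL.inv_cdf(auc)
    h2x = 2.0 * Q**2 / ((i - v) ** 2 + Q**2 * (i * (i - T) + v * (v - T)))
    return h2x / h2_liability
```

For example, `variance_explained_from_auc(0.75, h2_liability, prevalence)` returns the share of genetic variance a test would need to explain to reach AUC = 0.75 under these assumptions. A returned value above 1 indicates that the requested AUC exceeds the maximum attainable for that heritability and prevalence under this approximation.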
Published in:
The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling. PLoS Genet 6(2): e1000864. doi:10.1371/journal.pgen.1000864
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1000864
Article published in:
PLOS Genetics
2010, Issue 2