Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning
Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and the genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude the diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof of concept that statistical learning approaches that simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. We assessed the GRS's utility in several clinically relevant settings. Like HLA typing, the GRS can identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, it also allows fine-scale stratification of individuals into higher-risk categories, identifying those who would benefit from more invasive and costly definitive testing. The GRS is flexible: its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Although the GRS explains only a minority of disease heritability, our findings indicate that a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD, and they support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
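The abstract reports accuracy as an AUC and clinical utility as a negative predictive value at a chosen threshold. The following is a minimal sketch of how those quantities relate, assuming a list of GRS values with binary case/control labels. The function names, the toy scores and the 1% prevalence are hypothetical choices for illustration; this is not the authors' code. AUC is computed as the Mann-Whitney probability that a random case outscores a random control. PPV and NPV are then derived with Bayes' rule at an assumed population prevalence rather than at the case/control ratio of a study cohort.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability that a random case scores above a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


def sens_spec(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a GRS >= cutoff is called test-positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV from Bayes' rule for an assumed prevalence in the tested population."""
    k = prevalence
    ppv = sens * k / (sens * k + (1 - spec) * (1 - k))
    npv = spec * (1 - k) / (spec * (1 - k) + (1 - sens) * k)
    return ppv, npv


# Hypothetical toy GRS values: 1 = case, 0 = control.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(auc(scores, labels))  # 8/9 on this toy data
for cutoff in (0.3, 0.5):
    se, sp = sens_spec(scores, labels, cutoff)
    print(cutoff, se, sp, predictive_values(se, sp, prevalence=0.01))
```

Raising the cut-off trades sensitivity for specificity. Because the predictive values depend on the prevalence assumed for the tested population, the same model can give different PPV and NPV in different clinical settings.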
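The abstract states performance both as AUC and as the proportion of disease variance explained. One standard link between the two, assumed here, is the liability-threshold conversion of Wray et al. (2010). Under that model, a predictor explaining a fraction h2 of liability variance in a disease with population prevalence K has an AUC of Φ((i − v)·h2 / sqrt(h2·[(1 − h2·i(i − t)) + (1 − h2·v(v − t))])). Here t is the liability threshold, i is the mean liability of cases and v is the mean liability of controls. The sketch below implements this expression and its closed-form inverse using only the Python standard library. The function names and the example inputs (AUC 0.88, prevalence 1%) are illustrative assumptions. With those inputs, the inverse gives roughly 0.32.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _liability_params(prevalence: float):
    """Threshold t, mean case liability i and mean control liability v for prevalence K."""
    k = prevalence
    t = STD_NORMAL.inv_cdf(1 - k)  # liability threshold
    i = STD_NORMAL.pdf(t) / k      # mean liability of cases
    v = -i * k / (1 - k)           # mean liability of controls
    return t, i, v


def liability_auc(h2: float, prevalence: float) -> float:
    """AUC expected from a predictor explaining h2 of liability-scale variance."""
    t, i, v = _liability_params(prevalence)
    num = (i - v) * h2
    den = sqrt(h2 * ((1 - h2 * i * (i - t)) + (1 - h2 * v * (v - t))))
    return STD_NORMAL.cdf(num / den)


def auc_to_liability_variance(auc: float, prevalence: float) -> float:
    """Closed-form inverse of liability_auc: variance explained on the liability scale."""
    t, i, v = _liability_params(prevalence)
    q2 = STD_NORMAL.inv_cdf(auc) ** 2
    return 2 * q2 / ((i - v) ** 2 + q2 * (i * (i - t) + v * (v - t)))


# Assumed inputs for illustration only: AUC 0.88, prevalence 1%.
h2 = auc_to_liability_variance(0.88, 0.01)
print(round(h2, 3))
print(liability_auc(h2, 0.01))  # recovers the input AUC
```

Dividing this variance by an estimate of disease heritability gives the fraction of heritability captured by the model. The heritability estimate itself is not given here.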
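The abstract attributes the high accuracy to statistical learning approaches that model all SNPs at once, rather than scoring a handful of associated loci. As a hedged illustration of that idea only (not the authors' software), the sketch below fits an L1-penalized linear model to allele dosages by cyclic coordinate descent and uses the weighted dosage sum as the GRS. Several choices are assumptions made for clarity: the squared-error (lasso) loss, the penalty value, the iteration count and the toy genotypes. A classification loss would be more typical for case/control data. The L1 penalty lets every SNP enter the model while shrinking uninformative ones to exactly zero.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 100) -> List[float]:
    """Jointly fit one weight per SNP by minimising
    (1/2n) * ||y_c - X_c w||^2 + lam * ||w||_1 on mean-centered X and y."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]  # residual for w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Correlation of SNP j with the partial residual (its own contribution added back).
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep residual in sync with w
                w[j] = new_wj
    return w


def grs(x: Sequence[float], w: Sequence[float]) -> float:
    """Genomic risk score as a weighted dosage sum (intercept omitted; it does not change ranking)."""
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical toy data: 4 individuals x 3 SNPs (dosages 0/1/2); only SNP 0 tracks status.
X = [[0, 0, 0], [0, 2, 2], [2, 0, 2], [2, 2, 0]]
y = [0, 0, 1, 1]
w = lasso_cd(X, y, lam=0.1)
print(w, [grs(x, w) for x in X])
```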
Published in:
Accurate and Robust Genomic Prediction of Celiac Disease Using Statistical Learning. PLoS Genet 10(2): e1004137. doi:10.1371/journal.pgen.1004137
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1004137
Published in PLOS Genetics, 2014, Issue 2.