Predicting long-term type 2 diabetes with support vector machine using oral glucose tolerance test
Authors:
Hasan T. Abbas [1]; Lejla Alic [2]; Madhav Erraguntla [3]; Jim X. Ji [1]; Muhammad Abdul-Ghani [4]; Qammer H. Abbasi [5]; Marwa K. Qaraqe [6]
Author affiliations:
[1] Department of Electrical & Computer Engineering, Texas A&M University at Qatar, Doha, Qatar
[2] Magnetic Detection & Imaging Group, Faculty of Science & Technology, University of Twente, Enschede, The Netherlands
[3] Department of Industrial & Systems Engineering, Texas A&M University, College Station, Texas, United States of America
[4] UT Health, San Antonio, Texas, United States of America
[5] James Watt School of Engineering, University of Glasgow, Glasgow, United Kingdom
[6] College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
Published in:
PLoS ONE 14(12)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0219636
Abstract
Diabetes is a major healthcare burden worldwide. There is substantial evidence that lifestyle modifications and drug intervention can prevent diabetes; early identification of high-risk individuals is therefore important for designing targeted prevention strategies. In this paper, we present an automatic tool that uses machine learning techniques to predict the development of type 2 diabetes mellitus (T2DM). Data generated from an oral glucose tolerance test (OGTT) were used to develop a predictive model based on the support vector machine (SVM). We trained and validated the models using the OGTT and demographic data of 1,492 healthy individuals collected during the San Antonio Heart Study. This study collected plasma glucose and insulin concentrations before glucose intake and at three time points thereafter (30, 60 and 120 min). Personal information such as age, ethnicity and body-mass index was also part of the data set. From 11 OGTT measurements, we derived 61 features, ranked them with the minimum redundancy maximum relevance (mRMR) feature selection algorithm, and shortlisted the top ten. All possible combinations of the ten best-ranked features were used to generate SVM-based prediction models. This research shows that an individual's plasma glucose levels, and the information derived from them, have the strongest predictive performance for the future development of T2DM. Notably, insulin and demographic features do not provide additional performance improvement for diabetes prediction. The results of this work identify the parsimonious set of clinical data that needs to be collected for efficient prediction of T2DM. Our approach shows an average accuracy of 96.80% and a sensitivity of 80.09% on a holdout set.
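As a rough illustration of two steps named in the abstract, the sketch below enumerates every non-empty combination of ten shortlisted features (which gives 1,023 candidate models) and computes accuracy and sensitivity from predicted labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature names are placeholders, the labels are toy values rather than study data, and the helper names `all_feature_subsets` and `accuracy_and_sensitivity` are hypothetical. Model training itself is omitted.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary labels.

    Sensitivity is the fraction of truly positive cases that were
    predicted positive: TP / (TP + FN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Placeholder names standing in for the ten top-ranked features.
top_features = [f"feature_{i}" for i in range(1, 11)]
subsets = list(all_feature_subsets(top_features))
print(len(subsets))  # 2**10 - 1 = 1023 candidate feature sets

# Toy labels (NOT study data): 1 = developed T2DM, 0 = did not.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(acc, sens)  # 0.9 0.5
```

The toy labels also show why the two metrics are reported separately: when positive cases are rare, a model can miss half of them and still score high on accuracy.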
Keywords:
Glucose tolerance tests – Insulin – Blood plasma – Cardiovascular diseases – Support vector machines
Article published in the journal
PLOS One
2019, Issue 12