Predicting the occurrence of surgical site infections using text mining and machine learning

Authors: Daniel A. da Silva (1); Carla S. ten Caten (1); Rodrigo P. dos Santos (2); Flavio S. Fogliatto (1); Juliana Hsuan (3)
Author affiliations: (1) Industrial Engineering Department, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; (2) Hospital de Clinicas de Porto Alegre, Porto Alegre, Brazil; (3) Copenhagen Business School, Copenhagen, Denmark
Published in: PLoS ONE 14(12)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0226272

Summary

In this study, we propose using text mining and machine learning methods to predict and detect surgical site infections (SSIs) from textual descriptions of surgeries and patients’ post-operative records, mined from the database of a high-complexity university hospital. SSIs are among the most common adverse events experienced by hospitalized patients, and preventing them is fundamental to ensuring patient safety. Knowledge of SSI occurrence rates may also be useful in preventing future episodes. We analyzed 15,479 surgery descriptions and post-operative records, testing different preprocessing strategies and the following machine learning algorithms: Linear SVC, Logistic Regression, Multinomial Naive Bayes, Nearest Centroid, Random Forest, Stochastic Gradient Descent, and Support Vector Classification (SVC). For prediction, the best result was obtained with Stochastic Gradient Descent (79.7% ROC-AUC); for detection, Logistic Regression performed best (80.6% ROC-AUC).
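The summary names the classifiers and the evaluation metric but gives no implementation details. As a rough illustration only, the sketch below walks through the general idea on a standard-library-only toy: free-text notes are turned into word counts, a Multinomial Naive Bayes model (one of the algorithms listed above) scores each note, and the ranking is summarized as ROC-AUC. The example notes, the `tokenize` helper, and the smoothing value `alpha=1.0` are assumptions made here for illustration; this is not the authors' pipeline, preprocessing, or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a two-class Multinomial Naive Bayes model with additive smoothing."""
    vocab = set()
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for text, y in zip(docs, labels):
        toks = tokenize(text)
        counts[y].update(toks)
        vocab.update(toks)
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for y in (0, 1):
        total = sum(counts[y].values()) + alpha * len(vocab)
        model["log_prior"][y] = math.log(n_docs[y] / len(labels))
        model["log_lik"][y] = {
            w: math.log((counts[y][w] + alpha) / total) for w in vocab
        }
    return model


def score(model, text):
    """Log-odds of class 1 (infection) versus class 0; unseen words are ignored."""
    s = model["log_prior"][1] - model["log_prior"][0]
    for tok in tokenize(text):
        if tok in model["vocab"]:
            s += model["log_lik"][1][tok] - model["log_lik"][0][tok]
    return s


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up notes for illustration only; NOT taken from the study's hospital data.
train_docs = [
    "wound with purulent drainage and erythema",
    "fever and purulent secretion at incision",
    "clean dry incision no complaints",
    "patient stable incision healing well",
]
train_labels = [1, 1, 0, 0]  # 1 = infection recorded, 0 = no infection

test_docs = [
    "purulent drainage noted at wound",
    "incision clean and dry",
    "erythema and fever after surgery",
    "patient stable no complaints",
]
test_labels = [1, 0, 1, 0]

model = train_multinomial_nb(train_docs, train_labels)
test_scores = [score(model, doc) for doc in test_docs]
auc = roc_auc(test_labels, test_scores)
print(f"ROC-AUC on toy notes: {auc:.3f}")
```

Because the toy notes are trivially separable, the value printed here says nothing about real performance. The point is only that ROC-AUC measures how well a model ranks infected cases above non-infected ones, which is why it suits imbalanced outcomes such as SSIs better than raw accuracy.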

Keywords:

Algorithms – Adverse events – Surgical and invasive medical procedures – Machine learning algorithms – Machine learning – Preprocessing – Text mining – Data mining

