Discovering novel disease comorbidities using electronic medical records

Authors: Shikha Chaganti aff001; Valerie F. Welty aff002; Warren Taylor aff003; Kimberly Albert aff003; Michelle D. Failla aff003; Carissa Cascio aff003; Seth Smith aff004; Louise Mawn aff005; Susan M. Resnick aff006; Lori L. Beason-Held aff006; Francesca Bagnato aff007; Thomas Lasko aff008; Jeffrey D. Blume aff002; Bennett A. Landman aff001
Author affiliations: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America aff001; Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America aff002; Department of Psychiatry & Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff003; Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff004; Department of Ophthalmology and Visual Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff005; Laboratory of Behavioral Neuroscience, National Institute on Aging, Baltimore, Maryland, United States of America aff006; Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff007; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America aff008
Published in: PLoS ONE 14(11)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0225495


Increasing reliance on electronic medical records (EMR) at large medical centers provides unique opportunities to perform population-level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are now well known to the scientific community, many more relationships undoubtedly remain to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from real-time PubMed abstracts to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which better identify associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer’s Disease, and Optic Neuritis. Our results demonstrate wide utility for identifying new associations in EMR data that have the highest priority among the complex web of correlations and causalities. With this index, data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
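The abstract does not spell out how a second-generation p-value is computed, so the following is only a minimal sketch of the general published definition: the fraction of an interval estimate that overlaps an interval null hypothesis, with a correction for very wide intervals. The function name, argument names, and the example bounds are hypothetical and chosen for illustration; the study's actual null intervals and effect scale are not given here.

```python
def second_generation_p_value(ci_lower, ci_upper, null_lower, null_upper):
    """Sketch of a second-generation p-value (SGPV).

    The SGPV is the share of the interval estimate [ci_lower, ci_upper]
    that lies inside the interval null [null_lower, null_upper], multiplied
    by a correction factor max(|I| / (2 * |H0|), 1) that caps the value at
    1/2 when the interval estimate is much wider than the null interval.
    All argument names are illustrative assumptions.
    """
    if ci_lower > ci_upper or null_lower >= null_upper:
        raise ValueError("intervals must be ordered and the null must have positive width")

    interval_len = ci_upper - ci_lower
    null_len = null_upper - null_lower

    # Degenerate case: a point estimate is either inside the null region or not.
    if interval_len == 0:
        return 1.0 if null_lower <= ci_lower <= null_upper else 0.0

    # Length of the overlap between the two intervals (zero if disjoint).
    overlap = max(0.0, min(ci_upper, null_upper) - max(ci_lower, null_lower))

    # Small-sample correction for intervals more than twice as wide as the null.
    correction = max(interval_len / (2.0 * null_len), 1.0)
    return (overlap / interval_len) * correction


if __name__ == "__main__":
    null = (0.9, 1.1)  # hypothetical region of clinically trivial effects
    print(second_generation_p_value(1.2, 1.8, *null))    # interval entirely outside the null
    print(second_generation_p_value(0.95, 1.05, *null))  # interval entirely inside the null
    print(second_generation_p_value(0.0, 2.0, *null))    # very wide, inconclusive interval
```

Under this definition, a value of 0 indicates that the data support only effects outside the trivial range, a value of 1 indicates that they support only trivial effects, and values in between (especially near 1/2) indicate inconclusive data.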

Keywords:

Probability distribution – Alzheimer's disease – Database searching – Vision – Multiple sclerosis – Autism spectrum disorder – Alzheimer's disease diagnosis and management – Electronic medical records


