Infectious Disease Reporting Provider Catalog and Visualization Project

Wednesday, June 22, 2016: 11:14 AM
Tubughnenq' 6 / Boardroom, Dena'ina Convention Center
Evan Caten, Massachusetts Department of Public Health, Jamaica Plain, MA
BACKGROUND:  Healthcare providers not only diagnose and report notifiable diseases but are also public health partners in prevention and control.  Tools that identify provider patterns could speed communication between public health and relevant providers and highlight gaps in communication and collection systems.  We demonstrate a series of open-source tools and methods for distilling large sets of provider records into uniform, indexed collections.  Exploratory analytic techniques are applied to expedite pattern recognition and performance evaluation.

METHODS:  Hierarchical and linked provider details from 3 million infectious disease laboratory records were extracted from the Massachusetts MAVEN EDSS and cleaned with R and OpenRefine algorithms to condense free-text variation and produce unique provider names.  Web service requests to the National Provider Identifier (NPI) registry API helped validate and extend provider information, forming a catalog.  A geocoding process further enriched the catalog.  A directed and weighted network graph facilitated the detection of communities and clusters within the network of providers.  Grouping records by date range allowed visualization of change in network topology and provider attributes over time.
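The name-condensing and graph-building steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used R and OpenRefine. It assumes a key-collision approach in the spirit of OpenRefine's fingerprint method, and it assumes each record links an ordering provider to a performing laboratory. The function names, the direction of the edges, and the sample records are all hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(name):
    """Build a key that ignores case, punctuation, accents, and token order."""
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def canonical_names(raw_names):
    """Map each raw spelling to the most frequent spelling sharing its key."""
    groups = defaultdict(Counter)
    for name in raw_names:
        groups[fingerprint(name)][name] += 1
    canonical = {}
    for counts in groups.values():
        best = counts.most_common(1)[0][0]
        for name in counts:
            canonical[name] = best
    return canonical


def weighted_edges(records, canonical):
    """Count records per directed (ordering provider -> laboratory) pair."""
    edges = Counter()
    for ordering, lab in records:
        edges[(canonical[ordering], canonical[lab])] += 1
    return edges


# Hypothetical records (assumption): (ordering provider, performing laboratory)
records = [
    ("Boston Medical Center", "State Lab"),
    ("boston medical center.", "State Lab"),
    ("Boston Medical Center", "STATE LAB"),
    ("Center, Boston Medical", "State Lab"),
]
names = [name for pair in records for name in pair]
canonical = canonical_names(names)
edges = weighted_edges(records, canonical)
```

Here the four free-text variants collapse to a single directed edge whose weight is the number of records. An edge list of this shape could then be loaded into a graph library for community detection, or split by date range to compare network topology over time.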

RESULTS:  Open-source tools and techniques refined and reconciled an overwhelming set of provider records, resulting in a structured, tidy, validated, and enhanced dataset.  Experience gained through the use of data cleaning and analytic/visualization tools built transferable skills with wide application.  Exploration of the lab record provider network embedded in our system enabled detection of patterns and connections invisible to traditional analysis.

CONCLUSIONS:  Classic analytic methods focus on observations as independent objects (aggregated for testing and analysis), ignoring the rich connections embedded in the data.  Revealing communities and clusters of lab providers (and how they change over time) within our infectious disease surveillance system enabled the generation of new hypotheses and research initiatives.  Feedback from other jurisdictions and researchers will help expand this methodology and its use.