METHODS: Hierarchical and linked provider details from 3 million infectious disease laboratory records were extracted from the Massachusetts MAVEN EDSS and cleaned with R and OpenRefine algorithms to condense free-text variation and produce unique provider names. Web service requests to the National Provider Identifier (NPI) API helped validate and extend the provider information, forming a catalog. A geocoding process further enriched the catalog. A directed and weighted network graph facilitated the detection of communities and clusters within the network of providers. Grouping records by date range allowed visualization of changes in network topology and provider attributes over time.
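The name condensing and edge weighting steps described above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the authors' R/OpenRefine workflow: the fingerprint function approximates OpenRefine's key-collision "fingerprint" clustering, and the record layout (ordering provider, performing lab), the sample names, and the helper names `fingerprint`, `canonical_names`, and `weighted_edges` are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(name: str) -> str:
    """Key-collision fingerprint: fold case and accents, drop punctuation,
    then sort the unique tokens so word order no longer matters."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", ascii_name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def canonical_names(names):
    """Map each raw spelling to the most frequent spelling sharing its fingerprint."""
    groups = defaultdict(Counter)
    for raw in names:
        groups[fingerprint(raw)][raw] += 1
    return {raw: counts.most_common(1)[0][0] for counts in groups.values() for raw in counts}


def weighted_edges(records, canon):
    """Count directed (source -> target) pairs; the count is the edge weight."""
    return Counter((canon[src], canon[dst]) for src, dst in records)


# Hypothetical records: (ordering provider, performing lab) free-text pairs.
records = [
    ("Boston Medical Center", "State Lab"),
    ("BOSTON MEDICAL CENTER.", "State Lab"),
    ("Center, Boston Medical", "state lab"),
    ("Boston Medical Center", "Quest"),
]
canon = canonical_names(name for pair in records for name in pair)
edges = weighted_edges(records, canon)
```

Here the three spellings of the first provider collapse to one canonical name, so `edges` holds two directed edges with weights 3 and 1. An edge list of this shape could then be passed to a graph library for community detection, and building it separately for each date range would give one network snapshot per period.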
RESULTS: Open-source tools and techniques refined and reconciled an overwhelming set of provider records, resulting in a structured, tidy, validated, and enhanced dataset. Experience gained through the use of data cleaning, analytic, and visualization tools built transferable skills with wide application. Exploration of the lab record provider network embedded in our system enabled detection of patterns and connections invisible to traditional analysis.
CONCLUSIONS: Classic analytic methods treat observations as independent objects (aggregated for testing and analysis), ignoring the rich connections embedded in the data. Revealing communities and clusters of lab providers (and how they change over time) within our infectious disease surveillance system enabled the generation of new hypotheses and research initiatives. Feedback from other jurisdictions and researchers will help expand this methodology and its use.