BACKGROUND: Florida’s reportable disease surveillance system, Merlin, is used by more than 500 state- and local-level staff who enter, update, and review surveillance data. Maintaining data quality is paramount. It is supported by system automation as well as by other approaches such as training and manual state-level review. Several automated approaches have been implemented in the system: 1) case save stoppers, which prevent invalid data from being saved; 2) case report stoppers, which prevent cases with missing, invalid, or inconsistent data from being reported; 3) improbability algorithm checks, which compare multiple fields to identify unlikely scenarios; 4) system-facilitated state-level review of individual cases for most diseases; and 5) systematic end-of-year data cleaning. This analysis reviews how the quality of reportable disease case data changed over time.
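The difference between case save stoppers and case report stoppers can be pictured as two tiers of validation: the first refuses to store invalid data, while the second lets an incomplete case be saved but not reported. The Python sketch below is a minimal illustration under stated assumptions. The `CaseRecord` fields, the specific rules (a future onset date blocks saving; a missing disease or county, or an onset date after the report date, blocks reporting), and all function names are hypothetical and are not taken from Merlin.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CaseRecord:
    # Hypothetical minimal case record; not Merlin's actual schema.
    disease: Optional[str]
    county: Optional[str]
    onset_date: Optional[date]
    report_date: Optional[date]


def save_stoppers(case: CaseRecord, today: date) -> List[str]:
    """Invalid data that blocks saving the case at all (hypothetical rule)."""
    errors = []
    if case.onset_date is not None and case.onset_date > today:
        errors.append("onset date is in the future")
    return errors


def report_stoppers(case: CaseRecord) -> List[str]:
    """Missing or inconsistent data that allows saving but blocks reporting (hypothetical rules)."""
    errors = []
    if not case.disease:
        errors.append("disease is missing")
    if not case.county:
        errors.append("county is missing")
    if (case.onset_date is not None and case.report_date is not None
            and case.onset_date > case.report_date):
        errors.append("onset date is after report date")
    return errors


def can_save(case: CaseRecord, today: date) -> bool:
    return not save_stoppers(case, today)


def can_report(case: CaseRecord, today: date) -> bool:
    # A case must be savable and pass every report stopper before it is reported.
    return can_save(case, today) and not report_stoppers(case)


# Example: a draft case with no county can be saved but not reported.
draft = CaseRecord("Salmonellosis", None, date(2014, 3, 1), date(2014, 3, 5))
print(can_save(draft, date(2014, 3, 10)), can_report(draft, date(2014, 3, 10)))
```

Keeping the two tiers separate lets investigators save partial work during an investigation while still guaranteeing that nothing incomplete reaches the reported data set.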
METHODS: Merlin case save stoppers, case report stoppers, and improbability algorithm checks have been implemented in phases. Reportable disease case data from 2010 to 2014 were reviewed for 14 data scenarios used as indicators of data quality. The analysis focused on the improbability algorithm checks because these scenarios are the hardest to identify through manual review and therefore benefit substantially from automated checks. Example scenarios include a) cases marked as both a day care attendee and a health care worker or food handler, b) cases noted as secondary transmission with no epidemiologic link recorded, and c) cases with an onset date but no symptoms recorded.
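An improbability check compares several fields of the same case and flags combinations that are possible to enter but unlikely to be correct. The Python sketch below encodes the three example scenarios as rules, assuming each case is a plain dictionary. The field names (`day_care_attendee`, `health_care_worker`, `food_handler`, `transmission`, `epi_link`, `onset_date`, `symptoms`) and the rule functions are hypothetical illustrations, not Merlin's implementation. It also tallies the two measures used in the results: total scenario violations and the number of cases with at least one violation.

```python
from typing import Callable, Dict, List, Tuple

Case = dict  # Hypothetical: one case as a field-name -> value mapping.


def daycare_and_worker(case: Case) -> bool:
    # a) Day care attendee who is also a health care worker or food handler.
    return bool(case.get("day_care_attendee")) and (
        bool(case.get("health_care_worker")) or bool(case.get("food_handler")))


def secondary_without_epi_link(case: Case) -> bool:
    # b) Secondary transmission noted, but no epidemiologic link recorded.
    return case.get("transmission") == "secondary" and not case.get("epi_link")


def onset_without_symptoms(case: Case) -> bool:
    # c) Onset date present, but no symptoms recorded.
    return case.get("onset_date") is not None and not case.get("symptoms")


# Each rule returns True when the case violates that scenario.
SCENARIOS: Dict[str, Callable[[Case], bool]] = {
    "daycare_and_worker": daycare_and_worker,
    "secondary_without_epi_link": secondary_without_epi_link,
    "onset_without_symptoms": onset_without_symptoms,
}


def summarize(cases: List[Case]) -> Tuple[int, int]:
    """Return (total scenario violations, cases with at least one violation)."""
    violations = 0
    cases_with_violations = 0
    for case in cases:
        hits = sum(1 for rule in SCENARIOS.values() if rule(case))
        violations += hits
        if hits:
            cases_with_violations += 1
    return violations, cases_with_violations


cases = [
    # Two violations: day care attendee + food handler, and onset with no symptoms.
    {"day_care_attendee": True, "food_handler": True,
     "onset_date": "2014-03-01", "symptoms": []},
    # One violation: secondary transmission without an epidemiologic link.
    {"transmission": "secondary", "epi_link": False,
     "onset_date": "2014-03-02", "symptoms": ["fever"]},
    # No violations.
    {"onset_date": "2014-03-03", "symptoms": ["diarrhea"]},
]
print(summarize(cases))
```

Because one case can trip several rules, the violation count can exceed the count of affected cases, which is why the two measures are reported separately.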
RESULTS: A total of 246,926 cases were reported from 2010 to 2014. In 2010, 45,413 data scenario violations were identified in 18,355 cases (38% of reported cases), compared with 8,846 violations in 4,407 cases (8% of reported cases) in 2014. This represents an 80% decrease in violations and a 75% decrease in cases with violations.
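The percentage decreases follow from the 2010 and 2014 counts as relative changes, (before − after) / before × 100. The short Python check below reproduces them from the reported counts; `percent_decrease` is a hypothetical helper name.

```python
def percent_decrease(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Counts reported for 2010 and 2014.
violations_2010, violations_2014 = 45_413, 8_846
cases_2010, cases_2014 = 18_355, 4_407

print(f"violations: {percent_decrease(violations_2010, violations_2014):.1f}%")
print(f"cases with violations: {percent_decrease(cases_2010, cases_2014):.1f}%")
```

With these counts the function returns about 80.5% for violations and 76.0% for cases with violations.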
CONCLUSIONS: Adding system-driven automated quality checks to Merlin improved data quality, decreased the time spent on state-level case review, and reduced the time local disease investigators spent updating cases. Performing data quality checks early (e.g., at the time of case save or report) helps ensure that accurate data are available for immediate decision-making. Leveraging system automation is particularly important during rapidly evolving events, when staff have little time to review data quality. The algorithms put in place are transferable and could be used by others operating notifiable disease surveillance systems. Harnessing system automation to improve the accuracy and consistency of reportable disease surveillance data is efficient and effective and should be considered a best practice.