Metrics for Assessing Data Quality: Beyond “Completeness”

Monday, June 15, 2015: 11:36 AM
Liberty B/C, Sheraton Hotel
Rachelle Boulton, Utah Department of Health, Salt Lake City, UT
Susan L Mottice, Utah Department of Health, Salt Lake City, UT

BACKGROUND:   Data quality assessment is an evaluation process that determines whether the collected data are good enough for their intended purpose. Cursory data quality assessments often focus primarily on completeness: is there data in each field? However, not all data quality problems show up as missing data. Comprehensive data quality assessments that use several different metrics are necessary to confidently determine the true quality of the collected data.

METHODS:   We conducted a comprehensive review of the Utah Department of Health’s communicable disease surveillance process and identified components of data collection, data entry, and the database structure that are susceptible to data quality errors. Based on this review, we developed six metrics that should be considered for a comprehensive evaluation of data quality.

RESULTS:   Three of the identified metrics assess the quality of the data collection process. Completeness focuses on having data in each field; it is an indicator of the availability of the data that the system intends to collect. Timeliness focuses on when the data are available in relation to when the data are needed. Data source focuses on where the data come from, and on the reliability and accuracy of that source. A fourth metric, accuracy, is concerned with the data entry process and focuses on how well the data in the surveillance system were transcribed from the original sources. Assessing accuracy requires external validation of the data and is the most time-consuming metric to assess. The quality of the database structure is assessed through two additional metrics. Validity measures whether the data values conform to pre-determined requirements. The database structure should be sophisticated enough to flag outliers using programmed validity checks to maintain high data quality. Precision focuses on whether questions are ambiguous, leading to values that do not have the intended meaning. Precise questions require no interpretation on the part of the individual collecting and/or entering the data.
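Three of these metrics (completeness, timeliness, and validity) lend themselves to automated checks. The following is a minimal Python sketch, not the authors' implementation. The record layout, the field names (case_id, onset_date, report_date, age), the use of reporting lag as a stand-in for timeliness, and the 0 to 120 age range are all hypothetical assumptions made for illustration.

```python
from datetime import date

# Hypothetical case records; field names and values are illustrative only.
records = [
    {"case_id": 1, "onset_date": date(2015, 1, 5),
     "report_date": date(2015, 1, 7), "age": 34},
    {"case_id": 2, "onset_date": None,
     "report_date": date(2015, 1, 9), "age": 210},
    {"case_id": 3, "onset_date": date(2015, 1, 2),
     "report_date": date(2015, 1, 20), "age": None},
]


def completeness(rows, field):
    """Fraction of records that have a non-missing value in `field`."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def timeliness(rows, max_days):
    """Fraction of records, among those with both dates present,
    that were reported within `max_days` of onset.

    Assumption: reporting lag is used here as a proxy for whether
    the data were available when needed.
    """
    lags = [
        (r["report_date"] - r["onset_date"]).days
        for r in rows
        if r["onset_date"] is not None and r["report_date"] is not None
    ]
    if not lags:
        return None  # nothing to evaluate
    return sum(1 for lag in lags if 0 <= lag <= max_days) / len(lags)


def invalid_ages(rows, low=0, high=120):
    """Case IDs whose age is present but outside the assumed valid range."""
    return [
        r["case_id"]
        for r in rows
        if r["age"] is not None and not (low <= r["age"] <= high)
    ]
```

On this sample data, completeness(records, "onset_date") is 2/3, timeliness(records, 7) is 0.5, and invalid_ages(records) flags case 2. Data source, accuracy, and precision are not covered by the sketch: as described above, accuracy requires external validation against the original sources, and precision depends on how the questions are worded.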

CONCLUSIONS:   Comprehensive data quality assessments are necessary to ensure that data being used for public health surveillance purposes are collected, entered, stored, and analyzed appropriately. Exhaustive data quality assessments are particularly useful for a public health system already stressed for resources, because they allow for a refinement of the surveillance process and assist in program evaluation efforts by identifying gaps in data collection, data entry, and database structure. The six metrics proposed in this abstract will assist public health departments in maintaining high data quality across the broad spectrum of the surveillance process.