Validating Electronic Health Record Data: A Comparison with BRFSS Chronic Disease Prevalence Estimates

McCormick, Emily

BACKGROUND: Community-level intervention efforts to control common chronic diseases (e.g., diabetes and hypertension) need robust, granular population health measurement tools to monitor impact. Existing population-based surveys are rarely able to provide adequate sub-county measurement. Electronic health records (EHRs) hold a wealth of individual-level health data that, when aggregated, may complement traditional population health surveys. Emerging EHR data networks facilitate access to data useful for population health monitoring. However, the accuracy and representativeness of community-level EHR-based population health estimates have not been well established. Before conclusions are drawn about intervention impact, methods are needed to identify and address potential biases that arise because the population captured in EHR data may differ from the general population.

METHODS: EHR data were aggregated for all 2013 outpatient visits by adult (>18 years) Denver residents at two large health systems serving Denver County. To assess ‘measured’ prevalence, individuals with one or more ICD-9 diagnostic codes for diabetes (250.00-250.93) or hypertension (401.0-401.9) were identified as prevalent cases. Prevalence estimates were calculated and stratified by demographic group. The percent of the population covered and weighted prevalence estimates were calculated by age, gender, and race using American Community Survey data. Unweighted and weighted EHR ‘measured’ prevalence estimates were compared with ‘survey’ estimates from questions in the 2013 Denver County Behavioral Risk Factor Surveillance System.
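The weighting step described above can be illustrated with a minimal post-stratification sketch. The abstract does not specify the exact weighting procedure, so this is only one plausible reading: each stratum's EHR prevalence is weighted by that stratum's share of the census population. The function names and the stratum counts below are hypothetical and purely illustrative; they are not study data.

```python
def crude_prevalence(strata):
    """Unweighted prevalence: all cases divided by all patients."""
    cases = sum(s["cases"] for s in strata)
    patients = sum(s["patients"] for s in strata)
    return cases / patients


def coverage(strata):
    """Share of the census population seen in the EHR data."""
    patients = sum(s["patients"] for s in strata)
    population = sum(s["population"] for s in strata)
    return patients / population


def poststratified_prevalence(strata):
    """Weight each stratum's EHR prevalence by its census population share."""
    total_population = sum(s["population"] for s in strata)
    weighted_cases = sum(
        s["population"] * s["cases"] / s["patients"] for s in strata
    )
    return weighted_cases / total_population


# Hypothetical age strata (illustrative numbers only, not from the study).
example_strata = [
    {"label": "18-44", "cases": 20, "patients": 1000, "population": 6000},
    {"label": "45-64", "cases": 120, "patients": 1000, "population": 3000},
    {"label": "65+", "cases": 250, "patients": 1000, "population": 1000},
]

crude = crude_prevalence(example_strata)
weighted = poststratified_prevalence(example_strata)
covered = coverage(example_strata)
print(f"coverage={covered:.1%} crude={crude:.1%} weighted={weighted:.1%}")
```

In this made-up example older adults are over-represented among patients, so the crude estimate exceeds the weighted one; this is the kind of difference in represented populations that weighting is meant to address.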

RESULTS: In 2013, 124,964 adult Denver residents with at least one outpatient visit were assessed (25.6% of the population covered). County-wide hypertension prevalence estimates were similar for both data sources (23.9% ‘measured’ vs 24.7% ‘survey’), but confidence intervals were narrower for the ‘measured’ estimate (23.6-24.1%) than for the ‘survey’ estimate (22.0-27.4%). Diabetes prevalence estimates differed by 4.3 percentage points between sources (11.6% ‘measured’ vs 7.1% ‘survey’). EHR-based diabetes prevalence estimates weighted by age (9.9%) and by age*gender (10.2%) were closer to the survey estimate. EHR-based diabetes prevalence varied by 8.6 percentage points across racial/ethnic groups (7.8% white, 15.1% black, 16.4% Hispanic), while survey-based diabetes prevalence varied by 4.8 percentage points (4.8% white, 9.6% black, 9.6% Hispanic).
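The narrow ‘measured’ interval follows largely from the size of the EHR denominator. As a rough check, the sketch below computes a simple normal-approximation (Wald) 95% interval from the reported hypertension prevalence and patient count. The abstract does not state which interval method was used, so the Wald formula and the function name `wald_ci` are assumptions; the survey interval depends on the BRFSS complex sampling design and is not reproduced here.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


# Reported values: 23.9% 'measured' hypertension prevalence, 124,964 adults.
lower, upper = wald_ci(0.239, 124_964)
print(f"{lower:.4f} to {upper:.4f}")
```

Under these assumptions the interval is roughly half a percentage point wide, in line with the reported ‘measured’ range.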

CONCLUSIONS: EHR data produce valid population health measurements, and the increased precision of their prevalence estimates makes EHRs good candidates for temporal and subgroup analyses. Racial/ethnic prevalence estimates identified disparities and similar trends in both sources. EHR-based diabetes rates were higher than survey-based rates, which may be useful for targeting higher-prevalence populations. Weighting brought the ‘measured’ estimate closer to the ‘survey’ value, which may address concerns about differences in the populations represented. With their granular statistical power (i.e., for sub-county and race/ethnicity analyses), EHR prevalence estimates should be considered as a timely feedback loop for interventions.
