Maximizing Data Dissemination While Minimizing Suppression: Aggregation and Stratification in Web-Based Data Query Systems

Wednesday, June 12, 2013: 7:15 AM
209 (Pasadena Convention Center)
Steven C Macdonald, Washington State Department of Health, Olympia, WA
BACKGROUND: Public-access web-based data query systems (WbDQS) must balance the priorities of protecting privacy and satisfying data users. With small sub-populations and small numbers, aggregation can yield larger numbers, but stratification is needed to focus analysis. CSTE and CDC jointly developed the Data Release Guidelines and Procedures for Re-release of State-Provided Data in 2005. The CDC Tracking program wrote its Data Re-Release Plan in 2008, including standard suppression rules. Standards for the use of aggregation to minimize suppression have not been developed.

METHODS: Use of aggregation and stratification was assessed in the following WbDQS parameters: geographic, temporal, age group, gender, race/ethnicity, and topic-specific. Alternative schemes for aggregating counties into multi-county regions (MCR) were assessed, and criteria for selecting optimal MCR were analyzed. Options for static parameter control within a parameter, and for dynamic parameter control between parameters (adaptive stratification), were explored. Thresholds for parameter control were examined.

RESULTS: Minimum and maximum aggregation schemes for each parameter were established. Criteria for MCR selection were adopted: minimum size to avoid suppression, contiguity, homogeneity in region size, heterogeneity between regions, homogeneity within regions, and face validity. Static parameter control was determined to include either selective blocking of certain strata within a parameter (e.g., aggregate into multi-year data, omit annual data) or complete parameter exclusion (for which the criterion is program/planning utility). Spatial and temporal thresholds for static parameter control were adopted (e.g., if <200 cases/year, then only multi-county regions are available; if <400 cases/year, then only a 5-year rollup is available).
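The example static thresholds above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function name, the `StaticControl` type, and the "unrestricted" label are hypothetical, and it assumes that the spatial and temporal thresholds are applied independently and that no restriction applies at or above each threshold, neither of which the abstract states.

```python
from dataclasses import dataclass

# Example thresholds quoted in the abstract (cases per year).
MCR_ONLY_BELOW = 200        # below this, only multi-county regions are offered
FIVE_YEAR_ONLY_BELOW = 400  # below this, only a 5-year rollup is offered


@dataclass(frozen=True)
class StaticControl:
    """Hypothetical container for the strata a query system would offer."""
    geography: str
    temporal: str


def static_parameter_control(cases_per_year: float) -> StaticControl:
    """Apply the two example thresholds independently (an assumption)."""
    geography = (
        "multi-county regions only"
        if cases_per_year < MCR_ONLY_BELOW
        else "unrestricted"  # assumed; the abstract does not say what applies here
    )
    temporal = (
        "5-year rollup only"
        if cases_per_year < FIVE_YEAR_ONLY_BELOW
        else "unrestricted"  # assumed, as above
    )
    return StaticControl(geography, temporal)
```

Under these assumptions, a topic with 300 cases/year would keep county-level geography but be limited to the 5-year rollup.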
Dynamic parameter control (adaptive stratification that is conditional on interactive query choices) was defined by the number of parameters allowed to be disaggregated simultaneously, ranging from one stratification parameter at a time to five or more. Thresholds for adaptive stratification were adopted (e.g., if the rate is less than approximately 2 per 100,000, then only one parameter may be stratified at a time; if the rate is approximately 10 or greater but less than 20 per 100,000, then three parameters may be stratified at a time; if the rate is approximately 100 or greater but less than 2000 per 100,000, then five parameters may be stratified at a time). Feedback from data stewards was uniformly positive.

CONCLUSIONS: This project developed and adopted data standards for the use of aggregation to minimize suppression. Letting WbDQS users choose which parameter to stratify provides flexibility that can help focus data analysis.
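The adaptive stratification thresholds given as examples in the results can be sketched as a simple lookup. This is a hedged illustration only: the function and constant names are hypothetical, the thresholds are described as approximate in the abstract, and only three rate bands are stated there. The sketch returns `None` for any rate outside those bands rather than guessing the unstated limits; the lower bound of 0 for the first band is an assumption.

```python
from typing import Optional

# (lower bound inclusive, upper bound exclusive, max parameters stratified at once)
# Rates are per 100,000. Only the bands quoted in the abstract are listed.
STATED_BANDS = [
    (0.0, 2.0, 1),       # rate < ~2: one parameter at a time (lower bound assumed)
    (10.0, 20.0, 3),     # ~10 <= rate < 20: three parameters at a time
    (100.0, 2000.0, 5),  # ~100 <= rate < 2000: five parameters at a time
]


def max_stratified_parameters(rate_per_100k: float) -> Optional[int]:
    """Return the stated limit for this rate, or None if no band was stated."""
    for lower, upper, limit in STATED_BANDS:
        if lower <= rate_per_100k < upper:
            return limit
    return None  # band not given in the abstract; deliberately not filled in
```

A query interface built this way could disable further stratification controls once the user has selected the permitted number of parameters.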