METHODS: Disease-specific case files for 35 reportable diseases were generated from the surveillance database (Maven), including census tract, case count, and onset date. The prospective space-time permutation scan statistic (SaTScan) was used to simultaneously evaluate millions of potential cluster location and time period combinations, adjusting for multiple testing and for purely spatial and purely temporal disease reporting patterns. For most diseases, a 365-day study period, a maximum temporal length of 30 days, and a maximum geographical size of 50% of reported cases were used. The numbers of cases observed and expected inside and outside each space-time cylinder were compared by using a likelihood ratio-based test statistic. Monte Carlo hypothesis testing was used to assign a recurrence interval (RI) to each cluster; the higher the RI, the less likely a given cluster would be expected to occur by chance alone. A SAS program generated SaTScan case and parameter files, read in a coordinate file of census tract centroids, invoked SaTScan in batch mode, read analysis results into SAS for processing, and generated notification e-mails and summary reports for clusters with RI ≥100 days. We monitored signal frequency and tracked instances of true outbreak detection.
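As an illustration of the comparison described in METHODS, the following Python sketch computes the expected count for one space-time cylinder under the space-time permutation model, the log likelihood ratio for that cylinder, and an RI from Monte Carlo replicates. It is not the SaTScan or SAS code used by the system. All function names and the toy data layout are hypothetical, and the sketch assumes daily analyses, so that the RI in days is taken as the reciprocal of the Monte Carlo rank-based p-value.

```python
import math

def expected_count(cases, zone, days):
    """Expected cases in a cylinder under the space-time permutation model.

    cases: dict mapping (tract, day) -> case count (hypothetical layout).
    zone:  set of census tracts forming the cylinder base.
    days:  set of days forming the cylinder height.
    """
    total = sum(cases.values())
    in_zone = sum(c for (tract, _), c in cases.items() if tract in zone)
    in_days = sum(c for (_, day), c in cases.items() if day in days)
    # Spatial and temporal margins are assumed independent under the null.
    return in_zone * in_days / total

def log_likelihood_ratio(observed, expected, total):
    """Poisson-style log likelihood ratio; zero unless there is an excess."""
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if total > observed:
        llr += (total - observed) * math.log((total - observed) / (total - expected))
    return llr

def recurrence_interval_days(observed_stat, simulated_stats):
    """RI in days, assuming one analysis per day (RI = 1 / p-value)."""
    # Rank of the observed maximum statistic among the simulated maxima.
    rank = 1 + sum(1 for s in simulated_stats if s >= observed_stat)
    p_value = rank / (len(simulated_stats) + 1)
    return 1.0 / p_value

# Toy data: two tracts, two days (values are made up for illustration).
toy_cases = {("A", 1): 2, ("A", 2): 6, ("B", 1): 2, ("B", 2): 2}
mu = expected_count(toy_cases, zone={"A"}, days={2})
llr = log_likelihood_ratio(observed=6, expected=mu, total=12)

# In practice the simulated statistics would come from data sets in which
# onset dates are randomly permuted among cases; here they are placeholders.
ri = recurrence_interval_days(5.0, [1.0] * 998 + [6.0])
signal = ri >= 100  # notification threshold described in METHODS
```

With 999 replicates, the largest RI this approach can report is 1,000 days, so the number of replicates bounds how rare a cluster can appear.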
RESULTS: During February 2014–September 2015, signals were observed for 15 of 35 diseases, at a rate of 28 unique signals per 365 days of surveillance. This system generated the first signals for the second-largest U.S. outbreak of community-acquired legionellosis (South Bronx, July 2015), another legionellosis outbreak in Queens, and shigellosis and campylobacteriosis outbreaks in Brooklyn. Some detected clusters did not warrant intervention, including a giardiasis cluster associated with international travel and Shiga toxin-producing E. coli clusters attributable to reports, from one facility using a new diagnostic test, that did not meet the case definition.
CONCLUSIONS: Several notable outbreaks were first detected by this system, and the signal frequency was manageable. Keys to success included a strong informatics infrastructure, especially electronic laboratory reporting and automated geocoding of case addresses; a powerful statistical disease surveillance methodology; knowledgeable epidemiologists interpreting signals; modification of case input and parameter files following false and delayed signals; and adequate outbreak investigation resources. Challenges included maintaining up-to-date, cleaned surveillance data and troubleshooting information technology problems.