189 Oak Ridge Bio-Surveillance Toolkit (ORBiT): Integrating Heterogeneous Electronic Health “Big” Data for Public Health Disease Surveillance and Dynamics

Tuesday, June 24, 2014: 10:00 AM-10:30 AM
East Exhibit Hall, Nashville Convention Center
Laura Pullum, Oak Ridge National Laboratory, Oak Ridge, TN
Arvind Ramanathan, Oak Ridge National Laboratory, Oak Ridge, TN

BACKGROUND:   Our objective is to develop novel data analytic tools that can aid the bio-surveillance community in (1) integrating heterogeneous (e.g., structured and unstructured) datasets, (2) enabling automatic analysis of these datasets with multi-scale techniques, and (3) providing timely alerts that give public health decision makers the essential information they need to make better decisions and improve public health practice in general. 

METHODS:   We present an overview of our data analytic platform, the Oak Ridge Bio-surveillance Toolkit (ORBiT), which was designed specifically to address these requirements. ORBiT’s analytic components consist of an NLP (natural language processing) toolkit that builds a statistically relevant vocabulary, or bag-of-words model, to process text-based data streams. ORBiT also includes machine learning tools that use higher-order statistical techniques to track and tag events of interest over multi-scale temporal windows specified by the analyst or end user. Statistical feature sets extracted from the filtered data allow one to quickly establish a baseline and tag events as outliers relative to that baseline. To track correlations across multiple data streams and make predictions, ORBiT includes several linear, non-linear, and hybrid statistical inference techniques that perform well as measured by an applied loss function. ORBiT’s visual analytics front end allows analysts (or end users) to interact with, and provide feedback to, the toolkit’s data analytics components. 
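The vocabulary-building and baseline-and-outlier steps described above can be illustrated with a minimal Python sketch. It assumes a simple regex tokenizer and a document-frequency cutoff for the vocabulary, and it substitutes plain rolling z-scores for the higher-order statistics ORBiT uses. The function names, the window sizes (7 and 28 time steps), and the 3-sigma threshold are hypothetical choices for illustration, not ORBiT's actual API or parameters.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Hypothetical stand-in for an NLP pipeline: lowercase, keep letter runs.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, min_doc_freq=2):
    """Keep terms that appear in at least `min_doc_freq` documents
    (an assumed relevance filter, not ORBiT's actual criterion)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term for term, df in doc_freq.items() if df >= min_doc_freq}


def bag_of_words(text, vocabulary):
    """Term counts for one document, restricted to the vocabulary."""
    return Counter(t for t in tokenize(text) if t in vocabulary)


def tag_outliers(series, windows=(7, 28), threshold=3.0):
    """Return indices i where, for any window length w, series[i] exceeds the
    mean of the preceding w values by more than `threshold` population
    standard deviations (a simple z-score, used here as an assumption)."""
    flagged = set()
    for w in windows:
        for i in range(w, len(series)):
            baseline = series[i - w:i]
            mu, sigma = mean(baseline), pstdev(baseline)
            if sigma == 0:
                # A flat baseline has no spread; treat any increase as anomalous.
                is_outlier = series[i] > mu
            else:
                is_outlier = (series[i] - mu) / sigma > threshold
            if is_outlier:
                flagged.add(i)
    return sorted(flagged)
```

For example, `series` could hold daily counts of notes mentioning vocabulary terms such as "fever" or "cough". Checking several window lengths lets a spike be judged against both a short-term and a longer-term baseline, which is the intuition behind multi-scale windows.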

RESULTS:   Zip-code-level electronic health data correlated very well with CDC data and were obtained more efficiently. Additionally, we discovered spatiotemporal patterns within the 2009-2010 influenza season that were not apparent in other datasets. Using diverse electronic health datasets obtained from public health departments as well as private entities, we show how ORBiT can provide valuable insights in three application areas: (1) influenza-like illness (ILI) surveillance, (2) chronic disease monitoring, and (3) prescription drug overuse and abuse.

CONCLUSIONS:  ORBiT serves as a novel platform for the bio-surveillance community and provides insights into patterns ranging from local to nationwide scales across various disease settings.