Key Objectives:
- Explore some of the epidemiological methodologies used at the NYC DOHMH to accommodate the influx of novel big data sources.
- Learn about a unique big data querying system that represents a prototype for pulling EHR data across many ambulatory care practices. This explanation will also demonstrate how to blend distinct data sources that examine similar populations and how to analyze aggregate data clustered at the practice and provider levels with appropriate statistical methods.
- Consider the limitations and opportunities in analyzing big data that is clustered and aggregated.
- Examine the implications of blending distinct types of observational data and aggregate data analysis.
Brief Summary:
The Primary Care Information Project (PCIP) of the New York City Department of Health and Mental Hygiene (NYC DOHMH) helps NYC practices adopt and meaningfully use electronic health records (EHRs). This collaboration has led to the development of a unique population data tool, the Hub Population Health System, which allows practice- and provider-aggregated, de-identified querying of EHR data from up to 2,031,133 patients seen by over 700 practices and 3,000 providers. The availability of EHR data offers opportunities to answer research questions that are not readily answerable with other types of data traditionally collected by local health departments (e.g., survey data). Because these data are aggregated at the practice and provider levels, longitudinal analysis cannot be conducted using traditional logistic tests, which assume independent observations. Instead, procedures that can accommodate the clustered nature of the data are needed. This roundtable will focus on potential epidemiological uses of big data generated from surveys and EHRs and will review methodologies to analyze and link EHR data to create a broader picture of health locally. The discussion will examine epidemiological techniques, including the differences in analyzing aggregated data across various generalized estimating equations, resources for identifying supplemental big data, methods for data linkage, the appropriateness of the quasi-likelihood information criterion (QIC) in model selection, the changes in results introduced by models with similar correlation structures, and other lessons learned. The roundtable will explore which methods are appropriate for analyzing and linking big data and will discuss the value and potential of alternative epidemiological study designs, e.g., the transformation of observational studies into “mock” clinical trial designs.
This roundtable will provide a forum to address the analytical needs of working with big data in all of its forms, while carefully considering the correct methodology for the data and the inferences drawn from the results.
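
To illustrate why practice-aggregated data call for methods that account for clustering, the following minimal Python sketch compares a naive binomial standard error for a pooled proportion with a cluster-robust (sandwich) standard error that treats each practice as one cluster, the same idea that underlies robust variance estimation in generalized estimating equations. The practice counts and the function names are hypothetical and invented for illustration; they are not Hub Population Health System data, and this is not the analysis pipeline used at NYC DOHMH.

```python
import math

# Hypothetical practice-level aggregates: (patients with the outcome, patients seen).
# These numbers are invented for illustration and are not Hub data.
practices = [(30, 100), (10, 100), (50, 100), (20, 100)]


def pooled_proportion(clusters):
    """Overall proportion with the outcome, pooling all practices."""
    cases = sum(y for y, _ in clusters)
    total = sum(n for _, n in clusters)
    return cases / total


def naive_se(clusters):
    """Binomial standard error that treats every patient as independent."""
    p = pooled_proportion(clusters)
    total = sum(n for _, n in clusters)
    return math.sqrt(p * (1 - p) / total)


def cluster_robust_se(clusters):
    """Sandwich standard error that treats each practice as one cluster."""
    p = pooled_proportion(clusters)
    total = sum(n for _, n in clusters)
    # Sum of squared practice-level residuals (observed minus expected cases).
    meat = sum((y - n * p) ** 2 for y, n in clusters)
    return math.sqrt(meat) / total


p_hat = pooled_proportion(practices)
print(f"pooled proportion: {p_hat:.3f}")
print(f"naive SE:          {naive_se(practices):.4f}")
print(f"cluster-robust SE: {cluster_robust_se(practices):.4f}")
```

With these invented counts, the cluster-robust standard error is roughly three times the naive one, because outcome rates differ widely between practices. Ignoring that between-practice variation would make the pooled estimate look more precise than the data support.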