BACKGROUND: Multidimensional data arise frequently in prescription drug surveillance systems in the form of spatial, temporal, and individual-specific behavioral interactions. Using a graph scan, a fast algorithm was developed to identify clusters of Kansas providers who had more opioid claims than their peers in 2014 compared to 2013.
METHODS: The publicly available 2013-2014 Center for Medicare and Medicaid Services (CMS) Part D Summary File for Kansas providers was used to pilot test a multidimensional clustering algorithm. Anomalous event was defined as a provider/city with more opioid claims in 2014 than 2013 using nearby providers as controls. All locations were connected in a G(Vertex, Edge) graph using a minimum spanning tree with weights equal to the relative risk of higher than expected opioid claims and the distance from the nearest providers. All tree nodes were assessed for articulation points using a biconnected component optimization algorithm. Articulation points are equivalent to significant clusters of interest. All analysis was completed in SAS/STAT 14.1 using PROC STDRATE and SAS/OR 14.1 using PROC OPTMODEL and OPTNET.
RESULTS: Annual change in opioid claims for 3,727 providers and 98 specialty practices in Kansas were compared among 167 cities. Traditional spatial scan methods would require 2^N run-time, but this two-step algorithm resulted in an assessment of 8,889 stratified relative risk in O(A log N) run-time for the minimum spanning tree and O(N+A) run-time for the biconnected component algorithm. In less than 2 seconds on a standard desktop computer, this two-step algorithm identified 5 cities as minimum forest articulation points (i.e. clusters). At least four articulation points corresponded to places with typically high patterns of expected opioid prescribing (i.e. Kansas City, Wichita, Topeka, and Overland Park). These locations can serve as potential points of interest for surveilliance.
CONCLUSIONS: Open data, such as the CMS data presented here, can provide useful source of information to develop and test algorithms for public health surveillance. 2^N run-time algorithms for large datasets are problems that are impossible to solve on even the most advance computing platform. However, the quantitative literature have proposed many graph theory algorithms to solve these problems in linear time. One application of a method is presented here. Graph theory is a powerful tool for most problems, but may lack specificity and requires a careful heuristic algorithim. Future work will use graph theory to enhance the prescription drug poisoning prevention program in Kansas by identifying hotspots of public health interest.