BACKGROUND: In public health settings, screening programs need criteria that maximize the chance of identifying infected but previously undetected individuals while minimizing the cost per detected individual. Undetected hepatitis B and C virus (HBV and HCV) infections pose clear health risks and contribute to downstream medical costs for treating the disease and its complications. This investigation used self-reported risk factors and demographics to optimize a hepatitis screening program.
METHODS: Self-reported hepatitis risk factors and ages of clients in 10 Florida counties who were screened in 2009 and 2010 were extracted from CAREWare (HIV Patient Care database), and the corresponding laboratory data were extracted from the Electronic Laboratory Report (ELR) server. After the records were matched on date of birth, county, and first and last name (~13,000 cases; 112 active HBV infections; 750 HCV exposures), interactive classification trees (STATISTICA DataMiner) were used to derive classification schemes across a range of misclassification costs in which missing a positive client was increasingly penalized. Finally, the results were used to determine the optimum client selection in the financial context of public health settings.
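To illustrate how an increasing penalty for missed positives changes which clients a tree selects for testing, the following is a minimal Python sketch of cost-sensitive labeling of tree leaves. It is not the STATISTICA DataMiner procedure used in the study. The leaf names, the client counts, the `miss_cost` values, and the helper functions `should_screen` and `summarize` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """One terminal node of a classification tree (illustrative counts only)."""
    name: str
    positives: int
    negatives: int


# Hypothetical leaves; these are NOT results from the study.
LEAVES = [
    Leaf("A", positives=40, negatives=160),
    Leaf("B", positives=10, negatives=490),
    Leaf("C", positives=1, negatives=299),
    Leaf("D", positives=0, negatives=200),
]


def should_screen(leaf, miss_cost, test_cost=1.0):
    """Label a leaf 'screen' when the total penalty for missing its positives
    exceeds the total cost of testing its negatives."""
    return miss_cost * leaf.positives > test_cost * leaf.negatives


def summarize(leaves, miss_cost):
    """Return (tests_run, positives_found) for a given miss penalty."""
    chosen = [leaf for leaf in leaves if should_screen(leaf, miss_cost)]
    tests_run = sum(leaf.positives + leaf.negatives for leaf in chosen)
    positives_found = sum(leaf.positives for leaf in chosen)
    return tests_run, positives_found


if __name__ == "__main__":
    for miss_cost in (1, 5, 50, 500):
        tests_run, found = summarize(LEAVES, miss_cost)
        print(f"miss_cost={miss_cost}: {tests_run} tests, {found} positives found")
```

In this toy example, raising the penalty brings more leaves into the screened set. A leaf with no positives is never selected, so even at the highest penalty some clients remain untested without any positive case being missed.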
RESULTS: With screening criteria optimized by classification trees, all positive cases could be identified without testing 4,561 (34.9%) and 1,490 (11.1%) of the clients for HBV and HCV, respectively, reducing the cost of the testing program by 19.9%. Moreover, identifying the first set of positive individuals is comparatively inexpensive, but the cost grows exponentially with each additional set of positive individuals. Using an estimated budget limit of $150,000 per year for statewide HBV and HCV testing, the estimated number of HBV and HCV tests available for the whole state would be 1,135 and 14,206, respectively. This corresponds to detecting 22.3% of the HBV cases and 75.5% of the HCV cases currently detected, improving the positivity rate of screening tests from 0.9% to 4.4% (25 cases) for HBV and from 5.7% to 16.3% (581 cases) for HCV, while cutting the cost of the screening program by about two-thirds.
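The budget-constrained selection described above can be sketched as a simple greedy allocation: rank risk groups by positivity rate and spend the available tests on the highest-yield groups first. This is an assumed simplification, not the authors' procedure. The group names, the counts, the budget, the cost per test, and the function `allocate_tests` are hypothetical.

```python
def allocate_tests(groups, budget, cost_per_test):
    """Greedily assign tests to groups in order of decreasing positivity rate.

    groups: list of (name, n_clients, n_positive) tuples.
    Returns a list of (name, n_tested, expected_positives) tuples.
    """
    remaining = int(budget // cost_per_test)  # number of tests the budget allows
    ranked = sorted(groups, key=lambda g: g[2] / g[1], reverse=True)
    plan = []
    for name, n_clients, n_positive in ranked:
        if remaining == 0:
            break
        n_tested = min(n_clients, remaining)
        # Assumes positives are spread evenly within a group.
        expected = n_positive * n_tested / n_clients
        plan.append((name, n_tested, expected))
        remaining -= n_tested
    return plan


# Hypothetical risk groups; these are NOT results from the study.
GROUPS = [("A", 200, 40), ("B", 500, 10), ("C", 300, 1), ("D", 200, 0)]

if __name__ == "__main__":
    plan = allocate_tests(GROUPS, budget=4500, cost_per_test=10)
    tests = sum(n for _, n, _ in plan)
    positives = sum(p for _, _, p in plan)
    print(plan)
    print(f"positivity under budget: {positives / tests:.1%}")
```

With these made-up numbers, the cost per detected case is 50 in group A but 500 in group B, which shows how each additional set of positives becomes more expensive to find once the highest-yield groups have been tested.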
CONCLUSIONS: Using advanced data mining techniques to generate classification trees can dramatically improve screening criteria for hepatitis while minimizing the number of individuals who are excluded from testing. Maximizing detection of positive cases within budgetary constraints should be the ultimate goal of any screening program. As shown here, advanced statistical and data mining approaches can be useful tools for achieving this goal.