Evaluating a Keyword-Based Algorithm for Assigning Nature of Injury and Body Part to Occupational Injury and Illness Narratives

Monday, June 23, 2014: 2:00 PM
209, Nashville Convention Center
Kathleen Grattan, Massachusetts Department of Public Health, Boston, MA
Kathleen Fitzsimmons, Massachusetts Department of Public Health, Boston, MA
SangWoo Tak, Southern California NIOSH Education and Research Center, Los Angeles, CA
Letitia K. Davis, Massachusetts Department of Public Health, Boston, MA

BACKGROUND: Assigning standardized codes to narratives in occupational health data, such as workers’ compensation records and the Bureau of Labor Statistics Survey of Occupational Injuries and Illnesses (BLS SOII), is essential for aggregating the data needed to understand the extent and distribution of occupational injury and illness. Computer-assisted methods, whether automated or semi-automated, can improve the quality, uniformity, and completeness of coding while allowing for efficient use of limited surveillance resources.

METHODS: A SAS-based algorithm was applied to the narrative fields in a sample of cases from the BLS SOII for Massachusetts. The algorithm assigned Nature of Injury (NOI) and Body Part (BP) codes based on the presence of keywords in the narratives. Codes were assigned according to a modified version of the American National Standards Institute (ANSI) coding system and converted to the Occupational Injury and Illness Coding System (OIICS) via crosswalks. An important caveat is that the algorithm allows more than one NOI and BP to be coded per case narrative, whereas OIICS rules of selection instruct users to choose only one code. To evaluate algorithm performance at the 2-digit OIICS level, sensitivity and positive predictive value (PPV) were calculated using BLS manually assigned codes as the gold standard.
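The keyword-matching step described above can be illustrated with a minimal Python sketch. This is not the authors' SAS implementation: the keyword lists, the category labels, and the function name assign_codes are hypothetical, and the matching rule (a case-insensitive match at the start of a word) is an assumption. As in the abstract, a narrative may receive more than one code.

```python
import re

# Hypothetical keyword lists and labels for illustration only; the abstract
# does not publish the algorithm's actual keywords or code values.
NOI_KEYWORDS = {
    "sprain_strain": ["sprain", "strain"],
    "cut_laceration": ["cut", "laceration"],
    "fracture": ["fracture", "broke"],
}


def assign_codes(narrative, keyword_map):
    """Return a sorted list of every code whose keyword appears in the narrative.

    A keyword matches when it occurs at the start of a word, so "strain"
    also matches "strained". Multiple codes may be returned.
    """
    text = narrative.lower()
    matched = set()
    for code, keywords in keyword_map.items():
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw), text):
                matched.add(code)
                break  # one hit is enough to assign this code
    return sorted(matched)


print(assign_codes("Employee strained lower back lifting box", NOI_KEYWORDS))
print(assign_codes("Cut finger and fractured wrist", NOI_KEYWORDS))
```

The second narrative receives two codes, which is the situation that OIICS rules of selection would resolve to a single code.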

RESULTS: The algorithm assigned NOI and BP codes to 95.5% and 98.1% of the SOII cases, respectively. Of these, 35.6% were assigned multiple NOI codes. Overall sensitivity for NOI and BP was 53.6% and 42.9%, respectively. For both NOI and BP, sensitivity and PPV varied by specific 2-digit OIICS category. The algorithm’s performance improved when restricted to cases where a single NOI or BP was assigned. For this subset, PPV for the majority of high-frequency NOI and BP categories exceeded 95% and 90%, respectively.
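A rough sketch of how per-category sensitivity and PPV might be computed, first for all cases and then for the single-code subset, is given below. The abstract does not state how multiply coded cases were scored, so the rule used here is an assumption: a case counts as a true positive for a category when the manually assigned (gold standard) code equals that category and the category is among the codes the algorithm assigned. The function names and the toy cases with labels "A" and "B" are hypothetical and are not study data.

```python
def evaluate(cases, category):
    """Return (sensitivity, ppv) for one category.

    cases: list of (gold_code, assigned_codes) pairs, where assigned_codes
    is the list of codes the algorithm assigned to that case.
    Assumed scoring rule: a true positive is a case whose gold code equals
    the category and whose assigned codes include it.
    """
    tp = sum(1 for gold, assigned in cases
             if gold == category and category in assigned)
    gold_pos = sum(1 for gold, _ in cases if gold == category)
    alg_pos = sum(1 for _, assigned in cases if category in assigned)
    sensitivity = tp / gold_pos if gold_pos else None
    ppv = tp / alg_pos if alg_pos else None
    return sensitivity, ppv


def single_code_subset(cases):
    """Keep only cases to which the algorithm assigned exactly one code."""
    return [(gold, assigned) for gold, assigned in cases if len(assigned) == 1]


# Toy data for illustration only (not from the study).
cases = [
    ("A", ["A"]),
    ("A", ["A", "B"]),
    ("B", ["A", "B"]),
    ("A", []),
    ("B", ["B"]),
]

print(evaluate(cases, "A"))
print(evaluate(single_code_subset(cases), "A"))
```

In this toy example the multiply coded case with gold code "B" counts against PPV for "A", which is one way that allowing multiple codes can lower measured performance against a single-code gold standard.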

CONCLUSIONS: The algorithm for coding NOI and BP did not perform well when applied to all SOII cases. This is explained in part by the fact that the algorithm allowed multiple NOIs and BPs to be assigned and did not incorporate OIICS rules for coding such cases. However, this basic algorithm can substantially reduce the amount of manual coding when restricted to cases where a single NOI or BP is assigned. Additionally, this analysis highlights the potential loss of information about the work-related incident that can occur when one relies on a classification system such as OIICS that allows only one NOI or BP to be chosen per case.