Incident reporting (IR) in healthcare organizations aims to improve the quality of care and patient safety. The primary purpose of IR is to identify safety issues and develop interventions to mitigate these hazards, thereby reducing harm in healthcare [21]. IR enables healthcare organizations to document Patient Safety Event (PSE) reports, detailing incidents that compromise patient safety, including medical errors, near misses, or adverse events [177]. Accurate classification of PSE reports into their corresponding incident type and severity level is crucial for analyzing trends, guiding interventions, and enhancing organizational learning [51, 15]. However, the high volume of reported PSEs [93] and the complexity of the classification taxonomy, which requires specialized knowledge for categorization, make the labeling process labor-intensive, time-consuming, and expensive [101, 42]. In this study, we develop and evaluate machine-learning models that predict the type and severity level of an incident based on the textual description of the event in a PSE report. Our goal is twofold. First, we build predictive models that identify high-severity PSE reports. This prioritization could speed up the analysis of these reports, improving overall patient safety and healthcare quality. Second, we investigate Active Learning (AL) strategies to streamline the manual labeling process for assigning event types to PSE reports. This approach aims to reduce labeling effort while enhancing classification performance by querying the instances that, if annotated by human experts, would most improve classification accuracy. We apply these models to PSE report datasets from two institutions: a large hospital in Canada and an academic hospital in the Southeastern United States. Our results show that models such as LightGBM, trained on domain-specific representations such as GatorTron, significantly improve the ranking quality of incidents by severity level, with R-precision increasing from 0.197 to 0.4259 and MAP from 0.1546 to 0.4205. We also observe that Active Learning strategies reduce the number of labeled samples required by 24-69% compared with a random-sampling baseline, effectively decreasing manual workload while maintaining high classification accuracy.
Keywords:
Incident report; Patient Safety Incident Classification; Incident Severity Prediction; Text Classification; NLP; Healthcare; AI; Deep Learning; Machine Learning
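The sketch below illustrates the kind of severity-ranking pipeline described above: encode each PSE narrative with a domain-specific transformer, train a LightGBM classifier on the resulting vectors, and rank test reports by the predicted probability of high severity, scoring the ranking with R-precision and MAP. This is a minimal illustration, not the authors' exact pipeline; the GatorTron checkpoint name, mean-pooling choice, hyperparameters, and the tiny placeholder reports and labels are all assumptions.

```python
"""Minimal sketch: rank PSE reports by predicted severity (assumptions noted above)."""
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from lightgbm import LGBMClassifier
from sklearn.metrics import average_precision_score

MODEL_NAME = "UFNLP/gatortron-base"  # assumed Hugging Face checkpoint for GatorTron

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME).eval()

def embed(texts, batch_size=8):
    """Mean-pool the last hidden state to obtain one vector per report."""
    vecs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, max_length=512, return_tensors="pt")
            hidden = encoder(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
            vecs.append(((hidden * mask).sum(1) / mask.sum(1)).cpu().numpy())
    return np.vstack(vecs)

def r_precision(y_true, scores):
    """Precision among the top-R ranked reports, where R = number of high-severity reports."""
    r = int(y_true.sum())
    return y_true[np.argsort(-scores)[:r]].mean()

# Placeholder narratives and binary high-severity labels (synthetic, for illustration only).
train_texts = ["Patient received double dose of insulin.",
               "Near miss: wrong chart pulled, corrected before care.",
               "Fall from bed resulting in hip fracture.",
               "Specimen label mismatch caught before blood draw."]
y_train = np.array([1, 0, 1, 0])
test_texts = ["Medication administered to wrong patient.",
              "Delayed meal tray, no clinical impact."]
y_test = np.array([1, 0])

clf = LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(embed(train_texts), y_train)

scores = clf.predict_proba(embed(test_texts))[:, 1]   # probability of high severity
print("R-precision:", r_precision(y_test, scores))
print("MAP (binary average precision):", average_precision_score(y_test, scores))
```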
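The abstract's second claim concerns querying informative instances rather than labeling at random. The following pool-based active-learning loop compares least-confidence uncertainty sampling against a random-sampling baseline; the logistic-regression classifier, the query strategy, the budget parameters, and the synthetic features (standing in for vectorized PSE narratives) are illustrative assumptions rather than the study's exact configuration.

```python
"""Sketch of pool-based active learning: uncertainty sampling vs. random sampling."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def run_al(X_pool, y_pool, X_test, y_test, strategy,
           seed_size=20, batch_size=10, rounds=15):
    """Return test accuracy after each round of simulated expert annotation."""
    labeled = list(rng.choice(len(y_pool), size=seed_size, replace=False))
    unlabeled = [i for i in range(len(y_pool)) if i not in labeled]
    curve = []
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        curve.append(accuracy_score(y_test, clf.predict(X_test)))
        if strategy == "uncertainty":
            # Least-confidence sampling: query the reports the model is least sure about.
            conf = clf.predict_proba(X_pool[unlabeled]).max(axis=1)
            order = np.argsort(conf)
        else:
            order = rng.permutation(len(unlabeled))   # random-sampling baseline
        picked = [unlabeled[i] for i in order[:batch_size]]
        labeled += picked                             # labels revealed by the "oracle"
        unlabeled = [i for i in unlabeled if i not in picked]
    return curve

# Synthetic features stand in for vectorized PSE narratives (e.g., TF-IDF or embeddings).
X, y = make_classification(n_samples=1500, n_features=50, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for strat in ("uncertainty", "random"):
    print(strat, [round(a, 3) for a in run_al(X_pool, y_pool, X_test, y_test, strat)])
```

Comparing the two learning curves shows how many fewer labeled reports the uncertainty strategy needs to reach a given accuracy, which is the quantity behind the 24-69% labeling-effort reduction reported above.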