Background: Convolutional neural networks (CNNs) can identify vertebral compression fractures on GE vertebral fracture assessment (VFA) images with high balanced accuracy, but their performance on Hologic VFAs is unknown. To achieve good classification performance, supervised machine learning requires balanced, labeled training data. Active learning is an iterative data annotation process that can reduce both the cost of labeling medical image data and class imbalance.
Purpose: To train CNNs to identify vertebral fractures in Hologic VFAs using an active learning approach, and evaluate the ability of CNNs to generalize to both Hologic and GE VFA images.
Methods: VFAs were obtained from the OsteoLaus Study (labeled Hologic Discovery A, n = 2726), the Manitoba Bone Mineral Density Program (labeled GE Prodigy and iDXA, n = 12,742), and the Canadian Longitudinal Study on Aging (CLSA, unlabeled Hologic Discovery A, n = 17,190). Unlabeled CLSA VFAs were split into five equal-sized partitions (n = 3438) and reviewed sequentially using active learning. Based on predicted fracture probability, 17.6% (n = 3032) of the unlabeled VFAs were selected for expert review using the modified algorithm-based qualitative (mABQ) method. CNNs were simultaneously trained on Hologic, GE dual-energy, and GE single-energy VFAs. Two ensemble CNNs were constructed using the maximum and mean predicted probability from six separately trained CNNs that differed only through stochastic variation during training. CNNs were evaluated on the OsteoLaus validation set (n = 408) during the active learning process; ensemble performance was measured on the OsteoLaus test set (n = 819).
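The two ensembling rules and the probability-based selection of unlabeled VFAs for expert review can be sketched as follows. This is a minimal illustration, not the study's implementation; all function names, the 0.4 selection threshold, and the example probabilities are assumptions for demonstration only.

```python
import numpy as np

def ensemble_predictions(member_probs):
    """Combine per-model fracture probabilities (n_models x n_images)
    into maximum-ensemble and mean-ensemble predictions per image."""
    probs = np.asarray(member_probs)
    return probs.max(axis=0), probs.mean(axis=0)

def select_for_review(probs, threshold=0.4):
    """Flag unlabeled images whose predicted fracture probability meets
    a threshold, as candidates for expert (mABQ) review.
    The selection rule and threshold are illustrative assumptions."""
    return np.nonzero(np.asarray(probs) >= threshold)[0]

# Hypothetical outputs of six CNNs (rows) on four unlabeled VFAs (columns)
member_probs = [
    [0.10, 0.80, 0.30, 0.05],
    [0.20, 0.90, 0.40, 0.10],
    [0.15, 0.85, 0.20, 0.02],
    [0.05, 0.70, 0.50, 0.08],
    [0.12, 0.95, 0.35, 0.04],
    [0.18, 0.88, 0.25, 0.06],
]
p_max, p_mean = ensemble_predictions(member_probs)
review_idx = select_for_review(p_mean)  # only image 1 exceeds 0.4 on average
```

The maximum ensemble favors sensitivity (one confident member suffices to flag a fracture), while the mean ensemble smooths out individual-model noise; the abstract reports results for both.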
Results: The baseline CNN, prior to active learning, achieved 55.0% sensitivity, 97.9% specificity, 57.9% positive predictive value (PPV), and an F1-score of 56.4%. Through active learning, 2942 CLSA Hologic VFAs (492 fractures) were added to the training data, increasing the proportion of Hologic VFAs with fractures from 4.2% to 12.5%. With active learning, CNN performance improved to 80.0% sensitivity, 99.7% specificity, 94.1% PPV, and an F1-score of 86.5%. The CNN maximum ensemble achieved 91.9% sensitivity (100% for grade 3 and 95.5% for grade 2 fractures), 99.0% specificity, 81.0% PPV, and an F1-score of 86.1%.
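Since the F1-score is the harmonic mean of sensitivity (recall) and PPV (precision), the reported F1 values can be checked directly from the reported sensitivity/PPV pairs. The quick consistency check below is mine, not part of the study:

```python
def f1_from_sens_ppv(sens, ppv):
    """F1 is the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sens * ppv / (sens + ppv)

# (sensitivity %, PPV %, reported F1 %) triples from the abstract
reported = [
    (55.0, 57.9, 56.4),  # baseline CNN
    (80.0, 94.1, 86.5),  # after active learning
    (91.9, 81.0, 86.1),  # maximum ensemble
]
for sens, ppv, f1 in reported:
    # each recomputed F1 agrees with the reported value to one decimal
    assert abs(f1_from_sens_ppv(sens, ppv) - f1) < 0.1
```

All three reported F1-scores are reproduced to rounding precision, confirming the internal consistency of the results.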
Conclusion: Simultaneously training on a composite dataset of both Hologic and GE VFAs enabled the development of a single, manufacturer-independent CNN that generalized to both scanner types with good classification performance. Active learning can reduce class imbalance and produce an effective medical image classifier while labeling only a subset of the available unlabeled image data, thereby reducing the time and cost required to train a machine learning model.