Clinical datasets often comprise multiple data points, or trials, sampled from each participant. When these datasets are used to train machine learning models, the method used to partition the data into training and test sets must be chosen carefully. Under the standard machine learning approach (random-wise split), different trials from the same participant may appear in both the training and test sets. This concern has motivated schemes that segregate all data points from the same participant into a single set (subject-wise split). Past investigations have demonstrated that models trained under subject-wise splits underperform those trained under random-wise splits. Additional training on a small subset of trials, known as calibration, bridges the performance gap between split schemes; however, the number of calibration trials required to achieve strong model performance remains unclear. This study therefore investigates the relationship between calibration training set size and prediction accuracy on the calibration test set. A deep-learning classifier was developed using a database of 30 young, healthy adults who performed multiple walking trials across nine different surfaces while fitted with inertial measurement unit sensors on the lower limbs. For subject-wise trained models, calibration on a single gait cycle per surface yielded a 70% increase in F1-score, the harmonic mean of precision and recall, while 10 gait cycles per surface were sufficient to match the performance of a random-wise trained model. Code to generate calibration curves may be found at https://github.com/GuillaumeLam/PaCalC.
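
For readers unfamiliar with the two split schemes, the minimal Python sketch below illustrates the distinction using scikit-learn on synthetic data. It is not the PaCalC implementation; the array shapes, participant counts, and variable names are illustrative assumptions chosen to mirror the dataset described above.

```python
# Contrast of the two split schemes on synthetic gait-trial data (a sketch,
# not the authors' code). train_test_split mixes trials freely (random-wise);
# GroupShuffleSplit keeps every participant's trials in one set (subject-wise).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(0)
n_participants, trials_per_participant, n_features = 30, 20, 12  # illustrative sizes

X = rng.normal(size=(n_participants * trials_per_participant, n_features))
y = rng.integers(0, 9, size=len(X))  # 9 walking-surface labels
groups = np.repeat(np.arange(n_participants), trials_per_participant)

# Random-wise split: trials from the same participant can land in both sets,
# which is the leakage that inflates test performance.
Xtr_r, Xte_r, ytr_r, yte_r = train_test_split(X, y, test_size=0.2, random_state=0)

# Subject-wise split: participants, not trials, are assigned to a single set.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=groups))
Xtr_s, Xte_s = X[train_idx], X[test_idx]

# No participant appears in both sets under the subject-wise scheme.
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Under the subject-wise scheme the final assertion always holds, whereas the random-wise split typically scatters a participant's trials across both sets. The calibration-curve code itself lives in the PaCalC repository linked above.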