Inertial sensing and computer vision are promising alternatives to traditional optical motion tracking, but to date these data sources have been explored either in isolation or fused via unconstrained optimization, which may not take full advantage of their complementary strengths. Biomechanical modeling may enable better fusion than unconstrained optimization by adding physiological plausibility and dynamical robustness to the estimated motion. To test this hypothesis, we fused video and inertial sensing data via dynamic optimization with a nine-degree-of-freedom model and investigated when this approach outperforms video-only, inertial-sensing-only, and unconstrained-fusion methods. We used both experimental data and synthetic data that mimicked different levels of video and inertial measurement unit (IMU) noise. Fusion with the dynamically constrained model significantly improved estimation of lower-extremity kinematics over the video-only approach and estimation of joint centers over the IMU-only approach, and it consistently outperformed the single-modality approaches across noise profiles. When video data quality was high and inertial data quality was low, dynamically constrained fusion improved estimation of joint kinematics and joint centers over unconstrained fusion, whereas unconstrained fusion was advantageous in the opposite scenario. These findings indicate that complementary modalities and techniques can improve motion tracking by clinically meaningful margins, and that data quality and computational complexity must be considered when selecting the most appropriate method for a particular application.
Keywords:
Kinematics; Inertial measurement units; Computer vision; Direct collocation; Simulation
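The abstract reports no code, but the fusion scheme it summarizes can be illustrated in miniature. The sketch below (Python with the CasADi library, an assumed stack rather than the authors' implementation) uses direct collocation, as named in the keywords, to fuse simulated "video" keypoint positions and "IMU" angular velocities under the dynamics of a single-pendulum segment standing in for the paper's nine-degree-of-freedom model; all signals, weights, and parameters are illustrative.

```python
import numpy as np
import casadi as ca

# Synthetic ground truth: one damped pendulum "limb segment".
N, dt = 100, 0.01                       # collocation nodes, time step (s)
g, L, c = 9.81, 1.0, 0.5                # gravity, segment length, damping
t = np.arange(N) * dt
theta_true = 0.8 * np.cos(3.0 * t)      # joint angle (rad)
omega_true = -2.4 * np.sin(3.0 * t)     # angular velocity (rad/s)

rng = np.random.default_rng(0)
# "Video" observes the segment endpoint, like a 2D keypoint detector.
px_meas = L * np.sin(theta_true) + 0.02 * rng.standard_normal(N)
py_meas = -L * np.cos(theta_true) + 0.02 * rng.standard_normal(N)
# "IMU" observes angular velocity, like a gyroscope.
om_meas = omega_true + 0.05 * rng.standard_normal(N)

# Decision variables: the full state and control trajectories.
opti = ca.Opti()
theta = opti.variable(N)                # joint angle
omega = opti.variable(N)                # angular velocity
tau = opti.variable(N)                  # joint torque (control)

def accel(th, om, u):
    # Segment dynamics; enforcing these between nodes is what makes
    # the fusion "dynamically constrained" rather than unconstrained.
    return -(g / L) * ca.sin(th) - c * om + u

# Trapezoidal collocation defects tie adjacent nodes to the dynamics.
for k in range(N - 1):
    opti.subject_to(theta[k + 1] == theta[k]
                    + 0.5 * dt * (omega[k] + omega[k + 1]))
    opti.subject_to(omega[k + 1] == omega[k] + 0.5 * dt *
                    (accel(theta[k], omega[k], tau[k]) +
                     accel(theta[k + 1], omega[k + 1], tau[k + 1])))

# Weighted tracking cost: video endpoints + IMU rates + small effort term.
px, py = L * ca.sin(theta), -L * ca.cos(theta)
opti.minimize(ca.sumsqr(px - px_meas) + ca.sumsqr(py - py_meas)
              + 0.5 * ca.sumsqr(omega - om_meas)
              + 1e-4 * ca.sumsqr(tau))

# Warm-start from the video-derived angle, then solve with IPOPT.
opti.set_initial(theta, np.arctan2(px_meas, -py_meas))
opti.set_initial(omega, om_meas)
opti.solver('ipopt')
sol = opti.solve()

est = np.asarray(sol.value(theta)).ravel()
print(f'Fused joint-angle RMSE: {np.sqrt(np.mean((est - theta_true)**2)):.4f} rad')
```

Deleting the two `opti.subject_to` defect constraints leaves a purely measurement-driven fit, i.e., a toy analogue of the unconstrained-fusion baseline the abstract compares against; the relative weights on the video and IMU terms play the role of the data-quality trade-off discussed in the findings.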