This thesis presents the development and refinement of the UVEMAP system, an Uncertainty-aware Vision-based Ego-Motion-Aware target Prediction module designed for robust multi-object tracking using only monocular camera inputs.
This work aims to improve the performance of Multi-Object Tracking (MOT) systems by incorporating ego-motion and depth estimation uncertainty through a heuristic, computationally efficient solution.
The first step separates the effect of camera movement from target motion in Kalman-based MOT algorithms. This increases tracking precision by ensuring that the vehicle's ego-motion does not corrupt the targets' predicted locations. The work then develops a purely vision-based system that uses only the image stream from a monocular camera, removing the need for other sensor data such as IMU, GPS, and wheel-encoder measurements. This strategy improves the system's adaptability and applicability in many contexts, because the proposed solution can be completely agnostic to the metric scale of quantities such as depth and the translation vector.
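As an illustration of the ego-motion separation described above, the following minimal sketch warps a target's image position from the previous camera frame into the current one, given a depth estimate and the relative camera pose. It is a standard pinhole-geometry example and not necessarily the formulation used in UVEMAP. The function name `compensate_ego_motion`, the intrinsic matrix `K`, and the pose convention `X_cur = R @ X_prev + t` are assumptions introduced here for illustration.

```python
import numpy as np

def compensate_ego_motion(uv, depth, K, R, t):
    """Warp pixel `uv`, observed at `depth` in the previous camera frame,
    into the current camera frame.

    Assumed convention (hypothetical, for illustration only):
    R, t map previous-camera coordinates to current-camera coordinates,
    i.e. X_cur = R @ X_prev + t.
    """
    pix = np.array([uv[0], uv[1], 1.0])
    # Back-project the pixel to a 3D point in the previous camera frame.
    X_prev = depth * (np.linalg.inv(K) @ pix)
    # Apply the relative camera motion.
    X_cur = R @ X_prev + t
    # Project back onto the image plane of the current frame.
    proj = K @ X_cur
    return proj[:2] / proj[2]

# Example intrinsics (illustrative values, not taken from the thesis).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])

# Camera moves 1 unit forward: static points come 1 unit closer along z.
warped = compensate_ego_motion((950.0, 180.0), 10.0, K,
                               np.eye(3), np.array([0.0, 0.0, -1.0]))

# Scaling depth and translation by the same factor gives the same pixel.
warped_scaled = compensate_ego_motion((950.0, 180.0), 30.0, K,
                                      np.eye(3), np.array([0.0, 0.0, -3.0]))
```

In this sketch `warped` and `warped_scaled` coincide, which shows why a consistent but unknown scale shared by depth and translation is sufficient: the reprojected position depends only on their ratio, not on the metric scale.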
One of the main contributions is the integration of visual odometry and depth estimation using a modified Monodepth2, which estimates depth and camera motion from a self-supervisory signal generated by the image reprojection error. The modifications to Monodepth2 ensure its compatibility with UVEMAP, allowing precise depth and pose estimation from monocular images. To account for uncertainty in depth estimation, a conformal prediction method is applied: nonconformity scores computed on the dataset yield prediction intervals that quantify the uncertainty associated with each depth estimate. This uncertainty information enhances the Kalman filter's ability to handle occlusions and noisy measurements.
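The conformal prediction step can be sketched as follows, assuming the common split-conformal setup with absolute depth error as the nonconformity score. The thesis may use a different score or calibration scheme. The names `conformal_quantile` and `depth_interval` and the calibration values are hypothetical and serve only as an example.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    `scores` are nonconformity scores on a held-out calibration set,
    assumed here to be |predicted depth - reference depth|.
    """
    n = len(scores)
    # Rank of the calibration score that gives (1 - alpha) coverage.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: interval is unbounded.
        return float("inf")
    return float(np.sort(scores)[k - 1])

def depth_interval(pred_depth, q):
    """Symmetric prediction interval around a new depth estimate."""
    return pred_depth - q, pred_depth + q

# Illustrative calibration scores (not data from the thesis).
calib_scores = np.array([0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 1.0, 0.4, 0.8, 0.6])
q = conformal_quantile(calib_scores, alpha=0.2)
low, high = depth_interval(12.0, q)
```

The half-width `q` is the same for every prediction in this simple variant. One plausible way to pass it to a Kalman filter, stated here as an assumption, is to enlarge the measurement noise for targets whose depth interval is wide.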
The incorporation of unified-scale depth and pose estimates, together with depth uncertainty quantification, into the proposed target prediction module results in a substantial improvement in performance metrics compared to baseline methods. Experiments conducted on the KITTI dataset show that UVEMAP significantly reduces identity switches and enhances tracking accuracy and robustness. The computational efficiency of the proposed method, stemming from its heuristic nature, makes it suitable for deployment on edge devices, including autonomous ground robots and vehicles.
This research makes a notable contribution to the field of multi-object tracking by presenting a comprehensive framework that integrates ego-motion awareness and uncertainty quantification, all from a monocular video stream, to achieve superior tracking performance.