This thesis develops a machine learning-based 3D computer vision system that represents complex 3D shapes and scenes with a small set of simple geometric primitives. The proposed system is a multi-stage pipeline that uses a learning-based Hierarchical Gaussian Mixture Model (HGMM) as the shape representation, which then serves as the input to the primitive detection module. The system requires no supervision during training and provides accurate and robust approximations of 3D shapes via simple geometric primitives such as cuboids, planes, or spheres. This significantly reduces the memory footprint while preserving a meaningful and discriminative representation of the original model.
The primitive detection pipeline consists of two sequential stages. The first stage covers the training and post-processing of the HGMM data representation. It is first shown that the Expectation Maximization (EM) algorithm requires a scenario-specific initialization method to succeed. To address this problem, a neural network is used to generate meaningful and discriminative HGMMs without the need for an initial condition. An adaptive modelling module based on the Non-maximum Suppression (NMS) algorithm is then developed as a post-processing technique that detects and eliminates overlapping mixture components, reducing the number of required parameters and yielding a more lightweight representation.
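As an illustration of the post-processing idea, the following is a minimal sketch of greedy NMS over mixture components, not the implementation used in this thesis. It assumes diagonal covariances, ranks components by mixing weight, and measures overlap with the Bhattacharyya coefficient; the names `Component`, `bhattacharyya_coefficient`, and `suppress_overlapping`, as well as the threshold value, are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One Gaussian mixture component (diagonal covariance is an assumption)."""
    weight: float  # mixing weight
    mean: tuple    # (x, y, z)
    var: tuple     # per-axis variances, all > 0


def bhattacharyya_coefficient(a, b):
    """Overlap in [0, 1] between two diagonal Gaussians; 1 means identical."""
    dist = 0.0
    for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var):
        s = 0.5 * (va + vb)  # averaged variance along this axis
        dist += 0.125 * (ma - mb) ** 2 / s
        dist += 0.5 * math.log(s / math.sqrt(va * vb))
    return math.exp(-dist)


def suppress_overlapping(components, threshold=0.8):
    """Greedy NMS: visit components by descending weight and keep one only
    if its overlap with every already-kept component is below threshold."""
    kept = []
    for c in sorted(components, key=lambda c: c.weight, reverse=True):
        if all(bhattacharyya_coefficient(c, k) < threshold for k in kept):
            kept.append(c)
    total = sum(c.weight for c in kept)
    # Renormalize so the surviving weights still sum to one.
    return [Component(c.weight / total, c.mean, c.var) for c in kept]


mixture = [
    Component(0.5, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    Component(0.3, (0.1, 0.0, 0.0), (1.0, 1.0, 1.0)),   # near-duplicate of the first
    Component(0.2, (10.0, 0.0, 0.0), (1.0, 1.0, 1.0)),  # well separated
]
pruned = suppress_overlapping(mixture)
```

In this toy mixture the second component nearly coincides with the first and is suppressed, while the distant third component survives. Other ranking scores or overlap measures could be substituted without changing the greedy structure.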
In the second stage, a statistical primitive fitting module fits geometric primitives to the input point cloud based on the estimated HGMM. A primitive alignment and merging algorithm then locates and combines primitive segments that originally belong to a larger primitive model, which yields a cleaner and more discriminative detection result. Finally, experiments are conducted to evaluate the deep learning-based HGMMs and the resulting primitive detections, with EM-based HGMMs as the baseline.
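The second stage can be illustrated with a simplified sketch under strong assumptions: axis-aligned cuboids only, and diagonal-covariance components. It is not the fitting or merging algorithm of this thesis. The cuboid is obtained by moment matching (a uniform distribution on an interval of half-width h has variance h²/3), and two cuboids are merged when their joint bounding box is barely larger than the sum of their volumes. The names `Box`, `box_from_gaussian`, `try_merge`, and the tolerance `tol` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned cuboid (a simplifying assumption for this sketch)."""
    lo: tuple  # minimum corner (x, y, z)
    hi: tuple  # maximum corner (x, y, z)

    def volume(self):
        return math.prod(h - l for l, h in zip(self.lo, self.hi))


def box_from_gaussian(mean, var):
    """Moment-matched cuboid: a uniform distribution on [-h, h] has
    variance h**2 / 3, so each half-extent is sqrt(3 * variance)."""
    half = [math.sqrt(3.0 * v) for v in var]
    lo = tuple(m - h for m, h in zip(mean, half))
    hi = tuple(m + h for m, h in zip(mean, half))
    return Box(lo, hi)


def try_merge(a, b, tol=0.05):
    """Return the joint bounding box if it adds little empty space,
    i.e. the two segments plausibly belong to one larger cuboid;
    otherwise return None."""
    lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))
    merged = Box(lo, hi)
    if merged.volume() <= (1.0 + tol) * (a.volume() + b.volume()):
        return merged
    return None


fitted = box_from_gaussian((0.0, 0.0, 0.0), (1.0 / 3.0, 4.0 / 3.0, 3.0))
left = Box((0, 0, 0), (1, 1, 1))
right = Box((1, 0, 0), (2, 1, 1))   # shares a face with `left`
far = Box((5, 5, 5), (6, 6, 6))     # unrelated segment
joined = try_merge(left, right)
rejected = try_merge(left, far)
```

Here the two face-sharing unit cubes are combined into a single 2×1×1 cuboid, while the distant cube is left separate. Handling arbitrarily oriented primitives would additionally require an alignment step before such a test.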
The proposed system provides a visual abstraction of the original 3D shapes in a parameterized, simplified geometric format, which can later be used in real-world applications such as 3D rendering, robot simulation, and game development.