This thesis proposes a multi-camera active-vision system that selects camera poses on-line, in real time, to achieve near-optimal performance in time-varying-geometry (TVG) action-sensing tasks. Active vision for TVG objects requires an on-line sensor-planning strategy that incorporates information about the object and the state of the environment, including obstacles, into the pose-selection process. Accordingly, this research addresses real-time sensing-system reconfiguration for the recognition of a single TVG object and its actions in a cluttered, dynamic environment that may contain multiple other dynamic (maneuvering) obstacles.
The proposed methodology was developed as a complete, customizable sensing-system framework: a 10-stage real-time pipeline architecture that can be readily modified to suit a variety of specific TVG action-sensing tasks. The pipeline consists of Sensor Agents, a Synchronization Agent, Point Tracking and De-Projection Agents, a Solver Agent, a Form-Recovery Agent, an Action-Recognition Agent, a Prediction Agent, a Central Planning Agent, and a Referee Agent.
To validate the proposed methodology, rigorous experiments are also presented herein. They confirm the basic assumptions of active vision for TVG objects and characterize the resulting gains in sensing-task performance. Simulated experiments, which provide a method for rapidly evaluating new sensing tasks, demonstrate a tangible increase in single-action recognition performance over a static-camera sensing system. Furthermore, they illustrate the need for feedback in the pose-selection process, which allows the system to incorporate knowledge of the form and action of the object of interest (OoI). Subsequent real-world multi-action and multi-level action experiments demonstrate the same tangible increase when sensing real-world objects that perform multiple actions, which may occur simultaneously or at differing levels of detail.
A final set of real-world experiments characterizes the real-time performance of the proposed methodology with respect to several important system design parameters, such as the number of obstacles in the environment and the size of the action library. Overall, it is concluded that the proposed system tangibly increases TVG action-sensing performance and can be generalized to a wide range of applications, including human-action sensing. Future research is proposed to develop similar methods for deformable objects and multiple objects of interest.