Robots capable of robust, real-time recognition of human intent during manipulation tasks could enhance human-robot collaboration across a wide range of applications. Eye gaze-based control interfaces offer a non-invasive way to infer intent and reduce the cognitive burden on operators of complex robots. Eye gaze has traditionally been used for "gaze triggering" (GT), in which staring at an object, or a sequence of objects, triggers pre-programmed robotic movements. Our long-term objective is to leverage eye gaze as an intuitive way to infer human intent, advance action recognition for shared autonomy control, and enable seamless human-robot collaboration not yet possible with state-of-the-art gaze-based methods.
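For reference, the gaze-triggering idea can be sketched as a simple dwell-time rule: fire once when gaze stays on the same object for at least a threshold duration. This is a minimal illustration, not the GT implementation referred to in this work. It assumes gaze samples have already been mapped to an object label (or None), and the function name `dwell_triggers` and the 1-second default threshold are hypothetical.

```python
def dwell_triggers(samples, dwell_s=1.0):
    """Dwell-time gaze trigger (illustrative sketch, hypothetical threshold).

    samples: time-ordered list of (timestamp_s, object_label or None).
    Returns a list of (timestamp_s, object_label) at which each continuous
    dwell on one object first reaches dwell_s seconds.
    """
    events = []
    current, start, fired = None, None, False
    for t, obj in samples:
        # A change of fixated object (including to/from None) restarts the dwell timer.
        if obj != current:
            current, start, fired = obj, t, False
        # Fire at most once per continuous dwell on a real object.
        if current is not None and not fired and t - start >= dwell_s:
            events.append((t, current))
            fired = True
    return events
```

With samples `[(0.0, "cup"), (0.5, "cup"), (1.0, "cup"), (1.5, "cup"), (1.6, None), (2.0, "spoon"), (2.5, "spoon")]`, only the cup dwell reaches 1 s, so a single trigger fires at t = 1.0. The rigidity of such a rule (the operator must deliberately stare) is part of what motivates inferring intent from natural gaze behavior instead.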
In Study #1, we identified features of 3D gaze behavior for use by machine learning-based action recognition classifiers. We investigated gaze behavior and gaze-object interactions as participants performed the bimanual activity of preparing a powdered drink. We generated 3D gaze saliency maps and used characteristic gaze-object sequences to demonstrate an action recognition algorithm.
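To make the idea of characteristic gaze-object sequences concrete, the sketch below collapses per-frame fixated-object labels into an ordered gaze-object sequence and reports which action templates appear in it as ordered subsequences. This is a toy illustration under stated assumptions, not the algorithm from Study #1: the template dictionary, its object and action names, and the subsequence-matching rule are all hypothetical.

```python
from itertools import groupby

# Hypothetical templates: each action maps to an ordered sequence of fixated
# objects. Names are illustrative only, not the study's object set.
TEMPLATES = {
    "scoop_powder": ["spoon", "powder_jar"],
    "pour_water": ["pitcher", "cup"],
    "stir": ["spoon", "cup"],
}

def collapse(frames):
    """Drop frames with no fixated object (None), then merge consecutive
    repeats, yielding an ordered gaze-object sequence."""
    return [obj for obj, _ in groupby(f for f in frames if f is not None)]

def is_subsequence(pattern, seq):
    """True if every item of pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)  # shared iterator, so each match consumes earlier elements
    return all(any(x == p for x in it) for p in pattern)

def recognize(frames, templates=TEMPLATES):
    """Return the actions whose template appears in the gaze-object sequence."""
    seq = collapse(frames)
    return [action for action, pattern in templates.items()
            if is_subsequence(pattern, seq)]
```

For frames `["spoon", "spoon", None, "powder_jar", "powder_jar", "cup", "spoon", "cup"]`, the collapsed sequence is `["spoon", "powder_jar", "cup", "spoon", "cup"]` and `recognize` returns `["scoop_powder", "stir"]`. A practical recognizer would also need temporal windowing and ambiguity handling, which this sketch omits.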
In Study #2, we introduced a classifier for recognizing action primitives, which we defined as triplets consisting of a verb, a "target object," and a "hand object." Using novel 3D gaze-related features, we trained a recurrent neural network to recognize the verb and the target object. The gaze object angle and its rate of change enabled accurate recognition and reduced the observational latency of the classifier. Using a non-specific approach to indexing objects, we demonstrated modest generalizability of the classifier across activities.
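As an illustration of the gaze object angle and its rate of change, the NumPy sketch below assumes one plausible definition: the angle between the 3D gaze direction and the vector from the gaze origin to an object's position (for example, its centroid), with the rate estimated by finite differences over timestamps. The exact definition used in Study #2 may differ. The function names and the per-object feature stacking are hypothetical, shown only to suggest how such features could form a per-timestep input to a recurrent classifier.

```python
import numpy as np

def gaze_object_angle(origins, directions, obj_pos):
    """Angle (rad) between each gaze ray and the ray from its origin to obj_pos.

    origins, directions: (T, 3) gaze origins and gaze direction vectors.
    obj_pos: (3,) object position (assumed here to be its centroid).
    """
    to_obj = obj_pos[None, :] - origins
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    v = to_obj / np.linalg.norm(to_obj, axis=1, keepdims=True)
    cos = np.clip(np.sum(d * v, axis=1), -1.0, 1.0)  # guard against rounding
    return np.arccos(cos)

def angle_rate(angles, t):
    """Rate of change (rad/s): central differences inside, one-sided at ends."""
    return np.gradient(angles, t)

def gaze_features(origins, directions, obj_positions, t):
    """Stack [angle, angle rate] per object into a (T, 2 * N) array
    (hypothetical layout for a sequence model's input)."""
    cols = []
    for p in obj_positions:
        a = gaze_object_angle(origins, directions, np.asarray(p, dtype=float))
        cols += [a, angle_rate(a, t)]
    return np.stack(cols, axis=1)
```

For example, with the gaze origin fixed at the origin, an object at (1, 0, 0), and gaze directions (0, 1, 0), (1, 1, 0), (1, 0, 0) sampled at t = 0, 0.1, 0.2 s, the angles are π/2, π/4, and 0, and the rate is a constant −2.5π rad/s. A negative rate indicates gaze converging on the object, which is one intuition for why such a feature could help a classifier commit earlier.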
In Study #3, we introduced a neural network-based "action prediction" (AP) mode into a shared autonomy framework capable of 3D gaze reconstruction, real-time intent recognition, object localization, obstacle avoidance, and dynamic trajectory planning. After extracting gaze-related features, the AP model recognized, and often predicted, the operator's intended action primitives. The AP control mode, which users often preferred over a state-of-the-art GT mode, enabled more seamless human-robot collaboration.
In summary, we developed machine learning-based action recognition methods using novel 3D gaze-related features to enhance the shared autonomy control of robot manipulators. Our methods can serve as a foundation for further enhancement with complementary sensing modalities such as computer vision and tactile sensing.