Increased global competition has placed great demand on manufacturers to be flexible with their products and services. This demand can be addressed by introducing robots, which are very effective at repetitive, non-ergonomic tasks, to work in partnership with human operators, who typically excel at precise tasks requiring dexterity, flexibility, and cognitive decision-making. This paradigm of humans and robots working together forms the motivation behind the field of human-robot collaboration (HRC). This dissertation begins by introducing a novel taxonomy of HRC, in response to the need for a better understanding of the possible interactions between humans and robots and of the significance of the level of robot intelligence in establishing an effective interaction.
Cohesive HRC can be achieved through seamless and natural communication between human and robot partners. The field of human-robot communication (HRCom) finds its roots in human communication, with the aim of achieving the “naturalness” inherent in the latter. This dissertation posits that the design of HRCom can take inspiration from human communication to create more intuitive systems that truly leverage the presence of the human as a collaborating agent, so that the human’s role is more meaningful than that of a mere command centre. However, this additional contribution must come at no additional effort to the human operator.
HRCom can be achieved through a robust robot perception system developed using data-driven machine learning (ML). The challenge for HRCom is the dearth of comprehensive, labelled datasets, while standard, publicly available datasets do not generalize well to domain- and application-specific scenarios. Furthermore, models fail to generalize under domain shifts stemming from changes in the robot's environment. With these challenges and the complexities inherent in HRCom in mind, a framework named SIMLea is presented. SIMLea, short for Statistically-Informed Multimodal (Domain Adaptation by Transfer) Learning, takes inspiration from human communication and uses human feedback to auto-label data for domain adaptation.
The strength of the contribution lies in its use of incommensurable, multimodal, decision-level inputs to personalize the model with user-specific data. This leads to a statistically-informed extension of datasets, greater safety, enhanced monitoring of the model's continuous learning, and judicious use of resources. The framework is validated with facial expression recognition, which serves as a safety feature, and hand gesture recognition, which provides imperative commands to the robot; it is also applicable to other combinations of multimodal inputs in HRC applications.
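
The idea of using human feedback to auto-label user-specific data can be illustrated with a minimal sketch. It is not the SIMLea procedure itself, whose statistical criteria are not given here. It assumes that a gesture classifier returns class probabilities (a decision-level output) and that the human's feedback has already been reduced to a boolean. The names `AdaptationBuffer`, `auto_label`, and `min_confidence`, and the threshold value, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class AdaptationBuffer:
    """Hypothetical store of auto-labelled samples for later fine-tuning."""
    samples: List[Tuple[Sequence[float], int]] = field(default_factory=list)

    def add(self, features: Sequence[float], label: int) -> None:
        self.samples.append((features, label))


def auto_label(gesture_probs: Sequence[float],
               feedback_positive: bool,
               min_confidence: float = 0.5) -> Optional[int]:
    """Return the predicted class as a label only if the human feedback is
    positive and the classifier is confident enough; otherwise return None.

    Assumptions: gesture_probs is a non-empty list of class probabilities,
    and min_confidence is an illustrative threshold, not a SIMLea value.
    """
    predicted = max(range(len(gesture_probs)), key=gesture_probs.__getitem__)
    if feedback_positive and gesture_probs[predicted] >= min_confidence:
        return predicted
    return None


# Example: one confident prediction confirmed by the human is kept,
# while a prediction the human rejects is discarded.
buffer = AdaptationBuffer()
observations = [
    ([0.2, 0.9, 0.4], [0.1, 0.7, 0.2], True),   # (features, probs, feedback)
    ([0.5, 0.1, 0.3], [0.6, 0.3, 0.1], False),
]
for features, probs, feedback in observations:
    label = auto_label(probs, feedback)
    if label is not None:
        buffer.add(features, label)
```

In this sketch the second modality acts only as a gate on which samples enter the adaptation set, so the operator does no explicit labelling; a fuller treatment would replace the fixed threshold with a statistically grounded acceptance criterion.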