Teachable social robots leverage the learning-by-teaching paradigm to promote engagement and learning, and can adapt their behaviours to suit different contexts and user preferences. However, existing teachable robots make limited use of machine learning methods for adaptation, and rarely interact with students in natural language dialog, a communication modality with high context dependence and strong individual preferences. This thesis proposes an adaptive response-selection algorithm for a teachable robot, which uses reinforcement learning to make dialog choices within the teaching conversation, with the aim of increasing user engagement and learning outcomes. When optimising for engagement, the algorithm is rewarded based on the user's paraphrasing behaviour and response time while teaching; when optimising for learning outcomes, the reward is derived from quiz results obtained during the teaching interaction. Alongside each objective, we also learn user preferences based on their acceptance of the robot's dialog choices. An individualised policy is learned over a single interaction, which takes place on the Curiosity Notebook research platform, a learning-by-teaching web interface for learning about classification tasks.
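The abstract does not detail the learning algorithm itself. As a rough illustration of the kind of online, per-user adaptation described above, the sketch below implements a minimal epsilon-greedy bandit over a hypothetical set of dialog actions, with an assumed engagement reward that combines paraphrasing quality and response time. The action names, reward weights, and the `engagement_reward` helper are illustrative assumptions, not the thesis's actual formulation.

```python
import random

# Hypothetical dialog action categories; the thesis's actual action set
# is not specified in this abstract.
ACTIONS = ["ask_to_explain", "ask_to_compare", "tell_fun_fact", "check_understanding"]


class ResponseSelector:
    """Minimal epsilon-greedy bandit for selecting dialog responses.

    A sketch only: the thesis learns an individualised policy with
    reinforcement learning, but the exact state representation and
    update rule are not given in the abstract.
    """

    def __init__(self, actions, epsilon=0.2, lr=0.1):
        self.q = {a: 0.0 for a in actions}  # estimated value per action
        self.epsilon = epsilon              # exploration probability
        self.lr = lr                        # learning rate

    def select(self):
        # Explore with probability epsilon, otherwise pick the
        # currently highest-valued action.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Move the action's value estimate towards the observed reward.
        self.q[action] += self.lr * (reward - self.q[action])


def engagement_reward(paraphrase_score, response_time_s, max_time_s=60.0):
    """Assumed engagement reward: higher for more paraphrasing and faster
    responses. The 0.7/0.3 weighting is an arbitrary placeholder."""
    time_term = max(0.0, 1.0 - response_time_s / max_time_s)
    return 0.7 * paraphrase_score + 0.3 * time_term


if __name__ == "__main__":
    selector = ResponseSelector(ACTIONS)
    for turn in range(50):
        action = selector.select()
        # In a real interaction these signals would come from the user's
        # typed teaching input; here they are simulated.
        paraphrase_score = random.random()
        response_time_s = random.uniform(5, 60)
        selector.update(action, engagement_reward(paraphrase_score, response_time_s))
    print(selector.q)
```

In this toy form, swapping the reward function for one based on quiz results would correspond to the thesis's second objective, learning outcomes, while the selection and update machinery stays the same.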
The proposed approach is developed and validated through a series of user studies. First, the natural language teaching modality is implemented in the Curiosity Notebook, encouraging users to paraphrase the source material. The effect of this modality on learning and engagement is evaluated in a user study against the baseline teaching method of clicking on full sentences. We find that teaching via paraphrasing has a positive effect on learning outcomes for the material covered, though it takes more time. Participants also perceive this approach to be more helpful for their learning.
Next, the adaptive response-selection algorithm is formulated with the objective of improving task engagement, and evaluated in three user studies: with adults recruited from two diverse groups, in Melbourne and in Tokyo, and with a small group of children. Finally, the reward formulation is expanded to optimise for learning outcomes. The algorithm is shown to be capable of learning to select more rewarding dialog responses over time for both objectives. Participants show a significant increase in paraphrasing behaviour when the algorithm optimises for task engagement, and participants with higher prior knowledge show a larger increase in test scores when it optimises for learning outcomes. The algorithm is also able to learn an individualised policy for each user. This adaptive approach both strengthens positive perceptions and reduces negative perceptions of the teaching interaction, leading to greater interest in future use of the teaching technology.