Recent advances in AI have enabled open-ended conversational
interactions with social robots. In educational contexts, interactions with robots
were until recently heavily structured and limited, which restricted the scope of
engagement and made the integration of educational content labour-intensive.
The emergence of data-driven AI, particularly Large Language Models (LLMs)
and Diffusion Models, now opens up new possibilities, while also introducing
challenges inherent to the probabilistic nature of these systems. This shift
compels us to rethink how we design, deploy, and evaluate social robots in
education.
One particularly promising application for LLM-powered social robots is second
language learning. These robots can engage in open-ended spoken dialogue with
learners, adapting both language level and conversational content to suit the
learner's interests and progress. We present a prototype robot that leverages
speech recognition, LLMs, and generative AI to support open-ended learning for
students of French. Our results show that while state-of-the-art Transformer-based
speech recognition systems typically achieve superhuman performance for
native speech (Word Error Rate, WER, < 5%), they produce a WER of 32.8% for
learners of French. Despite this relatively high error rate, the output remains sufficient for
the LLM to generate coherent and contextually appropriate responses, enabling
effective spoken interaction.
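For reference, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recogniser output and a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in this work. The function name `word_error_rate`, the lowercase-and-split normalisation, and the example sentences are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("dort" -> "dors") and one deletion ("le")
# against a six-word reference.
example_wer = word_error_rate("le chat dort sur le canapé",
                              "le chat dors sur canapé")
print(f"WER = {example_wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.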
We further enhance the experience with images generated by Diffusion Models,
which illustrate the conversation and provide visual grounding to support shared
understanding. This paper explores the educational potential of AI-driven
dialogue and, in particular, investigates the application of Ellis' usage-based
approach to second language acquisition, emphasising exposure over corrective
feedback in robot-assisted learning.