
My first degree was in A.I., back in the 1980s at Sussex University. At that time the Sussex A.I. degree sat within the School of Social Sciences, and figures such as Aaron Sloman and Maggie Boden, with their backgrounds in philosophy and the humanities, meant the course had a strong input from psychology and cognitive philosophy. I like the world of ideas, but I found wrestling with notions that seemed very divorced from the applied reality of computing frustrating. When I entered the speech world in 1994, taking the wonderful MSc course in Speech and Language Processing at Edinburgh University, I found a mixture of the applied and the unknown which suited me very well.
Speech is just there. It is not a theoretical construct, it is not blocks world, it is not even defined by the invention of writing. Recognising or producing speech artificially is a concrete task. But it is also a very challenging one, because speech is not just a means of communication. It is part of what we are as humans: it betrays our background and our motivations, it communicates our emotions, and it is part of social life and of our human experience. We can manipulate speech to lie, to love, to attack and to support. Thus, when we enter the world of speech we are entering a human world far removed from GUIs and computer systems. Building systems that can interact in this human context is a fascinating engineering problem, one that encompasses everything from digital signal processing through to the social sciences.
My research interests across this theme are quite eclectic, and looking at my publications on Google Scholar is a good way of understanding their breadth. At present the area I'm most focused on is conversational interaction.
Human-Robot/Agent Conversational Interaction
Speech synthesis sounds good nowadays, and automatic speech recognition works quite well. However, conversational interaction hasn't changed for years from a speak-wait approach. Not only is this clunky and frustrating to use in a single-person robot/agent context, it is unfit for purpose in a multi-party dialogue context where humans, not systems, dictate turn-taking. For a good overview of my thoughts on this field, have a look at the paper I co-wrote with Marta Romeo for CUI in 2023: You Don't Need to Speak, You Need to Listen: Robot Interaction and Human-Like Turn-Taking.
Personification, Anthropomorphism and Designing Systems to Perform Rather than to Mimic
Coming from a speech synthesis background, I have had a long interest in both mimicking and cloning human voices. For me this raises some fundamental questions about the design of anthropomorphic and semi-anthropomorphic systems. Firstly, can we build systems that perform in the way an actor might perform in a given context? Secondly, when should we, and when should we not, try to do this? The alt.CHI paper from 2019 is a good summary of my thinking on this topic.