Developing the World's First AI Voice with a Stammer

Challenging the status quo in AI voice technology by developing an authentic AI voice that stammers.

At SuperPenguin, our mission has always been to create an inclusive and supportive space for families navigating speech and language therapy. When we first developed our pilot app, Penguin, two members of our team, Ronan and Jaclyn, personally recorded the voiceovers for our content. Their voices brought warmth, authenticity, and, in Ronan’s case, a stammer that is a natural part of his speech.

However, as we expanded into SuperPenguin, we quickly realised that manually recording voiceovers for the rapidly expanding range of topics we had planned was more than our small team could manage. We wanted to support more families and increase our impact, all while preserving the personal touch that made our content so effective.

As a result, we decided to embrace emerging text-to-speech technology and, through extensive development, created near-identical AI voice clones of Ronan and Jaclyn that replicated their persona and tonality. This allowed us to generate high-quality voiceovers on demand, making our content development process faster and more efficient. However, while the AI voices captured how Ronan and Jaclyn sound, they could only produce fluent speech.

Stammering voices are rarely heard in mainstream technology and media. Their absence from AI-generated speech reinforces the idea that fluent speech is the only acceptable form of verbal communication, which is simply not true. Stammering is a natural and valid speech pattern, and by making space for it in AI-generated voices, we want to help normalise diverse speech in everyday life.

So, we set out to do something that, as far as we know, has never been done before: develop an AI voice that stammers.

Our goal is not to caricature or mimic stammering but to authentically reflect real human speech patterns with respect and accuracy. We want to create a voice that represents people who stammer - one that children and adults alike can recognise and relate to. Because when someone hears a voice like their own, it reinforces the message that their speech is valid, valued, and belongs in the world.

Sample Audio Without Stammering
Sample Audio With Stammering

We appreciate that stammering is incredibly personal and unique to each individual. With this in mind, we are treating this as an ongoing project that we can share with the stammering community and beyond, in the hope of starting an important conversation about representation in innovative technology. We’re still in the development phase, but this is a project we are deeply passionate about, and we know that meaningful progress comes from collaboration.

We’d love to hear your thoughts on this. How can we ensure this voice is authentic and meaningful? What would you like to hear in an AI-generated stammering voice? Do you feel this work can reflect the stammering community?

Join the conversation. Let’s build this together.