Over the past several weeks, I’ve repeatedly run into the topic of what our future intelligent assistants and robot companions will sound like. To reach their full potential, digital assistants are going to need to understand what we say and then speak back to us in voices we like to hear.
While I was at SpeechTek 2015, I spoke with Tara Kelly of Splice Software. I wrote a blog post that explained Splice’s technology, which provides crowdsourced, top-quality voice files that are the foundation of flexible text-to-speech systems. Splice focuses on outbound IVR and delivering the perfect voice for the customer and situation at hand. This same technology could potentially provide voices for talking devices in the future.
Just this week I wrote about an article in the New York Times that covered Mattel and ToyTalk’s Hello Barbie. While I didn’t mention it in my post, the article went into some depth about how Mattel selected the voice talent for Hello Barbie and the process used for recording the scripts that comprise the doll’s conversational repertoire. Suffice it to say that a lot of thought went into choosing the voice, with the company opting for someone “less breathy and more down to earth.” And the process used to record each script also seems to have been fairly involved.
Yesterday I stumbled across a video interview with Blade Kotelly, VP of Design & Consumer Experience at Jibo, Inc. In the interview, Kotelly talks about the concept of Jibo as a character. Before beginning the search for the right voice, the team conceptualized what type of character they wanted Jibo to be. They landed on the idea of a young male who was energetic, earnest, and helpful.
Once they’d solidified the basic concept for the character’s personality, the Jibo team did a huge casting call for voice talent and listened to over four hundred demo recordings. They eventually chose fifty candidates for numerous auditions. They kept holding auditions and winnowing the talent down until they’d picked the top four voices. From there, the team spent a lot of time evaluating the pros and cons of each voice actor, until they ultimately selected the voice of Jibo.
Kotelly goes into a lot of interesting detail about the auditions and what they were looking for. He also describes the complexities of the recording process. The voice actor ultimately had to record 14,000 phrases!
Unfortunately, the video doesn’t reveal who the voice actor behind Jibo is, or give us a chance to listen to any of his audio recordings. It’ll be interesting to hear how the “real” Jibo’s voice differs from the one used in the Indiegogo campaign video.
Will Jibo’s voice actor join the likes of Susan Bennett (Siri) and Jen Taylor (Cortana) in the annals of famous voice talent? I suppose that remains to be seen. But as the number of conversational assistants, devices, and hardware characters continues to increase, the opportunity for great voice actors and actresses is growing. Getting text-to-speech right is hard. But the importance of the right voice shouldn’t be underestimated.