New Ivee Device: An Intelligent Assistant in Every Room

The team at Interactive Voice has launched an Indiegogo campaign to raise funds for their ambitious new Ivee device. Last year I wrote about Interactive Voice’s Ivee Sleek. The Sleek was an improvement over Ivee’s earlier devices, the Digit and Flex products. Ivee Sleek was already a full-fledged digital assistant with voice-driven features and Internet connectivity. As I noted in the original blog post, the Sleek leveraged AT&T’s Watson speech recognition technology, which was recently acquired by Interactions.

What does Interactive Voice have planned for its latest evolution of the Ivee product? Based on the current Indiegogo campaign, the new device is a direct competitor to Amazon’s Echo. Though it’s thinner in appearance, the new Ivee device is a tall cylindrical structure somewhat reminiscent of the Echo. Many of the skills listed on Ivee’s crowdfunding page overlap with those offered by Amazon’s smart device, including:

  • smart home
  • streaming media
  • alarms & reminders
  • traffic
  • weather
  • questions & answers

However, I noticed three functions that Ivee aspires to offer that I don’t think are currently on Echo’s skills list. These are:

  • on demand services (Uber)
  • emergency communications
  • multi-room support

Ivee’s on demand service offers a quick and easy way to order an Uber ride with a voice command. The emergency communications feature enables you to call out to emergency services and could make Ivee appealing as an elder care device. The Ivee is supposed to be available at a low enough price point to allow a family to purchase multiple units to place in different rooms around the house.

Last month I wrote about the successful Mycroft crowdfunding campaign. Mycroft is another voice-driven Internet-connected smart assistant device that aims to be in every room in your house. Mycroft offers an open source hardware platform built on Raspberry Pi and a software platform that uses Ubuntu. 

Amazon Echo and Mycroft offer software development kits (SDKs) that encourage app owners and developers to build new skills for the devices. It’s not clear to me at this point whether Interactive Voice plans to offer an SDK for Ivee, but it seems that might be necessary in order to compete.

With about a month to go, the Ivee campaign is 90% funded, so it’s well on its way to reaching the goal. There is obvious interest in, and appetite for, intelligent voice-driven devices, especially those that can connect to devices in the smart home via IoT protocols.

The Voices of Our Future Digital Companions

Over the past several weeks, I’ve repeatedly run into the topic of what our future intelligent assistants and robot companions will sound like. To reach their full potential, digital assistants are going to need to understand what we say and then speak back to us in voices we like to hear.

While I was at SpeechTek 2015, I spoke with Tara Kelly of Splice Software. I wrote a blog post that explained Splice’s technology, which provides crowdsourced, top quality voice files that are the foundation of flexible text to speech systems. Splice focuses on Outbound IVR and delivering the perfect voice for the customer and situation at hand. This same technology could potentially provide voices for talking devices in the future.

Just this week I wrote about an article in the New York Times that covered Mattel and ToyTalk’s Hello Barbie. While I didn’t mention it in my post, the article went into some depth about how Mattel selected the voice talent for Hello Barbie and the process used for recording the scripts that comprise the doll’s conversational repertoire. Suffice it to say that a lot of thought went into choosing the voice, with the company opting for someone “less breathy and more down to earth.” And the process used to record each script also seems to have been fairly involved.

Yesterday I stumbled across a video interview with Blade Kotelly, VP of Design & Consumer Experience at Jibo, Inc. In the interview, Kotelly talks about the concept of Jibo as a character. Before beginning the search for the right voice, the team conceptualized what type of character they wanted Jibo to be. They landed on the idea of a young male who was energetic, earnest, and helpful.

Once they’d solidified the basic concept for the character’s personality, the Jibo team did a huge casting call for voice talent and listened to over four hundred demo recordings. They eventually chose fifty candidates for numerous auditions. They kept holding auditions and winnowing the talent down until they’d picked the top four voices. From there, the team spent a lot of time evaluating the pros and cons of each voice actor, until they ultimately selected the voice of Jibo.

Kotelly goes into a lot of interesting detail about the auditions and what they were looking for. He also describes the complexities of the recording process. The voice actor ultimately had to record 14,000 phrases!

Unfortunately, the video doesn’t reveal who the voice actor behind Jibo is, or give us a chance to listen to any of his audio recordings. It’ll be interesting to hear how the “real” Jibo’s voice differs from the one used in the Indiegogo campaign video.

Will Jibo’s voice actor join the likes of Susan Bennett (Siri) and Jenn Taylor (Cortana) in the annals of famous voice talent? I suppose that remains to be seen. But as the number of conversational assistants, devices, and hardware characters continues to increase, the opportunity for great voice actors and actresses is growing. Getting text to speech right is hard. But the importance of the right voice can’t be overstated.

Hello Barbie, Chatbots, and the Challenges of Talking Toys

James Vlahos recently published an insightful article on Hello Barbie, the soon to be released talking edition of Mattel’s fashion doll. This past spring, when Hello Barbie debuted at Toy Fair 2015 in New York, there was a flood of negative press. The primary objection to the doll, which Mattel is building in partnership with speech and AI company ToyTalk, was that the ToyTalk technology represents an invasion of privacy. In a blog post I wrote back in February, I briefly addressed these concerns. Conversations children have with ToyTalk’s apps and devices are sent back to the company’s servers for analysis and storage (just like conversations we have with Siri and Google Now).

Though Vlahos’s article touches on the privacy issue, his main focus is on what makes the talking Barbie work. Vlahos was invited to Mattel’s Imagination Center in El Segundo, CA to observe firsthand how the newly conversational Barbie interacted with real girls from the area.

Hello Barbie, it turns out, shares a lot in common with chatbots. All of her dialog is scripted by a small team of writers, including dramatic author Sarah Wulfeck. The writers decide what words to put into Barbie’s mouth. More importantly, they must anticipate what girls will ask their doll companion.

Years ago, when I built my first chatbot on the popular site Pandorabots using AIML, I didn’t have to start from scratch. I could import the open source A.L.I.C.E. library into my newly created chatbot and take advantage of the fruits of many years’ worth of someone else’s dialog-writing labor. The ToyTalk authors started from a blank canvas. Based on Vlahos’s article, they even (re)invented a chatbot scripting language they call PullString, named after the cord you used to pull from the backs of dolls and animals to get them to talk.

But what do you have Barbie say? Guessing what children will ask and figuring out how to respond is a challenge every botmaster knows well. To make the challenge even greater, Barbie is targeted at girls from the ages of 3 to 9 and even older. That’s a wide age range to cover and a response that might work for a 5-year-old could very well backfire for a 9-year-old, and vice versa.

One way around this challenge is to create a framework that puts Barbie in charge. If Barbie can be the one leading the conversation, then the possible user inputs are narrowed. Vlahos discovered that the team of writers is trying to do just that by constructing games that Hello Barbie can play with her human friends. One example is a mock game show in which Barbie asks the girl to nominate a person as the winner of various humorous prizes.  Creating a game environment allows Hello Barbie to maintain control of the conversation.
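To make the idea concrete, here is a minimal Python sketch of a bot-led game turn. It is not ToyTalk's PullString or anything described in the article; the function names, the filler-word list, and the response wording are all assumptions made for illustration. The point is only that when the bot asks for a nominee, almost any reply can be interpreted as a name, so the script does not have to anticipate open-ended questions.

```python
# Hypothetical sketch: a bot-led "award show" turn.
# Because the bot asked "who wins?", the reply only needs to yield a name.

FILLER = {"i", "think", "um", "maybe", "it's", "the", "winner", "is"}

def extract_nominee(reply):
    """Strip punctuation and filler words; treat what remains as the nominee."""
    words = [w.strip(".,!?") for w in reply.split()]
    kept = [w for w in words if w and w.lower() not in FILLER]
    return " ".join(kept) or None

def award_show_turn(award, reply):
    """Return the bot's next line for one round of the game."""
    nominee = extract_nominee(reply)
    if nominee is None:
        # Nothing usable in the reply: ask again, keeping the bot in control.
        return f"Hmm, who should win the award for {award}?"
    return f"And the award for {award} goes to... {nominee}!"

line = award_show_turn("best dancer", "Um, I think Grandpa!")
```

A reply that contains nothing but filler simply triggers a re-prompt, which is one simple way for the bot to keep steering the conversation.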

How will Barbie’s conversational abilities hold up under real usage? That remains to be seen. Many of those who commented online about the article expressed skepticism that Hello Barbie will be engaging for children in the long term. Presumably the writing team can keep adding scripts to Barbie’s repertoire so that her topics of conversation expand over time.

A big advantage that Hello Barbie has over traditional chatbots is that the ToyTalk technology enables the doll to remember key facts from previous discussions. If Barbie can remember that Brittany’s favorite color is orange, that she loves to play soccer, and that she wants to be a paleontologist when she grows up, that should help keep Brittany engaged.

Regardless of the skepticism, the age of talking toys is upon us. Hello Barbie is scheduled to launch in time for this Christmas. Now it’s up to the creative writers and chatbot master-types of the world to prove to the skeptics that talking toys and conversational devices can be forces for good. 

Storytelling Program Weaves Tales on the Fly

Will intelligent assistants of the future, powered by machine learning algorithms, be able to craft interactive stories on the fly? A team at the Entertainment Intelligence Lab of the Georgia Institute of Technology has developed programmatic techniques that could make this scenario possible. CNET recently reported on the results of the project in which the Georgia Tech team built an interactive storytelling program called Scheherazade.

The Entertainment Intelligence Lab’s website states that one of their primary focus areas is “intelligent narrative computing.” Scheherazade differs from other story generating applications, because it doesn’t need to be programmed in advance to understand the subject matter of the tale. Scheherazade can create an interactive narrative about any topic using its ability to construct a plot graph.

For their research, the team sourced narrative sequences from a crowdsourcing platform.  These sequences consisted of linear examples of typical events. For example, narrative sequences describing a cruise vacation might look something like this: Jack and Jill plan a cruise vacation, Jill buys cruise ship tickets, Jack and Jill prepare for the cruise, they arrive at the cruise ship, they spend the first night aboard the boat, they get off the boat at the first island stop, and so on.

When someone requests that Scheherazade tell them an interactive story about cruise vacations, it checks to see if it already has a plot graph on that topic. If it doesn’t have a plot graph, the program accesses the human-authored, crowdsourced linear story examples and applies algorithms to create the plot graph. Once the plot graph has been established, Scheherazade can choose plausible alternative storylines to weave together an interesting narrative that the listener can influence.
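The general shape of that process can be sketched in a few lines of Python. This is a heavily simplified illustration, not the Georgia Tech team's algorithm: it assumes each crowdsourced example is already a list of normalized event labels, and it treats "event A comes before event B" as a rule only when no example shows the reverse order. The function names and the sample events are invented for the example.

```python
import random
from collections import defaultdict
from itertools import combinations

def build_plot_graph(sequences):
    """Map each event to the set of events that must be told before it."""
    before = defaultdict(int)  # (a, b) -> number of examples where a precedes b
    events = set()
    for seq in sequences:
        events.update(seq)
        for i, j in combinations(range(len(seq)), 2):
            before[(seq[i], seq[j])] += 1
    prereqs = {e: set() for e in events}
    for (a, b) in before:
        # Keep the ordering a -> b only if no example ever shows b before a.
        if before.get((b, a), 0) == 0:
            prereqs[b].add(a)
    return prereqs

def tell_story(prereqs, rng=None):
    """Walk the graph, choosing at random among events whose prerequisites are met."""
    rng = rng or random.Random()
    told, story = set(), []
    while len(story) < len(prereqs):
        ready = sorted(e for e in prereqs if e not in told and prereqs[e] <= told)
        if not ready:  # conflicting examples produced a cycle; stop early
            break
        choice = rng.choice(ready)
        told.add(choice)
        story.append(choice)
    return story

examples = [
    ["plan cruise", "buy tickets", "pack bags", "board ship"],
    ["plan cruise", "pack bags", "buy tickets", "board ship"],
]
graph = build_plot_graph(examples)
story = tell_story(graph, random.Random(0))
```

Because the two examples disagree about whether packing or buying tickets comes first, the graph leaves those two events unordered, and that freedom is where alternative storylines (or a listener's choice) can enter.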

According to the Entertainment Intelligence Lab’s report, Scheherazade’s stories were judged to be at least as plausible and enjoyable as those created by humans using the same plot snippets. Interactive games can certainly leverage the type of interactive narrative generation that the Entertainment Intelligence Lab’s work describes. Will our future intelligent assistants one day use these same, or similar, capabilities to entertain us? Only time will tell, but we all know how compelling entertainment can be.

Baidu Enters Intelligent Assistant Race with Duer

The intelligent assistant wars continue to heat up. As reported in Tech Times and other news sources, Baidu used their World Conference on September 8th to announce the launch of a new digital intelligent assistant called Duer. Duer is a voice-driven assistant that is currently integrated into Baidu’s Android search app. The company has plans to incorporate Duer into other services and products in the future.

Based on descriptions of Duer, the assistant has core capabilities in search that you would expect. But Duer appears to be very task-oriented as well. Duer can respond to spoken or text instructions to execute tasks that include: buying movie tickets, making restaurant reservations, ordering food for takeout or delivery, booking a ride with a ride sharing service, and purchasing airline tickets. Future iterations of the assistant are expected to control devices within the connected home and integrate with Baidu shopping apps.

Andrew Ng is Baidu Research’s Chief Scientist in Silicon Valley. Ng is also an associate professor at Stanford University and an expert in machine learning and deep learning. Before joining Baidu, Ng founded the Google Brain project at Google. Ng’s work in machine learning is wide-ranging, but one of his notable areas of research has been in visual learning. It appears that some of Ng’s knowledge will be applied in Duer’s ability to scan user reviews to discern answers to questions such as “is the restaurant pet friendly?” Presumably the intelligent assistant will conclude that a restaurant with lots of dogs on the patio is a good place to bring Fido.

Ng is one of the rock stars of deep learning, so it will be interesting to watch the developing battle of intelligent assistants powered by artificial intelligence that is shaping up between Google, Facebook, Apple, Baidu and, to some extent, Microsoft.


Microsoft’s Volometrix Acquisition and the Potential of Intelligent Advisors in the Enterprise

Personal assistants are coming to the enterprise; it’s just a matter of when and how. I published a guest blog post on Opus Research’s site today about Microsoft’s acquisition of the small organizational analytics company Volometrix.

Volometrix provides software that helps employees track and improve several key areas of productivity. Check out the blog post to learn more about how Volometrix’s software works and what Microsoft’s acquisition might foretell about the future of intelligent assistants (or even intelligent personal advisors) within the enterprise.

SPLICE – Building the Voice of Your Future Bespoke Intelligent Assistant?

When was the last time you enjoyed listening to the voice of a company’s automated call tree? Probably never. Have you heard Scarlett Johansson’s voice? It’s pretty smooth. Johansson provided the voice for the intelligent operating system in the movie Her. If you saw the film, chances are you remember that voice. If you haven’t seen the film, believe me when I say there’s no way the movie would have been effective, or even bearable, if casting had replaced Johansson’s voice with Siri’s. Voices are important.

At SpeechTek 2015, I met with Tara Kelly, President and CEO of SPLICE Software, to learn what SPLICE offers customers and how it relates to the topic of intelligent assistants. What does SPLICE have to do with the operating system’s voice in Her? SPLICE offers customers many valuable services, but all of them rest on the foundation of SPLICE’s vast library of crowdsourced voice files. Kelly explained that the company records in phrases and applies algorithms to concatenate these phrases into customizable sentences.
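As a rough illustration of phrase concatenation, the Python sketch below picks pre-recorded clips in order and fills one customer-specific slot. It says nothing about how SPLICE actually works: the clip library, voice name, file names, and the convention that a key ending in a colon marks a slot are all hypothetical, and a real system would go on to stitch the audio together rather than stop at a list of files.

```python
# Hypothetical clip library: (voice, phrase key) -> recorded audio file.
CLIPS = {
    ("friendly_southern", "greeting"): "fs_greeting.wav",
    ("friendly_southern", "claim_status_intro"): "fs_claim_intro.wav",
    ("friendly_southern", "status:approved"): "fs_status_approved.wav",
    ("friendly_southern", "status:pending"): "fs_status_pending.wav",
    ("friendly_southern", "goodbye"): "fs_goodbye.wav",
}

def assemble_playlist(voice, template, slots):
    """Return the ordered list of clip files that make up one tailored message."""
    playlist = []
    for key in template:
        # A key ending in ":" is a slot; complete it with this customer's value.
        if key.endswith(":"):
            key = key + slots[key[:-1]]
        playlist.append(CLIPS[(voice, key)])
    return playlist

# Message template for an insurance claim update (assumed for illustration).
CLAIM_UPDATE = ["greeting", "claim_status_intro", "status:", "goodbye"]

playlist = assemble_playlist("friendly_southern", CLAIM_UPDATE, {"status": "approved"})
```

Swapping the voice argument or the slot values yields a different message from the same template, which is the sense in which phrase-level recordings can be made customizable.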

SPLICE isn’t currently providing voices for intelligent assistants. Instead, Kelly and the team at SPLICE focus primarily on Outbound IVR services. IVR stands for Interactive Voice Response and consists of technology that enables a human to interact with a computer system through touchtone and voice. When people hear the term IVR, they generally think of the dreaded call tree. But hang on for a bit, because it turns out that dynamically generated audio dialogue doesn’t have to be horrible. In fact, it can be amazingly good.

Unlike Inbound IVR, Outbound IVR consists of automated calls to targeted customers. Don’t confuse outbound IVR with robocalling. Robocalls are used to send the same generic message to as many people as possible, hoping that a small percentage of them will listen to the message and care about the content. Outbound IVR involves tailoring a message to one specific customer to get that person information he or she wants to hear.

Examples of Outbound IVR include calls made to:

  • alert a credit card customer that he/she is late on a payment
  • provide an insurance customer with up-to-date status info on a claim
  • invite a special opted-in customer to a VIP event

SPLICE has the technology to manage the end-to-end Outbound IVR process. Companies leverage Outbound IVR to provide a service to their clients, increase customer loyalty, and handle commonly needed customer communications.

What I really find intriguing about the SPLICE approach is that Kelly and team are very focused on providing an excellent customer experience. Kelly started out as a small business owner and she was frustrated by how dismal the voices were for automated appointment reminder services. She began creating her own voice files using local voice talent and a family-owned sound studio. Kelly didn’t just focus on getting quality voice files; she also strived to get the right tone and even regional accent to match her customers’ clients.

There turned out to be such a high demand for these perfectly tailored human voices that SPLICE Software was born. SPLICE focuses not only on building out the voice file library, but also on learning how to categorize callers so that the appropriate voice files can be selected and on incorporating technologies such as sentiment analysis to further tailor communications. Kelly also indicated that the company is an open source advocate with plans to further extend their API. You can see Kelly and the SPLICE technology in action in this video. 

We may still be a few years (decades?) away from having intelligent assistants as capable as the one in Her. Part of the challenge will be to teach intelligent assistants to sound more like humans. SPLICE is focused on nearer term voice needs. But who knows? Perhaps the voice files and technologies SPLICE is building today will be part of your ideal intelligent assistant in the future.