SPLICE – Building the Voice of Your Future Bespoke Intelligent Assistant?

When was the last time you enjoyed listening to the voice of a company’s automated call tree? Probably never. Have you heard Scarlett Johansson’s voice? It’s pretty smooth. Johansson provided the voice for the intelligent operating system in the movie Her. If you saw the film, chances are you remember that voice. If you haven’t seen the film, believe me when I say there’s no way the movie would have been effective, or even bearable, if casting had replaced Johansson’s voice with Siri’s. Voices are important.

At SpeechTek 2015, I met with Tara Kelly, President and CEO of SPLICE Software, to learn what SPLICE offers customers and how it relates to the topic of intelligent assistants. What does SPLICE have to do with the operating system’s voice in Her? SPLICE offers customers many valuable services, but all of them rest on the foundation of SPLICE’s vast library of crowdsourced voice files. Kelly explained that the company records in phrases and applies algorithms to concatenate these phrases into customizable sentences.
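To make the phrase-concatenation idea a bit more concrete, here is a minimal sketch in Python. It is not SPLICE's code or API. The phrase keys, file names, and function names are all hypothetical, and the sketch only assembles an ordered playlist of recordings rather than touching any audio.

```python
# Hypothetical phrase library: phrase key -> pre-recorded audio file.
# None of these names come from SPLICE; they are illustration only.
PHRASE_LIBRARY = {
    "greeting": "greeting_hello.wav",
    "claim_status_intro": "your_claim_is.wav",
    "status_approved": "approved.wav",
    "status_pending": "pending_review.wav",
    "closing": "thanks_goodbye.wav",
}


def build_playlist(phrase_keys, library=PHRASE_LIBRARY):
    """Return the recordings to play, in order, for one sentence."""
    missing = [key for key in phrase_keys if key not in library]
    if missing:
        # Fail loudly rather than play a sentence with a hole in it.
        raise KeyError(f"no recording for: {missing}")
    return [library[key] for key in phrase_keys]


def claim_status_message(status):
    """Assemble a tailored claim update from reusable phrases."""
    return build_playlist(
        ["greeting", "claim_status_intro", f"status_{status}", "closing"]
    )
```

Calling claim_status_message("pending") swaps a single recording while the rest of the sentence is reused, which is roughly the appeal of recording in phrases instead of whole messages.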

SPLICE isn’t currently providing voices for intelligent assistants. Instead, Kelly and the team at SPLICE focus primarily on Outbound IVR services. IVR stands for Interactive Voice Response and consists of technology that enables a human to interact with a computer system through touchtone and voice. When people hear the term IVR, they generally think of the dreaded call tree. But hang on for a bit, because it turns out that dynamically generated audio dialogue doesn’t have to be horrible. In fact, it can be amazingly good.

Unlike Inbound IVR, Outbound IVR consists of automated calls to targeted customers. Don’t confuse outbound IVR with robocalling. Robocalls are used to send the same generic message to as many people as possible, hoping that a small percentage of them will listen to the message and care about the content. Outbound IVR involves tailoring a message to one specific customer to get that person information he or she wants to hear.

Examples of Outbound IVR include calls made to:

  • alert a credit card customer that he/she is late on a payment
  • provide an insurance customer with up-to-date status info on a claim
  • invite a special opted-in customer to a VIP event

SPLICE has the technology to manage the end-to-end Outbound IVR process. Companies leverage Outbound IVR to provide a service to their clients, increase customer loyalty, and handle commonly needed customer communications.

What I really find intriguing about the SPLICE approach is that Kelly and team are very focused on providing an excellent customer experience. Kelly started out as a small business owner and she was frustrated by how dismal the voices were for automated appointment reminder services. She began creating her own voice files using local voice talent and a family-owned sound studio. Kelly didn’t just focus on getting quality voice files; she also strove to get the right tone and even regional accent to match her customers’ clients.

There turned out to be such a high demand for these perfectly tailored human voices that SPLICE Software was born. SPLICE focuses not only on building out the voice file library, but also on learning how to categorize callers so that the appropriate voice files can be selected and on incorporating technologies such as sentiment analysis to further tailor communications. Kelly also indicated that the company is an open source advocate with plans to further extend their API. You can see Kelly and the SPLICE technology in action in this video. 

We may still be a few years (decades?) away from having intelligent assistants as capable as the one in Her. Part of the challenge will be to teach intelligent assistants to sound more like humans. SPLICE is focused on nearer term voice needs. But who knows? Perhaps the voice files and technologies SPLICE is building today will be part of your ideal intelligent assistant in the future.


Mycroft – An Open Source AI Platform on Kickstarter

A team of seasoned open source enthusiasts has launched a Kickstarter campaign to build an Amazon Echo rival based on the Raspberry Pi mini-computer and the Ubuntu operating system. The device, called Mycroft, looks like a radio-clock. It will integrate with an as yet undisclosed Speech-to-Text engine and respond to voice commands.

Mycroft is designed to drive devices that speak IoT, including products from Nest, Philips Hue, Iris, and others. It will also connect to YouTube, Netflix, Pandora, and Spotify and be able to control Roku and Chromecast. The real beauty of the platform, though, is that it is open source and completely extensible. In fact, the Mycroft team is targeting the developer community.

Developers and creative types of all stripes can use Mycroft as a springboard to give life to their IoT or other voice-driven ideas. Support for If This Then That (IFTTT) and “Snappy Apps” from Snappy Core Ubuntu means that there will be lots of ways to plug functionality into Mycroft.

The projected retail cost of one Mycroft unit is relatively low at $129. The team hopes that families will opt to have several Mycrofts throughout the home. Each unit can talk to others in an intercom style.

The Mycroft product is still in the planning stages and so far the team is not quite halfway to their $99K goal on Kickstarter. So it remains to be seen whether or not Mycroft makes it into production. The concept certainly looks promising. If you’d like to see Mycroft become a reality, there’s time to support the campaign and order an early adopter edition.

Tables as Intelligent Assistants of the Future?

IKEA has released a website showcasing a vision of the kitchen of 2025. IKEA collaborated with IDEO and design students from Swedish universities to come up with the imaginative concepts. Central to their ultrasleek and eco-conscious culinary environment of the future is a smart table. Though the table doesn’t talk in the video, it uses language to communicate with the family it serves.

The smart table can sense the objects on it and communicate by projecting words onto its surface. For example, if the chef of the family sets down two different foods, the table acts as an intelligent assistant by recommending various flavor combinations or recipes. It can even include suggestions based on other ingredients it knows are available in the household.

The chef can also place ingredients on the table’s smart cutting board to receive visual advice cues on how to prepare them. Not sure how to slice a mango? No problem. Just put the mango on the cutting board and the intelligent table will project graphics and instructions that step you through the process.

The table will even guide you through the preparation of an entire meal from start to finish, using a combination of projected graphics and helpful textual hints.

The smart table in the video isn’t speech-enabled. The video that explains the development process for the Kitchen 2025 concepts never mentions whether the choice to leave the intelligent table silent was deliberate.

It brings up an interesting question. As intelligent objects evolve, will people be more drawn to chatty assistants or to those that get their message across without speech? Only time will tell. And will we pay extra for a table that instructs us on how to slice a mango, or will we just pull up a YouTube video? It’s probably going to be a combination of compelling features that draws us into the world of intelligently assistive objects. Once we’re there, it’ll be hard to go back.

Oracle Voice – A Virtual Assistant for Enterprise Software

Oracle recently announced the availability of Oracle Voice, a smartphone-based virtual assistant designed to work with Release 9 of the Oracle Sales Cloud. Enterprise software applications are perfect candidates for virtual assistants. Notoriously complex and cumbersome, enterprise software is a great target for disruption. With the new Voice technology, Oracle seems to be pioneering the use of specialized intelligent assistants to help users more easily navigate specialized business software applications–in this case a Customer Relationship Management (CRM) system.

In their marketing material, Oracle describes Voice as a “fast, friendly, fun” “Siri-like Assistant.” With the emphasis on “fun,” it seems that the Voice assistant currently has a limited range of capabilities. Designed for sales reps, the Voice app helps reps speed up the process of preparing for and wrapping up sales meetings. According to a blog post from 2014, Oracle partnered with Nuance for the speech recognition software employed within Oracle Voice.

Sales reps can use the Voice assistant to post key insights into the CRM system as soon as they come out of meetings. They can also use the assistant to enter new contacts into the system. The reps use natural language dialog to make their notes. If speech interaction isn’t their preferred method of making updates to the Oracle Sales Cloud application, they can switch to a touch-and-type interface.

To improve the performance of the Voice assistant, Oracle has included many product and industry specific vocabulary items in the Voice assistant’s knowledge base. Siri doesn’t need to recognize words like “Exadata” and “Exalytics,” but Oracle Voice does.

Voice also helps reps update opportunities and add tasks. The Voice assistant prompts the rep for key details. For example, when the rep wants to create a new task, the assistant asks for the due date. Sales reps can also prepare for meetings by using voice commands to access notes, activities, and sales information.
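The prompting behavior described above resembles what dialog designers often call slot filling: the assistant keeps asking until every required detail is present. Here is a rough Python sketch of that idea. It is not Oracle's implementation, and the field names, prompt wording, and function names are my own assumptions.

```python
# Hypothetical required fields for a "new task", with the question the
# assistant would ask when a field is still empty. Illustration only.
REQUIRED_TASK_FIELDS = {
    "subject": "What is the task?",
    "due_date": "When is it due?",
}


def next_prompt(task):
    """Return (field, question) for the first missing field, else None."""
    for field, question in REQUIRED_TASK_FIELDS.items():
        if not task.get(field):
            return field, question
    return None


def fill_task(task, answers):
    """Fill missing fields from a sequence of replies, in prompt order."""
    task = dict(task)  # work on a copy so the caller's dict is untouched
    replies = iter(answers)
    while (pending := next_prompt(task)) is not None:
        field, _question = pending
        task[field] = next(replies)
    return task
```

If the rep says only "create a task to call Acme," next_prompt returns the due date question, which matches the behavior described in the paragraph above.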

Oracle Voice for Oracle Sales Cloud is one of the first application-specific assistants that I’ve seen in the enterprise software space. SAP offers voice-enabled commands for specific logistics functions and I’m sure there are others. I anticipate that we’ll see many more such enterprise software assistants in the coming years. To see this assistant in action, watch the Oracle Voice demo video.


IBM Ups the Ante on Watson and Cognitive Computing

Today IBM announced a big investment in its IBM Watson technology and business. IBM is establishing a separate business unit for Watson, which includes a $100 million equity fund to promote small companies developing solutions within the IBM Watson ecosystem. The new business unit comes with a swanky office (see link above) in New York City’s East Village, which is home to other prominent Silicon Alley types.

There was pretty broad press coverage of the IBM announcement, but I found an article by Neal Ungerleider on Fast Company’s website to be one of the better articles on the topic. Ungerleider points out that IBM is introducing two new Watson-fueled products called IBM Watson Analytics Advisor and IBM Watson Discovery Advisor. The Analytics tool is apparently meant to be a fairly easy to use version of Watson’s question answering system that businesses can tap into. Users can send large datasets into the cloud, ask questions about the data, and let Watson do all the processing to return the answers. I’m imagining a scenario where a travel company uploads a slew of transactional, and maybe even unstructured, data to the Watson cloud and asks: “What travel destination specials should we be offering to people who live in region X” or “how can we better entice people to book a car when they book a flight?”

The Discovery Advisor tool is apparently geared more towards helping research organizations analyze and make sense of huge datasets. The articles I’ve seen indicate that areas of focus for Discovery Advisor are currently pharmaceutical, publishing, and education research.

Ungerleider also points out that IBM announced plans to move the Watson infrastructure to its SoftLayer cloud platform. Critics of Watson have cited the technology’s impractical hardware requirements as one of the reasons for its slow commercial adoption. Offering Watson as Software as a Service might remove some of those concerns.

I’ve written about IBM Watson several times on the blog and I see a lot of potential for cognitive computing. The fact that IBM is putting such a big investment behind Watson dispels any doubt about how bullish they are on the technology’s future possibilities. But AI has had a difficult time living up to the hype, so it’ll be interesting to see how cognitive computing evolves over the next couple of years and whether IBM’s bet pays off. In the meantime, you can watch the cool promotional video.

Spambot or Mechanical Turk? We’d Rather Have the Spambot!

Bianca Bosker recently wrote an interesting piece for Huffington Post on the alleged Twitterbot @Horse_ebooks. This isn’t your typical twitterbot tale. It turns out that almost everyone believed the tweeter behind @Horse_ebooks was a spambot, and yet the innocent, uncanny cleverness and wit of the automated tweets drew in lots of followers. But then it was revealed that the bot was actually a real person, or two real people. Followers were devastated.

Bosker draws comparisons between the @Horse_ebooks scam and the mechanical Turk robot of 1770. Wolfgang von Kempelen, a Hungarian inventor, developed a turban-wearing robot that could supposedly play chess. The inventor traveled around with the mechanical Turk and delighted crowds with the machine’s game-winning prowess. For forty years or so, the robot had crowds fooled. But eventually the robot was revealed to be a hoax, having a hidden compartment where a human could operate the machine without being seen.

Bosker postulates that people are disappointed when such supposed technological advances are shown to be fake, because we want to believe that machines can transcend their limits and become more human. Bosker also cites chatbots, such as Joseph Weizenbaum’s Eliza and the AOL Instant Messenger bot SmarterChild, as evidence that people are willing to be hoodwinked by machines posing as humans. She doesn’t mention dating chatroom bots that prey on lonely singles, but I suppose such date bots fall into the same category.

In another article on the people behind the @Horse_ebooks hoax, a Gawker report reveals the team’s odd behavior and other bizarre projects they were involved with that played on user expectations and emotions.

What does all this really say about the future of virtual agent technology? Does it indicate that our conversational software doesn’t really have to be all that great to make people happy? I suppose that depends on what we’re expecting. If we’re interacting with a mobile personal assistant like Siri, we want it to be adept at understanding our questions and giving us the right answers. But we really don’t expect it to have a sense of humor. If it utters anything that even hints at being funny or quirky, it delights us. So maybe robots designed to help seniors, such as those under development by Hoaloha Robotics, don’t have to be all that talented at conversing. They need to understand commands, but even the most marginal capabilities in the areas of humor and empathy could be enough to win over hearts. At least to start out with…

Virtual Human Toolkit – A Treasure Trove for Virtual Agent Developers

The University of Southern California’s Institute for Creative Technologies offers a Virtual Human Toolkit for constructing animated conversational characters. I ran across the Virtual Human Toolkit while browsing through the official proceedings from the 13th International Conference, IVA 2013, Edinburgh, UK, August 29-31, 2013. A team from the USC Institute for Creative Technologies wrote a paper titled “All Together Now: Introducing the Virtual Human Toolkit” that was presented at IVA 2013.

The goal of the Virtual Human Toolkit is to provide a suite of ready-made components that developers can use to more quickly build well-rounded virtual agents. Virtual agent characters can have many benefits, but they are composed of numerous complex technical components. Most teams don’t have access to all the knowledge and skills needed to build virtual characters with a broad range of capabilities. A versatile virtual human would ideally be able to simulate human behavior, perceive and adequately react to the behavior of others, and respond appropriately to questions or statements. Virtual humans are costly to develop, so the toolkit from USC’s Institute for Creative Technologies should be a great help to small teams looking to experiment with the technology.

Based on the documentation available, the virtual human toolkit currently consists of the following components:

  • Speech Recognition
  • Natural Language Understanding
  • Nonverbal Behavior Understanding
  • Natural Language Generation
  • Nonverbal Behavior Generation

These capabilities are embedded within individual modules that are all connected via an underlying messaging platform. A core module is called Multisense. This module enables the virtual human to track and interpret the non-verbal behavior of its human conversational partner. The virtual human can track facial expressions and body language using various input devices and then analyze the input to make generalizations about the human conversational partner’s emotional state.
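For readers who haven't worked with this kind of architecture, here is a toy Python sketch of modules talking to each other through a shared message bus. It does not use the toolkit's actual messaging layer. The MessageBus class, the topic names, and the smile-to-nod rule are all hypothetical and exist only to show the publish/subscribe pattern.

```python
from collections import defaultdict


class MessageBus:
    """Tiny in-process publish/subscribe bus (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Copy the list so handlers may subscribe while we iterate.
        for handler in list(self._subscribers[topic]):
            handler(payload)


def run_demo():
    """Wire a made-up perception module to a made-up behavior module."""
    bus = MessageBus()
    behavior_log = []

    def on_face_observation(observation):
        # Hypothetical rule: mirror a smile with a nod.
        gesture = "nod" if observation["expression"] == "smile" else "neutral"
        bus.publish("behavior.request", {"gesture": gesture})

    bus.subscribe("perception.face", on_face_observation)
    bus.subscribe("behavior.request", behavior_log.append)

    bus.publish("perception.face", {"expression": "smile"})
    return behavior_log
```

The point of the pattern is that the perception code never calls the behavior code directly, so either module could be swapped out without changing the other.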

The NPCEditor module understands incoming dialog and then determines an appropriate response. Currently the Virtual Human Toolkit uses chatbot-like pattern matching technology to engage in dialog. The editor does appear to have the ability to use statistical models to find the best perceived response if it encounters an utterance that doesn’t match, so this capability would put it ahead of basic pattern matching scripts.
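Here is a small Python sketch of the general idea of trying pattern rules first and falling back to a similarity score when nothing matches. To be clear, this is not how NPCEditor works internally. I'm using simple word overlap (Jaccard similarity) as a stand-in for a statistical model, and every pattern, example question, reply, and the 0.3 threshold is invented for illustration.

```python
import re

# Hypothetical hand-written rules, tried first.
PATTERNS = [
    (re.compile(r"\b(hello|hi)\b", re.I), "Hello! How can I help?"),
    (re.compile(r"\byour name\b", re.I), "I'm a demo character."),
]

# Hypothetical question/answer pairs used by the fallback scorer.
EXAMPLES = [
    ("where are you from", "I was built in a research lab."),
    ("what can you do", "I can answer a few simple questions."),
]

DEFAULT_REPLY = "Sorry, I didn't catch that."


def _tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def respond(utterance, threshold=0.3):
    # 1) Exact pattern rules win when they match.
    for pattern, reply in PATTERNS:
        if pattern.search(utterance):
            return reply

    # 2) Otherwise pick the example question with the best word overlap.
    words = _tokens(utterance)
    best_score, best_reply = 0.0, None
    for question, reply in EXAMPLES:
        question_words = _tokens(question)
        union = words | question_words
        score = len(words & question_words) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_reply = score, reply

    # 3) Only answer if the best match is reasonably close.
    return best_reply if best_score >= threshold else DEFAULT_REPLY
```

With these made-up rules, "so where are you from exactly" matches no pattern but still lands on the lab answer through the overlap score, while an unrelated utterance gets the default reply.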

The NonVerbal Behavior Generator helps the Virtual Human plan out its nonverbal responses, which can consist of things like nodding, gesturing with arms and hands, and so on. Other components work to synchronize behaviors associated with conversational speech, which include speech, gaze, gesturing and head movements.

In their IVA 2013 article, the Institute for Creative Technologies team suggests a number of practical applications for the Virtual Human Toolkit. Among the types of uses for the technology are: Question-Answering Characters, Virtual Listeners, Virtual Interviewers, and Virtual Role-Players.

The toolkit is available free of charge for the academic research community and for U.S. Government uses. There’s an email address to use if you’d like to contact the team about using the toolkit for commercial purposes.