A few months ago, I wrote about X.ai and the intelligent assistant product that they call Amy. Amy is different from most personal intelligent assistants in that you don’t talk to her directly. Instead, Amy ‘listens in’ to your email conversations and takes instructions based on the content of what you write. You can instruct her specifically to set up a meeting for you, but she’s intelligent enough to pick up the nuances of what you need by analyzing your emails.
This type of implicit understanding seems to be the latest trend in intelligent assistant technologies. TechCrunch reported last week that Google acquired Emu, an instant messaging app that appears to have the same sort of context-aware, behind-the-scenes smart assistant built into it. It was just back in April of this year, according to another TechCrunch report, that Emu exited beta with its mobile messaging app. Obviously, Google must see a lot of promise in the technology if they were anxious to snap it up so quickly.
Emu seems to have a broader range of talents than X.ai’s Amy at this point. According to the TechCrunch article, Emu can proactively offer up contextual information based on a number of different topics that you might happen to be texting with friends about. If you’re texting about a dinner date, for example, Emu can show you your calendar, as well as the location and Yelp ratings of relevant restaurants. It can offer the same type of on-the-spot info about nearby movies if the conversation turns in that direction. The app also lets you tap a button to carry out an action related to the information Emu has retrieved. For example, you can reserve a table at a restaurant or purchase movie tickets.
All of these attributes make Emu sound more like a real personal assistant than either Siri or Google Now. And it seems the importance of perfecting voice recognition is taking a back seat to an assistant’s ability to infer context and relevant data based on “ambient” information. I use the term ambient to refer to information that surrounds us in our emails, texts, and search behavior. Google Now seems to be more satisfying than Siri as an assistant, precisely because you don’t have to talk to it or ask it anything. It picks up pieces of relevant information about your life by accessing the same data sources that you use routinely.
It will be interesting to see what Google does with the Emu acquisition. It’s also a fun thought experiment to consider how this type of ambient assistance could be applied to enterprise virtual assistants. Recommendation engines, like those suggesting books and movies you might like, are an example of this technology. Customer service intelligent agents that are smart enough to assist you based on a knowledge of your past purchases and preferences might be an appealing concept–as long as they can steer clear of the creepy factor.
Back in August, Parmy Olson wrote an interesting article in Forbes on some of the challenges that Nuance faces with its digital assistant Wintermute. I wrote about Wintermute in an earlier post. Olson makes the observation that Nuance rolled out its Wintermute personal assistant technology just as Google and Apple might be catching up and arguably even surpassing Nuance at its own game.
Nuance acquired the Dragon NaturallySpeaking software technology through a roundabout route from its inventors, James and Janet Baker. The Dragon dictation software has been a big component of Nuance’s product line. Nuance also licensed its speech recognition technology to Apple for use in Siri. But with speech recognition becoming such a core capability for today’s smartphone apps, Google and Apple have been investing heavily in developing their own homegrown solutions.
Olson points out that Google’s voice recognition is based on deep learning technology, whereas Nuance’s approach to speech technology relies on statistical inference that analyzes syllable sounds to identify words. The jury is still out on which technical approach has the most promise, but Google’s implementation of voice recognition has been working. What’s even more threatening to competitors is the fact that Google offers its technology free to Android developers. A case in point is the group of recent German high school grads behind Voicesphere, which we wrote about a few weeks ago.
Apple recently established a research center in Boston where it’s been pursuing speech technology projects. Many of the team members are former employees of a speech software company that was once acquired by none other than Nuance. Observers speculate that Apple is developing its own voice recognition software that will displace the Nuance components from Siri in upcoming versions.
None of these facts proves with any certainty that Nuance is being overtaken by the competition (and current partners). Wintermute’s mission is to learn as much about your preferences and habits as possible, store this knowledge in the cloud, and use that data to infer what you want and what you mean. In other words, Nuance has trained Wintermute to read your mind, which is what a really helpful digital companion needs to be able to do. So far, Wintermute still seems to be more of a project than a fully fleshed-out commercial offering. It’ll be interesting to see how this technology pans out for Nuance as the competition in speech technology and personal digital assistants continues to heat up.
Venture Beat recently reported that Google has acquired two speech technology patents from SR Tech Group LLC. A press release from SR Tech Group LLC identified the patents as U.S. Patent No. 7,742,922, titled “Speech interface for search engines” and U.S. Patent No. 8,056,070, titled “System and method for modifying and updating a speech recognition program.”
The filing date for the first patent was November 9, 2006. Reading the abstract of the technology covered by the patent, it sounds like a very generic description of a voice-activated search. The user says what he/she wants to look up, the application uses speech recognition and natural language processing to determine how best to construct the search query, and the application runs the query and returns the result. The second patent describes a system that a user or system administrator can employ to make updates to the grammar (i.e., the underlying language database) of a speech recognition program.
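To make the first patent's flow concrete, here is a toy sketch of that transcribe-then-query pipeline. Everything in it — the function names, the filler-word list, and the tiny in-memory "index" — is an illustrative stand-in, not anything from the patent or from a real search engine.

```python
# Toy sketch of a voice-activated search pipeline: transcribe speech,
# build a query from the utterance, run the query, return results.
# All names and data are illustrative stand-ins.

STOPWORDS = {"please", "find", "me", "the", "a", "show"}

def transcribe(audio):
    # Stand-in for a speech recognition engine; here "audio" is
    # already text, so we just normalize its case.
    return audio.lower()

def build_query(utterance):
    # Minimal "natural language processing": drop filler words,
    # keep the content terms as the search query.
    return [w for w in utterance.split() if w not in STOPWORDS]

def search(index, terms):
    # Return documents that match every query term.
    return [doc for doc in index if all(t in doc.lower() for t in terms)]

index = ["Italian restaurants near downtown", "Movie times downtown"]
query = build_query(transcribe("Please find me Italian restaurants near downtown"))
results = search(index, query)
```

A real implementation would replace each stage with far heavier machinery (an acoustic model, a query planner, a ranked index), but the three-step shape — recognize, construct, retrieve — is the whole of what the abstract describes.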
I’m not a patent attorney, but based on the generic flavor of both of these patents, it seems like Google may have acquired them as a defensive maneuver. Having these broad-reaching patents could give them ammunition against other companies that might want to declare future patent infringements in other technology areas. It’s not readily apparent that either patent offers breakthroughs that would drastically improve Google Now or Google’s recently demonstrated conversational search functionality for Chrome. It’ll certainly be interesting to observe how Google continues to build out speech-activated search and how other companies look to compete within the same arena. There seems little doubt that conversational search will play an important role in search, and in virtual agent technologies of the near future.
Forbes published an interesting interview with Jeff Dean of Google, along with an introduction to the concept of Deep Learning. Dean’s work using the Deep Learning paradigm has led to a fundamental change in the way Google’s speech recognition systems work. The previous model was an acoustic model, whereas the new approach uses deeply layered neural networks to sequence and categorize phonemes.
Using this new process, Google saw dramatic improvements in speech recognition.
The three key areas contributing to these advancements are:
- The Deep Learning architecture
- The ability to feed systems vast amounts of data for training
- The availability of extremely powerful computer processing
One of the advantages of the Deep Learning technology is that software can train itself to recognize patterns. This unsupervised learning enables machines to improve their performance without relying on humans to painstakingly point out every identifying trait of the object or sound the machine needs to learn to recognize. Instead of describing to the program every individual feature of a cat, for example, the Deep Learning network is fed with millions of examples of cats. The network teaches itself to recognize the features that typically identify a cat. This unsupervised learning approach turns out to be much more effective in reality. It’s impossible for a human to code every possible feature of every possible cat in every possible position. A Deep Learning network that has trained itself to pick out a cat by looking at millions of examples is less likely to be fooled.
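The contrast above — learning from examples instead of hand-coding rules — can be illustrated with a deliberately tiny sketch. This is not a deep network; it is a nearest-centroid toy in pure Python, with entirely made-up two-number "feature vectors," meant only to show a model deriving its own summary of a class from examples rather than from human-written rules.

```python
# Toy illustration of learning from examples: instead of hand-coded
# rules for "what makes a cat," the model averages example feature
# vectors into a centroid per class, then classifies new inputs by
# proximity. Data and labels are entirely synthetic.

def centroid(vectors):
    # Average the example vectors component-wise.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(x, centroids):
    # Assign x to the class whose learned centroid is closest.
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Hypothetical two-dimensional feature vectors per class.
examples = {
    "cat": [[1.0, 0.9], [0.9, 1.0], [1.1, 0.8]],
    "dog": [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]],
}
centroids = {label: centroid(vs) for label, vs in examples.items()}
label = classify([0.95, 0.9], centroids)  # a cat-like input
```

A deep network differs enormously in scale and mechanics — it learns layered features rather than a single average — but the principle is the same: the notion of "cat" comes out of the examples, not out of a programmer's rulebook.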
So what does the Deep Learning technology have to do with intelligent virtual agents and web self-service? Though many virtual agents are still text based, the tide is slowly shifting towards voice activated apps. Customers want their questions to be understood. High performing speech recognition technologies may very well be the backbone of customer-facing and enterprise virtual agents of the future.
The race is on for a digital virtual assistant that can understand spoken language and provide users with exactly the information they need, when and how they need it. Oh, and the digital agent should be able to carry on a meaningful conversation as well. In a recent PCWorld article covering South by Southwest Interactive in Austin, Amit Singhal of Google is quoted as positioning the understanding of speech, natural language, and conversation as some of the key challenges facing the search giant today.
People’s need to search for information has become integral to our interactions with the web. When we are online, we are nearly always searching for something, be it news of the world, updates from friends, information on specific goods and services, or what’s playing at the local cinema or on TV. The list goes on. Google imagines a near-term future where smart virtual agents are embedded in our world, in wearable devices such as Google Glass, and where these agents can quickly act on our behalf to get us the information we need. Our virtual assistants should be able to understand what we’re saying when we speak to them. And they should be able to answer us in a way that’s not only informative, but natural.
Competing with Google in the arena of intelligent voice-activated agents, albeit indirectly, are companies that provide voice recognition systems for use across multiple problem spaces. In my next post, I’ll take a look at an article that examines some of the companies in this arena and that highlights how today’s most successful virtual agents are deployed. In the meantime, I hope you enjoy the referenced article with information from Google’s Amit Singhal.