Hear Me, Understand Me

VisionMobile published a very informative report last June on the history of virtual assistant technology called Beyond Siri: The Next Frontier in User Interfaces. One section of the report dissects the five core components of a mobile virtual agent. These components are relevant to web-based customer support virtual assistants as well. The five virtual assistant components are:

  • Speech Recognition
  • Natural Language Processing
  • User Profiling
  • Search & Recommendations
  • Avatar Visualization

For this post, I’m going to start by looking more closely at the first two components. What’s the difference between Speech Recognition and Natural Language Processing? Based on the definitions provided in the VisionMobile paper, Speech Recognition is the ability of the software agent to understand spoken sounds and transcribe them into written language. When you speak the words “What’s the current weather in Miami?” the software has to be able to recognize those sounds as words.
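As a rough illustration, here is a minimal speech-to-text sketch using the open source SpeechRecognition package for Python. The audio file name is a placeholder of my own, and production assistants use their own proprietary engines, but the contract is the same: sound in, text out.

    # Minimal speech-to-text sketch using the open source
    # SpeechRecognition package (pip install SpeechRecognition).
    # "weather_question.wav" is a hypothetical recording of
    # "What's the current weather in Miami?"
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    with sr.AudioFile("weather_question.wav") as source:
        audio = recognizer.record(source)  # read the entire file

    # Hand the raw audio to a recognition engine and get text back.
    try:
        text = recognizer.recognize_google(audio)
        print("Transcript:", text)
    except sr.UnknownValueError:
        print("The recognizer could not make out any words.")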

Julius is one example of open source speech recognition decoder software. This technology is so central to the success of mobile virtual assistants, or of any virtual agent that relies on spoken language as its user interface, that a provider’s speech recognition software can be a major differentiator in the virtual agent marketplace.

Natural Language Processing describes the software agent’s ability to parse spoken words, put them into context, and extrapolate their meaning. To do this successfully, the agent needs to have some understanding of the speaker and of the speaker’s environment and situation. The virtual agent also uses Natural Language Processing to formulate responses to the speaker’s questions. The goal of Natural Language Processing is to enable lifelike back-and-forth dialogue. Current technology still seems to be a long way from achieving this goal, but vendors are working hard to improve the conversational ability of their virtual agent software.
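To make the distinction concrete, here is a minimal parsing sketch using the open source spaCy library and its small English model (my choice for illustration; any toolkit with entity recognition would do). Given the transcript produced by the speech recognizer, it pulls out the pieces the agent needs in order to act on the request:

    # Minimal natural language parsing sketch using spaCy
    # (pip install spacy; python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")

    # The transcript produced by the speech recognition step.
    doc = nlp("What's the current weather in Miami?")

    # Named entity recognition: find the location being asked about.
    for ent in doc.ents:
        print(ent.text, ent.label_)  # expected: Miami GPE

    # Part-of-speech tags expose the grammatical structure the agent
    # can use to work out the topic of the question ("weather").
    for token in doc:
        print(token.text, token.pos_)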

OpenNLP is a machine learning-based toolkit for processing natural language text. Check out the OpenNLP website for more information on the software’s capabilities.

Whether you’re interested in purchasing a virtual software agent to support your business and customers or in building your own web-based or mobile virtual assistant, you’ll want to understand as much as you can about both Speech Recognition and Natural Language Processing.

I’m Sorry You’re Upset

Feeling angry, depressed, or maybe happy as a clam? It may not be long before an intelligent virtual agent you’re conversing with is able to accurately gauge your emotional state. Advances in technology are enabling artificially intelligent software to pick up on subtle cues in human speech patterns. In the article Teaching Computers to Hear Emotions, IEEE reports on recent work by interns at Microsoft Research on spoken language software systems. Their work shows that software can have a surprising success rate at predicting a speaker’s emotional state by examining variations in the loudness and pitch of the speaker’s voice.
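The article doesn’t include the researchers’ code, but the kinds of acoustic features it describes are straightforward to sketch. The following is a hypothetical illustration using the open source librosa library, not the Microsoft Research system: it extracts loudness (RMS energy) and pitch statistics from a recording, the raw material an emotion classifier would be trained on.

    # Hypothetical feature-extraction sketch using librosa 0.8+
    # (pip install librosa). Not the system from the article, just
    # an illustration of the loudness and pitch variations it describes.
    import librosa
    import numpy as np

    # "caller.wav" is a placeholder for a recording of the speaker.
    y, sample_rate = librosa.load("caller.wav", sr=None)

    # Loudness: root-mean-square energy per analysis frame.
    rms = librosa.feature.rms(y=y)[0]

    # Pitch: fundamental frequency per frame (NaN where unvoiced).
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),  # ~65 Hz
        fmax=librosa.note_to_hz("C7"),  # ~2093 Hz
        sr=sample_rate,
    )

    # Summary statistics like these would be fed to a classifier
    # trained on speech labeled with the speaker's emotional state.
    features = {
        "loudness_mean": float(np.mean(rms)),
        "loudness_var": float(np.var(rms)),
        "pitch_mean": float(np.nanmean(f0)),
        "pitch_var": float(np.nanvar(f0)),
    }
    print(features)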

The implications for intelligent virtual agent technologies are just beginning to be explored. Obviously, a digital customer support agent that is able to sense when a customer is losing patience or becoming angry will be better equipped to serve the customer effectively. Recognizing the onset of negative emotions could prompt the virtual support agent to take a different approach with the customer. In some cases, a change in emotional state may be a signal to the virtual chatbot to escalate the conversation to a human support agent.
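As a sketch of what that hand-off decision might look like in code (the emotion scores and the threshold here are hypothetical, not taken from any particular product):

    # Hypothetical escalation policy for a virtual support agent.
    # The anger scores (0.0 to 1.0) would come from a classifier
    # like the one sketched above; the threshold is an assumption
    # to be tuned against real conversation data.
    ANGER_THRESHOLD = 0.7

    def should_escalate(emotion_history: list[dict]) -> bool:
        """Escalate to a human agent when the caller's anger score
        has stayed above the threshold for two consecutive turns."""
        recent = emotion_history[-2:]
        return len(recent) == 2 and all(
            turn["anger"] > ANGER_THRESHOLD for turn in recent
        )

    # Example: anger rising over three turns of the conversation.
    history = [
        {"anger": 0.20, "happiness": 0.60},
        {"anger": 0.75, "happiness": 0.10},
        {"anger": 0.82, "happiness": 0.05},
    ]
    print(should_escalate(history))  # True -> hand off to a human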

As speech recognition and spoken language systems become more sophisticated in picking up on human emotional cues, the applications in the realm of virtual agents, digital support representatives, and artificial intelligence in general are limitless.