What It Will Take to Build “Her”

Vlad Sejnoha, Nuance’s Chief Technology Officer, recently published a piece on Wired.com called “Can We Build Her?: What Samantha Tells Us About the Future of AI.” Like many who watched Spike Jonze’s film Her, Sejnoha was duly impressed by Samantha’s abilities. So how far are we from building an intelligent personal assistant this capable? According to Sejnoha, we still have a long way to go in several major areas.

Here are the key capabilities Sejnoha believes a personal assistant must have to match Samantha:

The first capability falls under the heading of communication and context: an assistant has to understand natural language in spoken form. This part is fairly easy to accomplish with today’s voice recognition and natural language processing technologies. But a truly great personal assistant must also use context to infer the meaning of ambiguous statements, which requires knowledge about the social and physical world. We’re still working on that one.

Next comes emotional intelligence. A personal assistant such as Samantha needs to understand and express humor. That’s an amazingly tall order. But there’s more. The assistant needs to grasp the context of our social relationships and emotional ties. It won’t be able to understand our motivations or preferences if it doesn’t know how we’re connected to other people: who we like to hang out with, who we’d rather avoid, and so on. Sejnoha even suggests that a great personal assistant should understand why we feel a certain way or want a certain thing. It’s this comprehension of our underlying motivations that enables an intelligent assistant to serve us proactively.

Another key attribute for future personal assistants is introspection. Why introspection? Because, according to Sejnoha, an effective virtual assistant must be able to tell us why it recommends certain options over others. It needs to explain the rationale behind how it fulfills our requests. Why did it route us the long way home last night, for example, or order us Chinese takeout, or hold back a message from a close friend until we’d enjoyed a few cocktails? Only if we know what it’s thinking can we truly understand, appreciate, and trust our personal assistant. The assistant should also tell us where it’s getting its information and attach a confidence level to that information (much as IBM Watson assigns confidence levels to its answers). In the best scenario, our assistant would even share alternative views when warranted: “This story says Company X’s stock is falling due to an anticipated decline in earnings, but this one says it’s because of the botched merger with Company Y.”
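Sejnoha's point about sources, confidence levels, and alternative views can be made concrete with a small data structure. Here is a minimal sketch in Python. The `Claim` and `Answer` names, the 0-to-1 confidence scale, and the 0.25 cutoff for surfacing alternatives are all hypothetical assumptions for illustration. This is not how Watson or any real assistant actually represents its answers.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One candidate explanation, tied to the source it came from (hypothetical)."""
    text: str
    source: str        # hypothetical source label, e.g. a story title
    confidence: float  # assumed to be a score between 0.0 and 1.0


@dataclass
class Answer:
    """A hypothetical answer record: competing claims about one question."""
    question: str
    claims: list[Claim] = field(default_factory=list)

    def ranked(self) -> list[Claim]:
        # Highest-confidence claim first.
        return sorted(self.claims, key=lambda c: c.confidence, reverse=True)

    def explain(self, alt_threshold: float = 0.25) -> str:
        """Show the top claim with its source and confidence, then any
        alternative views whose confidence is at least alt_threshold."""
        if not self.claims:
            return f"{self.question}: no answer found."
        best, *rest = self.ranked()
        lines = [f"{best.text} (source: {best.source}, confidence {best.confidence:.0%})"]
        for alt in rest:
            if alt.confidence >= alt_threshold:
                lines.append(
                    f"Alternative view: {alt.text} "
                    f"(source: {alt.source}, confidence {alt.confidence:.0%})"
                )
        return "\n".join(lines)


# Usage, loosely following the stock example above (all data is made up).
answer = Answer(
    question="Why is Company X's stock falling?",
    claims=[
        Claim("Botched merger with Company Y", "Story B", 0.35),
        Claim("Anticipated decline in earnings", "Story A", 0.60),
        Claim("Short-seller rumor", "Forum post", 0.05),  # below the cutoff, so hidden
    ],
)
print(answer.explain())
```

The threshold is a design choice. It lets the assistant share a genuinely competing view without burying the user in every low-confidence guess it considered.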

A competent personal assistant should also be able to access and understand unstructured data without anyone first curating or pre-structuring its information sources. Today, you can’t just hook IBM Watson up to a digital library and expect it to get smart. A ton of work goes into establishing the rules and frameworks that let Watson make sense of the data first. That sort of hand-holding doesn’t scale.

Lastly, a personal assistant with a compelling personality that we’re attracted to and comfortable with needs to have a great, human-sounding voice. We all know that’s still a faraway target.

We’re still looking for solutions to most of these challenges. Though our technologies have a long way to go before they can produce Samantha, Sejnoha mentions some recent advances that are helping us make progress toward the goal. Noise processing and multi-microphone arrays with directional beams are dramatically improving sound capture and, with it, voice recognition. Deep neural networks (DNNs) are improving machine learning. Sejnoha speculates that combining DNNs with the best of older symbolic processing techniques might yield better acoustic and language models, further improving voice recognition and natural language processing. This accelerated learning might also help us develop personal assistants that can understand and articulate their own cognitive processes. Perhaps a hybrid of DNN and symbolic processing could even help personal assistants understand unstructured data without much prior help from humans. And the same advances in machine learning are giving rise to new speech generation models that will one day produce more modulated, human-like speech, Sejnoha writes.
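Sejnoha doesn't describe how multi-microphone arrays form directional beams. The textbook starting point, though, is delay-and-sum beamforming. Sound from the "look" direction reaches each microphone at a slightly different time. If you shift each channel to undo that delay and then average, the target adds up while sound from other directions partly cancels.

The Python sketch below is a simplified illustration, not Nuance's method. It makes several assumptions:

- a uniform linear array;
- a far-field source;
- a 343 m/s speed of sound;
- delays rounded to whole samples.

Real systems typically use fractional delays and adaptive filtering.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate_hz, speed_of_sound=343.0):
    """Whole-sample delays for a uniform linear array steered toward angle_deg
    (0 degrees = straight ahead, i.e. broadside), relative to microphone 0."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return [round(m * tau * sample_rate_hz) for m in range(num_mics)]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the look direction lines up,
    then average. Samples shifted past either end count as zero."""
    n = len(channels[0])
    out = [0.0] * n
    for x, k in zip(channels, delays):
        for i in range(n):
            j = i + k
            if 0 <= j < n:
                out[i] += x[j]
    return [v / len(channels) for v in out]


def tone(freq_hz, sample_rate_hz, num_samples):
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(num_samples)]


def rms(xs):
    return math.sqrt(sum(v * v for v in xs) / len(xs))


# Hypothetical setup: 4 mics, 5 cm apart, 16 kHz sampling, a 4 kHz tone from 30 degrees.
FS = 16_000
source = tone(4_000, FS, 800)
true_delays = steering_delays(4, 0.05, 30.0, FS)  # [0, 1, 2, 3] with these parameters
# Simulate each mic hearing the source k samples late (silence before it arrives).
mics = [[source[i - k] if i >= k else 0.0 for i in range(len(source))] for k in true_delays]

steered = delay_and_sum(mics, true_delays)                        # beam aimed at the source
broadside = delay_and_sum(mics, steering_delays(4, 0.05, 0.0, FS))  # beam aimed straight ahead
interior = slice(10, 790)  # skip edge samples affected by the shifts
print(f"steered RMS:   {rms(steered[interior]):.3f}")
print(f"broadside RMS: {rms(broadside[interior]):.3f}")
```

With the beam aimed at the source, the four copies line up and the tone comes through at full strength. Aimed straight ahead, the misaligned copies of this particular tone cancel almost completely. That directional selectivity is what lets an array suppress off-axis noise before recognition even begins.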

Interestingly, FastCompany.com recently ran an article about a personal assistant technology company that grew out of research at Cambridge University. The researchers claim to have developed a framework that is much more effective than the traditional rules-based approach (used by Siri and Google Now) at producing the kind of advanced personal assistant Sejnoha describes. I’ll try to write a post about the FastCompany article soon. In the meantime, keep dreaming about the day we all have our very own Samantha.
