How Intelligent Assistants Might Predict What Restaurants You’ll Like

The Technology Review published an article on Jetpac, a new travel app that uses image-recognition technology to rate and recommend restaurants and other travel destinations. Whereas most recommendation apps rely on user reviews, Jetpac applies image analysis algorithms based on deep learning to crowdsourced images (in this case, photos on Instagram) to make determinations about a venue.

How does this work? When it comes to restaurants, the algorithm analyzes Instagram photos and tries to identify specific objects. For example, if it picks up a prevalence of martini glasses or wine glasses, the program assumes the restaurant is an upscale establishment. If it finds more plastic cups or beer bottles, it assumes a lower-end one. The program can also determine whether a restaurant is pet friendly by counting the number of pets at outdoor tables in photos.
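As a rough sketch of the idea (this is not Jetpac's actual algorithm; the object labels, cue lists, and decision rule here are all hypothetical), a venue classifier sitting on top of an image recognizer could simply tally cue objects detected across a venue's photos:

```python
# Hypothetical sketch: scoring a venue from object labels detected in its
# photos. The cue categories and the simple majority rule are illustrative.
from collections import Counter

UPSCALE_CUES = {"martini_glass", "wine_glass"}
CASUAL_CUES = {"plastic_cup", "beer_bottle"}

def classify_venue(detections):
    """detections: list of object labels found across a venue's photos."""
    counts = Counter(detections)
    upscale = sum(counts[c] for c in UPSCALE_CUES)
    casual = sum(counts[c] for c in CASUAL_CUES)
    # Any pet spotted in the photos suggests a pet-friendly venue.
    pet_friendly = counts["dog"] + counts["cat"] > 0
    tier = "upscale" if upscale > casual else "casual"
    return {"tier": tier, "pet_friendly": pet_friendly}

print(classify_venue(
    ["wine_glass", "wine_glass", "martini_glass", "plastic_cup", "dog"]
))
# → {'tier': 'upscale', 'pet_friendly': True}
```

The hard part, of course, is the object detection itself; once a deep learning model can label what's in the photos, the venue-level inference can be this simple.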

To get a sense of how well people like the restaurant, or whether it’s a fun place to hang out, the program looks at how many people are smiling or laughing in the photos. By observing what people are wearing, the app can also make judgments about the general type of clientele that frequents the place (apparently chunky glasses are indicative of hipster types).

Can intelligent assistants leverage this kind of machine learning to help pick out the best places to recommend? Right now, most assistants are limited to Yelp ratings to rank the restaurant results they return. But what if a super powerful intelligent assistant could scan hundreds of Instagram photos when you ask about the best local coffee shop? And what if it knew you well enough to know you’ll probably like the place with outdoor tables and flower boxes? It could use that knowledge to tailor its recommendations just for you.

Though this type of photo analysis technology is in the early stages, apps like Jetpac give a hint of the future possibilities. At some point, they’re sure to be integrated into the smart assistants that serve us.


Deep Learning, Neural Networks, and the Future of Virtual Agent Technologies

The Sydney Morning Herald recently published an article by Iain Gillespie in its Digital Life section on advances in deep learning technologies. Gillespie quotes Tim Baldwin of the University of Melbourne as confirming that deep learning has gained new ground recently, helped along by Moore’s Law and the ever faster computational processing power needed to successfully train multilayered neural networks.

Ray Kurzweil is quoted as saying that he expects 2029 to be the year when intelligent software develops both logical and emotional intelligence. Everyone probably has an opinion on Kurzweil’s technology predictions, but there’s certainly evidence that machine learning has made a lot of progress in just the last five years. Some of this progress is evident in speech recognition applications, recommendation engines, and control systems such as self-driving cars. Kurzweil’s description of intelligent assistants of the future sounds reminiscent of the capabilities exhibited by Samantha, the intelligent talking operating system in the movie Her, which I wrote about earlier this month.

An example often used to show both the promise and current limitations of neural networks is a Google X lab experiment that fed millions of still images from YouTube videos into Google Brain, an AI system based on neural networks. The Gillespie article mentions this example too. After evaluating the millions of data points, Google Brain was able to independently recognize images of human faces, human bodies, and (unexpectedly) cats. The cat recognition capability provided fodder for lots of geek jokes. (New York Times: “How Many Computers to Identify a Cat? 16,000”.)

The Gillespie article got me searching for more information on deep learning. There’s a recent article on the topic in Nature by Nicola Jones. Jones calls deep learning a revival of an older AI technique: neural networks. Inspired by the architecture of the brain, neural networks consist of a hierarchy of relatively simple input/output components that can be taught to select a preferred outcome and to remember that right answer. When these simple learning components are strung together and operate in parallel, they are capable of processing large amounts of data and performing useful analysis (such as correctly determining what someone is saying, even when there’s distracting background noise).
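The simplest version of one of those input/output components is a single perceptron: it weighs its inputs, produces an output, and nudges its weights whenever it gets the answer wrong. Here's a minimal sketch, training one perceptron on the logical AND function (real networks stack thousands of richer units, but the learning loop is recognizably the same idea):

```python
# A single perceptron: one simple input/output unit that learns by
# adjusting its weights toward the "right answer" on each labeled example.
def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]  # one weight per input
    b = 0.0         # bias term
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out          # 0 when correct, ±1 when wrong
            w[0] += lr * err * x1       # nudge weights toward the target
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Labeled examples of logical AND: output 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in data])  # → [0, 0, 0, 1]
```

A single unit like this can only learn very simple patterns; the power Jones describes comes from wiring many of them into deep, parallel layers.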

One of the ongoing debates in the machine learning field revolves around the effectiveness of unsupervised versus supervised learning for AI. Some researchers believe that the best way to teach an artificial intelligence system is to prime its database with facts about the world (“dolphins are mammals, marlin are fish”). Supervised learning typically refers to explicitly teaching a computer system or neural network by presenting it with labeled data sets, supplying the right answer for each example. Being able to generalize from those examples and predict the correct output for new inputs is key to machine learning.

Unsupervised learning involves feeding a neural network or other system of computer algorithms data that it analyzes to find meaningful patterns and relationships on its own. The Google Brain cat experiment referred to earlier is an example.
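To make the contrast concrete, here's a small sketch of unsupervised learning using k-means clustering rather than a neural network (a deliberately simpler stand-in for the same idea): the algorithm is handed unlabeled numbers and discovers the two groups by itself, with no right answers supplied:

```python
# Unsupervised learning sketch: k-means finds groups in unlabeled data.
def kmeans(points, k=2, iters=10):
    centers = points[:k]  # naive initialization from the first k points
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print([round(c, 6) for c in kmeans(points)])  # → [1.0, 9.0]
```

Nobody told the algorithm there were a "low" group and a "high" group; the structure emerged from the data, which is the essence of what Google Brain did with cats, at vastly larger scale.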

Regardless of the techniques used, it seems evident that some form of machine learning will be a critical force, if not the force, behind advances in virtual agent and intelligent virtual assistant technologies. To achieve true conversational capability, virtual agents will have to routinely understand and engage their human dialogue partners. For a very in-depth and informative article on machine learning, I recommend “Machine Learning, Cognition, and Big Data” by Steve Oberlin.

Deep Learning and Speech Recognition

Forbes published an interesting interview with Jeff Dean of Google, along with an introduction to the concept of Deep Learning. Dean’s work using the Deep Learning paradigm has led to a fundamental change in the way Google’s speech recognition systems work. The previous approach relied on an acoustic model, whereas the new one uses deeply layered neural networks to sequence and categorize phonemes.

Using this new process, Google saw dramatic improvements in speech recognition.

The three key areas contributing to these advancements are:

  • The Deep Learning architecture
  • The ability to feed systems vast amounts of data for training
  • The availability of extremely powerful computer processing

One of the advantages of Deep Learning technology is that software can train itself to recognize patterns. This unsupervised learning enables machines to improve their performance without relying on humans to painstakingly point out every identifying trait of the object or sound the machine needs to learn to recognize. Instead of describing to the program every individual feature of a cat, for example, the Deep Learning network is fed millions of examples of cats. The network teaches itself to recognize the features that typically identify a cat. This approach turns out to be much more effective in practice: it’s impossible for a human to code every possible feature of every possible cat in every possible position, and a Deep Learning network that has trained itself to pick out a cat by looking at millions of examples is less likely to be fooled.

So what does Deep Learning technology have to do with intelligent virtual agents and web self-service? Though many virtual agents are still text based, the tide is slowly shifting towards voice-activated apps. Customers want their questions to be understood. High-performing speech recognition technologies may very well be the backbone of the customer-facing and enterprise virtual agents of the future.