Book Review – The Software Society, by William Meisel

I recently finished reading William Meisel’s book The Software Society. It’s a very informative and interesting book, written by a pioneer in the field of speech recognition. It became clear as I read that Meisel is extremely knowledgeable about what makes computer software tick. He’s also an astute observer of current trends in both the consumer software market and enterprise systems.

Meisel’s ambitious book is divided into two main sections. Part I addresses trends in technology, while Part II examines the impacts of these trends on society and the economy. It’s tough to do the book justice in a brief review like this. For that reason, I’ll focus primarily on Meisel’s exploration of current and future trends in personal assistants. First, though, I’ll try to provide a quick overview of some of the nuggets contained in the first section of the book.

Over the course of several chapters, Meisel introduces major trends driving the evolution of software today. These trends include rapidly increasing processing power, the pervasiveness of the Internet and cloud technologies, and the increasing dominance of Software as a Service (SaaS) over legacy on-premises applications. All of these trends are leading to the ubiquity of information. Not only is information available almost anywhere, but improved search technologies and better human-computer interfaces are making this information more readily accessible and useful.

Two major ideas that Meisel emphasizes are the importance of algorithms and modularity.

Algorithms allow computers to perform useful tasks that aid humans, such as speech recognition, other forms of pattern recognition, and recommending what book or movie we should buy next. As the complexity and utility of algorithms increase, software is able to outperform humans in ways that will have enormous impacts on society, many of which are positive, but some of which will inevitably be negative. (For an amusing overview of the power of algorithms, check out this video from the PBS Idea Channel.)

Modularity enables software to increase in overall complexity in terms of what it can accomplish, while keeping programs and code manageable. By focusing functionality and algorithms in self-contained, reusable modules, programmers are able to quickly build new applications and focus testing and maintenance on isolated system components.
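To make the modularity point concrete, here’s a toy sketch of my own (not an example from the book): a small, self-contained recommendation module that two different applications could reuse and that can be tested in isolation.

```python
# A toy illustration (mine, not the book's): a self-contained module that
# any application can reuse without knowing its internals.

def recommend(purchases: list[str], catalog: dict[str, set[str]]) -> list[str]:
    """Suggest catalog items that share a tag with something already purchased."""
    owned_tags = {tag for item in purchases for tag in catalog.get(item, set())}
    return [item for item, tags in catalog.items()
            if item not in purchases and tags & owned_tags]

# Two different "applications" (say, a bookstore and a library app) can
# call the same module without duplicating its logic:
catalog = {"Dune": {"sci-fi"}, "Foundation": {"sci-fi"}, "Emma": {"classic"}}
print(recommend(["Dune"], catalog))  # ['Foundation']
```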

The power of algorithms and the benefits of modularity, Meisel postulates, contribute significantly to the current widespread success of software.

In Chapter 4, which is titled “The Nature of the Human-Computer Connection,” Meisel delves deeply into the concept of personal assistants. This chapter offers the best insight that I’ve found to date into what personal assistants are, forecasts for how the technology will evolve, and thoughts and predictions about the future of the market from both a B2C and a B2B perspective.

Meisel coins the terms Personal Assistant Model (PAM) and Ubiquitous Assistant Model (UAM). He describes the PAM as a “significant innovation in user interface design.” Personal assistants should be capable of understanding both speech and text input, communicating in a conversational fashion with their users, and interacting effectively with other personal assistants as needed to execute user tasks. Personal assistants are either specialists that carry out specific functions, or generalists that can field any inquiry and then coordinate with specialized personal assistants to complete the action. Interestingly, a press release issued recently by Artificial Solutions at the Mobile World Congress announces a platform that facilitates just this type of handoff between assistants. Meisel even suggests a set of web standards that could aid personal assistants in locating relevant content (perhaps akin to the rich snippets markup that identifies categories of content).

An assistant built on the Ubiquitous Assistant Model (UAM) maintains the same context and “personality” across multiple devices and platforms. Like Samantha in the Spike Jonze movie Her, a ubiquitous assistant would present a cohesive personality to the user and retain context, personalization features, and knowledge of the user’s personal data regardless of where the interaction took place.

Privacy and security are two interrelated areas that gain in importance as personal assistants become increasingly pervasive and capable. Meisel devotes a separate chapter to this subject and examines current and future risks and potential mitigations.

Meisel speculates about the market for PAM and UAM technologies and about the type of companies that are likely to enter and thrive in that market. He postulates that large corporations with a strong existing market presence in intelligent personal assistant technologies, such as Apple and Google, will continue to dominate the overall market. He sees opportunities for smaller independent niche players such as Nuance as well.

Quick aside here:

In Jaron Lanier’s book “Who Owns the Future?”, Lanier takes the position that power rests in the hands of those who serve up content, not in the hands of those who create it. Google, Facebook, and Twitter have each come out on top of a winner-takes-all game by creating platforms and supporting technologies that organize and deliver content. Content itself is a low-cost, or even free, commodity. Will the winners of the imminent personal assistant gold rush be companies that provide the underlying technology and infrastructure for assistants? Or will providers and owners of the content that gets served up by personal assistants (including facts, data, knowledge, fiction, art, jokes, quizzes, and so on) have a shot at the riches? Only time will tell, but current trends aren’t on the side of the content creators.

The second part of Meisel’s book addresses the economic and social impacts of computer intelligence and the increasing sphere of automation. As Marc Andreessen famously wrote in 2011, software is eating the world. Meisel fears that this phenomenon will lead to a future of structural unemployment. He’s courageous enough to propose several possible solutions. For those interested in how we might survive in a post-scarcity society where traditional jobs are few and far between, there are also interesting suggestions in James S. Albus’s book “Path to a Better World.” Jaron Lanier’s ideas in “Who Owns the Future?” are worth considering too.

This is a great book for anyone looking to gain more insight into the rapidly evolving world of software. It’s especially applicable if you’re interested in the future of the human-computer connection, including voice recognition and personal assistant / virtual agent technologies.

Can a Privacy Fence Make Cortana a Better Personal Assistant?

The Verge recently published an article speculating about features of Microsoft’s soon-to-be-released Cortana mobile personal assistant. While Cortana is expected to have many of the same capabilities as Apple’s Siri and Google Now, one feature in particular caught my attention.

In the Verge article, author Tom Warren writes about a Notebook feature. The Notebook sounds like a separate area that Cortana uses to store information about you, which it can access later to perform assistant tasks. This stored information can include things like your current or recent location, your contacts, personal data, information about your habits and behaviors, reminders, and so forth. But the interesting part is that it appears you can put up a privacy fence between you and Cortana by limiting what the personal assistant can store in the Notebook.

How does this privacy fence work? Based on the info in the Verge article, Cortana will ask for permission before she puts any of your data in the Notebook. As Cortana learns more about you, you can choose what you want her to store and what you’d prefer to keep to yourself.
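Here’s a minimal sketch of how such an opt-in memory might work. This is purely my own speculation based on the Verge description, not Microsoft’s actual design: the assistant can only write a fact to its Notebook if the user has granted permission for that category of data.

```python
# A minimal sketch of the opt-in "privacy fence" idea -- my speculation,
# not Microsoft's actual design. The assistant may only store a fact if
# the user has consented to that category of data.

class Notebook:
    def __init__(self) -> None:
        self.permissions: dict[str, bool] = {}  # category -> user consent
        self.facts: dict[str, str] = {}

    def set_permission(self, category: str, granted: bool) -> None:
        """Record the user's answer when the assistant asks for permission."""
        self.permissions[category] = granted

    def remember(self, category: str, fact: str) -> bool:
        """Store the fact only if its category is inside the fence."""
        if self.permissions.get(category, False):
            self.facts[category] = fact
            return True
        return False  # behind the fence: nothing is stored

notebook = Notebook()
notebook.set_permission("location", granted=False)
notebook.set_permission("music_taste", granted=True)
notebook.remember("location", "Seattle")          # refused
notebook.remember("music_taste", "classic rock")  # stored
```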

I like the privacy fence concept. As personal assistants become more pervasive and intertwined with our daily routines, I think we may grow increasingly concerned about how these assistants could invade our privacy. Fear comes in large part from a loss of control. If we have no ability to limit what our personal assistants know and remember about us, or who they share this information with, then we may fear and mistrust them. If we have the option of limiting the data our personal assistant remembers about us, it could make us less hesitant to engage with the assistant.

As personal assistants evolve, we’ll keep an eye on how vendors handle the tension between our desire for privacy and the need to share information. Concepts like the privacy fence may turn out to be a viable approach to balance both needs.

Want to Voice Enable Your App?

This blog is focused more on conversational software (personal assistants, virtual agents, chatbots) than on speech / voice recognition technologies. But I thought it would be interesting to explore a product that helps you voice-enable an existing app or website, so that users can interact with it by talking or typing in natural language sentences.

There are lots of choices out there for adding voice control to apps. I looked at Ask Ziggy’s Application Programming Interface (API) offering to see just how it works. Back in early 2012, Ask Ziggy launched a personal assistant for the Windows Phone that was touted as a rival to Apple’s Siri. The Ask Ziggy app is still available on the Windows Phone Marketplace.

In addition to the personal assistant, Ask Ziggy now provides a natural language understanding (NLU) API and development portal. In fact, it looks like the focus of their business has shifted to providing the NLU API and associated platform. The API supports voice-enabled apps that are local, cloud-based, or hybrid.

At the core of the NLU technology is an AT&T Speech Recognition API. There’s a very informative demo video of the NLU API portal on the Ask Ziggy website. Developers make REST-based calls using natural language as input and receive “action-entity” JSON name/value pairs as output. Developers can then use the action-entity data to execute the appropriate function in their app.
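To give a feel for the flow, here’s a hedged sketch of what such a call might look like in Python. The endpoint URL, parameter names, and response fields are illustrative guesses based on the demo, not Ask Ziggy’s documented API.

```python
# Illustrative only: the endpoint and field names below are hypothetical,
# not Ask Ziggy's documented API.
import requests

HYPOTHETICAL_ENDPOINT = "https://api.askziggy.example/v1/understand"

response = requests.post(
    HYPOTHETICAL_ENDPOINT,
    json={"app_id": "my-music-app", "utterance": "Play a rock song"},
    timeout=10,
)
result = response.json()
# An assumed response shape: {"action": "Play", "entities": {"genre": "rock"}}
print(result.get("action"), result.get("entities"))
```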

Let’s dig into some of the specifics. The first thing you do in the NLU API portal is set up your entities. Entities are the things you want people to be able to search for using voice commands. In the demo, the sample app is a music app and the entities are things like songs, genres, artists, and bands. The API also comes with predefined entities for important data such as location and date/time. These predefined entities are populated by default, so the developer doesn’t have to code them by hand.

After setting up your entities, your next step is to create Developer Actions. Actions are the key functions that the developer will need to perform based on user requests. Examples of actions for the music app are Play, Skip, and Previous Song. For each action, you need to create some sample sentences that act as patterns for what users are likely to say. Sample sentences for the Play action might include: “Play a rock song” and “I want to hear some jazz.” The portal automatically tags the words Play and Hear as actions and the words Rock and Jazz as genre entities.
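As a rough picture of the configuration this produces, here’s the entity/action setup expressed as plain data. The layout is my own illustration of the concepts for clarity, not the portal’s actual schema (the portal is configured through a web UI).

```python
# Illustrative only: my own sketch of the entity/action concepts,
# not the portal's real configuration format.

entities = {
    "genre": ["rock", "jazz", "pop"],
    "artist": ["Queen", "Miles Davis"],
    # "location" and "date/time" entities come predefined by the platform
}

actions = {
    "Play": ["Play a rock song", "I want to hear some jazz"],
    "Skip": ["Skip this track", "Next song please"],
    "PreviousSong": ["Go back to the last song"],
}
```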

Once you’ve set up your entities and actions, created your sample sentences, and uploaded your data files, training runs in the background. Training lets the platform work through the samples, compare them against live data, and make sure that voice understanding is functional and reliable. The more sample sentences you add, the more reliable the NLU model. The portal also includes a test console feature for building and executing tests: you can input sentences and then see what output the model returns. The portal supports versioning as well.

Once the NLU model is working, the API should correctly interpret spoken user input into JSON output. The JSON output contains the entities and actions that the developer needs to execute the correct application function. In the music example, if someone says “Play some rock music by Queen,” the API will return enough information to instruct the application to find a rock song by Queen and start playing it.
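On the application side, routing that output could be as simple as a lookup table mapping actions to handler functions. This is a sketch under the assumed JSON shape shown earlier, not code from Ask Ziggy.

```python
# A sketch of dispatching on the assumed action-entity output; the JSON
# shape and function names are illustrative, not from Ask Ziggy.

def play(entities: dict) -> None:
    print(f"Playing {entities.get('genre', 'something')} by "
          f"{entities.get('artist', 'anyone')}")

def skip(entities: dict) -> None:
    print("Skipping to the next track")

HANDLERS = {"Play": play, "Skip": skip}

def dispatch(nlu_result: dict) -> None:
    handler = HANDLERS.get(nlu_result.get("action"))
    if handler:
        handler(nlu_result.get("entities", {}))
    else:
        print("Sorry, I didn't understand that.")

dispatch({"action": "Play", "entities": {"genre": "rock", "artist": "Queen"}})
# -> Playing rock by Queen
```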

I may have oversimplified the description of the Ask Ziggy NLU API, but it seems like a viable tool for developers who want to quickly add voice understanding to their apps. I can even imagine combining a more traditional chatbot application with this API. That way, beyond just carrying on a conversation based on pattern matching, the chatbot could interpret commands related to entities and actions and use this information to execute specific tasks.

Can a Chatbot Help Victims of Cyberbullying?

Even as mobile personal assistants become more common and gain traction in the marketplace, we’ve still barely scratched the surface of this technology’s capabilities. Phys.org recently published an article about compelling research on empathic “comforting” chatbots. According to the article, Janneke van der Zwaan has written a doctoral thesis at the Delft University of Technology about her studies into how chatbots can help children who have been victims of cyberbullying.

Van der Zwaan created a virtual buddy named Robin, a simplified virtual conversational agent. I tried out the empathic chatbot demo available from the project website. The Robin chatbot is animated and resembles a cute and harmless cartoon character. Actually, Robin looks something like a pencil eraser with two eyes and a mouth. Robin has very limited conversational abilities, but he (it) can ask questions about instances of cyberbullying the child has experienced, and the child can respond by selecting from multiple-choice answers. Robin can also offer the child some tips on approaches to dealing with cyberbullying.

One aspect of the virtual buddy’s effectiveness is its facial expressions, which mimic human expressions. When the child responds with answers that indicate he or she is being cyberbullied, Robin makes a sad face. The sad face also appears if the child selects answers indicating he or she is sad or frightened.

One of the things about this research that strikes me as so interesting is that evidence shows most children find comfort and relief in ‘talking’ to Robin, even though the dialogue is very simplistic. Some further information on the project explains that Robin combines a conversational model with an emotional model. Robin gains insight into the child’s emotional state and it also uses emotion to bolster the child’s self-confidence. Robin uses both words and simple facial expressions to convey emotion.
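To illustrate the mechanics in the simplest possible terms, here’s a toy sketch of how an answer might be mapped to an emotional state and a matching expression. This is my own illustration, far simpler than van der Zwaan’s actual conversational and emotional models.

```python
# A much-simplified sketch of the idea (mine, not van der Zwaan's model):
# infer an emotional state from the child's multiple-choice answer, then
# pair the reply with a matching facial expression.

ANSWER_EMOTIONS = {
    "someone keeps sending me mean messages": "sad",
    "i blocked them and it stopped": "happy",
}

RESPONSES = {
    "sad": ("I'm sorry that happened. It's not your fault.", "sad_face"),
    "happy": ("That's great news. Well done!", "smile"),
    "neutral": ("Can you tell me more about what happened?", "neutral_face"),
}

def respond(answer: str) -> tuple[str, str]:
    emotion = ANSWER_EMOTIONS.get(answer.strip().lower(), "neutral")
    return RESPONSES[emotion]

text, expression = respond("Someone keeps sending me mean messages")
print(f"[{expression}] {text}")  # [sad_face] I'm sorry that happened...
```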

There’s a lot more to this study than what I’ve described here, so check out the project website for detailed information. There are lessons here that are likely to be applied to future virtual agents and personal assistants. Effective conversational interactions involve emotion and signals of empathy as much as they do an exchange of words.

EmoSpark – A Personal A.I. Console Dedicated to Your Happiness

EmoSpark is an Android-powered smart cube currently being offered through an Indiegogo campaign. I watched the EmoSpark campaign video and was impressed with the full range of its features. The product seems to be both a mobile personal assistant and a physical cube. When you’re on the go, you can interact with the EmoSpark assistant via your smartphone or other iOS or Android device. When you’re at home, you can talk to a cube that’s connected to other smart devices via WiFi.

Patrick Levy Rosenthal, Founder and CEO of EmoSpark, describes the technology as an emotional intelligence. Based on the video, EmoSpark is programmed to sense and appropriately react to the user’s emotional state. In one segment of the campaign video, the EmoSpark assistant helps to cheer up a young lady whose boyfriend is late to pick her up. In another segment, the EmoSpark brain plays with a little girl’s toy to add some fun to an otherwise dull evening at home.

The platform also appears to be conversational. The technical specs in the Indiegogo project description indicate that “EmoSpark has a conversational engine of over 2 million lines of data.” That suggests its conversational abilities are based in large part on pattern and/or keyword matching.

EmoSpark is also connected to Freebase and can answer most “What is” or “Who is” questions (in the same way that a BOT Libre chatbot can do this, as I described in a recent post).

The Indiegogo page claims that EmoSpark can sense emotion in the speaker. It goes even further to describe a complex technology that allows the EmoSpark to develop an “Emotional Profile Graph.” This graph gathers and stores information about emotions and emotional associations. Each EmoSpark can connect via the internet to a cloud of other EmoSparks and they can share and learn from each other’s emotional associations.

EmoSpark aims to be an open platform that will attract developers to create new games and apps. For example, developers could tap into EmoSpark to infuse other physical or virtual objects with its intelligence. Since EmoSpark uses URBI, the universal robot language, a third-party app could power a URBI-based robot with EmoSpark’s conversational and emotional capabilities.

The EmoSpark concept is impressive. It’s yet another example of just how incredibly active the area of smart personal assistants (and associated artificial intelligence technologies) is at present. There’s so much creativity and progress in these technologies that it’s hard to keep up. If you’re interested in finding out more about EmoSpark or maybe in being an early adopter of the technology, head over to Indiegogo to find out more.

Cortana and Foursquare – Coming to a Smartphone Near You?

Today the Verge reported on the latest speculation concerning Microsoft’s Cortana mobile personal assistant. The article cites Bloomberg News in reporting that Cortana will be part of Microsoft’s Windows Phone 8.1. Cortana is expected to launch in April and will incorporate location-aware features from none other than Foursquare. The Bloomberg article states that Microsoft has invested $15 million in Foursquare.

So how will Microsoft use Foursquare’s location data in its new personal assistant? Knowing the user’s location will allow Cortana to send out discounts, promotions, and tips for stores in the user’s vicinity. According to the Bloomberg report, this push technology will be an opt-in feature.

Science fiction fans may be reminded of stories and novels by Philip K. Dick, who presciently predicted the ubiquity of “personalized” advertisements. His visions of flying cars and jalopies to Mars may not have panned out yet, but it looks like his insights on the future of advertising sure will. Dick imagined ads as pesky and invasive. In his 1964 novel The Simulacra, Dick writes:

Something sizzled to the right of him. A commercial [...] had attached itself to his car. [...] Chic crushed it with his foot.

Can Cortana, and other mobile personal assistants armed with location data, figure out a way to alert users to nearby, interesting promotions without being perceived as annoying? That remains to be seen. If personal assistants are familiar enough with a user’s preferences, the chances increase that they’ll surface only those promotions the user really wants to hear about. Hopefully it won’t come to Philip K. Dick’s vision of errant ads rudely invading our private spaces. On the other hand, nobody wants to miss out on a great sale on their favorite product or brand that’s right under their nose! Balancing the need for privacy against the desire for timely, relevant information is a skill mobile personal assistants will need to perfect.

BOT Libre! – A Promising New Chatbot Platform

I’ve recently been trying out a new free chatbot platform called BOT Libre. Similar to Pandorabots, the BOT Libre site provides you with all the tools you need to build a chatbot from scratch. On the BOT Libre platform, though, you don’t need to do any AIML scripting. There are several ways to train your chatbot that don’t require any scripting at all. Technical botmasters have the option to add more capability to their bots by doing some coding.

The central console of the BOT Libre platform is the admin panel. From here you can control settings such as whether the bot can learn and whether a cool correction function is enabled in the chat interface. You can also view chat logs, import chat files, engage a training interface, set an avatar, and pick a voice for your bot.

The basic bot platform comes with a few standard capability modules. These modules are called “machine scripts” and they’re coded in the Self programming language. One such machine script is the MyNameIs script. This script enables the bot to understand your name and remember it in future exchanges. If for some reason the bot doesn’t get and recall your name after you’ve said it once, you can tell the bot your name a second time and it generally seems to get it.

I really like the fact that your BOT Libre chatbot seems to be able to learn. Learning that’s supported by a Self script occurs just by talking and repeating things to the bot. For example, I told my bot several times that its name was “Cyborn.” Whenever I asked it after that what its name was or who it was, it would reliably tell me that it was Cyborn. You can also start counting with your bot and, after a short time, it’ll learn to count with you, giving you the next number in the sequence.

There’s also a very convenient correction feature within the chat interface. Assuming that you’ve enabled this feature in the admin panel, you can use it to train your chatbot while you’re conversing with it. For example, you can ask “Who wrote The Martian Chronicles?” When the bot gives a wrong response, you can mark the Correction box and type “Ray Bradbury” into the conversation box and send it. Now the bot knows that Ray Bradbury is the author of The Martian Chronicles.

Alternatively, you can train the bot by going to the admin panel and entering questions and corresponding answers. This is a convenient way to create a lot of question and response pairs for your bot without having to do AIML scripting.
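Conceptually, this kind of training boils down to building up a store of question/response pairs, as in this generic sketch. BOT Libre’s own implementation is far more sophisticated, with learning and a per-bot database; this is just the bare idea.

```python
# A minimal, generic sketch of question/response training -- not BOT
# Libre's actual implementation.

qa_pairs: dict[str, str] = {}

def train(question: str, answer: str) -> None:
    """Store a question/response pair, normalized for simple matching."""
    qa_pairs[question.strip().lower()] = answer

def reply(question: str) -> str:
    return qa_pairs.get(question.strip().lower(), "I don't know that yet.")

train("Who wrote The Martian Chronicles?", "Ray Bradbury")
print(reply("who wrote the martian chronicles?"))  # Ray Bradbury
```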

If you have some programming talent, you can create your own machine scripts in Self and add them to the bot’s inventory.

I had some email exchanges with James of Paphus Solutions, who created BOT Libre. He explained that each bot has a “brain” backed by its own database. Every bot proactively searches for knowledge on the web, looking up new words on Wikipedia and information on Freebase, and even analyzing conversations for language rules. As I chatted with my recently created bot, I discovered that I could ask it a whole range of “What is” questions (“What is the Higgs Boson?”) and it could give me a concise answer after doing a quick web search. The same goes for “Who is” questions. It’s great to be able to rely on these built-in capabilities and not have to worry about scripting a pattern for every possible query.
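Here’s a conceptual sketch of that fallback behavior, not BOT Libre’s actual code. The lookup_definition function is a hypothetical stand-in for the Wikipedia/Freebase queries the platform performs behind the scenes.

```python
# A conceptual sketch of the "What is / Who is" fallback (not BOT Libre's
# actual code). lookup_definition() is a hypothetical stand-in for a
# Wikipedia/Freebase query.

def lookup_definition(topic: str) -> str | None:
    # A real bot would query a knowledge source on the web here.
    stub_knowledge = {"higgs boson": "An elementary particle in the Standard Model."}
    return stub_knowledge.get(topic.lower())

def answer(question: str) -> str:
    q = question.rstrip("?").strip()
    for prefix in ("what is", "who is"):
        if q.lower().startswith(prefix):
            topic = q[len(prefix):].strip().removeprefix("the ").strip()
            definition = lookup_definition(topic)
            if definition:
                return definition
    return "I don't know, but I can try to find out."

print(answer("What is the Higgs Boson?"))  # An elementary particle...
```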

I’m sure I’ve left out some of the platform’s capabilities, so I recommend that you go explore it yourself. Once you’ve created your BOT Libre bot, you can embed it in your website or put it on Twitter. The bot will also work on mobile devices. As the name of the platform suggests, BOT Libre will host your bot for free, even if you’re using it for commercial purposes (according to the info on the site). So what have you got to lose? If you want to see a bot in action, head on over to BOT Libre and select the Browse button to pick a bot to chat with.