Teaching a Machine to Have a Conversation Isn’t Easy
Improvements in voice technologies are changing the way companies engage with their customers — beyond the use of a keyboard
Programming machines to adapt and learn from the nuance and context of speech is advancing to the point that you can actually have a conversation with a virtual personal assistant or a chatbot. Most of us have already met Apple’s Siri, Amazon’s Alexa, or another digital assistant. We’re also interacting with AI-powered chatbots. Companies such as rue21, Domino’s, and CNN are using chatbots for everything from shopping to news delivery.
Voice technology, powered by artificial intelligence (AI), is growing in popularity. Online sales of voice-enabled devices grew 39 percent last year, and automated communication is poised to change the way companies engage with their customers ― beyond the need for a keyboard or screen. According to a recent Adobe Digital Insights report, 22 percent of consumers report using voice instead of a keyboard at least daily. But there is still plenty of room for growth, with 49 percent reporting they have never “talked” with a virtual digital assistant.
The race is definitely on as researchers around the world work to give computers the ability to understand and respond to spoken language. “Developing natural language understanding for digital assistants and conversational computer systems is a complex and essential undertaking,” says Walter Chang, principal scientist at Adobe Research.
Voice-based interactions require sophisticated programming and AI to enable a machine to understand and talk to you. In other words, it’s a really big deal to program a computer to have a conversation.
Walter says the challenge now is to process massive amounts of dialogue-related data to facilitate human-like intellectual and speech functionality in machines. “We are developing the technology that will give machines the cognitive ability to understand the semantics and context of a conversation, respond to topic changes and, in essence, carry on everything from a complex conversation to small talk.”
Real language processing in real time
Real language processing systems, sometimes referred to as “belief tracking systems,” will give companies a new way to engage with their customers virtually. They will also improve efficiency by reducing the need for live customer service and sales support.
Belief tracking is based on a computer algorithm designed to respond to a human by calculating the user’s intended meaning as the dialogue progresses. While the initial development work is being done in English, the process itself can ultimately be used for any language.
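As a rough illustration of the idea (a minimal sketch, not Adobe’s actual system), a belief tracker can be pictured as a probability distribution over candidate user intents that gets re-weighted after every utterance, so the system’s best guess at the user’s intended meaning sharpens as the dialogue progresses. The intent names and keyword scoring below are hypothetical.

```python
# A minimal sketch of belief tracking: keep a probability distribution over
# candidate user intents and re-weight it after every utterance.
# The intents, keywords, and scoring are hypothetical, for illustration only.

INTENTS = ["get_ticket_prices", "get_opening_hours", "get_directions"]

# Very naive evidence model: keywords that hint at each intent.
KEYWORDS = {
    "get_ticket_prices": {"ticket", "tickets", "price", "cost", "fee"},
    "get_opening_hours": {"open", "hours", "close", "when"},
    "get_directions": {"where", "address", "metro", "bus"},
}

def update_belief(belief, utterance):
    """Re-weight the belief over intents given a new user utterance."""
    words = set(utterance.lower().split())
    scores = {}
    for intent in INTENTS:
        overlap = len(words & KEYWORDS[intent])
        # Multiply the prior belief by simple keyword evidence (plus smoothing).
        scores[intent] = belief[intent] * (1 + overlap)
    total = sum(scores.values())
    return {intent: score / total for intent, score in scores.items()}

# Start with a uniform belief, then sharpen it as the dialogue progresses.
belief = {intent: 1 / len(INTENTS) for intent in INTENTS}
for turn in ["I'm visiting the Louvre tomorrow", "how much do tickets cost"]:
    belief = update_belief(belief, turn)
    print(turn, "->", belief)
```

A real tracker would use a trained statistical model rather than keyword counts, but the core loop is the same: the belief is carried forward and updated with each new turn.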
Researchers at the Massachusetts Institute of Technology (MIT) and Adobe Research are on the forefront of the development of dialogue tracking technology that facilitates voice-based computer systems. The team is even working on adding a voice-controlled virtual assistant to help you with editing in Photoshop.
The Adobe Research system tracks the language structure of actual conversations, in real-time. The program is designed to learn from the nuance and context of every interaction. Based on a dataset developed from actual conversations, a complex algorithm helps interpret information the computer receives, enabling the machine to determine how to respond to a user.
For example, a natural language app for tourists needs to understand which tourist attraction the user is interested in, as well as the types of information related to that venue — such as entrance fees and physical location. To have a conversation, the computer must anticipate that a user question about visiting the Louvre is likely to be followed up with a request for ticket prices, an address, and transportation options.
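Here is a hedged sketch of how that carry-over might look in code (the venue data, slot names, and keyword matching are invented for illustration): the dialogue state remembers which attraction is under discussion, so a follow-up such as “How much are tickets?” is resolved without the user naming the Louvre again.

```python
# A sketch of slot carry-over in a tourist-information dialogue.
# The venue database, slot names, and matching rules are hypothetical.

VENUES = {
    "louvre": {
        "entrance_fee": "17 euros",
        "address": "Rue de Rivoli, Paris",
        "transport": "Metro line 1 to Palais Royal",
    },
}

def respond(state, utterance):
    """Update the dialogue state from the utterance, then answer from it."""
    text = utterance.lower()

    # If the user names an attraction, remember it in the state.
    for venue in VENUES:
        if venue in text:
            state["attraction"] = venue

    venue = state.get("attraction")
    if venue is None:
        return "Which attraction are you interested in?"

    # Follow-up questions are resolved against the remembered attraction.
    if "ticket" in text or "fee" in text or "much" in text:
        return f"Entrance to the {venue.title()} is {VENUES[venue]['entrance_fee']}."
    if "where" in text or "address" in text:
        return f"The {venue.title()} is at {VENUES[venue]['address']}."
    if "metro" in text or "transport" in text or "get there" in text:
        return f"Take {VENUES[venue]['transport']}."
    return f"What would you like to know about the {venue.title()}?"

state = {}
print(respond(state, "Tell me about the Louvre"))
print(respond(state, "How much are tickets?"))   # 'Louvre' is carried over
print(respond(state, "And where is it?"))
```

A production system would replace the keyword matching with a learned model, but the state object acting as conversational memory is the key idea.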
AI is becoming mainstream
Conversational AI is being used today to provide customer service, make product recommendations, and facilitate purchases. The most familiar ways in which we talk with machines are voice-enabled digital assistants and chatbots programmed to provide online customer experiences. Today, the conversations you have with virtual assistants are relatively simple. However, as AI and machine learning improve, a computer will be able to learn from each interaction.
AI-driven technology is already helping marketers improve voice-based customer interactions. A hotel chain, for example, can recognize a customer immediately based on a “conversation” and target the traveler with personalized content and promotions. Insights from voice-enabled devices are also helping companies automatically leverage other channels to reach customers, such as facilitating engagement on a mobile app or in a connected car.
Ultimately, the potential payoff from implementing chatbot technology is improved personalization and better customer engagement. “Chatbots have the potential to not only deliver exceptional online experiences, but also inspire purchases and increase the number of items in people’s shopping carts,” says Michael Klein, Adobe’s director of industry strategy for retail.
But conversational AI — between a human and a machine — is still in its early stage of development, and there are, well, cultural differences. “We have not yet, as humans, been completely trained to communicate with the bots,” says Michael. “You have to be very specific. So, until the bots start learning more about what your intent really is, communication is going to be a challenge.”
Natural language processing is evolving
Teaching machines to converse is easier said than done. Language is nuanced, and meaning may vary from one region to the next, which makes writing bot scripts that engage a wide range of customers a formidable task.
Giving a machine the ability to understand and respond to spoken language hinges on the development of a database of syntax that helps the computer process both meaning and context. The Adobe Research system works by programming the computer to tag a given human utterance based on its relative context.
One of the Adobe team’s innovations is a system that can follow the thread of a conversation and track dialogue in real-time. This innovative AI technology can extrapolate meaning from the context of a conversation — even when a specific value is not explicitly present in a particular utterance. In essence, the machine can use its database of tagged data in order to “think” abstractly, following the thread of a conversation.
“Although it might be easy for humans to track dialog states, it is difficult for computers because they do not ‘understand’ natural human language in the same way that humans do,” says Trung H. Bui, lead Adobe researcher on the project, working with a PhD intern from the MIT lab in Cambridge, Mass. “But dialogue state tracking is crucial for a reliable conversation between a computer and person, because the machine must be able to process the information it receives, in the context of what was spoken and as the conversation evolves, in order to choose an appropriate response.”
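To make the tagging idea concrete, here is a deliberately simplified sketch (not the Adobe Research implementation; the tag set, patterns, and restaurant-booking domain are invented for illustration): each utterance is tagged with a dialogue act and slot values, and a value that is never stated outright, such as “make it three,” is filled in from the state the tracker already holds.

```python
# A simplified sketch of dialogue-act tagging with an implicit-value update.
# The tag set, patterns, and restaurant-booking domain are hypothetical.

import re

def tag_utterance(state, utterance):
    """Tag the utterance with a dialogue act and update tracked slot values."""
    text = utterance.lower()

    # Explicit value: "a table for two" -> inform(party_size=two)
    match = re.search(r"table for (\w+)", text)
    if match:
        state["party_size"] = match.group(1)
        return ("inform", {"party_size": state["party_size"]})

    # Implicit value: "make it three" never names the slot, but the tracker
    # knows party_size is the slot under discussion and updates it anyway.
    match = re.search(r"make it (\w+)", text)
    if match and "party_size" in state:
        state["party_size"] = match.group(1)
        return ("revise", {"party_size": state["party_size"]})

    return ("other", {})

state = {}
print(tag_utterance(state, "I'd like a table for two at seven"))
print(tag_utterance(state, "Actually, make it three"))
print(state)   # {'party_size': 'three'}
```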
Even better conversations are on the way
As natural language processing evolves, machines will get better at understanding nuance, context, and even slang. In essence, machines are increasingly able to use the equivalent of a “photographic memory” in order to converse.
Next generation systems will be able to discuss multiple topics with you, perhaps as seamlessly as another person can. Accurate dialogue state tracking will help reduce errors in speech recognition and compensate for the ambiguity inherent in a natural language conversation.
More complex conversations will be a huge step forward for natural language processing, which is currently used primarily to provide a pre-defined response to a simple command, such as “Give me the recipe for succotash” or “Who is the Dalai Lama?”
Untethered communication
Digital personal assistants and chatbots are just the first step toward an untethered ecosystem that is no longer dependent on a user interacting with a screen. The power of conversation is already enabling companies to better engage with their customers in new ways, from ordering a pizza to sharing step-by-step directions on how to make a casserole.
Businesses that can truly wow their customers with accurate and responsive AI voice interactions will ultimately deliver better user experiences. The goal is to use AI and machine learning to improve customer interactions based on your own database of contextual data. Once your machines start talking with your customers, listen closely. What you learn might just help you improve your bottom line.
Read more about cutting-edge work from the Adobe Research team.