New deep learning model understands and answers questions

Ankit Kumar, Richard Socher - June 25, 2015

Illustration of the DMN performing transitive inference.

Today, we published new state-of-the-art results on a variety of natural language processing (NLP) tasks. Our model, which we call the Dynamic Memory Network (DMN), combines two lines of recent work on memory and attention mechanisms in deep learning. Memory components give models the ability to store facts internally and access them later. Attention mechanisms are typically used to let the model focus on a specific part of an input while performing a task, ignoring spurious information.

The DMN takes these ideas one step further, using an iterative attention mechanism to access its memory. It reads an input and stores relevant facts in its memory. Its attention mechanism then focuses on a specific part of the memory, but rather than outputting an answer, it updates its internal knowledge about the input and repeats the process. At the end, the model has a summary of the important information, which it uses to perform its task.
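The loop described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not the paper's exact equations: the real DMN scores facts with a learned gating function and updates its memory with a GRU, whereas here we use a plain dot-product score and an averaging update. The function name and interpolation constant are our own choices for the sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def episodic_memory(facts, question, passes=3):
    """Toy sketch of the DMN's iterative attention over memory.

    facts: (n_facts, d) array of fact vectors
    question: (d,) question vector
    Returns the memory vector after `passes` attention passes.
    """
    memory = question.copy()  # memory starts out as the question itself
    for _ in range(passes):
        # Scores depend on both the question and the current memory, so a
        # later pass can attend to facts that an earlier pass revealed.
        scores = facts @ question + facts @ memory
        weights = softmax(scores)
        episode = weights @ facts  # attention-weighted summary of the facts
        # Simple interpolation update; the actual DMN uses a GRU here.
        memory = 0.5 * memory + 0.5 * episode
    return memory
```

The key design point is in the score: because `memory` changes between passes, the attention distribution can shift from pass to pass, which is what enables the transitive inference discussed below.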

To see why this modification is important, consider the input and question in the picture above. Given the question, “what is winona afraid of,” the model cannot know that the sentence “sheep are afraid of mice” is relevant until it first learns that “winona is a sheep.” The iterative attention mechanism allows the DMN to perform this type of transitive inference, where the parts of the input that it focuses on in one iteration reveal the need to focus on other parts in subsequent iterations. Another example of this behavior is shown below in a figure from the paper. This example reveals an interesting insight into how the model operates: in the second attention iteration, the model incorrectly places some weight on the sentence “John moved to the bedroom.” This makes sense, because the correct sentence, “John went to the hallway” (which it has picked up more strongly), also describes a place where John went.
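To make the transitive-inference point concrete, here is a toy word-overlap retrieval loop on the example above. This is not the DMN's learned attention (which operates on vectors, not word sets); it only shows why a single retrieval pass is insufficient: the fact “sheep are afraid of mice” is selected on the second pass only because the first pass folded “sheep” into memory.

```python
# Facts as bags of words; the third is a distractor.
facts = [
    {"winona", "is", "a", "sheep"},
    {"sheep", "are", "afraid", "of", "mice"},
    {"john", "moved", "to", "the", "bedroom"},
]
question = {"what", "is", "winona", "afraid", "of"}

memory = set(question)  # memory starts as the question words
retrieved = []
for _ in range(2):  # two attention passes
    # Score each unretrieved fact by its word overlap with memory.
    scores = [len(f & memory) if i not in retrieved else -1
              for i, f in enumerate(facts)]
    best = scores.index(max(scores))
    retrieved.append(best)
    memory |= facts[best]  # fold the retrieved fact into memory

# Pass 1 retrieves "winona is a sheep"; pass 2 can then match "sheep"
# and retrieve "sheep are afraid of mice". The distractor is never chosen.
```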

The power of this framework is that most, if not all, NLP tasks can be cast as a question-answering problem; e.g., “What is the sentiment of this sentence?” or “What are the part-of-speech tags of the words in this sentence?” By re-casting sentiment analysis and part-of-speech tagging as question-answering problems, the DMN can be trained on these tasks as well, without any change in architecture. We demonstrate this ability by achieving state-of-the-art results on these tasks.
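The re-casting itself is just a matter of how training examples are framed. A minimal sketch of this framing (the helper name and question phrasings are illustrative, not an API from the paper):

```python
# Hypothetical helper: every task becomes an (input, question) pair,
# so a single question-answering architecture can handle all of them.
def as_qa(task, text):
    questions = {
        "sentiment": "What is the sentiment of this sentence?",
        "pos": "What are the part-of-speech tags of the words in this sentence?",
    }
    return {"input": text, "question": questions[task]}

example = as_qa("sentiment", "The movie was surprisingly good.")
# The model then answers this like any other question about its input.
```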

We are excited about the future of this technology. A model that can read any corpus of input and then answer arbitrary questions about it would be a great step for the machine learning community. We believe the DMN makes good progress toward that goal. There has recently been a surge of interest in this area, and we look forward to seeing what is produced, here at MetaMind and at other top research labs.

Press coverage of our Dynamic Memory Network: Wired, MIT Technology Review.