Intention Recognition


Hello, nice to meet you! This is a summary of what I have been learning recently, and I will keep updating it for some time since I’m still learning. Here I summarize some methods for Intention Recognition that I learned from recent papers. For each method I only present the main idea, without too many details, and I list the paper that corresponds to each topic. If you are interested in a topic, you can find the original paper for more details. As always, if you want to talk with me, please contact me at haroldliuj@gmail.com. Let’s get started! Have a nice trip!

1. Word Embedding Based Correlation Model for Question/Answer Matching

This paper was written by Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang and Zhang Xiong. They introduce a method for computing the relationship between the two sides of a Q&A (Question & Answer) pair. They propose a word-level correlation function, from which the correlation of a whole Q&A pair can then be computed. The word-level correlation function is:

where $v_{q_i}$ is the vector of the i-th word of the question $q$, $v_{a_j}$ is the vector of the j-th word of the answer $a$, and $\mathbf{M}$ is a trainable weight matrix. $\mathbf{M}$ is called the translation matrix because it maps a word in the answer to a possibly correlated word in the question.

In effect, this is an “improved cosine similarity” whose parameters are learned from the data.
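To make the “improved cosine similarity” idea concrete, here is a minimal sketch in plain Python. It assumes the word-level function has the form $\cos(v_{q_i}, \mathbf{M} v_{a_j})$, which is my reading of the description above. The helper names (`matvec`, `cosine`, `word_correlation`) and the toy matrices are made up for illustration and do not come from the paper.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def word_correlation(v_q, v_a, M):
    """Assumed form: cos(v_q, M v_a), where M is the translation matrix."""
    return cosine(v_q, matvec(M, v_a))

# Toy 2-d example. With M = identity this is plain cosine similarity.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical "learned" M that swaps the axes
v_q, v_a = [1.0, 0.0], [0.0, 1.0]

plain = word_correlation(v_q, v_a, identity)   # the two words look unrelated
translated = word_correlation(v_q, v_a, swap)  # M maps v_a onto v_q
print(plain, translated)
```

The point of the toy example is that two words whose raw vectors are orthogonal can still get a high correlation once $\mathbf{M}$ has learned to map one onto the other.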

Having the word-level correlation function, we can then compute the sentence-level correlation:

where $|a|$ is the number of words in the answer $a$.
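Here is a small sketch of how a sentence-level score could be built from word-level scores. It assumes the aggregation $\frac{1}{|a|}\sum_j \max_i \cos(v_{q_i}, \mathbf{M} v_{a_j})$, that is, each answer word is matched to its best question word and the results are averaged over the answer. That form is my assumption, so please check it against the paper. All function names are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def sentence_correlation(q_vecs, a_vecs, M):
    """Assumed form: mean over answer words of the best-matching question word."""
    if not q_vecs or not a_vecs:
        return 0.0
    total = 0.0
    for v_a in a_vecs:
        mapped = matvec(M, v_a)  # translate the answer word once
        total += max(cosine(v_q, mapped) for v_q in q_vecs)
    return total / len(a_vecs)  # len(a_vecs) plays the role of |a|

# Toy example: 2 question words, 2 answer words, M = identity.
M = [[1.0, 0.0], [0.0, 1.0]]
q_vecs = [[1.0, 0.0], [0.0, 1.0]]
a_vecs = [[1.0, 0.0], [1.0, 1.0]]
score = sentence_correlation(q_vecs, a_vecs, M)
print(score)
```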

With this correlation function, we can either match questions and answers directly or use it as the input to a neural network. The authors give an example with a CNN in the paper.

2. Hybrid Attentive Answer Selection in CQA with Deep Users Modeling

This method was proposed by Jiahui Wen, Jingwei Ma, Yiliu Feng and Mingyang Zhong. They present the Hybrid Attention Mechanism, which takes both the local and the mutual importance of the words in a QA pair into consideration. They also present a method to model user expertise. Since user expertise is not my main focus, I will skip that part and concentrate on the Hybrid Attention Mechanism.

Their model architecture is shown below:

The left part is about user modeling, which I won’t talk about here. Let’s just look at the right part. The bottom half of the model is quite standard: the question and the answer are mapped into word vector space and passed through two separate encoders, one for questions and one for answers. The real show is the green bar called “Interaction Attention”. This is their main contribution, and I explain it in more detail in the rest of this section.

First, the authors calculate attention weights for the words in the question and in the answer separately. They show the whole process only for the question, since the answer is processed in the same way. The expression for the individual attention of the question is:

where $\mathbf{H}^q$ is the output of the LSTM ($\mathbf{H}^{q}=\left\{\mathbf{h}_{t}^{q}\right\}_{t=1}^{L^{q}}=\mathrm{LSTM}\left(\left\{\mathbf{x}_{t}^{q}\right\}_{t=1}^{L^{q}}\right)$) and $w_1$ is a trainable transformation vector. $\alpha_i^q$ indicates the importance of the i-th word in the question sentence.
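As a rough illustration of the individual attention, the sketch below scores each hidden vector with a dot product against $w_1$ and normalizes with a softmax. The exact scoring function is an assumption on my side (the paper may apply an extra transformation before the softmax). `H_q` is a toy stand-in for real LSTM outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def individual_attention(H, w1):
    """Assumed form: alpha = softmax over t of (w1 . h_t)."""
    scores = [sum(w * h for w, h in zip(w1, h_t)) for h_t in H]
    return softmax(scores)

# Toy stand-in for the LSTM outputs of a 3-word question (hidden size 2).
H_q = [[0.1, 0.2], [2.0, 1.0], [0.0, -1.0]]
w1 = [1.0, 1.0]  # hypothetical trained vector
alpha_q = individual_attention(H_q, w1)
print(alpha_q)  # one weight per question word, summing to 1
```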

Then they calculate, for each word in one sentence, the attention over the words in the counterpart sentence. Let $\mathbf{h}_i^q$ be the hidden vector for the i-th word in a question sentence and $\mathbf{h}_j^a$ be the hidden vector for the j-th word in the corresponding answer sentence. Then the question word’s attention over the answer words is obtained as:

where $\mathbf{W}^q, \mathbf{W}^a, \mathbf{W}^{qa}$ are trainable transformation matrices, $\mathbf{w}_2$ is a trainable vector and $\odot$ is the element-wise multiplication of two vectors. In this “inter-sentence attention” part, the authors first calculate the interaction between each question word and each answer word with a non-linear transformation. $\gamma_i^q$ is a vector containing the attention of the i-th question word over the words in the answer. $\beta_i^q$ is the information entropy of this attention vector, and it reflects the mutual importance of the i-th word for the Q&A matching task.
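The inter-sentence attention can be sketched as follows. I assume the interaction score is $\mathbf{w}_2^\top \tanh(\mathbf{W}^q \mathbf{h}_i^q + \mathbf{W}^a \mathbf{h}_j^a + \mathbf{W}^{qa}(\mathbf{h}_i^q \odot \mathbf{h}_j^a))$, that $\gamma_i^q$ is a softmax of these scores over the answer words, and that $\beta_i^q$ is the Shannon entropy of $\gamma_i^q$. The way the three terms are combined is my guess from the symbols listed above, and the weights are toy values.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def interaction_attention(H_q, H_a, W_q, W_a, W_qa, w2):
    """Return (gammas, betas): attention over answer words and its entropy."""
    gammas, betas = [], []
    for h_q in H_q:
        scores = []
        for h_a in H_a:
            product = [x * y for x, y in zip(h_q, h_a)]  # element-wise product
            pre = [a + b + c for a, b, c in zip(
                matvec(W_q, h_q), matvec(W_a, h_a), matvec(W_qa, product))]
            scores.append(sum(w * math.tanh(p) for w, p in zip(w2, pre)))
        gamma = softmax(scores)
        beta = -sum(g * math.log(g) for g in gamma if g > 0.0)  # entropy
        gammas.append(gamma)
        betas.append(beta)
    return gammas, betas

# Toy example with identity matrices as hypothetical weights.
eye = [[1.0, 0.0], [0.0, 1.0]]
H_q = [[1.0, 0.0], [0.0, 0.0]]
H_a = [[1.0, 0.0], [0.0, 1.0]]
gammas, betas = interaction_attention(H_q, H_a, eye, eye, eye, [1.0, 1.0])
print(betas)  # the second question word attends uniformly: entropy = log 2
```

A low entropy means the question word attends sharply to a few answer words. A high entropy means its attention is spread out, so it aligns with nothing in particular.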

Finally, the representation of a question can be summarized as:

This is essentially a combination of $\alpha$ (the individual attention) and $\beta$ (the inter-sentence attention). The authors explain the expression as follows: if a word is locally important but does not align well with the words in the counterpart sentence, it needs to be endowed with less importance as it is useless for semantic matching. On the other hand, if a word is highly related to the counterpart sentence but is not a keyword, it should be neglected as it can mislead sentence matching.
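To show the intuition only, here is a toy combination rule. Please note that this is NOT the authors' formula: the rule below (scale each $\alpha_i$ by one minus the normalized entropy, then renormalize) is purely a hypothetical choice of mine, picked because it reproduces the behavior described above. The names `hybrid_weights` and `weighted_sum` are also made up.

```python
import math

def hybrid_weights(alpha, beta, answer_len):
    """Hypothetical rule: down-weight words whose attention is spread out."""
    max_entropy = math.log(answer_len) if answer_len > 1 else 1.0
    # focus is 1 for a sharply aligned word and 0 for uniform attention
    focus = [1.0 - b / max_entropy for b in beta]
    raw = [a * f for a, f in zip(alpha, focus)]
    total = sum(raw)
    if total == 0.0:
        return list(alpha)  # fall back to the individual attention
    return [r / total for r in raw]

def weighted_sum(H, weights):
    """Sentence representation as a weighted sum of hidden vectors."""
    dim = len(H[0])
    return [sum(w * h[d] for w, h in zip(weights, H)) for d in range(dim)]

# Word 0 is locally important (alpha = 0.6) but attends uniformly over the
# two answer words; word 1 is less important locally but aligns sharply.
alpha = [0.6, 0.4]
beta = [math.log(2), 0.0]
weights = hybrid_weights(alpha, beta, answer_len=2)
h_tilde_q = weighted_sum([[1.0, 0.0], [0.0, 1.0]], weights)
print(weights, h_tilde_q)
```

In this toy case the locally important but poorly aligned word ends up with no weight at all, which is the “useless for semantic matching” situation from the authors' explanation.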

We can compute $\tilde{h}^a$ in a similar way.

We then feed $\tilde{h}^q$ and $\tilde{h}^a$ into a hidden layer with tanh activation:

Finally, we get the output from a dense layer with softmax activation:
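The last two steps (a tanh hidden layer followed by a softmax output layer) can be sketched like this. I assume that the two sentence vectors are simply concatenated before the hidden layer and that the output has two classes (match / no match). Both are assumptions for illustration, and all the weights are toy values.

```python
import math

def dense(W, b, x):
    """Affine layer: W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(h_q, h_a, W_h, b_h, W_o, b_o):
    """tanh hidden layer on [h_q; h_a], then a softmax output layer."""
    x = h_q + h_a  # list concatenation (assumed way of joining the vectors)
    hidden = [math.tanh(z) for z in dense(W_h, b_h, x)]
    return softmax(dense(W_o, b_o, hidden))

# Toy sizes: 2-d sentence vectors, 2 hidden units, 2 output classes.
W_h = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
b_h = [0.0, 0.0]
W_o = [[1.0, 0.0], [0.0, 1.0]]
b_o = [0.0, 0.0]
probs = predict([1.0, 0.0], [0.0, 1.0], W_h, b_h, W_o, b_o)
print(probs)  # class probabilities that sum to 1
```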