The Tree LSTM model is widely used in deep learning. It is designed to process tree-structured data.
For example:
- Action recognition data can be organized as a tree structure.
- Social-media conversations are tree-structured: a post contains some replies, and each reply can itself contain replies.
- Non-tree-structured data can also be converted to a tree structure; for example, dependency parsing can convert a sentence into a tree.
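As an illustration of the last point, a dependency parser turns a flat sentence into a tree in which each word points to its head word. Here is a minimal sketch using spaCy, assuming spaCy and its small English model en_core_web_sm are installed; any dependency parser would do:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

# Each token points to its head, so the parse forms a tree rooted at the verb
for token in doc:
    print(f"{token.text:<6} --{token.dep_}--> {token.head.text}")
```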
We have already introduced the basic LSTM; the Tree LSTM is quite similar to it.
Understand Long Short-Term Memory Network(LSTM) – LSTM Tutorial
In a basic LSTM, C_t and h_t are determined by only one previous pair, C_{t-1} and h_{t-1}. In a Tree LSTM, however, each node's C and h are determined by multiple child states C_{k1}, C_{k2}, … and h_{k1}, h_{k2}, …. This is the key difference between the classic LSTM and the Tree LSTM.
What is Tree LSTM?
A Tree LSTM model can be defined as:

\tilde{h}_j = O_{k \in C(j)}(h_k)

where C(j) denotes the set of children of node j and O is an operator that acts on the hidden vector h_k of each child k to output h̃_j.

Then we can define the Tree LSTM equations as follows:

i_j = \sigma(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)})
o_j = \sigma(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)})
u_j = \tanh(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)})
f_{jk} = \sigma(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}), \quad k \in C(j)
c_j = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k
h_j = o_j \odot \tanh(c_j)
From the equations above, we can see:
- The input gate i_j, the output gate o_j, and the candidate u_j are calculated based on x_j and h̃_j.
- The forget gate f_{jk} is calculated based on x_j and each individual h_k, which is different from the input and output gates.
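To make these equations concrete, here is a minimal NumPy sketch of a single node update, using the child-sum operator described below as one concrete choice of O. All parameter names and shapes (W_i, U_i, b_i, …) are illustrative assumptions, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tree_lstm_node(x, child_h, child_c, p):
    """Compute (c_j, h_j) for node j from its input x_j and its children.

    x: (input_dim,) input vector of node j
    child_h: list of (hidden_dim,) hidden vectors h_k, one per child
    child_c: list of (hidden_dim,) cell vectors c_k, one per child
    p: dict of weights; W_* are (hidden_dim, input_dim),
       U_* are (hidden_dim, hidden_dim), b_* are (hidden_dim,)
    Assumes at least one child; a leaf would pass a single zero vector.
    """
    # Child-sum operator: h_tilde is the sum of the children's hidden vectors
    h_tilde = np.sum(child_h, axis=0)

    # Input gate, output gate, and candidate all use x_j and h_tilde
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_tilde + p["b_i"])
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_tilde + p["b_o"])
    u = np.tanh(p["W_u"] @ x + p["U_u"] @ h_tilde + p["b_u"])

    # One forget gate per child, computed from x_j and that child's h_k
    c = i * u
    for h_k, c_k in zip(child_h, child_c):
        f_k = sigmoid(p["W_f"] @ x + p["U_f"] @ h_k + p["b_f"])
        c = c + f_k * c_k

    h = o * np.tanh(c)
    return c, h
```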
How to get h̃_j?
We should compute h̃_j from the child nodes; there are several ways to do this.
Child Sum Tree Unit
The child-sum unit uses the sum of all h_k vectors, which means O = \sum. Therefore:

\tilde{h}_j = \sum_{k \in C(j)} h_k

where k ranges over the child nodes of parent node j.
Limitation: this method weights all child nodes equally.
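As a quick illustration with made-up numbers, summing the child hidden vectors keeps the hidden dimension while giving every child the same weight:

```python
import numpy as np

# Hypothetical hidden vectors of 3 children (hidden_dim = 4)
child_h = np.array([[1.0, 0.2, 0.3, 0.4],
                    [0.5, 0.6, 0.7, 0.8],
                    [0.9, 1.0, 1.1, 1.2]])

h_tilde = child_h.sum(axis=0)  # every child contributes equally
print(h_tilde)  # [2.4 1.8 2.1 2.4]
```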
Child Max-Pooling Unit
The child max-pooling unit takes the element-wise maximum of all h_k vectors across each dimension:

\tilde{h}_j = \max_{k \in C(j)} h_k

Limitation: this method only keeps the maximum value in each dimension and discards the rest.
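With the same made-up child vectors, the element-wise max looks like this:

```python
import numpy as np

# Same hypothetical child hidden vectors as above
child_h = np.array([[1.0, 0.2, 0.3, 0.4],
                    [0.5, 0.6, 0.7, 0.8],
                    [0.9, 1.0, 1.1, 1.2]])

h_tilde = child_h.max(axis=0)  # element-wise max across children
print(h_tilde)  # [1.  1.  1.1 1.2]
```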
Child Convolve + MaxPooling Tree Unit
The child-convolve unit applies a convolution operation to the set of child hidden vectors, i.e. O = \otimes, where \otimes denotes the vector convolution operation. Since a tree node can have any number of children, convolving all child vectors requires a max-pooling operation to preserve the dimension of h̃_j:

\tilde{h}_j = \mathrm{maxP}\left(\otimes_{k \in C(j)} h_k\right)

where \otimes denotes the vector convolution operation and maxP denotes the max-pooling operation.
Limitation: this method only keeps the convolved features with the maximum values.
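The exact convolution and pooling scheme can be realized in different ways; the sketch below is one assumed realization, which convolves adjacent pairs of child vectors with a "same"-length convolution and then max-pools element-wise across the results:

```python
import numpy as np

# Same hypothetical child hidden vectors as above (needs >= 2 children)
child_h = np.array([[1.0, 0.2, 0.3, 0.4],
                    [0.5, 0.6, 0.7, 0.8],
                    [0.9, 1.0, 1.1, 1.2]])

# Convolve each adjacent pair of child vectors; mode="same" keeps hidden_dim
convolved = np.array([np.convolve(child_h[k], child_h[k + 1], mode="same")
                      for k in range(len(child_h) - 1)])

h_tilde = convolved.max(axis=0)  # max pooling preserves the dimension
```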