Understand Tree LSTM Model: A Complete Guide – LSTM Tutorial

July 29, 2020

The Tree LSTM model is widely used in many deep learning fields. It is often used to process tree-structured data.

For example:

Action recognition data can be organized as a tree structure.

Figure: the structure of action recognition

Social-media conversations are also tree-structured data: one post has some replies, and a reply may also have its own replies.

Figure: the tree structure of social-media conversations

Non-tree data can also be converted to a tree structure; for example, we can use dependency parsing to convert a sentence to a tree, as in the sketch below.

Figure: dependency tree of a sentence
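
As a quick illustration, a dependency parser such as spaCy can produce such a tree: every token points to its syntactic head, and the heads and their children define the tree. This is only a sketch and assumes the en_core_web_sm model is installed.

```python
import spacy

# Assumes the small English pipeline "en_core_web_sm" has been downloaded.
nlp = spacy.load("en_core_web_sm")

doc = nlp("The quick brown fox jumps over the lazy dog")

# Each token points to its syntactic head, which defines a tree over the sentence.
for token in doc:
    print(f"{token.text:6s} --{token.dep_}--> {token.head.text}")

# The root token and its children in the dependency tree.
root = [t for t in doc if t.dep_ == "ROOT"][0]
print("root:", root.text, "children:", [c.text for c in root.children])
```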

We have introduced the basic LSTM; the Tree LSTM is very similar to it.

Understand Long Short-Term Memory Network (LSTM) – LSTM Tutorial

The structure of a basic LSTM looks like this:

Figure: LSTM input gate, forget gate and output gate
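
For reference, one common formulation of the basic LSTM update (which the figure above illustrates) is:

```latex
\begin{aligned}
i_t &= \sigma\!\left(W^{(i)} x_t + U^{(i)} h_{t-1} + b^{(i)}\right) \\
f_t &= \sigma\!\left(W^{(f)} x_t + U^{(f)} h_{t-1} + b^{(f)}\right) \\
o_t &= \sigma\!\left(W^{(o)} x_t + U^{(o)} h_{t-1} + b^{(o)}\right) \\
u_t &= \tanh\!\left(W^{(u)} x_t + U^{(u)} h_{t-1} + b^{(u)}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot u_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```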

We can see that Ct and ht are determined by a single Ct-1 and ht-1. In a Tree LSTM, however, the cell state and hidden state of a node are determined by the cell states and hidden states of all of its children, which is the key difference between the classic LSTM and the Tree LSTM.

What is Tree LSTM?

A Tree LSTM Model can be defined as:

Figure: the hidden output of tree lstm

where C(j) denotes the set of children of node j and O is an operator that acts on the hidden vectors hk of the children k to produce the aggregated hidden output h̃j of node j.
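
Written out (the original image is not reproduced here, so this is a reconstruction from the description above), the aggregated hidden output h̃j of node j is:

```latex
\tilde{h}_j = \mathop{O}_{k \in C(j)} \left( h_k \right)
```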

Then we can define the Tree LSTM equations as follows:

Figure: the equations of tree lstm
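
The equation image is not reproduced here, so the following is a sketch of the Tree LSTM updates in the style of the child-sum Tree-LSTM of Tai et al. (2015), which matches the description below; h̃j is the aggregated hidden output produced by the chosen operator O:

```latex
\begin{aligned}
\tilde{h}_j &= \mathop{O}_{k \in C(j)}(h_k) \\
i_j &= \sigma\!\left(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\right) \\
f_{jk} &= \sigma\!\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right), \quad k \in C(j) \\
o_j &= \sigma\!\left(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\right) \\
u_j &= \tanh\!\left(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\right) \\
c_j &= i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k \\
h_j &= o_j \odot \tanh(c_j)
\end{aligned}
```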

From the equations above, we can see:

  • The input gate and output gate are calculated based on xj and the aggregated hidden output h̃j.
  • Each forget gate fjk is calculated based on xj and the hidden state hk of child k, which is different from the input and output gates (see the sketch after this list).
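
To make this concrete, here is a minimal NumPy sketch of a single Tree LSTM node update (names like tree_lstm_node and the weight dictionaries are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tree_lstm_node(x_j, child_h, child_c, W, U, b):
    """Compute (h_j, c_j) for one node. A sketch with illustrative parameter names.

    x_j     : (input_dim,) input vector of node j
    child_h : list of (hidden_dim,) hidden states of the children
    child_c : list of (hidden_dim,) cell states of the children
    W, U, b : dicts with keys "i", "f", "o", "u" holding weight matrices and biases
    """
    hidden_dim = U["i"].shape[1]
    # Aggregate child hidden states; the child-sum operator is used here.
    h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros(hidden_dim)

    # Input gate, output gate and candidate depend on x_j and h_tilde.
    i = sigmoid(W["i"] @ x_j + U["i"] @ h_tilde + b["i"])
    o = sigmoid(W["o"] @ x_j + U["o"] @ h_tilde + b["o"])
    u = np.tanh(W["u"] @ x_j + U["u"] @ h_tilde + b["u"])

    # One forget gate per child, each depending on that child's hidden state h_k.
    f = [sigmoid(W["f"] @ x_j + U["f"] @ h_k + b["f"]) for h_k in child_h]

    # Cell and hidden state of node j (for a leaf, the sum over children is empty).
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c
```

A full model would apply this function bottom-up over the tree, from the leaves to the root.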

How to get the aggregated hidden output h̃j?

We should compute h̃j from the hidden states of the child nodes. There are several ways to do this.

Child Sum Tree Unit

The child-sum unit uses the sum of all hk vectors, which means O = ∑. Therefore

Figure: Child Sum Tree Unit

where k ranges over the child nodes of the parent node.

Limitation: this method weights all child nodes equally.
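
A tiny NumPy sketch of the child-sum operator (illustrative names):

```python
import numpy as np

def child_sum(child_h):
    """Child-sum operator: element-wise sum of all child hidden vectors."""
    return np.sum(np.stack(child_h), axis=0)

children = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
print(child_sum(children))  # [ 9. 12.]
```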

Child Max-Pooling Unit

The child max-pooling unit takes the element-wise maximum of all hk vectors along each dimension.

Figure: Child Max-Pooling Unit

Limitation: this method only keeps the maximum value in each dimension.
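
Similarly, a NumPy sketch of the child max-pooling operator:

```python
import numpy as np

def child_max_pool(child_h):
    """Child max-pooling operator: element-wise maximum across child hidden vectors."""
    return np.max(np.stack(child_h), axis=0)

children = [np.array([1.0, 6.0]), np.array([3.0, 4.0]), np.array([5.0, 2.0])]
print(child_max_pool(children))  # [5. 6.]
```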

Child Convolve + MaxPooling Tree Unit

Child convolve applies a convolution operation to the set of child hidden vectors, i.e. O = ⊗, where ⊗ denotes the vector convolution operation. Because a tree node can have any number of children, convolving all child vectors requires a max-pooling operation afterwards to preserve the dimension of h̃j.

Figure: Child Convolve + MaxPooling Tree Unit

where ⊗ denotes the vector convolution operation and maxP denotes the max-pooling operation.

Limitation: this method only keeps the convolved features with the maximum value.
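
As a rough sketch, one plausible implementation is to convolve neighbouring child hidden vectors (keeping the hidden dimension) and then max-pool element-wise across the results; the details here are an assumption for illustration, not the original implementation:

```python
import numpy as np

def child_conv_max_pool(child_h):
    """Convolve neighbouring child hidden vectors, then max-pool to keep the hidden dimension."""
    if len(child_h) == 1:
        return child_h[0]
    # "same" mode keeps each convolved vector at hidden_dim length.
    convolved = [np.convolve(child_h[k], child_h[k + 1], mode="same")
                 for k in range(len(child_h) - 1)]
    # Element-wise max across all convolved vectors.
    return np.max(np.stack(convolved), axis=0)

children = [np.array([1.0, 0.0, 2.0]), np.array([0.5, 1.0, 0.0]), np.array([2.0, 1.0, 1.0])]
print(child_conv_max_pool(children))
```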
