We have built our own custom LSTM network in TensorFlow. We initialize all of its weights and biases like this:

```python
# initialize matrix
def init_matrix(self, shape):
    return tf.random_normal(shape, stddev=0.1)

# initialize vector
def init_vector(self, shape):
    return tf.zeros(shape)
```

The custom LSTM itself is described in detail in this tutorial:

Build Your Own LSTM Model Using TensorFlow: Steps to Create a Customized LSTM – TensorFlow Tutorial

To make our custom LSTM perform the same as tf.nn.rnn_cell.LSTMCell(), we should initialize its weights and biases the same way tf.nn.rnn_cell.LSTMCell() does.

**LSTM biases in TensorFlow**

By checking the source code of the RNN cells (for example LSTMCell) in TensorFlow, we can see how LSTM biases are initialized.

The source code is here:

https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/rnn_cell_impl.py

For example, the bias is created like this:

```python
self._bias = self.add_variable(
    _BIAS_VARIABLE_NAME,
    shape=[4 * self._num_units],
    initializer=init_ops.zeros_initializer(dtype=self.dtype))
```

So LSTM biases are initialized to zero in TensorFlow.
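Zero initialization is not the whole story for the forget gate. In TF 1.x, tf.nn.rnn_cell.LSTMCell (and BasicLSTMCell) also take forget_bias=1.0 by default, and that constant is added to the forget-gate pre-activation at every step rather than stored in the bias variable. The NumPy sketch below (an illustration of the math, not TensorFlow code; num_units=3 is an arbitrary choice) shows the bias layout and the initial forget-gate activation with and without that constant, ignoring the input and hidden-state contribution.

```python
import numpy as np

num_units = 3  # arbitrary size for illustration

# LSTMCell creates one bias vector for all four gates, initialized to zeros
bias = np.zeros(4 * num_units, dtype=np.float32)

# Gate order in tf.nn.rnn_cell.LSTMCell: input (i), new input (j), forget (f), output (o)
b_i, b_j, b_f, b_o = np.split(bias, 4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

forget_bias = 1.0  # LSTMCell default; added in call(), not part of the bias variable

# Forget-gate activation from the bias alone, with and without forget_bias
f_with = sigmoid(b_f + forget_bias)
f_without = sigmoid(b_f)
print(f_with)     # about 0.731 for every unit
print(f_without)  # 0.5 for every unit
```

So a custom LSTM that keeps zero biases but omits forget_bias starts with its forget gates half closed, while LSTMCell starts them at about 0.73.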

**LSTM Weights in TensorFlow**

We can also see how the LSTM weights are initialized:

```python
self._kernel = self.add_variable(
    _WEIGHTS_VARIABLE_NAME,
    shape=[input_depth + h_depth, 4 * self._num_units])
```

The LSTM weights (the kernel) are created by the self.add_variable() function, and no initializer argument is passed here.
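The kernel shape [input_depth + h_depth, 4 * self._num_units] means the cell stores a single matrix that multiplies the concatenation of the input and the previous hidden state. A custom LSTM with separate input weights W_x and recurrent weights W_h computes the same thing if those matrices are the top and bottom row blocks of that kernel. The NumPy sketch below checks this, assuming no projection layer (so h_depth equals num_units) and arbitrary sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, input_depth, num_units = 2, 5, 3
h_depth = num_units  # assumption: no projection layer

# One combined kernel, shaped like the one LSTMCell creates
kernel = rng.standard_normal((input_depth + h_depth, 4 * num_units))

x = rng.standard_normal((batch, input_depth))
h_prev = rng.standard_normal((batch, h_depth))

# TensorFlow style: concatenate [x, h] and multiply by the combined kernel
z_combined = np.concatenate([x, h_prev], axis=1) @ kernel

# Custom-LSTM style: separate input and recurrent weight matrices
W_x = kernel[:input_depth]   # shape [input_depth, 4 * num_units]
W_h = kernel[input_depth:]   # shape [h_depth, 4 * num_units]
z_split = x @ W_x + h_prev @ W_h

print(np.allclose(z_combined, z_split))  # True
```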

The source code of self.add_variable() function is here:

https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/layers/base.py

It is defined as:

```python
def add_variable(self, name, shape, dtype=None, initializer=None,
                 regularizer=None, trainable=True, constraint=None,
                 partitioner=None):
```

This function calls tf.get_variable() internally, which means TensorFlow uses tf.get_variable() to create (or reuse) the weight tensors of the LSTM network.

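Since initializer is None here, tf.get_variable() falls back to the default initializer of the enclosing variable scope and, if that is also None, to tf.glorot_uniform_initializer(). So by default, LSTM weights are initialized with tf.glorot_uniform_initializer().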

For details, see this tutorial:

Understand How tf.get_variable() Initialize a Tensor When Initializer is None: A Beginner Guide
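tf.glorot_uniform_initializer() draws values uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)); for a 2-D kernel, fan_in is the number of rows and fan_out the number of columns. The NumPy sketch below implements that rule for 2-D shapes only. It is an illustration of the formula, not TensorFlow's own code, which also handles convolution kernels.

```python
import math
import numpy as np

def glorot_limit(fan_in, fan_out):
    """Half-width of the Glorot (Xavier) uniform range."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(shape, rng):
    """Sample a 2-D matrix with the Glorot uniform rule."""
    fan_in, fan_out = shape
    limit = glorot_limit(fan_in, fan_out)
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(42)
w = glorot_uniform((10, 20), rng)
print(glorot_limit(10, 20))                     # sqrt(6 / 30), about 0.447
print(np.abs(w).max() <= glorot_limit(10, 20))  # True
```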

So we should change init_matrix() in our custom LSTM to:

```python
def init_matrix(self, shape):
    return tf.contrib.layers.xavier_initializer()(shape)
```
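With its default arguments, tf.contrib.layers.xavier_initializer() uses the same uniform rule as tf.glorot_uniform_initializer(). Because the range depends on the shape, though, calling init_matrix() with a different shape than LSTMCell uses changes the weight scale even with the same initializer. LSTMCell applies it to the combined [input_depth + num_units, 4 * num_units] kernel. The calculation below, with assumed sizes input_depth=100 and num_units=128, compares that with per-gate input matrices of shape [input_depth, num_units], a layout some custom LSTMs use.

```python
import math

def glorot_limit(fan_in, fan_out):
    """Half-width of the Glorot (Xavier) uniform range."""
    return math.sqrt(6.0 / (fan_in + fan_out))

input_depth, num_units = 100, 128  # assumed sizes

# LSTMCell: one kernel for all gates, input and recurrent weights stacked
combined = glorot_limit(input_depth + num_units, 4 * num_units)
# Custom LSTM with one input-weight matrix per gate
per_gate_input = glorot_limit(input_depth, num_units)

print(combined)        # about 0.090
print(per_gate_input)  # about 0.162
```

To reproduce LSTMCell's scale exactly, one option is to initialize a matrix of the combined shape and then slice it into the blocks the custom LSTM needs.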

Note: all code above is from TensorFlow 1.8.

In TensorFlow 1.10, tensorflow/python/layers/base.py adds weights as follows:

```python
variable = super(Layer, self).add_weight(
    name,
    shape,
    dtype=dtypes.as_dtype(dtype),
    initializer=initializer or scope.initializer,
    trainable=trainable,
    constraint=constraint,
    partitioner=partitioner,
    use_resource=use_resource,
    synchronization=synchronization,
    aggregation=aggregation,
    getter=vs.get_variable)
```

The parent class Layer comes from:

```python
from tensorflow.python.keras.engine import base_layer
```

Checking tensorflow/python/keras/engine/base_layer.py, we find the code below:

```python
# Initialize variable when no initializer provided
if initializer is None:
  # If dtype is DT_FLOAT, provide a uniform unit scaling initializer
  if dtype.is_floating:
    initializer = initializers.glorot_uniform()
  # If dtype is DT_INT/DT_UINT, provide a default value `zero`
  # If dtype is DT_BOOL, provide a default value `FALSE`
  elif dtype.is_integer or dtype.is_unsigned or dtype.is_bool:
    initializer = initializers.zeros()
  # NOTES:Do we need to support for handling DT_STRING and DT_COMPLEX here?
```

So if initializer is None, TensorFlow uses initializers.glorot_uniform() to initialize a floating-point weight.
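Putting the TF 1.10 code together, the weight initializer is chosen in this order: an explicit initializer argument, otherwise the enclosing variable scope's default (scope.initializer), otherwise the dtype-based default in add_weight(). The plain-Python sketch below models only that decision logic; resolve_initializer and the dtype and initializer strings are hypothetical stand-ins, not TensorFlow APIs.

```python
def resolve_initializer(initializer, scope_initializer, dtype):
    """Hypothetical model of how a TF 1.10 layer picks a weight initializer.

    dtype is a plain string such as 'float', 'int', 'uint', 'bool' or 'string'.
    """
    # layers/base.py: initializer=initializer or scope.initializer
    chosen = initializer or scope_initializer
    if chosen is not None:
        return chosen
    # keras/engine/base_layer.py: dtype-based fallback
    if dtype == "float":
        return "glorot_uniform"
    if dtype in ("int", "uint", "bool"):
        return "zeros"
    # The snippet above does not cover other dtypes; this sketch just raises
    raise ValueError(f"no default initializer for dtype {dtype!r}")

print(resolve_initializer(None, None, "float"))             # glorot_uniform
print(resolve_initializer("orthogonal", None, "float"))     # orthogonal
print(resolve_initializer(None, "scope_default", "float"))  # scope_default
```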

However, the Keras LSTM layer uses different defaults. Its constructor is:

```python
__init__(
    units, activation='tanh', recurrent_activation='hard_sigmoid',
    use_bias=True, kernel_initializer='glorot_uniform',
    recurrent_initializer='orthogonal', bias_initializer='zeros',
    unit_forget_bias=True, kernel_regularizer=None,
    recurrent_regularizer=None, bias_regularizer=None,
    activity_regularizer=None, kernel_constraint=None,
    recurrent_constraint=None, bias_constraint=None,
    dropout=0.0, recurrent_dropout=0.0, implementation=1,
    return_sequences=False, return_state=False, go_backwards=False,
    stateful=False, unroll=False, **kwargs
)
```

The source code is here:

https://www.github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/keras/_impl/keras/layers/recurrent.py
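Note unit_forget_bias=True in this signature. Instead of adding a constant at runtime like forget_bias in tf.nn.rnn_cell.LSTMCell, Keras writes 1.0 into the forget-gate slice of the bias when the bias is initialized, and Keras orders the gates as input, forget, cell candidate, output. A NumPy sketch of the resulting initial bias, with an arbitrary units=3:

```python
import numpy as np

units = 3  # arbitrary size for illustration

# bias_initializer='zeros' combined with unit_forget_bias=True
# Layout: [input gate | forget gate | cell candidate | output gate]
bias = np.concatenate([
    np.zeros(units),      # input gate
    np.ones(units),       # forget gate starts at 1.0
    np.zeros(2 * units),  # cell candidate and output gate
])
print(bias)  # [0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
```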

**In the Keras LSTM**

The defaults are kernel_initializer='glorot_uniform' and recurrent_initializer='orthogonal', where:

kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs

recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state

This means that, to match the Keras LSTM in TensorFlow, the input weights W_{xt} should be initialized with tf.contrib.layers.xavier_initializer() and the recurrent weights W_{ht} with tf.orthogonal_initializer().
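tf.orthogonal_initializer() (like the Keras 'orthogonal' initializer) builds a matrix with orthonormal rows or columns from the QR decomposition of a Gaussian random matrix. The NumPy sketch below follows that recipe and, with assumed sizes input_depth=5 and num_units=4, builds Keras-shaped weights: a Glorot-uniform input kernel of shape (input_depth, 4 * num_units) and an orthogonal recurrent kernel of shape (num_units, 4 * num_units). It mirrors the idea, not TensorFlow's exact code, so the sampled values will differ.

```python
import math
import numpy as np

def orthogonal(shape, rng, gain=1.0):
    """Orthogonal init for a 2-D shape via QR of a Gaussian matrix."""
    rows, cols = shape
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)     # q has orthonormal columns
    q *= np.sign(np.diag(r))   # sign fix so the result is uniformly distributed
    if rows < cols:
        q = q.T                # wide matrix: orthonormal rows instead
    return gain * q

def glorot_uniform(shape, rng):
    fan_in, fan_out = shape
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
input_depth, num_units = 5, 4                            # assumed sizes
W_x = glorot_uniform((input_depth, 4 * num_units), rng)  # input kernel
W_h = orthogonal((num_units, 4 * num_units), rng)        # recurrent kernel

# The wide recurrent kernel has orthonormal rows
print(np.allclose(W_h @ W_h.T, np.eye(num_units)))  # True
```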