Implementing recurrent models in Python
In the previous sections, we reviewed the principles of organizing a recurrent model architecture and even built a recurrent neural layer using the LSTM block algorithm. Earlier, we used the Keras library for TensorFlow to build our neural network models in Python. The same library offers a number of options for building recurrent neural layers, including classes of basic recurrent layers as well as more complex models.
- AbstractRNNCell: abstract base class representing an RNN cell
- Bidirectional: bidirectional wrapper for RNNs
- ConvLSTM1D: 1D convolutional LSTM layer
- ConvLSTM2D: 2D convolutional LSTM layer
- ConvLSTM3D: 3D convolutional LSTM layer
- GRU: Gated Recurrent Unit layer by Cho et al. (2014)
- LSTM: Long Short-Term Memory layer by Hochreiter and Schmidhuber (1997)
- RNN: base class for recurrent layers
- SimpleRNN: fully connected RNN in which the output is fed back to the input
In the presented list, in addition to the basic recurrent layer class, you can find the already familiar LSTM and GRU models. It is also possible to create bidirectional recurrent layers, which are most often used in text translation tasks. The ConvLSTM model is built on the architecture of the LSTM block but uses convolutional layers instead of fully connected ones for the gates and the new content layer.
Additionally, there is an abstract recurrent cell class for creating custom architectural solutions for recurrent models.
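As a quick sketch of how these classes are used (assuming TensorFlow 2.x, where the listed layers live under tf.keras.layers; all layer sizes here are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Dummy input: a batch of 8 sequences, 10 time steps, 4 features each
x = np.random.rand(8, 10, 4).astype(np.float32)

# Basic recurrent layers from the list above
lstm = tf.keras.layers.LSTM(16)                       # returns only the last output
gru = tf.keras.layers.GRU(16, return_sequences=True)  # returns the whole sequence
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))

print(lstm(x).shape)  # (8, 16)
print(gru(x).shape)   # (8, 10, 16)
print(bi(x).shape)    # (8, 32): forward and backward outputs concatenated
```

Note how the Bidirectional wrapper doubles the output dimensionality because, by default, it concatenates the outputs of the forward and backward passes.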
We won't go deep into the Keras library API right now. We will use the LSTM block to create our test recurrent models. This is exactly the kind of model we recreated in MQL5, so we will be able to compare the performance of our models created in different programming languages.
The LSTM layer class automatically chooses between a cuDNN kernel and a pure TensorFlow implementation: when a GPU is available and the layer arguments meet the cuDNN requirements (default activations, no recurrent dropout, among others), the faster cuDNN kernel is used.
Users have access to an extensive range of parameters for fine-tuning the recurrent block:
- units: dimensionality of the output space
- activation: activation function
- recurrent_activation: activation function for the recurrent step (gates)
- use_bias: flag indicating whether to use a bias vector
- kernel_initializer: initialization method for the weight matrix of the new content layer
- recurrent_initializer: initialization method for the weight matrix of the gates
- bias_initializer: initialization method for the bias vector
- kernel_regularizer: regularization function for the weight matrix of the new content layer
- recurrent_regularizer: regularization function for the weight matrix of the gates
- bias_regularizer: regularization function for the bias vector
- activity_regularizer: regularization function for the layer output
- kernel_constraint: constraint function for the weight matrix of the new content layer
- recurrent_constraint: constraint function for the weight matrix of the gates
- bias_constraint: constraint function for the bias vector
- dropout: floating-point number from 0 to 1 defining the fraction of elements to drop during the linear transformation of the input data
- recurrent_dropout: floating-point number from 0 to 1 defining the fraction of elements to drop during the linear transformation of the recurrent state
- return_sequences: boolean flag specifying whether to return the full output sequence or only the last output
- return_state: boolean flag indicating whether to return the last state in addition to the output
- go_backwards: boolean flag instructing the layer to process the input sequence in backward order and return the reversed sequence
- stateful: boolean flag indicating that the last state of each sample with index i in a batch is used as the initial state for the sample with index i in the next batch
- time_major: format of the input and output tensor shapes; if True, the tensors are shaped [timesteps, batch, feature] instead of [batch, timesteps, feature]
- unroll: boolean flag indicating whether to unroll the recurrent network or use a symbolic loop; unrolling can accelerate training but requires more memory
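To illustrate how a few of these parameters interact, here is a minimal sketch of a stacked LSTM model (the layer sizes and sequence shape are arbitrary assumptions for the example):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 5)),  # 20 time steps, 5 features per step
    # return_sequences=True makes this layer emit one output vector per
    # time step, which the next LSTM layer requires as its input
    tf.keras.layers.LSTM(32, return_sequences=True, dropout=0.1),
    # The second LSTM keeps the default return_sequences=False,
    # so it returns only the output of the last time step
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
print(model.output_shape)  # (None, 1)
```

Note that stacking recurrent layers only works when every layer except the last sets return_sequences=True; otherwise, the downstream layer would receive a 2D tensor instead of a sequence.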
Having familiarized ourselves with the control parameters of the LSTM layer class, we can proceed to the practical implementation of various models using this recurrent layer.