4. Saving and restoring the LSTM block

We have already looked at the methods that implement the feed-forward and backpropagation passes of an LSTM block. This is enough for small-scale experiments but not for industrial use. One of the key requirements of practical application is the reusability of a once-trained neural network. We have learned how to build and train our neural network, and we can even get the results of applying it to real data. However, we cannot yet save a trained LSTM block and restore it later from the previously saved data. Two methods are provided in our neural layer classes for this purpose:

  • Save saves the object's state to a file (a brief usage sketch follows this list).
  • Load restores the object's functionality from the previously saved data.
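
To show where these methods fit, here is a minimal usage sketch of the saving call, assuming a trained CNeuronLSTM object named lstm and an arbitrary file name. Only the standard MQL5 file functions (FileOpen, FileClose, Print) and the Save method declared in the class below are taken from the material; everything else is illustrative.

int handle = FileOpen("lstm_block.nn", FILE_WRITE | FILE_BIN);
if(handle != INVALID_HANDLE)
  {
   if(!lstm.Save(handle))     // lstm is a previously trained CNeuronLSTM object
      Print("Failed to save the LSTM block");
   FileClose(handle);
  }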

Before we start creating methods, let's look at the class structure of our LSTM block and determine which data we need to store and which we can simply initialize with initial values.

class CNeuronLSTM    :  public CNeuronBase
  {
protected:
   CNeuronBase*      m_cForgetGate;
   CNeuronBase*      m_cInputGate;
   CNeuronBase*      m_cNewContent;
   CNeuronBase*      m_cOutputGate;
   CArrayObj*        m_cMemorys;
   CArrayObj*        m_cHiddenStates;
   CArrayObj*        m_cInputs;
   CArrayObj*        m_cForgetGateOuts;
   CArrayObj*        m_cInputGateOuts;
   CArrayObj*        m_cNewContentOuts;
   CArrayObj*        m_cOutputGateOuts;
   CBufferType*      m_cInputGradient;
   int               m_iDepth;
 
   void              ClearBuffer(CArrayObj *buffer);
   bool              InsertBuffer(CArrayObj *&array, CBufferType *element,
                                                    bool create_new = true);
   CBufferType*      CreateBuffer(CArrayObj *&array);

public:
                     CNeuronLSTM(void);
                    ~CNeuronLSTM(void);
   //---
   virtual bool      Init(const CLayerDescription *desc) override;
   virtual bool      SetOpenCL(CMyOpenCL *opencl) override;
   virtual bool      FeedForward(CNeuronBase *prevLayer) override;
   virtual bool      CalcHiddenGradient(CNeuronBase *prevLayer) override;
   virtual bool      CalcDeltaWeights(CNeuronBase *prevLayer) override
                                                               { return true; }
   virtual bool      UpdateWeights(int batch_size, TYPE learningRate,
                                   VECTOR &Beta, VECTOR &Lambda) override;
   //---
   virtual int       GetDepth(void)                 const { return m_iDepth; }
   //--- methods for working with files
   virtual bool      Save(const int file_handle) override;
   virtual bool      Load(const int file_handle) override;
   //--- method of object identification
   virtual int       Type(void)  override     const { return(defNeuronLSTM); }
  };

First of all, we understand that no constants or methods change during the class operation. Therefore, we will only store variables.

The first variables we declared in the class were those storing pointers to the internal neural layers. Of course, it makes no sense to save pointers to class objects. However, we must save the contents of those objects, because they hold the trained weight matrices.

Next, we declared pointers to stacks of chronological data. Neither the stacks themselves nor their contents are of any value to us when saving the data. The stacks are dynamic array objects that can easily be recreated. As for their contents, the situation is as follows. For recurrent networks, the sequence of data and the absence of gaps are crucial. At the moment of saving, we do not know when the data will be reused. Consequently, by the time the data is loaded, there will most likely be a gap between the current state of the analyzed system and its state at the time of saving. In such a situation, using the stored states would not only be unhelpful but would actually distort the results. Therefore, saving this data would only increase the amount of stored information without providing any benefit for later use.

The error gradient accumulation buffer m_cInputGradient is an auxiliary object for accumulating data and is overwritten with new data during each backpropagation pass. It does not contain information important for subsequent iterations and is not appropriate for saving.

The last member variable we declared is the depth of the analyzed chronological iterations, m_iDepth. It is part of the block's architectural design and must be preserved.

After defining the scope of the work, we can proceed to its execution. First, we create the CNeuronLSTM::Save method to save the data. In the parameters, the method receives the handle of the file for saving the data. However, we will not set up the usual block of controls to check the incoming parameter. Instead, we will pass the received parameter to the analogous method of the base class, where all the necessary controls are already implemented. Besides, so far we have analyzed only the variables declared in the LSTM block class, without evaluating the need to preserve the contents of the parent class. That work, however, was already done when we created the data-saving method of the base class. Therefore, by calling the method of the parent class, we cover both tasks in one line of code.

bool CNeuronLSTM::Save(const int file_handle)
  {
//--- calling a method of the parent class
   if(!CNeuronBase::Save(file_handle))
      return false;

After the successful execution of the parent class method, we save the value of the depth of the analyzed chronological iterations.

//--- saving the constants
   if(FileWriteInteger(file_handle, m_iDepth) <= 0)
      return false;

After this, we only need to save the contents of the internal neural layers. For this purpose, we will also utilize the functionality of the underlying neural layer. We just need to call the save method for each of our internal layers, providing the file handle for writing data that we received as a parameter from the external program. At the same time, we will not forget to control the process of operations at each step.

//--- call the same method for all inner layers
   if(!m_cForgetGate.Save(file_handle))
      return false;
   if(!m_cInputGate.Save(file_handle))
      return false;
   if(!m_cOutputGate.Save(file_handle))
      return false;
   if(!m_cNewContent.Save(file_handle))
      return false;
//---
   return true;
  }

After successful completion of all operations, we will exit the method with a positive result.

Now I suggest looking at the entire code of the data-saving method once again and appreciating how concise and readable it is. This effect is achieved through the use of object-oriented programming (OOP). Creating classes significantly reduces the amount of code and speeds up the programmer's work, while using ready-made and tested libraries helps avoid many errors. Believe me, however complex building our library might seem, using it will let a programmer create their own neural networks easily and without significant effort. Moreover, you don't need to be a highly qualified programmer to do it.

But I digress. We have created a method to save the data. Now, we need to build the process of restoring the functionality of our recurrent block from the saved data.

The data loading method CNeuronLSTM::Load is constructed in strict correspondence with the data saving method: the saved data must be read from the file in the same sequence in which it was written; otherwise, we risk distorted data or loading errors.

In the parameters, the method gets the handle of the data file to load. Just like when saving data, instead of setting up a control block, we call the method of the parent class. It already implements all the necessary controls and data loading of the parent class.

bool CNeuronLSTM::Load(const int file_handle)
  {
//--- call a method of the parent class
   if(!CNeuronBase::Load(file_handle))
      return false;

Next, we load the depth of the analyzed chronological iterations and the contents of the internal neural layers from the file. We will also use the methods of the neural layer base class to perform the latter operations. And, as always, we will check the results of the operations.

But here, we need to pay attention to one significant detail. The method for saving the base neural layer CNeuronBase::Save begins with writing the type of object to be saved. We read its value in the neural network loading dispatcher method to determine the type of object to be created. Hence, in the neural layer loading method, we start reading the file from the next element. In this case, to maintain the sequence of loading data from the file, we must first read the type of the next neural layer and only then call the loading method of the corresponding internal neural layer. Besides, this can be an additional point of control for loading the correct type of internal neural layer.
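
For clarity, here is a simplified sketch of the write side of this arrangement. It is an illustration only, not the full library code: the only detail taken from the description above is that the parent class writes its type identifier before the rest of the data; the remaining body is an assumption.

bool CNeuronBase::Save(const int file_handle)
  {
   if(file_handle == INVALID_HANDLE)
      return false;
//--- the object type is written first; the loading dispatcher reads it back
   if(FileWriteInteger(file_handle, Type()) <= 0)
      return false;
//--- ... the rest of the layer data (weights, buffers, etc.) follows ...
   return true;
  }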

//--- read the constants
   m_iDepth = FileReadInteger(file_handle);
//--- call the same method for all inner layers
   if(FileReadInteger(file_handle) != defNeuronBase || 
      !m_cForgetGate.Load(file_handle))
      return false;
   if(FileReadInteger(file_handle) != defNeuronBase || 
      !m_cInputGate.Load(file_handle))
      return false;
   if(FileReadInteger(file_handle) != defNeuronBase || 
      !m_cOutputGate.Load(file_handle))
      return false;
   if(FileReadInteger(file_handle) != defNeuronBase ||
      !m_cNewContent.Load(file_handle))
      return false;

After loading the data from the file, we need to initialize the remaining objects with initial values. First, we initialize the memory stack and add a buffer with initial values to it. To do this, we use the CreateBuffer method we already know. Let me remind you that this method creates a zero-filled buffer only for an empty stack; otherwise, it returns the last buffer written to the stack. Therefore, before calling the method, we check the size of the stack and, if it contains data, clear it so that a new zero-filled buffer is created.

//--- initialize Memory
   if(m_cMemorys.Total() > 0)
      m_cMemorys.Clear();
   CBufferType *buffer =  CreateBuffer(m_cMemorys);
   if(!buffer)
      return false;
   if(!m_cMemorys.Add(buffer))
      return false;

After these operations are completed, we add the newly created buffer to the stack. Then we repeat the same operations for the hidden state stack and its buffer.

//--- initialize HiddenStates
   if(m_cHiddenStates.Total() > 0)
      m_cHiddenStates.Clear();
   buffer =  CreateBuffer(m_cHiddenStates);
   if(!buffer)
      return false;
   if(!m_cHiddenStates.Add(buffer))
      return false;

We built the forward pass method in such a way that it is not critical for us to create and initialize the other stacks now. However, we acknowledge that the data loading operation might be performed on a working neural network, where the stacks already hold some information. In such cases, using data from stacks created with different weights would be incorrect. Therefore, we will clear all previously created stacks.

//--- clear the rest of the stacks
   if(m_cInputs)
      m_cInputs.Clear();
   if(m_cForgetGateOuts)
      m_cForgetGateOuts.Clear();
   if(m_cInputGateOuts)
      m_cInputGateOuts.Clear();
   if(m_cNewContentOuts)
      m_cNewContentOuts.Clear();
   if(m_cOutputGateOuts)
      m_cOutputGateOuts.Clear();
//---
   return true;
  }

Once all operations of the method have been successfully executed, we terminate the method with a positive result.
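
To complete the picture, here is a hypothetical restore sketch that mirrors the saving example from the beginning of this section. The file name and the lstm object are assumptions; the type check reflects the loading order discussed above, where the object type written during saving is read before the Load method is called.

int handle = FileOpen("lstm_block.nn", FILE_READ | FILE_BIN);
if(handle != INVALID_HANDLE)
  {
//--- the object type was written first, so read and check it before loading
   if(FileReadInteger(handle) == defNeuronLSTM && lstm.Load(handle))
      Print("LSTM block restored successfully");
   else
      Print("Failed to load the LSTM block");
   FileClose(handle);
  }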

At this point, we have completed the construction of the recurrent LSTM block using standard MQL5 tools, and we can move on to supplementing the methods of our class with the ability to perform multi-threaded operations.