Building a batch normalization class in MQL5
After considering the theoretical aspects of the normalization method, we will move on to its practical implementation within our library. To do this, we will create a new CNeuronBatchNorm class derived from the CNeuronBase base class of the fully connected neural layer.
To ensure the full functionality of the class, we only need to add a buffer that records the normalization parameters for each element of the sequence and a variable that stores the normalization batch size. Everything else reuses the base class buffers with minor amendments, which we will discuss while implementing the methods.
class CNeuronBatchNorm : public CNeuronBase
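  {
protected:
   //--- new members; apart from m_iBatchSize, the buffer type and member names
   //--- below are assumptions sketched from the description above
   CBufferType       m_cBatchOptions;    // normalization parameters of each sequence element
   uint              m_iBatchSize;       // normalization batch size

public:
                     CNeuronBatchNorm(void);
                    ~CNeuronBatchNorm(void) {};
   //--- overridden methods (the signatures are sketched here and must mirror CNeuronBase)
   virtual bool      Init(const CLayerDescription *description) override;
   virtual bool      FeedForward(CNeuronBase *prevLayer) override;
   virtual bool      CalcHiddenGradient(CNeuronBase *prevLayer) override;
   virtual bool      CalcDeltaWeights(CNeuronBase *prevLayer) override;
   virtual bool      Save(const int file_handle) override;
   virtual bool      Load(const int file_handle) override;
  };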
We'll be redefining the same set of basic methods:
- Init method for initializing a class instance
- FeedForward method for the feed-forward pass
- CalcHiddenGradient method for distributing error gradients through the hidden layer
- CalcDeltaWeights method for distributing error gradients to the weight matrix
- Save method for saving neural layer parameters
- Load method for restoring the neural layer from previously saved data
Let's start working on the class with its constructor. In this method, we only set an initial value for the normalization batch size. The class destructor remains empty.
CNeuronBatchNorm::CNeuronBatchNorm(void) : m_iBatchSize(1)
  {
  }
After that, we move on to the class initialization method. Before implementing it, let's note a few nuances of our implementation.
First of all, the normalization method does not involve changing the number of elements. The output of the neural layer will have the same number of neurons as the input. Therefore, the size of the source data window should be equal to the number of neurons in the layer being created. Of course, we can ignore the source data window size parameter and only use the number of neurons in the layer. However, in this case, we would lose additional control during the neural layer initialization stage and would have to constantly check whether the number of neurons matches during each feed-forward and backpropagation pass.
The second point is related to the absence of a weight matrix in its usual form. Let's look at the mathematical formulas again.
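In generally accepted notation, for a normalization batch of size m, batch normalization computes

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2,$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta,$$

where ε is a small constant added for numerical stability.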
To calculate the normalized value, we use only the mean and the standard deviation, which are computed from the dataset and contain no trainable parameters. The only two trainable parameters appear at the scaling and shifting step: γ and β. Both are learned individually for each element of the source data tensor.
Now let's recall the mathematical formula of a neuron with a bias (offset).
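For a neuron with N inputs, weights w_i, and bias b, the output before the activation function is

$$y = \sum_{i=1}^{N} w_i x_i + b.$$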
Don't you think that with N = 1 the formulas look identical? Indeed, with a single input the neuron computes y = w·x + b, which has exactly the same form as y = γ·x̂ + β: the weight takes on the role of the scaling coefficient, and the bias takes on the role of the offset. We will use this similarity.
Now let's get back to the initialization method of our object instance. This method is virtual and is inherited from the parent class. Following the rules of inheritance, it keeps the return type and the parameter list of the parent method. The method takes a single parameter: a pointer to the object describing the neural layer being created.
In the body of the method, we immediately check the received pointer to the neural layer description object and, at the same time, verify that the size of the source data window matches the number of neurons in the layer being created; we discussed this point a little earlier. After the checks succeed, we set the source data window size to one, in accordance with the similarity shown above, and then call the parent class initialization method, not forgetting to check the result of the operation.
bool CNeuronBatchNorm::Init(const CLayerDescription *description)
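  {
//--- control block (a sketch): check the pointer and make sure the source data
//--- window size matches the number of neurons in the layer
   if(!description || description.count != description.window)
      return false;
//--- following the similarity noted above, each neuron receives exactly one input,
//--- so we pass the parent class a description with the window size set to 1
//--- (the temporary copy and the exact set of copied fields are assumptions)
   CLayerDescription *temp = new CLayerDescription();
   if(!temp)
      return false;
   temp.type   = description.type;
   temp.count  = description.count;
   temp.batch  = description.batch;
   temp.window = 1;
   //--- ... copy any further description fields required by the base class
   if(!CNeuronBase::Init(temp))
     {
      delete temp;
      return false;
     }
   delete temp;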
It should be noted here that during the initialization of the parent class, the weight matrix is filled with random values. For batch normalization, however, the recommended initial values are 1 for the scaling coefficient γ and 0 for the offset β. We could leave the random values as an experiment, but here we fill the weight matrix buffer with the recommended values right away.
//--- initialize the training parameter buffer
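//--- a sketch: assuming the parent class stores one gamma (the input weight) and
//--- one beta (the bias) per neuron in the m_cWeights buffer, and that the buffer
//--- exposes an Update(row, col, value) helper (both names are assumptions)
   for(int i = 0; i < description.count; i++)
     {
      if(!m_cWeights.Update(i, 0, 1))   // scaling coefficient gamma = 1
         return false;
      if(!m_cWeights.Update(i, 1, 0))   // offset beta = 0
         return false;
     }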
After successfully initializing the parent class objects, we move on to setting the initial values for the objects and variables of our new class.
First, we initialize the normalization parameter buffer. In this buffer, we need three elements for each element in the sequence. There we will save:
0. μ — the mean value from previous feed-forward iterations;
1. σ² — the dataset variance from previous feed-forward iterations;
2. the normalized value before scaling and shifting.
I deliberately numbered the values starting from 0. This is exactly the indexing that values in our data buffer will get. At the initial stage, we initialize the entire buffer with zero values and check the results of the operations.
//--- initialize the normalization parameter buffer
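//--- a sketch: three values (mean, variance, normalized value) for each element of
//--- the sequence, all set to zero; the m_cBatchOptions buffer and its
//--- BufferInit(rows, cols, value) method are assumptions
   if(!m_cBatchOptions.BufferInit(description.count, 3, 0))
      return false;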
At the end of the class initialization method, we save the normalization batch size into the variable created for it and exit the method with a positive result.
   m_iBatchSize = description.batch;
//---
   return true;
  }
At this point, we conclude our work on the auxiliary and initialization methods and move on to building the class algorithms. As always, we will begin this work by constructing the feed-forward method.