5. GPT feed-forward method

We continue our work on implementing the GPT algorithm proposed by the OpenAI team. We have already created the basic skeleton of the class and declared the objects required by the algorithm. Now we proceed directly to its implementation. The class uses the familiar Self-Attention algorithm, but with some implementation specifics.

As in all the previously discussed classes, all the feed-forward functionality is implemented in the CNeuronGPT::FeedForward method. As you know, this method is virtual, is inherited from the base neural network class, and is overridden in each class to implement a specific algorithm. In the method parameters, it receives a pointer to the object of the previous neural layer, which contains the initial data in its buffer for executing the algorithm.

As in all previous implementations, we start the method with the control block. In this block, we check the validity of pointers to the objects involved in the method. This operation allows us to prevent many critical errors when accessing invalid objects.

bool CNeuronGPT::FeedForward(CNeuronBase *prevLayer)
  {
//--- check the relevance of all objects
   if(!prevLayer || !prevLayer.GetOutputs())
      return false;

Next, we increment the m_iCurrentPosition index of the current element in the Key and Value buffers. We need this index to organize a stack in these buffers. The point is that the Self-Attention algorithm performs a weighted summation of different contexts into a single vector, and by the rules of mathematics the order of the summands does not change the sum. In other words, it is completely irrelevant at which position of the data buffer an element is located; what matters is only that it is present. This is a disadvantage of the algorithm when handling time series, but an advantage for our implementation: when organizing the data stack in the Key and Value buffers, we will not perform a costly full shift of the data. Instead, we will move the index along the stack and overwrite the data in the corresponding buffer elements. For example, with m_iUnits = 4, the positions are overwritten cyclically in the order 0, 1, 2, 3, 0, 1, and so on.

//--- increment the pointer to the current object in the data stack
   m_iCurrentPosition++;
   if(m_iCurrentPosition >= m_iUnits)
      m_iCurrentPosition = 0;

The next step organizes the correct functioning of our internal multi-layered architecture. The pointer to the previous neural layer received in the parameters is needed only by the first internal layer: subsequent internal layers use the output of the preceding internal layer as their input data. Therefore, for internal use, we introduce a local variable to store a pointer to the previous neural layer. For now, we assign it the pointer received in the method parameters, but after the iterations of each internal neural layer we will write a new pointer into it. This allows us to organize a loop over all internal neural layers while working with a single pointer variable in the loop body, even though each neural layer actually accesses its own buffer of input data.

   CNeuronBase *prevL = prevLayer;

As mentioned before, the main functionality of our feed-forward method is implemented within the body of the loop iterating through the internal neural layers. Therefore, the next step is to create such a loop. Right at the beginning of the loop body, we extract from the collection the pointer to the Querys object of the current internal neural layer, check the validity of the extracted pointer, and then call the feed-forward method of this object.

//--- run the loop through all internal layers
   for(int layer = 0; layer < m_iLayers; layer++)
     {
      CNeuronBase *Querys = m_cQuerys.At(layer);
      if(!Querys || !Querys.FeedForward(prevL))
         return false;

Further functionality is not covered by the methods of internal objects. Therefore, as in previous Self-Attention implementations, we will implement it within the body of the method. Here, it is important to remember that in all implementations of our library, we provided the user with the option to choose the device and the technology for performing mathematical operations. In this class, we will not deviate from our principles and will also implement algorithm separation based on the chosen computational device. But first, let's perform some preparatory work and extract pointers to the objects of the analyzed internal layer from the collections. Do not forget to validate the obtained pointers.

      CNeuronBase *Keys = m_cKeys.At(layer);
      if(!Keys)
         return false;
      CNeuronBase *Values = m_cValues.At(layer);
      if(!Values)
         return false;
      //--- initializing Scores
      CNeuronBase *Scores = m_cScores.At(layer);
      if(!Scores)
         return false;
      //--- initializing AttentionOut
      CNeuronBase *AttentionOut = m_cAttentionOut.At(layer);
      if(!AttentionOut)
         return false;

Next, we split the algorithm based on the chosen computational device. In this chapter, we will discuss the organization of the process using standard MQL5 tools, and we will revisit the implementation of multi-threaded computations using the OpenCL technology in the following sections.

      //--- branching of the algorithm by the computing device
      if(!m_cOpenCL)
        {
         MATRIX array[];
         if(!Querys.GetOutputs().m_mMatrix.Vsplit(3, array))
            return false;
         if(!Keys.GetOutputs().Row(array[1].Row(0), m_iCurrentPosition))
            return false;
         if(!Values.GetOutputs().Row(array[2].Row(0), m_iCurrentPosition))
            return false;

As you may recall, during the feed-forward pass of the Querys object, we simultaneously construct the vectors of the Query, Key, and Value tensors for all attention heads. Therefore, in the code above, we divide the result buffer of the Querys layer into three equal parts (query, key, and value) and move the vectors of the last two tensors into the corresponding stacks. When copying the data, we use the m_iCurrentPosition variable to determine the offset in the buffers.

Then we do a bit of preparatory work. To simplify access to the data, we copy the result buffers of the Keys and Values layers into local matrices. We also prepare a matrix for the concatenated output and dynamic arrays of matrices for the computational part.

         MATRIX out;
          if(!out.Init(m_iHeads, m_iKeysSize))
            return false;
         MATRIX array_keys[], array_values[];
         MATRIX array_querys[];
         MATRIX keys = Keys.GetOutputs().m_mMatrix;
         MATRIX values = Values.GetOutputs().m_mMatrix;

Similarly to the construction of the feed-forward algorithm in the previously discussed implementation of multi-head attention, we will split the data matrices according to the attention heads.

         if(!array[0].Vsplit(m_iHeads, array_querys))
            return false;
         if(!keys.Reshape(m_iUnits, m_iHeads * m_iKeysSize))
            return false;
         if(!keys.Vsplit(m_iHeads, array_keys))
            return false;
         if(!values.Reshape(m_iUnits, m_iHeads * m_iKeysSize))
            return false;
         if(!values.Vsplit(m_iHeads, array_values))
            return false;

After that, we create a nested loop for computations. In it, we iterate through the attention heads used. Right here in the body, we extract the Query vector and the Keys matrix of the analyzed attention head. We multiply them and divide the resulting vector by the square root of the dimension of the description vector for one element in the Keys matrix. We normalize it using the Softmax function.
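In formula form, the computation described above for one attention head h can be written as follows (this simply restates the operations implemented in the loop below):

$$Score_h = Softmax\left(\frac{q_h K_h^T}{\sqrt{d_k}}\right), \qquad out_h = Score_h \cdot V_h$$

where $q_h$ is the Query vector of the current element for head h, $K_h$ and $V_h$ are the Key and Value stacks of this head, and $d_k$ is the size of the description vector of one element in the Keys matrix (m_iKeysSize).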

         //--- define Scores
          for(int head = 0; head < m_iHeads; head++)
           {
            MATRIX score=array_querys[head].MatMul(array_keys[head].Transpose())/
                                                               sqrt(m_iKeysSize);
            //--- normalize Scores
            if(!score.Activation(score,AF_SOFTMAX))
               return false;
            if(!Scores.GetOutputs().Row(score.Row(0), head))
               return false;

Thus, after normalizing the data, the sum of all dependency coefficients equals one. Since the output of the Self-Attention block is a weighted average of the Value vectors with these coefficients, we can expect it to remain on a scale comparable to the analyzed data. We save the normalized coefficients in the Scores buffer for later use during the backpropagation pass.

After calculating and normalizing the dependency coefficient vector, we have all the necessary data to calculate the output values of the Self-Attention block. We multiply the normalized Score vector by the Value tensor. Then we copy the resulting vector into the local result matrix.

         //--- attention block output
            MATRIX o = score.MatMul(array_values[head]);
            if(!out.Row(o.Row(0), head))
               return false;
           }

After completing all iterations of the loop, the out matrix contains the concatenated output of the Multi-Heads Self-Attention block. We transfer it to the result buffer of the AttentionOut neural layer for further use in our algorithm.

          if(!out.Reshape(1, m_iHeads * m_iKeysSize))
            return false;
         AttentionOut.GetOutputs().m_mMatrix = out;
        }
      else // OpenCL block
        {
         return false;
        }

This completes the operation separation block depending on the computing device. Next, we will use the methods of our internal objects.

According to the Multi-Heads Self-Attention algorithm, the next step is to turn the concatenated output of all attention heads into a single ordered, weighted vector of results for the entire multi-head attention block. For this purpose, the algorithm provides the W0 matrix; in our implementation, we have assigned this functionality to the internal fully connected neural layer W0. We extract the pointer to the corresponding neural layer object and call its feed-forward method. To prevent critical errors, we validate the pointer before calling the method.

      //--- weighted output of all heads of attention
      CNeuronBase *W0 = m_cW0.At(layer);
      if(!W0 || !W0.FeedForward(AttentionOut))
         return false;

We are nearing the completion of the Multi-Heads Self-Attention block implementation. According to the GPT model algorithm, we now need to add the obtained result to the original data and normalize the result of the addition.
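As a reminder, the normalization applied here is the usual layer normalization used in the attention classes discussed earlier; in general form, it can be written as:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}, \qquad y_i = \frac{x_i - \mu}{\sigma}$$

where n is the number of elements in the normalized buffer. The standard deviation σ is presumably what the m_dStd array passed to NormlizeBuffer stores for use during the backpropagation pass; practical implementations usually also add a small constant to the denominator to avoid division by zero.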

First, we call the CBufferType::SumArray method, which sums two buffers element by element. Then we normalize the data using the CNeuronGPT::NormlizeBuffer method, whose algorithm completely repeats the corresponding method of the CNeuronAttention class.

      //--- add to the input data and normalize
      if(!W0.GetOutputs().SumArray(prevL.GetOutputs()))
         return false;
      if(!NormlizeBuffer(W0.GetOutputs(), GetPointer(m_dStd[layer]), 0))
         return false;
 

After successfully normalizing all the data, we will pass the signal through two internal neural layers of the Feed Forward block. This operation is straightforward: we sequentially extract pointers to the respective neural layer objects, validate the pointers, and call the feed-forward method for each internal layer.

      //--- forward pass of Feed Forward block
      CNeuronBase *FF1 = m_cFF1.At(layer);
      if(!FF1 || !FF1.FeedForward(W0))
         return false;
      CNeuronBase *FF2 = m_cFF2.At(layer);
      if(!FF2 || !FF2.FeedForward(FF1))
         return false;

Finally, we add the result of the Feed Forward block to the result of the Multi-Heads Self-Attention block. Then we normalize the obtained values.

      //--- perform summation with the attention output and normalizing
      CBufferType *prev = FF2.GetOutputs();
      if(!prev.SumArray(W0.GetOutputs()))
         return false;
       if(!NormlizeBuffer(prev, GetPointer(m_dStd[layer]), 1))
         return false;

This completes the feed-forward pass for one internal layer, and we can proceed to the next iteration of the loop and the next internal neural layer. But first, we need to change the pointer to the source data layer, as discussed at the beginning of the method. The results of the forward pass are contained in the buffer of the internal neural layer FF2, so we write the pointer to this layer into the local variable prevL, which we will use at the next iteration of the loop.

      prevL = FF2;
     }
//---
   return true;
  }

So, upon completing all iterations of the loop over the internal neural layers, we obtain the full result of the feed-forward pass for our block. To change the number of such internal layers, we only need to modify one parameter when calling the initialization method of the CNeuronGPT class in the GPT model.
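For illustration only, the sketch below shows how such a parameter might be specified when describing the GPT block in a model. It assumes the CLayerDescription object and the defNeuronGPT type identifier introduced earlier in the book; the specific values and the interpretation of the individual fields are assumptions made for this example rather than a reference configuration.

   //--- hypothetical description of the GPT block (field semantics are assumptions)
   CLayerDescription *descr = new CLayerDescription();
   if(descr)
     {
      descr.type       = defNeuronGPT;  // GPT block type identifier (assumed from earlier chapters)
      descr.window_out = 64;            // assumed: size of the Key/Value description vector (m_iKeysSize)
      descr.step       = 8;             // assumed: number of attention heads (m_iHeads)
      descr.layers     = 4;             // number of internal layers (m_iLayers); the only parameter to change
     }

Changing only the layers field changes the depth of the block without touching any other part of the model description.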

With this, we conclude the work on the feed-forward pass method and move on to organizing the backpropagation process.