Discussion of article "Backpropagation Neural Networks using MQL5 Matrices" - page 2

 
Lorentzos Roussos #:

Yes, but the activation values are passed to the derivative function while it expects the pre-activation values. That's what I'm saying.

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

double derivative(double output)
{
  return output * (1 - output);
}

void OnStart()
{
  vector x = {{0.68}};
  vector y;
  x.Activation(y, AF_SIGMOID);                            // got activation/sigmoid result as "y(x)" in y[0]
  vector d;
  x.Derivative(d, AF_SIGMOID);                            // got derivative of sigmoid at x
  Print(derivative(x[0]), " ", derivative(y[0]), " ", d); // 0.2176 0.2231896389723258 [0.2231896389723258]
}
 
Stanislav Korotky #:

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

Yeah, that's what I'm saying: the correct derivative matches the derivative called on the x values.

In the backprop function you are calling the equivalent of y.Derivative(d, AF_SIGMOID).

In the article's backprop, the outputs matrix is y; I don't think you are storing the equivalent of x in a matrix to call the derivative on.

(again, according to the MQL5 function)

--

Even in your example you are calling the derivative on x; I bet you typed y at first and then "whoopsed".

Just tell them on the Russian forum. It will save a lot of people a lot of time if they can add it to the docs.

Thanks

 
Stanislav Korotky #:

I think you're wrong. The derivative is calculated according to the formula I gave above: y'(x) = y(x)*(1-y(x)), where x is the neuron state before activation and y(x) is the result of applying the activation function. You don't use the pre-activation value to calculate the derivative; you use the result of activation (y) instead. Here is a simplified test:

Let me simplify this.

This is your example:

double derivative(double output)
{
  return output * (1 - output);
}

void OnStart()
{
  vector x = {{0.68}};
  vector y;
  x.Activation(y, AF_SIGMOID);                            // got activation/sigmoid result as "y(x)" in y[0]
  vector d;
  x.Derivative(d, AF_SIGMOID);                            // got derivative of sigmoid at x
  Print(derivative(x[0]), " ", derivative(y[0]), " ", d); // 0.2176 0.2231896389723258 [0.2231896389723258]
}

In your example you are calling x.Derivative to fill the derivatives vector d.

You are not calling y.Derivative to fill the derivatives. Why? Because it returns the wrong values (and you probably saw it, that's why you used x.Derivative).

What is y? The activation values of x.

So when you do this:

x.Activation(y, AF_SIGMOID);  

you fill y with the activation values of x, but you call the derivative on x, not on y (which is correct according to the MQL5 function).

In your article, in the feed-forward, temp is:

matrix temp = outputs[i].MatMul(weights[i]);

And y would be the activation values of x. What matrix is that?

temp.Activation(outputs[i + 1], i < n - 1 ? af : of)

The outputs. In the article, the y (which you don't call the derivative on in the example) is the outputs matrix.

(what we are seeing in the code above is the equivalent of x.Activation(y, AF) from the example, which fills y with the activation values)

In your backprop code you are not calling x.Derivative, because x (matrix temp = outputs[i].MatMul(weights[i]);) is not stored anywhere, so you cannot call it on that. You are calling the equivalent of y.Derivative, which returns the wrong values:

outputs[n].Derivative(temp, of)
outputs[i].Derivative(temp, af)

because y holds the activation values.

Again, according to the MQL5 function.

So in your example you are using the right call, and in your article you are using the wrong call.
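For concreteness, here is a minimal check of that, along the lines of your example (my own snippet; the values in the comments are approximate):

void OnStart()
{
  vector x = {{0.68}};                 // pre-activation value
  vector y, dx, dy;
  x.Activation(y, AF_SIGMOID);         // y[0] ~ 0.6637 = sigmoid(0.68)
  x.Derivative(dx, AF_SIGMOID);        // sigmoid'(0.68)   ~ 0.2232 -- what backprop needs
  y.Derivative(dy, AF_SIGMOID);        // sigmoid'(0.6637) ~ 0.2244 -- derivative evaluated at the wrong point
  Print(dx, " vs ", dy);
}

The second number is what the article's outputs[i].Derivative(...) calls effectively produce.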

Cheers

 

So, you want this:

   bool backProp(const matrix &target)
   {
      if(!ready) return false;
   
      if(target.Rows() != outputs[n].Rows() ||
         target.Cols() != outputs[n].Cols())
         return false;
      
      // output layer
      matrix temp;
      //*if(!outputs[n].Derivative(temp, of))
      //*  return false;
      if(!outputs[n - 1].MatMul(weights[n - 1]).Derivative(temp, of))
         return false;
      matrix loss = (outputs[n] - target) * temp; // data record per row
     
      for(int i = n - 1; i >= 0; --i) // for each layer except output
      {
         //*// remove unusable pseudo-errors for neurons, added as constant bias source
         //*// (in all layers except for the last (where it wasn't added))
         //*if(i < n - 1) loss.Resize(loss.Rows(), loss.Cols() - 1);
         #ifdef BATCH_PROP
         matrix delta = speed[i] * outputs[i].Transpose().MatMul(loss);
         adjustSpeed(speed[i], delta * deltas[i]);
         deltas[i] = delta;
         #else
         matrix delta = speed * outputs[i].Transpose().MatMul(loss);
         #endif
         
         //*if(!outputs[i].Derivative(temp, af))
         //*   return false;
         //*loss = loss.MatMul(weights[i].Transpose()) * temp;
         if(i > 0) // backpropagate loss to previous layers
         {
            if(!outputs[i - 1].MatMul(weights[i - 1]).Derivative(temp, af))
               return false;
            matrix mul = loss.MatMul(weights[i].Transpose());
            // remove unusable pseudo-errors for neurons, added as constant bias source
            mul.Resize(mul.Rows(), mul.Cols() - 1);
            loss = mul * temp;
         }
         
         weights[i] -= delta;
      }
      return true;
   }

I'll think about it.

 
Stanislav Korotky #:

So, you want this:

I'll think about it.

On first glance it looks okay, yes. The calculation on the spot is faster than storage, I assume.

👍

 
Lorentzos Roussos #:

On first glance it looks okay, yes. The calculation on the spot is faster than storage, I assume.

I think I know the reason why it was originally coded via the output of the activation function. In all my previous NN libs, and some other people's libs I used, the derivatives are calculated via outputs, because it's simpler and more efficient (during adaptation to the matrix API, I didn't pay attention to the difference). For example:

sigmoid' = sigmoid * (1 - sigmoid)
tanh' = 1 - tanh^2
softsign' = (1 - |softsign|)^2

This way we do not need to keep the pre-activation arguments (matrices) or re-calculate them during the backpropagation phase (as is done in the fix). I don't like either approach. Calculation of a "self-derivative", so to speak, looks more elegant. Hence I'd prefer to find some references with formulae for self-derivatives of all (or many) supported activation functions, and return to my original approach.

It's interesting that the self-derivative formula is not required to be strictly derived from the activation function: any function with an equivalent effect will suffice.
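A minimal sketch of what such a self-derivative helper could look like (my own illustration, limited to the three functions listed above and assuming MathAbs applies element-wise to a matrix; anything else would need its own formula):

// Derivative expressed through the activation OUTPUT ("self-derivative"),
// so no pre-activation matrices have to be stored or recomputed
matrix SelfDerivative(const matrix &out, const ENUM_ACTIVATION_FUNCTION af)
{
   matrix ones = matrix::Ones(out.Rows(), out.Cols());
   switch(af)
   {
      case AF_SIGMOID:  return out * (ones - out);                          // s' = s * (1 - s)
      case AF_TANH:     return ones - out * out;                            // t' = 1 - t^2
      case AF_SOFTSIGN: { matrix a = ones - MathAbs(out); return a * a; }   // (1 - |y|)^2
      default:          return matrix::Zeros(out.Rows(), out.Cols());       // not covered in this sketch
   }
}

In backProp this could then stand in for the Derivative calls, e.g. loss = loss.MatMul(weights[i].Transpose()) * SelfDerivative(outputs[i], af);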
 
Stanislav Korotky #:

I think I know the reason why it was originally coded via the output of the activation function. In all my previous NN libs, and some other people's libs I used, the derivatives are calculated via outputs, because it's simpler and more efficient (during adaptation to the matrix API, I didn't pay attention to the difference). For example:

This way we do not need to keep the pre-activation arguments (matrices) or re-calculate them during the backpropagation phase (as is done in the fix). I don't like either approach. Calculation of a "self-derivative", so to speak, looks more elegant. Hence I'd prefer to find some references with formulae for self-derivatives of all (or many) supported activation functions, and return to my original approach.

Yeah, but MQ has decided to do it this way, so it applies to all activation functions.

In simple words, instead of the .Derivative function "adapting" to the activation function (like the 3 you mentioned, which could receive the outputs), they have decided to have the functions receive the pre-activation values across the board. That is okay; the problem is that it is not in the documentation.
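For example, a quick check of mine with AF_TANH (reusing the 0.68 input from the example earlier; values in comments are approximate) shows the same across-the-board convention:

void OnStart()
{
  vector x = {{0.68}};
  vector y, d_pre, d_post;
  x.Activation(y, AF_TANH);            // y[0] ~ 0.5915 = tanh(0.68)
  x.Derivative(d_pre, AF_TANH);        // tanh'(0.68)   ~ 0.6501 -- expects the pre-activation value
  y.Derivative(d_post, AF_TANH);       // tanh'(0.5915) ~ 0.7180 -- not the derivative backprop needs
  Print(d_pre, " ", d_post, " ", 1 - y[0] * y[0]); // last term is the self-derivative, ~0.6501
}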

The default assumption anyone makes is that it adapts to the AF.

This is bad for someone new (like me, for example), as it "tackles" them before they even start. The thing that saved me was that I built an object-based network first.

(a comparison of object-based and matrix-based networks would also be a very interesting article and would help many coders who are not math-savvy)

Anyway, I placed it in the thread a moderator has for reporting documentation issues.

(off topic: you can use this TanH, it's faster and correct, I think)

double customTanH(double of){
  double ex=MathExp(2*of);
  return((ex-1.0)/(ex+1.0));
}
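A quick sanity check (my own throwaway script, assuming it is compiled together with the customTanH above):

void OnStart()
{
  double x = 0.68;
  double reference = (MathExp(x) - MathExp(-x)) / (MathExp(x) + MathExp(-x)); // tanh(x) from its definition
  Print(customTanH(x), " ", reference); // both print ~0.59152, but customTanH needs only one MathExp call
}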
It's interesting that the self-derivative formula is not required to be strictly derived from the activation function: any function with an equivalent effect will suffice.

You mean like a "substitute"?

For instance, a node receives an error on its output, and you know the "fluctuation" of the output, so if you "shift it" to a simpler activation and differentiate that, it will work?

So in theory it'd be like "regularizing" the output, but without actually doing it, and just multiplying by the derivative of the regularization before the derivative of the activation?

For instance:

tanh output: -1 -> +1
sigmoid output: 0 -> +1
tanh to sigmoid output = (tanh_out + 1) / 2.0
and you just multiply by the derivative of that, which is 0.5? (without touching the tanh outputs at all)
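If I understand the idea correctly, here is a quick numerical check of that remapping (my own sketch): with s = (tanh_out + 1) / 2 the tanh self-derivative 1 - t^2 works out to 4 * s * (1 - s), so it can indeed be expressed through the shifted output, although the factor is not a constant 0.5.

void OnStart()
{
  double x = 0.68;
  double e = MathExp(2.0 * x);
  double t = (e - 1.0) / (e + 1.0);    // tanh output, ~0.5915
  double s = (t + 1.0) / 2.0;          // shifted into the 0..1 (sigmoid-like) range
  Print(1.0 - t * t);                  // tanh self-derivative: 1 - t^2 ~ 0.6501
  Print(4.0 * s * (1.0 - s));          // the same value expressed through the shifted output
}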
(Forum thread: "mql5 documentation errors, defaults or inconsistencies", www.mql5.com, 2023.04.07)
 
Bugfix.
Files:
MatrixNet.mqh  23 kb
 

An admin responded in the moderators' thread about this. You may be interested.

Forum on trading, automated trading systems and testing trading strategies

mql5 documentation errors, defaults or inconsistencies.

Rashid Umarov, 2023.04.18 11:36

Will be improved as soon as possible. For a while you can use this include file as a reference.

@Stanislav Korotky
 
Stanislav Korotky #: Bugfix.

@Stanislav Korotky

Your effort to present the neural network concept with MQL is well appreciated. This is really a great piece of work to start with for beginners like me. Kudos :)

Thanks for updating the file with the bug fixed. However, I would suggest replacing the file in the download area.

Luckily I went through the discussion on the topic and found that there is a bug in the file. And then, while looking for the fix, I found this file link here.

I hope the only places for the bugfix are at line numbers 490-493, 500, 515-527, which I could spot by the //* marking. If there are any others, please mention the line numbers or mark them //*BugFix ...

Regards