Hybrid neural networks.

 
I want to know it myself!
 
gumgum >> :
I want to know it myself!

Hmm ... I think I know :) When I first implemented RProp, I ran into a situation where the error starts to grow and the dEdW value (the gradient) goes off to +Inf.

Limit the number of learning epochs to, say, 10-15, or add a check on the gradient value to the code; I have code like this there:


// flush vanishingly small gradients to zero so that numerical
// noise in dEdW cannot drive further weight updates
if (Math::Abs(this->dEdW[j][k][i]) < 10e-25)
{
    this->dEdW[j][k][i] = 0;
}


This means that the algorithm has hit a local minimum, or we are dealing with overfitting (overtraining) of the network.
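
For context, here is a minimal sketch of one RProp weight update (the iRPROP- variant); RPropUpdate, etaPlus, deltaMax and the other names are my own placeholders, not code from this thread. It shows why only the sign of dEdW matters and how the per-weight step size is bounded, which is exactly where an exploding gradient would otherwise cause trouble:

#include <algorithm>

// One RProp step for a single weight (iRPROP- variant, sketch only).
// dEdW / prevdEdW are the current and previous batch gradients;
// delta is this weight's individual step size.
void RPropUpdate(double &w, double dEdW, double &prevdEdW, double &delta)
{
    const double etaPlus = 1.2, etaMinus = 0.5;    // standard RProp factors
    const double deltaMax = 50.0, deltaMin = 1e-6; // step-size bounds

    if (prevdEdW * dEdW > 0)
    {
        // gradient kept its sign: accelerate
        delta = std::min(delta * etaPlus, deltaMax);
    }
    else if (prevdEdW * dEdW < 0)
    {
        // sign flipped: we overshot a minimum, shrink the step
        delta = std::max(delta * etaMinus, deltaMin);
        dEdW = 0;   // iRPROP-: skip the update after a sign change
    }

    if (dEdW > 0)      w -= delta;   // move against the gradient sign only
    else if (dEdW < 0) w += delta;

    prevdEdW = dEdW;
}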

 
So, as I understand it: we feed in all the examples from the training set, compute dEdW for each one and accumulate the sum, and then divide dEdW by the number of training examples? Is that how batch mode works?
 
gumgum >> :
So, as I understand it: we feed in all the examples from the training set, compute dEdW for each one and accumulate the sum, and then divide dEdW by the number of training examples? Is that how batch mode works?

The disadvantage of this algorithm is that it is discrete.

 
gumgum >> :
So, as I understand it: we feed in all the examples from the training set, compute dEdW for each one and accumulate the sum, and then divide dEdW by the number of training examples? Is that how batch mode works?

Yes, but don't confuse the local gradient of a single neuron with dEdW: there are as many local gradients as there are neurons, while dEdW has as many entries as there are synaptic connections, plus the derivatives with respect to the activation-function thresholds (biases).
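
A minimal sketch of that batch-mode accumulation, assuming the weights are kept in a flat vector; Example, AccumulateGradient and BatchGradient are hypothetical names, and the backprop step itself is left out:

#include <algorithm>
#include <vector>

struct Example { std::vector<double> input, target; };

// Placeholder: runs backprop on one example and adds its per-weight
// gradient into dEdW (one entry per synaptic connection).
void AccumulateGradient(const Example &ex, std::vector<double> &dEdW);

void BatchGradient(const std::vector<Example> &trainSet,
                   std::vector<double> &dEdW)
{
    std::fill(dEdW.begin(), dEdW.end(), 0.0);   // reset the accumulator

    for (const Example &ex : trainSet)
        AccumulateGradient(ex, dEdW);           // sum over the whole set

    for (double &g : dEdW)
        g /= trainSet.size();                   // average, as described above
}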

 
dentraf >> :

>> The disadvantage of this algorithm is that it is discrete.

Hmm ... what do you mean by discrete? For a number of problems this algorithm is no worse than any other gradient method. It is inferior to quasi-Newton methods or, say, LMA (Levenberg-Marquardt). But it is faster than simple gradient descent.
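
For reference, a standard statement of the Levenberg-Marquardt (LMA) update being compared against here (not from this thread), where J is the Jacobian of the error residuals e and \mu is the damping factor:

\Delta w = -\left(J^{\top} J + \mu I\right)^{-1} J^{\top} e

As \mu \to 0 this approaches the fast Gauss-Newton step; for large \mu it reduces to a short gradient-descent step. That adaptive interpolation is what puts LMA ahead of plain gradient methods on least-squares problems.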

 
rip >> :

Hmm ... what do you mean by discrete? For a number of problems this algorithm is no worse than any other gradient method. It is inferior to quasi-Newton methods or, say, LMA (Levenberg-Marquardt). But it is faster than simple gradient descent.

I didn't say anything about speed.)

 
A neural network - I see. How do you prepare it? What data do you run it on? What intervals?
 
Thank you all!
 
rip wrote >>

Hmm ... what do you mean by discrete? For a number of problems this algorithm is no worse than any other gradient method. It is inferior to quasi-Newton methods or, say, LMA (Levenberg-Marquardt). But it is faster than simple gradient descent.

Could you give more details on the quasi-Newton and LMA methods?