Hmm... I think I know :) When I first implemented RProp, I ran into a situation where the error starts to grow and the dEdW value (the gradient) goes to +Inf.
Limit the number of training epochs to, say, 10-15, or add a check on the gradient's magnitude to the code. I have code like this there:
// Zero out vanishingly small gradients (underflow guard)
if (Math::Abs(this->dEdW[j][k][i]) < 10e-25)
{
    this->dEdW[j][k][i] = 0;
}
This means the algorithm has hit a local minimum, or that the network is overfitting.
So, as I understand it: we feed in every example from the training set, compute dEdW for each one and accumulate the sum, and then divide dEdW by the number of training examples? Is that how batch mode works?
The disadvantage of this algorithm is that it is discrete.
>> So, as I understand it: we feed in every example from the training set, compute dEdW for each one and accumulate the sum, and then divide dEdW by the number of training examples? Is that how batch mode works?
Yes, but don't confuse the local gradient of a single neuron with dEdW: you have as many local gradients as there are neurons, while dEdW has one entry per synaptic connection and per activation-function threshold (bias).
>> The disadvantage of this algorithm is that it is discrete.
Hmm... what do you mean by discrete? For a number of problems this algorithm is no worse than any gradient method. It is inferior to quasi-Newton methods or, say, LMA (Levenberg-Marquardt), but it is faster than plain gradient descent.
I didn't say anything about speed. :)
Could you give more details on the quasi-Newton and LMA methods?