Discussion of article "Neural networks made easy (Part 6): Experimenting with the neural network learning rate"

 

New article Neural networks made easy (Part 6): Experimenting with the neural network learning rate has been published:

We have previously considered various types of neural networks along with their implementations. In all cases, the neural networks were trained using the gradient descent method, which requires choosing a learning rate. In this article, I want to use examples to show the importance of a correctly selected learning rate and its impact on neural network training.

The third experiment is a slight deviation from the main topic of the article. Its idea came about during the first two experiments, so I decided to share it with you. While observing the neural network training, I noticed that the probability of the absence of a fractal fluctuates around 60-70% and rarely falls below 50%. The probability of the emergence of a fractal, whether buy or sell, is around 20-30%. This is quite natural, as there are far fewer fractals on the chart than there are candlesticks inside trends. As a result, our neural network is overtrained towards the majority class, and we obtain the above results: almost 100% of fractals are missed, and only rare ones are caught.

Training the EA with the learning rate of 0.01

To solve this problem, I decided to slightly compensate for the unevenness of the sample: when training the network, I specified a reference value of 0.5 instead of 1 for the absence of a fractal.

            TempData.Add((double)buy);                            // target for a buy fractal
            TempData.Add((double)sell);                           // target for a sell fractal
            TempData.Add((double)((!buy && !sell) ? 0.5 : 0));    // "no fractal" target lowered from 1 to 0.5

This step produced a good effect. The Expert Advisor running with a learning rate of 0.01 and a weight matrix obtained from previous experiments shows the error stabilizing at about 0.34 after 5 training epochs. The share of missed fractals decreased to 51%, and the percentage of hits increased to 9.88%. You can see from the chart that the EA generates signals in groups, marking out certain zones. Obviously, the idea requires additional development and testing, but the results suggest that this approach is quite promising. 

Learning with 0.5 for no fractal

Author: Dmitriy Gizlyk

 
I was just passing by))))) Are you not confused by the fact that the error grows as you learn???? it should be the other way round)))))
 
Александр Алексеевич:
I was just passing by)))) Aren't you confused by the fact that the error grows with training??? it should be the other way round)))))
The error grows during the first passes after initialization with random values, and after several passes it starts to decrease. This may be because the random initial values are scattered chaotically far from the target function. You may also notice slight growth of the error against the background of the general tendency for it to decrease, which is explained by the non-smoothness of the target function.
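
For anyone who wants to watch this effect for themselves, a tiny helper like the one below (not part of the article's classes; the smoothing factor is just an assumed example) can be used to smooth the per-epoch error, which makes the early growth and the later overall decline easier to see through the pass-to-pass noise.

// Illustrative sketch only: exponentially smooth the per-epoch error so the
// overall trend stays visible through the pass-to-pass fluctuations.
double SmoothedError(const double prev_smoothed,const double epoch_error,
                     const double smoothing=0.9)      // smoothing factor is an assumed example
  {
   return(smoothing*prev_smoothed+(1.0-smoothing)*epoch_error);
  }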
 

Hi Dmitriy,


I really like this series as a learning tool for me for Neural Networks. I use MT4, which included finding an implementation of SymbolInfo. I am guessing that is where the problem is, as it is running but not doing anything during learning. Would you have any idea of what would be needed for it to run in MT4? Thanks!

 

Good afternoon!

Can you tell me, do you train the neural network only on the closing price? Or do you also use the trading volume on the given timeframe?

 
Oleg Mazurenko:

Good afternoon!

Can you tell me, do you train the neural network only on the closing price? Or do you also use the trading volume on the given timeframe?

In the described example, the neural network currently receives the opening, closing, high and low prices, volume, time, and the readings of 4 indicators. The process of transferring the initial data to the neural network is described in the linked article.

Neural networks made easy (Part 2): Network training and testing
  • www.mql5.com
In this article, we continue studying neural networks, started in the previous article, and consider an example of using the CNet class we created in Expert Advisors. Two neural network models are examined; they showed similar results both in training time and in prediction accuracy.
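
As a rough illustration of what that input vector contains, each bar contributes its prices, volume, time and indicator readings to a flat array. The names and exact field order below are assumptions; the article's real loading code is in the linked Part 2.

#include <Arrays\ArrayDouble.mqh>

// Rough illustration only: pack one bar's data into a flat input array.
// The indicator buffers and the field order are assumptions, not the article's exact code.
void AddBarToInput(CArrayDouble *TempData,const MqlRates &rate,
                   const double &rsi[],const double &cci[],
                   const double &atr[],const double &macd[],const int i)
  {
   TempData.Add(rate.open);
   TempData.Add(rate.high);
   TempData.Add(rate.low);
   TempData.Add(rate.close);
   TempData.Add((double)rate.tick_volume);
   TempData.Add((double)rate.time);
   TempData.Add(rsi[i]);                 // readings of the 4 indicators
   TempData.Add(cci[i]);
   TempData.Add(atr[i]);
   TempData.Add(macd[i]);
  }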
 

For anyone coming after me: note that the first example, Fractal_OCL1.mql, won't compile.

You need to comment out the define and declare the learning rate as a variable instead:

//#define  lr 0.1
double eta=0.1;
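
For context, eta is the learning rate applied when the weights are adjusted. In simplified form (momentum omitted; this only illustrates the role of eta and is not the article's exact update code) the step looks like this:

double eta=0.1;                                   // learning rate

// Simplified gradient-descent weight update; momentum term omitted.
// Illustrates the role of eta only, not the article's exact code.
void UpdateWeight(double &weight,const double gradient,const double input)
  {
   weight+=eta*gradient*input;
  }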

 
The main problem is not in selecting the learning rate; after all, TensorFlow has a function that gradually reduces it during training to a specified value, selecting the optimal one. The problem is that the neural network does not find stable patterns; it has nothing to hold on to. I have used models ranging from fully connected layers to the new-fangled ResNet and Attention. The effect does not exceed 60%, and that is on a narrow area; in general, everything slips to 50/50. With neural networks we need to think about what could be analysed in the first place. Plain arrays of prices and volumes, in any combination, do not give results.
 
eccocom #:
The main problem is not in selecting the learning rate; after all, TensorFlow has a function that gradually reduces it during training to a specified value, selecting the optimal one. The problem is that the neural network does not find stable patterns; it has nothing to hold on to. I have used models ranging from fully connected layers to the new-fangled ResNet and Attention. The effect does not exceed 60%, and that is on a narrow area; in general, everything slips to 50/50. With neural networks we need to think about what could be analysed in the first place. Plain arrays of prices and volumes, in any combination, do not give results.

Try to analyse the correlation between the initial data and the target result.
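
On the learning-rate schedule mentioned above: the idea TensorFlow implements can be sketched in a few lines of MQL5 (the decay constants below are arbitrary examples, and the function is not tied to the article's classes):

// Sketch of exponential learning-rate decay towards a floor value,
// similar in spirit to TensorFlow's schedules; constants are arbitrary examples.
double DecayedRate(const double initial_rate,const double final_rate,
                   const double decay,const int epoch)
  {
   double rate=initial_rate*MathPow(decay,epoch);  // shrink the rate each epoch
   return(MathMax(rate,final_rate));               // never drop below the floor value
  }

For example, DecayedRate(0.1, 0.001, 0.9, epoch) starts at 0.1 and gradually approaches 0.001 as training goes on.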

 

"...in the absence of a fractal in the reference value, when training the network, I specified 0.5 instead of 1."

Why exactly 0.5, where did this figure come from?

 
Gexon #:
"...in the absence of a fractal in the reference value, when training the network, I specified 0.5 instead of 1."

Why exactly 0.5, where did this figure come from?

During training, the model learns the probability distribution of each of the 3 events. Since the probability of fractal absence is much higher than the probability of its appearance, we artificially understate it. We specify 0.5 because, at this value, the maximum probabilities of the events end up at approximately the same level and can therefore be compared.
I agree that this approach is very debatable and is dictated by observations from the training sample.
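
In other words, once the "no fractal" target is scaled down to 0.5, the peak outputs of the three classes end up at comparable levels, so the signal can simply be taken as the largest output. A rough sketch of that comparison (names and logic are illustrative, not the article's code):

// Illustrative only: with comparable peak levels, take the signal with the
// largest network output. 0 = buy fractal, 1 = sell fractal, 2 = no fractal.
int ChooseSignal(const double buy_prob,const double sell_prob,const double none_prob)
  {
   if(buy_prob>=sell_prob && buy_prob>=none_prob)
      return(0);
   if(sell_prob>=buy_prob && sell_prob>=none_prob)
      return(1);
   return(2);
  }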