New article Neural networks made easy (Part 13): Batch Normalization has been published:
Author: Dmitry Gizlyk
For me, as a beginner with NN, it was very enlightening. I want to use your proposals to code an EA. It should be a construction set for DNNs, to try out different functions and topologies and learn which ones work better.
So I modified your last example (MLMH + Convolutional).
I added many different activation functions (32 functions: Gaussian, SeLU, SiLU, Softsign, symmetric sigmoid, ...) and their derivatives.
I changed the error/success calculation (Buy, Sell, DontBuySell), because I think "don't trade" should not be treated as undefined. If the NN recognizes neither a buy nor a sell and this turns out to be correct, it should be rewarded in the feedback loop.
Maybe someone already has solutions or can help with the following questions:
I'm not able to create functions that need the weights of the complete layer: Softmax, Maxout, PReLU with a learned alpha.
I'm also not able to implement other optimizations (AdaBound, AMSBound, Momentum).
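As an illustration of the Softmax point (a minimal MQL5-style sketch of my own, not code from the article, assuming the layer's outputs are already collected in one array): Softmax cannot be written as a per-neuron activation, because it needs every output of the layer at once.

void SoftMaxLayer(const double &in[], double &out[])
  {
   int total = ArraySize(in);
   ArrayResize(out, total);
   double max_val = in[ArrayMaximum(in)];   // subtract the maximum for numerical stability
   double sum = 0.0;
   for(int i = 0; i < total; i++)
     {
      out[i] = MathExp(in[i] - max_val);
      sum += out[i];
     }
   for(int i = 0; i < total; i++)
      out[i] /= sum;                        // outputs become probabilities that sum to 1
  }

Maxout and a learned PReLU alpha likewise need extra per-layer state (several linear pieces, a trainable alpha), so they are more naturally implemented as their own layer than as a plain activation function.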
I'm thinking of a DNN-builder EA for testing, to find the best net topology.
1. How can I find the input/output count of neurons and weights per layer?
2. What topology do you suggest? I tried many variations:
A) A few neuron layers with count = 19000, then a descending count in the next layers (*0.3 each)
B) 1 convolutional layer + 12 MLMH layers with 300 neurons each
C) 29 layers with 300 neurons each
D) 29 layers with 300 neurons each and normalization between the layers
I get forecasts of at most 57%, but I think it can/has to be better.
Should there be layers with a rising neuron count and then a descending one again?
3. How can I run a backtest? There is a condition that returns false when in tester mode; I tried to comment it out, but without success.
The explanations are very detailed, but I'm missing some of the overall picture.
4. Which layer should follow which? Where should the BatchNorm layers be placed?
5. How many output neurons does a convolutional layer, or the multi-head layers like MLMH, have when layers=x, step=y, window_out=z? I have to calculate the count for the next neuron layer; I want to avoid oversized layers or bottlenecks (see the sketch after this list).
6. What about LSTM_OCL? Is it too weak compared to attention/MH, MHML?
7. I want to implement eta (the learning rate) per layer, but had no success (lack of know-how about classes; I'm a decent 3rd-generation-language coder).
8. What should be modified to get an error rate < 0.1? I'm stuck at a constant 0.6+.
9. What about bias neurons in these existing layer layouts?
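For question 5, a back-of-the-envelope sketch (my own assumption about how a sliding-window convolutional layer is sized, not a formula quoted from the article): with 'inputs' source elements, a window of 'window' elements, stride 'step' and 'window_out' filters per position, the output count would be:

int ConvOutputCount(const int inputs, const int window, const int step, const int window_out)
  {
   int positions = (inputs - window) / step + 1;   // number of window positions along the input
   return(positions * window_out);                 // positions * filters per position = output neurons
  }

The result of one layer then becomes the 'inputs' value of the next layer, which is also what question 1 is about.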
I have already studied many websites for weeks, but didn't find answers to these questions.
But I'm confident this can be solved, given the positive feedback from others who have already had success.
Maybe there will be a Part 14 with solutions for all these issues?
Best regards
and many thanks in advance
Hi. I am getting this error:
candidate function not viable: no known conversion from 'double __attribute__((ext_vector_type92000' to 'half4' for 1st argument
2022.11.30 08:52:28.185 Fractal_OCL_AttentionMLMH_b (EURJPY,D1) OpenCL program create failed. Error code=5105
when using EA since article part 10 examples
Any guess, please?
Thank you
Hi, can you send the full log?
Hello Rogerio.
1. You haven't created a model.
CS 0 08:28:40.162 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) EURUSD_PERIOD_H1_ 20Fractal_OCL_AttentionMLMH_d.nnw
CS 0 08:28:40.163 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) OnInit - 130 -> Error of read EURUSD_PERIOD_H1_ 20Fractal_OCL_AttentionMLMH_d.nnw prev Net 5004
2. Your GPU doesn't support double. Please load the latest version from the article https://www.mql5.com/ru/articles/11804
CS 0 08:28:40.192 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) OpenCL: GPU device 'Intel HD Graphics 4400' selected
CS 0 08:28:43.149 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) 1:9:26: error: OpenCL extension 'cl_khr_fp64' is unsupported
CS 0 08:28:43.149 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) 1:55:16: error: no matching function for call to 'dot'
CS 0 08:28:43.149 Fractal_OCL_AttentionMLMH_d (EURUSD,H1) c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:2199:61: note: candidate function not viable: no known conversion from 'double4' to 'float' for 1st argument
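For anyone hitting the same 'cl_khr_fp64' message, a minimal pre-check sketch (my own illustration using the standard MQL5 OpenCL functions, not code from the article): CL_USE_GPU_DOUBLE_ONLY restricts device selection to GPUs that support double precision.

int OnInit()
  {
   // try to create an OpenCL context only on GPUs that support double (cl_khr_fp64)
   int context = CLContextCreate(CL_USE_GPU_DOUBLE_ONLY);
   if(context == INVALID_HANDLE)
     {
      Print("No GPU with double support found - use the float-based version of the EA");
      return(INIT_FAILED);
     }
   CLContextFree(context);   // release the test context; the EA will create its own
   return(INIT_SUCCEEDED);
  }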
Hi Dmitriy,
You wrote: "You haven't created a model."
But how do I create a model? I compile all the program sources and run the EA.
The EA creates a file in the 'Files' folder with the extension .nnw. Isn't this file the model?
Thanks
Hi, teacher Dmitriy.
Now none of the .mqh files compiles.
For example, when I try to compile VAE.mqh I get this error:
'MathRandomNormal' - undeclared identifier VAE.mqh 92 8
I will try to start from the beginning again.
One more question: when you publish a new version of NeuroNet.mqh, is it fully compatible with the older EAs?
Thanks
rogerio
PS: Even after deleting all files and directories and starting with a fresh copy from Parts 1 and 2, I can no longer compile any code.
For example, when I try to compile the code in fractal.mq5 I get this error:
cannot convert type 'CArrayObj *' to reference of type 'const CArrayObj *' NeuroNet.mqh 437 29
Sorry, I really wanted to understand your articles and code.
PS2: OK, I removed the word 'const' from 'feedForward', 'calcHiddenGradients' and 'sumDOW', and now I can compile Fractal.mqh and Fractal2.mqh.
New article Neural networks made easy (Part 13): Batch Normalization has been published:
In the previous article, we started considering methods aimed at improving neural network training quality. In this article, we will continue this topic and will consider another approach — batch data normalization.
Various approaches to data normalization are used in neural network practice. However, all of them aim to keep the training sample data and the outputs of the hidden layers of the neural network within a certain range and with certain statistical characteristics of the sample, such as variance and median. This is important because network neurons apply linear transformations which, in the process of training, shift the sample towards the antigradient.
Consider a fully connected perceptron with two hidden layers. During a feed-forward pass, each layer generates a certain data set that serves as a training sample for the next layer. The result of the output layer is compared with the reference data. Then, during the feed-backward pass, the error gradient is propagated from the output layer through hidden layers towards the initial data. Having received an error gradient at each neuron, we update the weight coefficients, adjusting the neural network for the training samples of the last feed-forward pass. A conflict arises here: the second hidden layer (H2 in the figure below) is adjusted to the data sample at the output of the first hidden layer (H1 in the figure), while by changing the parameters of the first hidden layer we have already changed the data array. In other words, we adjust the second hidden layer to the data sample which no longer exists. A similar situation occurs with the output layer, which adjusts to the second hidden layer output which has already changed. The error scale will be even greater if we consider the distortion between the first and the second hidden layers. The deeper the neural network, the stronger the effect. This phenomenon is referred to as internal covariate shift.
Classical neural networks partly solve this problem by reducing the learning rate. Minor changes in the weights then do not entail significant changes in the sample distribution at the output of a neural layer. But this approach does not solve the scaling problem that appears as the number of layers grows, and it also reduces the learning speed. Another problem with a small learning rate is that the process can get stuck in local minima, which we already discussed in Part 6.
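To make the announced approach concrete, here is a minimal sketch of the batch normalization transform (an illustration with trainable scale 'gamma' and shift 'beta'; this is not the article's OpenCL implementation):

void BatchNorm(const double &x[], double &y[], const double gamma, const double beta)
  {
   int m = ArraySize(x);
   ArrayResize(y, m);
   double mean = 0.0, var = 0.0;
   for(int i = 0; i < m; i++)             // batch mean
      mean += x[i] / m;
   for(int i = 0; i < m; i++)             // batch variance
      var += MathPow(x[i] - mean, 2) / m;
   for(int i = 0; i < m; i++)             // normalize, then scale and shift
      y[i] = gamma * (x[i] - mean) / MathSqrt(var + 1e-8) + beta;
  }

Each value is thus centered and rescaled before being passed on, which keeps the distribution seen by the next layer stable between weight updates.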
Author: Dmitriy Gizlyk