Taking Neural Networks to the next level - page 24

 
Chris70:

You can also run it in the optimizer --> should be faster in non-visual mode; just set the "training counter" to e.g. 1-100 and optimize "slow and complete" with all but one thread disabled; the weight matrix will be saved to the common file folder and re-used in the next training cycle. But be aware that if "every tick" is chosen, you get as many iterations per cycle as there are ticks, so it's probably better not to choose "years" of training history. A shorter time per cycle means you get more preliminary reports (also in the common file folder).

Ok. Did that now, using only OHLC data from M1 charts and a single day.

Now, I see some dots plotted in the "Passes/Custom max" window.

What do the dots represent? They seem to be moving somewhere between 60k and 59k at the moment.

 
NELODI:
Ok. Did that now, using only OHLC data from M1 charts and 1 day of training. Now, I see some dots plotted in the "Passes/Custom max" window. What do the dots represent?

That's the loss, i.e. 1/2 * squared error * 10^6. Ideally, it should go down between the passes. If not, usually the learning rate is too high (this depends on both the initial rate and the time decay value) or we're stuck in a local minimum (then momentum (setting: ~0-0.9) can help, or an optimizer with built-in momentum functionality like Nesterov or ADAM).

Because the function results are not scaled down and can be as high as y, the MSE will also take pretty high values with this concrete function example.

The "10^6" is a relic of other applications: if I have e.g. outputs between 0 and 1 and an average absolute error of, let's say, 0.01, the MSE is sometimes too low to be shown with the few digits after the decimal point (2? 3?) that the custom results graph can display, so better indications that are too high than too low.
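
Just to make the scaling explicit, here is a tiny sketch in plain Python (made-up numbers, not taken from the EA code) of how the plotted value relates to the raw prediction error, and how the sqrt(2*L) "average absolute error" figure in the reports falls out of the same convention:

# Illustration only: how the plotted "Custom max" value relates to the raw
# error under the 0.5 * error^2 * 10^6 convention (hypothetical numbers).
target = 0.62                      # scaled label, e.g. in the 0..1 output range
prediction = 0.61                  # network output for the same sample
error = prediction - target
loss = 0.5 * error ** 2            # the "1/2 squared error" loss per sample
plotted = loss * 1e6               # value drawn on the Passes/Custom max graph
avg_abs_error = (2 * loss) ** 0.5  # back to a rough absolute error: sqrt(2*L)
print(plotted, avg_abs_error)      # ~50.0 and ~0.01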

 
Chris70:

That's the loss, i.e. 1/2 * squared error * 10^6. Ideally, it should go down between the passes. If not, usually the learning rate is too high (this depends on both the initial rate and the time decay value) or we're stuck in a local minimum (then momentum (setting: ~0-0.9) can help, or an optimizer with built-in momentum functionality like Nesterov or ADAM).

Ok. I've changed it now to use 5 days of M1 OHLC data from EURUSD per training pass, with the "train counter" input parameter starting at 1 and stopping at 10000 (10k steps). That should be enough to run through the night. But the "Custom max" output doesn't look like it's going down. It just keeps moving up and down between 60k and 59k. Except for the "train counter", all the input parameters are at your default settings. Do you want me to change something and restart?

Btw ... looking at the input parameters, "MLP optimizer method" is set to "Nesterov Accelerated Gradient (NAG)".

I'd upload the image, but I don't see any options to do it on this Forum. Hmmm.

Here it is, as an attachment.
Files:
ANN_Run2.png  85 kb
ANN_Run3.png  27 kb
 

Nesterov is pretty robust (I think it was the preset in the uploaded file; can't you change it in the input settings?); you can go with that. The Nesterov algorithm has a "look-ahead step" for the weight update built into the formula, which serves as a neat momentum adaptation. I would only stay away from ADADELTA; although it has the advantage that no learning rate needs to be set, it requires other fine-tuning, because it is e.g. pretty sensitive to a good weight initialization, and otherwise you run into vanishing or exploding gradients very quickly.
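
For anyone who wants to see what that look-ahead means, here is a bare-bones sketch in plain Python on a made-up 1-D toy problem (illustrative only, not the EA's implementation):

# Classic momentum vs. Nesterov accelerated gradient (NAG) on a toy loss
# L(w) = (w - 3)^2; all names and numbers here are made up for illustration.
def grad(w):
    return 2.0 * (w - 3.0)        # dL/dw of the toy loss

lr, mu = 0.1, 0.9                 # learning rate and momentum coefficient

w, v = 0.0, 0.0                   # classic momentum: gradient at the current weights
for _ in range(50):
    v = mu * v - lr * grad(w)
    w += v

w_nag, v = 0.0, 0.0               # NAG: gradient at the looked-ahead point w + mu*v
for _ in range(50):
    v = mu * v - lr * grad(w_nag + mu * v)
    w_nag += v

print(w, w_nag)                   # both approach 3.0, NAG with far less oscillation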

As a starting point for the learning rate: go into visual mode; for test purposes set a very high learning rate, e.g. 0.2, just to see what I mean; then you'll probably see the loss curve skyrocketing almost immediately. From there, reduce the learning rate in steps of roughly a factor of 0.1. Once you have found a value with a decreasing loss curve, take about 10-50% of that value to start the training (this will usually be somewhere between 0.0001 and 0.01). Time decay is not obligatory, but can help with fine-tuning at the later stages of the training. Another method would be setting the highest tolerated learning rate in the beginning, but combined with a quicker decline via a higher time decay value (time decay means nothing else but lr(t) = initial rate * 1/(1 + decay * iterations), so if you set a value of 0.0001 for example, the learning rate will have reached half its original value after 10,000 iterations).
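
As a quick sketch of that schedule (plain Python; the 0.001 starting rate is just a hypothetical example value, not a recommendation):

# Time-decayed learning rate as described above: lr(t) = lr0 / (1 + decay * t)
lr0, decay = 0.001, 0.0001        # hypothetical starting rate; decay as in the example

def lr_at(iteration):
    return lr0 / (1.0 + decay * iteration)

print(lr_at(0), lr_at(10_000), lr_at(100_000))
# 0.001, 0.0005 (half the original rate after 10,000 iterations), ~0.000091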

But if the loss still fails to improve after many thousands of iterations (I almost expect this to happen with this formula! The challenge was to force the network to fail, and your example might do a very good job!), it is quite possible that we have just proven that your function actually is a pretty close approximation to ideal randomness (although it will technically still be pseudo-random).

If (!!) the neural network is still able to reveal some non-random principles, we should (in theory) over time at least see that 0.5*sqrt(2*MSE) < y.

By the way: I set the range for y only to 0-100.
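
As a rough baseline for comparison (my own back-of-the-envelope sketch in plain Python, assuming the labels are roughly uniform over that 0-100 range): a network that has learned nothing and always predicts the overall mean would land at about MAE 25 and RMSE 29, so anything clearly below that would hint at real structure.

# Baseline: always predict the mean of a (roughly) uniform 0..100 target.
# Illustration only -- simulated labels, not the actual challenge data.
import random

random.seed(1)
labels = [random.uniform(0, 100) for _ in range(100_000)]
mean = sum(labels) / len(labels)

mae  = sum(abs(y - mean) for y in labels) / len(labels)
rmse = (sum((y - mean) ** 2 for y in labels) / len(labels)) ** 0.5

print(round(mae, 1), round(rmse, 1))   # ~25.0 and ~28.9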

Okay... enough looking at computer screens for today...

 

For something that is supposed to be capable of finding the best formula for given inputs and outputs, it sure has a lot of parameters that need to be manually tweaked.

Anyway ... the points printed on the chart keep going up with each new training iteration (now at 60369), but I am going to let it run as-is until tomorrow; then I'll post a new screenshot.

Good Night!

 
Does somebody have a link to a good NN EA? Or is this something people are currently developing? Also new to trading (2 months). Read some stuff about NN and it’s interesting!
 

I've stopped training in the optimizer because the results weren't getting any better and I wanted to see the details. As you can see in the attached screenshots, the ANN already went through more than 14 million iterations, but the average error is still very high and it's NOT improving with new iterations.

Files:
ANN_Run4.png  71 kb
ANN_Run5.png  33 kb
 
Since using 3 hidden layers with 100 neurons per layer didn't work out, I've deleted the files generated by the ANN and have started training a new ANN using 10 hidden layers with 500 neurons each (as per your comment a few posts back). Here is a screenshot of the new training session, so you can see the initial results. I will be running this through the optimizer now, to speed up the training process.

Here are the reports from the 1st test, using 3 hidden layers with 100 neurons each, which ran through 14170063 iterations before I stopped it:

=============== PROJECT 'ANN_challenge' (TRAINING SUMMARY) ===============

network name: MT5 forum challence
training data start from: 2019.10.14 00:00
training data end at: 2019.10.17 23:58
symbol (active chart window): EURUSD
MODEL: multilayer perceptron (fully connected)
     neural network architecture:
     0: input layer, 3 neurons
     1: hidden layer, 100 neurons (400 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     2: hidden layer, 100 neurons (10100 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     3: output layer, 1 neurons (101 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     loss function: MSE

     total number of neurons: 204 plus 3 bias neurons
     dead neurons (apoptosis): 0
     total number of weights: 10601 (including 201 bias weights)


ADDITIONAL LEARNING PARAMETERS:
     learning rate decay: 0.0001
     learning rate momentum: 0.3
     optimizer: Nesterov
     optimization coeff. beta2: 0.9
     dropout level: 0.0
     weight initialization method: Chris_uniform

INPUT FEATURE SCALING PARAMETERS:
     method: minmax_method
     exponent: 1.0
     radius/clipping: 10.0
     baseline shift: 0.0

LABEL SCALING PARAMETERS:
     method: minmax_method
     exponent: 1.0
     radius/clipping: 1.0
     baseline shift: 0.0

RESULTS:
     total number of backprop. iterations:            14170063 (=773 passes over 18331 training set samples)
     total training time:                             274 minutes

     MSE loss (entire network):                       0.060732458
     single output neuron loss (average):             0.060732458
                                                      with sqrt(2*(L/n))=0.3485181718 avg. abs. error (before rescaling)

=============== PROJECT 'ANN_challenge' (TEST REPORT) ===============

network name: MT5 forum challence
test data start from: 2019.10.14 00:00
test data end at: 2019.10.17 23:58
symbol (active chart window): EURUSD
total number of samples in the test set: 22515

the network has been trained on 14170063 total backpropagation iterations
(=773 passes over 18331 training set samples)

MODEL: multilayer perceptron (fully connected)

     neural network architecture:
     0: input layer, 3 neurons
     1: hidden layer, 100 neurons (400 incoming weights), activation: sigmoid (logistic)
     2: hidden layer, 100 neurons (10100 incoming weights), activation: sigmoid (logistic)
     3: output layer, 1 neurons (101 incoming weights), activation: sigmoid (logistic)
     loss function: MSE

     total number of dead neurons (apoptosis): 0
     total number of neurons: 204 plus 3 bias neurons
     total number of weights: 10601 (including 201 bias weights)

=============== TEST RESULTS SUMMARY ===============
     LABELS (=TARGETS / 'REAL' VALUES):
          mean: 49.94030646
          median: 46.0
          variance: 833.00869208
          standard deviation: 28.86188996
          excess kurtosis: -1.21960764
          sample skewness: 0.06041718
          median skewness (Pearson): 0.40956844
     PREDICTIONS (=RESCALED OUTPUTS):
          mean: 61.82898731
          median: 48.62273801
          variance: 209.89952254
          standard deviation: 14.48790953
          excess kurtosis: -1.86380558
          sample skewness: 0.3105026
          median skewness (Pearson): 2.73460763
     TEST OUTCOME (=RESCALED OUTPUTS VS LABELS):
          mean squared error (MSE): 441.38377362
          standard error of regression (=SER,=RMSE): 21.00913548
          mean absolute error (MAE): 16.29652085
          maximum absolute deviation (MAD): 48.62272513
          explained variance (SSE): 4725887.75010007
          residual variance (SSR): 9937755.66295825
          total variance (SST): 21675447.70111327
          R squared (coefficient of determination): 0.54152017

=============== INDIVIDUAL OUTPUT NEURON RESULTS ===============
output[1] my fn function result 'r':
           label mean: 49.94030646, label var.: 833.00869208, label std.dev.:28.86188996, label exc. kurtosis: -1.21960764, label median skewness: 0.13652281,
           output mean: 61.82898731, output var.: 209.89952254, output std.dev.: 14.48790953, output exc. kurtosis: -1.86380558, output median skewness: 0.91153588,
           MAE: 16.29652085, MAD: 48.62272513, MSE: 441.38377362, SER: 21.00913548, SSE: 4725887.75010007, SST: 21675447.70111327, R2: 0.54152017


And these are the results of a new test, using an ANN with 10 hidden layers and 500 neurons each ...


=============== PROJECT 'ANN_challenge' (TRAINING SUMMARY) ===============

network name: MT5 forum challence
training data start from: 2019.10.17 00:00
training data end at: 2019.10.18 23:59
symbol (active chart window): EURUSD
MODEL: multilayer perceptron (fully connected)
     neural network architecture:
     0: input layer, 3 neurons
     1: hidden layer, 500 neurons (2000 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     2: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     3: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     4: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     5: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     6: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     7: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     8: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     9: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     10: output layer, 1 neurons (501 incoming weights), activation: sigmoid (logistic), learning rate: 0.001
     loss function: MSE

     total number of neurons: 4504 plus 10 bias neurons
     dead neurons (apoptosis): 0
     total number of weights: 2006501 (including 4501 bias weights)

ADDITIONAL LEARNING PARAMETERS:
     learning rate decay: 0.0001
     learning rate momentum: 0.3
     optimizer: Nesterov
     optimization coeff. beta2: 0.9
     dropout level: 0.0
     weight initialization method: Chris_uniform

INPUT FEATURE SCALING PARAMETERS:
     method: minmax_method
     exponent: 1.0
     radius/clipping: 10.0
     baseline shift: 0.0

LABEL SCALING PARAMETERS:
     method: minmax_method
     exponent: 1.0
     radius/clipping: 1.0
     baseline shift: 0.0

RESULTS:
     total number of backprop. iterations:            203760 (=1068 passes over 190 training set samples)
     total training time:                             416 minutes

     MSE loss (entire network):                       0.1146638594
     single output neuron loss (average):             0.1146638594
                                                      with sqrt(2*(L/n))=0.4788817379 avg. abs. error (before rescaling)
=============== PROJECT 'ANN_challenge' (TEST REPORT) ===============

network name: MT5 forum challence
test data start from: 2019.10.17 00:00
test data end at: 2019.10.18 23:45
symbol (active chart window): EURUSD
total number of samples in the test set: 192

the network has been trained on 203952 total backpropagation iterations
(=1069 passes over 190 training set samples)

MODEL: multilayer perceptron (fully connected)

     neural network architecture:
     0: input layer, 3 neurons
     1: hidden layer, 500 neurons (2000 incoming weights), activation: sigmoid (logistic)
     2: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     3: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     4: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     5: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     6: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     7: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     8: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     9: hidden layer, 500 neurons (250500 incoming weights), activation: sigmoid (logistic)
     10: output layer, 1 neurons (501 incoming weights), activation: sigmoid (logistic)
     loss function: MSE

     total number of dead neurons (apoptosis): 0
     total number of neurons: 4504 plus 10 bias neurons
     total number of weights: 2006501 (including 4501 bias weights)

=============== TEST RESULTS SUMMARY ===============
     LABELS (=TARGETS / 'REAL' VALUES):
          mean: 52.85416667
          median: 29.0
          variance: 787.65624819
          standard deviation: 28.0652142
          excess kurtosis: -1.17058506
          sample skewness: 0.04767082
          median skewness (Pearson): 2.54986474
     PREDICTIONS (=RESCALED OUTPUTS):
          mean: 49.69327202
          median: 49.97117145
          variance: 8.36744753
          standard deviation: 2.89265406
          excess kurtosis: 34.61102201
          sample skewness: 5.36822157
          median skewness (Pearson): -0.28821223
     TEST OUTCOME (=RESCALED OUTPUTS VS LABELS):
          mean squared error (MSE): 809.83882906
          standard error of regression (=SER,=RMSE): 28.45766732
          mean absolute error (MAE): 24.4477848
          maximum absolute deviation (MAD): 50.24216175
          explained variance (SSE): 1606.54992483
          residual variance (SSR): 155489.05517977
          total variance (SST): 159126.21266913
          R squared (coefficient of determination): 0.02285706

=============== INDIVIDUAL OUTPUT NEURON RESULTS ===============
output[1] my fn function result 'r':
           label mean: 52.85416667, label var.: 787.65624819, label std.dev.:28.0652142, label exc. kurtosis: -1.17058506, label median skewness: 0.84995491,
           output mean: 49.69327202, output var.: 8.36744753, output std.dev.: 2.89265406, output exc. kurtosis: 34.61102201, output median skewness: -0.09607074,
           MAE: 24.4477848, MAD: 50.24216175, MSE: 809.83882906, SER: 28.45766732, SSE: 1606.54992483, SST: 159126.21266913, R2: 0.02285706

Attached are screenshots taken from the Strategy Tester showing the "Passes/Custom max" chart, raw data from the last couple of iterations, and a detailed view of the ANN training in visual mode.

I've switched to visual mode now, to see what the ANN looks like. From what I can see, the larger ANN, after 210k iterations, thinks that all the outputs are somewhere between 49 and 51, with the "predicted result" slowly moving up or down with each iteration. It looks almost like the large ANN is updating its weights to find the average value of all possible outputs (the range of outputs is between 0 and 100 in this test set).

Ok. I think after these two tests it is fairly safe to say that the ANN is useless when the output is as far away from being linear as possible, as is the case with this pseudo-random generator.


This was fun. The outcome was kind-of expected, but it was fun getting there.

So ... Chris, would you accept another challenge? ;)

No random numbers this time. Now, I'd like to use price data. No trading, just training and prediction.

For input neurons, use the Open price of the last 100 bars (completed bars, not the current unfinished bar).

For output neurons, use the Close price of the exact same bars. That's 100 inputs and 100 outputs.

In this scenario, once the ANN is fully trained, the only thing it would actually have to "predict" is the last Close price, since the first 99 outputs are always going to be identical to the last 99 inputs. The challenge is to find out how good the ANN will be at predicting the result of the last output neuron when presented with previously unseen data on the input neurons. But ... I'm also curious to see if the ANN can "figure out" that the first 99 outputs are just copies of the last 99 inputs and get to a state where it always produces correct results for the first 99 outputs.

Feel free to increase or decrease the number of input and output neurons, if you think it can improve the results. I've used 100 here only as an example.
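
Just so the data layout is unambiguous, here is a tiny sketch of how one training sample could be assembled, written in plain Python with made-up prices (the actual EA would of course pull the bars from the chart; the function and variable names here are hypothetical):

# One training sample for the proposed challenge:
# inputs = Open prices of the last N completed bars, labels = Close prices of the same bars.
N = 100

def make_sample(opens, closes, n=N):
    # index 0 = oldest bar, last element = current unfinished bar, which we drop
    return opens[-(n + 1):-1], closes[-(n + 1):-1]

# tiny demo with made-up prices and n=3 completed bars
opens  = [1.10, 1.11, 1.12, 1.13, 1.14]
closes = [1.11, 1.12, 1.13, 1.14, 1.15]
print(make_sample(opens, closes, n=3))   # ([1.11, 1.12, 1.13], [1.12, 1.13, 1.14])
# Apart from gaps, Close[i] == Open[i+1], so all labels except the last one
# re-appear among the inputs -- only the final Close is genuinely new information.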

PS. I'll try to do the same with my very simple ANN (the source code I posted earlier in this thread).