Machine learning in trading: theory, models, practice and algo-trading - page 516

 
elibrarius:

I checked forests too: much faster than the NN (4 minutes), and the result is about the same. Interestingly, linear regression computes even faster, with the same results.
As someone wrote here, it's all about the features.


Well, that's the main thing; playing around with different models and bagging will not give a big improvement :)

 
Maxim Dmitrievsky:

There, as far as I understood, you can set 1-2 epochs, because it almost always converges the first time... maybe that was an oversight? Though I haven't used it in a long time, so I may be confusing things.

I haven't seen an epoch limit anywhere.
 
elibrarius:
I haven't seen an epoch limit anywhere.

mlptrainlm function

/*************************************************************************
Neural network training using modified Levenberg-Marquardt with exact
Hessian calculation and regularization. Subroutine trains neural network
with restarts from random positions. The algorithm is well suited for
small and medium scale problems (hundreds of weights).

INPUT PARAMETERS:
    Network  - neural network with initialized geometry
    XY       - training set
    NPoints  - training set size
    Decay    - weight decay constant, >=0.001.
               The decay term 'Decay*||Weights||^2' is added to the error
               function. If you don't know what Decay to choose, use 0.001.
    Restarts - number of restarts from random position, >0.
               If you don't know what Restarts to choose, use 2.
*************************************************************************/
I think it's the same as above.
 
Maxim Dmitrievsky:

Well, that's the main thing; playing around with different models and bagging will not give a big improvement :)

I think the advantage of NNs is finding non-linear dependencies and using them.
Vladimir's last article has two of them.

Linear regression, on the contrary, they will only make worse.

 
Maxim Dmitrievsky:

mlptrainlm function

That's just the recommended value; nothing prevents you from setting even 1000, but it will take a long time... I looked in the code: there's simply a loop over the number of epochs (I also used 2, by the way).
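The pattern Maxim describes (a plain outer loop over the number of epochs) can be sketched generically. This is a hypothetical Python illustration, not ALGLIB's actual code, showing why one or two epochs can already be enough when each inner training pass converges quickly:

```python
# Hypothetical sketch (not ALGLIB's code): an outer loop over a fixed
# number of epochs, with an early stop once the error no longer improves.
# Fits y = 2*x by gradient descent on one weight.
def train(xs, ys, epochs=1000, lr=0.1, tol=1e-12):
    w = 0.0
    prev_err = float("inf")
    for epoch in range(epochs):            # the "loop over the number of epochs"
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        err = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if prev_err - err < tol:           # converged: extra epochs add nothing
            return w, epoch + 1
        prev_err = err
    return w, epochs

w, used = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

On this toy problem the loop stops after a handful of epochs, so a large epoch cap mostly costs time rather than accuracy.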
 
elibrarius:
I think the advantage of NNs is finding non-linear dependencies and using them.
Vladimir's last article has two of them.

Linear regression, on the contrary, they will only make worse.


Forests, too, are by design only for non-linear patterns; they do not work on linear ones.

 
elibrarius:
That's just the recommended value; nothing prevents you from setting even 1000, but it will take a long time... I looked in the code: there's simply a loop over the number of epochs (I also used 2, by the way).
I ran up to 1500 epochs on a 6-layer network. Yes, it's long: about 32 hours. But, firstly, the result exceeds expectations; secondly, it's not long at all compared to designing by hand). And with an MLP the structure is standard, and you can train whatever you want.))
 
Maxim Dmitrievsky:

Forests, too, are by design only for non-linear patterns; they do not work on linear ones.

Maybe that's why the forest is all of 0.4% better than linear regression on the validation section))). Training time is 36 and 3 minutes respectively (with 265 inputs). I'm starting to like linear regression.
 

If anyone wants to play around: the forest is trained on increments and gives a prediction 1 bar ahead. The training depth, the lag for the increments, and the number of inputs are set in the settings (each new input is a shift of 1 bar back). The forecast value is then subtracted from the current price. The histogram is drawn only on each new bar.

//+------------------------------------------------------------------+
//|                                          NonLinearPredictor.mql5 |
//|                                                  Dmitrievsky Max |
//|                        https://www.mql5.com/ru/users/dmitrievsky |
//+------------------------------------------------------------------+
#property copyright "Dmitrievsky Max."
#property link      "https://www.mql5.com/ru/users/dmitrievsky"
#property version   "1.00"
#property indicator_separate_window
#property indicator_buffers 2
#property indicator_plots   1
//--- plot Label1
#property indicator_label1  "Tensor non-linear predictor"
#property indicator_type1   DRAW_COLOR_HISTOGRAM
#property indicator_color1  clrOrangeRed,clrOrange
#property indicator_style1  STYLE_SOLID
#property indicator_width1  1
//--- include the Alglib library
#include <Math\Alglib\alglib.mqh>

//RDF system. Here we create all RF objects.
CDecisionForest      RDF;                                     //Random forest object
CDFReport            RDF_report;                              //RF return errors in this object, then we can check it
double RFout[1], vector[];                                    //Arrays for the RF calculation results
CMatrixDouble RMmatrix;
int retcode=0;

//--- input parameters
input int             last_bars=500;
input int             lag=5; 
input int             bars_seria = 100;

//--- indicator buffers
double         SpreadBuffer[];
double         ColorsBuffer[];

//--- open time of the previous bar
static datetime last_time=0;
int needToLearn=0;
//+------------------------------------------------------------------+
//| Custom indicator initialization function                         |
//+------------------------------------------------------------------+
int OnInit()
  {
//--- indicator buffers mapping
   SetIndexBuffer(0,SpreadBuffer,INDICATOR_DATA);
   SetIndexBuffer(1,ColorsBuffer,INDICATOR_COLOR_INDEX);
//--- set timeseries indexing: from present to past
   ArraySetAsSeries(SpreadBuffer,true);
   ArraySetAsSeries(ColorsBuffer,true);
   RMmatrix.Resize(last_bars,bars_seria);
   ArrayResize(vector,bars_seria-1);
//---
   IndicatorSetString(INDICATOR_SHORTNAME,StringFormat("Non-linear predictor (%s, %s, %s)",_Symbol,(string)last_bars, (string)lag));
   IndicatorSetInteger(INDICATOR_DIGITS,5);
   return(INIT_SUCCEEDED);
  }
 
int OnCalculate(const int rates_total,
                const int prev_calculated,
                const datetime &time[],
                const double &open[],
                const double &high[],
                const double &low[],
                const double &close[],
                const long &tick_volume[],
                const long &volume[],
                const int &spread[])
  {
//---
  ArraySetAsSeries(close,true);
  
   if(prev_calculated==0 || needToLearn>last_bars/5) 
     {
      for(int i=0;i<last_bars;i++) 
       {   
        for(int l=0;l<ArraySize(vector);l++)
         {
          RMmatrix[i].Set(l,MathLog(close[i+1+l]/close[i+lag+1+l]));
         }   
        RMmatrix[i].Set(bars_seria-1,MathLog(close[i]/close[i+lag]));
       }
      CDForest::DFBuildRandomDecisionForest(RMmatrix,last_bars,bars_seria-1,1,100,0.95,retcode,RDF,RDF_report);
      needToLearn=0;
     }
     
   if(isNewBar()) 
    {
     if(retcode==1)
      {
       for(int i=0;i<ArraySize(vector);i++)
        {
         vector[i]=MathLog(close[i]/close[i+lag]);
        }
    
       CDForest::DFProcess(RDF,vector,RFout);
       SpreadBuffer[0]=MathLog(close[0]/close[0+lag])-RFout[0];
       ColorsBuffer[0]=(SpreadBuffer[0]>0?0:1);
      }
      needToLearn++;
     }
   return(rates_total);
  }
//+------------------------------------------------------------------+
//| Returns true when a new bar appears                              |
//+------------------------------------------------------------------+
bool isNewBar()
  {
   datetime lastbar_time=datetime(SeriesInfoInteger(Symbol(),_Period,SERIES_LASTBAR_DATE));
   if(last_time==0)
     {
      last_time=lastbar_time;
      return(false);
     }
   if(last_time!=lastbar_time)
     {
      last_time=lastbar_time;
      return(true);
     }
   return(false);
  
  }
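For readers without MetaTrader, the feature matrix the indicator fills (RMmatrix, inside OnCalculate) can be mirrored in plain Python. This is a hedged sketch of the same log-increment layout, with close[0] as the newest bar, not a translation of the whole indicator:

```python
import math

# Sketch of the indicator's RMmatrix: close[] is in timeseries order
# (index 0 = newest bar). Each row holds bars_seria-1 lagged log-increments
# as features, plus the current log-increment as the target in the last
# column, matching the Set() calls in OnCalculate.
def build_matrix(close, last_bars, lag, bars_seria):
    rows = []
    for i in range(last_bars):
        features = [math.log(close[i + 1 + l] / close[i + lag + 1 + l])
                    for l in range(bars_seria - 1)]
        target = math.log(close[i] / close[i + lag])
        rows.append(features + [target])
    return rows
```

In the indicator this matrix (last_bars rows, bars_seria columns) is then handed to DFBuildRandomDecisionForest with bars_seria-1 independent variables and 1 dependent variable.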
 
elibrarius:
Maybe that's why the forest is all of 0.4% better than linear regression on the validation section))). Training time is 36 and 3 minutes respectively (with 265 inputs). I'm starting to like linear regression.

I also compared: I did an autoregression on the time series and did the same thing through a forest, and the differences are minimal :) In essence this means there is no real pattern in either case.
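The comparison above can be sketched with a linear lag-1 autoregression on synthetic data (a random forest would need an external library, so it is left out; this is an illustration of the "no pattern" point, not the poster's actual test). On a pure random walk the fitted coefficient comes out near zero:

```python
import random

# Generate i.i.d. increments of a synthetic random walk and fit a
# least-squares AR(1) coefficient on them: r_t ~ a * r_{t-1}.
# If the increments carry no pattern, the coefficient 'a' is near zero.
random.seed(42)
increments = [random.gauss(0.0, 1.0) for _ in range(10000)]

num = sum(increments[t] * increments[t - 1] for t in range(1, len(increments)))
den = sum(r * r for r in increments[:-1])
a = num / den  # closed-form least-squares AR(1) estimate
```

Any model, linear or forest, fitted to such a series will forecast essentially nothing, which is consistent with the minimal differences reported above.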
