Machine learning in trading: theory, models, practice and algo-trading - page 935

 
forexman77:

How do you determine which predictors are "noise"? I tried selecting by importance and removing the rest that way, and the result got worse.

There were a lot of recommendations.

I use a very simple scheme. It doesn't aim for precision, but it is very intuitive.

I take a predictor, which is a vector, and split it into two parts: one part corresponds to one class of the target, the other part to the other class. Then I build a histogram of each part and compare them. If they coincide, the predictor is noise; if they diverge, it has some predictive power. If they diverge completely, it has 100% predictive power (I have never seen that). The intersection of the histograms is a classification error that cannot be overcome in principle.
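A minimal sketch of the histogram comparison described above, written in Python for illustration (the function name, the bin count and the sample values are assumptions, not the poster's code). It measures the overlap of the two normalised class histograms: 1.0 means they coincide (noise), 0.0 means they do not intersect at all. With equal class sizes, half of the overlap equals the error of the best rule that only looks at which bin a value falls into, which is the "error that cannot be overcome" for that predictor at that binning.

```python
def histogram_overlap(x, y, lo, hi, bins=10):
    """Overlap of the class-conditional histograms of one predictor.

    x: predictor values, y: target classes (0 or 1).
    Both histograms use the same bins over [lo, hi] and are normalised to sum to 1.
    Returns sum_k min(p0[k], p1[k]): 1.0 = identical histograms (noise),
    0.0 = no shared bins (perfect separation).
    """
    h = {0: [0] * bins, 1: [0] * bins}
    for xi, yi in zip(x, y):
        k = int((xi - lo) / (hi - lo) * bins)
        k = min(max(k, 0), bins - 1)  # clip edge values into the first/last bin
        h[yi][k] += 1
    n0, n1 = sum(h[0]), sum(h[1])
    return sum(min(a / n0, b / n1) for a, b in zip(h[0], h[1]))


# Hypothetical RSI-like values, not data from the thread
noise = histogram_overlap([20, 40, 60, 80] * 2, [0] * 4 + [1] * 4, 0, 100)
separated = histogram_overlap([10, 20, 30, 40, 60, 70, 80, 90], [0] * 4 + [1] * 4, 0, 100)
print(noise, separated)  # 1.0 0.0
```

The result depends on the bin count, so it is worth checking a predictor at a few binnings before calling it noise.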

I posted the graphs here earlier, with RSI as the predictor. One could also construct a numeric measure of how far the histograms diverge.
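One possible numeric measure of the divergence (an assumption, the post does not name one) is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of the predictor for each class. It needs no binning; 0 means identical distributions, 1 means complete separation. A stdlib-only sketch with a hypothetical function name:

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic between the predictor values of
    class 0 and class 1: the largest vertical gap between their empirical CDFs."""
    a = sorted(xi for xi, yi in zip(x, y) if yi == 0)
    b = sorted(xi for xi, yi in zip(x, y) if yi == 1)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        # advance past every value <= t in both samples (handles ties)
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


print(ks_statistic([1, 2, 3, 1, 2, 3], [0, 0, 0, 1, 1, 1]))  # 0.0 -> noise
print(ks_statistic([1, 2, 3, 4], [0, 0, 1, 1]))              # 1.0 -> full separation
```

If SciPy is available, `scipy.stats.ks_2samp(a, b)` computes the same statistic and also returns a p-value.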


This is from Rattle: at no cost you can check any number of predictors against a large number of target variables, which I sometimes do as a favor for the especially lazy.

 
Aleksey Vyazmikin:

It is not clear how to read your indicator.

It is not an indicator, but a random forest...

You can evaluate a model by OOB error, RMS and so on, but that does not tell you what the model actually produces.

Only by overlaying the random forest's output on a price chart do the peculiarities of the model's behavior become visible.

The graph is generated using https://github.com/Roffild/RoffildLibrary/blob/master/Include/Roffild/ToIndicator.mqh

 
Dr. Trader:

Filter_02 2016 arr_Buy

Here class "1" even outnumbers "0", so there are fewer false entries than before. Could you please try this tree in your EA? I'm curious myself what the profit chart will look like.


[Table: y_pred by y_true (classes 0 and 1)]


The first column of numbers is without the filter, the second is with the filter.

Maybe I got the tree logic wrong?


void FilterTree()
{
   // Feature values used by the tree, stored as integers
   int arr_DonProc=DonProcf();
   int arr_DonProcVisota=DonProcVisotaf();
   int arr_DonProc_M15=DonProc_M15f();
   int Level_Support_D1=LevlSupportf(PERIOD_D1);
   int Level_Support_W1=LevlSupportf(PERIOD_W1);
   int arr_Regressor=RegressorP();
   bool BlockBuy=false;

   // Tree splits; thresholds such as 2.5 fall between integer feature values
   if(arr_DonProc>=2.5)
     {
      if(arr_DonProcVisota>=7.5)
         BlockBuy=true;
      else if(arr_DonProc>=6.5)
        {
         if(Level_Support_D1>=-2.5 && Level_Support_W1<=1.5)
            BlockBuy=true;
        }
      else if(arr_Regressor>=2.5)
        {
         if(arr_DonProc_M15>=4.5)
            BlockBuy=true;
        }
      else if(Level_Support_D1<1.5 && Level_Support_W1<-1.5)
         BlockBuy=true;
     }

   // Apply the filter to the buy signal; sells are always disabled here
   if (BlockBuy==true)BuyNow=false;
   SellNow=false;
}
 
Roffild:

This is not an indicator, but a random forest...

You can evaluate a model by OOB error, RMS and so on, but that does not tell you what the model actually produces.

Only by overlaying the random forest's output on a price chart do the peculiarities of the model's behavior become visible.

Graph generated with https://github.com/Roffild/RoffildLibrary/blob/master/Include/Roffild/ToIndicator.mqh

I don't get it: what you clearly have there is not a forest but an indicator. Maybe it runs some of the "forest" logic and displays the output, but that doesn't stop it from being an indicator.

So what did you want to show? How should the readings be interpreted, and what do they give you?

 
Aleksey Vyazmikin:


The first column of numbers is without the filter, the second is with the filter.

Maybe I got the tree logic wrong?


There was a mistake.

if (BlockBuy==false)BuyNow=false;


 
Aleksey Vyazmikin:

I don't get it: what you clearly have there is not a forest but an indicator. Maybe it runs some of the "forest" logic and displays the output, but that doesn't stop it from being an indicator.

So what did you want to show? How should the readings be interpreted, and what do they give you?

CDForest::DFProcess(forest1, netinputs, netoutputs);
indicSymbol.buffer(netoutputs[1] * 100.0, 0);

CDForest::DFProcess(forest2, netinputs, netoutputs);
indicSymbol.buffer(netoutputs[1] * 100.0, 1);
If you don't believe that the chart shows a real forest, then your notion of forests is clearly wrong...
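For readers wondering how to interpret the lines: assuming these are ALGLIB classification forests, `DFProcess` fills `netoutputs` with per-class probabilities, so `netoutputs[1] * 100.0` is roughly the forest's estimated probability of class 1, in percent. A simplified Python sketch of that averaging, with hypothetical stump "trees" on a single RSI input (the function name is made up, and real ALGLIB trees and their leaf contents are more involved):

```python
def forest_process(trees, inputs, n_classes=2):
    """Average the class votes of all trees into class probabilities.
    Each hypothetical tree is a callable mapping an input vector to a class index."""
    votes = [0] * n_classes
    for tree in trees:
        votes[tree(inputs)] += 1
    return [v / len(trees) for v in votes]


# Three hypothetical stumps voting on an input vector [rsi]
trees = [
    lambda v: 1 if v[0] < 30 else 0,
    lambda v: 1 if v[0] < 40 else 0,
    lambda v: 1 if v[0] < 50 else 0,
]
for rsi in (25, 35, 45, 60):
    probs = forest_process(trees, [rsi])
    # the value that would be written to indicator buffer 0
    print(rsi, round(probs[1] * 100.0, 1))  # 100.0, 66.7, 33.3, 0.0
```

So a reading near 100 means almost all trees vote for class 1, a reading near 50 means the forest is undecided.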
 
Roffild:
If you don't believe that the chart shows a real forest, then your notion of forests is clearly wrong...

What does faith have to do with it? I see squiggles on the chart and I don't understand how to interpret them, that's all.

 

Dr. Trader, in general you need to branch the tree further; there is very little information to make a decision.

 
Aleksey Vyazmikin:

The first column of numbers is without the filter, the second is with the filter.

I see. The tree couldn't learn to filter properly, so the result with filtering is not much better, just fewer deals. It basically filtered out some of the good trades and some of the bad trades at random.
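Why a filter that removes good and bad trades alike does not help can be seen with a little arithmetic. The sketch below uses made-up trade outcomes (not the thread's numbers) and a hypothetical helper name: dropping trades independently of their outcome leaves the win rate unchanged and only reduces the number of deals, while a useful filter has to remove losers more often than winners.

```python
def summarize(trades):
    """trades: list of booleans, True = profitable. Returns (count, win rate)."""
    n = len(trades)
    return n, (sum(trades) / n if n else 0.0)


# Hypothetical outcomes: 60 winners, 40 losers
trades = [True] * 60 + [False] * 40
# A filter blind to the outcome: keeps every second trade
blind = [t for i, t in enumerate(trades) if i % 2 == 0]
# A useful filter: removes 30 of the 40 losers but only 5 winners
useful = [True] * 55 + [False] * 10

for name, t in (("no filter", trades), ("blind filter", blind), ("useful filter", useful)):
    n, wr = summarize(t)
    print(f"{name}: {n} trades, win rate {wr:.3f}")
# no filter: 100 trades, win rate 0.600
# blind filter: 50 trades, win rate 0.600
# useful filter: 65 trades, win rate 0.846
```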

I trained the tree on 2015 data, and only for malovhodov.
I trained filter_02 and mnogovhodov_02 on 2016 data, so it's better to compare 2016 and 2017 in the tester (2017 is entirely new data that wasn't in the archive, which is the most interesting part to see).

 
Aleksey Vyazmikin:

Dr. Trader, in general you need to branch the tree further; there is very little information to make a decision.

In my case, further branching led to overfitting. For better accuracy one should move to more complex models: a forest or a neural network.

You can keep branching until you reach 100% accuracy on the training data, but what's the point if such a tree then simply fails on new data? We need to train a model that shows almost the same result on new data as on the training data.
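A quick way to see the point about 100% training accuracy: a model that memorises the training set scores perfectly on it even when the labels are pure noise, and then drops to coin-flip accuracy on new data. The sketch uses a 1-nearest-neighbour rule as a stand-in for a tree branched until every leaf is pure (both effectively give each training example its own region); the data and helper names are hypothetical.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """Return the label of the closest training point (one 'leaf' per example)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(42)
# Labels are coin flips independent of the feature: there is nothing real to learn
train_x = [rng.random() for _ in range(500)]
train_y = [rng.randint(0, 1) for _ in range(500)]
test_x = [rng.random() for _ in range(500)]
test_y = [rng.randint(0, 1) for _ in range(500)]

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"train accuracy: {train_acc:.2f}, new-data accuracy: {test_acc:.2f}")
```

The gap between the two numbers, not the training score itself, is what shows whether a model has learned anything that carries over to new data.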

Reason: