Machine learning in trading: theory, models, practice and algo-trading - page 2130

 
Aleksey Vyazmikin:

I have already started training without minutes - let's see.

I also use time in the form of 1/4-bar periods - hours, 4 hours, days.

In general, it turns out that tree models need to be fed many inputs that are pre-partitioned as much as possible, i.e. that have the minimum possible number of splits of their own.

 
elibrarius:

In general, it turns out that tree models need to be fed many inputs that are pre-partitioned as much as possible, i.e. that have the minimum possible number of splits of their own.

If you do the quantization yourself, then yes, but there is also built-in automation.

The histograms above are just different numbers of quanta per predictor, you can see how they affect the final result.

If a vein of valuable information can be isolated as a separate quantum of the predictor, that vein can be encoded as a binary feature and fed to the model separately.
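The idea of turning one valuable quantum of a predictor into a standalone binary feature can be sketched like this (a minimal Python illustration; the values and the [0.45, 0.55) interval are made up for the example):

```python
# Hypothetical predictor values (e.g. an indicator reading per bar).
x = [0.1, 0.45, 0.52, 0.48, 0.9, 0.05, 0.51]

# Suppose quantization showed that the interval [0.45, 0.55) carries
# a concentration of useful signal ("a vein of information").
lo, hi = 0.45, 0.55

# Encode that quantum as a separate binary feature and feed it
# to the model alongside the raw predictor.
in_vein = [1 if lo <= v < hi else 0 for v in x]
print(in_vein)  # [0, 1, 1, 1, 0, 0, 1]
```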

 


Balance - T1 with minutes and T2 without minutes - average result: 3384/3126/3890

Recall - average result: 0.0459/0.0424/0.0458


Precision - average result: 0.5216/0.5318/0.5389

Judging by the averages, T2 turned out to be the worst option.

I opened the predictor importance table and was surprised



Somehow the training method did not like the latest changes - maybe I did something wrong?

//day of week, hour = encode via 2 predictors: sin and cos of the angle over the full cycle, 360/7, 360/24
   double tmp[4];
   int nInd=0;
   MqlDateTime dts;
   double pi=3.1415926535897932384626433832795;
   for(int buf=0; buf<2; buf++)
   {
      TimeToStruct(iTime(Symbol(),PERIOD_CURRENT,0),dts);
      //tmp[buf]=(double)(dts.hour*60+dts.min)*360.0/1440.0;
      tmp[buf]=(double)(dts.hour*60+dts.min)*360.0/24.0;
      tmp[buf]=(buf==0?MathSin(tmp[buf]*pi/180.0):MathCos(tmp[buf]*pi/180.0));

      TimeToStruct(iTime(Symbol(),PERIOD_CURRENT,0),dts);
      //tmp[buf+2]=(double)(dts.day_of_week*1440+dts.hour*60+dts.min)*360.0/10080.0;
      tmp[buf+2]=(double)dts.day_of_week*360.0/7.0;
      tmp[buf+2]=(buf==0?MathSin(tmp[buf+2]*pi/180.0):MathCos(tmp[buf+2]*pi/180.0));
   }
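For reference, the intended sin/cos encoding can be cross-checked with a minimal Python sketch (the function name and the 1440-minute within-day cycle are my illustrative assumptions, not the MQL5 code above):

```python
import math

def cyclic_features(day_of_week, hour, minute):
    """Encode time-of-day and day-of-week as sin/cos pairs.

    Each value is mapped to an angle on a full circle, so the last
    minute of the day and the first minute end up close together.
    """
    day_angle = (hour * 60 + minute) / 1440.0 * 2 * math.pi  # within-day cycle
    week_angle = day_of_week / 7.0 * 2 * math.pi             # within-week cycle
    return (math.sin(day_angle), math.cos(day_angle),
            math.sin(week_angle), math.cos(week_angle))

# Midnight and one minute before midnight are near-identical points:
a = cyclic_features(0, 0, 0)
b = cyclic_features(0, 23, 59)
print(all(abs(x - y) < 0.01 for x, y in zip(a, b)))  # True
```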


I opened the sample



And I saw that the TimeHG column contains hours - my mistake - I need to redo all the tests.


 
Aleksey Vyazmikin:

Balance - T1 with minutes and T2 without minutes - average result: 3384/3126/3890

Recall - average result: 0.0459/0.0424/0.0458


Precision - average result: 0.5216/0.5318/0.5389

Judging by the averages, T2 turned out to be the worst option.

I opened the table of predictor significance and was surprised



Somehow the training method did not like the latest changes - maybe I did something wrong?


I opened the sample



And I saw that the TimeHG column contains hours - my mistake - I need to redo all the tests.


And the hours still include the minutes.

 tmp[buf]=(double)(dts.hour*60+dts.min)*360.0/24.0;

It should be like this

 tmp[buf]=(double)(dts.hour)*360.0/24.0;
TimeHG apparently captured everything, which is why the remaining hour features were not used.
 
elibrarius:

And the hours still include the minutes

It should be like this

Okay.

 

Training on 3 months - I tried it purely for fun - with training at the beginning. The whole chart covers 2014-2020.

If you take a large period, you get a mediocre model. At the same time, you can take different 3-month periods across the entire range.

Here, for example, you can see the training period - the dynamics are positive both before and after it.


The current futures contract

The mathematical expectation is 6.15 on real ticks.

I took a newer model.


It is interesting that they are different, which gives the potential to combine them into a committee. The mathematical expectation is 12.64.

Below is a histogram of the estimated balance, including the training sample, by training window number - the higher the number, the closer to the present; let me remind you that the sample runs from 2014 to October 2020.

Interestingly, in some places profits fall to almost half the maximum value. What could this mean - noisier segments in the training data?
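The rolling 3-month scheme described above can be sketched as follows (Python; the window generator and exact dates are illustrative assumptions, not the actual test setup):

```python
from datetime import date

def three_month_windows(start, end):
    """Return consecutive 3-month (start, end) training windows."""
    windows = []
    cur = start
    while True:
        # Advance by 3 calendar months.
        m = cur.month + 3
        nxt = date(cur.year + (m - 1) // 12, (m - 1) % 12 + 1, cur.day)
        if nxt > end:
            break
        windows.append((cur, nxt))
        cur = nxt
    return windows

# Slide 3-month windows across the 2014 - October 2020 sample.
wins = three_month_windows(date(2014, 1, 1), date(2020, 10, 1))
print(len(wins))  # 27
```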

 
elibrarius:

And the hours still include the minutes

It should be like this

TimeHG apparently captured everything, which is why the remaining hour features were not used.

Balance - T1 with minutes and T2 without minutes - average result: 4209.70/2882.50/3889.90


Recall - average result: 0.0479/0.0391/0.0458


Precision - average result: 0.5318/0.5168/0.5389

Importance of predictors



On average, the option without minutes (T2) loses money.

 
Aleksey Vyazmikin:

Balance - T1 with minutes and T2 without minutes - average result: 4209.70/2882.50/3889.90


Recall - average result: 0.0479/0.0391/0.0458


Precision - average result: 0.5318/0.5168/0.5389

Importance of predictors



On average, the option without minutes (T2) loses money.

What is the conclusion?
Minutes give an improvement.
What I could not tell is which is better: time as sine and cosine, or simply as day, hour and minute numbers?
 
elibrarius:
What is the conclusion?
Minutes give an improvement.
What I could not tell is which is better: time as sine and cosine, or simply as day, hour and minute numbers?

So far we can conclude that T2 is clearly the worse option, and that encoding time as sine and cosine is not identical to linear time. I think the results differ because of the representation of the numbers, namely the distances between them. Different distances affect how the quantization grid is formed - hence the discrepancy.

 
Aleksey Vyazmikin:

So far we can conclude that T2 is clearly the worse option, and that encoding time as sine and cosine is not identical to linear time. I think the results differ because of the representation of the numbers, namely the distances between them. Different distances affect how the quantization grid is formed - hence the discrepancy.

Theoretically, it should be the same for a tree.
The number of distinct values for days, hours and minutes equals the number of distinct values for the sines and cosines. In both cases there are 10080 different values over 7 days, changing once a minute.
If there is any randomization in training, the difference may come from that.

What did you train with - CatBoost?

For a neural network, sine and cosine are of course better, because minute 59 and minute 1 end up close to each other, while in a plain numeric representation they are as far apart as possible. For a tree to understand this, it needs a couple of extra splits, and the depth limit may not allow that.
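The closeness argument, and the 10080-value count, are easy to verify numerically (a quick Python illustration; the helper names are mine):

```python
import math

# A week at minute resolution has 7*24*60 distinct values.
print(7 * 24 * 60)  # 10080

def minute_angle(m):
    # Map minute-of-hour onto a full circle.
    return m / 60.0 * 2 * math.pi

def circle_dist(m1, m2):
    # Euclidean distance between the sin/cos points of two minutes.
    return math.hypot(math.sin(minute_angle(m1)) - math.sin(minute_angle(m2)),
                      math.cos(minute_angle(m1)) - math.cos(minute_angle(m2)))

# As plain numbers, minutes 59 and 1 differ by 58; on the circle
# they are neighbors - much closer than minutes 1 and 10.
print(round(circle_dist(59, 1), 3))   # 0.209
print(round(circle_dist(1, 10), 3))   # 0.908
```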
