Machine learning in trading: theory, models, practice and algo-trading - page 1354

 
Yuriy Asaulenko:

Keep the archives. See attachment.

Learn.csv - the inputs. The very first number in each line is a history reference; it should be removed.

Cell.scv - target.

After training on this data you should get the following chart.

The filter is approximately an EMA(16), and the forecast horizon is 5 min.

I will do the test later, when I need it.
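
A minimal loading sketch for the attached files, assuming Python with pandas; the separator, the absence of a header and the single target column are assumptions, not something confirmed by the archive:

import pandas as pd

# Learn.csv - the inputs; the very first value in each row is a history
# reference and is dropped (separator and header settings are assumptions)
X = pd.read_csv("Learn.csv", header=None, sep=";").iloc[:, 1:]

# Cell.scv - the target: the EMA(16)-like filtered value 5 minutes ahead
y = pd.read_csv("Cell.scv", header=None, sep=";").iloc[:, 0]

print(X.shape, y.shape)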

It's not quite clear which sample this graph comes from - the training one or the test one?

Here is the CatBoost on the test - the last 100 values.

Histogram of deviations.

I took 4000 rows for training, 2000 for validation and 100 lines for the test. I trained 1000 trees of depth 6 with the RMSE loss (later replaced by Poisson).

The sample and settings are attached; to reproduce it you need to download CatBoost and put it in the Setup directory.

On the training sample the distribution also doesn't look like yours.

Added: I was using the wrong model - my graphs show probabilities...

Files:
Setup.zip  587 kb
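
A sketch of the setup described above, assuming Python and the catboost package and the X/y frames from the loading sketch; the split sizes and tree parameters come from the post, the rest is illustrative:

from catboost import CatBoostRegressor, Pool

# 4000 rows for training, 2000 for validation, the last 100 for the test
train = Pool(X.iloc[:4000],     y.iloc[:4000])
valid = Pool(X.iloc[4000:6000], y.iloc[4000:6000])
test_X, test_y = X.iloc[6000:6100], y.iloc[6000:6100]

# 1000 trees of depth 6; RMSE loss (later swapped for Poisson in the post)
model = CatBoostRegressor(iterations=1000, depth=6,
                          loss_function="RMSE", verbose=100)
model.fit(train, eval_set=valid)

pred = model.predict(test_X)   # the last 100 values for the test chart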
 
Aleksey Vyazmikin:

It's not quite clear which sample this graph comes from - the training one or the test one?

Here is the CatBoost on the test - the last 100 values.

Histogram of deviations.

I took 4000 rows for training, 2000 for validation and 100 lines for the test. I trained 1000 trees of depth 6 with the RMSE loss.

The sample and settings are attached; to reproduce it you need to download CatBoost and put it in the Setup directory.

On the training sample the distribution also doesn't look like yours.

My graph is training only, on the whole sample. I haven't run a test on this one; it will be roughly identical to the training.
Where do the negative values on the x-axis of your graph come from? And why doesn't the range of x values match the range of y? How is that?
My graph compares predicted and real (target) values. There are no distributions.
 
Yuriy Asaulenko:
My graph is training only, on the whole sample. I haven't run a test on this one; it will be roughly identical to the training.
Where do the negative values on the x-axis of your graph come from? And why doesn't the range of x values match the range of y? How is that?
My graph compares predicted and real values.

Yes, I hadn't done regression before; unlike classification, there are a lot of unfamiliar loss functions that give different results, and I picked the wrong one.

Here is what came out on the test sample.

And here is the training sample of 4,000 lines.

Histogram of deviations for the test sample

Here is the overall graph for the 3 samples

The metric used during training, evaluated on the test sample.

It says that training could have been stopped at 250 iterations and that the model is overfitted.
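
A hedged sketch of how that "stop at ~250 iterations" observation can be automated in catboost, reusing the train/valid Pools from the sketch above; the patience value of 50 rounds is an arbitrary choice:

from catboost import CatBoostRegressor

# stop when the eval metric on the validation set stops improving,
# and keep the best iteration instead of the full 1000 overfitted trees
model = CatBoostRegressor(iterations=1000, depth=6,
                          loss_function="RMSE", verbose=100)
model.fit(train, eval_set=valid,
          early_stopping_rounds=50, use_best_model=True)
print("best iteration:", model.get_best_iteration())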

 
Aleksey Vyazmikin:

Yes, I hadn't done regression before; unlike classification, there are a lot of unfamiliar loss functions that give different results, and I picked the wrong one.

Here is what I got on the test sample.

And here is the training sample of 4,000 lines.

Histogram of deviations for the test sample

Here is a general graph for three samples

Seems to be OK. On the test too, though it's overfitted).
 
Yuriy Asaulenko:
Seems OK.

Well, yes, it can be improved if you want - I just have no experience with regression models.

So the main predictors are working tools :)

I attached the final version with the settings - it trains 10 models with different Seed values.

Files:
Setup.zip  588 kb
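
A sketch of the "10 models with different Seed" idea from the post, assuming catboost in Python and the data split from the earlier sketch; averaging the predictions is my assumption of how the models are combined:

import numpy as np
from catboost import CatBoostRegressor

models = []
for seed in range(10):
    m = CatBoostRegressor(iterations=1000, depth=6, loss_function="RMSE",
                          random_seed=seed, verbose=False)
    m.fit(train, eval_set=valid)
    models.append(m)

# a simple ensemble: average the ten per-seed predictions
pred = np.mean([m.predict(test_X) for m in models], axis=0)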
 
Aleksey Vyazmikin:

Well, yes, you can improve it if you want - I just don't have any experience with regression models.

So the main predictors are working tools :)
The input is a scaled price series - just 20 close values, and that's it. The problem is not the predictors but the problem statement, and it is solvable. And your forest will come up with predictors on its own).
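
A sketch of what "20 scaled close values" as the input could look like, assuming pandas; scaling relative to the last close and a 5-bar-ahead target are my assumptions, not Yuriy's exact preprocessing:

import pandas as pd

def make_dataset(close: pd.Series, n_lags: int = 20, horizon: int = 5):
    rows, targets = [], []
    for t in range(n_lags, len(close) - horizon):
        window = close.iloc[t - n_lags:t]
        scaled = (window / close.iloc[t - 1] - 1.0) * 100.0   # % relative to the last close
        rows.append(scaled.to_list())
        targets.append((close.iloc[t + horizon] / close.iloc[t - 1] - 1.0) * 100.0)
    return pd.DataFrame(rows), pd.Series(targets)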
 
Yuriy Asaulenko:
The input is a scaled price series - just 20 close values, and that's it. The problem is not the predictors but the problem statement, and it is solvable. And your forest will come up with predictors on its own).

Yes, it's about the problem statement, I agree. I just don't treat price as dough from which pies are molded; the predictors are needed to shape those pies.

 
Maxim Dmitrievsky:

One of the classic techniques that can improve the model - or rather, help find the optimal one. An original application of Monte Carlo.

https://en.wikipedia.org/wiki/Importance_sampling

Isn't this the method you used in your article?
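
For reference, a toy numpy/scipy illustration of the linked importance-sampling idea: estimate an expectation under one distribution by sampling from another and reweighting by the density ratio (the distributions here are arbitrary examples):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# target: P(x > 3) under p = N(0, 1), a rare event for plain Monte Carlo
f = lambda x: (x > 3.0).astype(float)

# sample from a proposal q = N(3, 1) that covers the rare region,
# then reweight each sample by p(x) / q(x)
x = rng.normal(3.0, 1.0, size=100_000)
w = stats.norm.pdf(x, 0.0, 1.0) / stats.norm.pdf(x, 3.0, 1.0)

print("importance-sampling estimate:", np.mean(f(x) * w))
print("exact value:", 1.0 - stats.norm.cdf(3.0))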

 
Maxim Dmitrievsky:

For off-policy (policy gradient) RL

https://medium.com/@jonathan_hui/rl-importance-sampling-ebfb28b4a8c6

Can you explain the idea in your own words - in plain terms, so to speak?)
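
In plain terms, the linked idea is: returns collected under a behaviour policy b are reweighted by the product of pi(a|s)/b(a|s) ratios, so their average estimates what the target policy pi would have earned. A toy sketch (the policies and episodes are made up for illustration):

import numpy as np

# behaviour policy b and target policy pi over two actions (state-independent toy case)
b  = np.array([0.5, 0.5])
pi = np.array([0.9, 0.1])

# episodes collected with b: (actions taken, observed return)
episodes = [([0, 0, 1], 1.0), ([1, 0, 0], 0.4), ([0, 1, 1], -0.2)]

# ordinary importance sampling: weight each return by prod_t pi(a_t) / b(a_t)
weighted = [np.prod(pi[a] / b[a]) * G for a, G in episodes]
print("estimated value under pi:", np.mean(weighted))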

 
Yuriy Asaulenko:

The LPF (low-pass filter) output we have predicted quite successfully - both of us by now, and not only with a NN but with a forest as well. Now let's try to predict the price itself, which is generally a pointless exercise). Rather, the LF component of the expected change of the price expectation, an expectation that is itself unknown in the present - and to do it amid all sorts of movements, HF fluctuations and everything else.

I got the following: the forecast horizon is 5 min on the 1-minute timeframe.

As usual: x is the forecast, y is the real value. Well, a rectangle tilted at 45 degrees that resembles a circle - thank God it isn't a circle. If you move a little to the right or left of zero along x, you can even trade with a probability slightly above 50% (see the areas).

Of course, it would be nice to build all sorts of regression lines and distributions, but for that slices are needed, at least a few - that's for later.

P.S. And here is a forecast with a slightly modified algorithm. The same 5 min on the 1m timeframe.

This is already much better.) For forecasts > 2 and < -2 on x, losing trades are hardly to be expected if we simply close after 5 minutes.
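
A sketch of the check described above, assuming a pandas DataFrame with the model forecast and the realized change 5 minutes later; the +/-2 threshold comes from the post, the column names are mine:

import pandas as pd

def threshold_stats(df: pd.DataFrame, thr: float = 2.0) -> pd.Series:
    # df columns: 'forecast' (x) and 'real' (the change realized 5 minutes later)
    longs  = df[df["forecast"] >  thr]
    shorts = df[df["forecast"] < -thr]
    return pd.Series({
        "long trades":  len(longs),
        "long win %":   (longs["real"]  > 0).mean() * 100 if len(longs)  else float("nan"),
        "short trades": len(shorts),
        "short win %":  (shorts["real"] < 0).mean() * 100 if len(shorts) else float("nan"),
    })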

The second picture is really good! What changes in the algorithm made it possible?