Machine learning in trading: theory, models, practice and algo-trading - page 2381

 
Maxim Dmitrievsky:

CatBoost has rather strong regularization; moreover, if the features are categorical, you should declare them as such in the boost.

Reducing the L2 regularization gave no improvement. So Lasso is better.

 
elibrarius:

Maybe it's just a lucky section of the test sample, and you are fitting to it by choosing the model with the best parameters for it.

I now always check with cross-validation (or walk-forward): there is no fitting to one small section, but to all the data at once. I think this is the best training option.
Doc also advised checking this way before he disappeared from the forum.

First, I don't know how to tune Lasso, so there is no fitting there at all; the result is simply what the parameters as they are give.

Second, it's the same section as with CatBoost, where there were 800 models to choose from and I took almost the best ones.

I've attached a file; try different models yourself. Lasso is recommended specifically for binary samples, that's the trick.

 
Aleksey Vyazmikin:

First, I don't know how to tune Lasso, so there is no fitting there at all; the result is simply what the parameters as they are give.

Second, it's the same section as with CatBoost, where there were 800 models to choose from and I took almost the best ones.

I've attached a file; try different models yourself. Lasso is recommended specifically for binary samples, that's the trick.

Try it as is with cross-validation. Loop 10 times, each time holding out a different unseen section of 1/10 of the total data. This will be the best estimate for choosing between CatBoost with some parameters and Lasso with default parameters.

 
Maxim Dmitrievsky:


Try it the same way. It worked well in the custom tester; there is a problem when exporting the model, I'll look for the error later.

If the MA is used in training, shouldn't it also be used when applying the model?

Is the point of the MA the labeling: only one class above it and only the other below it?

 
elibrarius:

Try it as is with cross-validation. Loop 10 times, each time holding out a different unseen section of 1/10 of the total data. This will be the best estimate for choosing between CatBoost with some parameters and Lasso with default parameters.

The binarization was done with a specific method of evaluating the sample, so cross-validation would show better results on sections of the main sample.

Cross-validation is not quite appropriate for samples that are tied to time, and in trading there is such a tie: the market changes gradually and the model must find patterns that are stable over time, whereas in cross-validation the training and checking intervals can be adjacent to each other or cut out of the middle of the training sample.

Right now my CatBoost is actually trained on 60% of all the data; another 20% is used for early-stopping control and the last 20% for model evaluation.

If we're talking about 10% for training, that's too small a sample.
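The 60/20/20 arrangement mentioned above can be sketched as a plain chronological split. This is only an illustration under the assumption that rows are sorted by time; the function name `chronological_split` and its parameters are hypothetical and not taken from CatBoost or any other library.

```python
def chronological_split(n_rows, train_pct=60, valid_pct=20):
    """Split row indices 0..n_rows-1 into train / validation / test in time order.

    Rows are assumed to be sorted by time; nothing is shuffled.
    """
    train_end = n_rows * train_pct // 100
    valid_end = train_end + n_rows * valid_pct // 100
    train = range(0, train_end)           # model fitting
    valid = range(train_end, valid_end)   # early-stopping control
    test = range(valid_end, n_rows)       # final model evaluation
    return train, valid, test


train, valid, test = chronological_split(1000)
print(len(train), len(valid), len(test))
```

The validation block would be what is passed as the evaluation set for early stopping, and the test block is touched only once, for the final estimate.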
 
Aleksey Vyazmikin:

Cross-validation is not quite appropriate for samples that are tied to time, and in trading there is such a tie: the market changes gradually and the model must find patterns that are stable over time, whereas in cross-validation the training and checking intervals can be adjacent to each other or cut out of the middle of the training sample.

You're talking about some kind of standard/ancient cross-validation.
First, do not shuffle the rows; take the blocks as they are: 0-90 train and 90-100 test, then 10-100 train and 0-10 test, then 20-100 plus 0-10 train and 10-20 test, and so on.
Second, following Prado's advice, you need to leave a gap (purging) between train and test, so that neighboring examples from train and test are not used together. Train examples within 10-100 examples of the test will be a hint / a peek. Read more here: https://dou.ua/lenta/articles/ml-vs-financial-math/
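The block scheme with a gap can be sketched roughly as follows. This is a minimal illustration, assuming rows are sorted by time; `blocked_folds` and its `gap` parameter are hypothetical names, not a library API, and the folds are produced left to right rather than in the order listed above.

```python
def blocked_folds(n_rows, n_folds=10, gap=0):
    """Yield (train_idx, test_idx) lists for unshuffled, contiguous test blocks.

    `gap` rows on each side of the test block are dropped from train,
    so the closest neighbours of the test rows are never trained on.
    """
    fold_size = n_rows // n_folds
    for k in range(n_folds):
        test_start = k * fold_size
        # the last block absorbs any remainder rows
        test_end = n_rows if k == n_folds - 1 else test_start + fold_size
        lo = max(0, test_start - gap)
        hi = min(n_rows, test_end + gap)
        train_idx = list(range(0, lo)) + list(range(hi, n_rows))
        test_idx = list(range(test_start, test_end))
        yield train_idx, test_idx


for train_idx, test_idx in blocked_folds(100, n_folds=10, gap=5):
    print(f"test {test_idx[0]}-{test_idx[-1]}, train rows: {len(train_idx)}")
```

The size of the gap is a judgment call; it would normally be at least as long as the window used to build the features and labels.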
Or here's a picture:

Aleksey Vyazmikin:

Right now my CatBoost is actually trained on 60% of all the data; another 20% is used for early-stopping control and the last 20% for model evaluation.

If we're talking about 10% for training, that's too small a sample.
You can do 20% or as much as you want.

And finally, instead of cross-validation you can use walk-forward, which takes the test section not from all around the training data but only ahead of it.
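A rough sketch of such a forward-only scheme, assuming a fixed-size sliding training window (an expanding window is an equally common variant). The function name `walk_forward` and its parameters are hypothetical, chosen only for illustration.

```python
def walk_forward(n_rows, train_size, test_size, gap=0):
    """Yield (train_range, test_range); each test block lies strictly after its train block."""
    start = 0
    while start + train_size + gap + test_size <= n_rows:
        train = range(start, start + train_size)
        test_start = start + train_size + gap
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # slide the window forward by one test block


for train, test in walk_forward(100, train_size=60, test_size=10):
    print(f"train {train.start}-{train.stop}, test {test.start}-{test.stop}")
```

Unlike the block scheme, the model never sees data from after the test block, at the cost of fewer test rows overall.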
Machine learning vs. financial mathematics: problems and solutions (dou.ua)
 
Maxim Dmitrievsky

What does shuffle do? Usually, if it's False, the results are much worse than when it's True.

train_test_split(X, y, train_size = 0.5, test_size = 0.5, shuffle=True)
 

A picture explaining walk-forward.

 
Evgeni Gavrilovi:

What does shuffle do? Usually, if it's False, the results are much worse than when it's True.

It shuffles the examples between train and test, so that they don't go in sequence.
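To make the effect visible, here is a small sketch using only the standard library. `split_half` is a simplified stand-in for a 50/50 split with and without shuffling, not scikit-learn's actual code, and `neighbours_in_train` is a hypothetical helper. It counts how many test rows have an immediate time neighbour in train, which is one plausible reason why shuffled results look much better on time-ordered data.

```python
import random


def split_half(n_rows, shuffle, seed=0):
    """Return (train_idx, test_idx), half of the rows each.

    Without shuffling: first half is train, second half is test.
    With shuffling: rows are assigned to train or test at random.
    """
    idx = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(idx)
    half = n_rows // 2
    return sorted(idx[:half]), sorted(idx[half:])


def neighbours_in_train(train_idx, test_idx):
    """Count test rows whose previous or next row (in time) is in train."""
    train = set(train_idx)
    return sum(1 for i in test_idx if i - 1 in train or i + 1 in train)


for shuffle in (False, True):
    train_idx, test_idx = split_half(1000, shuffle)
    print(f"shuffle={shuffle}: {neighbours_in_train(train_idx, test_idx)} test rows next to train rows")
```

Without shuffling, only the single row at the boundary touches train; with shuffling, most test rows sit right next to a training row.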

 
Maxim Dmitrievsky:

It shuffles the examples between train and test, so that they don't go in sequence.

Randomly? That is, as written here, the test is on a random 50% of the sample?
