Machine learning in trading: theory, models, practice and algo-trading - page 2426

 
mytarmailS:

Now the answer to the first question

Thank you!

I'll try to figure it out, but it's hard to grasp right away because the code syntax is quite different from C++.

 
elibrarius:

Don't you think you're tuning your model to the most successful version on the test?

At what point do you think I'm fitting to the test? The "test" sample is used to stop training; in all projects except one it doesn't exist at all, and there I only used it in the final training. You could replace it with a fixed number of trees - 50/100/300/500/800 - and look at the result on all samples; do you really believe the results would be significantly worse?
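
A minimal sketch of the two variants being compared here - stopping on a "test" sample versus a fixed number of trees. The xgboost R package and the objects X_train/y_train/X_test/y_test are assumptions for illustration, not the poster's actual setup:

library(xgboost)

dtrain <- xgb.DMatrix(X_train, label = y_train)
dtest  <- xgb.DMatrix(X_test,  label = y_test)

# Variant 1: the "test" sample decides when to stop training
m_stop <- xgb.train(params = list(objective = "binary:logistic"),
                    data = dtrain, nrounds = 1000,
                    watchlist = list(test = dtest),
                    early_stopping_rounds = 50)

# Variant 2: a fixed number of trees, no "test" sample involved;
# each model is then evaluated on all samples, including the exam one
for (n in c(50, 100, 300, 500, 800)) {
  m_fix <- xgb.train(params = list(objective = "binary:logistic"),
                     data = dtrain, nrounds = n)
}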

 
Aleksey Vyazmikin:

Thank you!

I'll try to figure it out, but it's hard to grasp right away because the code syntax is quite different from C++.

There are a lot of unfamiliar functions because the language is high-level.
But what you write in 300 lines of C++ I can write in 3 lines)

 
Aleksey Vyazmikin:

At what point do you think I'm fitting to the test? The "test" sample is used to stop training; in all projects except one it doesn't exist at all, and there I only used it in the final training. You could replace it with a fixed number of trees - 50/100/300/500/800 - and look at the result on all samples; do you really believe the results would be significantly worse?

Yes - stopping training is also fitting to the test. I don't know the other details of your system, so I can't say anything more.
In cross-validation all the data is a test and all of it is also a train, just in turn. Besides, you wanted to increase the train segment by 40%.
 
mytarmailS:
There are a lot of unfamiliar functions because the language is high-level.
But what you write in 300 lines of C++ I can write in 3 lines)

I don't think you can easily implement all my perversions in R :)

 
elibrarius:
Yes - stopping training is also fitting to the test. I don't know the other details of your system, so I can't say anything more.

I agree that in theory it improves the result on the test sample, but I evaluate the result on the exam sample!

Well, I thought I had described all the details - if you have any questions, ask.

elibrarius:
In cross-validation all the data is a test and all of it is also a train, just in turn. Besides, you wanted to increase the train segment by 40%.

And what do you use cross-validation for? So far I see its purpose as searching for the model's hyperparameters, since it shows, on average, which settings are best across random segments.
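
A minimal sketch of that use: k-fold cross-validation, where every observation is a test case exactly once and the averaged score ranks the hyperparameter settings. fit_model, evaluate, depth_grid and the data frame dat are hypothetical placeholders:

k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))  # random segments

scores <- sapply(depth_grid, function(depth) {
  mean(sapply(1:k, function(i) {
    train <- dat[folds != i, ]  # k-1 folds train the model
    test  <- dat[folds == i, ]  # the held-out fold tests it
    evaluate(fit_model(train, depth), test)
  }))
})
best_depth <- depth_grid[which.max(scores)]  # best setting on average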

 
Aleksey Vyazmikin:

I don't think you can easily implement all my perversions in R :)

Ahahaha))))

If I can implement my perversions, then yours will be a walk in the park)

 
Aleksey Vyazmikin:

And what do you use cross-validation for? So far I see its purpose as searching for the model's hyperparameters, since it shows, on average, which settings are best across random segments.

That's exactly what it's for. What else is needed? And a specific set of features: with different features the hyperparameters will most likely be different, so the feature set you select together with the best hyperparameters is the one to put to work.

Aleksey Vyazmikin:
Well, I thought I had described all the details - if you have any questions, ask.

I am too lazy to go into details.

 
mytarmailS:

Ahahaha))))

If I can implement my perversions, then yours will be a walk in the park)

Well then, since I'm making a script to prepare the data, I also need to produce a file listing the excluded columns, which include:

1. Columns with correlated predictors (by the way, how do you choose which of, say, 5 correlated columns to discard?).

2. Columns discarded from the first file-table, except for the target column.

In addition, the column with the target label should be written into the file, preferably looked up by column name (a sketch of this step follows the file structure below).

The structure of the file is:

5336    Auxiliary
5337    Auxiliary
5338    Label
5339    Auxiliary
5340    Auxiliary
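
A hedged sketch of that preparation step in R. caret::findCorrelation answers the "which of 5 correlated columns to discard" question by removing, from each highly correlated pair, the column with the largest mean absolute correlation. The data frame df, the target name "Label_col", the 0.9 cutoff, the 0-based indexing and the "Num" role for the kept columns are all assumptions (the example above only shows Auxiliary and Label lines):

library(caret)

num <- sapply(df, is.numeric)
drop_idx  <- findCorrelation(cor(df[num]), cutoff = 0.9, exact = TRUE)
drop_cols <- names(df[num])[drop_idx]

# role of every column: the target found by name, excluded columns as Auxiliary
role <- ifelse(names(df) == "Label_col", "Label",
        ifelse(names(df) %in% drop_cols, "Auxiliary", "Num"))

# tab-separated file in the structure shown above
write.table(data.frame(idx = seq_along(df) - 1, role = role),
            file = "columns_exclude.txt", sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)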
 
elibrarius:

That's exactly what it's for. What else is needed? And a specific set of features: with different features the hyperparameters will most likely be different, so the feature set you select together with the best hyperparameters is the one to put to work.

I am too lazy to go into details.

I need to select the right predictors in less time. Iterating over the predictors again would increase processing time hundreds of times over. My method is based on the logic that a good predictor (including one suited to a particular learning method) will be demanded by the model on all intervals of the sample, which rules out fitting to one area of the sample.
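
A minimal sketch of that logic, assuming the sample is split into equal time intervals and a predictor counts as "demanded" when its importance is non-zero. fit_model and importance_of are hypothetical wrappers around whatever boosting library is used, and dat is a hypothetical data frame:

n_int <- 5
interval <- cut(seq_len(nrow(dat)), n_int, labels = FALSE)

# keep only predictors the model uses on every interval of the sample
kept <- Reduce(intersect, lapply(1:n_int, function(i) {
  m <- fit_model(dat[interval == i, ])
  names(which(importance_of(m) > 0))
}))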
