Machine learning in trading: theory, models, practice and algo-trading - page 1203

 
Aleksey Vyazmikin:

Thank you. So I randomize using the same values the predictor had in the sample, right?

In general the approach is clear, thanks. I need to think about how to implement it and try it out.

Alas, I won't be able to master it myself, so I'll listen to your retelling when the occasion arises.

No, randomize with completely unrelated values, i.e. wipe out the predictor's values entirely and put white noise in their place,

then put the original values back before you move on to check the next one.

Roughly speaking: replace each predictor with white noise, one at a time. That's probably easier to understand.
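A minimal sketch of this procedure, assuming the fitted model is exposed as a plain `predict(X)` callable and that mean squared error is the metric (the thread names neither). The function names `noise_importance` and `mse` are hypothetical. Each column in turn is overwritten with Gaussian white noise, the error increase is recorded, and the original values are restored before the next column. Matching the noise to the column's mean and std is an assumption, not something stated in the thread.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def noise_importance(predict, X, y, metric=mse, seed=0):
    """Error increase when each predictor, in turn, is replaced by white noise.

    `predict` is any callable mapping an (n, p) array to n predictions
    (hypothetical interface standing in for a fitted model). The noise is
    Gaussian with the column's own mean and std (an assumption).
    """
    rng = np.random.default_rng(seed)
    X_work = X.astype(float)              # working copy; X itself is untouched
    baseline = metric(y, predict(X_work))
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        original = X_work[:, j].copy()
        # Wipe out the predictor and put white noise in its place.
        X_work[:, j] = rng.normal(original.mean(), original.std(), size=original.size)
        scores[j] = metric(y, predict(X_work)) - baseline
        # Restore the original values before checking the next predictor.
        X_work[:, j] = original
    return scores


def predict(A):
    """Toy 'model' standing in for a fitted one; it ignores column 2."""
    return 3 * A[:, 0] + 0.5 * A[:, 1]


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = 3 * X[:, 0] + 0.5 * X[:, 1]
scores = noise_importance(predict, X, y)
print(scores)  # column 0 largest, column 2 exactly 0
```

Column 2 scores exactly zero because the toy model never reads it; with a real model the score of an irrelevant column would only be close to zero.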

Important condition: the predictors must not be correlated with each other, otherwise you will get nonsense full of errors. For this I first transformed them with PCA, but you can also build a correlation matrix and remove all strongly correlated ones. There is another mechanism, but it is complicated.
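The correlation-matrix route can be sketched as a greedy filter: walk through the columns and keep one only if its absolute Pearson correlation with every already-kept column stays at or below a threshold. The name `drop_correlated` and the default threshold of 0.5 are assumptions for illustration; which column of a correlated pair survives simply depends on column order.

```python
import numpy as np


def drop_correlated(X, threshold=0.5):
    """Greedy filter: keep a column only if its |Pearson r| with every
    already-kept column is at most `threshold`. Returns kept column indices."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic example: column 1 is a noisy copy of column 0, column 2 is independent.
rng = np.random.default_rng(5)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])
kept = drop_correlated(X, threshold=0.5)
print(kept)
```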
 
Maxim Dmitrievsky:

No, randomize with completely unrelated values, i.e. wipe out the predictor's values entirely and put white noise in their place,

then put the original values back before you move on to check the next one.

If it's pure noise, we'll break the split entirely. For example, there's a split with the rule "over 100", and we put in random values from 0 to 99; then the branch below that split will never be activated again. Yet it's probably important to see how the further splits behave when one of the leaf's rules drops out...
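A toy illustration of this concern, assuming a single split at 100 and a predictor drawn uniformly from [0, 200] (both made up). Shuffling the column's own values, the option asked about at the start of the thread, keeps the same share of samples on each side of the split; noise drawn from 0 to 99 sends every sample down the same side.

```python
import numpy as np

rng = np.random.default_rng(0)
feature = rng.uniform(0, 200, size=10_000)   # made-up predictor values
THRESHOLD = 100                              # the split rule "over 100"


def share_over(values):
    """Fraction of samples routed to the 'over 100' branch."""
    return float(np.mean(values > THRESHOLD))


shuffled = rng.permutation(feature)                # same values, random order
noise = rng.integers(0, 100, size=feature.size)    # random integers 0..99

print(share_over(feature))   # roughly half the samples
print(share_over(shuffled))  # exactly the same share as the original
print(share_over(noise))     # 0.0: the branch below the split is never reached
```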

 
Maxim Dmitrievsky:


Important condition: the predictors must not be correlated with each other, otherwise you will get nonsense full of errors... For this I first transformed them with PCA, but you can also build a correlation matrix and remove all strongly correlated ones. There is another mechanism, but it is complicated.

What level of correlation is acceptable? After all, good predictors should correlate with the target, which means they will also correlate with each other to some extent...

 
Aleksey Vyazmikin:

If it's pure noise, we'll break the split entirely. For example, there's a split with the rule "over 100", and we put in random values from 0 to 99; then the branch below that split will never be activated again. Yet it's probably important to see how the further splits behave when one of the leaf's rules drops out...

Then the error will change a lot, which is fine, that's what we are measuring; and if it doesn't, the importance is low. Don't dig into the model internals: how would you even know how the trees split, when each of them splits differently and on a different number of features? The result is always an "average temperature across the hospital" anyway.

 
Aleksey Vyazmikin:

What level of correlation is acceptable? After all, good predictors should correlate with the target, which means they will also correlate with each other to some extent...

That notion comes from linear regression with a single predictor. In non-linear models nothing has to correlate with the target, especially in classification.
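A small numeric check of this point, using a made-up example rather than anything from the thread: a target can be fully determined by a predictor and still have zero Pearson correlation with it when the relationship is non-linear and symmetric, both for a regression target and for a class label.

```python
import numpy as np

x = np.linspace(-1, 1, 200)                 # predictor, symmetric around zero
y_reg = x ** 2                              # regression target fully determined by x
y_cls = (np.abs(x) > 0.5).astype(float)     # class label fully determined by x

r_reg = np.corrcoef(x, y_reg)[0, 1]
r_cls = np.corrcoef(x, y_cls)[0, 1]
print(r_reg, r_cls)  # both are zero up to floating-point rounding
```

A tree splitting on |x| > 0.5 would classify every point correctly, even though a correlation filter against the target would reject x outright.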

I don't know what level is acceptable; it's hard to say... or you determine it experimentally. In this sense it's easier to use PCA, of course.
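A sketch of the PCA route using plain numpy SVD, assuming a numeric feature matrix without constant columns; the name `pca_transform` is hypothetical and no scaling step is applied (an assumption; standardizing first is a common variant). The projected components are mutually uncorrelated on the data they were fitted on.

```python
import numpy as np


def pca_transform(X):
    """Project centred data onto its principal axes (plain PCA via SVD).

    Returns the components plus the mean and axes needed to transform new
    data the same way: (X_new - mean) @ Vt.T
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T, mean, Vt


rng = np.random.default_rng(4)
a = rng.normal(size=1000)
X = np.column_stack([a, a + 0.3 * rng.normal(size=1000), rng.normal(size=1000)])
Z, mean, Vt = pca_transform(X)
C = np.corrcoef(Z, rowvar=False)
print(np.round(C, 6))  # off-diagonal entries are zero up to rounding
```

The trade-off is interpretability: importance is then measured per component, not per original predictor.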
 
Maxim Dmitrievsky:

Then the error will change a lot, which is fine, that's what we are measuring; and if it doesn't, the importance is low. Don't dig into the model internals: how would you even know how the trees split, when each of them splits differently and on a different number of features? You always end up looking at an "average temperature across the hospital".

Then you could just as well set the value to zero or replace it with any other value; it's the same as random, but that doesn't seem logical to me... Anyway, if I manage to implement it, I'll try both variants.
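To compare the variants discussed (shuffling the column's own values, white noise, or nulling the value), a sketch with a pluggable replacement strategy may help. It assumes a `predict(X)` callable and MSE as the metric; `replace_column`, `importance` and the strategy names are hypothetical. The toy stump below stands in for a tree with the "over 100" rule.

```python
import numpy as np


def replace_column(values, strategy, rng):
    """Return a replacement for one predictor column.

    'shuffle' - same values in random order (keeps the distribution),
    'noise'   - white noise with the column's mean and std,
    'zero'    - constant 0 (the 'null the value' option).
    """
    if strategy == "shuffle":
        return rng.permutation(values)
    if strategy == "noise":
        return rng.normal(values.mean(), values.std(), size=values.shape)
    if strategy == "zero":
        return np.zeros_like(values)
    raise ValueError(f"unknown strategy: {strategy}")


def importance(predict, X, y, strategy, seed=0):
    """MSE increase per column under the given replacement strategy."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    out = []
    for j in range(X.shape[1]):
        Xj = X.copy()
        Xj[:, j] = replace_column(X[:, j], strategy, rng)
        out.append(float(np.mean((y - predict(Xj)) ** 2) - base))
    return np.array(out)


def predict(A):
    """Toy stump: 1 if column 0 is over 100, else 0; column 1 is unused."""
    return (A[:, 0] > 100).astype(float)


rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(0, 200, 5000), rng.normal(size=5000)])
y = predict(X)
results = {s: importance(predict, X, y, s) for s in ("shuffle", "noise", "zero")}
print(results)
```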

Maxim Dmitrievsky:

That notion comes from linear regression; in non-linear models nothing has to correlate with the target.

Well, what's the argument that a predictor is bad if it correlates with the target?

 
Aleksey Vyazmikin:

Then you could just as well set the value to zero or replace it with any other value; it's the same as random, but that doesn't seem logical to me... Anyway, if I manage to implement it, I'll try both options.

Well, what's the argument that a predictor is bad if it correlates with the target?

I think that's more of a minor detail.

I'm not talking about a single predictor but about the case when there are many of them with roughly equal importance, because they are strongly correlated with each other. Then, when one strong feature is removed by shuffling, the model error won't change much, because similar features carrying the same information remain, and none of the strong features will be recognized as important. That's why you should either randomize all correlated features at once (which is harder to implement) or make sure nothing is strongly correlated.
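"Randomize all correlated features at once" can be sketched as a grouped permutation: one row permutation is applied to every column of the group, so the group's link to the target is broken while relations inside the group stay intact. The function name, the group labels and the linear stand-in model (which spreads its weight over two near-duplicate columns) are assumptions for illustration.

```python
import numpy as np


def group_permutation_importance(predict, X, y, groups, seed=0):
    """MSE increase when all columns of a group are shuffled together.

    The same row permutation is applied to every column in the group, so
    relations inside the group are kept while the link to y is broken.
    """
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    scores = {}
    for name, cols in groups.items():
        Xg = X.copy()
        perm = rng.permutation(X.shape[0])
        Xg[:, cols] = X[perm][:, cols]
        scores[name] = float(np.mean((y - predict(Xg)) ** 2) - base)
    return scores


def predict(A):
    """Stand-in model splitting its weight between two near-duplicates."""
    return 0.5 * A[:, 0] + 0.5 * A[:, 1]


rng = np.random.default_rng(2)
x0 = rng.normal(size=2000)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=2000), rng.normal(size=2000)])
y = x0
scores = group_permutation_importance(
    predict, X, y, {"x0": [0], "x1": [1], "x0+x1": [0, 1]}
)
print(scores)  # each duplicate alone looks much weaker than the pair
```

Shuffling either duplicate alone leaves half of the signal in place, so each looks much less important than the pair shuffled together.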

 
Maxim Dmitrievsky:

Do it however you like; the main thing is the principle of shuffling the feature. That detail, it seems to me, is most likely minor.

I'm not talking about a single one, but about when there are many of them with roughly equal importance, because they are strongly correlated. So removing one strong feature by shuffling won't change the model error much, because similar features with the same importance remain, and none of the strong features will be recognized.

For that, the model would have to use the predictors in a way that builds symmetric trees, and that seems unlikely to me without overfitting, since it makes no sense when building the model.

So what level of correlation is acceptable?
 
Aleksey Vyazmikin:

For that, the model would have to use the predictors in a way that builds symmetric trees, and that seems unlikely to me without overfitting, since it makes no sense when building the model.

It works fine with a random forest; for CatBoost you'd need to read up, I don't remember how it works there. Maybe it has good built-in feature importance of its own, because of the structure of the model itself.

I don't know what's acceptable; set some threshold and see. Give or take, very little will change in the model. Since boosting doesn't work like RF, maybe the importance there is clear from the very beginning.

Or, if you're sure the features are heterogeneous and don't correlate, just skip this step.

These are all important things, especially if you have a lot of features and need to cut the junk out of the model, but not so important that you have to worry about every percent of correlation, I think. Something in the range of -0.5 to 0.5 is probably fine.
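To act on a threshold like this, one option is to collect the features into groups whose absolute pairwise correlation exceeds it, following chains (if a correlates with b and b with c, all three land in one group). The groups can then be shuffled together or reduced to one representative each. The name `correlated_groups` is hypothetical, and 0.5 is just the rule of thumb from the post, not a tested value.

```python
import numpy as np


def correlated_groups(X, threshold=0.5):
    """Group column indices whose |Pearson r| exceeds `threshold`,
    following chains (a~b and b~c put a, b and c together)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unvisited = set(range(X.shape[1]))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, group = [start], []
        # Depth-first search over the "strongly correlated" graph.
        while stack:
            j = stack.pop()
            group.append(j)
            for k in list(unvisited):
                if corr[j, k] > threshold:
                    unvisited.remove(k)
                    stack.append(k)
        groups.append(sorted(group))
    return groups


rng = np.random.default_rng(6)
n = 1000
a, b, c = rng.normal(size=(3, n))
X = np.column_stack([a, a + 0.1 * rng.normal(size=n),
                     b, 2 * b + 0.1 * rng.normal(size=n), c])
groups = correlated_groups(X, threshold=0.5)
print(groups)
```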

I'll implement such a variant myself later and see.

 
Maxim Dmitrievsky:

It works fine with a random forest; for CatBoost you'd need to read up, I don't remember how it works there. Maybe it has a good built-in feature importance of its own, because of the structure of the model itself.

I don't know what's acceptable; set some threshold and see. Give or take, little will change in the model. Since boosting doesn't work like RF, maybe the importance there is clear from the very beginning.

Or, if you're sure the features are heterogeneous and don't correlate, just skip this step.

These are all important things, especially if you have a lot of features and need to cut the junk out of the model, but not so important that you have to worry about every percent of correlation, I think. Something in the range of -0.5 to 0.5 is probably fine.

I'll implement such a variant myself later and see.

I see, I have to try it. I want to check the leaves for correlation, and perhaps the CatBoost models too. I know for sure that pairing models is possible, but it should be done more sensibly, and detecting correlation will reduce the number of iterations needed for model pairing.