Machine learning in trading: theory, models, practice and algo-trading - page 1778

 
Dmitry:

And how do you determine the ability to predict?

Well, not by correlation...

Maybe by cross-correlation through lag estimation...

Dmitry:

By dumbly shoving everything in the world into the model?

Why not? Cross-validation will weed out what you don't need, or some statistics will...

How do you know "what is what" until you check it?
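A minimal sketch of what "cross-correlation through lag estimation" could mean here, assuming it refers to scanning Pearson correlation over a range of lags and keeping the strongest one (names and data below are illustrative):

```python
import numpy as np

def best_lag_correlation(x, y, max_lag=20):
    """Scan lags 0..max_lag, correlating x shifted back in time
    against y, and return the lag with the strongest correlation."""
    best_lag, best_corr = 0, 0.0
    for lag in range(max_lag + 1):
        xs = x if lag == 0 else x[:-lag]
        ys = y if lag == 0 else y[lag:]
        c = np.corrcoef(xs, ys)[0, 1]
        if abs(c) > abs(best_corr):
            best_lag, best_corr = lag, c
    return best_lag, best_corr

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 5) + 0.5 * rng.normal(size=500)   # y follows x with lag 5
print(best_lag_correlation(x, y))                # -> (5, ~0.9)
```

The lag with the highest absolute correlation is a crude estimate of how far ahead the predictor leads the target.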

 
Aleksey Vyazmikin:

You never said how to trade on it, so I don't know what kind of TS (trading system) to come up with.

How? It's obvious)) ZZ (ZigZag) up means buy, down means sell.

You're predicting the ZZ direction, aren't you?
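For context, a simplified, non-repainting version of the "ZZ up means buy, ZZ down means sell" labeling might look like the sketch below (the percentage-reversal rule is an assumption; a real ZigZag indicator also relabels bars back to the pivot once a reversal is confirmed):

```python
import numpy as np

def zigzag_direction(close, pct=0.5):
    """Label each bar +1 (ZZ leg up) or -1 (ZZ leg down) with a simple
    percentage-reversal rule; pct is the reversal threshold in percent."""
    direction = np.zeros(len(close), dtype=int)
    trend, extreme = 1, close[0]          # assume the first leg is up
    for i, price in enumerate(close):
        if trend == 1:
            extreme = max(extreme, price)
            if price <= extreme * (1 - pct / 100):   # confirmed reversal down
                trend, extreme = -1, price
        else:
            extreme = min(extreme, price)
            if price >= extreme * (1 + pct / 100):   # confirmed reversal up
                trend, extreme = 1, price
        direction[i] = trend
    return direction   # +1 -> buy, -1 -> sell, per the rule above

prices = np.cumsum(np.random.default_rng(1).normal(0, 0.1, 200)) + 100
print(zigzag_direction(prices)[:10])
```

This causal version flips only once the reversal is confirmed, which is the only information actually available at trade time.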

 
mytarmailS:

Well, not by correlation...

Maybe by cross-correlation through lag estimation...

Why not? Cross-validation will weed out what you don't need, or some statistics will...

How do you know "what is what" until you check it?

Well, I would talk at length about the redundancy problem, which is especially relevant for neural networks (NS), but I'm lazy.

By the way, this problem is often the reason for a model's weak predictive ability.

 
mytarmailS:

How? It's obvious)) ZZ (ZigZag) up means buy, down means sell.

You're predicting the ZZ direction, aren't you?

That would probably turn out twitchy.

Have you tried averaging/smoothing the classifier output over a window to exclude outliers?
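What that suggestion could look like in practice, as a sketch (the probability series is synthetic; a rolling median is even more robust to single-bar outliers than a rolling mean):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# p: per-bar probability of "ZZ up" from some classifier (synthetic here)
p = pd.Series(np.clip(0.5 + np.cumsum(rng.normal(0, 0.02, 300)), 0, 1))

p_mean = p.rolling(window=5, min_periods=1).mean()    # window average
p_med = p.rolling(window=5, min_periods=1).median()   # robust to outliers

signal = np.where(p_med > 0.5, 1, -1)                 # 1 = buy, -1 = sell
print(signal[:10])
```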

 
Aleksey Vyazmikin:

That would probably turn out twitchy.

Have you tried averaging/smoothing the classifier output over a window to exclude outliers?

In this case averaging equals lag. You need to improve the quality of classification; smoothing is not an option.

Try it as it is!

Dmitry:

Well, I would talk at length about the redundancy problem, which is especially relevant for neural networks (NS), but I'm lazy.

By the way, this problem is often the reason for a model's weak predictive ability.

That's why I'm thinking in this direction: the features can themselves be already-trained ML models (AMO) or working rules. Such features are already high-quality, compressed information, and my mini-experiment on the previous page proved it.

What I still don't understand is how to predict by correlation(

 
mytarmailS:


What I still don't understand is how to predict by correlation(

Forecasting again...

The correlation coefficient helps to identify the most significant predictors in advance: the higher the correlation between the dependent variable and a predictor, the more significant that variable is to the model.

So, in your example, there are two ways. The first, yours, is to substitute predictors into the model one at a time and see how much the prediction accuracy improves. That takes a long time.

The second is to use the correlation coefficient to weed out, in advance, the unimportant predictors that only add noise to the model.


Simply put, the redundancy problem is that you can add 100+1 new predictors to the model, but the 100 will add 0.01% to forecast quality while the 1 adds 10%. And there is no point in overloading the model with those 100 extra predictors - that is overfitting.
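A sketch of such a correlation pre-filter, under the assumption that plain Pearson correlation with the target is the screening statistic (threshold and data are illustrative):

```python
import numpy as np
import pandas as pd

def correlation_filter(X, y, threshold=0.1):
    """Drop predictors whose absolute Pearson correlation with the
    target is below the threshold - treat those as noise."""
    corr = X.apply(lambda col: col.corr(y)).abs()
    return X.loc[:, corr > threshold], corr.sort_values(ascending=False)

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(1000, 101)),
                 columns=[f"f{i}" for i in range(101)])
y = 2 * X["f0"] + pd.Series(rng.normal(size=1000))   # only f0 matters
X_kept, ranking = correlation_filter(X, y)
print(list(X_kept.columns), ranking.head(3))   # f0 on top, the rest ~ 0
```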

 
mytarmailS:


And by the way, with a lot of predictors a single tree is crap; random forest rules.

 
Dmitry:

Forecasting again...

The correlation coefficient helps to identify the most significant predictors in advance: the higher the correlation between the dependent variable and a predictor, the more significant that variable is to the model.

So, in your example, there are two ways. The first, yours, is to substitute predictors into the model one at a time and see how much the prediction accuracy improves. That takes a long time.

The second is to use the correlation coefficient to weed out, in advance, the unimportant predictors that only add noise to the model.

Well, correlation is only one of the options for sifting them out, and it is definitely not the best... You can also use cointegration, cross-correlation, non-linear correlation, etc., and it will work even better, but all of them sit hierarchically below simple classification error. That is why I chose the prediction error for a feature as my criterion.
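One way that "prediction error for a feature" criterion could be implemented, as a sketch: rank each feature by the cross-validated error of a tiny one-feature model (the model choice and depth are assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def rank_features_by_error(X, y, cv=5):
    """Rank features by the cross-validated classification error of a
    one-feature model, instead of by correlation with the target."""
    errors = {}
    for j in range(X.shape[1]):
        acc = cross_val_score(DecisionTreeClassifier(max_depth=3),
                              X[:, [j]], y, cv=cv).mean()
        errors[j] = 1.0 - acc
    return sorted(errors.items(), key=lambda kv: kv[1])

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 10))
y = (X[:, 2] > 0).astype(int)             # only feature 2 carries signal
print(rank_features_by_error(X, y)[:3])   # feature 2 should come first
```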

Dmitry:

And by the way, with a lot of predictors a single tree is crap; random forest rules.

I partly agree, but if you look at it more broadly, a forest is just the same kind of rule; the only difference is the complexity.

There is an R package that can compress a forest of 200 trees into one to three rules by removing everything unnecessary and redundant; the loss in classification quality is 0.5-2%. That is the kind of information compression one should aim for, plus you get interpretability.
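The post does not name the package; inTrees is one R package that extracts and prunes rules from tree ensembles in this way, so it may be the one meant. A rough Python analogue of the general idea, distilling a 200-tree forest into a depth-2 tree whose branches read off as a handful of rules (dataset and depths are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Distil the 200-tree forest into one shallow tree: fit the small tree
# to the forest's own predictions, then read its branches as rules.
rules = DecisionTreeClassifier(max_depth=2, random_state=0)
rules.fit(X, forest.predict(X))

print(export_text(rules))              # the compressed rule set
print("forest:", forest.score(X, y))   # compare quality before and
print("rules :", rules.score(X, y))    # after the compression
```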

 
mytarmailS:

In this case averaging equals lag. You need to improve the quality of classification; smoothing is not an option.

Try it as it is!

It is not an option - there is too much whipsawing in a flat market.

Of course, you can shift the activation thresholds: above 0.65 - buy, below 0.35 - sell.
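A sketch of that thresholding with a dead zone, so the system stays flat when the classifier is unsure (the threshold values are taken from the post):

```python
import numpy as np

def probs_to_signal(p, upper=0.65, lower=0.35):
    """Map class probabilities to positions with a dead zone:
    > upper -> buy (+1), < lower -> sell (-1), otherwise flat (0)."""
    p = np.asarray(p)
    return np.where(p > upper, 1, np.where(p < lower, -1, 0))

print(probs_to_signal([0.70, 0.50, 0.20, 0.64]))   # -> [ 1  0 -1  0]
```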


 
Aleksey Vyazmikin:

It is not an option - there is too much whipsawing in a flat market.

Of course, you can shift the activation thresholds: above 0.65 - buy, below 0.35 - sell.

Show me a chart with trades.