Machine learning in trading: theory, models, practice and algo-trading - page 1302

 
Maxim Dmitrievsky:

What pleases me the most is the large number of "predictors". Where would they even come from in the quotes? It's 90% garbage.

Everyone describes his own illusion differently, and the illusion of whoever holds the most money at a given moment is the one that works. So yes, there can be many predictors, I see no contradiction here. It's like bushes made of branches and leaves: someone may trim them into various intricate figures that provoke different reactions in onlookers.

 
Aleksey Vyazmikin:

Everyone describes his own illusion differently, and the illusion of whoever holds the most money at a given moment is the one that works. So yes, there can be many predictors, I see no contradiction here. It's like bushes made of branches and leaves: someone may trim them into various intricate figures that provoke different reactions in onlookers.

Well, to each his own. I don't bother with such scrupulousness; either way it's a fit, the main thing is that it keeps working for a while.

It turns out that if you find an optimal combination of inputs/outputs, then 4 predictors are enough.

In short, a compromise between effectiveness and time spent is needed.
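
For a sense of why the effectiveness/time compromise comes up: exhaustively scoring every combination of inputs grows combinatorially with the number of candidate predictors, which is what makes a small subset like 4 attractive. A toy sketch of that trade-off; the function, the scoring lambda and the feature names below are hypothetical placeholders, not anything taken from this thread:

```
from itertools import combinations
from math import comb

# How many candidate combinations a brute-force search would have to score:
for n in (10, 50, 100):
    print(f"{n} predictors -> {comb(n, 4):,} subsets of 4 to evaluate")

def best_subset(feature_names, k, score):
    """Return the highest-scoring k-predictor combination (only feasible for small N and k)."""
    return max(combinations(feature_names, k), key=score)

# Toy usage: score() would normally be a backtest or cross-validation metric;
# here it just prefers the subset with the shortest combined names.
print(best_subset(["rsi", "macd", "atr", "stochastic", "cci"], 2,
                  lambda s: -sum(map(len, s))))
```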

 
Maxim Dmitrievsky:

Whichever way you dig, you will find illusory "regularities" everywhere; they can be found in any phenomenon.

What pleases me the most is the large number of "predictors". Where would they even come from in the quotes? It's 90% garbage.

Exactly, it's garbage. Each indicator taken separately is roughly 50/50 and, moreover, has a very narrow operating range where its readings actually make sense.

But taken together... they narrow each other's applicability, defining the region of N-dimensional space where their combined readings do make sense. There seems to be a fashionable word for it - synergy.)

As I see it, about 7-8 indicator-predictors are needed. The only problem is that they should not all measure the same thing.)
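
One simple way to enforce "they should not measure the same thing" is to drop one of every pair of indicators whose readings are strongly correlated. A minimal sketch, assuming the indicator values already sit in a pandas DataFrame; the column names and the 0.9 cutoff are illustrative assumptions:

```
import numpy as np
import pandas as pd

def drop_redundant(df, threshold=0.9):
    """Return the columns to keep after removing highly correlated near-duplicates."""
    corr = df.corr().abs()
    keep = []
    for col in df.columns:
        # keep a column only if it is not strongly correlated
        # with something we already decided to keep
        if all(corr.loc[col, k] < threshold for k in keep):
            keep.append(col)
    return keep

# Hypothetical indicator columns: "rsi" and "stoch" measure almost the same thing here.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
df = pd.DataFrame({
    "rsi": base + rng.normal(scale=0.1, size=1000),
    "stoch": base + rng.normal(scale=0.1, size=1000),
    "atr": rng.normal(size=1000),
})
print(drop_redundant(df))  # ['rsi', 'atr'] - 'stoch' is dropped as a near-duplicate
```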

 
Maxim Dmitrievsky:

Well, to each his own. I don't bother with such scrupulousness; either way it's a fit, the main thing is that it keeps working for a while.

It turns out that if you find an optimal combination of inputs/outputs, then 4 predictors are enough.

In short, a compromise between effectiveness and time spent is needed.

That's the point, the main thing is to make it work...

And yet, so far it turns out that:

1. A large model overfits because of the memory effect.

2. The better a rule (leaf/binary tree) performed on history, the lower its chances in production.

Otherwise you get "grails" with high accuracy and high return on history.

While on the exam sample (visible on the chart) the profit for the year is only about 1000 (with an equity drawdown of roughly the same size), and accuracy falls to 58%.

The tests were done with a 1/0 activation split at "probability" 0.6; at probability 0.5 the profit on the out-of-training period is around 5000, but accuracy on the test period is around 57% and the equity curve swings around more.

Does this mean that a very good result on the training period is a guarantee of overfitting?
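
For reference, the 0.6-versus-0.5 "probability" threshold comparison described above can be scripted roughly like this. It is only a sketch: the arrays are synthetic placeholders standing in for out-of-sample model outputs and per-trade results, not the actual data from the post:

```
import numpy as np

def evaluate(proba, trade_pnl, threshold):
    """Enter a trade only when the model's 'probability' exceeds the threshold."""
    taken = proba >= threshold
    profit = trade_pnl[taken].sum()
    # accuracy among the trades actually taken: a winning trade counts as a correct call
    accuracy = (trade_pnl[taken] > 0).mean() if taken.any() else float("nan")
    return profit, accuracy, int(taken.sum())

rng = np.random.default_rng(1)
proba = rng.uniform(0.0, 1.0, 5000)                    # placeholder model outputs
trade_pnl = rng.normal(loc=0.1, scale=1.0, size=5000)  # placeholder per-trade P&L

for t in (0.5, 0.6):
    profit, acc, n = evaluate(proba, trade_pnl, t)
    print(f"threshold={t}: trades={n}, profit={profit:.1f}, accuracy={acc:.2%}")
```

The usual trade-off this exposes: a higher threshold takes fewer trades with (hopefully) higher accuracy, a lower one takes more trades and may collect more total profit with a noisier equity curve.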

 
Aleksey Vyazmikin:

but accuracy on the test period is around 57%

Does this mean that a very good result on the training period is a guarantee of overfitting?

An accuracy of 57% on the test is very good, even too good. But yes: the bigger the difference between the results on the train and test samples, the higher the probability of overfitting.

 
The Grail:

An accuracy of 57% on the test is very good, even too good. But yes: the bigger the difference between the results on the train and test samples, the higher the probability of overfitting.

So I start from the premise that the future is unknown, and no one can tell me whether the model will do well on a sample outside of training... which is why I'm looking for some kind of correlation.

As for accuracy (it isn't really Accuracy, because it doesn't account for missed entries, the ones classified as 0 when they should have been 1), it isn't conclusive either, because profits do not equal losses: a profit may be bigger than a loss or vice versa. So it turns out that, yes, the model seems to work, yet it brings no income.
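
Because wins and losses differ in size, the expected value per trade says more than raw accuracy. A minimal illustration; the 57% win rate echoes the figures above, but the average win/loss sizes are made up purely for the arithmetic:

```
def expected_value(win_rate, avg_win, avg_loss):
    """Expected profit per trade when win and loss sizes differ."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss

# 57% of trades "correct", but the average loss is larger than the average win:
# the model classifies tolerably and still loses money.
print(expected_value(win_rate=0.57, avg_win=40.0, avg_loss=60.0))  # -3.0 per trade
```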

 
Aleksey Vyazmikin:

That's the point, the main thing is to make it work...

And yet, so far it turns out that:

1. A large model overfits because of the memory effect.

2. The better a rule (leaf/binary tree) performed on history, the lower its chances in production.

Otherwise you get "grails" with high accuracy and high return on history.

While on the exam sample (visible on the chart) the profit for the year is only about 1000 (with an equity drawdown of roughly the same size), and accuracy falls to 58%.

The tests were done with a 1/0 activation split at "probability" 0.6; at probability 0.5 the profit on the out-of-training period is around 5000, but accuracy on the test period is around 57% and the equity curve swings around more.

Does this mean that a very good result on the training period is a guarantee of overfitting?

As a rule, yes.

The more features, the more overfitting.
 
The Grail:

An accuracy of 57% on the test is very good, even too good. But yes: the bigger the difference between the results on the train and test samples, the higher the probability of overfitting.

Random plus 7% non-random is bad, but it's better than random.

No, it's not bad... it's disgusting, it's not a model at all.

Everyone urgently needs to learn the basics of machine learning and probability theory.

Especially if the equity curve grows at 57% accuracy, you can immediately treat it as overfitting, a priori, and not analyze anything further.
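
On the probability-theory point: whether 57% accuracy is even distinguishable from coin-flipping depends on the number of trades behind it. A small sketch with a one-sided binomial test (the sample sizes are arbitrary examples, not data from the thread):

```
from scipy.stats import binomtest

for n in (100, 1000, 10000):
    hits = round(0.57 * n)
    p = binomtest(hits, n, p=0.5, alternative="greater").pvalue
    print(f"n={n}: 57% correct -> one-sided p-value vs. a fair coin = {p:.2e}")
```

Statistical significance is still not the same thing as a usable edge: even with a tiny p-value, a 7% margin over a coin flip can vanish into spread and slippage, which is the thrust of the "it's not a model at all" remark.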
 
Maxim Dmitrievsky:

Random plus 7% non-random is bad, but it's better than random.

No, it's not bad... it's disgusting, it's not a model at all.

Everyone urgently needs to learn the basics of machine learning and probability theory.

What accuracy do your models show now outside of training? And over what period; how does that figure fall (change)?

My out-of-training period is 10 months.

 
Aleksey Vyazmikin:

What accuracy do your models show now outside of training? And over what period; how does that figure fall (change)?

My out-of-training period is 10 months.

About 10% error on both test and train for ~10k examples; it grows smoothly as the number of examples increases.

With such an error the models started to work on new data.

Validation is a different matter; I need to try many variants.
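
One common way to "try many variants" of validation on time-ordered data is walk-forward splitting instead of a single holdout. A minimal sketch with scikit-learn; the classifier and the synthetic features and labels are placeholders, not the models discussed here:

```
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 8))                   # placeholder features
y = (rng.uniform(size=5000) < 0.5).astype(int)   # placeholder labels

tscv = TimeSeriesSplit(n_splits=5)  # each fold trains on the past and tests on the next chunk
for i, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = GradientBoostingClassifier().fit(X[train_idx], y[train_idx])
    acc = model.score(X[test_idx], y[test_idx])
    print(f"fold {i}: out-of-sample accuracy = {acc:.3f}")
```

The spread of accuracy across folds is itself useful: a model whose out-of-sample error jumps around from fold to fold is less trustworthy than one that degrades smoothly.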

The algorithms are no longer revealed, I'm just chatting.