Machine learning in trading: theory, models, practice and algo-trading - page 3478

 
mytarmailS #:
There are three sections of data:

Train - (in sample) where the model is trained

Validate - (in sample) where the performance of the trained model is evaluated, hyperparameters are tuned, and the final model is selected.

Test - (out of sample) completely new data for the model

Not exactly.

We have one data file, call it file:

indexes  <- createDataPartition(file......, p = .70, list = FALSE)

Train    <- file[ indexes, ]   # in sample: model fitting

Validate <- file[-indexes, ]   # in sample: hyperparameter tuning, model selection

and

Test - (out of sample): completely new data for the model.

And one more very important condition: the classification error on all three sets must NOT differ much. If it does, the model is overfitted and can be safely discarded.
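A minimal sketch of that check, assuming a hypothetical already-trained classifier called model, a factor column target, and a separately loaded out-of-sample Test set (all three names are assumptions, not taken from the post):

err <- function(model, data) mean(predict(model, data) != data$target)   # misclassification rate

errors <- c(train    = err(model, Train),
            validate = err(model, Validate),
            test     = err(model, Test))
print(errors)
# If the three errors are close, the model generalises; if the test error is clearly
# worse, the model is overfitted and should be discarded.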

 
СанСаныч Фоменко #:

Not exactly.

And what's the contradiction with what I wrote?
 
mytarmailS #:
And what is the contradiction with what I wrote?

Train is a random selection from a file, and Validate is also a random selection from a file, but does not repeat Train

 
mytarmailS #:

It doesn't matter how much noise there is; noise has a mean of 0.

So there should be no inverse correlation, but there is.


By your logic, it should be like this


It's not like that at all.

Are the classes balanced? Are SL/TP fixed? And the spread, as I understand it, is not taken into account?
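On the inverse-correlation point, a minimal R sketch (arbitrary lengths and seed, not the author's data) showing that two independent zero-mean noise series routinely show a non-zero - sometimes clearly negative - sample correlation on a finite segment:

set.seed(7)
x <- rnorm(100)    # zero-mean noise
y <- rnorm(100)    # independent zero-mean noise; the true correlation with x is 0
cor(x, y)          # the sample correlation is a random value, not 0

# spread of that random value over 1000 repetitions on 100-point segments
summary(replicate(1000, cor(rnorm(100), rnorm(100))))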

 
fxsaber #:

The algorithm searching for the maximum of the FF was stopped after computing 3000 FF values. The 3000 results were then sorted by FF value and the best 20 of them were run on OOS. Among them, sometimes 50% pass OOS, sometimes 5% or 0%. This percentage tells us nothing about the robustness of the TC, because the search algorithm is unimodal.

Are you actually generating the TC itself, or only tuning the parameters of a TC?

Theoretically, robustness can be indicated by the stability of the result under small changes of the settings - a kind of substitute for similar-but-different new data.

Personally, when I use an FF for optimisation, I evaluate the usefulness of a range of settings of some conditional filter: if most of the settings give an improved result, the idea behind the filter is sound. Then you can keep only part of that filter's settings and move on to the settings of the next one. After that, you can randomise the settings and filters from the list of favoured settings (see the sketch below).

Although I don't know what is inside your system or how it works, so maybe this method is not suitable. Andrei is a professional in these matters...
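A minimal sketch of that approach, assuming a hypothetical fitness function ff(setting) for one conditional filter and a hypothetical baseline result computed without the filter (both names are made up for illustration):

settings <- seq(10, 100, by = 5)        # candidate values of one filter setting
results  <- sapply(settings, ff)        # FF value for each setting (hypothetical ff)
baseline <- ff_without_filter           # hypothetical result with the filter switched off

mean(results > baseline)                # share of settings that improve on the baseline
plot(settings, results, type = "b")     # neighbouring settings should give similar results
# If most settings beat the baseline and the curve changes smoothly, the filter idea
# itself looks sound, rather than one lucky parameter value.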

 
Aleksey Vyazmikin #:

Are you actually generating the TC itself, or only tuning the parameters of a TC?

Second.
 
fxsaber #:
Second.

Then the answer that followed is relevant.

It all boils down to the fact that the exact data the TC's attention is focused on will never occur again - at best, only similar data.

 
СанСаныч Фоменко #:

Train is a random sample from a file, and Validate is also a random sample from a file, but does not replicate Train

Where did I say that validate repeats train????
 
I'm already reeling from all your stationarity.
 
mytarmailS #:

why not 50/50?

Noise has a mean of 0.

That's on an infinite sample. On a real, finite segment it is a random value - and not necessarily near zero)
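A minimal illustration in R (segment lengths are arbitrary):

set.seed(1)
mean(rnorm(1e6))                             # near-"infinite" sample: mean is ~0
mean(rnorm(200))                             # realistic short segment: a random number
summary(replicate(1000, mean(rnorm(200))))   # its spread around zero is not negligible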
