Machine learning in trading: theory, models, practice and algo-trading - page 3478

 
mytarmailS #:
There are three sections of data:

Train - (in sample) where the model is trained

Validate - (in sample) where the performance of the trained model is evaluated, hyperparameters are tuned, and the final model is selected.

Test - (out of sample) completely new data for the model

Not exactly.

We have one data file, call it file:

indexes  <- createDataPartition(file......, p = .70, list = FALSE)

Train    <- file[ indexes, ]   # in sample: model fitting

Validate <- file[-indexes, ]   # in sample: hyperparameter tuning, model selection

and

Test - (out of sample): completely new data for the model.

And one more very important condition: the classification error on all three sets must NOT differ much. If it does, the model is overfitted and can be safely discarded.
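A minimal sketch of that check, assuming a hypothetical already-trained classifier called model, a factor column target, and a separately loaded out-of-sample Test set (all three names are assumptions, not taken from the post):

err <- function(model, data) mean(predict(model, data) != data$target)   # misclassification rate

errors <- c(train    = err(model, Train),
            validate = err(model, Validate),
            test     = err(model, Test))
print(errors)
# If the three errors are close, the model generalises; if the test error is clearly
# worse, the model is overfitted and should be discarded.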

 
СанСаныч Фоменко #:

Not exactly.

And what's the contradiction with what I wrote?
 
mytarmailS #:
And what is the contradiction with what I wrote?

Train is a random selection from a file, and Validate is also a random selection from a file, but does not repeat Train

 
mytarmailS #:

It doesn't matter how much noise there is; noise has a mean of 0.

So there should be no inverse correlation, but there is.


By your logic, it should be like this


It's not like that at all.

Are the classes balanced? Are SL/TP fixed? And the spread, as I understand it, is not taken into account?
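On the inverse-correlation point, a minimal R sketch (arbitrary lengths and seed, not the author's data) showing that two independent zero-mean noise series routinely show a non-zero - sometimes clearly negative - sample correlation on a finite segment:

set.seed(7)
x <- rnorm(100)    # zero-mean noise
y <- rnorm(100)    # independent zero-mean noise; the true correlation with x is 0
cor(x, y)          # the sample correlation is a random value, not 0

# spread of that random value over 1000 repetitions on 100-point segments
summary(replicate(1000, cor(rnorm(100), rnorm(100))))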

 
fxsaber #:

The algorithm searching for the maximum of the FF was stopped after computing 3000 FF values. The 3000 results were then sorted by FF value and the best 20 of them were run on OOS. Among them, sometimes 50% pass OOS, sometimes 5% or 0%. This percentage tells us nothing about the robustness of the TC, because the search algorithm is unimodal.

Are you actually generating the TC itself, or only tuning the parameters of a TC?

Theoretically, robustness can be indicated by the stability of the result under small changes of the settings - a kind of substitute for similar-but-different new data.

Personally, when I use an FF for optimisation, I evaluate the usefulness of a range of settings of some conditional filter: if most of the settings give an improved result, the idea behind the filter is sound. Then you can keep only part of that filter's settings and move on to the settings of the next one. After that, you can randomise the settings and filters from the list of favoured settings (see the sketch below).

Although I don't know what is inside your system or how it works, so maybe this method is not suitable. Andrei is a professional in these matters...
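A minimal sketch of that approach, assuming a hypothetical fitness function ff(setting) for one conditional filter and a hypothetical baseline result computed without the filter (both names are made up for illustration):

settings <- seq(10, 100, by = 5)        # candidate values of one filter setting
results  <- sapply(settings, ff)        # FF value for each setting (hypothetical ff)
baseline <- ff_without_filter           # hypothetical result with the filter switched off

mean(results > baseline)                # share of settings that improve on the baseline
plot(settings, results, type = "b")     # neighbouring settings should give similar results
# If most settings beat the baseline and the curve changes smoothly, the filter idea
# itself looks sound, rather than one lucky parameter value.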

 
Aleksey Vyazmikin #:

Are you actually generating the TC itself, or only tuning the parameters of a TC?

Second.
 
fxsaber #:
Second.

Then the answer that followed is relevant.

It all boils down to the fact that the exact data the TC's attention is focused on will never occur again - at best, only similar data.

 
СанСаныч Фоменко #:

Train is a random sample from a file, and Validate is also a random sample from a file, but does not replicate Train

Where did I say that validate repeats train????
 
I'm already reeling from all your stationarity.
 
mytarmailS #:

why not 50/50?

Noise has a mean of 0.

That's on an infinite sample. On a real, finite segment it is a random value - and not necessarily near zero)
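A minimal illustration in R (segment lengths are arbitrary):

set.seed(1)
mean(rnorm(1e6))                             # near-"infinite" sample: mean is ~0
mean(rnorm(200))                             # realistic short segment: a random number
summary(replicate(1000, mean(rnorm(200))))   # its spread around zero is not negligible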
