Machine learning in trading: theory, models, practice and algo-trading - page 1308

 
elibrarius:

ISO standards aren't issued for everything). In this case, you can go by what the ML authorities call the 2nd section.

In the CatBoost you are using, even though they wrote "test", the explanation says it is used for validation. Other packages, such as XGBoost and Darch, call it validation.

Initially there were just a training set and a test set. Then cross-validation appeared, and its held-out sample came to be called the validation sample (it is actually used for both training and testing, in turn). Now there is boosting, which needs a sample to decide when to stop training. It is called a test sample, and it is also a validation sample, since it is used to check the training results, but unlike in cross-validation there is no training on it.

My point is that a sample can be used differently in different training methods. Validation is more of an action than a type of sample...
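To make the three roles concrete, here is a minimal sketch in plain Python (no CatBoost or XGBoost calls; the function names and the toy linear model are assumptions made purely for illustration). The training part drives the weight updates, the validation part only decides when to stop, and the test part is touched once at the end.

```python
import random

def chronological_split(rows, train_frac=0.6, valid_frac=0.2):
    """Split time-ordered rows into train / validation / test without shuffling."""
    n = len(rows)
    i = round(n * train_frac)
    j = i + round(n * valid_frac)
    return rows[:i], rows[i:j], rows[j:]

def mse(w, b, rows):
    """Mean squared error of the line y = w*x + b on the given rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

def fit_with_early_stopping(train, valid, lr=0.05, max_iter=2000, patience=20):
    """Gradient descent on y = w*x + b.

    Gradients come from the training rows only; the validation rows are
    used solely to pick the best iteration and to stop early.
    """
    w = b = 0.0
    best_loss, best_w, best_b, best_iter = float("inf"), w, b, 0
    for it in range(1, max_iter + 1):
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        gb = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * gw, b - lr * gb
        v = mse(w, b, valid)
        if v < best_loss:
            best_loss, best_w, best_b, best_iter = v, w, b, it
        elif it - best_iter >= patience:
            break  # no improvement on validation for `patience` iterations
    return best_w, best_b, best_iter

# Hypothetical toy data: y = 2x + 1 plus noise, ordered by "time".
random.seed(0)
data = [(k / 100, 2.0 * (k / 100) + 1.0 + random.gauss(0, 0.1)) for k in range(300)]

train, valid, test = chronological_split(data)
w, b, best_iter = fit_with_early_stopping(train, valid)
test_mse = mse(w, b, test)  # the test part is evaluated once, after training
print(f"stopped at iteration {best_iter}, test MSE = {test_mse:.4f}")
```

Whatever the stopping sample is called, the sketch shows the distinction being argued about: it influences which model is kept, so it is not a fully independent check, whereas the last part is.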

 
Vladimir Perervenko:

Validation set participates in training. It is used to set model parameters during training. In some packages the validation set is not required, in this case the training set is divided into train/valid in some proportion in the fit() function itself. But it is better to set it yourself.

The test set is used to check the quality of the trained model and this data should not be seen by the model during training.

So these are still different things, no need to confuse them.

Good luck

OK, so be it. I have no statistics on what the hundreds of people who created the various ML methods have said, and no desire to argue. I was originally talking about how it is convenient for me to separate the concepts in my own mind, and if that is not convenient for others, then let me be alone with my concepts.

 
Aleksey Vyazmikin:

OK, so be it. I have no statistics on what the hundreds of people who created the various ML methods have said, and no desire to argue. I was originally talking about how it is convenient for me to separate the concepts in my own mind, and if that is not convenient for others, then let me be alone with my concepts.

Yeah, the topic is already pretty cluttered, and now everyone has to invent their own terminology:)

As for the names of the data samples specifically, I think there is no point in arguing, because there are all sorts of methods for forming and using them. IMHO only one fact is essential: whether the data participated in the learning process (In-Sample) or not (Out-Of-Sample).
All IS samples are used in one way or another to fit the model, and OOS samples only to assess its quality.


To make this unambiguous, I think it would be logical to present the results in the form familiar from the tester: all samples that were used in training (IS) shown as a backtest, and OOS as a forward test.
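The IS/OOS presentation suggested here can be sketched as follows. This is only an illustration in plain Python: the per-trade returns are made-up numbers, and the `report` function and the choice of metrics (total profit and maximum drawdown) are assumptions, not any tester's actual output.

```python
from itertools import accumulate

def equity_curve(returns):
    """Cumulative sum of per-trade returns."""
    return list(accumulate(returns))

def max_drawdown(curve):
    """Largest drop from a running peak; the curve is assumed to start at 0."""
    peak, worst = 0.0, 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst

def report(returns, n_in_sample):
    """Report a chronological return series as backtest (IS) and forward (OOS)."""
    parts = {
        "backtest (IS)": returns[:n_in_sample],
        "forward (OOS)": returns[n_in_sample:],
    }
    summary = {}
    for name, part in parts.items():
        curve = equity_curve(part)
        summary[name] = {
            "trades": len(part),
            "profit": curve[-1] if curve else 0.0,
            "max_drawdown": max_drawdown(curve),
        }
    return summary

# Hypothetical per-trade results; the first 6 trades fall in the period used for training.
returns = [1.0, -0.5, 2.0, 0.5, -1.0, 1.5, -2.0, 0.5]
summary = report(returns, n_in_sample=6)
for name, stats in summary.items():
    print(name, stats)
```

Each part gets its own equity curve starting from zero, so a short forward period is not visually swamped by a long backtest.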

 
Ivan Negreshniy:

Yeah, the topic is already pretty cluttered, and now everyone has to invent their own terminology:)

As for the names of the data samples specifically, I think there is no point in arguing, because there are all sorts of methods for forming and using them. IMHO only one fact is essential: whether the data participated in the learning process (In-Sample) or not (Out-Of-Sample).
All IS samples are used in one way or another to fit the model, and OOS samples only to assess its quality.


To make this unambiguous, I think it would be logical to present the results in the form familiar from the tester: all samples that were used in training (IS) shown as a backtest, and OOS as a forward test.


It is better to show separate graphs, because the sample that did not participate in training is usually much smaller than the one that did, and on such a ragged combined graph nothing is visually clear, at least for me personally.

 
By the way, CatBoost has cross-validation. In that case it does not need the "test" key, but uses a single sample, which is split in different ways.
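What "a single sample split in different ways" means can be sketched with a generic k-fold index generator. This is plain Python written for illustration, not CatBoost's own cross-validation routine; the function name is an assumption.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs over contiguous folds.

    Every sample lands in the validation part exactly once and in the
    training part of all the other folds.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "valid:", valid_idx)
```

With time-ordered market data, note that some of these folds train on observations that come after the validation block, so the result is not comparable to a true forward test.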
 
Aleksey Vyazmikin:
By the way, CatBoost has cross-validation. In that case it does not need the "test" key, but uses a single sample, which is split in different ways.

Scientists work with such things, but even they do not understand what happens inside neural networks, much less inside forests: how and why everything turns out exactly as it does, what changes where, at what moment and why. We are left only to trust their authority and apply their models, trusting in higher powers.

 
Kesha Rutov:

Scientists work with such things, but even they do not understand what happens inside neural networks, much less inside forests: how and why everything turns out exactly as it does, what changes where, at what moment and why. We are left only to trust their authority and apply their models, trusting in higher powers.

You clearly haven't dealt with forests/trees. Their decisions are easily interpreted by humans. Any basic article on the tree algorithm would explain it to you in a couple of pages.
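As a small illustration of the interpretability point, here is a minimal sketch of the simplest possible tree, a one-split "stump", fitted in plain Python and printed as a readable rule. The feature name `rsi`, the labels and the data are hypothetical toy values, and the exhaustive threshold search is a simplification of what real tree libraries do.

```python
def fit_stump(xs, ys):
    """Find the single threshold on one feature that minimises misclassifications.

    Returns (errors, threshold, left_label, right_label) for binary labels 0/1.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split between neighbouring values
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best

def describe(stump, feature_name):
    """Render the fitted stump as a human-readable rule."""
    errors, threshold, left, right = stump
    return (f"if {feature_name} <= {threshold}: predict {left} "
            f"else: predict {right} ({errors} training errors)")

# Hypothetical toy data: an indicator value and whether the price went up afterwards.
rsi = [20, 25, 30, 45, 55, 70, 75, 80]
went_up = [1, 1, 1, 1, 0, 0, 0, 0]

stump = fit_stump(rsi, went_up)
rule = describe(stump, "rsi")
print(rule)
```

A full tree is a nested set of such rules, which is why a single tree can be read directly; a forest of hundreds of trees is harder to read as a whole, even though each member remains a plain rule set.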
 
Aleksey Vyazmikin:

OK, so be it. I have no statistics on what the hundreds of people who created the various ML methods have said, and no desire to argue. I was originally talking about how it is convenient for me to separate the concepts in my own mind, and if that is not convenient for others, then let me be alone with my concepts.

Stubbornness is close in meaning to persistence. I hope they help you get to the point of successfully turning your ideas into working ML. These are useful qualities for researchers. ;-)

PS I came up with a name for your leaf selection system: "Herbarium" - add to your collection of methods from trees, forests, stumps, jungles.
 
elibrarius:

PS I came up with a name for your leaf selection system: "Herbarium" - add to your collection of methods from trees, forests, stumps, jungles.

))) I would call it Lumberjack or Sawmill.

 
Kesha Rutov:

Scientists work with such things, but even they do not understand what happens inside neural networks, much less inside forests: how and why everything turns out exactly as it does, what changes where, at what moment and why. We are left only to trust their authority and apply their models, trusting in higher powers.

I partly agree. This is the era of fast computing: people used to do calculations on paper before they had access to computers, but now the amount of information and the number of methods for processing it are so great that it is often more appropriate to focus on the result rather than the process.